Historical Corpus of the Welsh Language

Display and output files

Each text can be viewed here in three ways: as a diplomatic version, as an edited running text, and as a concordance. The diplomatic and running-text files were created by applying stylesheets to the XML files. The concordances were created from specially adapted versions of the text, also generated with stylesheets, which were then processed using the Concordance software application.

Versions of the text

The diplomatic version renders the text in a form close to the original, with line and page breaks preserved. The text is presented with spelling, word divisions and punctuation as in the original. Scribal additions and deletions are marked explicitly. Editorial corrections or emendations are also marked explicitly, and the original uncorrected reading is always given. A key to the graphical conventions used for displaying this information can be found here.

The edited version sacrifices some of the accuracy and faithfulness to the original found in the diplomatic version for the sake of readability. The text is presented as a running text. The line breaks of the original are not preserved. Page breaks are noted as numerals, but pages are not formatted individually. Spelling and punctuation are as in the original, but word divisions and apostrophes are, in general, adjusted to reflect modern practice. Scribal additions and deletions are not marked explicitly: the text gives only the emended forms. The original uncorrected reading of editorial corrections is not given, although the fact that the text has been corrected editorially is noted. A key to the graphical conventions used for displaying this information can be found here.

The concordances

The concordances to individual texts use a third text format. This preserves the line and page divisions of the original, but otherwise presents the text in an edited form. Spelling and punctuation are as in the original, but word divisions and apostrophes are, in general, adjusted to reflect modern practice. Scribal additions and corrections are not marked, and editorial corrections are made silently.

Words included in the concordances

The concordances contain a list of every word in the text. Numbers are not listed, nor are English words where they form part of a stretch of text in English. Where an isolated English word or phrase occurs within a text, however, the English words it contains are listed in the concordance.
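
The inclusion rule for numbers can be illustrated with a minimal Python sketch. It is not how the corpus concordances were produced (those were generated with the Concordance application); the input shape of (page, line, text) tuples and the names TOKEN and build_index are assumptions made for illustration. The sketch does not attempt to exclude stretches of English, which would require language markup in the input, and it would treat Roman numerals as ordinary words.

```python
import re
from collections import defaultdict

# A token is a run of letters, optionally with internal apostrophes.
# Digits never match, so numbers written in figures are left out.
TOKEN = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")

def build_index(lines):
    """Map each wordform to the (page, line) references where it occurs.

    `lines` is an iterable of (page, line_no, text) tuples; this input
    shape is hypothetical, chosen only to keep the example self-contained.
    """
    index = defaultdict(list)
    for page, line_no, text in lines:
        for match in TOKEN.finditer(text):
            index[match.group().lower()].append((page, line_no))
    return dict(index)

sample = [(1, 1, "Braint a 12 o wyr"), (1, 2, "y braint")]
index = build_index(sample)
print(sorted(index))    # wordforms only; "12" is absent
print(index["braint"])  # both attestations, with their references
```

Keeping the page and line number with every attestation mirrors the concordance text format, which preserves the line and page divisions of the original.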

Alphabetical order

Most of the concordances use Welsh alphabetical order for the index of wordforms in the text (with <ch>, <dd>, <ff>, <ng>, <ll>, <ph>, <rh>, <th> as separate individual letters). However, a few use English alphabetical order. Accents are ignored in determining alphabetical order. Note that the alphabetical order is not sensitive to the distinction between <ng> representing ng [ŋ] and <ng> representing ng-g [ŋg], or between <rh> representing [rʰ] and <rh> representing [rh] across a break between two syllables. Nor is it sensitive to the phonetic value of characters in particular environments. So, for instance, words written with initial <k> are listed under K, not under C, the modern spelling of the sound represented.
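
As a rough illustration of this ordering, the following Python sketch builds a sort key that treats the eight digraphs as single letters and ignores accents. It is a sketch under stated assumptions, not the routine used by the concordance software: the positions given to k, q, v, x and z (their English positions) are an assumption, and the names WELSH_ALPHABET, strip_accents and welsh_key are hypothetical. As in the concordances, every <ng> and <rh> is treated as one letter regardless of pronunciation.

```python
import unicodedata

# Modern Welsh alphabet with the digraphs as single letters. The letters
# k, q, v, x and z are slotted in at their English positions (assumption).
WELSH_ALPHABET = [
    "a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h", "i",
    "j", "k", "l", "ll", "m", "n", "o", "p", "ph", "q", "r", "rh", "s",
    "t", "th", "u", "v", "w", "x", "y", "z",
]
RANK = {letter: i for i, letter in enumerate(WELSH_ALPHABET)}
DIGRAPHS = {letter for letter in WELSH_ALPHABET if len(letter) == 2}

def strip_accents(word):
    """Remove combining marks so that accented vowels sort with plain ones."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def welsh_key(word):
    """Return a tuple of letter ranks usable as a sort key."""
    text = strip_accents(word).lower()
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            # Always read a digraph as one letter, whatever its sound.
            key.append(RANK[pair])
            i += 2
        else:
            # Unknown characters sort after every listed letter.
            key.append(RANK.get(text[i], len(WELSH_ALPHABET)))
            i += 1
    return tuple(key)

words = ["llaw", "lwc", "chwaer", "cyfraith", "dŵr", "dda", "kyfreith"]
print(sorted(words, key=welsh_key))
# ['cyfraith', 'chwaer', 'dŵr', 'dda', 'kyfreith', 'lwc', 'llaw']
```

Note that kyfreith sorts under K, well after cyfraith, because the key looks only at the written characters.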

Spelling and grammatical alternants

No attempt has been made to take account of initial consonant mutations or to group wordforms into relevant lexical items. Hence, for instance, to find all forms of the word braint, it is necessary to look under BRAINT, FRAINT and VRAINT. Some account has been taken of spelling variation. The concordance texts include regularised forms of words spelled in unusual ways (in forms such as "ystronawl [~ estronol]"). These regularised forms are listed separately in the concordances, in addition to the entry for the original wordform. Common variants, however, are not treated in this way.
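
Because mutated forms are listed separately, a search has to cover several headwords. The Python sketch below generates candidate headwords from a base form. It is not part of the corpus tooling: it assumes the standard modern Welsh mutation tables and adds only one spelling variant, <v> for <f>, taken from the braint example above. The function name candidate_headwords is hypothetical, and the output is a list of candidates to look up, not a list of forms attested in the corpus.

```python
# Standard modern Welsh initial mutations (an assumption relative to the
# varied spellings actually found in the texts).
SOFT = {"p": "b", "t": "d", "c": "g", "b": "f", "d": "dd",
        "g": "", "m": "f", "ll": "l", "rh": "r"}
NASAL = {"p": "mh", "t": "nh", "c": "ngh", "b": "m", "d": "n", "g": "ng"}
ASPIRATE = {"p": "ph", "t": "th", "c": "ch"}
DIGRAPHS = ("ch", "dd", "ff", "ng", "ll", "ph", "rh", "th")

def candidate_headwords(word):
    """Return the base form plus its possible mutated forms, upper-cased."""
    word = word.lower()
    # Read an initial digraph as one letter so that, for example,
    # a word in ch- is not mutated as if it began with c-.
    initial = word[:2] if word[:2] in DIGRAPHS else word[:1]
    rest = word[len(initial):]
    forms = {word}
    for table in (SOFT, NASAL, ASPIRATE):
        if initial in table:
            forms.add(table[initial] + rest)
    # Spelling variant: initial f may also be written v.
    forms |= {"v" + f[1:] for f in forms
              if f.startswith("f") and not f.startswith("ff")}
    return sorted(f.upper() for f in forms)

print(candidate_headwords("braint"))
# ['BRAINT', 'FRAINT', 'MRAINT', 'VRAINT']
```

The candidate MRAINT comes from the nasal mutation table; whether such a form actually occurs in a given text can only be established by checking the concordance.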

Navigating the concordances

It is possible to navigate around the concordance by clicking on the letters of the alphabet at the top of the screen and the alphabetical list of words on the left-hand side. The main (white) part of the concordance is divided into two sections: the upper section shows each attestation of the word under consideration with its reference; the lower section shows the text. It is possible to jump to the relevant part of the text by clicking on the reference. Note that accents are not displayed at all in the upper section, but are displayed correctly in the full text section of the display.

The full concordance

Concordances are available for each individual text. However, for many purposes, the most useful research tool may be the concordance to the entire corpus (accessible from the Search the texts link in the main menu). This works in the same way as the individual concordances, except in two respects. First, the top section of the main part of the display simply lists the word itself: to see its context, it is necessary to click on the reference. Secondly, this concordance omits very common words, namely words that occur more than 250 times in the corpus. A full list of these words, and their frequency in the corpus, can be found here.
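
The frequency cut-off can be sketched in a few lines of Python. This is only an illustration of the rule as stated (words occurring more than 250 times are omitted); the function name split_by_frequency and the flat list of tokens as input are assumptions, and the full concordance itself was not necessarily built this way.

```python
from collections import Counter

def split_by_frequency(tokens, threshold=250):
    """Separate wordforms into those kept and those omitted as too common.

    A wordform is omitted only if it occurs MORE than `threshold` times,
    so a word with exactly 250 occurrences is still kept.
    """
    counts = Counter(tokens)
    kept = {w: n for w, n in counts.items() if n <= threshold}
    omitted = {w: n for w, n in counts.items() if n > threshold}
    return kept, omitted

# Toy data: "a" occurs 251 times, "braint" exactly 250, "kyfreith" once.
tokens = ["a"] * 251 + ["braint"] * 250 + ["kyfreith"]
kept, omitted = split_by_frequency(tokens)
print(sorted(kept))  # ['braint', 'kyfreith']
print(omitted)       # {'a': 251}
```

Returning the omitted words with their counts corresponds to the separate list of very common words and their frequencies mentioned above.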