Transcriptions in the TEI-files are, within limits, diplomatic: features such as spelling, capitalisation, punctuation, word division and the presence or absence of apostrophes are retained as they appear in the source. The structure of the source (division into pages, lines, stanzas, chapters, verses etc.) is also recorded.
TEI-XML allows a ‘dual’ transcription. That is, features may be recorded at two or more levels of analysis or representation. The TEI-files make some use of this, in order to allow texts to be displayed in different ways. This principally affects word division, apostrophes, and editorial corrections. Complete consistency in these matters is a noble but ultimately unachievable goal.
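To illustrate how a dual transcription can drive two different displays, the following is a minimal sketch in Python. It assumes the two levels are encoded with the standard TEI `<choice>` element wrapping `<orig>`/`<reg>` (for word division) and `<sic>`/`<corr>` (for editorial corrections); the source does not state which elements the project actually uses. The sample line and the `render` function are invented for illustration and are not taken from the project's files.

```python
import xml.etree.ElementTree as ET

# Invented sample line (an assumption, not from the project's files):
# a preverbal particle written together with its verb, and a scribal slip.
SAMPLE = (
    '<l xmlns="http://www.tei-c.org/ns/1.0">ac '
    '<choice><orig>ydoeth</orig><reg>y doeth</reg></choice> y '
    '<choice><sic>brenhni</sic><corr>brenhin</corr></choice>'
    '</l>'
)


def local_name(elem):
    """Return the tag name without its XML namespace, if any."""
    return elem.tag.split("}")[-1]


def render(elem, prefer):
    """Flatten elem to plain text.

    Inside each <choice>, only the first alternative whose tag name is in
    `prefer` is kept; all other elements are rendered in full.
    """
    parts = [elem.text or ""]
    for child in elem:
        if local_name(child) == "choice":
            chosen = next(
                (alt for alt in child if local_name(alt) in prefer), None
            )
            if chosen is not None:
                parts.append(render(chosen, prefer))
        else:
            parts.append(render(child, prefer))
        # Text following the child belongs to the parent's content.
        parts.append(child.tail or "")
    return "".join(parts)


line = ET.fromstring(SAMPLE)
diplomatic = render(line, {"orig", "sic"})   # readings as in the source
normalised = render(line, {"reg", "corr"})   # editorially adjusted readings
print(diplomatic)
print(normalised)
```

Under these assumptions, the same encoded line yields either the reading of the source or the editorially adjusted reading, depending only on which set of element names is passed as `prefer`.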
Note, however, that word separation has been normalised in one respect, namely that preverbal particles have generally been separated from the following verb where they have been written together in the original. Other very common variants, such as ymhen ~ ym mhen, remain unnormalised.
The project does not aim to capture every single feature of the source at hand, concentrating instead on the text itself. For this reason, a number of features are not recorded. These include:
Different letter-variants (for instance, long s [ſ] versus round s, 'normal' r versus round, 2-shaped r, etc.) are not normally distinguished in the transcription. However, special letters, such as d with a stroke for dd or dotted letters, are retained in the transcription. Deciding whether a letter form is a letter variant or a special letter is sometimes difficult, and, in these cases, the policy adopted has usually been commented upon and justified in the notes to a particular text.
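The distinction between letter variants and special letters can be sketched as a simple character mapping. This is only an illustration of the policy described above, not the project's actual procedure: the table `LETTER_VARIANTS`, the function `fold_letter_variants` and the sample words are hypothetical, and the list of variants is deliberately partial.

```python
# Letter variants folded into their base letter (assumed, partial list).
LETTER_VARIANTS = {
    "\u017f": "s",  # long s
    "\ua75b": "r",  # r rotunda, the round, 2-shaped r
}

# Special letters are simply absent from the table, so they pass through
# unchanged, e.g. "\u0111" (d with stroke) or "\u1e8f" (y with dot above).
_TABLE = str.maketrans(LETTER_VARIANTS)


def fold_letter_variants(text):
    """Replace letter variants by their base letter; keep special letters."""
    return text.translate(_TABLE)


print(fold_letter_variants("lly\u017f"))   # long s is folded to round s
print(fold_letter_variants("newy\u0111"))  # d with stroke is retained
```

Where a letter form is hard to classify, the decision amounts to whether or not it gets an entry in such a table, which is why the choice made for a given text needs to be documented in its notes.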