XML tags used in encoding the text | MARGOT

We have used Extensible Markup Language (XML) to encode the text. Our tag set is derived from elements proposed by the Text Encoding Initiative (TEI), but our tags are abbreviated and simplified to make the input of data from manuscripts easier at this stage of the project. In the next stage, as we refine our data, we will revise our tag set to conform to the TEI guidelines and make the database more fully XML-compliant. (For more details on TEI and XML, see the TEI website.)

XML tags, enclosed in angle brackets (< >), identify the structure of the data, which is arranged in hierarchical order: smaller elements are nested within larger ones. Tags can also carry any number of user-defined attributes that further characterize or describe the element.

Our data has been tagged for structural elements both at the level of the manuscript book (or codex) and within the text.

Codicological Elements

We have coded the codicological elements as follows:

<manuscript>Text of the manuscript </manuscript>

To this tagged element we add various attributes, as follows:

<manuscript place="London" lib="British Library" id="Additional 70513">.

The manuscript is divided into folios, each tagged with its folio number:

<pgtop pn="1r"/>

Each folio has one or more columns, tagged:

<coltop cn="1ra"/>

The hierarchical structure of the codicological elements is, then:

<manuscript>
	<pgtop>
		<coltop>
		</coltop>
	</pgtop>
</manuscript>

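(By our convention, a matching end tag and start tag, </tag><tag>, are shortened to a single boundary-marker tag <tag/>, as in <pgtop pn="n"/>. In standard XML, <tag/> is simply an empty element, so the page and column tags mark boundaries rather than enclosing their content; the nesting shown above is the logical hierarchy.)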
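Because <pgtop/> and <coltop/> mark boundaries instead of enclosing their content, a program reading the file has to remember the most recent markers to know where a given line sits. The Python sketch below illustrates this with the standard-library xml.etree.ElementTree module; the sample fragment and the function locate_lines are hypothetical, and it assumes verse lines are tagged <v ln="..."> as described under Textual Elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: <pgtop/> and <coltop/> are empty boundary markers,
# so a line belongs to whichever markers most recently preceded it.
SAMPLE = """<manuscript place="London" lib="British Library" id="Additional 70513">
<pgtop pn="1r"/><coltop cn="1ra"/>
<v ln="A1">first line</v>
<v ln="A2">second line</v>
<coltop cn="1rb"/>
<v ln="A3">third line</v>
<pgtop pn="1v"/><coltop cn="1va"/>
<v ln="A4">fourth line</v>
</manuscript>"""


def locate_lines(root):
    """Map each verse-line number (ln) to the (folio, column) it falls on."""
    folio = column = None
    locations = {}
    # iter() walks the tree in document order, so each marker is seen
    # before the lines that follow it.
    for el in root.iter():
        if el.tag == "pgtop":
            folio = el.get("pn")
        elif el.tag == "coltop":
            column = el.get("cn")
        elif el.tag == "v":
            locations[el.get("ln")] = (folio, column)
    return locations


for ln, (folio, col) in locate_lines(ET.fromstring(SAMPLE)).items():
    print(ln, folio, col)
```

The same running-state approach would apply to any other boundary marker coded as an empty tag.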

Textual Elements

The textual elements are coded as follows:

<work> is used to indicate each separate text in the manuscript.
	<work> has the following attributes:
		T (Title),
		short.title (abbreviated title references),
		aunam (author’s name), as in the following example:
<work T="La Vie saynte Elizabeth" short.title="S. Elizabeth" aunam="Nicole Bozon">
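Reading these attributes back is straightforward. A minimal Python sketch using xml.etree.ElementTree, assuming a hypothetical fragment built around the example above (the function list_works is ours, for illustration only). Note that short.title is an ordinary attribute name, since XML names may contain periods.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment built around the <work> example above.
SAMPLE = """<manuscript>
<work T="La Vie saynte Elizabeth" short.title="S. Elizabeth" aunam="Nicole Bozon">
<s sn="1"/>
</work>
</manuscript>"""


def list_works(root):
    """Return (short title, full title, author) for each <work>, in order."""
    return [
        (w.get("short.title"), w.get("T"), w.get("aunam"))
        for w in root.iter("work")
    ]


for short, title, author in list_works(ET.fromstring(SAMPLE)):
    print(f"{short}: {title} ({author})")
```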

Nested within <work> are:

<rubric>
<s> (for strophe or stanza)
<v> (for verse, or metrical line).

The most common, of course, are <s> and <v>, which take attributes giving the strophe number (sn) and the line number (ln) respectively. Each line number also includes a siglum identifying the work (see Sigla, in the menu bar):

<s sn="n">
<v ln="Siglum+n">
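Because ln combines a siglum with a number, a search tool has to split the two. The exact form of the sigla is not given on this page, so the Python sketch below assumes, hypothetically, that the siglum is followed directly by the line number and does not itself end in a digit. "Eliz" is a placeholder, not a real siglum, and split_ln is an illustrative name.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stanza; "Eliz" stands in for a real siglum.
SAMPLE = """<s sn="3">
<v ln="Eliz9">line nine</v>
<v ln="Eliz10">line ten</v>
</s>"""


def split_ln(ln):
    """Split an ln value into (siglum, line number).

    Assumes the siglum is followed directly by the line number and does
    not itself end in a digit.
    """
    m = re.fullmatch(r"(.*?)(\d+)", ln)
    if m is None:
        raise ValueError(f"unexpected ln value: {ln!r}")
    return m.group(1), int(m.group(2))


stanza = ET.fromstring(SAMPLE)
for v in stanza.iter("v"):
    siglum, n = split_ln(v.get("ln"))
    print(stanza.get("sn"), siglum, n, v.text)
```

If real sigla can end in digits, a separator character would be needed to make the split unambiguous.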

Tagging of Scribal Elements, Editorial Interventions

The main scribal features recorded are corrections and deletions in the text. A selection of the tags used is given below (a complete list will be given in phase two of this project, once our tags have been revised to accommodate the scribal complexities of the manuscript sources):

<sic corr="corrected text">uncorrected text</sic>
- used to mark editorial corrections, displayed in the Standard style in red
<sic corr1="corrected text">uncorrected text</sic>
- used to mark corrections by the initial scribe.
<sic corr2="corrected text">uncorrected text</sic>
- used to mark corrections by the second scribe.

Scribal additions made by the first scribe while writing are coded:

<add place="sl">added letter</add>
- used to mark supralinear scribal additions (place="sl", i.e. above the line).
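One practical consequence of the <sic> coding is that two readings can be generated from the same line: the manuscript text and the corrected text. A minimal Python sketch, assuming a hypothetical English placeholder line (not MARGOT data) and treating corr, corr1 and corr2 alike; the function reading is illustrative only, and it keeps text inside <add> in both readings.

```python
import xml.etree.ElementTree as ET

# Attributes that may hold the corrected form of a <sic> element.
CORR_ATTRS = ("corr", "corr1", "corr2")


def reading(el, corrected):
    """Flatten an element to plain text.

    corrected=True uses the correction recorded on each <sic>;
    corrected=False keeps the uncorrected manuscript text.
    Text inside <add> is kept in both readings.
    """
    parts = [el.text or ""]
    for child in el:
        fix = None
        if child.tag == "sic" and corrected:
            # Take the first correction attribute present, if any.
            fix = next((child.get(a) for a in CORR_ATTRS if a in child.attrib), None)
        parts.append(fix if fix is not None else reading(child, corrected))
        parts.append(child.tail or "")
    return "".join(parts)


# Hypothetical line with one editorial correction and one supralinear addition.
line = ET.fromstring(
    '<v ln="X1">the <sic corr="cat">cta</sic> sat o<add place="sl">n</add> the mat</v>'
)
print(reading(line, corrected=True))   # the cat sat on the mat
print(reading(line, corrected=False))  # the cta sat on the mat
```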

Deletions are coded:

<del></del>
- to which a number of attributes are added, indicating both the type of deletion and who is responsible for the correction:
xpc="text of expunctuation"
bar="text of barred passage"
erasure="text, legible or illegible"
corr1="correction by main scribe"
corr2="correction by second scribe"
corr="correction by editor" (rare)
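Since a <del> carries its information in attributes, a report of deletions only needs to check which attributes are present. The Python sketch below is an illustration under stated assumptions: the attribute names come from the list above, but the function describe_deletion, the sample element, and the rule that each <del> has at most one deletion-type attribute and one correction attribute are ours.

```python
import xml.etree.ElementTree as ET

# Attribute name -> type of deletion, and attribute name -> responsible hand.
DELETION_TYPES = {"xpc": "expunctuation", "bar": "barred", "erasure": "erasure"}
HANDS = {"corr1": "main scribe", "corr2": "second scribe", "corr": "editor"}


def describe_deletion(el):
    """Return (deletion type, deleted text, responsible hand, correction).

    Assumes at most one deletion-type attribute and one correction
    attribute per <del>; missing values are returned as None.
    """
    kind = deleted = hand = correction = None
    for attr, label in DELETION_TYPES.items():
        if attr in el.attrib:
            kind, deleted = label, el.get(attr)
    for attr, label in HANDS.items():
        if attr in el.attrib:
            hand, correction = label, el.get(attr)
    return kind, deleted, hand, correction


# Hypothetical deletion: expunctuated text corrected by the second scribe.
print(describe_deletion(ET.fromstring('<del xpc="abc" corr2="abd"></del>')))
# ('expunctuation', 'abc', 'second scribe', 'abd')
```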

Simple erasures done while writing can be coded:

<erasure>erased letter(s)</erasure>

Corrections by a second medieval hand that we judge to be unnecessary are coded:

<stet corr2="addition by scribe 2">original text</stet>

Scribal variants (i.e. alternative text given in parallel with the main text, often introduced by uel in the manuscript) are coded:

<sv place="sl">text of the variant</sv>

(place="sl" for a variant written above the line; place="margin" for one written in the margin)
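Variants have to be kept apart from the main text, or a search would find run-together words. This Python sketch, assuming <sv> elements are direct children of a <v> line, separates the two; the sample line and the function split_variants are hypothetical.

```python
import xml.etree.ElementTree as ET

# Values of the place attribute, as described above.
PLACES = {"sl": "above the line", "margin": "in the margin"}


def split_variants(line):
    """Return (main text without variants, [(variant text, where written)]).

    Only direct children of the line are checked for <sv>.
    """
    main, variants = [line.text or ""], []
    for child in line:
        if child.tag == "sv":
            place = child.get("place")
            variants.append((child.text or "", PLACES.get(place, place)))
        else:
            main.append("".join(child.itertext()))
        main.append(child.tail or "")
    return "".join(main), variants


# Hypothetical line with one supralinear variant.
line = ET.fromstring('<v ln="X2">a great<sv place="sl">large</sv> house</v>')
print(split_variants(line))  # ('a great house', [('large', 'above the line')])
```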

Expansion of abbreviations is coded in unambiguous cases as:

<xp>expansion</xp> (default abbreviation forms will be converted to more complex tags in phase two of this project).

When two different forms of abbreviation are used, the coding distinguishes them by using tags of this sort:

<abbr type="c2" xp="com"></abbr>

(used to distinguish the normal abbreviation for com/cum/cun/con, which resembles the digit 9 with a descender below the line, from the form resembling the digit 2).
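Both forms of expansion can be resolved to plain text for searching, while the type attribute is kept for palaeographic counts. A Python sketch with a hypothetical sample line; the function expand and the sample are illustrative, and only the type value c2 comes from this page.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def expand(line):
    """Return (expanded text, Counter of <abbr> type values).

    <abbr> elements are replaced by their xp attribute; <xp> and any other
    child elements already contain expanded text.
    """
    parts, types = [line.text or ""], Counter()
    for child in line:
        if child.tag == "abbr":
            parts.append(child.get("xp", ""))
            types[child.get("type")] += 1
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts), types


# Hypothetical line using the 2-shaped com abbreviation and an <xp> expansion.
line = ET.fromstring('<v ln="X3"><abbr type="c2" xp="com"></abbr>ent <xp>que</xp> il</v>')
text, types = expand(line)
print(text, dict(types))  # coment que il {'c2': 1}
```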

Entity References

Diacritical marks (&eacute; = é, &ccedil; = ç) and some punctuation are also coded so that they are searchable. These include opening and closing single and double quotation marks (&osq;, &csq;, &odq;, &cdq;) and the apostrophe ('), used to indicate elided letters.
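Entities such as &osq; are not predefined in XML, so a standard parser rejects them unless they are declared. One way to make such fragments parseable, shown here as a sketch rather than the project's own workflow, is to declare them in an internal DTD subset. The Unicode characters chosen for each entity and the helper parse_with_entities are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed mapping of the quotation-mark entities to Unicode characters.
QUOTE_ENTITIES = {
    "osq": "\u2018",  # opening single quotation mark
    "csq": "\u2019",  # closing single quotation mark
    "odq": "\u201c",  # opening double quotation mark
    "cdq": "\u201d",  # closing double quotation mark
}


def parse_with_entities(fragment, root_tag="v"):
    """Parse a fragment after declaring the entities in an internal DTD subset."""
    decls = "".join(
        f'<!ENTITY {name} "{char}">' for name, char in QUOTE_ENTITIES.items()
    )
    return ET.fromstring(f"<!DOCTYPE {root_tag} [{decls}]>{fragment}")


# Hypothetical line; the parser expands the declared entities.
line = parse_with_entities('<v ln="X4">&odq;hello&cdq;, he said</v>')
print(line.text)
```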

Letters which are subject to normalization are also coded:

&u; (the MS form is v, normalized as u);
&v; (the MS form is u, normalized as v).

When letters can be construed in several ways, a two-letter sequence may be used, e.g.:

&un; (the MS form is n, normalized as u).
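Because each normalization entity records both the manuscript letter and its normalized form, a single transcription can yield either a diplomatic or a normalized text. A minimal Python sketch working on the raw text with a regular expression; the table reproduces the three entities listed above, while the function resolve is hypothetical and any other entity is left untouched.

```python
import re

# Entity name -> (form written in the MS, normalized form), from the list above.
NORMALIZATION = {
    "u": ("v", "u"),
    "v": ("u", "v"),
    "un": ("n", "u"),
}

ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def resolve(text, normalized=True):
    """Replace normalization entities with the normalized or the MS letter.

    Entities not in the table (e.g. &osq;) are left untouched.
    """
    def sub(m):
        forms = NORMALIZATION.get(m.group(1))
        if forms is None:
            return m.group(0)
        return forms[1] if normalized else forms[0]

    return ENTITY.sub(sub, text)


print(resolve("a&v;ant", normalized=True), resolve("a&v;ant", normalized=False))
# avant auant
```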

Large initials used to mark sections in the text are coded in this fashion: &lgA4(2); = a large initial A occupying 4 line spaces, with a descender or flourish extending over two more line spaces.

(The sections and subsections of the text indicated by these initials are also marked by <sec/> (for section) or <sec1/> (for subsection).)
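The large-initial entities pack three pieces of information into their names, which can be recovered with a regular expression. Since parentheses are not allowed in XML entity names, this sketch works on the raw text rather than through an XML parser. It assumes the pattern &lg + letter + height + (extension); treating the parenthesized extension as optional is our assumption, as is the function name large_initials.

```python
import re

# &lg<letter><height>(<extension>); e.g. &lgA4(2);
# The "(n)" part is treated as optional here, which is an assumption.
LARGE_INITIAL = re.compile(r"&lg([A-Za-z])(\d+)(?:\((\d+)\))?;")


def large_initials(text):
    """Return (letter, line spaces, extra line spaces of descender/flourish)."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3) or 0))
        for m in LARGE_INITIAL.finditer(text)
    ]


print(large_initials("&lgA4(2);"))  # [('A', 4, 2)]
```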