From Manuscript to Digital Record

The following report describes the process of transforming the text of a medieval manuscript, or a fragment of one, into human- and machine-readable code for display on the World Wide Web.

During a twelve-week work study in the Humanities Computing and Media Center of the University of Victoria I set out to acquire the computer skills necessary to carry out a trial markup project using Extensible Markup Language (XML). As a student of German and Medieval Studies, I began the project with virtually no computing knowledge beyond general essay-writing and emailing skills.

This report will focus not on the manuscript chosen but on the process of marking it up. It centers on the successes, limits and failures encountered in applying the Berkeley Digital Scriptorium DTD and XML tags to the markup of medieval documents. Based on those limits and failures, additions were made to the existing XML tag set and the DTD, and new approaches were developed for rendering the tagged text for viewing in a browser.

The manuscript that presented itself as most appropriate to our purpose of marking up a medieval text was the University of Victoria's Special Collections SC 070, Fall of Princes (ca. second half of the 15th c.) by John Lydgate (ca. 1370-1449). Neither a manuscript description nor a genealogy of the text will be provided here. For a short description of the manuscript see A. Edwards, "Lydgate's Fall of Princes: A 'Lost' Manuscript Found", Manuscripta, Vol. XXII, 1978, 176-178. The full text of Fall of Princes is transcribed and collated in Henry Bergen, Lydgate's Fall of Princes, Early English Text Society, London, Oxford University Press, Vols. 1-3, 1924; Vol. 4, 1927.

The work study was a joint project of Humanities Computing and the Library's Special Collections at the University of Victoria. None of the participating parties (two programmers, the head of Special Collections and myself) began with a clear idea of what this project would produce, aside from the use of XML on a medieval document. Because my background is in manuscript studies, Special Collections' concern is archiving, and the programmers' focus was on XML and XSLT technology, it took some time to reach a consensus on what requirements the trial product was to fulfill. Some of the confusion resulted from my ignorance of what could be done with a document beyond digitizing and displaying it. The programmers had all the technical skill but little insight into the details of manuscript studies, while the library's focus was more on the kind of meta-tagging used for archiving.

The first step was therefore to get together and gather each party's expectations, aims and level of knowledge. It was left to me to decide what would be most useful to mark up in a medieval document, though I also consulted other medievalists. Having taken an Introduction to Manuscript Studies course, my foremost concern was the readability of the document and the correct interpretation of all characters, so as to provide as accurate a transcription of the text as possible. By transcription I mean the conversion of all glyphs present on the folio into typed script without the immediate expansion of abbreviations formed by truncation, suspension or contraction.

I imagined an image of the manuscript displayed side by side with the corresponding transcription. Such a display would be useful to beginners in manuscript studies because they could compare the letter shapes in the manuscript with the typescript and so learn the script of a particular period, hand and region. Most of all, it would give them the ability to read the document itself once they had become familiar with the use of certain letters. The side-by-side display would make line, word or letter comparisons easier, in contrast to some electronic editions of medieval manuscripts that provide only one view at a time, either the image or the transcription. Abbreviations could be expanded through pop-up boxes giving the full word without obstructing an easy reading of the transcription, while underlined and italic letters would mark letters that are indicated in the manuscript but are not visually present in their 'normal' shape.
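To make this concrete, the sketch below shows one way such a pop-up could be produced with XSLT: the abbreviated word carries its full form in an attribute, and the stylesheet turns that attribute into a browser tooltip. The element name 'abbr' and the attribute 'expan' are conventions assumed for this sketch only, not the tag set prescribed by the Digital Scriptorium DTD.

    <!-- Illustrative only: renders an abbreviated word as underlined
         text whose expansion appears in a browser tooltip. -->
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- A hypothetical source encoding for one abbreviated word
           might be: <abbr expan="propre">ppre</abbr> -->
      <xsl:template match="abbr">
        <span class="abbr" title="{@expan}">
          <u><xsl:apply-templates/></u>
        </span>
      </xsl:template>

    </xsl:stylesheet>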

With a finished, useful product now imagined, I could take direction from both the programmers and the head of Special Collections. The first step was to learn about DTDs and XML. Through the library of the Humanities Computing and Media Center I had access to various books on the subject. Because this was only a twelve-week work study, Teach Yourself XML in 21 Days sounded the most promising. But after a week I was still at Day 4, with the strong feeling that I had absorbed nothing of what I had read and a growing frustration that led to the conclusion that Humanities and Computing do not go together. Luckily I found a book review online criticizing the book for its overload of information and its lack of examples and visuals of DTD and XML code. I turned to Elizabeth Castro's XML for the World Wide Web and found the material presented in a way that worked for me: quick and highly visual, with enough detail for a beginner to become comfortable with the workings of a markup language.

The library's concern is the process of archiving web sites such as the one we produced. Most libraries use the Dublin Core standard to archive the metadata for each document, so we had to model our metadata on the tag set provided by Dublin Core. These tags correspond closely to the Text Encoding Initiative (TEI)-based tag set for manuscript transcription.
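For illustration, a Dublin Core metadata block for this manuscript could look roughly like the fragment below; the element selection and the values are examples only, not the record we actually produced.

    <!-- Illustrative Dublin Core metadata; values are examples only. -->
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Fall of Princes</dc:title>
      <dc:creator>John Lydgate (ca. 1370-1449)</dc:creator>
      <dc:date>Second half of the 15th century</dc:date>
      <dc:type>Text</dc:type>
      <dc:format>text/xml</dc:format>
      <dc:identifier>University of Victoria Special Collections SC 070</dc:identifier>
      <dc:language>enm</dc:language>
    </metadata>

(The language code 'enm' is the ISO 639-2 code for Middle English.)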

After tagging the metadata I transcribed a story from the manuscript. I chose this particular one because it was written on parchment and was the most legible in an otherwise badly damaged manuscript. While rendering the transcribed text in XML I realized that there were difficulties in encoding certain abbreviation characters, such as the 'pro' (a 'p' with a backward loop on its descender), the 'par' (a 'p' with a horizontal stroke through its descender), and the superscripts for 'er' and 'ur'. They are neither present in the current Unicode set, nor is it possible to combine Unicode characters to achieve a satisfying result, so we were forced to adopt conventions such as a 'p' with a tilde or a macron through its descender for these troublesome items. The line (<l>) element of the ds2.dtd had to be extended to accommodate the required attributes of line number, column number and page number.
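As a sketch of the kind of extension involved, a declaration along the following lines would give a line element those three attributes; the attribute names and the simplified content model are illustrative and need not match what was actually added to ds2.dtd.

    <!-- Illustrative DTD fragment: n = line number, col = column,
         page = folio or page reference. Names are examples only. -->
    <!ELEMENT l (#PCDATA)>
    <!ATTLIST l
        n     CDATA #REQUIRED
        col   CDATA #IMPLIED
        page  CDATA #IMPLIED>

A line transcribed from column a of folio 1r could then be tagged as <l n="1" col="a" page="1r">...</l>.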

The trial markup on this site was finished at the end of the fifth week of the work study.
