Handout for 1999 MLA Paper:

"Digital Alchemy: Research, Writing, and Rendition:
Electronic Theses and Dissertations in XML"

John Robert Gardner

ATLA-CERTR
Emory University



All the links you need to get started are included below. A discussion of flat data, XML, and archiving follows, with a short example of XML tagging to conclude this handout.



XML is Flat Data for ETD Longevity and Access

Flat data is essential to integrity in library, information, and archival work, and key organizations in the information-management world have canonized it as the fundamental reliable format. It meets multiple criteria:

Based on these considerations, our pilot builds on an existing standard, the Text Encoding Initiative (TEI, in its Lite version), which was designed for the particular academic needs of preserving and accessing electronic texts. TEI is an application of the international standard called Standard Generalized Markup Language (SGML), ISO 8879, of which HTML (HyperText Markup Language) is also an application. All three systems can be reviewed at the World Wide Web Consortium site for electronic information resources (http://www.w3.org). For more detailed, annotated information on XML, SGML, and TEI, see the OASIS site (http://www.oasis-open.org) and the XML.com site (http://www.xml.com). For software resources, see James Tauber's site, http://www.xmlsoftware.com/.





Simple Example of XML vs. Other Tagging Such as HTML

In short, look at this sentence:

     In Carlo Michelstaedter's Persuasione e rettorica, there is a highly original treatment of modernity and extremes of fin de siècle angst.


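We see three italicized text portions: a title (Persuasione e rettorica), an emphasized phrase (highly original), and a foreign phrase (fin de siècle).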

Conventional word processing has made such ambiguity "standard." Anyone who has had to reformat a document from one publisher (or program) to another knows how frustrating this can be. XML makes the identification of information much more specific. In the example above (which is done in ordinary HTML), the phrases have been marked, in what the computer actually reads, only with ambiguous italics:

    <i> highly original </i>

This kind of marking assumes that every reader knows what the italics signify, and that every computer can read them (which, as we all know, is often not the case!). With XML, you would instead use markup that tells the computer what each part is (in much the same way you tell it to italicize something):
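A minimal sketch of such markup, with element names modeled on TEI's `title`, `emph`, and `foreign` (illustrative only, not the pilot's exact tag set). The Python below uses the standard library's `xml.etree.ElementTree` to parse an HTML-style and an XML version of the same sentence and list the tag on each marked phrase: the HTML version can only report `i` three times, while the XML version records what each phrase is.

```python
import xml.etree.ElementTree as ET

# The sentence as conventional HTML marks it: every phrase is just <i>.
HTML_VERSION = (
    "<p>In Carlo Michelstaedter's <i>Persuasione e rettorica</i>, there is a "
    "<i>highly original</i> treatment of modernity and extremes of "
    "<i>fin de siècle</i> angst.</p>"
)

# The same sentence with descriptive markup. Element names are modeled on
# TEI (title, emph, foreign); they are illustrative, not the pilot's schema.
XML_VERSION = (
    "<p>In Carlo Michelstaedter's <title>Persuasione e rettorica</title>, "
    "there is a <emph>highly original</emph> treatment of modernity and "
    "extremes of <foreign>fin de siècle</foreign> angst.</p>"
)


def marked_phrases(markup):
    """Return (tag, text) for each element directly inside the root <p>."""
    root = ET.fromstring(markup)
    return [(child.tag, child.text) for child in root]


for label, markup in (("HTML", HTML_VERSION), ("XML", XML_VERSION)):
    print(label)
    for tag, text in marked_phrases(markup):
        print(f"  <{tag}> {text}")
```

Both strings contain the same words; only the XML version tells a program which phrase is a title, which is emphasis, and which is a foreign phrase.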

Aside from the obvious increase in precision, there are other advantages. For instance, anything you write using these tags (or similar ones designed for your discipline) can be reformatted, however many documents you have marked up this way, simply by telling one file to "make all titles italic, all foreign phrases underlined," and so on. You never have to reformat the content itself; you just tell your computer what to do with all "title" parts, all "emphasis" parts, etc. Your original content is thus always safe from later reformatting: you don't have to risk damaging your composition just to change its rendition. In addition, XML takes up less disk space than most word-processing documents, it is Year 2000 safe, and ANYONE can read it on a computer without special software. In other words, even in TEOTWAWKI ("the end of the world as we know it"), XML is a safe format.
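The "one file" idea can be sketched as a single table that maps element names to display rules; changing that table re-renders a document without touching its content. This is a minimal Python illustration under stated assumptions: the `render` helper and the `STYLE` table are hypothetical, the output target is HTML, and only one level of nesting is handled. In practice this job is usually given to a stylesheet language such as XSLT or CSS.

```python
import xml.etree.ElementTree as ET
from html import escape

DOCUMENT = (
    "<p>In Carlo Michelstaedter's <title>Persuasione e rettorica</title>, "
    "there is a <emph>highly original</emph> treatment of modernity and "
    "extremes of <foreign>fin de siècle</foreign> angst.</p>"
)

# One place that decides how each kind of content looks (hypothetical rules):
# "make all titles italic, all foreign phrases underlined."
STYLE = {"title": "i", "emph": "i", "foreign": "u"}


def render(markup, style):
    """Render descriptive XML as HTML using a tag-to-HTML-tag mapping.

    Handles only elements directly inside the root, which is enough for
    this sentence. The source markup is only read, never modified.
    """
    root = ET.fromstring(markup)
    parts = [escape(root.text or "", quote=False)]
    for child in root:
        html_tag = style.get(child.tag, "span")  # unknown tags get a neutral <span>
        parts.append(f"<{html_tag}>{escape(child.text or '', quote=False)}</{html_tag}>")
        parts.append(escape(child.tail or "", quote=False))  # text after the element
    return "".join(parts)


print(render(DOCUMENT, STYLE))

# Change one rule (titles now bold); the content string is untouched.
print(render(DOCUMENT, {**STYLE, "title": "b"}))
```

The design choice is the point: rendition lives in `STYLE`, content lives in `DOCUMENT`, and neither edit reaches into the other.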

XML also allows more precise searching. You can select every occurrence of Shakespeare as author, or distinguish a search for Coleridge's Xanadu from one for the song by Olivia Newton-John. Check out the links above to learn more.
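A minimal sketch of field-aware searching, using the standard library's `xml.etree.ElementTree` and its limited XPath support. The small catalog and its element names (`poem`, `song`, `book`, `author`, `performer`, `title`, `line`) are hypothetical, invented for illustration; a real ETD collection would follow its own schema.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical catalog; element names are illustrative only.
CATALOG = """<catalog>
  <poem>
    <author>Samuel Taylor Coleridge</author>
    <title>Kubla Khan</title>
    <line>In Xanadu did Kubla Khan</line>
  </poem>
  <song>
    <performer>Olivia Newton-John</performer>
    <title>Xanadu</title>
  </song>
  <book>
    <author>William Shakespeare</author>
    <title>Hamlet</title>
  </book>
  <book>
    <author>A. Hypothetical Critic</author>
    <title>Reading Shakespeare</title>
  </book>
</catalog>"""

root = ET.fromstring(CATALOG)

# A plain-text search for "Shakespeare" would match both books;
# checking only <author> finds works *by* Shakespeare.
by_shakespeare = [
    item.findtext("title")
    for item in root
    if "Shakespeare" in (item.findtext("author") or "")
]

# "Xanadu" in the text of a poem vs. "Xanadu" as a song title.
xanadu_in_poems = [
    poem.findtext("title")
    for poem in root.findall("poem")
    if any("Xanadu" in (line.text or "") for line in poem.findall("line"))
]
xanadu_songs = [song.findtext("performer") for song in root.findall("song[title='Xanadu']")]

print(by_shakespeare)   # ['Hamlet']
print(xanadu_in_poems)  # ['Kubla Khan']
print(xanadu_songs)     # ['Olivia Newton-John']
```

Because the markup says what each string is, the same word can be searched as an author, a title, or a line of verse rather than as undifferentiated text.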

atman@vedavid.org



(© Copyright 1999, John Robert Gardner, All Rights Reserved.)