XML as a kind of SGML




Simple Example of XML vs. other tagging like HTML

In short, look at this sentence:

     In Carlo Michelstaedter's Persuasione e rettorica, there is a highly original treatment of modernity and extremes of fin de siècle angst.

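We see three italicized text portions: the title Persuasione e rettorica, the emphasized phrase "highly original," and the foreign phrase "fin de siècle." Each is italicized for a different reason, yet the markup records only that it is italic.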

Conventional word processing has made such ambiguity "standard." Anyone who has had to reformat a document from one publisher--or software--to another knows how frustrating this can be. XML makes the identification of information much more specific. In the example above (which is done with standard HTML), the phrases have really only been marked (in what the computer reads) with ambiguous italics:

    <i> highly original </i>

This kind of marking assumes every reader knows what italicizing signifies, and that all computers can read it (which, as we all know, is often not the case). With XML we would instead mark each part for what it is, in much the same way you'd tell the computer to italicize something: a title as a title, an emphasized phrase as emphasis, a foreign phrase as foreign.
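
As a minimal sketch, the example sentence could be marked up as shown in the string below. The tag names (title, emphasis, foreign) and the wrapping sentence element are illustrative assumptions, not a standard tag set. Python's standard xml.etree.ElementTree module is used only to show that a program can now tell the three parts apart.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names (sentence, title, emphasis, foreign), chosen for illustration.
SENTENCE_XML = (
    "<sentence>In Carlo Michelstaedter's <title>Persuasione e rettorica</title>, "
    "there is a <emphasis>highly original</emphasis> treatment of modernity and "
    "extremes of <foreign>fin de si\u00e8cle</foreign> angst.</sentence>"
)

def labelled_parts(xml_text):
    """Return (tag, text) pairs for each marked-up phrase, in document order."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]

for tag, text in labelled_parts(SENTENCE_XML):
    print(f"{tag}: {text}")
```

Where the HTML version records three identical italics, this version records one title, one emphasis, and one foreign phrase.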

Aside from the obvious increase in precision, there are other advantages. For instance, anything you ever write using these tags (or ones like them for your discipline) can be reformatted--however many documents you've done this way--simply by telling one file to "make all titles italic, all foreign phrases underlined," etc. You never have to re-format the content itself; you just tell your computer what you want it to do with all "title" parts, or "emphasis" parts, etc. Thus your original content is always "safe" from later re-formatting. You don't have to risk damaging your composition just to change its rendition. Plus, XML files take up less disk space than most word-processing documents, are Year 2000 safe, and can be read by ANYONE on their computer.
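
The restyling idea can be sketched in a few lines. Here a single style table (a hypothetical stand-in for a real style sheet) decides how each tag is rendered as HTML. The tag names, the STYLE table, and the render_html helper are all assumptions made for illustration, and the sketch handles only one level of tagging.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical style table: changing one entry restyles every document that uses the tag.
STYLE = {"title": "i", "emphasis": "b", "foreign": "u"}

def render_html(xml_text, style):
    """Render semantically tagged text as HTML; the XML source itself is never edited."""
    root = ET.fromstring(xml_text)
    out = [escape(root.text or "")]
    for child in root:
        html_tag = style.get(child.tag)
        text = escape(child.text or "")
        # Tags missing from the style table are rendered as plain text.
        out.append(f"<{html_tag}>{text}</{html_tag}>" if html_tag else text)
        out.append(escape(child.tail or ""))
    return "".join(out)

doc = (
    "<sentence>A <emphasis>highly original</emphasis> treatment in "
    "<title>Persuasione e rettorica</title>.</sentence>"
)
print(render_html(doc, STYLE))
# Restyle without touching the content: titles underlined instead of italic.
print(render_html(doc, {**STYLE, "title": "u"}))
```

Both calls read the same unchanged document. Only the style table differs, which is the sense in which the content stays "safe."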

You can also use this for more precise searching. You can find every occurrence of Shakespeare as author, or distinguish between a search for Coleridge's Xanadu and the song by Olivia Newton-John. Check out the links above to learn more.
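
A rough sketch of such a search follows, assuming two hypothetical tags: poemgeography for places named in poems and songtitle for song titles. The notes and note wrapper elements and the find_tagged helper are likewise invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical tags and sample notes, invented for this sketch.
NOTES_XML = """<notes>
  <note>Coleridge sets the poem in <poemgeography>Xanadu</poemgeography>.</note>
  <note>The radio played <songtitle>Xanadu</songtitle> all summer.</note>
</notes>"""

def find_tagged(xml_text, tag, phrase):
    """Return the full text of each note in which `phrase` is marked with `tag`."""
    root = ET.fromstring(xml_text)
    hits = []
    for note in root.iter("note"):
        if any(el.text == phrase for el in note.iter(tag)):
            hits.append("".join(note.itertext()))
    return hits

print(find_tagged(NOTES_XML, "poemgeography", "Xanadu"))
print(find_tagged(NOTES_XML, "songtitle", "Xanadu"))
```

A plain-text search for "Xanadu" would return both notes; searching by tag returns only the one that was meant.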

XML is a simplified form of SGML, with a few differences so that browsers can read it and so that individuals can literally make their own tags for labelling new kinds of data or discoveries. Both are sets of rules for making your own markup, which can be written down as a Document Type Definition (DTD). HTML is defined by a DTD, and, generally speaking, anyone can write one. A DTD is a set of instructions that says, in effect, "when I want to indicate that Xanadu is a place talked about by Coleridge--instead of an obnoxious song, for instance--I will write this":

    <poemgeography>Xanadu</poemgeography>

Right now, WordPerfect 8 for Windows creates XML (I'm beta-testing WP 9, and it is way cool), and a plug-in called S45 by i4i works with MS Word. Of course, we mustn't forget Adobe's FrameMaker 5.5+SGML (see the links suggested on the main page for a full range of XML/SGML software), and there are various gizmos for other software. Internet Explorer 4+ reads XML (the 5.0 beta lets the bells and whistles work), as does the Panorama browser.

With new tools, XML can be converted to PDF, RTF, or HTML for various forms of reading. Disciplines such as astronomy, bioinformatics, and mathematics have already made their own sets of XML tags.

Disciplines can make their own specific XML tags--even particular departments can--without need of international agreement. This is because XML does what SGML does not: it allows any tag to be used, even without a DTD, as long as the tag is used consistently, is always closed (with a </tagname> marker), has its attribute values in quote marks, and is consistent in its upper/lowercase use.
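
These rules can be checked mechanically. The sketch below uses the parser in Python's standard library to test a few hand-made samples. The place tag and kind attribute are hypothetical, and is_well_formed is a helper written for this illustration, not part of any XML tool mentioned on this page.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples: one that follows the rules and three that each break one.
samples = {
    "closed, quoted, consistent case": '<place kind="poem">Xanadu</place>',
    "tag never closed": '<place kind="poem">Xanadu',
    "attribute not quoted": "<place kind=poem>Xanadu</place>",
    "case mismatch": "<Place>Xanadu</place>",
}
for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample passes. No DTD is consulted at any point, which is why a made-up tag such as place is accepted as long as it follows the rules.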

Formatting for printing, PDF, RTF (rich text for most word processors), or HTML can be done with JADE, written by the visionary James Clark (who also spearheaded the DSSSL standard), or with XSL style sheets, and there is a free automated mechanism for this. One can also get HTML with the XMLStyler from Arbortext; see the main page links for points of departure. Of course, Corel, Adobe, and MS Word with the plug-in allow multiple print or HTML outputs.