Library Issues and ETDs
Gail McMillan had the most directly relevant things to say in her overview of the library issues. She had memorable sound-bites for the typical kinds of questions she gets:
- Regarding the standard worry about how long a copy lasts, she notes: "ETDs can't be checked out and not returned! When someone checks out a hardcopy and fails to bring it back, it's gone."
- Regarding personnel workflow (cf. M. Hansbrough, the cataloguer), the basic data-entry duties are the same or, if anything, simpler: "Why hire more people to do what is actually less work?"
- Regarding longevity and future changes in technology, beyond the long-term reliability of SGML itself, she cited a quote she finds apt: "We can't predict the future, but we can build it" [i.e., with the greatest foresight reasonably possible now, without waiting forever].
- She does not see a digital library as a second or separate library: "Libraries are always incorporating, and are composed of, different information formats. That's what we do."
I spent a very informative hour with their cataloguer, Mary Hansbrough. She walked me through an ETD entry, beginning with their standard Online Computer Library Center (OCLC) networked library resource, which then exports automatically to their equivalent of OASIS. Because the data arrives in fields that can be cut and pasted, she finds this less laborious and less time-consuming than the current process. Her primary technical need was a larger monitor, for ease of working.
UI Main Library Questions and Technical Concerns
Below, cut and pasted, are questions forwarded by Ed Shreeves and Carol Hughes; some are answered more specifically elsewhere.
- We'd like to know more about the ETD-ML. What is it, what are its advantages and drawbacks?
See the section on SGML. ETD-ML is a Document Type Definition (DTD) for ETDs. It builds on the Text Encoding Initiative (TEI) but adds dissertation-specific markup, such as identifiers for the abstract, the committee, and so forth.
- What do you need to read a document in this format?
Until Internet Explorer 4.5 ships this fall (and Netscape a little later), the Panorama viewer, which is also free, will work. You do have to tell your browser that you're using ETD-ML, though. WordPerfect 8 also reads, validates, and writes SGML documents against any DTD you give it.
- How do they monitor and track ETDs which are not immediately available at the request of the submitter?
This has been a complication, and Gail McMillan has had the best idea for this revision. The first step implemented was the establishment of proxies. Gail suggests making release automatic on the date the student specifies, unless the student says otherwise.
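Gail's suggested rule is simple enough to state as code. The following is a minimal sketch, assuming each submission records an optional release date chosen by the student and a flag for a later request to keep it restricted. The class, field, and function names are hypothetical and are not taken from VT's scripts.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Submission:
    """Hypothetical record for one ETD; the field names are illustrative only."""
    etd_id: str
    release_date: Optional[date]       # date the student chose; None means release immediately
    extension_requested: bool = False  # True if the student asked to keep it restricted


def is_public(sub: Submission, today: date) -> bool:
    """Release automatically on the student's date, unless the student said otherwise."""
    if sub.extension_requested:
        return False
    if sub.release_date is None:
        return True
    return today >= sub.release_date


def public_ids(subs: List[Submission], today: date) -> List[str]:
    """IDs of all ETDs that should be publicly available on `today`."""
    return [s.etd_id for s in subs if is_public(s, today)]
```

A nightly job could call `public_ids` and move any newly listed ETDs into the public area, so nobody has to remember to release them by hand.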
- What sort of security (if any, beyond what is typical) is there for the server?
UNIX file permissions control read/write access, .htaccess files provide passwords, and a proxy server serves the VT-only releases to people connecting through non-VT servers (e.g., AOL). There is also a free encrypted Telnet daemon. Even the name of the directory where submissions are stored, quite apart from its password, is known to only three people.
- Have they measured the amount of time required (within the library, by library staff) per dissertation to the end of the cataloging process?
In fact, it takes less time. Mary Hansbrough catalogued one in just 3 minutes while I was there (see above).
- How are external links within ETDs supported?
As with a unique manuscript consulted for a dissertation, it is up to the committee to accept these links and to judge whether the dissertation is still validly argued if a link changes. Most graduate students already worry about this, and so choose their URLs carefully.
- Who supports (writes, maintains) the scripts which seem to control much of the workflow?
Paul Mather and Tony Atkins. They are happy to help us out. If we join the NDLTD (Networked Digital Library of Theses and Dissertations), we get all of these free.
- How much (if any) encoding is done by library staff? And is it all related to the metadata?
See the answer above regarding the cataloguing workload (Mary Hansbrough). The scripts by Paul and Tony automate much of this when the student submits.
- Are all the ETD's served up live, or are some maintained or migrated to CD-ROM?
All are served live. There are no separate media, other than the automated digital tape backup done as part of server maintenance.
- Who gives the ETD workshop for students? The Grad. College?
Neil Kipp, hired by the Graduate School, gives the workshop several times a year, and it has been videotaped.
- Do they have enough ETDs with added graphics, multimedia etc. to have an idea of an average filesize for ETDs that go beyond simple text?
The average size is under 10 MB; the largest, with many sound files, is 40 MB.
- Are there any processes in place to certify that ETDs remain unchanged over time?
The security of the server helps with this, and software such as Tripwire can check for changed files. Note John Eaton's comment on their only security breach.
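Tripwire is a separate product, but the idea behind it can be sketched in a few lines of Python, assuming the ETDs sit in an ordinary directory tree: record a SHA-256 digest for every file once, then later recompute the digests and compare. The baseline is only trustworthy if it is kept where an intruder could not alter it. The function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def fingerprint(root: Path) -> Dict[str, str]:
    """Map each file under root (as a relative path) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 64 KiB chunks so large multimedia ETDs need not fit in memory.
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """List files added, removed, or changed since the baseline was recorded."""
    added = sorted(current.keys() - baseline.keys())
    removed = sorted(baseline.keys() - current.keys())
    changed = sorted(k for k in baseline.keys() & current.keys() if baseline[k] != current[k])
    return {"added": added, "removed": removed, "changed": changed}
```

Any non-empty list in the result of `compare` is worth investigating; an ETD that is supposed to be frozen should never appear under "changed".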
- Do they have any processes or plans to archive the collection?
Regular tape backups; in addition, a Tape-bot (tape robot) automatically copies to backup tape any dissertation not accessed for "xx" years.
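How the Tape-bot decides what to move is not described here, so the following is only a sketch of the selection step, assuming the ETDs live in a directory tree and that file access times are a usable signal. The `years` parameter stands in for the unspecified "xx"; the names are hypothetical.

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def stale_files(root: Path, years: float, now: Optional[float] = None) -> List[Path]:
    """Files under root whose last-access time is more than `years` years ago.

    `years` stands in for the report's unspecified "xx", so the caller must choose it.
    """
    now = time.time() if now is None else now
    cutoff = now - years * SECONDS_PER_YEAR
    return sorted(p for p in root.rglob("*") if p.is_file() and p.stat().st_atime < cutoff)
```

One caveat: file systems mounted with options such as `noatime` do not keep access times current, so a real archiving system may well track accesses through server logs or its own records instead.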
- Do they consider a PDF version archival?
No, it is a means to a long-term end. However, because of the lack of SGML tools, it was essential to begin with PDF (see problems).
- How worried are students about their work being "out there" on the web?
There has been no direct feedback on this; relatively few ETDs are held back by students.
- What server software are they using? Which search engine?
Netscape Enterprise Server. They are building a search engine around SQL database software, such as the free mSQL; Oracle is the higher-priced option.
- What is the "SGML-based workflow system" (from the website)? Is it a turnkey or homegrown system?
It has evolved from both, with a major systematization due to be complete and in place this fall. The summer has given them the opportunity to streamline solutions to the problems found over the first year and a half.
- Are the allowable formats for submitting an ETD still PDF, ETD-ML and DVI?
Yes. Only one ETD has been submitted in DVI, and they are working out simpler ways of handling it. Installing Type 1 fonts made producing PDF from DVI easier.
- Output features: Are ETD-ML and DVI translated to HTML on the fly?
They can be, at least ETD-ML, using tools such as those at UVA.
- Do people need Panorama to read ETDs submitted in ETD-ML?
For now, yes, but the current beta of Internet Explorer 4.5 reads it, and Netscape will soon. WordPerfect 8 also reads it.
- How do they keep track when part or all of the ETD is not immediately available?
Tony Atkins is readying their SQL database to do exactly this.
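The actual schema is not described, so this is only a sketch of how such tracking might look. It uses Python's built-in sqlite3 module as a stand-in for the SQL database mentioned above, and the table and column names (etd_file, availability, release_date) are hypothetical. Tracking is per file, so part of an ETD can stay restricted while the rest is public.

```python
import sqlite3

SCHEMA = """
CREATE TABLE etd_file (
    etd_id       TEXT NOT NULL,
    filename     TEXT NOT NULL,
    availability TEXT NOT NULL CHECK (availability IN ('public', 'vt_only', 'withheld')),
    release_date TEXT,  -- ISO date (YYYY-MM-DD); NULL means no scheduled release
    PRIMARY KEY (etd_id, filename)
);
"""

DUE = "availability != 'public' AND release_date IS NOT NULL AND release_date <= ?"


def due_for_release(conn: sqlite3.Connection, today: str) -> list:
    """Restricted files whose scheduled release date has arrived (ISO strings sort by date)."""
    sql = f"SELECT etd_id, filename FROM etd_file WHERE {DUE} ORDER BY etd_id, filename"
    return conn.execute(sql, (today,)).fetchall()


def release_due(conn: sqlite3.Connection, today: str) -> int:
    """Mark every due file public, committing the change; returns the number of rows updated."""
    with conn:
        cur = conn.execute(f"UPDATE etd_file SET availability = 'public' WHERE {DUE}", (today,))
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO etd_file VALUES (?, ?, ?, ?)", [
        ("etd-001", "thesis.pdf", "public", None),
        ("etd-002", "thesis.pdf", "vt_only", "1999-01-01"),
        ("etd-002", "appendix.pdf", "withheld", None),
    ])
    print(due_for_release(conn, "1999-06-01"))
```

In this sketch a withheld file with no release date simply stays withheld until someone sets one, which matches the idea of release happening only on a date the student has given.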
- What software is used to create the SGML header? Is it homegrown? Perl scripts?
Perl scripts, available by joining the NDLTD (membership is free).
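Since the scripts themselves are not reproduced here, the following is only a sketch of what such a header generator does: it takes the metadata a student enters at submission and wraps each field in markup, escaping the characters that are significant in SGML. The element names (etdheader, title, committee, and so on) are hypothetical and are not taken from the ETD-ML DTD.

```python
from typing import Dict, List, Union
from xml.sax.saxutils import escape  # escapes &, <, and >


def sgml_header(meta: Dict[str, Union[str, List[str]]]) -> str:
    """Build a hypothetical SGML header; element names are illustrative, not ETD-ML's."""
    lines = ["<etdheader>"]
    for field in ("title", "author", "degree"):
        lines.append(f"  <{field}>{escape(meta[field])}</{field}>")
    lines.append("  <committee>")
    for member in meta["committee"]:
        lines.append(f"    <member>{escape(member)}</member>")
    lines.append("  </committee>")
    lines.append(f"  <abstract>{escape(meta['abstract'])}</abstract>")
    lines.append("</etdheader>")
    return "\n".join(lines)
```

Generating the header from the submission form, rather than having library staff type it, is what keeps the cataloguing step short.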