Z39.50 Introduction, XML, RDF Ideas, By John Robert Gardner

Z39.50 Introduction, XML, RDF Ideas

From NISO and ZIG sessions at
San Antonio, ALA Meeting, January, 2000
John Robert Gardner

Resources to Learn more
Generalities
Facilities
Z39.50 "under the hood"
Issues: Profiles, Software, and XML

Resources

There is a raft of resources from which this report draws many of its specifics, and which spare me from repeating them here. I started with Duane Harbin's excellent summary from Diktuon (November, 1999), "An Overview of Z39.50" (see links from http://purl.org/CERTR/ under "articles"). From there, a slightly more detailed and nuanced treatment is available at

http://www.ariadne.ac.uk/issue21/z3950/#20

(reading time: 30-45 minutes)

or

http://archive.dstc.edu.au/DDU/projects/ZINC/zsimple.htm

From there, you'll have a healthy grasp not only of basic conceptual terminology and framework, but also the "issues" with Z39.50. No standard is ever perfect: implicit in the notion of standard is accommodation of the range of differences it was designed to mediate--no standard is a solution, but rather, a good standard is an intersection for solutions.

Next you'll want to look at a bit more detail, such as implications for various library services, relevant software, etc. Go to:

http://www.biblio-tech.com/html/z39_50.html

(reading time: 30 minutes)

Then it's time to get a peek under the hood. A nice detail-by-detail set of summaries (broken into palatable but still palpable sub-sections) is the continuation of the biblio-tech.com article:

http://www.biblio-tech.com/html/z39_50_part_2.html

(reading time: 60-80 minutes)

That will provide you with the basics to be conversant with the issues related to Z39.50 and enable you to evaluate solutions and proposals. For more links to resources, try William Moen's: http://www.unt.edu/wmoen/Z3950/BasicZReferences.htm

Generalities

Z39.50- Think of it as a sort of database query language-meets-"http" protocol-meets-search engine-meets-Esperanto. Z39.50 has a higher pedigree than http in that Z39.50 was born to serve information interchange specifically. By contrast, http is a "one size fits all" gateway which allows anyone and anything (almost) to pass through. Z39.50 requires information packets to have specific pedigrees, and so these packets are "smarter" than the average net traffic. In the mid-'80s, OCLC and other such noble library resources wanted a way to standardize comparisons of library holdings records. After some work, the first Z39.50 standard came out in 1988, followed by a "fix" in 1992 (Version 2- what many U.S. institutions support), finally coming of age for the information age in 1995 (Version 3- what most of Europe uses). It's not really a query language, but an attempt at a translation brokerage between query languages which depends wholly upon adoption for its success--hence the Esperanto analogy.

Z39.50 formalities-: It is ISO 23950:1998 (see how the "2" looks like a "Z"? clever), and has its own group of zealots (in XML, the equivalent is "evangelists") behind it--known as ZIG (Z39.50 Implementers Group)--and a raft of software which supports it. It is also designated as ANSI/NISO Z39.50-1995, Information Retrieval. It uses CCL (Common Command Language).
Z39.50 terminology-: This is less a list of vocabulary (the resources above furnish most of that, as does the facilities summary below) than a matter of "how to talk about Z39.50." "Z39.50" is sometimes a very misused reference, which leads to bad implementations based upon miscommunicated expectations. For instance, one does not say "our database is Z39.50 compliant" but, rather, "our database is accessible to Z39.50 queries." Z39.50 is not a query language, but a translation broker for query languages; it's not a search engine but an assistant to the engine; not a search command but a protocol for handling search commands.
Think of a summit meeting between world leaders of different language cultures. The leaders talk to each other, but the exchange is "brokered" by the interpreters. Much as the interpreters must recognize that one-to-one correspondence between vocabularies is often not possible, Z39.50 recognizes that one-to-one equivalence between the resources it negotiates is frequently impossible (see below under issues).
Z39.50 function-: So, after all this about what Z39.50 is not, what is it that it actually is (and, to quote our noble leader, that "depends on what the definition of 'is' is")? Z39.50 is (ontologically) a specification, or set of rules, for information retrieval. It permits one search from one kind of system to find things in differing systems--as well as retrieve them--without having to know each and every system it searches in detail. It performs these functions on all manner of materials, not only bibliographic records, while remaining hardware and software independent. While maintaining interoperability between systems (translation: you don't have to change how your system stores or searches for information in order to use it), it provides the user the "ability to successfully search and retrieve information in a meaningful way and have confidence in the results" (W.E. Moen, U. North Texas, NISO Standards Tutorial, January 18, 2000; wemoen@unt.edu).

Facilities

"Facilities" is the technical word for the 11 things Z39.50 covers or "does." Some of them are obvious, some less so, and not all are supported by all Z39.50 systems (see issues below). Here are the basic summaries of the Z39.50 facilities, and key feature potentials (based upon the information provided at http://www.biblio-tech.com/html/z39_50_part_2.html):

Facility	Notes
1. Initialization setting up the Z-Association, negotiating levels of service	Analogous to the handshake you hear when your modem hooks up to a dialup connection, this enables non-authorized users to know we're out here, and--see #5 below--confirm a resource discovery without giving away the record, or only giving a part of it (see #9 too).
2. Search sending a search string at a database and getting back a result set and the first few records	The basic act of searching performed from any OPAC, enables searches to hit on/have discoveries on other Z39.50 resources (e.g., CORC, etc.), and vice-versa (cf. # 5, #9)
3. Retrieval Retrieval of records from the result set as specified by the Z-client	Z39.50 enables a search to "broadcast" to the online sources of the world, and net only those resources which conform to information sciences specifications --in other words, Olivia Newton-John songs won't turn up in your search for Coleridge's "Xanadu," but you will get resources from as far as Timbuktu. Non-authorized users might receive information of a title of an article, or simply that there are resources there, as a prompt/marketing option to garner subscribers.
4. Result-set-delete deleting a set of search results held on the Z-server	Just what it says, and an added security bonus so searches are not left "open"
5. Access Control allowing the Z-server to ask for passwords etc.	If someone finds a resource, Z39.50 will enable it to be communicated that the MARC record--and/or online article--is there, but intercepts automatically with a password call. This can enable varied levels of access, for instance, if desired.
6. Accounting / Resource Control allowing accounting, credit control etc.	Yes, you can even do per-use billing if you want to, as well as "one-time free introductory accesses" -- but you have to program this in to your server
7. Sort sorting a result set in a defined order on the Z-server	What order you want the results in-- "sort by author, ascending alphabetical order," or, "sort with Whitman project holdings first"
8. Browse scanning an index on the Z-server	This provides a starter-hint, of sorts, as a reference to what kinds of keywords/subject field words are available for searching -- a way to maximize the careful work with 6xx fields in MARC records.
9. Extended Services allowing Z-client to start "task packages" e.g. ILL on Z-server	Think of this as sort of setting up little applications, or Applets which can be triggered for various user levels, perhaps even a "subscription" interface for online acquisition of new subscribers
10. Explain allowing the Z-client to query a database of implementation details on the Z-server	Newly-arising feature, previously largely unsupported, but available on free Zebra Z39.50 software, Z'mbol from the same group, and SIM/Structured Information Manager. Sort of like the way two translators introduce themselves to each other before brokering a state leaders' meeting.
11. Termination Closes down a Z-Association.	(self-evident)

Most Z39.50 systems support the first three--initialization, search, retrieval--(often called the "core facilities") and, of course, the last one--termination. Support for the other facilities varies. Since the user "broadcasting" a search to Z39.50 targets often cannot know which facilities each system supports, it's important to have a way for the user's search to ascertain this automatically without requiring a repeat/revise search. This is what Explain is meant to enable. Previously, however, Explain was not widely supported in software. This situation gives rise to the primary group of issues for Z39.50 implementation below.
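
To make the problem concrete, here is a minimal sketch of what a client could do if every target reported its supported facilities up front, as Explain is meant to allow. The target names and their facility lists are hypothetical, and the lookup is an illustration rather than any toolkit's actual API.

```python
# The "core facilities" named in the text above.
CORE = {"Initialization", "Search", "Retrieval"}

# Hypothetical targets and the facilities each one claims to support.
TARGETS = {
    "target-a": CORE | {"Termination"},
    "target-b": CORE | {"Termination", "Sort", "Browse", "Explain"},
}


def targets_supporting(targets, needed):
    """Return the names of targets that support every facility in `needed`."""
    return sorted(name for name, facilities in targets.items()
                  if needed <= facilities)


# Every target handles a plain search, but only one can sort on the server.
print(targets_supporting(TARGETS, CORE))
print(targets_supporting(TARGETS, {"Sort"}))
```

Without that up-front report, the client only learns what a target lacks after a request has already failed or been quietly reinterpreted.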

Z39.50: "under the hood"

Before we get into all the implementation issues, it's important to know just what sort of message Z39.50 is transmitting, and how it enables a communication/translation to take place. In the following section, I am indebted to the clearly-spoken NISO tutorial given by William E. Moen, University of North Texas, in San Antonio, January 18, 2000.

A series of "attributes" qualify a given Z39.50 query. Each query is an information "packet" with its own Object Identifier or "OID." These attributes are all numerical, which enhances speed but makes human readability a bit forbidding. Z39.50 has its own "namespace" or data-type identifying prefix. If you think of a Z39.50 target sitting with its ear to the virtual ground, when it hears the "hoofbeats" of this namespace, it picks this signal from the other data noise of the web:

1: the ISO identifier
2: that it's a member body
840: that it happens, in this case, to be USA
10003: that it happens to be the ANSI-standard-Z39.50

Which looks like: 1.2.840.10003. That's the prefix on every Z39.50 OID. What comes after that specifies the kind of query and the actual human-readable content of the query. Next comes the indicator saying that this query packet has a Z39.50 attribute set (which has its own OID of 3) and that this will be a Bib-1 attribute set (the main attribute set, the de facto default attribute set, which has six sub-types), so a 1 is added. Now our OID is: 1.2.840.10003.3.1
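
The way the OID is assembled can be sketched in a few lines. This is only an illustration of the dotted notation described above; the function and constant names are my own and do not come from any Z39.50 library.

```python
# Arcs of the Z39.50 OID prefix, as enumerated above:
# ISO, member body, USA, ANSI-standard-Z39.50.
Z3950_PREFIX = (1, 2, 840, 10003)

ATTRIBUTE_SET_ARC = 3  # the "attribute set" branch under the prefix
BIB1_ARC = 1           # Bib-1 within the attribute sets


def oid_to_string(arcs):
    """Join integer arcs into dotted OID notation."""
    return ".".join(str(arc) for arc in arcs)


def is_z3950_oid(dotted):
    """Return True if a dotted OID sits under the Z39.50 prefix."""
    arcs = tuple(int(part) for part in dotted.split("."))
    return arcs[:len(Z3950_PREFIX)] == Z3950_PREFIX


BIB1_OID = oid_to_string(Z3950_PREFIX + (ATTRIBUTE_SET_ARC, BIB1_ARC))
print(BIB1_OID)
print(is_z3950_oid(BIB1_OID))
```

The prefix test is the "ear to the ground" step: a target only needs the first four arcs to recognize that a packet belongs to Z39.50.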

All this just says what the packet is, where it comes from, and what language it's going to speak. Now, let's let it do some of that speaking. Everything that comes next explains the nature of the human-readable search query which comes at the end of the packet. There are six types of attributes in Bib-1 (see http://www.biblio-tech.com/html/z39_50_bib-1.html for more info, or ftp://ftp.loc.gov/pub/z3950/defs/bib1.txt for a complete set with MARC equivalents):

Use (how the human-readable text is used, e.g. as an author/1003; the Bib-1 list of these runs to dozens of values);
Relation (is it equal to, more than, relevant to, etc.);
Position (first in field, subfield, or anywhere);
Structure (is it a phrase, word, year, etc.);
Truncation (right, left, do not, etc.);
Completeness (incomplete/complete subfield, complete field);

Numbers 1-4 are pretty clear. #2, Relation, allows you to ask for everything prior to 1975, say, by specifying "less than." A note on #5, Truncation: this says whether "run" must match the whole word (do not truncate) or only the beginning of a word, so that "running" also matches (right truncate). As to #6, Completeness, this says whether "run" must be the entire title, or whether other words are allowed to be in the title as well.

So, a search for a match (relation/2, equals/3) on Shakespeare as an author's (use/1, author/1003) last name (structure/4, lastname un-normalized/102), anywhere in the field (position/3, any/3), matching names that begin with those letters (truncation/5, right truncate/1; note that this also lets in a hack naming themselves "Shakespearea," and only do not truncate/100 would shut them out), and allowing "William" to be included (completeness/6, incomplete/1)? It looks like this:

1.2.840.10003.3.1 (1,1003) (2,3) (3,3) (4,102) (5,1) (6,1) Shakespeare
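
As a rough sketch, the same query can be held as a list of (attribute type, value) pairs and printed in more than one notation. The first helper reproduces the notation used in this report. The second produces a string modeled on the "@attr type=value" prefix notation that some Z39.50 toolkits accept; that notation is not discussed in the tutorial, so treat its exact syntax here as an assumption.

```python
BIB1_OID = "1.2.840.10003.3.1"

# (attribute type, value) pairs from the Shakespeare example above.
SHAKESPEARE_ATTRS = [
    (1, 1003),  # Use: author
    (2, 3),     # Relation: equal
    (3, 3),     # Position: any
    (4, 102),   # Structure: name, un-normalized
    (5, 1),     # Truncation: right truncate
    (6, 1),     # Completeness: incomplete subfield
]


def to_report_notation(oid, attrs, term):
    """Format a query the way this report writes it: OID, pairs, term."""
    pairs = " ".join(f"({atype},{value})" for atype, value in attrs)
    return f"{oid} {pairs} {term}"


def to_prefix_notation(attrs, term):
    """Format the same query in an '@attr type=value' style (assumed syntax)."""
    parts = ["@attrset bib-1"]
    parts += [f"@attr {atype}={value}" for atype, value in attrs]
    parts.append(term)
    return " ".join(parts)


print(to_report_notation(BIB1_OID, SHAKESPEARE_ATTRS, "Shakespeare"))
print(to_prefix_notation(SHAKESPEARE_ATTRS, "Shakespeare"))
```

Either way, everything except the final word is numeric, which is the point made above about speed versus human readability.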

Issues: Profiles, Software, and XML

So, that all seems pretty clear when peeked at under the hood, right? Well, no one's giving a test on this right now, but you get the idea. The problem arises when some institutions do some of Bib-1, but not other parts. What if I'm asking for a name un-normalized, as above (4,102), but your institution's Z39.50 OPAC only has phrase (4,1); word (4,2); and key word (4,3)--or worse, no structure attribute at all? What does it do with my (4,102) request? What if I don't specify a structure field, but your Z39.50 OPAC has this data?

This is why the same query to five Z39.50 OPACs often yields five different result sets. Some systems will "guess" at what you meant to include, or at what the unsupported attribute should be. The current best practice is instead to return an error saying 'such and such is not supported here, please refine your query.'
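
A minimal sketch of that best practice, from the target's side, might look like the following. The table of supported values is hypothetical (it mirrors the OPAC imagined above, which offers only phrase, word, and key word for structure), and the messages are plain strings rather than real Z39.50 diagnostics.

```python
# Hypothetical target: attribute type -> set of values it supports.
SUPPORTED = {
    1: {1003},      # Use: author
    2: {3},         # Relation: equal
    3: {3},         # Position: any
    4: {1, 2, 3},   # Structure: phrase, word, key word only
    5: {1, 100},    # Truncation: right truncate, do not truncate
    6: {1},         # Completeness: incomplete subfield
}


def check_query(attrs, supported):
    """Return a list of problems; an empty list means the query can run as asked."""
    problems = []
    for atype, value in attrs:
        allowed = supported.get(atype)
        if allowed is None:
            problems.append(f"attribute type {atype} is not supported here")
        elif value not in allowed:
            problems.append(
                f"value {value} for attribute type {atype} is not supported here")
    return problems


# The un-normalized name request (4,102) is refused rather than guessed at.
for problem in check_query([(1, 1003), (4, 102)], SUPPORTED):
    print(problem + ", please refine your query")
```

The design choice is simply to fail loudly: the origin gets a reason it can act on instead of a result set shaped by a silent substitution.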

Explain-

You might be wondering-- "if Explain serves to furnish this kind of information up front, why is there any problem, or need for such a best practice?" Well, until recently, there was hardly any software that supported Explain. As a result, folks around the world have been coming up with a set of "agreed-upon" basic thresholds of Bib-1 implementations, so everyone can kind of "assume" there is support in key areas. You can think of this as a "granularity" of detail problem.

More on this below. These agreed thresholds--called "profiles"--are usually disciplinarily focused and not unlike DTDs (Document Type Definitions in SGML and XML), and they suffer from many of the same problems as DTDs (getting everyone to agree, finding DTDs/attribute sets that fit everyone's system or needs, etc.). Ironically, the same solution to the DTD problem which is surfacing in the XML world with XSchema and RDF (Resource Description Framework) may ultimately resolve the implementation granularity/profiles issue (cf. Poul Henrik Jorgensen, phj@dbc.dk).

Software

Explain is now supported, however, in several software systems on several platforms. A free program called Zebra (see http://www.indexdata.dk/download.shtml) and a more deluxe product for purchase, Z'mbol, are available from the same company. Another such product, for Solaris and NT, is Structured Information Manager from RMIT (http://www.mds.rmit.edu.au/), which manages all manner of media and text items in a robust, full Z39.50 environment.

Profiles-

These Z39.50 "DTD's," as it were, seek to generalize and establish/express consensus among Z39.50 OPACs for what Bib-1 attributes are present. There are several of these with varying levels of mutual interoperability. While Z39.50 can allow a MARC-based query (show me the 245 field wherein I find Hamlet) to access a Dublin Core record (where the title element contains "Hamlet"), something or some mechanism must communicate that the target has DC records so Z39.50 brings back the right data.
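
The MARC-to-Dublin-Core case can be sketched as a small crosswalk. The two-entry mapping, the record dictionaries, and the "marc"/"dc" format labels below are all illustrative assumptions; the point is only that the target's record format has to be known before the field can be phrased correctly.

```python
# Illustrative crosswalk from MARC tags to Dublin Core elements.
MARC_TO_DC = {"245": "title", "100": "creator"}


def find(records, record_format, marc_tag, term):
    """Search records for `term`, naming the field in the target's own format."""
    if record_format == "marc":
        field = marc_tag
    elif record_format == "dc":
        field = MARC_TO_DC[marc_tag]
    else:
        raise ValueError(f"unknown record format: {record_format}")
    needle = term.lower()
    return [rec for rec in records if needle in rec.get(field, "").lower()]


# Hypothetical holdings on two targets, one MARC-based and one Dublin Core.
marc_records = [{"245": "Hamlet", "100": "Shakespeare, William"}]
dc_records = [
    {"title": "Hamlet, Prince of Denmark", "creator": "Shakespeare"},
    {"title": "Kubla Khan", "creator": "Coleridge"},
]

print(find(marc_records, "marc", "245", "Hamlet"))
print(find(dc_records, "dc", "245", "Hamlet"))
```

Note that `record_format` has to come from somewhere: Explain, a profile, or out-of-band knowledge about the target.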

The trick that folks have worked out is that a bunch of OPACs get together and say, "okay, we're doing these attributes, and this many variables in each subset, so let's all agree to do it that way and keep it that way." This way, if you are in--say--Texas' library system, you know what everyone's using. If you are an OPAC in Europe's network, you know what they are using. This certainly widens the circle of interoperability, but does not solve the fundamental issue designed to be addressed in the specification itself by Explain. As long as you can get other folks to join your profile group, it improves. And certain obvious commonalities are always there: most anyone wants to search on an author name or title, but the levels of detail below that can vary (e.g., a normalized lastname-firstname order, or an un-normalized one?).

So profiles reflect the fledgling beginnings of consensus about how to end-run the absence of the standards-based solution of Explain. Profiles are not ISO, but Explain is. Now that Explain is supported in software, the question of profiles becomes more a set of implementation guidelines than actual prescriptions for ensuring interoperability.

There are several of these, including those listed below, as well as museum-specific sets (CIMI) and geo-spatial (GEO):

Bath Profile-
(http://www.ukoln.ac.uk/interop-focus/activities/z3950/int_profile/bath/draft/): Currently getting the most consensus, it builds upon the Texas profile. Three conformance levels specify degrees of supported granularity. For instance, Conformance 0 is like Texas Profile Category 1, and allows author, title, subject, keyword, Boolean operators, and truncation. Limits options in the 6 attribute types, but supports all 6. Has no effective defaults, e.g., structure has only 3 options (phrase/1, word/2, and name normalized/101) instead of the full 16.
Texas Profile-
(http://www.tsl.state.tx.us/LD/z3950/TZIGProfile99Apr20.htm): Uses Bib-1 attribute set, and sets limits and specifics within it. Deployed throughout Texas library system, and furnished a model for the Bath Profile. Does not work with defaults, but follows best practice of returning detailed error report, asking for a re-issued query responding to strengths/weaknesses in granularity. Has category levels for granularity distinctions; the lowest, Category 1, allows author, title, subject, keyword, Boolean operators, and truncation. No default attributes, but all 6 attribute types supported.
ONE (OPAC Network in Europe)-
(http://www.dbc.dk): Basic requirements are a default to Bib-1, and a set of "assumed" values in it for handling cases where the origin or target does not specify an attribute. Relation is assumed to be "equal" (2,3); truncation is "no" (5,100); completeness is incomplete subfield (6,1); position is any (3,3); and structure defaults to phrase (4,1). Then each of these attribute types is limited as to the number of variables supported as well. Currently developing ONE-2.
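
The "assumed" values in the ONE entry above lend themselves to a short sketch: any attribute type the origin leaves out is filled in from the defaults, and anything the origin does specify wins. The function name and the dictionary layout are my own; only the default pairs come from the description above.

```python
# Assumed values from the ONE description above, as attribute type -> value.
ONE_DEFAULTS = {
    2: 3,    # Relation: equal
    3: 3,    # Position: any
    4: 1,    # Structure: phrase
    5: 100,  # Truncation: do not truncate
    6: 1,    # Completeness: incomplete subfield
}


def apply_defaults(attrs, defaults=ONE_DEFAULTS):
    """Fill in unspecified attribute types; explicit values override defaults."""
    merged = dict(defaults)
    merged.update(dict(attrs))
    return sorted(merged.items())


# An origin that only says "author" gets the five assumed values added.
print(apply_defaults([(1, 1003)]))
# An origin that asks for an un-normalized name keeps its own structure value.
print(apply_defaults([(1, 1003), (4, 102)]))
```

This is the opposite policy from the Texas and Bath profiles, which return an error rather than assume.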

XML and Z39.50-

XML is now one of the preferred formats for the returning of data from discoveries in a search. XML is likely to supplant GRS-1 (Generic Record Syntax) and SUTRS (Simple Unstructured Text Record Syntax) due to its strict syntax which enables back-mapping to MARC if need be, and full selection on which fields to display, etc. It is also more easily negotiated from Z39.50 targets to the range of origins from which a query begins.

XML is seen most frequently in library implementations in the Dublin Core set of 15 basic elements (all of which are included in the Bath Profile, and are supported in Bib-1). It is interesting to note that Dublin Core, and the XML/RDF set of solutions arose from the Z39.50 community as part of the Warwick Framework of April, 1996.

As mentioned above under "Explain" in this section, there is a great deal of potential for XML to resolve many of these issues. The opportunity for information necessary to systems not supporting Explain to be routed through RDF structures (a series of empty tags, with namespaces for different institutions identifying which attributes and facilities their Z39.50 OPAC supports) could serve as a bridge to the developing future wherein Explain and the other rich facilities of Z39.50 are more widely adopted and deployed.
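
To give the idea some shape, here is a speculative sketch of such an RDF structure, generated with Python's standard XML library. The capability namespace, the element names, and the target URL are all invented for illustration; nothing here is an agreed vocabulary, and only the RDF namespace itself is the real one.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CAP_NS = "http://example.org/z3950-capabilities#"  # hypothetical namespace


def describe_target(about, facilities, structure_values):
    """Build an RDF/XML description whose empty tags list what a target supports."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("zcap", CAP_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": about})
    for name in facilities:
        # One empty tag per supported facility, e.g. <zcap:facility-sort/>.
        ET.SubElement(desc, f"{{{CAP_NS}}}facility-{name}")
    for value in structure_values:
        # One empty tag per supported Structure value, e.g. <zcap:structure-1/>.
        ET.SubElement(desc, f"{{{CAP_NS}}}structure-{value}")
    return ET.tostring(root, encoding="unicode")


xml_text = describe_target(
    "http://opac.example.org/z3950",  # hypothetical target
    ["search", "retrieval", "sort"],
    [1, 2, 3],
)
print(xml_text)
```

An origin could fetch a description like this before querying, and so learn in advance what Explain would otherwise have told it.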