
Architecture of the Testbed System

At the highest level, the Testbed's logical architecture has four components:

  
Figure 1: The logical architecture of the testbed system.

The storage component maintains and serves up the digital holdings of the library; these correspond to the ``stacks'' of physical holdings (books, journals, etc.) in a traditional library. The catalog component manages, and facilitates searches of, the metadata describing the holdings, analogous to a traditional library card catalog. Catalog metadata are associated with storage objects by unique object identifiers, analogous to traditional library call numbers. The ingest component comprises the mechanisms by which librarians and other authorized users populate the catalog and storage components. Finally, the user interface component is the collection of mechanisms by which one interacts with the catalog (to conduct a search) or the storage (to retrieve objects corresponding to search results).
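
The relationship among the four components can be sketched in a few lines of Python. This is only an illustrative toy, not the Testbed's code: the class name, method names, and object identifiers are all hypothetical, and in-memory dictionaries stand in for the real storage and catalog systems.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    """Toy model of the logical components (all names are hypothetical)."""
    storage: dict = field(default_factory=dict)   # object id -> holding
    catalog: dict = field(default_factory=dict)   # object id -> metadata record

    def ingest(self, object_id, holding, metadata):
        """Ingest: populate storage and catalog under one shared identifier."""
        self.storage[object_id] = holding
        self.catalog[object_id] = metadata

    def search(self, **criteria):
        """Catalog search: ids whose metadata match every criterion exactly."""
        return sorted(
            oid for oid, meta in self.catalog.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        )

    def retrieve(self, object_id):
        """Storage retrieval: fetch the holding named by a search result."""
        return self.storage[object_id]


# The "user interface" here is just two calls: search, then retrieve.
library = Library()
library.ingest("obj-001", b"<map scan>", {"type": "map", "place": "SANTA BARBARA"})
library.ingest("obj-002", b"<aerial photo>", {"type": "photo", "place": "SANTA BARBARA"})

hits = library.search(type="map", place="SANTA BARBARA")
holdings = [library.retrieve(oid) for oid in hits]
```

The point of the sketch is that the object identifier is the only link between catalog and storage, which is what lets the two be implemented by different systems.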

The simplest mapping of this architecture onto a wide-area network environment like the World Wide Web yields the refinement shown in Figure 2:

  
Figure 2: Mapping testbed architecture onto wide-area network: Refinement I.

The obvious refinement is the greater specificity of information flowing between the components: queries are expressed in SQL over a database client-server connection; storage objects are referenced by URLs and retrieved by FTP; etc.

A less-obvious refinement is the partitioning of user interface functionality between the Web server and client (browser; e.g. Netscape). At the highest level at which our architecture is specified, there is really no difference between the Web client and server; collectively, they implement a user interface. However, the limited capabilities of (pre-Java) Web browsers, coupled with the relative extensibility of Web servers (particularly the AOLserver), lead to the use of the Web server as a de facto ``middleware'' layer between the catalog and the user interface.

The current implementation of this architecture in the Testbed is the ``Web Prototype'', whose primary goal is making the ADL accessible from the World Wide Web. We illustrate this architecture in Figure 3.

  
Figure 3: The current implementation of the Web Prototype.

The Web Prototype uses three database management systems. Any metadata that must be entered manually are entered through Microsoft Access. The primary reason for using Access is that it permits the librarians entering the data to build their own customized user interfaces, thus maximizing entry speed and minimizing errors. Access' ODBC interface allows it to function as a front end to any of our other DBMSs.

All metadata entering the Web Prototype, either manually (through Access) or automatically (by batch scripts), are staged through a Sybase database. This is partly an historical artifact, since Sybase was our first DBMS; however, it also gives us the opportunity to quality-check the metadata under a standard relational schema (i.e. one without vendor-specific type, function, or rule extensions). We are thus assured that metadata exported from the Sybase schema will not only be reasonably free of internal inconsistencies, but will also be supportable in any relational database. Finally, performing the staging and Q/A in a separate database keeps a substantial burden off our primary catalog database server.
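
As a rough illustration of the staging idea, the following Python sketch uses an in-memory SQLite database in place of Sybase. The table layout, column names, sample rows, and the particular Q/A rules are assumptions made for the example; the actual checks applied in the Testbed are not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # stands in for the staging DBMS
conn.executescript("""
    CREATE TABLE staging (
        object_id VARCHAR(32),
        title     VARCHAR(200),
        west REAL, east REAL, south REAL, north REAL
    );
    CREATE TABLE catalog (
        object_id VARCHAR(32) PRIMARY KEY,
        title     VARCHAR(200) NOT NULL,
        west REAL, east REAL, south REAL, north REAL
    );
""")

rows = [
    ("obj-001", "Goleta quadrangle", -119.9, -119.7, 34.4, 34.5),
    ("obj-002", None,                -120.0, -119.5, 34.3, 34.6),  # missing title
    ("obj-003", "Inverted box",      -119.0, -119.5, 34.0, 34.2),  # west > east
]
conn.executemany("INSERT INTO staging VALUES (?, ?, ?, ?, ?, ?)", rows)

# Hypothetical Q/A rule: a record must be complete and internally consistent.
# Only plain SQL is used, so the same check would run on any relational DBMS.
QA_PASS = "title IS NOT NULL AND west <= east AND south <= north"

rejected = [r[0] for r in conn.execute(
    f"SELECT object_id FROM staging WHERE NOT ({QA_PASS}) ORDER BY object_id")]
conn.execute(f"INSERT INTO catalog SELECT * FROM staging WHERE {QA_PASS}")
exported = [r[0] for r in conn.execute("SELECT object_id FROM catalog")]
```

Only records that pass the check reach the catalog table; the rejected identifiers can be handed back to whoever entered them.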

The database that actually supports catalog searches is currently implemented in Illustra. This is primarily because only Illustra (of our current databases) supports SQL-level spatially-indexed searches (i.e. ``contains'' or ``overlaps'' operations on polygonal data types). This extension reduces the time required by a typical spatial search by at least a factor of 10. To preserve database independence, we add the Illustra polygon attributes to the basic catalog schema without disturbing the rest of the attributes.
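
To make the ``contains'' and ``overlaps'' operations concrete, here is a small Python sketch. It is not Illustra's implementation: it simplifies polygons to axis-aligned bounding boxes, and the one-degree grid index, class names, and coordinates are all assumptions chosen to show why an index lets a search examine only a few candidates.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned footprint; a simplification of a polygonal data type."""
    west: float
    south: float
    east: float
    north: float

    def overlaps(self, other):
        return (self.west <= other.east and other.west <= self.east and
                self.south <= other.north and other.south <= self.north)

    def contains(self, other):
        return (self.west <= other.west and other.east <= self.east and
                self.south <= other.south and other.north <= self.north)


class GridIndex:
    """Buckets footprints by 1-degree cell so a search tests few candidates."""

    def __init__(self):
        self.cells = defaultdict(set)   # (x, y) cell -> object ids
        self.boxes = {}                 # object id -> Box

    @staticmethod
    def _cells(box):
        for x in range(math.floor(box.west), math.floor(box.east) + 1):
            for y in range(math.floor(box.south), math.floor(box.north) + 1):
                yield (x, y)

    def insert(self, object_id, box):
        self.boxes[object_id] = box
        for cell in self._cells(box):
            self.cells[cell].add(object_id)

    def search_overlaps(self, query):
        # Coarse pass: gather ids from the cells the query touches.
        candidates = set()
        for cell in self._cells(query):
            candidates |= self.cells.get(cell, set())
        # Exact pass: test only those candidates.
        return sorted(oid for oid in candidates
                      if self.boxes[oid].overlaps(query))


index = GridIndex()
index.insert("santa-barbara", Box(-120.0, 34.3, -119.5, 34.6))
index.insert("los-angeles", Box(-118.7, 33.7, -118.1, 34.3))
index.insert("san-francisco", Box(-122.6, 37.6, -122.3, 37.9))

query = Box(-120.5, 34.0, -119.0, 35.0)
hits = index.search_overlaps(query)
```

The exact test runs only on footprints that share a grid cell with the query, rather than on every row in the catalog.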

The Excalibur RetrievalWare package currently supports a specialized user interface function. When querying relational databases, it is much more efficient to request exact matches for string-valued attributes than to request subset (e.g. ``leads with'', case-insensitive, etc.) matches. However, the domain of many of our catalog attributes is so large (e.g. our placename list has almost 6 million entries) that discovering exact matches by exhaustive search is impractical. We therefore use RetrievalWare's ``fuzzy'' match capability (a semantic network built from various dictionaries and thesauri) to suggest possible exact matches for imprecisely specified attributes (e.g. ``sanba'' matches both ``SANTA BARBARA'' and ``SAN BUENA VENTURA''). The user can then use any combination of the suggested values to rapidly search the actual catalog.
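
The two-step pattern (suggest candidate values, then search on exact matches) can be sketched as follows. The matching rule here is a plain in-order letter match, which is much simpler than RetrievalWare's semantic network and is used only as a stand-in; the function names, the short placename list, and the table and column names in the SQL string are hypothetical.

```python
def is_subsequence(fragment, name):
    """True if the letters of `fragment` occur in `name` in the same order."""
    letters = iter(name.lower())
    # `ch in letters` advances the iterator, so order is enforced.
    return all(ch in letters for ch in fragment.lower())


def suggest(fragment, placenames):
    """Step 1: propose exact attribute values for an imprecise fragment."""
    return [name for name in placenames if is_subsequence(fragment, name)]


PLACENAMES = ["SANTA BARBARA", "SAN BUENA VENTURA", "SANTA CRUZ", "GOLETA"]
suggestions = suggest("sanba", PLACENAMES)

# Step 2: the values the user picks become an exact-match catalog query.
placeholders = ", ".join("?" for _ in suggestions)
sql = f"SELECT object_id FROM catalog WHERE placename IN ({placeholders})"
```

Because the second step uses only equality tests, the relational database can answer it from an ordinary index on the attribute.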

The nexus of the Web Prototype system is the AOLserver Web server. In addition to supporting the WWW standard HTTP protocol, the AOLserver supports two important additional capabilities:

  1. The AOLserver can connect directly to a DBMS, instead of having to spin off a separate process to manage each database transaction. This dramatically increases the speed of catalog accesses from the Web. This feature is also used for state maintenance, so that the notion of a ``session'' with the Web Prototype can be sustained over the stateless HTTP protocol. The current state of the user interface is saved into and restored from a separate Illustra database after each HTTP transaction.
  2. The AOLserver has an embedded interpreter for the Tcl/Tk scripting language. The server can assign URLs to a Tcl script, causing that script to be invoked when a user (through their Web browser) requests the URL. This dramatically increases the speed of dynamic Web page generation, compared to the more common CGI mechanism, whereby the server invokes a separate process for each dynamically-generated page.
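
The two capabilities above can be illustrated together in a short Python sketch. It is not AOLserver code and does not use Tcl: an in-memory SQLite database stands in for the separate state database, a dictionary stands in for the server's URL-to-script registry, and the URL path, handler, and table names are invented for the example.

```python
import json
import sqlite3
import uuid

db = sqlite3.connect(":memory:")   # stands in for the separate state database
db.execute("CREATE TABLE session_state (session_id TEXT PRIMARY KEY, state TEXT)")

HANDLERS = {}                      # URL path -> script (hypothetical registry)


def register(path):
    """Associate a URL with a script, run in-process rather than via CGI."""
    def wrap(func):
        HANDLERS[path] = func
        return func
    return wrap


@register("/search/add-term")
def add_term(state, params):
    state.setdefault("terms", []).append(params["term"])
    return f"{len(state['terms'])} term(s) in query"


def handle_request(path, params, cookie=None):
    """One stateless HTTP transaction: restore state, run script, save state."""
    session_id = cookie or uuid.uuid4().hex
    row = db.execute("SELECT state FROM session_state WHERE session_id = ?",
                     (session_id,)).fetchone()
    state = json.loads(row[0]) if row else {}
    body = HANDLERS[path](state, params)
    db.execute("INSERT OR REPLACE INTO session_state VALUES (?, ?)",
               (session_id, json.dumps(state)))
    return body, session_id        # the id returns to the browser as a cookie


body1, cookie = handle_request("/search/add-term", {"term": "map"})
body2, _ = handle_request("/search/add-term", {"term": "1990"}, cookie)
```

Each request carries only the cookie; everything else about the session lives in the database, which is why a fast, persistent database connection matters for this design.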

The scripting capability of the AOLserver, coupled with the ubiquity of Web browsers, has led to the most significant architectural feature of the Web Prototype: a user interface predicated on a ``dumb client - smart server'' model. The client is assumed to be capable of rendering HTML 2.0 with very few extensions (most notably Netscape tables), and of supporting the basic HTTP protocol plus Netscape cookies. Not only must we make the AOLserver go to heroic lengths to simulate statefulness over a stateless protocol, but we must also attempt to paint a rich, compelling user interface using an extremely restricted palette. The single most salutary consequence we expect from a world of Java-enabled Web browsers is a more equitable distribution of user interface creation between client and server.

Tcl/Tk, running in the AOLserver, provides the glue for the Web Prototype. The importance of having a layer in the architecture that is scriptable, and thus highly malleable, cannot be overstated. For example, connecting a proxy server for the Stanford InfoBus to the Web Prototype was a single day's work, once we realized that we could build an alternative ``HTML-less'' interface to the AOLserver in Tcl/Tk that was almost exactly what an already-developed proxy server expected.

More importantly, a middleware layer is crucial to our database independence strategy. The user interface to the Web Prototype expresses its queries as simple boolean expressions in conjunctive normal form (CNF). The middleware reads from the underlying database a definition schema that describes the mapping from the Web Prototype's uniform logical schema to the particular database's physical implementation thereof, and then translates the CNF query into database-specific SQL.
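
A minimal Python sketch of this translation step is given below. It assumes a CNF query represented as a list of clauses, each clause a list of (attribute, value) pairs; that representation, the two definition schemas, and every table and column name are hypothetical. In the Web Prototype the definition schema is read from the database, whereas here it is hard-coded for brevity.

```python
# Hypothetical definition schemas: logical attribute -> physical SQL predicate.
SCHEMA_A = {"placename": "place_nm = ?", "year": "pub_year = ?"}
SCHEMA_B = {"placename": "gaz_name = ?", "year": "holding_yr = ?"}


def cnf_to_sql(cnf, schema, table="catalog"):
    """Translate an AND-of-ORs query into parameterized SQL for one database."""
    params = []
    clauses = []
    for clause in cnf:
        literals = []
        for attribute, value in clause:
            literals.append(schema[attribute])   # physical form of the test
            params.append(value)
        clauses.append("(" + " OR ".join(literals) + ")")
    sql = f"SELECT object_id FROM {table} WHERE " + " AND ".join(clauses)
    return sql, params


# (placename is SANTA BARBARA or SAN BUENA VENTURA) and (year is 1990)
query = [
    [("placename", "SANTA BARBARA"), ("placename", "SAN BUENA VENTURA")],
    [("year", 1990)],
]

sql_a, params_a = cnf_to_sql(query, SCHEMA_A)
sql_b, params_b = cnf_to_sql(query, SCHEMA_B)
```

The same logical query yields different SQL text for each schema while the parameter list stays identical, so the user interface never needs to know which database is behind the middleware.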








Terence R. Smith
Thu Feb 20 13:50:53 PST 1997