Membership: Smith (leader), Freeston, Geffner, Hill, Gritton, Hill, Larsgaard
Mission Statement of Team: The focus of the team is the modeling and design of metadata components for ADL.
Gazetteer
The Alexandria Digital Library Project has implemented a gazetteer component for the Library based on two gazetteers maintained by the federal government, as described in the last ADL annual report. The current ADL gazetteer has approximately 5.9 million geographic names and a hybrid version of the feature classes and types from the two sources. Most of the names are identified by point locations. ADL uses these point locations to position a geographic search window around the place names a user selects.
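As a minimal sketch of how a point location might be turned into a search window, the following Python fragment builds a box centred on a point. The names BoundingBox and search_window and the default half-width of 0.25 degrees are assumptions made for illustration; they are not taken from the ADL implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    west: float
    south: float
    east: float
    north: float


def search_window(lon: float, lat: float, half_width_deg: float = 0.25) -> BoundingBox:
    """Return a box centred on (lon, lat), with latitudes clamped to [-90, 90].

    half_width_deg is a hypothetical default, not an ADL setting.
    Longitude wrap-around at the antimeridian is ignored for brevity.
    """
    return BoundingBox(
        west=lon - half_width_deg,
        south=max(lat - half_width_deg, -90.0),
        east=lon + half_width_deg,
        north=min(lat + half_width_deg, 90.0),
    )


# Approximate coordinates, used only as an example point.
window = search_window(-119.70, 34.42)
print(window)
```

A fixed half-width is the simplest choice; a real interface would more likely scale the window to the type of feature selected.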
ADL has also modeled the gazetteer as a knowledge representation system (KRS) that represents geographic names in at least two ways: (1) by spatial representations: where the place is on the face of the Earth and optionally its spatial extent and (2) by class or type of feature. With such relationships established between geographic names and their spatial footprints, two-way translations can take place: a geographic name can be translated into its spatial footprint and a spatial footprint can be translated into the geographic names with related footprints. The assignment of geographic features to classes provides the additional power of being able to find and sort geographic features by class or subclass - e.g. find hydrographic features in an area.
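The two-way translation described above can be sketched in a few lines of Python. This is an illustration only: the entries, their approximate coordinates, the class label "populated place", and the function names are all assumptions, and footprints are simplified to bounding boxes.

```python
from typing import List, NamedTuple, Optional


class Box(NamedTuple):
    west: float
    south: float
    east: float
    north: float


class Entry(NamedTuple):
    name: str
    feature_class: str
    footprint: Box


def overlaps(a: Box, b: Box) -> bool:
    """True when the two boxes share at least one point."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


# Illustrative entries; coordinates are approximate and for demonstration only.
ENTRIES = [
    Entry("Santa Barbara", "populated place", Box(-119.86, 34.39, -119.64, 34.46)),
    Entry("Lake Cachuma", "hydrographic", Box(-119.99, 34.56, -119.90, 34.60)),
    Entry("Santa Ynez River", "hydrographic", Box(-120.60, 34.50, -119.50, 34.70)),
]


def name_to_footprint(name: str) -> Optional[Box]:
    """Translate a geographic name into its spatial footprint."""
    for entry in ENTRIES:
        if entry.name.lower() == name.lower():
            return entry.footprint
    return None


def footprint_to_names(query: Box, feature_class: Optional[str] = None) -> List[str]:
    """Translate a footprint into the names whose footprints overlap it,
    optionally restricted to one feature class."""
    return sorted(
        entry.name
        for entry in ENTRIES
        if overlaps(entry.footprint, query)
        and (feature_class is None or entry.feature_class == feature_class)
    )


area = Box(-120.0, 34.5, -119.8, 34.65)
print(name_to_footprint("Lake Cachuma"))
print(footprint_to_names(area, feature_class="hydrographic"))
```

A linear scan is adequate for a handful of entries; a gazetteer with millions of names would need a spatial index, which is outside the scope of this sketch.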
ADL has developed a proposed Gazetteer Content Standard to represent the core attributes for a gazetteer KRS. Beyond the core attributes of name, spatial footprint, and class, the standard provides for the description of additional information about named geographic features: description, history, physical characteristics (e.g. size, height), and data about population, climate, economics, etc. Variant forms of the name of the place (by language, historical period, colloquial variants, etc.) and an extensible set of spatial representations (a simple point, a bounding box, linear features, and irregular polygons) are supported by the proposed standard. The latest version of the Gazetteer Content Standard can be found at: http://www.alexandria.ucsb.edu/public-documents/metadata/
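To make the shape of such an entry concrete, the sketch below models the core attributes (name, spatial footprint, class) together with variant names and supplementary information. The class and field names are hypothetical and are not the element names of the proposed standard, which should be consulted at the address above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float]  # (longitude, latitude)


@dataclass
class VariantName:
    name: str
    language: str = ""
    note: str = ""  # e.g. a historical period, or "colloquial"


@dataclass
class Footprint:
    kind: str  # "point", "box", "line", or "polygon"
    coordinates: List[Coordinate]

    def bounding_box(self) -> Tuple[float, float, float, float]:
        """Return (west, south, east, north) enclosing all coordinates."""
        lons = [lon for lon, _ in self.coordinates]
        lats = [lat for _, lat in self.coordinates]
        return (min(lons), min(lats), max(lons), max(lats))


@dataclass
class GazetteerEntry:
    # Core attributes.
    name: str
    feature_class: str
    footprints: List[Footprint]
    # Optional extensions (hypothetical field names).
    variant_names: List[VariantName] = field(default_factory=list)
    supplementary: Dict[str, str] = field(default_factory=dict)


# Example values are illustrative only.
entry = GazetteerEntry(
    name="Santa Barbara",
    feature_class="populated place",
    footprints=[Footprint("point", [(-119.70, 34.42)])],
    variant_names=[VariantName("Santa Bárbara", language="es")],
)
print(entry.footprints[0].bounding_box())
```

Keeping footprints as a list allows one entry to carry both a simple point and a more detailed polygon, in line with the extensible set of spatial representations the standard describes.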
This Gazetteer Content Standard is based structurally on the Federal Geographic Data Committee's Content Standard for Geospatial Data. It is therefore like a metadata description. It prescribes how the attributes of a geographic feature are to be described; it leaves the actual contribution of gazetteer entries open to multiple contributors who follow the standard way of formatting the information.
ADL proposes to submit the proposed Standard to a larger community for discussion. At the same time, it will be partially implemented within ADL to gain experience with its use. ADL also proposes to develop ingest software that will allow anyone to contribute gazetteer information to the ADL Gazetteer. Gathering information in this way is expected to yield more detailed footprints, a more extensive set of geographic names, and more supplementary information about the places.
Discussions about the potential of Alexandria's approach to the gazetteer have taken place with the USGS and with the group at Hughes Information Technology Corporation who are working on the interface for NASA's Earth Observing System Data and Information System. Hughes is establishing a formal working group with ADL participation to plan their implementation of a gazetteer service for EOSDIS based on the ADL developments.
Dublin Core
A strong movement in Web circles is the provision of metadata for Webpages by Webpage creators. OCLC, Inc. (Dublin OH) is the principal mover behind the creation of a core set of metadata that would be searchable by Web engines, provide a minimal-level set of information for a Webpage, and be straightforward enough for any Webpage creator to use to describe the creator's Webpage. Alexandria has participated in all of the Dublin-Core conferences, including two during the annual-report period: one in early April 1996 at the University of Warwick in Warwick, England, and another in late September at OCLC in Dublin OH. At both of these conferences, Alexandria representatives gave presentations. At the Warwick meeting, Terence Smith presented a paper on the theory of metadata, and Mary Larsgaard presented a crosswalk between the Dublin Core and the Alexandria Digital Library metadata fields. At the OCLC meeting, which focused on metadata for images, Mary Larsgaard gave a presentation on the metadata search parameters required for searching for georeferenced information. For a paper that merges the two presentations by Larsgaard, see:
http://www.library.ucsb.edu/people/larsgaard/warwick.htm
Some Dublin-Core-style fields (sometimes termed ``search buckets'') are currently used in the Alexandria search interface, and more will be added for the general-user interface.
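A crosswalk of the kind mentioned above is, in essence, a mapping from the fields of one metadata scheme to the elements of another. The sketch below shows the idea in Python. The values are element names from the Dublin Core element set; the keys are hypothetical field names invented for this illustration and are not the actual ADL metadata fields, for which the paper cited above should be consulted.

```python
from typing import Dict, List

# Keys: hypothetical ADL-style field names (assumed for illustration).
# Values: Dublin Core element names.
CROSSWALK: Dict[str, str] = {
    "map_title": "Title",
    "originator": "Creator",
    "theme_keywords": "Subject",
    "abstract": "Description",
    "publication_date": "Date",
    "bounding_coordinates": "Coverage",
}


def to_dublin_core(record: Dict[str, str]) -> Dict[str, List[str]]:
    """Map a source record onto Dublin Core elements.

    Fields with no crosswalk entry are dropped; elements may repeat,
    so each element maps to a list of values.
    """
    result: Dict[str, List[str]] = {}
    for field_name, value in record.items():
        element = CROSSWALK.get(field_name)
        if element is not None:
            result.setdefault(element, []).append(value)
    return result


# Example record with made-up values.
source_record = {
    "map_title": "Goleta quadrangle",
    "originator": "Example Survey Office",
    "scale": "1:24000",  # no crosswalk entry, so it is dropped
}
print(to_dublin_core(source_record))
```

The dropped "scale" field illustrates why a minimal core set cannot carry everything a full spatial metadata record holds.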
Multilevel Descriptions
Since easily 80% of all spatial data has some sort of relationship with other spatial data, multilevel description (that is, parent records and child records that link back to parent records) is essential. Work has been done on the various types of relationships and on the various types of metadata records that may be constructed to express those relationships. See the paper and examples at:
http://www.library.ucsb.edu/people/larsgaard/[2 files]
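The parent and child linkage in a multilevel description can be sketched as records that each carry an optional pointer to a parent. The Record class, its field names, and the sample identifiers below are assumptions made for illustration, not the record structure used in ADL.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    record_id: str
    title: str
    parent_id: Optional[str] = None  # None marks a top-level (parent) record


def children_of(records: Dict[str, Record], parent_id: str) -> List[str]:
    """Return the ids of all records that link back to parent_id."""
    return sorted(r.record_id for r in records.values() if r.parent_id == parent_id)


def lineage(records: Dict[str, Record], record_id: str) -> List[str]:
    """Walk from a record up to its top-level ancestor."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = record_id
    while current is not None and current not in seen:
        seen.add(current)  # guards against accidental cycles
        chain.append(current)
        current = records[current].parent_id
    return chain


# Hypothetical example: a flight, one of its frames, and a detail of that frame.
records = {
    r.record_id: r
    for r in [
        Record("flight", "Aerial survey flight"),
        Record("frame-001", "Frame 1", parent_id="flight"),
        Record("frame-002", "Frame 2", parent_id="flight"),
        Record("frame-001-detail", "Detail of frame 1", parent_id="frame-001"),
    ]
}
print(children_of(records, "flight"))
print(lineage(records, "frame-001-detail"))
```

Storing only the upward link keeps child records self-describing; the downward view is derived on demand.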
Ideal Metadata-Creation Workstation for Spatial Data
While the vast majority of persons seeking spatial data will probably never create metadata, without the metadata creators there is nothing for the searchers to find. It is thus important that metadata creation be made as efficient, as accurate, and as easy as possible. For persons who create metadata once or only occasionally, Alex-Meta staff formulated a Portable Ingest Form of Dublin-Core-type fields and put it up on the ADL homepage for those wishing to enter metadata quickly and easily. The form is also available in Microsoft Access, since many persons use Microsoft products and are already familiar with this software. For the professional metadata creator, more sophisticated methods and a full panoply of fields are required. For a paper outlining the ideal metadata-creation workstation for spatial data, see:
http://www.library.ucsb.edu/people/larsgaard/ideal.htm
Joint UCSB/Illinois GIS semantic interoperability
A joint DLI supplemental project between Illinois and UCSB was funded by NSF in November, 1996. This research aims to examine semantic interoperability issues related to spatially-oriented, multimedia geographic information access. Based on the concept space approach developed by the Illinois Digital Library Initiative (DLI) project and the Alexandria geo-referenced collections, this research proposes to develop knowledge representations and structures to capture concepts of relevance to spatial and multimedia information (natural language phrases and geo-related textures). Selected machine learning techniques and general Artificial Intelligence (AI) graph traversal algorithms will also be adopted to assist in semantic, concept-based spreading activation in integrated knowledge networks. Due to the size of the geo-referenced collections, extensive data analysis (knowledge discovery) will be performed using the shared-memory multiprocessor supercomputers at NCSA (National Center for Supercomputing Applications).
It is expected that the scope of the applicability of the proposed research would extend to generic textual and multimedia digital libraries, and that the research would inspire the development of techniques to facilitate efficient and effective semantic access.
In particular, Hsinchun Chen of the University of Arizona and Terry Smith are experimenting with concept spaces and self-organizing map (SOM) techniques for generating visual thesauri for geospatial information. A testbed of several hundred aerial photos has been created and will grow to a few thousand in the next three months. The collection has been analyzed using the NCSA SGI Power Challenge and Convex Exemplar supercomputers. A Java-based interface for viewing the visual thesaurus is also under development at Chen's lab.
Using an airphoto testbed made available through the Map and Imagery Lab at UCSB, we developed a prototype system to assist in image-based visual thesaurus browsing. A 1994 aerial survey flight had covered all of Santa Barbara County. About 300 air photos were scanned, and each frame was stored as a 5000 by 5000 pixel image file (approximately 50 MB per image). Each image was then partitioned into about 1600 (40 x 40) small image blocks, and each block was indexed using Gabor filters (represented by 60 image features).
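The partitioning step can be illustrated with a short Python sketch. It assumes that the 5000 by 5000 pixel frame divides evenly into the 40 x 40 grid, which gives blocks of 125 by 125 pixels; the text says only "about 1600" blocks, so the exact block size and the function name block_bounds are assumptions.

```python
from typing import Iterator, Tuple

# (row, col, top, left, bottom, right), with bottom and right exclusive.
Block = Tuple[int, int, int, int, int, int]


def block_bounds(image_size: int = 5000, blocks_per_side: int = 40) -> Iterator[Block]:
    """Yield the pixel bounds of each block in a square grid.

    Assumes image_size is an exact multiple of blocks_per_side.
    """
    block = image_size // blocks_per_side  # 5000 // 40 == 125 pixels
    for row in range(blocks_per_side):
        for col in range(blocks_per_side):
            top, left = row * block, col * block
            yield (row, col, top, left, top + block, left + block)


blocks = list(block_bounds())
print(len(blocks))   # number of blocks per image
print(blocks[0])     # bounds of the top-left block
```

With about 300 images, this grid yields on the order of 300 x 1600 = 480,000 blocks, each of which would then be described by its 60 features.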
Using the Gabor filter features and a simple Euclidean distance similarity function, we were able to perform an image-based similarity search. When the user clicked on any small image block, the system brought up a display window of the top 9 images (from different airphotos and from different parts of the same photo) that were similar to the selected image pattern (the first image displayed). The prototype system currently supports image-based similarity search of airphotos.
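The similarity search amounts to ranking all blocks by Euclidean distance from the selected block in feature space. The following is a minimal sketch of that ranking, not the prototype's code: the function name, the block identifiers, and the toy 3-component vectors are assumptions (the prototype used 60 features per block), and the brute-force scan stands in for whatever indexing the prototype actually used.

```python
import math
from typing import Dict, List, Sequence


def most_similar(query_id: str,
                 features: Dict[str, Sequence[float]],
                 k: int = 9) -> List[str]:
    """Return the ids of the k blocks closest to the query block.

    The query block itself is at distance 0 and is therefore returned
    first, matching the display in which the selected pattern is shown
    as the first image.
    """
    query = features[query_id]
    ranked = sorted(
        features,
        # Secondary key places the query ahead of any exact duplicates.
        key=lambda block_id: (math.dist(query, features[block_id]),
                              block_id != query_id),
    )
    return ranked[:k]


# Toy feature vectors with 3 components each; values are made up.
toy_features = {
    "photo1/block-a": [0.0, 0.0, 0.0],
    "photo1/block-b": [1.0, 0.0, 0.0],
    "photo2/block-c": [0.0, 3.0, 0.0],
    "photo2/block-d": [0.5, 0.0, 0.0],
}
print(most_similar("photo1/block-a", toy_features, k=3))
```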
A second prototype system was later developed to support image-based visual thesaurus browsing. Using the same 60 image features as the input vector and the SOM categorization algorithm, we clustered similar image patterns (e.g. residential areas, vegetation, parking lots, farm lands) in a graphical two-dimensional display. Similar image patterns were grouped together in different regions of the (semantic map) display (e.g. vegetation patterns, highway patterns, housing development patterns). Clicking on each representative image brought up another window that displayed other image patterns that were classified as similar. By mapping each image's coordinates to the GNIS gazetteer, we were able to suggest place information that matches the geographic location of each image (e.g. San Andreas Rift for many image blocks).
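For readers unfamiliar with the SOM algorithm, the sketch below shows a generic, textbook-style self-organizing map in plain Python. It is not the implementation used in the prototype: the grid size, learning-rate schedule, neighbourhood function, number of epochs, and toy 3-component vectors are all assumptions chosen to keep the example small.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]


def best_matching_unit(weights: Dict[Cell, List[float]], v: Sequence[float]) -> Cell:
    """Return the map cell whose weight vector is closest to v."""
    return min(weights, key=lambda cell: math.dist(weights[cell], v))


def train_som(vectors: List[List[float]], rows: int = 4, cols: int = 4,
              epochs: int = 50, seed: int = 0) -> Dict[Cell, List[float]]:
    """Train a small SOM; returns one weight vector per grid cell."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass over the data
            frac = step / total_steps
            rate = 0.5 * (1.0 - frac)  # learning rate decays toward 0
            radius = 1.0 + (max(rows, cols) / 2.0) * (1.0 - frac)  # shrinks to 1
            bmu = best_matching_unit(weights, v)
            for cell, w in weights.items():
                grid_dist2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                influence = math.exp(-grid_dist2 / (2.0 * radius ** 2))
                for i in range(dim):
                    # Pull the cell toward the sample, more strongly near the BMU.
                    w[i] += rate * influence * (v[i] - w[i])
            step += 1
    return weights


# Toy data: two loose groups of made-up feature vectors in [0, 1].
samples = [[0.1, 0.1, 0.2], [0.0, 0.2, 0.1], [0.9, 0.8, 1.0], [1.0, 0.9, 0.8]]
som = train_som(samples)
for vector in samples:
    print(vector, "->", best_matching_unit(som, vector))
```

After training, each image block is assigned to the cell returned by best_matching_unit, and blocks that share a cell or fall in neighbouring cells are the ones a browsing display would group together.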
Abstracts of Published Papers
We present a general framework to support the modeling of digital documents and user queries in the context of digital libraries (DL's). The basis of the framework is a four-component model of a DL catalog involving a document modeling component, a query modeling component, a match component, and a catalog interoperability component. Meta-information in such a catalog provides models of library documents and facilitates efficient access to information represented in the documents. In particular, meta-information is conceptualized in terms of sets of relations between nominal representations of library documents and their properties, and sets of relations between document properties. The properties of the documents are modeled in meta-information in terms of a multiplicity of languages which vary between the catalog components and between catalogs. Each of the catalog components is modeled in terms of a set of formal systems related to the languages employed in the component. Using this framework, we discuss the two critical issues of catalog intraoperability and catalog interoperability. The framework provides a basis both for the rational design of meta-information and catalogs in DL contexts, and for an analysis and resolution of the intraoperability and interoperability issues. We provide examples of the issues discussed in terms of the Alexandria Digital Library.
The goal of this essay is to suggest a framework for the design of the meta-information environments for DL's that takes advantage of digital technology and compensates for the loss of direct user-librarian interactions. We briefly examine the use of the terms ``metadata'' and ``meta-information''. We then employ a simple scenario of library use in order to characterize the meta-information environment of a traditional library (TL). We generalize this characterization to the meta-information environment of libraries in general. The environment is modeled in terms of a set of high-level services which are, in turn, supported by sets of lower-level services, some of which are provided by an extensible set of ``knowledge representation systems''. Finally, we examine the implications of this general characterization in terms of a design for the meta-information environment of a DL. In particular, we suggest a design that is implementable within a distributed object framework.