Presented by
Linda L. Hill
Alexandria
Digital Library Project
University of California, Santa Barbara
lhill@alexandria.ucsb.edu
For
Taxonomy
Authority Files Workshop
Washington, D.C.
June 22-23, 1998
[Accompanying PowerPoint slides]
The Alexandria Digital Library (ADL) is one of six four-year Digital
Library Initiatives funded by the federal government (NSF, DARPA, and NASA).
These projects are now in their last year. The Alexandria Digital Library
focuses on georeferenced collections: maps, aerial photos, remote sensing
images, text, and so forth. The collections built for the testbed include
a Catalog of approximately 750,000 items, a Gazetteer of nearly 6 million
placenames, and other smaller collections of bibliographic records (GeoRef),
volcanoes, and earthquakes. The Gazetteer contains the combined contents
of the two major U.S. federal government gazetteers: the Geographic Names
Information System of the U.S. Geological Survey and the GeoNames set
from the National Imagery and Mapping Agency. ADL is a research project
but it has also produced an operational service, which will become a component
of the newly established California Digital Library (CDL) when the CDL becomes
operational, expected by the end of 1998.
The ADL testbed system is designed in modules: the user interface client,
the system middleware, and the database level (metadata and data archiving).
It is designed to accommodate multiple metadata and data formats while
providing middleware metadata for high-level search parameters that are
mapped to the underlying metadata collections. Established metadata standards
such as U.S. MARC and the FGDC Content Standard are used to represent the
objects in the collections. Interoperability and integration are our focus.
In terms of the gazetteer, we have taken the approach of designing a
content standard for the representation of gazetteer entries rather than
specifically building authority files. The standard can be used both
for authority files and for special gazetteers for local purposes.
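The mapping of high-level search parameters onto underlying metadata formats described above can be illustrated with a minimal sketch. This is not the ADL middleware; the parameter names, the FIELD_MAP table, and the translate function are assumptions made for illustration. The only outside facts used are that MARC field 245 subfield a holds a title and field 100 subfield a holds a personal name main entry, and that the FGDC Content Standard has Title and Originator elements.

```python
# High-level search parameter -> field name in each underlying metadata format.
# The choice of parameters and fields is hypothetical, for illustration only.
FIELD_MAP = {
    "title": {"marc": "245$a", "fgdc": "Title"},
    "originator": {"marc": "100$a", "fgdc": "Originator"},
}


def translate(query, collection_format):
    """Rewrite a high-level query as {native_field: value} for one format."""
    native = {}
    for param, value in query.items():
        try:
            native[FIELD_MAP[param][collection_format]] = value
        except KeyError:
            raise ValueError(
                f"no mapping for {param!r} in {collection_format!r}"
            ) from None
    return native


# The same user query is sent to two collections in their own vocabularies.
query = {"title": "Goleta", "originator": "U.S. Geological Survey"}
print(translate(query, "marc"))
print(translate(query, "fgdc"))
```

Keeping the mapping in a table rather than in code means a new metadata format can be supported by adding a column of field names, which is the kind of modularity the paragraph above describes.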
Gazetteers can be defined as dictionaries of named geographic
places (also called features and placenames). We further specify that for
digital library purposes a gazetteer entry must have the following three
attributes at a minimum: a Feature Name, a Spatial Footprint (latitude/longitude
coordinates for location), and a Feature Type (category).
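The three minimum attributes can be pictured as a small record type. The sketch below is an assumption-laden illustration, not the ADL Gazetteer Content Standard itself: the class and field names are invented here, the footprint is simplified to a bounding box in decimal degrees, boxes that cross the 180-degree meridian are not handled, and the coordinates and the type term "populated places" are approximate placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Bounding box in decimal degrees; a point has west == east, south == north."""
    west: float
    south: float
    east: float
    north: float

    def __post_init__(self):
        if not (-180 <= self.west <= self.east <= 180):
            raise ValueError("longitudes must satisfy -180 <= west <= east <= 180")
        if not (-90 <= self.south <= self.north <= 90):
            raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")


@dataclass(frozen=True)
class GazetteerEntry:
    """Minimum entry: all three attributes are required."""
    feature_name: str
    footprint: Footprint
    feature_type: str

    def __post_init__(self):
        if not self.feature_name.strip():
            raise ValueError("feature_name is required")
        if not self.feature_type.strip():
            raise ValueError("feature_type is required")


# Approximate, illustrative values only.
philadelphia = GazetteerEntry(
    feature_name="Philadelphia",
    footprint=Footprint(west=-75.28, south=39.87, east=-74.96, north=40.14),
    feature_type="populated places",
)
print(philadelphia)
```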
A gazetteer service is a digital library service that supports descriptive access via feature names and feature type to spatially represented information objects. Examples of the types of services that can be performed are:
An example is a query to a gazetteer service such as "Where is Philadelphia?"
where the answer is a footprint on a map showing the area of Pennsylvania
where Philadelphia is located.
The next query may be "What rivers are in the Philadelphia area?" For
this, the service would compare the footprint of Philadelphia to the
footprints of entries in the gazetteer of the type "rivers" and return
a list of rivers whose footprints overlap the Philadelphia footprint.
The user may now want to search for datasets that are relevant to Philadelphia
that are represented in the digital library catalog. The query may be "What
remote-sensing images does the library have that overlap the Philadelphia
area?" Again a comparison is made between the Philadelphia footprint and
metadata for objects of the type "remote-sensing images" in the catalog.
The return is the set of remote-sensing images that are "about" Philadelphia,
most of which will not actually have the word "Philadelphia" in their metadata.
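Both overlap queries above come down to the same operation: filter records by type, then compare footprints. A minimal sketch follows, assuming that footprints are simple bounding boxes in decimal degrees and that no box crosses the 180-degree meridian. The Box type, the find_overlapping function, the record tuples, and all coordinates are hypothetical and are not taken from the ADL Gazetteer or Catalog.

```python
from typing import NamedTuple


class Box(NamedTuple):
    west: float
    south: float
    east: float
    north: float


def overlaps(a: Box, b: Box) -> bool:
    """True when two boxes share at least one point (touching edges count)."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


def find_overlapping(records, area: Box, kind: str):
    """Labels of records of the given kind whose footprint overlaps the area."""
    return [label for label, rec_kind, box in records
            if rec_kind == kind and overlaps(box, area)]


# Approximate, illustrative footprints.
philadelphia = Box(-75.28, 39.87, -74.96, 40.14)

gazetteer = [
    ("Schuylkill River", "rivers", Box(-76.20, 39.88, -75.18, 40.80)),
    ("Delaware River", "rivers", Box(-75.60, 38.90, -74.70, 42.00)),
    ("Sacramento River", "rivers", Box(-122.50, 38.00, -121.50, 41.00)),
    ("Philadelphia County", "counties", Box(-75.28, 39.87, -74.96, 40.14)),
]

catalog = [
    ("Scene 001 (hypothetical)", "remote-sensing images", Box(-76.5, 39.3, -74.2, 41.2)),
    ("Scene 002 (hypothetical)", "remote-sensing images", Box(-120.9, 33.6, -118.6, 35.6)),
    ("Map sheet 17 (hypothetical)", "maps", Box(-75.125, 39.875, -75.0, 40.0)),
]

# "What rivers are in the Philadelphia area?"
print(find_overlapping(gazetteer, philadelphia, "rivers"))
# "What remote-sensing images overlap the Philadelphia area?"
print(find_overlapping(catalog, philadelphia, "remote-sensing images"))
```

Note that the matching catalog record never mentions the word "Philadelphia"; it is found purely by coordinates. A bounding box is a coarse footprint, so a long river's box can overlap an area the river itself never enters; a real service would likely refine such candidates with more detailed geometry.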
Given that you want to create a gazetteer, there are four structural approaches.
If a hierarchical thesaurus model is used, there are two types of hierarchical
relationships that can be represented: (1) the whole-part relationship,
where Santa Barbara is part of Santa Barbara County which is part of California,
etc., is the most frequently applied hierarchy for placenames; or (2) the
genus-species relationship where Santa Barbara "is a" City which is an
Administrative Area (for example). The metadata model accommodates both
of these ways of representing placename relationships.
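The two kinds of hierarchical relationship can be kept as separate link tables over the same names. The sketch below is only an illustration of that idea, not the ADL metadata model: the dictionaries and the chain function are invented here, and "United States" is added as an assumed continuation of the "etc." in the whole-part example.

```python
# Whole-part ("part of") links between places.
PART_OF = {
    "Santa Barbara": "Santa Barbara County",
    "Santa Barbara County": "California",
    "California": "United States",  # assumed continuation of the example
}

# Genus-species ("is a") links from a place to its type, and between types.
IS_A = {
    "Santa Barbara": "City",
    "City": "Administrative Area",
}


def chain(start, links):
    """Follow links upward from start and return the path, start included."""
    path = [start]
    seen = {start}
    while path[-1] in links:
        nxt = links[path[-1]]
        if nxt in seen:
            raise ValueError(f"cycle detected at {nxt!r}")
        path.append(nxt)
        seen.add(nxt)
    return path


print(" is part of ".join(chain("Santa Barbara", PART_OF)))
print(" is a ".join(chain("Santa Barbara", IS_A)))
```

Because the two tables are independent, the same entry can take part in both hierarchies at once, which is what it means for a model to accommodate both representations.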
Based on the experience of building the initial ADL Gazetteer, the ADL
team has designed a new approach based on the metadata model that we hope will
lead to standard representation formats for gazetteer data and thus to
integration and interoperability among gazetteer products and services.
The ADL Gazetteer Content Standard is accessible through the ADL homepage:
<http://www.alexandria.ucsb.edu> (Publications, Metadata in the Documents/Tools
section). The relational database model for this Content Standard, developed
by Qi Zheng, can be viewed through my homepage: <http://www.alexandria.ucsb.edu/~lhill>.
Both of these developments were partially funded by the NASA EOSDIS Project
through Hughes Information Technology Systems (now Raytheon).
The Gazetteer Content Standard has 13 sections. The contents of these
sections are briefly described here. Required elements are marked with
an asterisk (*); repeatable sections are marked with (R). Each section,
with the exception of 1 and 4, can be attributed to a particular contributor
and/or source.
Each gazetteer entry must be categorized by feature type so that groups of features of a particular type can be identified for a region and so that features with similar names can be distinguished from one another. In establishing its first gazetteer, ADL combined the category schemes used by the two federal gazetteers to come up with a class/type hierarchy. This was a very difficult job and the result was not a true thesaurus of terminology. Therefore, ADL has developed a Thesaurus of Feature Types. It was designed according to the ANSI/NISO Standard Z39.19 ("Guidelines for the Construction, Format, and Management of Monolingual Thesauri"), using MultiTes thesaurus software <http://www.concentric.net/~Multites/>. It currently has 578 terms, of which 195 are preferred terms and 383 are variant or synonymous terms that point to the preferred terms. There are six top terms that form the basic organization of the hierarchies:
Terms were drawn from existing gazetteers and related publications.
The Feature Type Thesaurus can be browsed at <http://www.alexandria.ucsb.edu/~lhill/html/index.htm>.
Comments and suggestions are welcome.
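The way variant terms point to preferred terms can be sketched as a simple lookup structure. This is an illustration under stated assumptions, not the MultiTes data model or the actual thesaurus content: the class, its methods, and the sample terms ("populated places", "cities", "towns") are hypothetical.

```python
class FeatureTypeThesaurus:
    """Maps variant (non-preferred) terms to their preferred terms."""

    def __init__(self):
        self._preferred = {}  # normalized term -> preferred term as entered
        self._variants = {}   # normalized variant -> preferred term as entered

    @staticmethod
    def _norm(term):
        # Ignore case and extra whitespace when matching terms.
        return " ".join(term.lower().split())

    def add_preferred(self, term):
        self._preferred[self._norm(term)] = term

    def add_variant(self, variant, preferred):
        key = self._norm(preferred)
        if key not in self._preferred:
            raise KeyError(f"unknown preferred term: {preferred!r}")
        self._variants[self._norm(variant)] = self._preferred[key]

    def resolve(self, term):
        """Return the preferred term for any known term, else raise KeyError."""
        key = self._norm(term)
        if key in self._preferred:
            return self._preferred[key]
        return self._variants[key]

    def counts(self):
        """(number of preferred terms, number of variant terms)."""
        return len(self._preferred), len(self._variants)


thesaurus = FeatureTypeThesaurus()
thesaurus.add_preferred("populated places")   # hypothetical sample terms
thesaurus.add_variant("cities", "populated places")
thesaurus.add_variant("Towns", "populated places")

print(thesaurus.resolve("Cities"))
print(thesaurus.counts())
```

With the full thesaurus loaded, counts() would be expected to report 195 preferred and 383 variant terms, which together give the 578 terms stated above.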
The current status of the new ADL Gazetteer is that we are just getting
started with the process of populating the new relational database. We
have already loaded bounding boxes for 3,111 U.S. counties, 50 U.S. states,
and 171 countries/continents/regions, and point locations for 1,508 volcanoes.
We are developing the conversion rules needed to convert the original ADL
Gazetteer to the new format; converting the categories correctly is problematic
and will take some manual editing. We have several sets of additional gazetteer
data waiting to be converted when we have the hardware and the person power
to do it.
A Master's student in the Computer Science Department, Zheng Wang, has
developed a metadata creator tool based on Java and XML and customized
it for the ADL Gazetteer Content Standard. We are in the process of further
developing this for integration into our digital library system. It incorporates
the use of the ADL interactive map to create and display the geographic
footprints for the gazetteer entries. It also accesses the Feature Type
Thesaurus so that appropriate type terminology can be selected to describe
the new entry.
Our plans call for continuing to populate the new ADL Gazetteer and
promoting the wider use of the Gazetteer Content Standard and the Feature
Type Thesaurus. Each would benefit from collaborative development by other
groups who are actively building and using georeferenced collections of
data and information. We are also embarking on a project to mine
bibliographic records that are georeferenced with placenames, using the
gazetteer to add geographic footprints (coordinate values) to those records.
This has the potential to open up the vast number of georeferenced materials
represented in library catalogs and online bibliographic files to spatial
searching.
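The idea of using the gazetteer to add footprints to bibliographic records can be sketched as a name lookup. The code below is a simplified illustration, not the planned ADL process: the record layout, the add_footprints function, and the coordinates are assumptions, and matching is by exact normalized name only. Names that match more than one gazetteer entry are flagged rather than resolved, since choosing among them would need additional evidence such as feature type or a containing region.

```python
def add_footprints(records, gazetteer):
    """Return copies of records, adding a footprint where the placename
    matches exactly one gazetteer entry and candidates where it matches several."""
    enriched_records = []
    for record in records:
        matches = gazetteer.get(record["placename"].strip().lower(), [])
        enriched = dict(record)
        if len(matches) == 1:
            enriched["footprint"] = matches[0]
        elif len(matches) > 1:
            enriched["candidates"] = list(matches)
        enriched_records.append(enriched)
    return enriched_records


# Normalized placename -> list of (west, south, east, north) boxes.
# Approximate, illustrative values only.
gazetteer = {
    "philadelphia": [(-75.28, 39.87, -74.96, 40.14)],
    "springfield": [
        (-89.78, 39.65, -89.57, 39.87),
        (-72.62, 42.06, -72.47, 42.16),
    ],
}

# Hypothetical bibliographic records georeferenced only by placename.
records = [
    {"title": "Record A (hypothetical)", "placename": "Philadelphia"},
    {"title": "Record B (hypothetical)", "placename": "Springfield"},
    {"title": "Record C (hypothetical)", "placename": "Atlantis"},
]

for row in add_footprints(records, gazetteer):
    print(row)
```

Once a record carries a footprint, it can be found by the same coordinate comparison used for maps and images, without the searcher having to know the placename used in the record.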
Some of the issues that we face in connection with gazetteer development are
Gazetteers are key components of georeferenced information systems. Yet we are not aware of any other efforts to support the integration and interoperability of gazetteer data so that the results of numerous local efforts in creating this information can be shared. If we can create the means to do so, the resulting availability of gazetteer information will bring tremendous gains across the board in all types of information systems. ADL has developed both a Gazetteer Content Standard and a Thesaurus of Feature Types, which we offer freely to others in the hopes that these will encourage standardization and sharing of data. In particular, consideration of shared gazetteer files for taxonomic description should be part of the discussion of taxonomic authority files.