The WP metadata model incorporates two extensions to the RP metadata model, namely the addition of interpretations involving 1) image ``texture'' and 2) features found in maps and images. The first extension permits searching for images whose textures match either preselected sets of textures or specific images selected by the user; we discuss this case further in the section on wavelet transformations.
The second extension permits retrieving ADL holdings based on overlaps between the footprints of collection items and the footprints of named instances of various classes of features, such as towns or rivers. The WP employs precomputed representations of footprints that may be found in available gazetteers. A gazetteer is basically a list of feature classes (sometimes hierarchically organized), a set of their named instances, and a footprint for each instance.
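To make the retrieval step concrete, the following is a minimal sketch that assumes every footprint is an axis-aligned bounding box in decimal degrees. The names `BBox`, `GAZETTEER`, `HOLDINGS`, and `holdings_for_feature` are hypothetical, and the coordinates are illustrative rather than taken from GNIS, BGN, or the ADL catalog. A named feature is looked up in the gazetteer, and every holding whose footprint overlaps that feature's footprint is returned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Hypothetical axis-aligned footprint: (west, south, east, north) in degrees."""
    west: float
    south: float
    east: float
    north: float

    def overlaps(self, other: "BBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        # Boxes that merely touch along an edge count as overlapping.
        return not (self.east < other.west or other.east < self.west
                    or self.north < other.south or other.north < self.south)


# Hypothetical gazetteer: feature name -> (feature class, footprint).
GAZETTEER = {
    "Santa Barbara": ("town", BBox(-119.76, 34.39, -119.64, 34.46)),
    "Santa Ynez River": ("river", BBox(-120.62, 34.52, -119.70, 34.70)),
}

# Hypothetical collection items: item id -> footprint.
HOLDINGS = {
    "aerial-photo-017": BBox(-119.80, 34.40, -119.70, 34.45),
    "topo-sheet-233": BBox(-121.00, 35.00, -120.50, 35.50),
}


def holdings_for_feature(name: str) -> list[str]:
    """Return ids of holdings whose footprints overlap the named feature's footprint."""
    _feature_class, footprint = GAZETTEER[name]
    return sorted(item for item, fp in HOLDINGS.items() if fp.overlaps(footprint))
```

With these illustrative values, a query on "Santa Barbara" returns only `aerial-photo-017`. Boxes keep the overlap test cheap. Real footprints can be points or polygons, and in that case a box test serves only as a first filter.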
We are using both the Geographic Names Information System (GNIS) digital gazetteer of US features and the Board of Geographic Names (BGN) digital gazetteer of worldwide features. The GNIS gazetteer contains about 1.8M names of US features, organized hierarchically into 15 classes of features, while the BGN gazetteer contains approximately 4.5M names of land and undersea features. Specific issues that we are addressing include: 1) ingesting non-digital gazetteers (e.g. historic place names); 2) merging gazetteers of different feature classes, formats, and accuracies; 3) constructing meaningful footprints for entities; 4) organizing the feature classes into meaningful hierarchies.
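A minimal sketch of the merging issue (item 2) follows. It shows one plausible policy and should not be read as the WP's procedure. It assumes that each entry carries a reported positional error and that a hand-built crosswalk maps each source's class labels onto a shared vocabulary. `Entry`, `error_m`, `CLASS_CROSSWALK`, `canonical_class`, and `merge` are all hypothetical names. Entries are matched on a normalized name plus a canonical class, and the more accurate entry is kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    feature_class: str
    lon: float
    lat: float
    error_m: float  # hypothetical field: reported positional error in metres
    source: str


# Hypothetical crosswalk from source-specific class labels to a shared vocabulary.
CLASS_CROSSWALK = {"summit": "mountain", "mt": "mountain", "stream": "river"}


def canonical_class(label: str) -> str:
    """Map a source's class label to the shared vocabulary (unknown labels pass through)."""
    folded = label.casefold()
    return CLASS_CROSSWALK.get(folded, folded)


def normalize(name: str) -> str:
    # Collapse case and internal whitespace so "Mount  Whitney" and "mount whitney" match.
    return " ".join(name.casefold().split())


def merge(*gazetteers: list[Entry]) -> dict[tuple[str, str], Entry]:
    """Merge gazetteers keyed on (normalized name, canonical class),
    keeping the entry with the smallest reported positional error."""
    merged: dict[tuple[str, str], Entry] = {}
    for gazetteer in gazetteers:
        for entry in gazetteer:
            key = (normalize(entry.name), canonical_class(entry.feature_class))
            current = merged.get(key)
            if current is None or entry.error_m < current.error_m:
                merged[key] = entry
    return merged
```

Exact key matching is deliberately conservative. Two different features with the same name and class, such as two towns called Springfield, would collide under this key, so a practical merge would also have to compare locations.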
The issue of footprints is the most troubling since those of existing gazetteers are often point locations, rather than sets of points that define an area. When a gazetteer employs a single point to represent the footprint of a feature possessing non-zero extent, it is not always clear how the points were chosen. For example, they may be centroids, corners, or arbitrary points.
In constructing a gazetteer for the WP, we must choose appropriate footprints for features and then extract them. Defining an appropriate footprint is difficult for many classes of features: in general, one must decide whether a single point, a simple polygon, a complex polygon, or some other representation is most appropriate. Furthermore, some features have no unambiguous footprint (where does a mountain begin and end?). This ambiguity is inherent in a person's notion of a feature's spatial extent and is particularly difficult to specify. Finally, extracting footprints and entering them into a gazetteer is expensive. Footprints may be generated manually, derived from existing digital data, or inferred from other ancillary information. A related problem is finding and correcting errors in existing gazetteers.
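The choice of representation changes query answers. The sketch below is an illustration under stated assumptions and does not describe the WP's gazetteer schema. It models a footprint as either a point or a simple polygon and asks whether that footprint covers a query location. Polygons use even-odd ray casting. A point has no extent, so the sketch falls back on a hypothetical tolerance in degrees. `PointFootprint`, `PolygonFootprint`, `point_in_polygon`, `covers`, and `tolerance` are illustrative names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class PointFootprint:
    lon: float
    lat: float


@dataclass(frozen=True)
class PolygonFootprint:
    # Simple polygon vertices in order; the ring is implicitly closed.
    vertices: tuple[tuple[float, float], ...]


Footprint = Union[PointFootprint, PolygonFootprint]


def point_in_polygon(x: float, y: float, vertices) -> bool:
    """Even-odd ray casting: count edges crossed by a ray cast toward +x."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def covers(footprint: Footprint, lon: float, lat: float, tolerance: float = 0.01) -> bool:
    """Does the feature's footprint cover the query location?

    A point footprint has no extent, so a hypothetical tolerance (in degrees)
    stands in for it; the answer then depends on how the point was chosen.
    """
    if isinstance(footprint, PointFootprint):
        return abs(footprint.lon - lon) <= tolerance and abs(footprint.lat - lat) <= tolerance
    return point_in_polygon(lon, lat, footprint.vertices)
```

Take a square feature spanning (0, 0) to (2, 2). The location (1.5, 1.5) is covered by the polygon footprint, but the same feature recorded only by its centroid (1, 1) does not cover it. This is the ambiguity that point-only gazetteers introduce.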
Database support for the gazetteer information is currently provided by the ConQuest text-retrieval engine. A significant feature of ConQuest is its ability to handle fuzziness in the feature specifications. While the WP metadata are currently stored in the Sybase RDBMS, we are also implementing our metadata in the O2 object database, since many of the range values in informal interpretive mappings of interest are best represented as structured objects.
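The paper does not describe how ConQuest handles fuzziness, and the sketch below does not attempt to reproduce it. It assumes that one common form of fuzziness is approximate spelling of a place name, and it uses Python's standard `difflib.get_close_matches` as a swapped-in stand-in for a text-retrieval engine. The name list and the function `fuzzy_lookup` are hypothetical.

```python
import difflib

# Hypothetical list of gazetteer names.
NAMES = ["Santa Barbara", "Santa Maria", "San Bernardino", "Goleta", "Isla Vista"]


def fuzzy_lookup(query: str, names=NAMES, n: int = 3, cutoff: float = 0.6) -> list[str]:
    """Return up to n gazetteer names that approximately match query, best first."""
    # Compare case-folded strings but return the original spellings.
    folded = {name.casefold(): name for name in names}
    matches = difflib.get_close_matches(query.casefold(), list(folded), n=n, cutoff=cutoff)
    return [folded[m] for m in matches]
```

A misspelled query such as "Santa Barbra" still ranks "Santa Barbara" first. A query that shares almost no characters with any name returns an empty list.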
As the metadatabase grows, it becomes critical to support different types of queries over the footprints efficiently, using appropriate spatial indexing methods. We have been exploring new methods of indexing multiply-nested spatial data [9] such as footprints. In particular, we have extended B-trees to ``IB-trees'', which handle data objects that span a range of values (intervals) rather than single-valued points in the data space. This extension supports two distinct approaches to indexing multidimensional hierarchical data. The first decomposes each d-dimensional data object into d intervals, one per dimension, and indexes the intervals in each dimension separately. The second organizes all data objects at the same level, using standard spatial indexing methods.
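The first (decomposition) approach can be sketched as follows. A box overlaps a query box if and only if their intervals overlap in every dimension, so each dimension's interval set can be searched independently and the candidate sets intersected. This sketch does not implement IB-trees. A static list sorted by low endpoint stands in for the per-dimension interval structure, and `IntervalIndex` and `DecomposedBoxIndex` are hypothetical names.

```python
import bisect
from collections.abc import Sequence


class IntervalIndex:
    """Static one-dimensional index over (low, high, id) intervals.

    Stand-in for a per-dimension interval structure such as an IB-tree:
    entries are sorted by low endpoint so a query can skip every interval
    that starts after the query's upper bound.
    """

    def __init__(self, intervals: Sequence[tuple[float, float, str]]):
        self._entries = sorted(intervals)
        self._lows = [low for low, _high, _oid in self._entries]

    def overlapping(self, lo: float, hi: float) -> set[str]:
        # Only intervals with low <= hi can overlap [lo, hi]; then require high >= lo.
        end = bisect.bisect_right(self._lows, hi)
        return {oid for _low, high, oid in self._entries[:end] if high >= lo}


class DecomposedBoxIndex:
    """Index d-dimensional boxes as d independent one-dimensional interval sets."""

    def __init__(self, boxes: dict[str, Sequence[tuple[float, float]]]):
        dims = len(next(iter(boxes.values())))
        self._indexes = [
            IntervalIndex([(box[d][0], box[d][1], oid) for oid, box in boxes.items()])
            for d in range(dims)
        ]

    def overlapping(self, query: Sequence[tuple[float, float]]) -> set[str]:
        # A box overlaps the query iff its intervals overlap in every dimension,
        # so intersect the per-dimension candidate sets.
        result = None
        for index, (lo, hi) in zip(self._indexes, query):
            hits = index.overlapping(lo, hi)
            result = hits if result is None else result & hits
        return result or set()
```

Per-dimension searches are simple and reuse one-dimensional structures. The cost is that each dimension may return many candidates that the intersection later discards. The second approach avoids this by indexing whole objects in a single spatial structure.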