Responsible party:
http://www.alexandria.ucsb.edu
1. Purpose
2. Overview
5. Treatment of geospatial description
6. Treatment of temporal description
7. Links to external sources of information
8. Treatment of attribution to source of data
11. Availability and contact point
12. Acknowledgements
The ADL Gazetteer Content Standard (GCS) is designed to be a comprehensive framework for recording descriptions of named geographic places. It covers the core elements of toponyms (and their history), spatial location (in various representations), and classification (according to referenced typing schemes), as well as source attribution for pieces of description gathered from various resources for a particular place. The intention is to demonstrate the use of the GCS and promote its adoption so that gazetteer data created by various local, national, and international agencies, and by special knowledge groups, can be shared and, when gathered from various sources, understood. The GCS is designed to meet the needs of gazetteers containing current details of named geographic places as well as gazetteers containing historical data, to support international and multilingual applications, and to link to other sources of information about a particular place. As a comprehensive structure for recording gazetteer descriptions, it can be considered an “archival” structure. Implementations of it for gazetteer services will include additional tables to support searching and report generation functions.
An underlying purpose is to direct attention to the components of description for named geographic places and to inform future developments of collections, database design, and services. Such services link current and historical toponyms (the names we give to geographic places) to mappable locations (e.g., longitude and latitude coordinates). Because a typing scheme is used to classify the entries, they can also answer queries such as “What schools are in the Tucson area?”
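A query of the “What schools are in the Tucson area?” kind combines a feature-type filter with a spatial filter. The following is a minimal sketch of that idea, assuming simple in-memory entries; the `Entry` class, the `find` function, and the sample entries are hypothetical illustrations and are not part of the GCS or of any ADL software.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    feature_class: str   # class term from the referenced typing scheme
    west: float
    east: float
    south: float
    north: float

def boxes_overlap(e, west, east, south, north):
    """True if the entry's bounding box intersects the query box.
    Assumes neither box crosses the 180-degree meridian."""
    return (e.west <= east and e.east >= west
            and e.south <= north and e.north >= south)

def find(entries, feature_class, west, east, south, north):
    """Names of entries of the given class whose box overlaps the query box."""
    return [e.name for e in entries
            if e.feature_class == feature_class
            and boxes_overlap(e, west, east, south, north)]

# Hypothetical entries; a point feature is stored as a zero-size box.
entries = [
    Entry("Example School A", "schools", -110.95, -110.95, 32.22, 32.22),
    Entry("Example School B", "schools", -112.07, -112.07, 33.45, 33.45),
    Entry("Example Park", "parks", -110.90, -110.90, 32.20, 32.20),
]

# Query box: an assumed "Tucson area" rectangle.
hits = find(entries, "schools", -111.00278, -110.86778, 32.12278, 32.26883)
print(hits)
```

Only the entry that matches both the class and the box is returned; without the classification, the park would be indistinguishable from the schools.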
A companion to the ADL GCS is the ADL Gazetteer Protocol (http://www.alexandria.ucsb.edu/gazetteer/protocol/) that provides a standard XML-based query and response structure for the machine-to-machine querying of distributed gazetteers. The protocol and an open-source Java-based server implementation are available through the ADL web pages. The protocol and the GCS are independent structures.
The GCS, version 3.2, was developed as an XML schema. From this, a relational database (rdb) logical model has been developed. An implementation has been developed for the PostgreSQL database software, with additional tables to support specific query matching and report generation requirements.
Sections of the GCS deal with
· Names and details of their origin, language, and use
· Classification (typing according to a referenced scheme)
· Codes associated with the place (e.g., FIPS code)
· Spatial location (bounding box and detailed geometries)
· Street address
· Relationships to other named places
· Data (e.g., population, elevation)
· Description (narrative)
· Links to external resources about the feature
· Other: supplemental note; entry metadata
A separate, companion XML schema is used to describe the contributors and their sources for pieces of data included in a gazetteer entry.
Views and files of the GCS include the following:
· HTML graphics of the XML schemas (.html files)
o GCS 3.2 (large file – please wait for it to load completely)
· XML schemas (.xsd files)
o GCS 3.2
o GCS 3.2 required elements and attributes only
o GCS 3.2 all elements and attributes
o Source 3.2 required elements and attributes only
o Source 3.2 all elements and attributes
Graphics of the relational database model are described below.
Time, attribution to source, and entry date are applicable throughout the GCS to pieces of information gathered from multiple sources about a particular place. Time is treated in a similar fashion to spatial location. Time can be represented as a time range (similar to the bounding box), as detailed time instances and ranges (similar to the spatial geometries), and also as a named time period. The time period of the feature itself (e.g., for a school building that no longer exists) as well as the time periods for names, spatial footprints, data, and classification (e.g., a building changes its use from a church to a school) can all be represented. A general temporal status is part of the time period representation, with current, former, and proposed as the three status values.
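The temporal model described above can be sketched as a small data structure. This is an illustrative sketch only: the class and field names (`TimePeriod`, `begin_year`, `end_year`, `period_name`, `Classification`) and the years are assumptions made for the example and do not reproduce the element names of the GCS schema. Only the three status values come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class TemporalStatus(Enum):
    CURRENT = "current"
    FORMER = "former"
    PROPOSED = "proposed"

@dataclass
class TimePeriod:
    status: TemporalStatus
    begin_year: Optional[int] = None    # None when the beginning is unknown
    end_year: Optional[int] = None      # None when the end is unknown or open
    period_name: Optional[str] = None   # optional named time period

    def covers(self, year: int) -> bool:
        """True if the year falls in the range; unknown ends are treated as open."""
        after_begin = self.begin_year is None or year >= self.begin_year
        before_end = self.end_year is None or year <= self.end_year
        return after_begin and before_end

@dataclass
class Classification:
    feature_class: str
    time: TimePeriod

def classes_in_year(classifications, year):
    """Feature classes that applied in the given year."""
    return [c.feature_class for c in classifications if c.time.covers(year)]

# Hypothetical history: a building that changed use from a church to a school.
history = [
    Classification("churches", TimePeriod(TemporalStatus.FORMER, 1890, 1949)),
    Classification("schools", TimePeriod(TemporalStatus.CURRENT, 1950)),
]
print(classes_in_year(history, 1900), classes_in_year(history, 2000))
```

The same `TimePeriod` shape could be attached to names, spatial footprints, and data values, which is the sense in which time applies throughout an entry.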
Attribution to source and entry date are represented in the XML schema as applicable to sections of the description; e.g., for a particular placename, a particular spatial footprint, a description, etc. In the rdb, this linking of source to data has been extended to most of the attributes in the whole gazetteer entry through the use of mirror tables where the source of each bit of data and its entry date can be represented.
The documentation of the source of pieces of information is structured as a separate XML schema and is integrated into the rdb model as a discrete set of tables with unique IDs for each distinct combination of contributor and the contributor’s source of reference. Linking a particular piece of information to a contributor and source is done with these source IDs.
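The idea of one ID per distinct combination of contributor and reference source can be sketched as follows. The `SourceRegistry` class, its method name, and the sample contributor and reference strings are hypothetical; the actual ADL tables and ID assignment are not shown in this document.

```python
class SourceRegistry:
    """Assigns one ID to each distinct (contributor, reference source) pair."""

    def __init__(self):
        self._ids = {}

    def source_id(self, contributor, reference):
        key = (contributor, reference)
        if key not in self._ids:
            # Next sequential ID; a database would typically use a sequence.
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]

registry = SourceRegistry()
a = registry.source_id("Contributor X", "Reference 1")
b = registry.source_id("Contributor X", "Reference 2")
c = registry.source_id("Contributor X", "Reference 1")  # same pair, same ID
print(a, b, c)
```

A piece of information in a gazetteer entry then only needs to carry the ID to be traceable to both the contributor and the reference.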
The core elements (required elements of description) of the GCS are a small subset of the whole GCS. In the XML schema graphic, required elements appear in solid-lined boxes. For the rdb, we have created specific lite schema views of the structure which can be used as a starting point.
The Alexandria Digital Library Project, which started in 1994, created the first ADL Gazetteer early in the project. After a period of use and experimentation, a formal structure was created for gazetteers (the first ADL Gazetteer Content Standard) and the ADL Gazetteer was recreated using a relational database implementation based on the GCS. Revisions to the first GCS have been ongoing as a result of consultations with other potential implementers. In particular, the requirements of historical and multilingual gazetteers were contributed by members of the Electronic Cultural Atlas Initiative (ECAI) at
A gazetteer record using only the required elements of the GCS might look like the following. Please note that the record is presented here in a report format with customized element labels and without entry dates and attribution to source. The encoded geometry section is presented in XML format to make the point that this section is represented by an externally referenced scheme.
feature ID: 12123434
feature status: current
name:
primary display: true
name status: current
feature class: populated places
primary display: true
classification scheme:
name: ADL Feature Type Thesaurus
version:
class status: current
spatial location
planet: Earth
bounding box:
geodetic basis: WGS-84
west coordinate: -111.00278
east coordinate: -110.86778
south coordinate: 32.12278
north coordinate: 32.26883
how generated: calculated maximum and minimum extent of detailed geometry
source geometry(ies): primary geometry
geometry(ies):
primary geometry: true
geometry status: current
reference link to external geometry: false
geometry coding scheme:
name: DLESE geospatial.xsd
version: 1
encoded geometry (example only):
<geospatialCoverages>
<geospatialCoverage>
<body>
<planet>Earth</planet>
</body>
<geodeticDatumGlobalOrHorz>DLESE:WGS84</geodeticDatumGlobalOrHorz>
<projection type="DLESE:Mercator">Information about the
projection goes here.</projection>
<coordinateSystem type="DLESE:Geographic latitude and longitude">Information about the coordinate system goes here</coordinateSystem>
<detGeos>
<detGeo>
<typeDetGeo>Polygon</typeDetGeo>
<geoNumPts>5</geoNumPts>
<geoPtOrder>Clockwise</geoPtOrder>
<longLats>
<longLat longitude="-110.5" latitude="32.26883"/>
<longLat longitude="-110.86778" latitude="32.15"/>
<longLat longitude="-110.6" latitude="32.12278"/>
<longLat longitude="-111.00278" latitude="32.186"/>
<longLat longitude="-110.75" latitude="32.2"/>
</longLats>
<detSrcIDandURL URL="some URL">some source</detSrcIDandURL>
<detSrcDesc>Generalized polygon
derived from shapefile</detSrcDesc>
<detAccEst>+/- 5 mile perimeter</detAccEst>
<description>Extra information goes
here about the detailed geometry</description>
<detVert>
<geodeticDatumGlobalOrVert>DLESE:CGD28-CDN</geodeticDatumGlobalOrVert>
<vertBase>Average sea level</vertBase>
<vertMin units="feet (ft)">2410</vertMin>
<vertMax units="feet (ft)">2410</vertMax>
<vertAcc>Generalized point
elevation for
</detVert>
</detGeo>
</detGeos>
</geospatialCoverage>
</geospatialCoverages>
entry date: 2000-07-01
modification date: 2001-05-15
In this example, the core gazetteer elements of the feature’s name, classification, and spatial location are represented with some supporting information. This is all that is required by the GCS. The full gazetteer entry for this same place could include multiple placenames and details about each placename; multiple feature classes, possibly from different classification schemes; multiple spatial geometries from different sources or for different time periods; and much more. For any particular gazetteer entry, a selection of the non-required elements can be added.
Please note that some required elements can be treated as defaults; for example, planet = Earth and status = current (if the portion of historical information is minimal).
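Treating required elements as defaults can be sketched as filling in missing values when a record is read. This is a minimal sketch; the dictionary keys (`planet`, `status`) and the function name are assumptions for illustration, not GCS element names.

```python
# Default values named in the text above.
DEFAULTS = {"planet": "Earth", "status": "current"}

def with_defaults(record):
    """Return a copy of the record with defaults filled in.
    Values given explicitly in the record take precedence."""
    return {**DEFAULTS, **record}

minimal = with_defaults({"name": "Example Place"})
historical = with_defaults({"name": "Example Place", "status": "former"})
print(minimal)
print(historical)
```

A gazetteer with a large share of historical entries would more likely record the status explicitly than rely on the default.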
Also note that the encoded geometry shown above is an example (not complete) to show how an external geospatial description standard can be used to represent the encoded geometries needed for the gazetteer description.
For links to sample minimum and full XML records, click here.
For views of schema and xml files, go to views.
For views of the relational database model, go to section 9.
Required:
Optional:
Application:
The bounding box (aka minimum bounding rectangle) consists of the maximum extent of the feature’s footprint on the Earth’s surface in terms of longitude (east and west) and latitude (north and south). It is required to support basic spatial query matching operations. Separate coordinates for each side of the bounding box (e.g., west coordinate) are used so that there is no confusion when the box extends across the 180° meridian.
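The value of naming each side of the box can be seen in a point-in-box test. The sketch below assumes the common convention that a west coordinate greater than the east coordinate means the box crosses the 180° meridian; that convention, the function names, and the crossing box used in the example are assumptions for illustration.

```python
def lon_in_box(lon, west, east):
    """True if the longitude lies between the west and east sides."""
    if west <= east:
        return west <= lon <= east
    # west > east: the box is assumed to cross the 180-degree meridian,
    # so it covers [west, 180] and [-180, east].
    return lon >= west or lon <= east

def point_in_box(lon, lat, west, east, south, north):
    return south <= lat <= north and lon_in_box(lon, west, east)

# Hypothetical box crossing the 180-degree meridian: 170 E to 170 W.
print(point_in_box(175.0, 0.0, 170.0, -170.0, -10.0, 10.0))    # inside, east of 170 E
print(point_in_box(-175.0, 0.0, 170.0, -170.0, -10.0, 10.0))   # inside, west of 170 W
print(point_in_box(0.0, 0.0, 170.0, -170.0, -10.0, 10.0))      # outside
```

If only minimum and maximum longitudes were stored, the same box would appear to span almost the entire globe.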
The specific elements of description for detailed geometries are not spelled out in the GCS. Instead, the details of the geometries are to be expressed according to a public geospatial representation standard, such as the Geography Markup Language (OpenGIS), the FGDC’s Content Standard for Digital Geospatial Metadata, or ISO’s TC 211 Geography Metadata standard. For the GCS, this is an opaque description to be interpreted by the referenced geospatial coding standard.
The detailed geometry representation can be included in the gazetteer entry or it can be held external to the gazetteer database and referenced through a URL. In either case, the documentation about the format of the representation must be clear enough for correct computer interpretation.
Best practices for detailed geometries are that the following attributes be included:
Required:
Optional:
Application:
In this version of the GCS, the temporal aspects of a gazetteer entry have been designed to mirror the treatment of the spatial aspects. In both cases, there is a generalized representation (the bounding box and the time range) and detailed representations. Beyond this basic common high-level structure, the treatment of time is distinct because time applies to many aspects of a gazetteer entry and because often the beginning and ending dates are not known, only that the time in question is current or former (e.g., historical) or, to make the set complete, proposed (e.g., a shopping center).
Also, there does not seem to be an external standard for the representation of time that covers the needs of the gazetteer. Therefore, a descriptive structure for time representation has been designed for the GCS. It includes the date range as a generalized temporal footprint, the statement of uncertainty for the detailed times, and the association of named time periods.
In anticipation that there will be web-accessible schemes, like gazetteers, that define named time periods in terms of date ranges, the structure for including named time periods allows for linking to an external scheme as the source of the named time period definition.
In the GCS and its associated relational database, the time component is normalized and linked to other components. That is, the treatment of time is consistent wherever it is used in the gazetteer entry.
Best practice is to add whatever dates are known to be associated with the feature or one of its descriptive aspects, even if the dates are not precise (e.g., only expressed to the decade or the century). This information will support some degree of searching and display by date range.
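Imprecise dates can still support range searching if they are expanded to the range they imply. A minimal sketch follows, assuming years as integers; the function names are hypothetical and the GCS itself does not prescribe this conversion.

```python
def decade_range(first_year):
    """A decade given by its first year, e.g. 1850 for 'the 1850s'."""
    return (first_year, first_year + 9)

def century_range(century):
    """A century by ordinal number, e.g. 19 -> 1801..1900
    (convention assumed here: centuries start in year 1)."""
    return ((century - 1) * 100 + 1, century * 100)

def ranges_overlap(a, b):
    """True if two inclusive (begin, end) year ranges share at least one year."""
    return a[0] <= b[1] and b[0] <= a[1]

print(decade_range(1850))
print(century_range(19))
print(ranges_overlap(decade_range(1850), (1855, 1870)))
```

A feature dated only to “the 1850s” would then be found by a search for 1855 to 1870, which is the “some degree of searching” the best practice aims for.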
Where there are data sources that supplement the information included in the gazetteer, the GCS provides elements that can be used to link to these external resources. This version of the GCS provides the following linking elements (all are optional and repeatable):
A basic tenet of the GCS is that there will be one gazetteer entry for a particular named geographic location. That is, there will not be more than one entry for the same place. Therefore, information about a place that comes from different sources will be merged into a single record. It is important that the source of the different pieces of information be traceable back to a particular contributor and reference source.
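Merging contributions into a single record while keeping each piece traceable might look like the sketch below. The record layout, the function name, the second source mnemonic, and the sample values are hypothetical; only the mnemonic “USGS-GNIS-1” appears in this document.

```python
from collections import defaultdict

def merge_contributions(contributions):
    """Merge (source, field, value) triples into one entry.
    Each value keeps the source it came from."""
    entry = defaultdict(list)
    for source, field, value in contributions:
        entry[field].append({"value": value, "source": source})
    return dict(entry)

# Hypothetical contributions about the same place from two sources.
contributions = [
    ("USGS-GNIS-1", "name", "Placename A"),
    ("EXAMPLE-SRC-2", "name", "Placename B"),
    ("EXAMPLE-SRC-2", "population", 1000),
]
entry = merge_contributions(contributions)
print(entry["name"])
```

The result is one entry with two names, each still attributable to its own source, rather than two entries for the same place.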
Source identification consists of two parts:
Each ADL Gazetteer Source entry is uniquely identified with a mnemonic (e.g. “USGS-GNIS-1”) and by a system-assigned ID number. This ID number is associated with individual pieces of data in a gazetteer entry.
In the rdb model, attribution to sources has been implemented through mirror tables. The result is that attribution can be associated with each row in each column of the main tables. This is an expansion from the basic attribution included in the XML schema and provides a comprehensive solution for tracing bits of information back to the contributor and reference source. The mirror tables also include the entry date for each piece of information.
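The mirror-table approach can be illustrated with a small in-memory SQLite database. This is a sketch under stated assumptions: the table and column names (`placename`, `placename_src`, `name_source_id`, and so on) and the sample row are invented for the example and do not reproduce the ADL relational model, which is implemented in different database software.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source (
    source_id INTEGER PRIMARY KEY,
    mnemonic  TEXT UNIQUE
);
CREATE TABLE placename (
    name_id    INTEGER PRIMARY KEY,
    feature_id INTEGER,
    name       TEXT,
    status     TEXT
);
-- Mirror table: one row per placename row, with a source ID and an
-- entry date for each attributed column of the main table.
CREATE TABLE placename_src (
    name_id           INTEGER PRIMARY KEY REFERENCES placename(name_id),
    name_source_id    INTEGER REFERENCES source(source_id),
    name_entry_date   TEXT,
    status_source_id  INTEGER REFERENCES source(source_id),
    status_entry_date TEXT
);
""")
conn.execute("INSERT INTO source VALUES (1, 'USGS-GNIS-1')")
conn.execute("INSERT INTO placename VALUES (1, 12123434, 'Example Place', 'current')")
conn.execute("INSERT INTO placename_src VALUES (1, 1, '2000-07-01', 1, '2000-07-01')")

# Trace the name value back to its source and entry date.
row = conn.execute("""
    SELECT p.name, s.mnemonic, m.name_entry_date
    FROM placename p
    JOIN placename_src m ON m.name_id = p.name_id
    JOIN source s ON s.source_id = m.name_source_id
""").fetchone()
print(row)
```

Keeping attribution in a parallel table leaves the main table unchanged for ordinary queries while still allowing each value to be traced.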
Graphics showing parts of and the whole relational database model
For views of schema and xml files, go to views.
During the summer of 2003, the rdb logical model will be implemented as a DB2 database. Tables needed to support searching and report generation will be added as needed. The existing ADL Gazetteer database will be converted to the new schema and database model and the existing clients and services will be moved to access the new database.
Links to the schemas and the relational database model are elsewhere in this document.
The primary contact point for further information is Linda Hill, lhill@alexandria.ucsb.edu.
The development of the ADL Gazetteer Content Standard and the implementation of the ADL Gazetteer and its associated services have been funded primarily by grants from the National Science Foundation through its Digital Library Program. In addition, funds have been provided by NASA, ESRI, and the Digital Library for Earth System Education (DLESE).
The ADL Gazetteer Development Team includes
Jim Frew
Jordan Hastings
Havår Valeur
Linda Hill
Greg Janée
David Valentine
Pilar Montes developed the relational database model on a contracting basis with ADL.
Many have contributed to the design and contents of the GCS through their feedback to early versions. In particular, the Electronic Cultural Atlas Initiative (ECAI) at
ADL Gazetteer Development web page: http://www.alexandria.ucsb.edu/gazetteer/
ADL Gazetteer Protocol: http://www.alexandria.ucsb.edu/gazetteer/protocol/
ADL Gazetteer publications: http://www.alexandria.ucsb.edu/gazetteer/#pubs