This document describes a protocol for accessing general-purpose gazetteer services.
A gazetteer is a dictionary of geographic placenames. Gazetteers have traditionally appeared as back-of-the-book indexes in atlases; as place encyclopedias, such as the Columbia Gazetteer of the World; as thesauri, such as the Getty Thesaurus of Geographic Names; and as toponymic authority files, such as NIMA's GEOnet Names Server and the U.S. Geological Survey's Geographic Names Information System. In an atlas, a gazetteer provides an alphabetical list of the placenames that appear in the atlas, and it maps those names to page numbers and map grid locations. Place encyclopedias often include descriptive information for locations, as does the Getty Thesaurus, and sometimes include latitude/longitude coordinates as well. Toponymic authority files focus on differentiating official placenames versus variant names, and they associate names with coordinate locations primarily for disambiguation purposes. Other toponymic reference works publish scholarly information about the origins of geographic names.
A digital gazetteer builds on these traditional gazetteers. It maps geographic placenames (the names of natural features such as mountains and lakes and the names of human constructs such as cities and states) to coordinate-based geographic locations. The services it provides are largely oriented around searching: answering "Where is...?" queries given all or a portion of a geographic name ("Where is the place named 'Santa Barbara'?") and "What's there?" queries which return all places, or all places of a specified class, within a given region ("What schools are in Santa Barbara County?"). Digital gazetteers augment traditional gazetteers by providing bidirectional mappings among placenames, map locations, and classifications. And they expand on the notion of named geographic features to include virtually any category of feature that can be geolocated (e.g., weather events such as hurricanes), any type of name or label for a place (e.g., postal codes and UTM grid names), and names with only local or specialized scope (e.g., research study areas). Descriptive information and associated data (e.g., population and elevation) can also be included in digital gazetteers. The ADL gazetteer protocol builds on this generalized concept of what a gazetteer is.
This document first semi-formally defines an abstract model of a gazetteer. That model is then used as the basis for defining a set of services (i.e., a set of network-invokable functions), several report formats, and a query language.
A caveat: the gazetteer protocol described herein provides relatively low-level services. The services are intended to be simple enough that they can be implemented by all gazetteers, yet powerful enough to be useful to clients both in their own right and for combining into higher-level services. To get a sense of the level of this specification, consider the common gazetteer functionality of finding places by entering qualified placenames, as in "find 'Santa Barbara, California'". The ADL gazetteer protocol does not provide such high-level functionality, but it does provide sufficient building blocks for achieving that functionality. Specifically, the protocol supports 1) finding a place named "California" belonging to class "states"; 2) disambiguation in the case of multiple returns; and 3) finding a place named "Santa Barbara" that is contained within the place named "California".
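The three steps above can be sketched against a toy in-memory gazetteer. Everything in this sketch is hypothetical: the `Place` record, the sample entries, the rough bounding boxes, and the helper names are illustrative assumptions. A real client would issue name, class, and footprint queries through the protocol instead.

```python
from dataclasses import dataclass

# Hypothetical toy record. Boxes are (west, south, east, north) in degrees;
# the coordinates are rough illustrative values, not authoritative data.
@dataclass(frozen=True)
class Place:
    identifier: str
    name: str
    place_class: str
    box: tuple

PLACES = [
    Place("p1", "California", "states", (-124.5, 32.5, -114.1, 42.0)),
    Place("p2", "Santa Barbara", "cities", (-119.9, 34.3, -119.6, 34.5)),
    Place("p3", "Santa Barbara", "cities", (-49.0, -27.9, -48.8, -27.7)),
]

def find(name, place_class=None):
    """Step 1: find places by case-insensitive name and optional class."""
    return [p for p in PLACES
            if p.name.lower() == name.lower()
            and (place_class is None or p.place_class == place_class)]

def within(inner, outer):
    """True if box `inner` lies entirely inside box `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def find_qualified(name, qualifier, qualifier_class):
    containers = find(qualifier, qualifier_class)
    if len(containers) != 1:
        # Step 2: disambiguation is left to the caller in this sketch.
        raise LookupError(f"{len(containers)} candidates for {qualifier!r}")
    # Step 3: keep only the matches contained within the qualifying place.
    return [p for p in find(name) if within(p.box, containers[0].box)]

matches = find_qualified("Santa Barbara", "California", "states")
```

Here `matches` holds only the entry whose box falls inside the box for "California"; the other place of the same name is filtered out by the containment test.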
In this section we semi-formally define an abstract model of a gazetteer. The ADL gazetteer protocol is built on (i.e., is written against) this model.
A gazetteer is a set of gazetteer entries. There is no intrinsic structure to a gazetteer beyond simple containment of gazetteer entries, although relationships between entries may be explicitly represented by the gazetteer (see below).
A gazetteer entry describes a single, conceptual geographic place by an identifier and zero or more codes, and by several key attributes of the place: its place status, names, footprints, and classes.
There should be a one-to-one correspondence between gazetteer entries and conceptual places (i.e., two gazetteer entries should not describe the same place) but, strictly speaking, this is not required or enforced by the gazetteer protocol.
A gazetteer entry's identifier and its zero or more codes are all strings that unambiguously identify the entry or place. The identifier identifies the entry within the gazetteer; it need not be universally unique. A code identifies the place within a specified code scheme, namespace, or system. For example, the state of California is identified by FIPS 5-2 code "06".
The place status is the status of the place's existence, and may be former (the place itself no longer exists), current (the place exists), or proposed (the place does not exist, but its creation is anticipated). For example, the place status of the now-nonexistent country of Yugoslavia would be former.
A name is a complete, unqualified name for the place. For example, the name of the city of Los Angeles is "Los Angeles", not "Los Angeles, California". A gazetteer entry can have more than one name, in which case the names may denote alternative names for the place (e.g., the city "Köln" is also known as "Cologne") or varying names over time (e.g., the country "Thailand" was formerly known as "Siam").
A footprint is an approximation, expressed in latitude/longitude coordinates, of the subset of the Earth's surface occupied by the place. Note that a footprint need not be contiguous. For example, a footprint for the state of Hawaii might consist of a union of disjoint polygons, one per island. A gazetteer entry can have more than one footprint, though the semantics of this are undefined by the protocol.
A class classifies the place with respect to a set of terms. More specifically, a class is the association of the place with a term drawn from a simple vocabulary of terms or thesaurus (a vocabulary augmented with inter-term relationships). A gazetteer entry may belong to multiple classes, and even to multiple classes from the same thesaurus. Note that if a gazetteer consists of a single class of places (consider "The Knopf Gazetteer of Cemeteries of the Southwest"), its entries will not be considered to be classified for the purposes of the protocol unless each entry carries the classification for searching and reporting purposes.
Certain attribute values of a gazetteer entry (namely, each name, each footprint, and each class) are further qualified using two qualifiers. The primary qualifier, a boolean, indicates if the attribute is the preferred or official value. For example, a gazetteer entry for the city Köln may mark the name "Köln" as primary but not "Cologne". The status qualifier indicates the validity of the attribute value using the same terms (former, current, proposed) as the entry's place status attribute. The place status attribute and the status qualifier on attribute values should not be confused; the former refers to the place as a whole, the latter to just the attribute value. For example, a gazetteer entry for the country Thailand may have the place status current but qualify the name "Siam" as former.
For each gazetteer entry, the following conditions on qualifiers must hold:
Finally, a gazetteer may be augmented with inter-entry
relationships. A relationship is a named, directed, binary
association between gazetteer entries. For example, a gazetteer might
support a capital-of relationship which relates capital
cities and administrative areas: the city of Sacramento is the capital
of the state of California, and so forth. (Note that the ADL
gazetteer protocol defines the necessary structures to support
relationships in general, but it does not define any particular
relationships, just as it does not define any particular
classification scheme.)
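As an illustration only, the abstract model might be mirrored in client code roughly as follows. The class and field names are assumptions made for this sketch; the protocol itself defines only the XML report formats, not any in-memory representation.

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    FORMER = "former"
    CURRENT = "current"
    PROPOSED = "proposed"

@dataclass
class Qualified:
    """A name, footprint, or class together with its two qualifiers."""
    value: object                 # name text, geometry, or (thesaurus, term)
    primary: bool = False
    status: Status = Status.CURRENT

@dataclass
class Relationship:
    """A named, directed, binary association with a target entry."""
    relation: str
    target_identifier: str

@dataclass
class Entry:
    identifier: str
    place_status: Status
    codes: list = field(default_factory=list)          # (scheme, code) pairs
    names: list = field(default_factory=list)          # Qualified values
    footprints: list = field(default_factory=list)     # Qualified values
    classes: list = field(default_factory=list)        # Qualified values
    relationships: list = field(default_factory=list)  # Relationship values

# The place as a whole is current, while one of its names is former.
thailand = Entry(
    identifier="example-1",       # hypothetical identifier
    place_status=Status.CURRENT,
    names=[
        Qualified("Thailand", primary=True),
        Qualified("Siam", status=Status.FORMER),
    ],
)

former_names = [n.value for n in thailand.names if n.status is Status.FORMER]
```

Keeping the place status on the entry and the status qualifier on each attribute value makes the distinction drawn above explicit in the types.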
Functionally speaking, the ADL gazetteer protocol consists of the
following three independent, stateless services. Each service follows
the classical model of function invocation: zero or more arguments are
passed to the service, the service executes synchronously, and a
result and/or an error indication is returned. Support for the
get-capabilities service is mandatory; the other services
are optional. Clients should anticipate that gazetteers may apply
different access control policies to different services.
get-capabilities()
Returns a description of the overall capabilities of the gazetteer (the services and query types the gazetteer supports, the thesauri the gazetteer uses, etc.). See Capabilities below.
query(query, {"standard"|"extended"} [, geometry language])
Returns reports for the gazetteer entries selected by a query. query is a query expressed in the gazetteer query language; see Query language below. Either standard or extended reports may be returned; see Reports below. The geometry language used in the reports may optionally be requested. The geometry language(s) and the subset of the query language that the gazetteer supports are described in the gazetteer's capabilities; see Capabilities below. Clients should anticipate that a gazetteer may return an error indication in response to a nominally supported query due to implementation limitations. Also, a gazetteer may return both reports and an error indication, as when an internal result limit is reached during otherwise successful query processing.
download({"standard"|"extended"} [, geometry language])
Similar to the query service, the download service returns standard or extended reports for every entry in the gazetteer.
An XML-over-HTTP implementation of the services is described next. In this formulation, a gazetteer service is invoked by submitting an HTTP POST request to a URL representing the gazetteer's common access point for all services. The format and discovery of this URL are outside the scope of the protocol.
Both service requests and service responses must have MIME content
type text/xml and consist of a single
<gazetteer-service> element in namespace
"http://www.alexandria.ucsb.edu/gazetteer". The
version attribute of this element indicates the version
of the gazetteer protocol used by the client (in requests) or the
gazetteer implementation (in responses).
In a service request, the <gazetteer-service>
element must contain a single subelement expressing the request.
Subelement <S-request>
corresponds to service S above, e.g., subelement
<get-capabilities-request> corresponds to the
get-capabilities service. Arguments to the request, if
any, are encoded as subelements of the request subelement.
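A minimal sketch of building such a request with Python's standard library follows. The version value "1.2", the access-point URL, and the helper names are placeholders assumed for illustration, as is the choice to put the request subelement in the same namespace as the envelope. The HTTP request object is constructed but not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

NS = "http://www.alexandria.ucsb.edu/gazetteer"
ET.register_namespace("", NS)  # serialize NS as the default namespace

def build_request(service, version):
    """Wrap an argument-less <S-request> in a <gazetteer-service> envelope."""
    root = ET.Element(f"{{{NS}}}gazetteer-service", {"version": version})
    ET.SubElement(root, f"{{{NS}}}{service}-request")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)

def build_http_post(url, body):
    """Prepare (but do not send) the HTTP POST carrying the request."""
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "text/xml"},
    )

# "1.2" and the URL below are hypothetical values, not taken from the protocol.
body = build_request("get-capabilities", "1.2")
request = build_http_post("http://gazetteer.example.org/service", body)
```

Passing `request` to `urllib.request.urlopen` would submit it to a real access point.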
In a service response, the <gazetteer-service>
element must contain a single subelement containing the response.
Similar to requests, subelement
<S-response> corresponds to
service S. Each response subelement contains optional,
service-specific, "successful" content (e.g., reports in the case of
the query service) and an optional
<error> subelement that describes a service
processing error by an implementation-specific code and/or text
description. An implementation may return both successful content
and an error, such as when a query is successfully processed
and results are successfully returned, but the number of results
returned is limited due to an implementation constraint.
Gazetteer implementations should generally return HTTP status code 200 (OK), and should use HTTP error codes only for low-level errors such as syntactically malformed requests and authentication problems. Higher-level errors should be returned using the mechanism described above.
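Because a response may carry successful content, an error, or both, a client should check for the two independently. The sketch below shows one way to do that. The sample document, its version value, the nesting of the report directly under the response subelement, and the plain-text error body are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = "http://www.alexandria.ucsb.edu/gazetteer"

def parse_response(xml_bytes):
    """Return (response element name, content elements, error element or None)."""
    root = ET.fromstring(xml_bytes)
    if root.tag != f"{{{NS}}}gazetteer-service" or len(root) != 1:
        raise ValueError("not a gazetteer-service response")
    response = root[0]
    error = response.find(f"{{{NS}}}error")
    # Everything other than <error> is treated as successful content.
    content = [child for child in response if child is not error]
    local_name = response.tag.rsplit("}", 1)[-1]
    return local_name, content, error

# Hypothetical response: one report plus an error noting a truncated result.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<gazetteer-service xmlns="http://www.alexandria.ucsb.edu/gazetteer" version="1.2">
  <query-response>
    <gazetteer-standard-report/>
    <error>result limit reached</error>
  </query-response>
</gazetteer-service>"""

service, content, error = parse_response(SAMPLE)
```

A client would typically process `content` first and then surface `error`, if present, as a warning that the results may be incomplete.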
| gazetteer-service.xsd |
|---|
<?xml version="1.0" encoding="UTF-8"?> |
An example of a service request is shown below. The request asks a gazetteer for standard reports for all populated places whose names contain the phrase "las vegas".
<?xml version="1.0" encoding="UTF-8"?> |
A possible successful response to the above request is shown below.
The response contains a single standard report for a place named "Las
Vegas", formerly known as "Sin City". The successfulness of the
response is indicated by the lack of an <error>
subelement.
<?xml version="1.0" encoding="UTF-8"?> |
Finally, here's a possible error response to the above request:
<?xml version="1.0" encoding="UTF-8"?> |
The ADL gazetteer protocol is defined in terms of the relatively simple abstract model given in Gazetteer model above. In practice, however, gazetteer implementations will typically be able to represent more elaborate information about geographic places and model more complex relationships between and among gazetteer entries and attributes. To allow clients to take advantage of such information in a structured manner, the gazetteer protocol defines two transfer formats for gazetteer entries: the standard report and the extended report.
The extended report of a gazetteer entry is a gazetteer-specific format; its actual structure is undefined by the gazetteer protocol. The intention is that all of the information a gazetteer possesses about an entry be representable by the format. If a gazetteer supports extended reports, the report format must be defined by an XML schema; see Capabilities below.
The standard report of a gazetteer entry corresponds to
the abstract gazetteer model. An XML schema for the report format is
listed below. The schema defines element
<gazetteer-standard-report> in namespace
"http://www.alexandria.ucsb.edu/gazetteer". Subelements
<identifier>, <codes>,
<place-status>, <names>,
<footprints>, <classes>, and
<relationships> and element attributes
primary and status correspond directly to
the model.
For the convenience of gazetteer clients, the standard report
includes two additional required elements and one additional required
attribute. Element <display-name> is the entry's
primary name as it is commonly displayed, typically including
qualifications. For example, the display name for the city of Las
Vegas might be "Las Vegas, Clark County, Nevada". Element
<bounding-box> is the bounding box (i.e., the
smallest enclosing graticule-aligned rectangle) of the entry's primary
footprint. And in the <relationship> element, the
target-name attribute is the target gazetteer entry's
primary name. In a slight extension to the abstract gazetteer model,
the <relationship> element's
target-identifier attribute may be omitted, thereby
allowing a gazetteer entry to have a relationship to a place not
represented in the gazetteer.
Each footprint in a standard report may be described either
directly using a <footprint> element or indirectly
using a <footprint-reference> element. In the
direct case the footprint is defined as a single subelement (the
"footprint-defining element") of the <footprint>
element. In the indirect case, the footprint-defining element is
indirectly referred to by a URL, and the optional
geometry-type and num-points attributes can
be used to give clients an indication of the size and type of the
footprint. Attribute geometry-type, if present, must be
the unqualified XML name of the footprint-defining element and
num-points must be the number of points in the
geometry.
In both of the above cases, the possible footprint-defining
elements may be drawn from the Open GIS Consortium's Geography Markup
Language (GML), version 2, or from another geometry language
supported by the gazetteer; see Capabilities, below. Support for GML is
mandatory. GML's footprint-defining elements
(<gml:Box> and elements in class
gml:_Geometry) are defined in terms of an abstract
Cartesian coordinate system, but we mandate here that the coordinate
system must be the WGS84 latitude/longitude coordinate system.
Specifically, the first (X) coordinate must be longitude in signed
decimal degrees east of the Greenwich meridian and the second (Y)
coordinate must be latitude in signed decimal degrees north of the
equator. Longitudes must be in the range [-180,180] except in a
<gml:Box> element, where exactly one of the
longitudinal coordinates may be outside this range to indicate that
the box crosses the ±180 meridian.
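The longitude rule for boxes can be sketched as follows. The function names are hypothetical, and two choices are assumptions of this sketch rather than requirements of the protocol: shifting the eastern edge by 360 degrees, and requiring the first longitude not to exceed the second.

```python
def box_longitudes(west, east):
    """Longitude pair for a box spanning eastward from `west` to `east`.

    Both inputs must lie in [-180, 180]. If the box crosses the +/-180
    meridian (west > east), the eastern edge is shifted by 360 degrees so
    that exactly one coordinate falls outside the normal range.
    """
    for lon in (west, east):
        if not -180.0 <= lon <= 180.0:
            raise ValueError(f"longitude out of range: {lon}")
    if west > east:
        east += 360.0
    return west, east

def valid_box_longitudes(x_min, x_max):
    """Check that at most one longitude of a box is outside [-180, 180]."""
    out_of_range = sum(1 for x in (x_min, x_max) if not -180.0 <= x <= 180.0)
    return x_min <= x_max and out_of_range <= 1

crossing = box_longitudes(170.0, -170.0)   # a box straddling the meridian
ordinary = box_longitudes(-120.0, -119.0)  # a box that does not cross it
```

Shifting the western edge instead (giving -190 and -170) would satisfy the rule equally well; the text above does not say which representation a gazetteer prefers.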
| gazetteer-standard-report.xsd |
|---|
<?xml version="1.0" encoding="UTF-8"?> |
Here's an example of a standard report with an indirect footprint:
<?xml version="1.0" encoding="UTF-8"?> |
The footprint corresponding to the above example might look something like this:
<?xml version="1.0" encoding="UTF-8"?> |
The query service, described under Services above, returns all gazetteer entries
that satisfy one or more constraints placed against entry attributes.
The constraints are expressed in the form of a language.
The gazetteer query language consists of boolean combinations (and, or, and and-not) of seven types of queries. Support for any given type of query is optional. The query types are as follows:
identifier-query identifier
Returns the gazetteer entry identified by identifier.
code-query [scheme] code
Returns the gazetteer entry identified by the given code. If scheme is given, it indicates the code's scheme, and matching occurs only against like codes; otherwise, matching occurs against all codes. A code query in which the scheme is unsupported or unrecognized by the gazetteer must not be treated as erroneous, but should simply yield zero results.
place-status-query status
Returns all gazetteer entries whose place status matches status, which must be former, current, or proposed.
name-query operator text
Returns all gazetteer entries having at least one name that matches text according to text-matching operator operator. If a gazetteer supports name queries, it must support the equals operator. Other text-matching operators gazetteers are encouraged to support include contains-all-words, contains-any-words, contains-phrase, and matches-pattern. For matches-pattern, an asterisk ("*") in text matches zero or more characters and a question mark ("?") matches any single character. Note that a gazetteer implementation may limit the regular expressions it accepts. For example, a gazetteer may support right truncation only (i.e., it may accept asterisks only at the end of text).
The semantics of all of the above operators have deliberately been left somewhat fuzzy to accommodate differing implementations. Specifically, exactly what constitutes a word is left undefined, and it is unspecified whether the gazetteer implementation employs word stemming or other fuzzy word matching techniques. In any case, the above operators should be case-insensitive.
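Since the operator semantics are deliberately loose, the following is only one possible interpretation, not the protocol's definition. It assumes that a word is a run of alphanumeric characters, that no stemming is applied, and that a pattern must match the whole name; the function names are hypothetical.

```python
import re

def _words(s):
    """Split into lowercase words; 'word' here means a run of \\w characters."""
    return re.findall(r"\w+", s.lower())

def _pattern_to_regex(pattern):
    """Translate '*' and '?' wildcards into an equivalent regular expression."""
    parts = []
    for ch in pattern.lower():
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def name_matches(operator, text, name):
    """Case-insensitive test of `name` against `text` under `operator`."""
    if operator == "equals":
        return name.lower() == text.lower()
    if operator == "contains-all-words":
        return set(_words(text)) <= set(_words(name))
    if operator == "contains-any-words":
        return bool(set(_words(text)) & set(_words(name)))
    if operator == "contains-phrase":
        t, n = _words(text), _words(name)
        return any(n[i:i + len(t)] == t for i in range(len(n) - len(t) + 1))
    if operator == "matches-pattern":
        return _pattern_to_regex(text).fullmatch(name.lower()) is not None
    raise ValueError(f"unsupported operator: {operator}")

phrase_hit = name_matches("contains-phrase", "las vegas", "North Las Vegas")
pattern_hit = name_matches("matches-pattern", "las v*", "Las Vegas")
```

An implementation that supports right truncation only could reject any pattern containing "?" or an asterisk elsewhere than at the end before calling the matcher.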
footprint-query operator {polygon|box|identifier}
Returns all gazetteer entries having a footprint that matches a query region according to spatial operator operator. (If a gazetteer entry has multiple footprints, it is unspecified by the protocol which footprint(s) are used for matching.) The query region may take any of the three forms named above (a polygon, a box, or an identifier); note that support for any given form is optional.
If a gazetteer supports footprint queries, it must support the within operator. Other spatial operators gazetteers are encouraged to support include contains and overlaps. A gazetteer implementation may limit the query regions it accepts. For example, an implementation may disallow polygons that enclose a pole. Also, an implementation may support matching on footprint bounding boxes only.
class-query thesaurus term
Returns all gazetteer entries belonging to class term, or any subclass of term recursively (if the gazetteer supports subclasses or thesaurus relationships), where term is a term drawn from a thesaurus or simple vocabulary associated with the gazetteer. For example, if class "capital cities" is a subclass (i.e., a specialization) of class "cities", then a class query of "cities" will return all cities (capital and not) whereas a query of "capital cities" will return only capital cities.
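The recursive subclass behavior can be sketched with a toy thesaurus. The `NARROWER` mapping, the sample entries, and the function names are assumptions made for illustration.

```python
# Hypothetical thesaurus fragment: each term maps to its direct subclasses.
NARROWER = {"cities": ["capital cities"]}

# Hypothetical entries as (name, class term) pairs.
ENTRIES = [
    ("Sacramento", "capital cities"),
    ("Los Angeles", "cities"),
    ("Lake Tahoe", "lakes"),
]

def expand(term, narrower):
    """Return `term` together with all of its subclasses, recursively."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(narrower.get(current, []))
    return seen

def class_query(term, entries=ENTRIES, narrower=NARROWER):
    """Names of entries classified under `term` or any of its subclasses."""
    terms = expand(term, narrower)
    return [name for name, cls in entries if cls in terms]

all_cities = class_query("cities")
capitals = class_query("capital cities")
```

Querying "cities" picks up Sacramento through the subclass link, while querying "capital cities" does not widen to the parent term.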
relationship-query relation target-identifier
Returns all gazetteer entries having relationship relation
to a target gazetteer entry identified by target-identifier.
Note that a gazetteer must not consider a relationship query with an
inappropriate target to be malformed or erroneous. For example,
suppose a gazetteer supports the capital-of relationship,
but only for target gazetteer entries that are countries. A
relationship query in which the target is a cemetery is not to be
considered malformed, but should simply yield zero results.
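A small sketch of this behavior follows, using a hypothetical in-memory list of relationships; the identifiers and the function name are illustrative. An inappropriate target simply matches nothing, and no error is raised.

```python
# Hypothetical relationships as (relation, source identifier, target identifier).
RELATIONSHIPS = [
    ("capital-of", "sacramento", "california"),
]

def relationship_query(relation, target_identifier, relationships=RELATIONSHIPS):
    """Identifiers of entries that have `relation` to the given target."""
    return [source for rel, source, target in relationships
            if rel == relation and target == target_identifier]

capitals_of_california = relationship_query("capital-of", "california")
capitals_of_a_cemetery = relationship_query("capital-of", "some-cemetery")
```

The second call returns an empty list because no relationship names that target, which is the outcome the text above requires.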
Clients should be aware that a gazetteer implementation may not be able to search over all attribute values of a gazetteer entry. For example, an implementation may be able to search over primary names only.
An XML schema for the gazetteer query language is listed below.
The schema defines element <gazetteer-query> in
namespace "http://www.alexandria.ucsb.edu/gazetteer".
Subelements <identifier-query>,
<code-query>,
<place-status-query>,
<name-query>, <footprint-query>,
<class-query>, and
<relationship-query> correspond to the query types
described above. The elements <and>,
<or>, and <and-not> support
boolean combinations of queries.
Query regions in footprint queries may be specified using the Open GIS Consortium's Geography Markup
Language (GML), version 2, or another geometry language supported
by the gazetteer; see Capabilities, below.
Support for GML is mandatory. GML defines the
<gml:Box> and <gml:Polygon>
elements in terms of an abstract Cartesian coordinate system, but we
mandate here that the coordinate system must be the WGS84
latitude/longitude coordinate system. Specifically, the first (X)
coordinate must be longitude in signed decimal degrees east of the
Greenwich meridian and the second (Y) coordinate must be latitude in
signed decimal degrees north of the equator. Longitudes must be in
the range [-180,180] except in a <gml:Box> element,
where exactly one of the longitudinal coordinates may be outside this
range to indicate that the box crosses the ±180 meridian.
| gazetteer-query.xsd |
|---|
<?xml version="1.0" encoding="UTF-8"?> |
An example of a gazetteer query is shown below. This example requests all currently existing places whose names contain the phrase "santa barbara" and that overlap a given spatial region, and that are neither populated places nor cemeteries. A place named "Santa Barbara County Hospital" might match such a query.
<?xml version="1.0" encoding="UTF-8"?> |
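The boolean structure of the query described above can be sketched as set operations over entry identifiers. The result sets for the individual queries are made up for illustration, the helper names are hypothetical, and treating and-not as "results of the first operand minus results of the second" is an assumption of this sketch.

```python
def q_and(*operands):
    """Entries matched by every operand."""
    return set.intersection(*operands)

def q_or(*operands):
    """Entries matched by at least one operand."""
    return set.union(*operands)

def q_and_not(first, second):
    """Entries matched by `first` but not by `second` (assumed semantics)."""
    return first - second

# Hypothetical results of the individual queries, as sets of identifiers.
currently_existing = {"e1", "e2", "e3", "e4"}  # place-status-query
named_santa_barbara = {"e1", "e2", "e3"}       # name-query
overlapping_region = {"e1", "e2", "e5"}        # footprint-query
populated_places = {"e2"}                      # class-query
cemeteries = {"e6"}                            # class-query

result = q_and_not(
    q_and(currently_existing, named_santa_barbara, overlapping_region),
    q_or(populated_places, cemeteries),
)
```

Only "e1" survives: it satisfies all three positive constraints and belongs to neither excluded class.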
The get-capabilities service described under Services above returns a description of a
gazetteer's overall capabilities. An XML schema for the description
is listed below. The schema defines element
<gazetteer-capabilities> in namespace
"http://www.alexandria.ucsb.edu/gazetteer". Within this
element are the following subelements:
<version>, <name>, <description>, <ADL-collection-metadata>, <extended-report-schema>, <code-schemes>, <thesauri>, <relationships>, <other-geometry-languages>, <services>, <maximum-query-results>, <query-types>, <name-query-operators>, <footprint-query-operators>, and <footprint-query-operands>.
| gazetteer-capabilities.xsd |
|---|
<?xml version="1.0" encoding="UTF-8"?> |
Here's an example of a gazetteer capabilities description:
<?xml version="1.0" encoding="UTF-8"?> |
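The gazetteer protocol is formally defined by a main XML schema, gazetteer-service.xsd, and four subschemas: gazetteer-standard-report.xsd, gazetteer-query.xsd, gazetteer-capabilities.xsd, and gazetteer-types.xsd, the last of which is displayed below.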
Applications that use this protocol can and should reference only
the main schema, as it implicitly includes the others. The canonical
URL prefix at which all protocol schemas reside is
"http://www.alexandria.ucsb.edu/gazetteer/protocol/".
| gazetteer-types.xsd |
|---|
<?xml version="1.0" encoding="UTF-8"?> |
add-entry, relate-entries, and remove-entry). Changed the success indicator from a nil <error> subelement to the absence of an <error> subelement.
<codes>, <place-status>, and <display-name> subelements. Replaced the historical attribute with the tri-valued status attribute. Renamed the <relationship> subelement's name and identifier attributes to relation and target-identifier, respectively, and added the target-name attribute. Relaxed the requirement that the target of a relationship must be another gazetteer entry; it may now be just a name. Added a note on the interpretation of out-of-range longitudinal coordinates.
<code-query> and <place-status-query>. Renamed the <relationship-query> attributes to relation and target-identifier. Added a note on the interpretation of out-of-range longitudinal coordinates.
<name>, <ADL-collection-metadata>, <code-schemes>, and <maximum-query-results>. Added a place-status attribute to the <query-types> subelement. Removed the add-entry, relate-entries, and remove-entry attributes from the <services> subelement.
<description> subelement to the <gazetteer-capabilities> element.
created 2003-09-19; last modified 2009-11-19 21:17