The ADL Gazetteer Protocol

Greg Janée, Linda L. Hill
Alexandria Digital Library Project

Version 1.2


Introduction

This document describes a protocol for accessing general-purpose gazetteer services.

A gazetteer is a dictionary of geographic placenames. Gazetteers have traditionally appeared as back-of-the-book indexes in atlases; as place encyclopedias, such as the Columbia Gazetteer of the World; as thesauri, such as the Getty Thesaurus of Geographic Names; and as toponymic authority files, such as NIMA's GEOnet Names Server and the U.S. Geological Survey's Geographic Names Information System. In an atlas, a gazetteer provides an alphabetical list of the placenames that appear in the atlas, and it maps those names to page numbers and map grid locations. Place encyclopedias often include descriptive information for locations, as does the Getty Thesaurus, and sometimes include latitude/longitude coordinates as well. Toponymic authority files focus on distinguishing official placenames from variant names, and they associate names with coordinate locations primarily for disambiguation purposes. Other toponymic reference works publish scholarly information about the origins of geographic names.

A digital gazetteer builds on these traditional gazetteers. It maps geographic placenames (the names of natural features such as mountains and lakes and the names of human constructs such as cities and states) to coordinate-based geographic locations. The services it provides are largely oriented around searching: answering "Where is...?" queries given all or a portion of a geographic name ("Where is the place named 'Santa Barbara'?") and "What's there?" queries which return all places, or all places of a specified class, within a given region ("What schools are in Santa Barbara County?"). Digital gazetteers augment traditional gazetteers by providing bidirectional mappings among placenames, map locations, and classifications. And they expand on the notion of named geographic features to include virtually any category of feature that can be geolocated (e.g., weather events such as hurricanes), any type of name or label for a place (e.g., postal codes and UTM grid names), and names with only local or specialized scope (e.g., research study areas). Descriptive information and associated data (e.g., population and elevation) can also be included in digital gazetteers. The ADL gazetteer protocol builds on this generalized concept of what a gazetteer is.

This document first semi-formally defines an abstract model of a gazetteer. That model is then used as the basis for defining a set of services (i.e., a set of network-invokable functions), several report formats, and a query language.

A caveat: the gazetteer protocol described herein provides relatively low-level services. The services are intended to be simple enough that they can be implemented by all gazetteers, yet powerful enough to be useful to clients both in their own right and for combining into higher-level services. To get a sense of the level of this specification, consider the common gazetteer functionality of finding places by entering qualified placenames, as in "find 'Santa Barbara, California'". The ADL gazetteer protocol does not provide such high-level functionality, but it does provide sufficient building blocks for achieving that functionality. Specifically, the protocol supports 1) finding a place named "California" belonging to class "states"; 2) disambiguating among multiple results; and 3) finding a place named "Santa Barbara" that is contained within the place named "California".

Gazetteer model

In this section we semi-formally define an abstract model of a gazetteer. The ADL gazetteer protocol is built on (i.e., is written against) this model.

A gazetteer is a set of gazetteer entries. There is no intrinsic structure to a gazetteer beyond simple containment of gazetteer entries, although relationships between entries may be explicitly represented by the gazetteer (see below).

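A gazetteer entry describes a single, conceptual geographic place by an identifier and zero or more codes, and by several key attributes of the place: a place status, one or more names, one or more footprints, and zero or more classes.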

There should be a one-to-one correspondence between gazetteer entries and conceptual places (i.e., two gazetteer entries should not describe the same place) but, strictly speaking, this is not required or enforced by the gazetteer protocol.

A gazetteer entry's identifier and its zero or more codes are all strings that unambiguously identify the entry or place. The identifier identifies the entry within the gazetteer; it need not be universally unique. A code identifies the place within a specified code scheme, namespace, or system. For example, the state of California is identified by FIPS 5-2 code "06".

The place status is the status of the place's existence, and may be former (the place itself no longer exists), current (the place exists), or proposed (the place does not exist, but its creation is anticipated). For example, the place status of the now-nonexistent country of Yugoslavia would be former.

A name is a complete, unqualified name for the place. For example, the name of the city of Los Angeles is "Los Angeles", not "Los Angeles, California". A gazetteer entry can have more than one name, in which case the names may denote alternative names for the place (e.g., the city "Köln" is also known as "Cologne") or varying names over time (e.g., the country "Thailand" was formerly known as "Siam").

A footprint is an approximation, expressed in latitude/longitude coordinates, of the subset of the Earth's surface occupied by the place. Note that a footprint need not be contiguous. For example, a footprint for the state of Hawaii might consist of a union of disjoint polygons, one per island. A gazetteer entry can have more than one footprint, though the semantics of this are undefined by the protocol.

A class classifies the place with respect to a set of terms. More specifically, a class is the association of the place with a term drawn from a simple vocabulary of terms or thesaurus (a vocabulary augmented with inter-term relationships). A gazetteer entry may belong to multiple classes, and even to multiple classes from the same thesaurus. Note that if a gazetteer consists of a single class of places (consider "The Knopf Gazetteer of Cemeteries of the Southwest"), its entries will not be considered to be classified for the purposes of the protocol unless each entry carries the classification for searching and reporting purposes.

Certain attribute values of a gazetteer entry (namely, each name, each footprint, and each class) are further qualified using two qualifiers. The primary qualifier, a boolean, indicates if the attribute is the preferred or official value. For example, a gazetteer entry for the city Köln may mark the name "Köln" as primary but not "Cologne". The status qualifier indicates the validity of the attribute value using the same terms (former, current, proposed) as the entry's place status attribute. The place status attribute and the status qualifier on attribute values should not be confused; the former refers to the place as a whole, the latter to just the attribute value. For example, a gazetteer entry for the country Thailand may have the place status current but qualify the name "Siam" as former.

For each gazetteer entry, the following conditions on qualifiers must hold:

Finally, a gazetteer may be augmented with inter-entry relationships. A relationship is a named, directed, binary association between gazetteer entries. For example, a gazetteer might support a capital-of relationship which relates capital cities and administrative areas: the city of Sacramento is the capital of the state of California, and so forth. (Note that the ADL gazetteer protocol defines the necessary structures to support relationships in general, but it does not define any particular relationships, just as it does not define any particular classification scheme.)
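The abstract model can be summarized as a small data structure. The following Python sketch is one possible in-memory representation and is not part of the protocol; the class and field names (Qualified, Entry, and so on) and the identifiers "th", "sac", and "ca" are hypothetical. It shows how the primary and status qualifiers attach to individual attribute values while the place status belongs to the entry as a whole.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Terms shared by the place-status attribute and the status qualifier.
STATUSES = ("former", "current", "proposed")


@dataclass
class Qualified:
    """A name, footprint, or class value plus its primary and status qualifiers."""
    value: str
    primary: bool = False
    status: str = "current"

    def __post_init__(self) -> None:
        if self.status not in STATUSES:
            raise ValueError(f"bad status qualifier: {self.status}")


@dataclass
class Entry:
    identifier: str          # unique within this gazetteer only
    place_status: str        # status of the place itself, not of any one value
    names: List[Qualified]
    codes: List[Tuple[str, str]] = field(default_factory=list)          # (scheme, code)
    footprints: List[Qualified] = field(default_factory=list)           # geometry kept as text here
    classes: List[Tuple[str, Qualified]] = field(default_factory=list)  # (thesaurus, term)
    relationships: List[Tuple[str, str]] = field(default_factory=list)  # (relation, target identifier)

    def __post_init__(self) -> None:
        if self.place_status not in STATUSES:
            raise ValueError(f"bad place status: {self.place_status}")
        if not self.names:
            raise ValueError("an entry needs at least one name")

    def primary_name(self) -> Optional[str]:
        return next((n.value for n in self.names if n.primary), None)


# The country is current, but its name "Siam" is qualified as former.
thailand = Entry("th", "current",
                 [Qualified("Thailand", primary=True), Qualified("Siam", status="former")])

# A directed, named relationship to another entry.
sacramento = Entry("sac", "current", [Qualified("Sacramento", primary=True)],
                   relationships=[("capital-of", "ca")])
```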

Services

Functionally speaking, the ADL gazetteer protocol consists of the following three independent, stateless services. Each service follows the classical model of function invocation: zero or more arguments are passed to the service, the service executes synchronously, and a result and/or an error indication is returned. Support for the get-capabilities service is mandatory; the other services are optional. Clients should anticipate that gazetteers may apply different access control policies to different services.

capabilities description <- get-capabilities()

Returns a description of the overall capabilities of the gazetteer (the services and query types the gazetteer supports, the thesauri the gazetteer uses, etc.). See Capabilities below.

reports <- query(query, {"standard"|"extended"} [, geometry language])

Returns reports for the gazetteer entries selected by a query. query is a query expressed in the gazetteer query language; see Query language below. Either standard or extended reports may be returned; see Reports below. The geometry language used in the reports may optionally be requested. The geometry language(s) and the subset of the query language that the gazetteer supports are described in the gazetteer's capabilities; see Capabilities below. Clients should anticipate that a gazetteer may return an error indication in response to a nominally supported query due to implementation limitations. Also, a gazetteer may return both reports and an error indication, as when an internal result limit is reached during otherwise successful query processing.

reports <- download({"standard"|"extended"} [, geometry language])

Similar to the query service, the download service returns standard or extended reports for every entry in the gazetteer.

An XML-over-HTTP implementation of the services is described next. In this formulation, a gazetteer service is invoked by submitting an HTTP POST request to a URL representing the gazetteer's common access point for all services. The format and discovery of this URL are outside the scope of the protocol.

Both service requests and service responses must have MIME content type text/xml and consist of a single <gazetteer-service> element in namespace "http://www.alexandria.ucsb.edu/gazetteer". The version attribute of this element indicates the version of the gazetteer protocol used by the client (in requests) or the gazetteer implementation (in responses).

In a service request, the <gazetteer-service> element must contain a single subelement expressing the request. Subelement <S-request> corresponds to service S above, e.g., subelement <get-capabilities-request> corresponds to the get-capabilities service. Arguments to the request, if any, are encoded as subelements of the request subelement.

In a service response, the <gazetteer-service> element must contain a single subelement containing the response. Similar to requests, subelement <S-response> corresponds to service S. Each response subelement contains optional, service-specific, "successful" content (e.g., reports in the case of the query service) and an optional <error> subelement that describes a service processing error by an implementation-specific code and/or text description. An implementation may return both successful content and an error, such as when a query is successfully processed and results are successfully returned, but the number of results returned is limited due to an implementation constraint.

Gazetteer implementations should generally return HTTP status code 200 (OK), and should use HTTP error codes only for low-level errors such as syntactically malformed requests and authentication problems. Higher-level errors should be returned using the mechanism described above.

gazetteer-service.xsd
<?xml version="1.0" encoding="UTF-8"?>

<schema xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:gaz="http://www.alexandria.ucsb.edu/gazetteer"
  targetNamespace="http://www.alexandria.ucsb.edu/gazetteer"
  elementFormDefault="qualified">

<include schemaLocation="gazetteer-capabilities.xsd"/>
<include schemaLocation="gazetteer-query.xsd"/>
<include schemaLocation="gazetteer-standard-report.xsd"/>
<include schemaLocation="gazetteer-types.xsd"/>

<element name="gazetteer-service">
  <complexType>
    <choice>
      <element ref="gaz:get-capabilities-request"/>
      <element ref="gaz:get-capabilities-response"/>
      <element ref="gaz:query-request"/>
      <element ref="gaz:query-response"/>
      <element ref="gaz:download-request"/>
      <element ref="gaz:download-response"/>
    </choice>
    <attribute name="version" type="string" use="required"/>
  </complexType>
</element>

<element name="get-capabilities-request">
  <complexType/>
</element>

<element name="get-capabilities-response">
  <complexType>
    <sequence>
      <element ref="gaz:gazetteer-capabilities"
        minOccurs="0"/>
      <element ref="gaz:error" minOccurs="0"/>
    </sequence>
  </complexType>
</element>

<element name="query-request">
  <complexType>
    <sequence>
      <element ref="gaz:gazetteer-query"/>
      <element name="report-format"
        type="gaz:report-format-type"/>
      <element name="geometry-language" type="anyURI"
        minOccurs="0"/>
    </sequence>
  </complexType>
</element>

<element name="query-response">
  <complexType>
    <sequence>
      <choice minOccurs="0">
        <element name="standard-reports">
          <complexType>
            <sequence>
              <element ref="gaz:gazetteer-standard-report"
                minOccurs="0" maxOccurs="unbounded"/>
            </sequence>
          </complexType>
        </element>
        <element name="extended-reports">
          <complexType>
            <sequence>
              <any processContents="lax" minOccurs="0"
                maxOccurs="unbounded"/>
            </sequence>
          </complexType>
        </element>
      </choice>
      <element ref="gaz:error" minOccurs="0"/>
    </sequence>
  </complexType>
</element>

<element name="download-request">
  <complexType>
    <sequence>
      <element name="report-format"
        type="gaz:report-format-type"/>
      <element name="geometry-language" type="anyURI"
        minOccurs="0"/>
    </sequence>
  </complexType>
</element>

<element name="download-response">
  <complexType>
    <sequence>
      <choice minOccurs="0">
        <element name="standard-reports">
          <complexType>
            <sequence>
              <element ref="gaz:gazetteer-standard-report"
                minOccurs="0" maxOccurs="unbounded"/>
            </sequence>
          </complexType>
        </element>
        <element name="extended-reports">
          <complexType>
            <sequence>
              <any processContents="lax" minOccurs="0"
                maxOccurs="unbounded"/>
            </sequence>
          </complexType>
        </element>
      </choice>
      <element ref="gaz:error" minOccurs="0"/>
    </sequence>
  </complexType>
</element>

<element name="error">
  <complexType>
    <sequence>
      <element name="code" type="string" minOccurs="0"/>
      <element name="description" type="string"
        minOccurs="0"/>
    </sequence>
  </complexType>
</element>

</schema>

An example of a service request is shown below. The request asks a gazetteer for standard reports for all populated places whose names contain the phrase "las vegas".

<?xml version="1.0" encoding="UTF-8"?>

<gazetteer-service
  xmlns="http://www.alexandria.ucsb.edu/gazetteer"
  version="1.2">

<query-request>
  <gazetteer-query>
    <and>
      <name-query operator="contains-phrase"
        text="las vegas"/>
      <class-query thesaurus="ADL Feature Type Thesaurus"
        term="populated places"/>
    </and>
  </gazetteer-query>
  <report-format>standard</report-format>
</query-request>

</gazetteer-service>
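A client could produce the request above programmatically. The following is a minimal Python sketch using the standard library's ElementTree and urllib; the helper names (gaz, las_vegas_query_request, build_post) and the access-point URL are hypothetical, since the protocol leaves URL format and discovery open. The request is prepared but not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional

GAZ_NS = "http://www.alexandria.ucsb.edu/gazetteer"
ET.register_namespace("", GAZ_NS)  # serialize gazetteer elements without a prefix


def gaz(tag: str, parent: Optional[ET.Element] = None, **attrs: str) -> ET.Element:
    """Create an element in the gazetteer namespace, under `parent` if given."""
    qname = f"{{{GAZ_NS}}}{tag}"
    if parent is None:
        return ET.Element(qname, attrs)
    return ET.SubElement(parent, qname, attrs)


def las_vegas_query_request(version: str = "1.2") -> bytes:
    """Build the example: populated places whose names contain "las vegas"."""
    root = gaz("gazetteer-service", version=version)
    request = gaz("query-request", root)
    both = gaz("and", gaz("gazetteer-query", request))
    gaz("name-query", both, operator="contains-phrase", text="las vegas")
    gaz("class-query", both, thesaurus="ADL Feature Type Thesaurus",
        term="populated places")
    gaz("report-format", request).text = "standard"
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_post(url: str, body: bytes) -> urllib.request.Request:
    """Prepare, without sending, the HTTP POST that invokes a service.
    Passing the result to urllib.request.urlopen() would send it."""
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "text/xml"})


# Hypothetical access point.
req = build_post("http://gazetteer.example.org/adl", las_vegas_query_request())
```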

A possible successful response to the above request is shown below. The response contains a single standard report for a place named "Las Vegas", formerly known as "Sin City". The success of the response is indicated by the absence of an <error> subelement.

<?xml version="1.0" encoding="UTF-8"?>

<gazetteer-service
  xmlns="http://www.alexandria.ucsb.edu/gazetteer"
  xmlns:gml="http://www.opengis.net/gml"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  version="1.2">

<query-response>

  <standard-reports>
    <gazetteer-standard-report>
      <identifier>1001652</identifier>
      <codes>
        <code scheme="FIPS 55-3">40000</code>
      </codes>
      <place-status>current</place-status>
      <display-name>Las Vegas, Nevada</display-name>
      <names>
        <name primary="true">Las Vegas</name>
        <name status="former">Sin City</name>
      </names>
      <bounding-box>
        <gml:coord>
          <gml:X>-115.25</gml:X>
          <gml:Y>36.15</gml:Y>
        </gml:coord>
        <gml:coord>
          <gml:X>-115.12</gml:X>
          <gml:Y>36.25</gml:Y>
        </gml:coord>
      </bounding-box>
      <footprints>
        <footprint-reference xlink:href="http://..."
          geometry-type="Polygon" num-points="4632"
          primary="true"/>
      </footprints>
      <classes>
        <class thesaurus="ADL Feature Type Thesaurus"
          primary="true">populated places</class>
      </classes>
    </gazetteer-standard-report>
  </standard-reports>

</query-response>

</gazetteer-service>

Finally, here's a possible error response to the above request:

<?xml version="1.0" encoding="UTF-8"?>

<gazetteer-service
  xmlns="http://www.alexandria.ucsb.edu/gazetteer"
  version="1.2">

<query-response>

  <error>
    <code>-908</code>
    <description>Database connection failure.</description>
  </error>

</query-response>

</gazetteer-service>
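On the client side, a response must be checked for both successful content and an <error> subelement, since the two may appear together. A minimal Python sketch, assuming HTTP-level failures (a status other than 200) have already been handled before the body reaches this function; the function name parse_query_response is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

GAZ = "{http://www.alexandria.ucsb.edu/gazetteer}"
Error = Tuple[Optional[str], Optional[str]]  # (code, description)


def parse_query_response(body: bytes) -> Tuple[List[ET.Element], Optional[Error]]:
    """Split a query response into (standard reports, error or None).

    Both parts may be present, e.g. when an internal result limit truncates
    an otherwise successful result set.
    """
    root = ET.fromstring(body)
    response = root.find(GAZ + "query-response")
    if response is None:
        raise ValueError("not a <query-response>")
    reports = response.findall(f"{GAZ}standard-reports/{GAZ}gazetteer-standard-report")
    err = response.find(GAZ + "error")
    if err is None:
        return reports, None
    return reports, (err.findtext(GAZ + "code"), err.findtext(GAZ + "description"))


ERROR_RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<gazetteer-service xmlns="http://www.alexandria.ucsb.edu/gazetteer" version="1.2">
<query-response>
  <error>
    <code>-908</code>
    <description>Database connection failure.</description>
  </error>
</query-response>
</gazetteer-service>"""

reports, error = parse_query_response(ERROR_RESPONSE)
```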

Reports

The ADL gazetteer protocol is defined in terms of the relatively simple abstract model given in Gazetteer model above. In practice, however, gazetteer implementations will typically be able to represent more elaborate information about geographic places and model more complex relationships between and among gazetteer entries and attributes. To allow clients to take advantage of such information in a structured manner, the gazetteer protocol defines two transfer formats for gazetteer entries: the standard report and the extended report.

The extended report of a gazetteer entry is a gazetteer-specific format; its actual structure is undefined by the gazetteer protocol. The intention is that all of the information a gazetteer possesses about an entry be representable by the format. If a gazetteer supports extended reports, the report format must be defined by an XML schema; see Capabilities below.

The standard report of a gazetteer entry corresponds to the abstract gazetteer model. An XML schema for the report format is listed below. The schema defines element <gazetteer-standard-report> in namespace "http://www.alexandria.ucsb.edu/gazetteer". Subelements <identifier>, <codes>, <place-status>, <names>, <footprints>, <classes>, and <relationships> and element attributes primary and status correspond directly to the model.

For the convenience of gazetteer clients, the standard report includes two additional required elements and one additional required attribute. Element <display-name> is the entry's primary name as it is commonly displayed, typically including qualifications. For example, the display name for the city of Las Vegas might be "Las Vegas, Clark County, Nevada". Element <bounding-box> is the bounding box (i.e., the smallest enclosing graticule-aligned rectangle) of the entry's primary footprint. And in the <relationship> element, the target-name attribute is the target gazetteer entry's primary name. In a slight extension to the abstract gazetteer model, the <relationship> element's target-identifier attribute may be omitted, thereby allowing a gazetteer entry to have a relationship to a place not represented in the gazetteer.

Each footprint in a standard report may be described either directly using a <footprint> element or indirectly using a <footprint-reference> element. In the direct case the footprint is defined as a single subelement (the "footprint-defining element") of the <footprint> element. In the indirect case, the footprint-defining element is indirectly referred to by a URL, and the optional geometry-type and num-points attributes can be used to give clients an indication of the size and type of the footprint. Attribute geometry-type, if present, must be the unqualified XML name of the footprint-defining element and num-points must be the number of points in the geometry.

In both of the above cases, the possible footprint-defining elements may be drawn from the Open GIS Consortium's Geography Markup Language (GML), version 2, or from another geometry language supported by the gazetteer; see Capabilities, below. Support for GML is mandatory. GML's footprint-defining elements (<gml:Box> and elements in class gml:_Geometry) are defined in terms of an abstract Cartesian coordinate system, but we mandate here that the coordinate system must be the WGS84 latitude/longitude coordinate system. Specifically, the first (X) coordinate must be longitude in signed decimal degrees east of the Greenwich meridian and the second (Y) coordinate must be latitude in signed decimal degrees north of the equator. Longitudes must be in the range [-180,180] except in a <gml:Box> element, where exactly one of the longitudinal coordinates may be outside this range to indicate that the box crosses the ±180 meridian.
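The longitude convention for <gml:Box> lets a box cross the ±180 meridian without splitting it in two. The following Python sketch shows one way a client might test whether a point falls in such a box; it assumes, consistent with the <bounding-box> example in this document, that the first corner holds the minimum coordinates and the second the maximum. The names Box, validate_box, and box_contains are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A <gml:Box> as two corners: (x1, y1) lower-left and (x2, y2) upper-right.
    X is longitude east, Y is latitude north, both WGS84 decimal degrees."""
    x1: float
    y1: float
    x2: float
    y2: float


def validate_box(box: Box) -> None:
    """Reject boxes that break the longitude convention."""
    outside = [x for x in (box.x1, box.x2) if not -180 <= x <= 180]
    if len(outside) > 1:
        raise ValueError("at most one longitude may lie outside [-180, 180]")
    if box.x1 > box.x2 or box.y1 > box.y2:
        raise ValueError("expected lower-left corner first")
    if box.y1 < -90 or box.y2 > 90:
        raise ValueError("latitude out of range")


def box_contains(box: Box, lon: float, lat: float) -> bool:
    """True if point (lon, lat), with lon in [-180, 180], lies in the box.
    An out-of-range box longitude is read as wrapping past the 180th meridian."""
    validate_box(box)
    if not box.y1 <= lat <= box.y2:
        return False
    # Compare the point's longitude and its copies one full turn east and west.
    return any(box.x1 <= lon + shift <= box.x2 for shift in (-360, 0, 360))


# 170E eastward across the 180th meridian to 170W (190 = -170 + 360).
across = Box(170.0, -20.0, 190.0, -10.0)
```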

gazetteer-standard-report.xsd
<?xml version="1.0" encoding="UTF-8"?>

<schema xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:gaz="http://www.alexandria.ucsb.edu/gazetteer"
  xmlns:gml="http://www.opengis.net/gml"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  targetNamespace="http://www.alexandria.ucsb.edu/gazetteer"
  elementFormDefault="qualified">

<include schemaLocation="gazetteer-types.xsd"/>

<import namespace="http://www.opengis.net/gml"
  schemaLocation="geometry.xsd"/>

<import namespace="http://www.w3.org/1999/xlink"
  schemaLocation="xlinks.xsd"/>

<attributeGroup name="qualifiers">
  <attribute name="primary" type="boolean" default="false"/>
  <attribute name="status" type="gaz:status-type"
    default="current"/>
</attributeGroup>

<element name="gazetteer-standard-report">
  <complexType>
    <sequence>

      <element name="identifier" type="string"/>

      <element name="codes" minOccurs="0">
        <complexType>
          <sequence>
            <element name="code" minOccurs="0"
              maxOccurs="unbounded">
              <complexType>
                <simpleContent>
                  <extension base="string">
                    <attribute name="scheme" type="string"
                      use="required"/>
                  </extension>
                </simpleContent>
              </complexType>
            </element>
          </sequence>
        </complexType>
      </element>

      <element name="place-status" type="gaz:status-type"/>

      <element name="display-name" type="string"/>

      <element name="names">
        <complexType>
          <sequence>
            <element name="name" maxOccurs="unbounded">
              <complexType>
                <simpleContent>
                  <extension base="string">
                    <attributeGroup ref="gaz:qualifiers"/>
                  </extension>
                </simpleContent>
              </complexType>
            </element>
          </sequence>
        </complexType>
      </element>

      <element name="bounding-box" type="gml:BoxType"/>

      <element name="footprints">
        <complexType>
          <choice maxOccurs="unbounded">
            <element name="footprint">
              <complexType>
                <choice>
                  <element ref="gml:_Geometry"/>
                  <element ref="gml:Box"/>
                  <element name="other-footprint">
                    <complexType>
                      <sequence>
                        <any processContents="lax"/>
                      </sequence>
                    </complexType>
                  </element>
                </choice>
                <attributeGroup ref="gaz:qualifiers"/>
              </complexType>
            </element>
            <element name="footprint-reference">
              <complexType>
                <attributeGroup ref="xlink:locatorLink"/>
                <attribute name="geometry-type">
                  <simpleType>
                    <restriction base="string">
                      <enumeration value="Box"/>
                      <enumeration value="Point"/>
                      <enumeration value="LineString"/>
                      <enumeration value="Polygon"/>
                      <enumeration value="MultiPoint"/>
                      <enumeration
                        value="MultiLineString"/>
                      <enumeration value="MultiPolygon"/>
                      <enumeration value="other"/>
                    </restriction>
                  </simpleType>
                </attribute>
                <attribute name="num-points"
                  type="positiveInteger"/>
                <attributeGroup ref="gaz:qualifiers"/>
              </complexType>
            </element>
          </choice>
        </complexType>
      </element>

      <element name="classes" minOccurs="0">
        <complexType>
          <sequence>
            <element name="class" minOccurs="0"
              maxOccurs="unbounded">
              <complexType>
                <simpleContent>
                  <extension base="string">
                    <attribute name="thesaurus"
                      type="string" use="required"/>
                    <attributeGroup ref="gaz:qualifiers"/>
                  </extension>
                </simpleContent>
              </complexType>
            </element>
          </sequence>
        </complexType>
      </element>

      <element name="relationships" minOccurs="0">
        <complexType>
          <sequence>
            <element name="relationship" minOccurs="0"
              maxOccurs="unbounded">
              <complexType>
                <attribute name="relation" type="string"
                  use="required"/>
                <attribute name="target-name" type="string"
                  use="required"/>
                <attribute name="target-identifier"
                  type="string"/>
              </complexType>
            </element>
          </sequence>
        </complexType>
      </element>

    </sequence>
  </complexType>
</element>

</schema>

Here's an example of a standard report with an indirect footprint:

<?xml version="1.0" encoding="UTF-8"?>

<gazetteer-standard-report
  xmlns="http://www.alexandria.ucsb.edu/gazetteer"
  xmlns:gml="http://www.opengis.net/gml"
  xmlns:xlink="http://www.w3.org/1999/xlink">

<identifier>1001652</identifier>

<codes>
  <code scheme="FIPS 55-3">40000</code>
</codes>

<place-status>current</place-status>

<display-name>Las Vegas, Nevada</display-name>

<names>
  <name primary="true">Las Vegas</name>
  <name status="former">Sin City</name>
</names>

<bounding-box>
  <gml:coord>
    <gml:X>-115.25</gml:X>
    <gml:Y>36.15</gml:Y>
  </gml:coord>
  <gml:coord>
    <gml:X>-115.12</gml:X>
    <gml:Y>36.25</gml:Y>
  </gml:coord>
</bounding-box>

<footprints>
  <footprint-reference xlink:href="http://..."
    geometry-type="Polygon" num-points="4632"
    primary="true"/>
</footprints>

<classes>
  <class thesaurus="ADL Feature Type Thesaurus"
    primary="true">cities</class>
</classes>

<relationships>
  <relationship relation="principal-city-of"
    target-name="Nevada" target-identifier="1241232"/>
</relationships>

</gazetteer-standard-report>

The footprint corresponding to the above example might look something like this:

<?xml version="1.0" encoding="UTF-8"?>

<Polygon xmlns="http://www.opengis.net/gml">
  <outerBoundaryIs>
    <LinearRing>
      <coordinates>-115.12,36.25 -115.17,...</coordinates>
    </LinearRing>
  </outerBoundaryIs>
</Polygon>
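Because <bounding-box> is the smallest graticule-aligned rectangle enclosing the primary footprint, a gazetteer could derive it from a GML 2 geometry like the one above. A minimal Python sketch, assuming GML 2's default <coordinates> separators (comma within a tuple, whitespace between tuples) and a footprint that does not cross the 180th meridian. The ring coordinates are made up for illustration; they are not the elided Las Vegas footprint.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def parse_gml2_coordinates(text: str) -> List[Point]:
    """Parse a GML 2 <coordinates> string with the default separators."""
    points = []
    for tup in text.split():
        x, y = tup.split(",")[:2]  # ignore a Z value if present
        points.append((float(x), float(y)))
    return points


def bounding_box(points: List[Point]) -> Tuple[Point, Point]:
    """Return ((min_lon, min_lat), (max_lon, max_lat)) for the points.
    Assumes the geometry does not cross the 180th meridian."""
    if not points:
        raise ValueError("empty geometry")
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return (min(lons), min(lats)), (max(lons), max(lats))


# Hypothetical closed ring (first point repeated at the end).
ring = parse_gml2_coordinates(
    "-115.12,36.25 -115.25,36.20 -115.20,36.15 -115.12,36.25")
```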

Query language

The query service, described under Services above, returns all gazetteer entries that satisfy one or more constraints placed against entry attributes. The constraints are expressed in the gazetteer query language defined in this section.

The gazetteer query language consists of boolean combinations (and, or, and and-not) of seven types of queries. Support for any given type of query is optional. The query types are as follows:

identifier-query identifier

Returns the gazetteer entry identified by identifier.

code-query [scheme] code

Returns the gazetteer entry identified by code code. If scheme is given, it indicates the code's scheme, and matching occurs only against like codes; otherwise, matching occurs against all codes. A code query in which the scheme is unsupported or unrecognized by the gazetteer must not be treated as erroneous, but should simply yield zero results.
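The scheme rule above can be illustrated with a small in-memory index. This Python sketch is hypothetical (the CODES table and the identifier "ca-1" are invented; the codes are those mentioned in this document) and shows that an unrecognized scheme yields an empty result rather than an error.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical index: entry identifier -> (scheme, code) pairs.
CODES: Dict[str, List[Tuple[str, str]]] = {
    "1001652": [("FIPS 55-3", "40000")],
    "ca-1": [("FIPS 5-2", "06")],
}


def code_query(code: str, scheme: Optional[str] = None) -> List[str]:
    """Identifiers of entries carrying `code`. With a scheme, only codes of
    that scheme are compared; an unknown scheme simply matches nothing."""
    return [ident for ident, pairs in CODES.items()
            if any(c == code and (scheme is None or s == scheme)
                   for s, c in pairs)]
```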

place-status-query status

Returns all gazetteer entries whose place status matches status, which must be former, current, or proposed.

name-query operator text

Returns all gazetteer entries having at least one name that matches text according to text-matching operator operator. If a gazetteer supports name queries, it must support the following operator:

equals
A gazetteer entry name matches text if it equals text, ignoring insignificant differences in whitespace.

Other text-matching operators that gazetteers are encouraged to support include:

contains-all-words
A gazetteer entry name matches text if it contains all of the words in text. For example, entry name "San Luis Obispo" matches text "obispo luis" under this operator.
contains-any-words
A gazetteer entry name matches text if it contains any of the words in text. For example, entry name "Hope Ranch" matches text "hope" under this operator.
contains-phrase
A gazetteer entry name matches text if it contains all of the words in text in the same consecutive order. For example, entry name "Black Forest Drive" matches text "forest drive" under this operator, but entry names "Forest Lake Drive" and "Drive Forest" do not.
matches-pattern
A gazetteer entry name matches text if it matches text when the latter is treated as a wildcard pattern. Specifically, an asterisk ("*") in text matches zero or more characters and a question mark ("?") matches any single character. Note that a gazetteer implementation may limit the patterns it accepts. For example, a gazetteer may support right truncation only (i.e., it may accept asterisks only at the end of text).

The semantics of all of the above operators have deliberately been left somewhat fuzzy to accommodate differing implementations. Specifically, exactly what constitutes a word is left undefined, and it is unspecified whether the gazetteer implementation employs word stemming or other fuzzy word matching techniques. In any case, the above operators should be case-insensitive.
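Because the word-level semantics are deliberately loose, the following Python sketch fixes one possible interpretation: a word is any maximal run of non-whitespace characters, comparisons are case-insensitive via casefold(), and no stemming is applied. These choices are assumptions, not requirements of the protocol; the function names simply mirror the operator names.

```python
import re
from typing import List


def words(s: str) -> List[str]:
    # Assumption: a "word" is a maximal run of non-whitespace characters.
    # casefold() makes every comparison case-insensitive.
    return s.casefold().split()


def equals(name: str, text: str) -> bool:
    # Differences in whitespace (runs, leading, trailing) are ignored.
    return words(name) == words(text)


def contains_all_words(name: str, text: str) -> bool:
    return set(words(text)) <= set(words(name))


def contains_any_words(name: str, text: str) -> bool:
    return not set(words(text)).isdisjoint(words(name))


def contains_phrase(name: str, text: str) -> bool:
    n, t = words(name), words(text)
    # Look for the text's words as a consecutive run inside the name's words.
    return any(n[i:i + len(t)] == t for i in range(len(n) - len(t) + 1))


def matches_pattern(name: str, text: str) -> bool:
    # '*' matches any run of characters, '?' exactly one; all else is literal.
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c)
                    for c in text)
    return re.fullmatch(regex, name, re.IGNORECASE | re.DOTALL) is not None
```

The examples given for each operator above hold under this interpretation, e.g. contains_phrase("Black Forest Drive", "forest drive") is true while contains_phrase("Drive Forest", "forest drive") is false.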

footprint-query operator {polygon|box|identifier}

Returns all gazetteer entries having a footprint that matches a query region according to spatial operator operator. (If a gazetteer entry has multiple footprints, it is unspecified by the protocol which footprint(s) are used for matching.) The query region may take any of the three forms listed next; note that support for any given form is optional.

polygon
A simple polygon with geodesic edges, defined in WGS84 latitude/longitude coordinates.
box
A rectangle whose edges are aligned with the WGS84 latitude/longitude graticule.
identifier
One of the footprints of the gazetteer entry identified by identifier (which footprint is unspecified).

If a gazetteer supports footprint queries, it must support the following operator:

within
A gazetteer entry footprint matches the query region if the footprint is a subset of the region.

Other spatial operators that gazetteers are encouraged to support include:

contains
A gazetteer entry footprint matches the query region if the footprint is a superset of the region.
overlaps
A gazetteer entry footprint matches the query region if the footprint intersects the region.

A gazetteer implementation may limit the query regions it accepts. For example, an implementation may disallow polygons that enclose a pole. Also, an implementation may support matching on footprint bounding boxes only.
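To make the three operators concrete, here is a minimal Python sketch for the simplest case an implementation might support: matching on bounding boxes only, with boxes that do not cross the ±180 meridian. Boxes are assumed to be closed, so boxes that merely touch count as overlapping. The Box type and the function names are illustrative, not part of the protocol.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A latitude/longitude box in signed decimal degrees (WGS84)."""
    west: float   # minimum longitude
    south: float  # minimum latitude
    east: float   # maximum longitude
    north: float  # maximum latitude


def within(footprint: Box, region: Box) -> bool:
    """The footprint is a subset of the query region."""
    return (region.west <= footprint.west and footprint.east <= region.east
            and region.south <= footprint.south
            and footprint.north <= region.north)


def contains(footprint: Box, region: Box) -> bool:
    """The footprint is a superset of the query region."""
    return within(region, footprint)


def overlaps(footprint: Box, region: Box) -> bool:
    """The footprint and the query region intersect (closed intervals)."""
    return (footprint.west <= region.east and region.west <= footprint.east
            and footprint.south <= region.north
            and region.south <= footprint.north)
```

For example, with an illustrative footprint Box(-120.0, 34.3, -119.6, 34.5) and a query region Box(-140, 30, -110, 35), within and overlaps hold but contains does not. Because the boxes are aligned with the graticule, the tests reduce to independent comparisons of the latitude and longitude intervals.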

class-query thesaurus term

Returns all gazetteer entries belonging to class term, or recursively to any subclass of term (if the gazetteer supports subclasses or thesaurus relationships), where term is drawn from thesaurus, a thesaurus or simple vocabulary associated with the gazetteer. For example, if class "capital cities" is a subclass (i.e., a specialization) of class "cities", then a class query of "cities" will return all cities (capital or otherwise) whereas a query of "capital cities" will return only capital cities.
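The recursive subclass expansion can be sketched in Python as a breadth-first walk over narrower-term links. The thesaurus fragment, the entries, and the function names below are all made up for illustration; they are not taken from the ADL Feature Type Thesaurus or any real gazetteer.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical narrower-term links (term -> direct subclasses).
NARROWER: Dict[str, List[str]] = {
    "populated places": ["cities"],
    "cities": ["capital cities"],
}

# Hypothetical entries: identifier -> class terms assigned to the entry.
ENTRIES: Dict[str, Set[str]] = {
    "e1": {"capital cities"},
    "e2": {"cities"},
    "e3": {"cemeteries"},
}


def expand_class(term: str, narrower: Dict[str, List[str]]) -> Set[str]:
    """Return term plus all of its subclasses, found breadth-first.

    The `seen` set also guards against cycles in a badly formed vocabulary.
    """
    seen = {term}
    queue = deque([term])
    while queue:
        for child in narrower.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def class_query(entries: Dict[str, Set[str]], term: str,
                narrower: Dict[str, List[str]]) -> List[str]:
    """Identifiers of entries classified under term or any subclass of it."""
    classes = expand_class(term, narrower)
    return sorted(eid for eid, terms in entries.items() if terms & classes)
```

Mirroring the example above, class_query(ENTRIES, "cities", NARROWER) returns both the capital city and the ordinary city, while a query for "capital cities" returns only the former.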

relationship-query relation target-identifier

Returns all gazetteer entries having relationship relation to a target gazetteer entry identified by target-identifier. Note that a gazetteer must not consider a relationship query with an inappropriate target to be malformed or erroneous. For example, suppose a gazetteer supports the capital-of relationship, but only for target gazetteer entries that are countries. A relationship query in which the target is a cemetery is not to be considered malformed, but should simply yield zero results.
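A small Python sketch of the rule above, with relationships held as hypothetical (source, relation, target) triples: a query whose target is inappropriate for the relation simply finds no matching triples and returns an empty result rather than raising an error. The identifiers and the relationship_query name are made up for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical relationship store: (source identifier, relation, target identifier).
RELATIONSHIPS: Set[Tuple[str, str, str]] = {
    ("paris", "capital-of", "france"),
    ("monaco", "adjacent-to", "france"),
}


def relationship_query(relation: str, target_identifier: str,
                       store: Set[Tuple[str, str, str]]) -> List[str]:
    """Identifiers of entries having `relation` to the target entry.

    An inappropriate target (say, a cemetery for capital-of) is not treated
    as an error: nothing matches it, so the result is simply empty.
    """
    return sorted(src for src, rel, tgt in store
                  if rel == relation and tgt == target_identifier)
```

For instance, relationship_query("capital-of", "pere-lachaise", RELATIONSHIPS) returns an empty list, which is the behavior the protocol asks for in the cemetery example.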

Clients should be aware that a gazetteer implementation may not be able to search over all attribute values of a gazetteer entry. For example, an implementation may be able to search over primary names only.

An XML schema for the gazetteer query language is listed below. The schema defines element <gazetteer-query> in namespace "http://www.alexandria.ucsb.edu/gazetteer". Subelements <identifier-query>, <code-query>, <place-status-query>, <name-query>, <footprint-query>, <class-query>, and <relationship-query> correspond to the query types described above. The elements <and>, <or>, and <and-not> support boolean combinations of queries; <and-not> takes exactly two subqueries and matches the entries matched by the first but not by the second.
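A minimal Python sketch of how a gazetteer might evaluate the boolean elements, assuming each subquery yields a set of entry identifiers: <and> as intersection, <or> as union, and <and-not> as the entries matched by its first subquery minus those matched by its second (the reading suggested by the example query later in this section). The evaluate and run_query names, and the caller-supplied leaf function that answers the non-boolean queries, are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Set

GAZ = "{http://www.alexandria.ucsb.edu/gazetteer}"


def evaluate(elem: ET.Element,
             leaf: Callable[[ET.Element], Set[str]]) -> Set[str]:
    """Evaluate a query element to a set of entry identifiers.

    Boolean elements are treated as set operations; every other (leaf)
    query element is handed to the caller-supplied `leaf` function.
    """
    if elem.tag == GAZ + "and":
        return set.intersection(*(evaluate(c, leaf) for c in elem))
    if elem.tag == GAZ + "or":
        return set.union(*(evaluate(c, leaf) for c in elem))
    if elem.tag == GAZ + "and-not":
        first, second = list(elem)  # the schema requires exactly two
        return evaluate(first, leaf) - evaluate(second, leaf)
    return leaf(elem)


def run_query(xml_text: str,
              leaf: Callable[[ET.Element], Set[str]]) -> Set[str]:
    """Parse a <gazetteer-query> document and evaluate its single query."""
    root = ET.fromstring(xml_text)
    if root.tag != GAZ + "gazetteer-query":
        raise ValueError("not a gazetteer-query document")
    return evaluate(root[0], leaf)
```

A real implementation would more likely translate the tree into a database query than materialize every intermediate set; the set form is used here only because it states the semantics directly. For untrusted input, the XML-vulnerability warnings in the Python documentation for xml.etree apply.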

Query regions in footprint queries may be specified using the Open GIS Consortium's Geography Markup Language (GML), version 2, or another geometry language supported by the gazetteer; see Capabilities, below. Support for GML is mandatory. GML defines the <gml:Box> and <gml:Polygon> elements in terms of an abstract Cartesian coordinate system, but we mandate here that the coordinate system must be the WGS84 latitude/longitude coordinate system. Specifically, the first (X) coordinate must be longitude in signed decimal degrees east of the Greenwich meridian and the second (Y) coordinate must be latitude in signed decimal degrees north of the equator. Longitudes must be in the range [-180,180], with one exception: in a <gml:Box> element, one (and only one) of the two longitudinal coordinates may lie outside this range to indicate that the box crosses the ±180 meridian.
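The coordinate conventions above can be sketched in Python as follows. The sketch parses the <gml:coordinates> text of a <gml:Box>, assuming GML's default separators and two 2-D tuples, and splits a box that crosses the ±180 meridian into two boxes whose longitudes lie in [-180, 180]. Reading the box as running eastward from the smaller to the larger longitude is one interpretation of the convention, and parse_gml_box is a hypothetical name.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (west, south, east, north)


def parse_gml_box(coordinates: str) -> List[Box]:
    """Parse the text of a <gml:Box>'s <gml:coordinates> element.

    Assumes "x1,y1 x2,y2" with X = longitude and Y = latitude. Returns one
    box, or two if the box crosses the ±180 meridian, with every longitude
    normalized into [-180, 180].
    """
    (x1, y1), (x2, y2) = (
        tuple(float(v) for v in pair.split(",")) for pair in coordinates.split()
    )
    west, east = min(x1, x2), max(x1, x2)
    south, north = min(y1, y2), max(y1, y2)
    if west < -180 and east > 180:
        raise ValueError("at most one longitude may lie outside [-180, 180]")
    if east > 180:   # e.g. 170..190 is 170..180 plus -180..-170
        return [(west, south, 180.0, north), (-180.0, south, east - 360.0, north)]
    if west < -180:  # e.g. -190..-170 is 170..180 plus -180..-170
        return [(west + 360.0, south, 180.0, north), (-180.0, south, east, north)]
    return [(west, south, east, north)]
```

Splitting a crossing box into two ordinary boxes lets the rest of an implementation keep the simple interval comparisons that work for boxes which stay within [-180, 180].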

gazetteer-query.xsd
<?xml version="1.0" encoding="UTF-8"?>

<schema xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:gaz="http://www.alexandria.ucsb.edu/gazetteer"
  xmlns:gml="http://www.opengis.net/gml"
  targetNamespace="http://www.alexandria.ucsb.edu/gazetteer"
  elementFormDefault="qualified">

<include schemaLocation="gazetteer-types.xsd"/>

<import namespace="http://www.opengis.net/gml"
  schemaLocation="geometry.xsd"/>

<element name="gazetteer-query">
  <complexType>
    <sequence>
      <group ref="gaz:query"/>
    </sequence>
  </complexType>
</element>

<group name="query">
  <choice>
    <element ref="gaz:identifier-query"/>
    <element ref="gaz:code-query"/>
    <element ref="gaz:place-status-query"/>
    <element ref="gaz:name-query"/>
    <element ref="gaz:footprint-query"/>
    <element ref="gaz:class-query"/>
    <element ref="gaz:relationship-query"/>
    <element ref="gaz:and"/>
    <element ref="gaz:or"/>
    <element ref="gaz:and-not"/>
  </choice>
</group>

<element name="identifier-query">
  <complexType>
    <attribute name="identifier" type="string"
      use="required"/>
  </complexType>
</element>

<element name="code-query">
  <complexType>
    <attribute name="scheme" type="string"/>
    <attribute name="code" type="string" use="required"/>
  </complexType>
</element>

<element name="place-status-query">
  <complexType>
    <attribute name="status" type="gaz:status-type"
      use="required"/>
  </complexType>
</element>

<element name="name-query">
  <complexType>
    <attribute name="operator" use="required">
      <simpleType>
        <restriction base="string">
          <enumeration value="contains-all-words"/>
          <enumeration value="contains-any-words"/>
          <enumeration value="contains-phrase"/>
          <enumeration value="equals"/>
          <enumeration value="matches-pattern"/>
        </restriction>
      </simpleType>
    </attribute>
    <attribute name="text" type="string" use="required"/>
  </complexType>
</element>

<element name="footprint-query">
  <complexType>
    <choice>
      <element ref="gml:Box"/>
      <element ref="gml:Polygon"/>
      <element name="identifier" type="string"/>
      <element name="other-region">
        <complexType>
          <sequence>
            <any processContents="lax"/>
          </sequence>
        </complexType>
      </element>
    </choice>
    <attribute name="operator" use="required">
      <simpleType>
        <restriction base="string">
          <enumeration value="contains"/>
          <enumeration value="overlaps"/>
          <enumeration value="within"/>
        </restriction>
      </simpleType>
    </attribute>
  </complexType>
</element>

<element name="class-query">
  <complexType>
    <attribute name="thesaurus" type="string"
      use="required"/>
    <attribute name="term" type="string" use="required"/>
  </complexType>
</element>

<element name="relationship-query">
  <complexType>
    <attribute name="relation" type="string"
      use="required"/>
    <attribute name="target-identifier" type="string"
      use="required"/>
  </complexType>
</element>

<element name="and">
  <complexType>
    <sequence>
      <group ref="gaz:query" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</element>

<element name="or">
  <complexType>
    <sequence>
      <group ref="gaz:query" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</element>

<element name="and-not">
  <complexType>
    <sequence>
      <group ref="gaz:query" minOccurs="2" maxOccurs="2"/>
    </sequence>
  </complexType>
</element>

</schema>

An example of a gazetteer query is shown below. This example requests all currently existing places whose names contain the phrase "santa barbara", whose footprints overlap a given spatial region, and that are neither populated places nor cemeteries. A place named "Santa Barbara County Hospital" might match such a query.

<?xml version="1.0" encoding="UTF-8"?>

<gazetteer-query
  xmlns="http://www.alexandria.ucsb.edu/gazetteer"
  xmlns:gml="http://www.opengis.net/gml">

<and-not>
  <and>
    <place-status-query status="current"/>
    <name-query operator="contains-phrase"
      text="santa barbara"/>
    <footprint-query operator="overlaps">
      <gml:Box>
        <gml:coordinates>-140,30 110,35</gml:coordinates>
      </gml:Box>
    </footprint-query>
  </and>
  <or>
    <class-query thesaurus="ADL Feature Type Thesaurus"
      term="populated places"/>
    <class-query thesaurus="ADL Feature Type Thesaurus"
      term="cemeteries"/>
  </or>
</and-not>

</gazetteer-query>
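For clients that assemble queries programmatically, the following Python sketch builds the example query above with the standard library's xml.etree.ElementTree. The helper names are hypothetical. Note that ElementTree.register_namespace changes module-wide state, which is acceptable in a small script but worth isolating in a larger program.

```python
import xml.etree.ElementTree as ET

GAZ_NS = "http://www.alexandria.ucsb.edu/gazetteer"
GML_NS = "http://www.opengis.net/gml"
ET.register_namespace("", GAZ_NS)     # serialize gazetteer elements unprefixed
ET.register_namespace("gml", GML_NS)  # serialize GML elements as gml:...


def gaz(tag: str) -> str:
    """Qualified (Clark-notation) name in the gazetteer namespace."""
    return f"{{{GAZ_NS}}}{tag}"


def gml(tag: str) -> str:
    """Qualified (Clark-notation) name in the GML namespace."""
    return f"{{{GML_NS}}}{tag}"


def build_example_query() -> ET.Element:
    root = ET.Element(gaz("gazetteer-query"))
    and_not = ET.SubElement(root, gaz("and-not"))

    # First operand of <and-not>: the conjunction of three conditions.
    both = ET.SubElement(and_not, gaz("and"))
    ET.SubElement(both, gaz("place-status-query"), status="current")
    ET.SubElement(both, gaz("name-query"), operator="contains-phrase",
                  text="santa barbara")
    fp = ET.SubElement(both, gaz("footprint-query"), operator="overlaps")
    box = ET.SubElement(fp, gml("Box"))
    ET.SubElement(box, gml("coordinates")).text = "-140,30 110,35"

    # Second operand: the classes to exclude.
    either = ET.SubElement(and_not, gaz("or"))
    for term in ("populated places", "cemeteries"):
        ET.SubElement(either, gaz("class-query"),
                      thesaurus="ADL Feature Type Thesaurus", term=term)
    return root
```

Calling ET.tostring(build_example_query(), encoding="unicode") yields a document equivalent to the example, with the gazetteer namespace as the default namespace and GML elements under the gml prefix.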

Capabilities

The get-capabilities service described under Services above returns a description of a gazetteer's overall capabilities. An XML schema for the description is listed below. The schema defines element <gazetteer-capabilities> in namespace "http://www.alexandria.ucsb.edu/gazetteer". Within this element are the following subelements:

<version>
The version of the gazetteer protocol the gazetteer supports.
<name>
The gazetteer's name, if it has one.
<description>
Optionally, a human-readable description of the gazetteer. It is suggested that the description include: the scope and purpose of the gazetteer; details of the gazetteer's interpretation and implementation of the protocol; appropriate usage guidelines; and rights and liability clauses.
<ADL-collection-metadata>
Optionally, the URL of the ADL collection metadata for the gazetteer, which gives synoptic and statistical views of the gazetteer's content.
<extended-report-schema>
If the gazetteer supports extended reports, the URL of the reports' XML schema.
<code-schemes>
The code schemes the gazetteer supports. Each code scheme is described by a name and, optionally, a URL that leads to a description of the scheme.
<thesauri>
The thesauri (or simple vocabularies) the gazetteer uses to classify its entries. Each thesaurus is described by a name and the URL of its ADL Thesaurus Protocol interface.
<relationships>
The names of the relationships the gazetteer is capable of representing.
<other-geometry-languages>
The geometry languages the gazetteer supports (other than GML, which is required). Each language is described by an XML namespace.
<services>
The services the gazetteer supports.
<maximum-query-results>
If present, the maximum number of results the gazetteer returns in response to a query; if absent or zero, query results are not specifically limited in number (though they may still be limited by other metrics, such as query processing time).
<query-types>
The types of queries the gazetteer supports.
<name-query-operators>
If the gazetteer supports name queries, the text-matching operators the gazetteer supports.
<footprint-query-operators> and <footprint-query-operands>
If the gazetteer supports footprint queries, the spatial operators and geometry types the gazetteer supports.
gazetteer-capabilities.xsd
<?xml version="1.0" encoding="UTF-8"?>

<schema xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:gaz="http://www.alexandria.ucsb.edu/gazetteer"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  targetNamespace="http://www.alexandria.ucsb.edu/gazetteer"
  elementFormDefault="qualified">

<import namespace="http://www.w3.org/1999/xlink"
  schemaLocation="xlinks.xsd"/>

<element name="gazetteer-capabilities">
  <complexType>
    <sequence>

      <element name="version" type="string"/>

      <element name="name" type="string" minOccurs="0"/>

      <element name="description" type="string"
        minOccurs="0"/>

      <element name="ADL-collection-metadata" minOccurs="0">
        <complexType>
          <attributeGroup ref="xlink:locatorLink"/>
        </complexType>
      </element>

      <element name="extended-report-schema" minOccurs="0">
        <complexType>
          <attributeGroup ref="xlink:locatorLink"/>
        </complexType>
      </element>

      <element name="code-schemes" minOccurs="0">
        <complexType>
          <sequence>
            <element name="scheme" minOccurs="0"
              maxOccurs="unbounded">
              <complexType>
                <attribute name="name" type="string"
                  use="required"/>
                <attributeGroup ref="xlink:simpleLink"/>
              </complexType>
            </element>
          </sequence>
        </complexType>
      </element>

      <element name="thesauri" minOccurs="0">
        <complexType>
          <sequence>
            <element name="thesaurus" minOccurs="0"
              maxOccurs="unbounded">
              <complexType>
                <attribute name="name" type="string"
                  use="required"/>
                <attributeGroup ref="xlink:locatorLink"/>
              </complexType>
            </element>
          </sequence>
        </complexType>
      </element>

      <element name="relationships" minOccurs="0">
        <complexType>
          <sequence>
            <element name="relationship" type="string"
              minOccurs="0" maxOccurs="unbounded"/>
          </sequence>
        </complexType>
      </element>

      <element name="other-geometry-languages"
        minOccurs="0">
        <complexType>
          <sequence>
            <element name="geometry-language" minOccurs="0"
              maxOccurs="unbounded">
              <complexType>
                <attribute name="namespace" type="anyURI"/>
              </complexType>
            </element>
          </sequence>
        </complexType>
      </element>

      <element name="services">
        <complexType>
          <attribute name="get-capabilities" type="boolean"
            fixed="true"/>
          <attribute name="query" type="boolean"
            default="false"/>
          <attribute name="download" type="boolean"
            default="false"/>
        </complexType>
      </element>

      <element name="maximum-query-results"
        type="nonNegativeInteger" minOccurs="0"/>

      <element name="query-types" minOccurs="0">
        <complexType>
          <attribute name="identifier" type="boolean"
            default="false"/>
          <attribute name="place-status" type="boolean"
            default="false"/>
          <attribute name="name" type="boolean"
            default="false"/>
          <attribute name="footprint" type="boolean"
            default="false"/>
          <attribute name="class" type="boolean"
            default="false"/>
          <attribute name="relationship" type="boolean"
            default="false"/>
        </complexType>
      </element>

      <element name="name-query-operators" minOccurs="0">
        <complexType>
          <attribute name="contains-all-words"
            type="boolean" default="false"/>
          <attribute name="contains-any-words"
            type="boolean" default="false"/>
          <attribute name="contains-phrase"
            type="boolean" default="false"/>
          <attribute name="equals" type="boolean"
            fixed="true"/>
          <attribute name="matches-pattern"
            type="boolean" default="false"/>
        </complexType>
      </element>

      <element name="footprint-query-operators"
        minOccurs="0">
        <complexType>
          <attribute name="contains" type="boolean"
            default="false"/>
          <attribute name="overlaps" type="boolean"
            default="false"/>
          <attribute name="within" type="boolean"
            fixed="true"/>
        </complexType>
      </element>

      <element name="footprint-query-operands"
        minOccurs="0">
        <complexType>
          <attribute name="box" type="boolean"
            default="false"/>
          <attribute name="identifier" type="boolean"
            default="false"/>
          <attribute name="polygon" type="boolean"
            default="false"/>
        </complexType>
      </element>

    </sequence>
  </complexType>
</element>

</schema>

Here's an example of a gazetteer capabilities description:

<?xml version="1.0" encoding="UTF-8"?>

<gazetteer-capabilities
  xmlns="http://www.alexandria.ucsb.edu/gazetteer"
  xmlns:xlink="http://www.w3.org/1999/xlink">

<version>1.2</version>

<name>ADL Gazetteer</name>

<description>This gazetteer...</description>

<ADL-collection-metadata xlink:href="http://..."/>

<extended-report-schema xlink:href="http://..."/>

<code-schemes>
  <scheme name="FIPS 55-3"
    xlink:href="http://www.itl.nist.gov/fipspubs/fip55-3.htm"/>
</code-schemes>

<thesauri>
  <thesaurus name="ADL Feature Type Thesaurus"
    xlink:href="http://www.alexandria.ucsb.edu/..."/>
</thesauri>

<relationships>
  <relationship>adjacent-to</relationship>
  <relationship>capital-of</relationship>
</relationships>

<other-geometry-languages>
  <geometry-language
    namespace="http://www.esri.com/ArcXML"/>
</other-geometry-languages>

<services query="true"/>

<maximum-query-results>100</maximum-query-results>

<query-types identifier="true" name="true" footprint="true"
  class="true"/>

<name-query-operators contains-all-words="true"
  contains-any-words="true" contains-phrase="true"/>

<footprint-query-operators contains="true"/>

<footprint-query-operands box="true" identifier="true"/>

</gazetteer-capabilities>
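A client will typically read this description before issuing queries. The Python sketch below, using xml.etree.ElementTree, collects the supported query types and operators while applying the schema's rules: an omitted boolean attribute defaults to false, equals and within are fixed at true, and a missing or zero <maximum-query-results> means no specific limit. Treating equals and within as available whenever name or footprint queries are supported, even if the operator element is omitted, is an interpretation; read_capabilities and its dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Set

NS = "{http://www.alexandria.ucsb.edu/gazetteer}"


def _flag(elem: Optional[ET.Element], name: str) -> bool:
    """Read an xs:boolean attribute; absent means the schema default, false."""
    if elem is None:
        return False
    return elem.get(name, "false").strip() in ("true", "1")


def _flags(elem: Optional[ET.Element], names: Iterable[str],
           always: Iterable[str] = ()) -> Set[str]:
    """Names whose flag is true, plus names the schema fixes at true."""
    fixed = set(always)
    return {n for n in names if n in fixed or _flag(elem, n)}


def read_capabilities(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)

    def find(tag: str) -> Optional[ET.Element]:
        return root.find(NS + tag)

    query_types = _flags(find("query-types"), [
        "identifier", "place-status", "name", "footprint", "class",
        "relationship"])
    limit = find("maximum-query-results")
    caps = {
        "version": find("version").text,
        "query": _flag(find("services"), "query"),
        "query_types": query_types,
        # Absent or zero means no specific limit on the number of results.
        "max_results": (int(limit.text) if limit is not None else 0) or None,
        "name_operators": set(),
        "footprint_operators": set(),
        "footprint_operands": set(),
    }
    if "name" in query_types:
        caps["name_operators"] = _flags(find("name-query-operators"), [
            "contains-all-words", "contains-any-words", "contains-phrase",
            "equals", "matches-pattern"], always=["equals"])
    if "footprint" in query_types:
        caps["footprint_operators"] = _flags(
            find("footprint-query-operators"),
            ["contains", "overlaps", "within"], always=["within"])
        caps["footprint_operands"] = _flags(
            find("footprint-query-operands"), ["box", "identifier", "polygon"])
    return caps


# A trimmed copy of the example capabilities description above.
SAMPLE = """<gazetteer-capabilities
  xmlns="http://www.alexandria.ucsb.edu/gazetteer">
<version>1.2</version>
<name>ADL Gazetteer</name>
<services query="true"/>
<maximum-query-results>100</maximum-query-results>
<query-types identifier="true" name="true" footprint="true" class="true"/>
<name-query-operators contains-all-words="true"
  contains-any-words="true" contains-phrase="true"/>
<footprint-query-operators contains="true"/>
<footprint-query-operands box="true" identifier="true"/>
</gazetteer-capabilities>"""
```

For the example description, read_capabilities(SAMPLE) reports equals among the name operators and within among the footprint operators even though the description does not mention them, because the schema fixes both at true.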

Schemas

The gazetteer protocol is formally defined by a main XML schema:

and four subschemas:

the last of which is displayed below.

Applications that use this protocol can and should reference only the main schema, as it implicitly includes the others. The canonical URL prefix at which all protocol schemas reside is "http://www.alexandria.ucsb.edu/gazetteer/protocol/".

gazetteer-types.xsd
<?xml version="1.0" encoding="UTF-8"?>

<schema xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:gaz="http://www.alexandria.ucsb.edu/gazetteer"
  targetNamespace="http://www.alexandria.ucsb.edu/gazetteer"
  elementFormDefault="qualified">

<simpleType name="report-format-type">
  <restriction base="string">
    <enumeration value="standard"/>
    <enumeration value="extended"/>
  </restriction>
</simpleType>

<simpleType name="status-type">
  <restriction base="string">
    <enumeration value="former"/>
    <enumeration value="current"/>
    <enumeration value="proposed"/>
  </restriction>
</simpleType>

</schema>

Revision history

1.2
Gazetteer model: added the code and place status attributes. Replaced the boolean historical qualifier with the tri-valued status qualifier. Added a note on the one-to-one correspondence between gazetteer entries and conceptual places.
Services: removed the update services (add-entry, relate-entries, and remove-entry). Changed the success indicator from a nil <error> subelement to the absence of an <error> subelement.
Reports: added the <codes>, <place-status>, and <display-name> subelements. Replaced the historical attribute with the tri-valued status attribute. Renamed the <relationship> subelement's name and identifier attributes to relation and target-identifier, respectively, and added the target-name attribute. Relaxed the requirement that the target of a relationship must be another gazetteer entry; it may now be just a name. Added a note on the interpretation of out-of-range longitudinal coordinates.
Query language: added two query types, <code-query> and <place-status-query>. Renamed the <relationship-query> attributes to relation and target-identifier. Added a note on the interpretation of out-of-range longitudinal coordinates.
Capabilities: added subelements <name>, <ADL-collection-metadata>, <code-schemes>, and <maximum-query-results>. Added a place-status attribute to the <query-types> subelement. Removed the add-entry, relate-entries, and remove-entry attributes from the <services> subelement.
Schemas: new section.
Other: numerous documentation changes and clarifications throughout.
1.1
Swapped the interpretation of the GML first (X) and second (Y) coordinates. Added a <description> subelement to the <gazetteer-capabilities> element.
1.0a
Clarified the meaning of a gazetteer entry having more than one footprint. Other minor changes.
1.0
Original version.

created 2003-09-19; last modified 2009-11-19 21:17