The ADL Gazetteer Protocol

Greg Janée
Alexandria Digital Library Project

Linda L. Hill
Center for Global Georeferencing Research

Version 1.1


Introduction

This document describes a protocol for accessing general-purpose gazetteer services.

A gazetteer is a dictionary of geographic placenames. Gazetteers have traditionally appeared as back-of-the-book indexes in atlases; as place encyclopedias, such as the Columbia Gazetteer of the World; as thesauri, such as the Getty Thesaurus of Geographic Names; and as toponymic authority files, such as NIMA's GEOnet Names Server and the U.S. Geological Survey's Geographic Names Information System. In an atlas, a gazetteer provides an alphabetical list of the placenames that appear in the atlas, and it maps those names to page numbers and map grid locations. Place encyclopedias often include descriptive information for locations, as does the Getty Thesaurus, and sometimes include latitude/longitude coordinates as well. Toponymic authority files focus on distinguishing official placenames from variant names, and they associate names with coordinate locations primarily for disambiguation purposes. Other toponymic reference works publish scholarly information about the origins of geographic names.

A digital gazetteer builds on these traditional gazetteers. It maps geographic placenames (the names of natural features such as mountains and lakes and the names of human constructs such as cities and states) to coordinate-based geographic locations. The services it provides are largely oriented around searching: answering "Where is...?" queries given all or a portion of a geographic name ("Where is the place named 'Santa Barbara'?") and "What's there?" queries which return all places, or all places of a specified class, within a given region ("What schools are in Santa Barbara County?"). Digital gazetteers augment traditional gazetteers by providing bidirectional mappings among placenames, map locations, and classifications. And they expand on the notion of named geographic features to include virtually any category of feature that can be geolocated (e.g., weather events such as hurricanes), any type of name or label for a place (e.g., postal codes and UTM grid names), and names with only local or specialized scope (e.g., research study areas). Descriptive information and associated data (e.g., population and elevation) can also be included in digital gazetteers. The ADL gazetteer protocol builds on this generalized concept of what a gazetteer is.

This document first semi-formally defines an abstract model of a gazetteer. That model is then used as the basis for defining a set of services (i.e., a set of network-invokable functions), several report formats, and a query language.

A caveat: the gazetteer protocol described herein provides relatively low-level services. The services are intended to be simple enough that they can be implemented by all gazetteers, yet powerful enough to be useful to clients both in their own right and for combining into higher-level services. To get a sense of the level of this specification, consider the common gazetteer functionality of finding places by entering qualified placenames, as in "find 'Santa Barbara, California'". The ADL gazetteer protocol does not provide such high-level functionality, but it does provide sufficient building blocks for achieving that functionality. Specifically, the protocol supports 1) finding a place named "California" belonging to class "states"; 2) disambiguation in the case of multiple returns; and 3) finding a place named "Santa Barbara" that is contained within the place named "California".

Gazetteer model

In this section we semi-formally define an abstract model of a gazetteer. The ADL gazetteer protocol is built on (i.e., is written against) this model.

A gazetteer is a set of gazetteer entries. A gazetteer entry describes a single geographic place by an identifier and several key attributes of the place: one or more names, one or more footprints, and zero or more classes. There is no intrinsic structure to a gazetteer beyond simple containment of gazetteer entries, although relationships between entries may be explicitly represented by the gazetteer (see below).

An identifier is a string that unambiguously identifies the entry within the gazetteer. The identifier need not be universally unique.

A name is a complete, unqualified name for the place. For example, the name of the city of Los Angeles is "Los Angeles", not "Los Angeles, California". A gazetteer entry can have more than one name, in which case the names may denote alternative names for the place (e.g., the city "Köln" is also known as "Cologne") or varying names over time (e.g., the country "Thailand" was formerly known as "Siam").

A footprint is an approximation, expressed in latitude/longitude coordinates, of the subset of the Earth's surface occupied by the place. Note that a footprint need not be contiguous. For example, a footprint for the state of Hawaii might consist of a union of disjoint polygons, one per island. A gazetteer entry can have more than one footprint, in which case the footprints must represent different approximations or resolutions of the same conceptual footprint.

A class classifies the place with respect to a set of terms. More specifically, a class is the association of the place with a term drawn from a simple vocabulary of terms or thesaurus (a vocabulary augmented with inter-term relationships). A gazetteer entry may belong to multiple classes, and even to multiple classes from the same thesaurus. Note that if a gazetteer consists of a single class of places (consider "The Knopf Gazetteer of Cemeteries of the Southwest"), its entries will not be considered to be classified for the purposes of this specification unless each entry carries the classification for searching and reporting purposes.

Each attribute of a gazetteer entry (i.e., each name, each footprint, and each class) may be qualified as being primary (i.e., the attribute is the preferred or official value) and/or historical (the attribute is known to not be currently valid). For example, a gazetteer entry for the city Köln may mark the name "Köln" as primary but not "Cologne"; a gazetteer entry for the country Thailand may mark the name "Siam" as historical.

For each gazetteer entry, the following conditions on qualifiers must hold:

  • Exactly one name must be marked as primary.
  • Exactly one footprint must be marked as primary.
  • If the entry has been classified, at least one class must be marked as primary.
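The model and the qualifier conditions above can be sketched in a few lines of Python. This is only an illustration: the class names Attribute and Entry and the function qualifier_violations are hypothetical and are not part of the protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """A name, footprint, or class value together with its qualifiers."""
    value: object
    primary: bool = False
    historical: bool = False


@dataclass
class Entry:
    """One gazetteer entry of the abstract model (hypothetical representation)."""
    identifier: str
    names: list = field(default_factory=list)
    footprints: list = field(default_factory=list)
    classes: list = field(default_factory=list)


def qualifier_violations(entry):
    """Return a list of violated conditions; the list is empty for a valid entry."""
    problems = []
    primary_names = sum(1 for n in entry.names if n.primary)
    if primary_names != 1:
        problems.append(f"expected exactly one primary name, found {primary_names}")
    primary_footprints = sum(1 for f in entry.footprints if f.primary)
    if primary_footprints != 1:
        problems.append(
            f"expected exactly one primary footprint, found {primary_footprints}")
    # The class condition applies only if the entry has been classified.
    if entry.classes and not any(c.primary for c in entry.classes):
        problems.append("classified entry has no primary class")
    return problems
```

An entry with names "Köln" (primary) and "Cologne", one primary footprint, and no classes yields an empty list under this sketch.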

Finally, a gazetteer may be augmented with inter-entry relationships. A relationship is a named, directed, binary association between gazetteer entries. For example, a gazetteer might support a capital-of relationship which relates capital cities and administrative areas: the city of Sacramento is the capital of the state of California, and so on. (The ADL gazetteer protocol defines the necessary structures to support relationships in general, but it does not define any particular relationships, just as it does not define any particular classification scheme.)

Services

Functionally speaking, the ADL gazetteer protocol consists of the following six independent, stateless services. Each service follows the classical model of function invocation: zero or more arguments are passed to the service, the service executes synchronously, and a result and/or an error indication is returned. Support for the get-capabilities service is mandatory; all other services are optional. Clients should anticipate that gazetteers may apply different access control policies to different services.

capabilities description <- get-capabilities()

Returns a description of the overall capabilities of the gazetteer (the services and query types the gazetteer supports, the thesauri the gazetteer uses, etc.). See Capabilities below.

reports <- query(query, {"standard" | "extended"} [, geometry language])

Returns reports for the gazetteer entries selected by a query. query is a query expressed in the gazetteer query language; see Query language below. Either standard or extended reports may be returned; see Reports below. The geometry language used in the reports may optionally be requested. The geometry language(s) and the subset of the query language that the gazetteer supports are described in the gazetteer's capabilities; see Capabilities below. Clients should anticipate that a gazetteer may return an error indication in response to a nominally supported query due to implementation limitations. Also, a gazetteer may return both reports and an error indication, as when an internal result limit is reached during otherwise successful query processing.

reports <- download({"standard" | "extended"} [, geometry language])

Returns standard or extended reports for every entry in the gazetteer. The report format and geometry language arguments have the same meaning as in the query service.

identifier <- add-entry({standard report | extended report})

Adds an entry to the gazetteer and returns the identifier of the new entry. The entry's attributes are specified by a standard or extended report; see Reports below. (The identifier in the report, if any, is ignored.) A gazetteer may disallow addition of entries using standard reports.

relate-entries(relationship, identifier1, identifier2)

Creates a relationship named relationship between the gazetteer entries identified by identifier1 and identifier2. The relationship must be one of the relationships supported by the gazetteer; see Capabilities below.

remove-entry(identifier)

Removes the entry identified by identifier from the gazetteer. All relationships that reference the removed entry are removed as well.

An XML-over-HTTP implementation of the services is described next. In this formulation, a gazetteer service is invoked by submitting an HTTP POST request to a URL representing the gazetteer's common access point for all services. The format and discovery of this URL are outside the scope of this document.

Both service requests and service responses must have MIME content type text/xml and consist of a single <gazetteer-service> element in namespace "http://www.alexandria.ucsb.edu/gazetteer". The version attribute of this element indicates the version of the gazetteer protocol used by the client (in requests) or the gazetteer implementation (in responses).

In a service request, the <gazetteer-service> element must contain a single subelement expressing the request. Subelement <S-request> corresponds to service S above, e.g., subelement <get-capabilities-request> corresponds to the get-capabilities service. Arguments to the request, if any, are encoded as subelements of the request subelement.

In a service response, the <gazetteer-service> element must contain a single subelement containing the response. Similar to requests, subelement <S-response> corresponds to service S. Each response subelement contains optional, service-specific, "normal" content (e.g., reports in the case of the query service) and a mandatory <error> subelement, the latter of which is nillable. A successful response is indicated by the presence of normal content and a nil <error> element, while a non-nil <error> element indicates an error and describes it by an implementation-specific code and/or text description. An implementation may return both normal content and an error, such as when a query is successfully processed and results are successfully returned, but the number of results returned is limited due to an implementation constraint.

Gazetteer implementations should generally return HTTP status code 200 (OK), and should use HTTP error codes only for low-level errors such as syntactically malformed requests and authentication problems. Higher-level errors should be returned using the mechanism described above.
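As a rough client-side illustration of the request envelope described above, the following Python sketch serializes a get-capabilities request and wraps it in an HTTP POST. The function names are hypothetical, the access point URL must be supplied by the caller (its discovery is outside the scope of this document), and the sketch deliberately stops short of sending the request.

```python
import urllib.request
import xml.etree.ElementTree as ET

GAZ_NS = "http://www.alexandria.ucsb.edu/gazetteer"


def build_get_capabilities_request(version="1.1"):
    """Serialize a <gazetteer-service> envelope holding an empty request."""
    # Emit the gazetteer namespace as the default namespace (no prefix).
    ET.register_namespace("", GAZ_NS)
    root = ET.Element(f"{{{GAZ_NS}}}gazetteer-service", {"version": version})
    ET.SubElement(root, f"{{{GAZ_NS}}}get-capabilities-request")
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def make_http_request(url, body):
    """Wrap the XML body in an HTTP POST with content type text/xml.

    The request is only constructed here; the caller decides when to
    send it, e.g. with urllib.request.urlopen.
    """
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "text/xml"},
    )
```

A client would pass the result of build_get_capabilities_request to make_http_request together with the gazetteer's access point, and then read the response body as described in the paragraphs above.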

gazetteer-service.xsd
<?xml version="1.0" encoding="UTF-8"?>

<schema xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:gaz="http://www.alexandria.ucsb.edu/gazetteer"
  targetNamespace="http://www.alexandria.ucsb.edu/gazetteer"
  elementFormDefault="qualified">

<include schemaLocation="gazetteer-capabilities.xsd"/>
<include schemaLocation="gazetteer-query.xsd"/>
<include schemaLocation="gazetteer-standard-report.xsd"/>

<element name="gazetteer-service">
  <complexType>
    <choice>
      <element ref="gaz:get-capabilities-request"/>
      <element ref="gaz:get-capabilities-response"/>
      <element ref="gaz:query-request"/>
      <element ref="gaz:query-response"/>
      <element ref="gaz:download-request"/>
      <element ref="gaz:download-response"/>
      <element ref="gaz:add-entry-request"/>
      <element ref="gaz:add-entry-response"/>
      <element ref="gaz:relate-entries-request"/>
      <element ref="gaz:relate-entries-response"/>
      <element ref="gaz:remove-entry-request"/>
      <element ref="gaz:remove-entry-response"/>
    </choice>
    <attribute name="version" type="string" use="required"/>
  </complexType>
</element>

<element name="get-capabilities-request">
  <complexType/>
</element>

<element name="get-capabilities-response">
  <complexType>
    <sequence>
      <element ref="gaz:gazetteer-capabilities"
        minOccurs="0"/>
      <element ref="gaz:error"/>
    </sequence>
  </complexType>
</element>

<element name="query-request">
  <complexType>
    <sequence>
      <element ref="gaz:gazetteer-query"/>
      <element name="report-format">
        <simpleType>
          <restriction base="string">
            <enumeration value="standard"/>
            <enumeration value="extended"/>
          </restriction>
        </simpleType>
      </element>
      <element name="geometry-language" type="anyURI"
        minOccurs="0"/>
    </sequence>
  </complexType>
</element>

<element name="query-response">
  <complexType>
    <sequence>
      <choice minOccurs="0">
        <element name="standard-reports">
          <complexType>
            <sequence>
              <element ref="gaz:gazetteer-standard-report"
                minOccurs="0" maxOccurs="unbounded"/>
            </sequence>
          </complexType>
        </element>
        <element name="extended-reports">
          <complexType>
            <sequence>
              <any processContents="lax" minOccurs="0"
                maxOccurs="unbounded"/>
            </sequence>
          </complexType>
        </element>
      </choice>
      <element ref="gaz:error"/>
    </sequence>
  </complexType>
</element>

<element name="download-request">
  <complexType>
    <sequence>
      <element name="report-format">
        <simpleType>
          <restriction base="string">
            <enumeration value="standard"/>
            <enumeration value="extended"/>
          </restriction>
        </simpleType>
      </element>
      <element name="geometry-language" type="anyURI"
        minOccurs="0"/>
    </sequence>
  </complexType>
</element>

<element name="download-response">
  <complexType>
    <sequence>
      <choice minOccurs="0">
        <element name="standard-reports">
          <complexType>
            <sequence>
              <element ref="gaz:gazetteer-standard-report"
                minOccurs="0" maxOccurs="unbounded"/>
            </sequence>
          </complexType>
        </element>
        <element name="extended-reports">
          <complexType>
            <sequence>
              <any processContents="lax" minOccurs="0"
                maxOccurs="unbounded"/>
            </sequence>
          </complexType>
        </element>
      </choice>
      <element ref="gaz:error"/>
    </sequence>
  </complexType>
</element>

<element name="add-entry-request">
  <complexType>
    <choice>
      <element ref="gaz:gazetteer-standard-report"/>
      <element name="extended-report">
        <complexType>
          <sequence>
            <any processContents="lax"/>
          </sequence>
        </complexType>
      </element>
    </choice>
  </complexType>
</element>

<element name="add-entry-response">
  <complexType>
    <sequence>
      <element name="identifier" type="string"
        minOccurs="0"/>
      <element ref="gaz:error"/>
    </sequence>
  </complexType>
</element>

<element name="relate-entries-request">
  <complexType>
    <sequence>
      <element name="relationship" type="string"/>
      <element name="identifier" type="string" minOccurs="2"
        maxOccurs="2"/>
    </sequence>
  </complexType>
</element>

<element name="relate-entries-response">
  <complexType>
    <sequence>
      <element ref="gaz:error"/>
    </sequence>
  </complexType>
</element>

<element name="remove-entry-request">
  <complexType>
    <sequence>
      <element name="identifier" type="string"/>
    </sequence>
  </complexType>
</element>

<element name="remove-entry-response">
  <complexType>
    <sequence>
      <element ref="gaz:error"/>
    </sequence>
  </complexType>
</element>

<element name="error" nillable="true">
  <complexType>
    <sequence>
      <element name="code" type="string" minOccurs="0"/>
      <element name="description" type="string"
        minOccurs="0"/>
    </sequence>
  </complexType>
</element>

</schema>

An example of a service request is shown below. The request asks a gazetteer for standard reports for all populated places whose names contain the phrase "las vegas".

<?xml version="1.0" encoding="UTF-8"?>

<gazetteer-service
  xmlns="http://www.alexandria.ucsb.edu/gazetteer"
  version="1.1">

<query-request>
  <gazetteer-query>
    <and>
      <name-query operator="contains-phrase"
        text="las vegas"/>
      <class-query thesaurus="ADL Feature Type Thesaurus"
        term="populated places"/>
    </and>
  </gazetteer-query>
  <report-format>standard</report-format>
</query-request>

</gazetteer-service>

A possible successful response to the above request is shown below. The response contains a single standard report for a place named "Las Vegas", also known as "Sin City". The success of the response is indicated by the nillity of the <error> subelement.

<?xml version="1.0" encoding="UTF-8"?>

<gazetteer-service
  xmlns="http://www.alexandria.ucsb.edu/gazetteer"
  xmlns:gml="http://www.opengis.net/gml"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  version="1.1">

<query-response>

  <standard-reports>
    <gazetteer-standard-report>
      <identifier>1001652</identifier>
      <names>
        <name primary="true">Las Vegas</name>
        <name>Sin City</name>
      </names>
      <bounding-box>
        <gml:coord>
          <gml:X>-115.25</gml:X>
          <gml:Y>36.15</gml:Y>
        </gml:coord>
        <gml:coord>
          <gml:X>-115.12</gml:X>
          <gml:Y>36.25</gml:Y>
        </gml:coord>
      </bounding-box>
      <footprints>
        <footprint-reference xlink:href="http://..."
          geometry-type="Polygon" num-points="4632"
          primary="true"/>
      </footprints>
      <classes>
        <class thesaurus="ADL Feature Type Thesaurus"
          primary="true">populated places</class>
      </classes>
    </gazetteer-standard-report>
  </standard-reports>

  <error xsi:nil="true"/>

</query-response>

</gazetteer-service>

Finally, here's a possible error response to the above request:

<?xml version="1.0" encoding="UTF-8"?>

<gazetteer-service
  xmlns="http://www.alexandria.ucsb.edu/gazetteer"
  version="1.1">

<query-response>

  <error>
    <code>-908</code>
    <description>Database connection failure.</description>
  </error>

</query-response>

</gazetteer-service>
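
The success and error cases shown in the examples above can be told apart by inspecting the mandatory <error> element. The following Python sketch shows one way a client might do so; the function name read_response and its return convention are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

GAZ_NS = "http://www.alexandria.ucsb.edu/gazetteer"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def read_response(xml_bytes):
    """Return (content_elements, error) for a response envelope.

    error is None when <error> is nil; otherwise it is a
    (code, description) pair, either member of which may be None.
    """
    root = ET.fromstring(xml_bytes)
    response = root[0]  # the single <S-response> subelement
    error_elem = response.find(f"{{{GAZ_NS}}}error")
    if error_elem is None:
        raise ValueError("response lacks the mandatory <error> element")
    # Normal content is everything in the response other than <error>.
    content = [child for child in response if child is not error_elem]
    if error_elem.get(f"{{{XSI_NS}}}nil") in ("true", "1"):
        return content, None
    code = error_elem.findtext(f"{{{GAZ_NS}}}code")
    description = error_elem.findtext(f"{{{GAZ_NS}}}description")
    return content, (code, description)
```

Because a gazetteer may return both normal content and an error, the sketch returns the two independently rather than treating them as mutually exclusive.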

Reports

The ADL gazetteer protocol is defined in terms of the abstract model given in Gazetteer model above. In practice, gazetteer implementations will differ from the abstract model, typically by being more complex. To allow clients to take advantage of this potentially richer information in a structured manner, the gazetteer protocol defines two formats for gazetteer entries: the standard report and the extended report.

The extended report of a gazetteer entry is a gazetteer-specific format that the gazetteer protocol does not define. The intention is that all of the information a gazetteer possesses about an entry be representable by the format. If a gazetteer supports extended reports, the report format must be defined by an XML schema; see Capabilities below.

The standard report of a gazetteer entry corresponds to the abstract gazetteer model. An XML schema for the report format is listed below. The schema defines element <gazetteer-standard-report> in namespace "http://www.alexandria.ucsb.edu/gazetteer". Subelements <identifier>, <names>, <footprints>, <classes>, and <relationships> and attributes primary and historical correspond directly to the model.

Each footprint may be described either directly using a <footprint> element or indirectly using a <footprint-reference> element. In the direct case the footprint is defined as a single subelement (the "footprint-defining element") of the <footprint> element. In the indirect case, the footprint-defining element is indirectly referred to by a URL, and the optional geometry-type and num-points attributes can be used to give clients an indication of the size and nature of the footprint. Attribute geometry-type, if present, must be the unqualified XML name of the footprint-defining element and num-points must be the number of points in the geometry.

In both of the above cases, the possible footprint-defining elements may be drawn from the Open GIS Consortium's Geography Markup Language (GML) or from another geometry language supported by the gazetteer; see Capabilities, below. Support for GML is mandatory. GML's footprint-defining elements (<gml:Box> and elements in class gml:_Geometry) are defined in terms of an abstract Cartesian coordinate system, but we mandate here that the coordinate system must be the WGS84 latitude/longitude coordinate system. Specifically, the first (X) coordinate must be longitude in signed decimal degrees east of the Greenwich meridian and the second (Y) coordinate must be latitude in signed decimal degrees north of the equator.

Element <bounding-box> is the bounding box (i.e., the smallest enclosing graticule-aligned rectangle) of the entry's primary footprint.
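For a footprint given as a list of points, the bounding box can be computed as in the following Python sketch. It assumes points are (longitude, latitude) pairs in the coordinate order mandated above, and it assumes the footprint does not cross the 180° meridian; handling that case is left out. The function name is hypothetical.

```python
def bounding_box(points):
    """Return ((min_lon, min_lat), (max_lon, max_lat)) for (lon, lat) pairs.

    Assumption: the footprint does not cross the 180-degree meridian, so
    the smallest enclosing rectangle is given by plain minima and maxima.
    """
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return (min(lons), min(lats)), (max(lons), max(lats))
```

The two returned corners correspond to the two coordinate pairs of a <bounding-box> element, lower-left first.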

gazetteer-standard-report.xsd
<?xml version="1.0" encoding="UTF-8"?>

<schema xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:gaz="http://www.alexandria.ucsb.edu/gazetteer"
  xmlns:gml="http://www.opengis.net/gml"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  targetNamespace="http://www.alexandria.ucsb.edu/gazetteer"
  elementFormDefault="qualified">

<import namespace="http://www.opengis.net/gml"
  schemaLocation="geometry.xsd"/>

<import namespace="http://www.w3.org/1999/xlink"
  schemaLocation="xlinks.xsd"/>

<attributeGroup name="qualifiers">
  <attribute name="primary" type="boolean" default="false"/>
  <attribute name="historical" type="boolean"
    default="false"/>
</attributeGroup>

<element name="gazetteer-standard-report">
  <complexType>
    <sequence>

      <element name="identifier" type="string"/>

      <element name="names">
        <complexType>
          <sequence>
            <element name="name" maxOccurs="unbounded">
              <complexType>
                <simpleContent>
                  <extension base="string">
                    <attributeGroup ref="gaz:qualifiers"/>
                  </extension>
                </simpleContent>
              </complexType>
            </element>
          </sequence>
        </complexType>
      </element>

      <element name="bounding-box" type="gml:BoxType"/>

      <element name="footprints">
        <complexType>
          <choice maxOccurs="unbounded">
            <element name="footprint">
              <complexType>
                <choice>
                  <element ref="gml:_Geometry"/>
                  <element ref="gml:Box"/>
                  <element name="other-footprint">
                    <complexType>
                      <sequence>
                        <any processContents="lax"/>
                      </sequence>
                    </complexType>
                  </element>
                </choice>
                <attributeGroup ref="gaz:qualifiers"/>
              </complexType>
            </element>
            <element name="footprint-reference">
              <complexType>
                <attributeGroup ref="xlink:locatorLink"/>
                <attribute name="geometry-type">
                  <simpleType>
                    <restriction base="string">
                      <enumeration value="Box"/>
                      <enumeration value="Point"/>
                      <enumeration value="LineString"/>
                      <enumeration value="Polygon"/>
                      <enumeration value="MultiPoint"/>
                      <enumeration
                        value="MultiLineString"/>
                      <enumeration value="MultiPolygon"/>
                      <enumeration value="other"/>
                    </restriction>
                  </simpleType>
                </attribute>
                <attribute name="num-points"
                  type="positiveInteger"/>
                <attributeGroup ref="gaz:qualifiers"/>
              </complexType>
            </element>
          </choice>
        </complexType>
      </element>

      <element name="classes" minOccurs="0">
        <complexType>
          <sequence>
            <element name="class" minOccurs="0"
              maxOccurs="unbounded">
              <complexType>
                <simpleContent>
                  <extension base="string">
                    <attribute name="thesaurus"
                      type="string" use="required"/>
                    <attributeGroup ref="gaz:qualifiers"/>
                  </extension>
                </simpleContent>
              </complexType>
            </element>
          </sequence>
        </complexType>
      </element>

      <element name="relationships" minOccurs="0">
        <complexType>
          <sequence>
            <element name="relationship" minOccurs="0"
              maxOccurs="unbounded">
              <complexType>
                <attribute name="name" type="string"
                  use="required"/>
                <attribute name="identifier" type="string"
                  use="required"/>
              </complexType>
            </element>
          </sequence>
        </complexType>
      </element>

    </sequence>
  </complexType>
</element>

</schema>

Here's an example of a standard report with an indirect footprint:

<?xml version="1.0" encoding="UTF-8"?>

<gazetteer-standard-report
  xmlns="http://www.alexandria.ucsb.edu/gazetteer"
  xmlns:gml="http://www.opengis.net/gml"
  xmlns:xlink="http://www.w3.org/1999/xlink">

<identifier>1001652</identifier>

<names>
  <name primary="true">Las Vegas</name>
  <name>Sin City</name>
</names>

<bounding-box>
  <gml:coord>
    <gml:X>-115.25</gml:X>
    <gml:Y>36.15</gml:Y>
  </gml:coord>
  <gml:coord>
    <gml:X>-115.12</gml:X>
    <gml:Y>36.25</gml:Y>
  </gml:coord>
</bounding-box>

<footprints>
  <footprint-reference xlink:href="http://..."
    geometry-type="Polygon" num-points="4632"
    primary="true"/>
</footprints>

<classes>
  <class thesaurus="ADL Feature Type Thesaurus"
    primary="true">cities</class>
</classes>

<relationships>
  <relationship name="principal-city-of"
    identifier="1241232"/>
</relationships>

</gazetteer-standard-report>

The footprint corresponding to the above example might look something like this:

<?xml version="1.0" encoding="UTF-8"?>

<Polygon xmlns="http://www.opengis.net/gml">
  <outerBoundaryIs>
    <LinearRing>
      <coordinates>-115.12,36.25 -115.17,...</coordinates>
    </LinearRing>
  </outerBoundaryIs>
</Polygon>
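
The geometry-type and num-points attributes of a <footprint-reference> element can be derived from the footprint-defining element itself. The Python sketch below does this for GML geometries, assuming GML's default separators inside <coordinates> (commas between values, whitespace between points) and counting <coord> elements as one point each. The function name is hypothetical, and the sketch does not handle non-GML geometry languages.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"


def describe_footprint(xml_text):
    """Return (geometry_type, num_points) for a GML footprint-defining element."""
    root = ET.fromstring(xml_text)
    # Unqualified XML name of the footprint-defining element.
    geometry_type = root.tag.split("}")[-1]
    num_points = 0
    for coords in root.iter(f"{{{GML_NS}}}coordinates"):
        # Assumed default separators: whitespace between points.
        num_points += len((coords.text or "").split())
    # Geometries such as <Box> may list points as <coord> elements instead.
    num_points += sum(1 for _ in root.iter(f"{{{GML_NS}}}coord"))
    return geometry_type, num_points
```

Applied to a complete version of the polygon above, the first member of the result would be "Polygon" and the second the number of coordinate pairs in its ring.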

Query language

The query service, described under Services above, returns all gazetteer entries that satisfy one or more constraints placed against entry attributes. The constraints are expressed in the form of a language.

The gazetteer query language consists of boolean combinations (and, or, and and-not) of five types of queries. Support for any given type of query is optional. The query types are as follows:

identifier-query identifier

Returns the gazetteer entry identified by identifier.

name-query operator text

Returns all gazetteer entries having at least one name that matches text according to text-matching operator operator. If a gazetteer supports name queries, it must support the following operator:

equals
A gazetteer entry name matches text if it equals text, ignoring insignificant differences in whitespace.

Other text-matching operators gazetteers are encouraged to support include:

contains-all-words
A gazetteer entry name matches text if it contains all of the words in text. For example, entry name "San Luis Obispo" matches text "obispo luis" under this operator.
contains-any-words
A gazetteer entry name matches text if it contains any of the words in text. For example, entry name "Hope Ranch" matches text "hope" under this operator.
contains-phrase
A gazetteer entry name matches text if it contains all of the words in text in the same consecutive order. For example, entry name "Black Forest Drive" matches text "forest drive" under this operator, but entry names "Forest Lake Drive" and "Drive Forest" do not.
matches-pattern
A gazetteer entry name matches text if it matches text when the latter is treated as a simple wildcard pattern (a restricted form of regular expression). Specifically, an asterisk ("*") in text matches zero or more characters and a question mark ("?") matches any single character. Note that a gazetteer implementation may limit the patterns it accepts. For example, a gazetteer may support right truncation only (i.e., it may accept asterisks only at the end of text).

The semantics of all of the above operators have deliberately been left somewhat fuzzy to accommodate differing implementations. Specifically, exactly what constitutes a word is left undefined, and it is unspecified whether the gazetteer implementation employs word stemming or other fuzzy word matching techniques. In any case, the above operators should be case-insensitive.
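One possible reading of the text-matching operators is sketched below in Python. Because the protocol leaves the details open, the sketch makes its own assumptions: a word is a maximal run of non-whitespace characters, case is folded, and no stemming is applied. The function name name_matches is hypothetical.

```python
import re


def _words(text):
    # Assumption: a "word" is a maximal run of non-whitespace characters.
    return text.casefold().split()


def name_matches(name, operator, text):
    """Return True if an entry name matches text under the given operator."""
    name_words, text_words = _words(name), _words(text)
    if operator == "equals":
        # Comparing word lists ignores differences in whitespace and case.
        return name_words == text_words
    if operator == "contains-all-words":
        return all(w in name_words for w in text_words)
    if operator == "contains-any-words":
        return any(w in name_words for w in text_words)
    if operator == "contains-phrase":
        n = len(text_words)
        return any(name_words[i:i + n] == text_words
                   for i in range(len(name_words) - n + 1))
    if operator == "matches-pattern":
        # "*" matches zero or more characters, "?" exactly one.
        pattern = "".join(
            ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
            for ch in text.casefold())
        return re.fullmatch(pattern, name.casefold(), re.DOTALL) is not None
    raise ValueError(f"unsupported operator: {operator}")
```

Under these assumptions the examples given above behave as described: "San Luis Obispo" matches "obispo luis" under contains-all-words, and "Black Forest Drive" matches "forest drive" under contains-phrase while "Forest Lake Drive" does not.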

footprint-query operator {polygon | box | identifier}

Returns all gazetteer entries having a footprint that matches a query region according to spatial operator operator. (If a gazetteer entry has multiple footprints, it is unspecified which footprint(s) are used for matching.) The query region may take any of the three forms listed next; note that support for any given form is optional.

polygon
A simple polygon with geodesic edges, defined in WGS84 latitude/longitude coordinates.
box
A rectangle whose edges are aligned with the WGS84 latitude/longitude graticule.
identifier
One of the footprints of the gazetteer entry identified by identifier (which footprint is unspecified).

If a gazetteer supports footprint queries, it must support the following operator:

within
A gazetteer entry footprint matches the query region if the footprint is a subset of the region.

Other spatial operators gazetteers are encouraged to support include:

contains
A gazetteer entry footprint matches the query region if the footprint is a superset of the region.
overlaps
A gazetteer entry footprint matches the query region if the footprint intersects the region.

A gazetteer implementation may limit the query regions it accepts. For example, an implementation may disallow polygons that enclose a pole. Also, an implementation may support matching on footprint bounding boxes only.
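For an implementation that matches on bounding boxes only, the three spatial operators reduce to simple comparisons. The following Python sketch illustrates this under the assumption that neither box crosses the 180° meridian; the Box type and function names are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A graticule-aligned rectangle in signed decimal degrees."""
    west: float
    south: float
    east: float
    north: float


def box_within(inner, outer):
    """Return True if inner is a subset of outer."""
    return (outer.west <= inner.west and inner.east <= outer.east
            and outer.south <= inner.south and inner.north <= outer.north)


def footprint_matches(footprint, operator, region):
    """Apply a spatial operator to a footprint box and a query region box."""
    if operator == "within":
        return box_within(footprint, region)
    if operator == "contains":
        return box_within(region, footprint)
    if operator == "overlaps":
        return (footprint.west <= region.east and region.west <= footprint.east
                and footprint.south <= region.north
                and region.south <= footprint.north)
    raise ValueError(f"unsupported operator: {operator}")
```

Note that, in this sketch, boxes that merely touch along an edge are treated as overlapping; an implementation could reasonably choose otherwise.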

class-query thesaurus term

Returns all gazetteer entries belonging to class term, or any subclass of term recursively (if the gazetteer supports subclasses or thesaurus relationships), where term is a term drawn from a thesaurus or simple vocabulary associated with the gazetteer. For example, if class "capital cities" is a subclass (i.e., a specialization) of class "cities", then a class query of "cities" will return all cities (capital and not) whereas a query of "capital cities" will return only capital cities.
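The recursive subclass behavior can be illustrated with a small Python sketch. It assumes a single thesaurus represented as a hypothetical mapping from each term to its immediate subclasses, and entries represented as a mapping from identifier to class terms; neither representation is prescribed by the protocol.

```python
def expand_term(term, narrower):
    """Return the set containing term and all of its subclasses, recursively."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(narrower.get(current, ()))
    return found


def class_query(entries, term, narrower):
    """Return identifiers of entries belonging to term or any subclass of it."""
    wanted = expand_term(term, narrower)
    return [ident for ident, classes in entries.items()
            if wanted & set(classes)]
```

With "capital cities" registered as a subclass of "cities", a query for "cities" selects entries of both classes, whereas a query for "capital cities" selects only the latter.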

relationship-query relationship identifier

Returns all gazetteer entries having the relationship named by relationship to a target gazetteer entry identified by identifier. Note that a gazetteer must not consider a relationship query with an inappropriate target to be malformed or erroneous. For example, suppose a gazetteer supports the capital-of relationship, but only for target gazetteer entries that are countries. A relationship query in which the target is a cemetery is not to be considered malformed, but should simply yield zero results.
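
A minimal sketch of this rule, assuming a hypothetical in-memory table keyed by relationship name and target identifier (the identifiers shown are invented): a lookup with an inappropriate target simply finds nothing and returns an empty result rather than raising an error.

```python
# Hypothetical relationship table: (relationship, target identifier) ->
# identifiers of the entries standing in that relationship to the target.
RELATED = {
    ("capital-of", "country:france"): ["city:paris"],
}


def relationship_query(relationship, identifier, table=RELATED):
    """Return entries having the given relationship to the target entry.

    An inappropriate target (for example, a cemetery as the target of
    capital-of) is not an error; it yields zero results.
    """
    return list(table.get((relationship, identifier), []))
```

For example, relationship_query("capital-of", "cemetery:123") returns an empty list.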

Clients should be aware that a gazetteer implementation might not be able to search over all attributes of a gazetteer entry. For example, an implementation may be able to search over primary names only.

An XML schema for the gazetteer query language is listed below. The schema defines element <gazetteer-query> in namespace "http://www.alexandria.ucsb.edu/gazetteer". Subelements <identifier-query>, <name-query>, <footprint-query>, <class-query>, and <relationship-query> correspond to the query types described above. The elements <and>, <or>, and <and-not> support boolean combinations of queries.

Query regions in footprint queries may be specified using the Open GIS Consortium's Geography Markup Language (GML) or another geometry language supported by the gazetteer; see Capabilities, below. Support for GML is mandatory. GML defines the <gml:Box> and <gml:Polygon> elements in terms of an abstract Cartesian coordinate system, but we mandate here that the coordinate system must be the WGS84 latitude/longitude coordinate system. Specifically, the first (X) coordinate must be longitude in signed decimal degrees east of the Greenwich meridian and the second (Y) coordinate must be latitude in signed decimal degrees north of the equator.
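
Because the coordinate order (longitude first, then latitude) is easy to get wrong, a client may wish to format and check coordinates in one place. The helper below is a hypothetical sketch. It assumes that the two coordinate tuples of a <gml:Box> are the lower-left and upper-right corners, and it rejects values outside the valid longitude and latitude ranges, which catches most accidental swaps.

```python
def gml_box_coordinates(west, south, east, north):
    """Format a box as GML coordinates text: "lon,lat lon,lat".

    Longitudes are signed decimal degrees east of Greenwich; latitudes
    are signed decimal degrees north of the equator.
    """
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitude must lie between -180 and 180")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitude must lie between -90 and 90, south <= north")
    # First (X) coordinate is longitude, second (Y) coordinate is latitude.
    return f"{west},{south} {east},{north}"


print(gml_box_coordinates(-120.5, 34.0, -119.0, 35.0))
# prints: -120.5,34.0 -119.0,35.0
```

Passing the same box with latitude and longitude exchanged, as in gml_box_coordinates(34.0, -120.5, 35.0, -119.0), raises ValueError because -120.5 is not a valid latitude.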

gazetteer-query.xsd
<?xml version="1.0" encoding="UTF-8"?>

<schema xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:gaz="http://www.alexandria.ucsb.edu/gazetteer"
  xmlns:gml="http://www.opengis.net/gml"
  targetNamespace="http://www.alexandria.ucsb.edu/gazetteer"
  elementFormDefault="qualified">

<import namespace="http://www.opengis.net/gml"
  schemaLocation="geometry.xsd"/>

<element name="gazetteer-query">
  <complexType>
    <sequence>
      <group ref="gaz:query"/>
    </sequence>
  </complexType>
</element>

<group name="query">
  <choice>
    <element ref="gaz:identifier-query"/>
    <element ref="gaz:name-query"/>
    <element ref="gaz:footprint-query"/>
    <element ref="gaz:class-query"/>
    <element ref="gaz:relationship-query"/>
    <element ref="gaz:and"/>
    <element ref="gaz:or"/>
    <element ref="gaz:and-not"/>
  </choice>
</group>

<element name="identifier-query">
  <complexType>
    <attribute name="identifier" type="string"
      use="required"/>
  </complexType>
</element>

<element name="name-query">
  <complexType>
    <attribute name="operator" use="required">
      <simpleType>
        <restriction base="string">
          <enumeration value="contains-all-words"/>
          <enumeration value="contains-any-words"/>
          <enumeration value="contains-phrase"/>
          <enumeration value="equals"/>
          <enumeration value="matches-pattern"/>
        </restriction>
      </simpleType>
    </attribute>
    <attribute name="text" type="string" use="required"/>
  </complexType>
</element>

<element name="footprint-query">
  <complexType>
    <choice>
      <element ref="gml:Box"/>
      <element ref="gml:Polygon"/>
      <element name="identifier" type="string"/>
      <element name="other-region">
        <complexType>
          <sequence>
            <any processContents="lax"/>
          </sequence>
        </complexType>
      </element>
    </choice>
    <attribute name="operator" use="required">
      <simpleType>
        <restriction base="string">
          <enumeration value="contains"/>
          <enumeration value="overlaps"/>
          <enumeration value="within"/>
        </restriction>
      </simpleType>
    </attribute>
  </complexType>
</element>

<element name="class-query">
  <complexType>
    <attribute name="thesaurus" type="string"
      use="required"/>
    <attribute name="term" type="string" use="required"/>
  </complexType>
</element>

<element name="relationship-query">
  <complexType>
    <attribute name="relationship" type="string"
      use="required"/>
    <attribute name="identifier" type="string"
      use="required"/>
  </complexType>
</element>

<element name="and">
  <complexType>
    <sequence>
      <group ref="gaz:query" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</element>

<element name="or">
  <complexType>
    <sequence>
      <group ref="gaz:query" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</element>

<element name="and-not">
  <complexType>
    <sequence>
      <group ref="gaz:query" minOccurs="2" maxOccurs="2"/>
    </sequence>
  </complexType>
</element>

</schema>
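
The boolean elements can be evaluated as set operations over the results of the leaf queries. The following sketch assumes that <and-not> means "matches the first operand but not the second", and that a hypothetical callback, run_leaf, returns the set of matching entry identifiers for a single leaf query. The canned results and the thesaurus name "T" are invented for illustration.

```python
import xml.etree.ElementTree as ET


def evaluate(element, run_leaf):
    """Evaluate a query element to a set of entry identifiers."""
    tag = element.tag.split("}")[-1]  # strip the "{namespace}" prefix
    children = list(element)
    if tag == "gazetteer-query":
        return evaluate(children[0], run_leaf)
    if tag == "and":
        return set.intersection(*(evaluate(c, run_leaf) for c in children))
    if tag == "or":
        return set.union(*(evaluate(c, run_leaf) for c in children))
    if tag == "and-not":
        first, second = children  # the schema requires exactly two operands
        return evaluate(first, run_leaf) - evaluate(second, run_leaf)
    # Any other element is a leaf query (identifier, name, footprint, ...).
    return set(run_leaf(element))


QUERY = """<gazetteer-query xmlns="http://www.alexandria.ucsb.edu/gazetteer">
  <and-not>
    <name-query operator="contains-phrase" text="santa barbara"/>
    <or>
      <class-query thesaurus="T" term="populated places"/>
      <class-query thesaurus="T" term="cemeteries"/>
    </or>
  </and-not>
</gazetteer-query>"""

# Invented leaf results, keyed by the leaf's text or term attribute.
CANNED = {
    "santa barbara": {"e1", "e2", "e3"},
    "populated places": {"e1"},
    "cemeteries": {"e3", "e9"},
}


def run_leaf(leaf):
    return CANNED[leaf.get("text") or leaf.get("term")]


result = evaluate(ET.fromstring(QUERY), run_leaf)
print(result)  # prints: {'e2'}
```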

An example of a gazetteer query is shown below. This example requests all places whose names contain the phrase "santa barbara", that overlap a given spatial region, and that are neither populated places nor cemeteries. A place named "Santa Barbara County Hospital" might match such a query.

<?xml version="1.0" encoding="UTF-8"?>

<gazetteer-query
  xmlns="http://www.alexandria.ucsb.edu/gazetteer"
  xmlns:gml="http://www.opengis.net/gml">

<and-not>
  <and>
    <name-query operator="contains-phrase"
      text="santa barbara"/>
    <footprint-query operator="overlaps">
      <gml:Box>
        <gml:coordinates>-140,30 110,35</gml:coordinates>
      </gml:Box>
    </footprint-query>
  </and>
  <or>
    <class-query thesaurus="ADL Feature Type Thesaurus"
      term="populated places"/>
    <class-query thesaurus="ADL Feature Type Thesaurus"
      term="cemeteries"/>
  </or>
</and-not>

</gazetteer-query>
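
A client might assemble such a query programmatically rather than by string concatenation. The sketch below uses Python's standard xml.etree.ElementTree module; the function build_query and its parameters are hypothetical, and how the resulting document is submitted to a gazetteer is outside the scope of this example.

```python
import xml.etree.ElementTree as ET

GAZ = "http://www.alexandria.ucsb.edu/gazetteer"
GML = "http://www.opengis.net/gml"

# Serialize with the gazetteer namespace as default and "gml" as a prefix.
ET.register_namespace("", GAZ)
ET.register_namespace("gml", GML)


def build_query(phrase, box, excluded_terms, thesaurus):
    """Build a name-and-footprint query that excludes the given classes.

    box is (west, south, east, north) in WGS84 decimal degrees.
    """
    root = ET.Element(f"{{{GAZ}}}gazetteer-query")
    and_not = ET.SubElement(root, f"{{{GAZ}}}and-not")

    both = ET.SubElement(and_not, f"{{{GAZ}}}and")
    ET.SubElement(both, f"{{{GAZ}}}name-query",
                  {"operator": "contains-phrase", "text": phrase})
    footprint = ET.SubElement(both, f"{{{GAZ}}}footprint-query",
                              {"operator": "overlaps"})
    gml_box = ET.SubElement(footprint, f"{{{GML}}}Box")
    west, south, east, north = box
    # Longitude first, then latitude, as mandated above.
    ET.SubElement(gml_box, f"{{{GML}}}coordinates").text = (
        f"{west},{south} {east},{north}")

    either = ET.SubElement(and_not, f"{{{GAZ}}}or")
    for term in excluded_terms:
        ET.SubElement(either, f"{{{GAZ}}}class-query",
                      {"thesaurus": thesaurus, "term": term})
    return ET.tostring(root, encoding="unicode")


xml_text = build_query("santa barbara", (-140, 30, 110, 35),
                       ["populated places", "cemeteries"],
                       "ADL Feature Type Thesaurus")
```

The resulting xml_text has the same element structure as the example above, apart from whitespace and the XML declaration.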

Capabilities

The get-capabilities service described under Services above returns a description of a gazetteer's overall capabilities. An XML schema for the description is listed below. The schema defines element <gazetteer-capabilities> in namespace "http://www.alexandria.ucsb.edu/gazetteer". Within this element are the following subelements:

<version>
The version of the gazetteer protocol the gazetteer supports.
<description>
A human-readable description of the gazetteer. It is suggested that the description include: the scope and purpose of the gazetteer; details on the gazetteer's interpretation and implementation of the protocol; appropriate usage guidelines; and rights and liability clauses.
<extended-report-schema>
If the gazetteer supports extended reports, the URL of the reports' XML schema.
<thesauri>
The thesauri (or simple vocabularies) the gazetteer uses to classify its entries. Each thesaurus is described by a name and the URL of its ADL Thesaurus Protocol interface.
<relationships>
The names of the relationships the gazetteer is capable of representing.
<other-geometry-languages>
The geometry languages the gazetteer supports (other than GML, which is required). Each language is described by an XML namespace.
<services>
The services the gazetteer supports.
<query-types>
The types of queries the gazetteer supports.
<name-query-operators>
If the gazetteer supports name queries, the text-matching operators the gazetteer supports.
<footprint-query-operators> and <footprint-query-operands>
If the gazetteer supports footprint queries, the spatial operators and geometry types the gazetteer supports.

gazetteer-capabilities.xsd
<?xml version="1.0" encoding="UTF-8"?>

<schema xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:gaz="http://www.alexandria.ucsb.edu/gazetteer"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  targetNamespace="http://www.alexandria.ucsb.edu/gazetteer"
  elementFormDefault="qualified">

<import namespace="http://www.w3.org/1999/xlink"
  schemaLocation="xlinks.xsd"/>

<element name="gazetteer-capabilities">
  <complexType>
    <sequence>

      <element name="version" type="string" default="1.1"/>

      <element name="description" type="string"
        minOccurs="0"/>

      <element name="extended-report-schema" minOccurs="0">
        <complexType>
          <attributeGroup ref="xlink:locatorLink"/>
        </complexType>
      </element>

      <element name="thesauri" minOccurs="0">
        <complexType>
          <sequence>
            <element name="thesaurus" minOccurs="0"
              maxOccurs="unbounded">
              <complexType>
                <attribute name="name" type="string"
                  use="required"/>
                <attributeGroup ref="xlink:locatorLink"/>
              </complexType>
            </element>
          </sequence>
        </complexType>
      </element>

      <element name="relationships" minOccurs="0">
        <complexType>
          <sequence>
            <element name="relationship" type="string"
              minOccurs="0" maxOccurs="unbounded"/>
          </sequence>
        </complexType>
      </element>

      <element name="other-geometry-languages"
        minOccurs="0">
        <complexType>
          <sequence>
            <element name="geometry-language" minOccurs="0"
              maxOccurs="unbounded">
              <complexType>
                <attribute name="namespace" type="anyURI"/>
              </complexType>
            </element>
          </sequence>
        </complexType>
      </element>

      <element name="services">
        <complexType>
          <attribute name="get-capabilities" type="boolean"
            fixed="true"/>
          <attribute name="query" type="boolean"
            default="false"/>
          <attribute name="download" type="boolean"
            default="false"/>
          <attribute name="add-entry" type="boolean"
            default="false"/>
          <attribute name="relate-entries" type="boolean"
            default="false"/>
          <attribute name="remove-entry" type="boolean"
            default="false"/>
        </complexType>
      </element>

      <element name="query-types" minOccurs="0">
        <complexType>
          <attribute name="identifier" type="boolean"
            default="false"/>
          <attribute name="name" type="boolean"
            default="false"/>
          <attribute name="footprint" type="boolean"
            default="false"/>
          <attribute name="class" type="boolean"
            default="false"/>
          <attribute name="relationship" type="boolean"
            default="false"/>
        </complexType>
      </element>

      <element name="name-query-operators" minOccurs="0">
        <complexType>
          <attribute name="contains-all-words"
            type="boolean" default="false"/>
          <attribute name="contains-any-words"
            type="boolean" default="false"/>
          <attribute name="contains-phrase"
            type="boolean" default="false"/>
          <attribute name="equals" type="boolean"
            fixed="true"/>
          <attribute name="matches-pattern"
            type="boolean" default="false"/>
        </complexType>
      </element>

      <element name="footprint-query-operators"
        minOccurs="0">
        <complexType>
          <attribute name="contains" type="boolean"
            default="false"/>
          <attribute name="overlaps" type="boolean"
            default="false"/>
          <attribute name="within" type="boolean"
            fixed="true"/>
        </complexType>
      </element>

      <element name="footprint-query-operands"
        minOccurs="0">
        <complexType>
          <attribute name="box" type="boolean"
            default="false"/>
          <attribute name="identifier" type="boolean"
            default="false"/>
          <attribute name="polygon" type="boolean"
            default="false"/>
        </complexType>
      </element>

    </sequence>
  </complexType>
</element>

</schema>

An example of a gazetteer capabilities description is shown below.

<?xml version="1.0" encoding="UTF-8"?>

<gazetteer-capabilities
  xmlns="http://www.alexandria.ucsb.edu/gazetteer"
  xmlns:xlink="http://www.w3.org/1999/xlink">

<version>1.1</version>

<description>This gazetteer...</description>

<extended-report-schema xlink:href="http://..."/>

<thesauri>
  <thesaurus name="ADL Feature Type Thesaurus"
    xlink:href="http://www.alexandria.ucsb.edu/..."/>
</thesauri>

<relationships>
  <relationship>adjacent-to</relationship>
  <relationship>capital-of</relationship>
</relationships>

<other-geometry-languages>
  <geometry-language
    namespace="http://www.esri.com/ArcXML"/>
</other-geometry-languages>

<services query="true" add-entry="true"/>

<query-types identifier="true" name="true" footprint="true"
  class="true"/>

<name-query-operators contains-all-words="true"
  contains-any-words="true" contains-phrase="true"/>

<footprint-query-operators contains="true"/>

<footprint-query-operands box="true" identifier="true"/>

</gazetteer-capabilities>
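
Note that many attributes in a capabilities description are typically omitted and take their default or fixed values from the schema. A parser that does not validate against the schema, such as Python's xml.etree.ElementTree, will not supply those values, so a client must apply them itself. The following sketch shows one way to do so; the DEFAULTS table is transcribed from gazetteer-capabilities.xsd, and the function name read_flags is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"gaz": "http://www.alexandria.ucsb.edu/gazetteer"}

# Default and fixed attribute values from gazetteer-capabilities.xsd.
DEFAULTS = {
    "services": {"get-capabilities": True, "query": False,
                 "download": False, "add-entry": False,
                 "relate-entries": False, "remove-entry": False},
    "query-types": {"identifier": False, "name": False, "footprint": False,
                    "class": False, "relationship": False},
    "name-query-operators": {"contains-all-words": False,
                             "contains-any-words": False,
                             "contains-phrase": False, "equals": True,
                             "matches-pattern": False},
    "footprint-query-operators": {"contains": False, "overlaps": False,
                                  "within": True},
    "footprint-query-operands": {"box": False, "identifier": False,
                                 "polygon": False},
}


def read_flags(capabilities_xml):
    """Return the boolean capability flags, with schema defaults applied."""
    root = ET.fromstring(capabilities_xml)
    flags = {}
    for name, defaults in DEFAULTS.items():
        element = root.find(f"gaz:{name}", NS)
        if element is None:
            continue  # optional element absent; nothing to report
        flags[name] = {
            # xs:boolean accepts "true"/"false" and "1"/"0".
            attr: element.get(attr, str(default).lower()) in ("true", "1")
            for attr, default in defaults.items()
        }
    return flags


# Abbreviated form of the example description above.
SAMPLE = """<gazetteer-capabilities
    xmlns="http://www.alexandria.ucsb.edu/gazetteer">
  <version>1.1</version>
  <services query="true" add-entry="true"/>
  <query-types identifier="true" name="true" footprint="true" class="true"/>
  <name-query-operators contains-all-words="true"
    contains-any-words="true" contains-phrase="true"/>
  <footprint-query-operators contains="true"/>
  <footprint-query-operands box="true" identifier="true"/>
</gazetteer-capabilities>"""

flags = read_flags(SAMPLE)
```

For the sample, flags["footprint-query-operators"] reports contains and within as supported and overlaps as unsupported, even though only contains appears explicitly in the description.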

Revision history

1.1
Swapped the interpretation of the GML first (X) and second (Y) coordinates. Added a <description> subelement to the <gazetteer-capabilities> element.
1.0a
Clarified the meaning of a gazetteer entry having more than one footprint. Other, minor changes.
1.0
Original version.

Greg Janée
Last modified: 2002-12-09 19:58