Simple Geometry Language

This page describes some incomplete work undertaken in 2003 to define a standard, XML-based language for describing geographic regions, i.e., geometric regions on the Earth's surface. For brevity, we'll call any such language a geometry language. A geometry language defines a set of possible shapes and standard representations and encodings of those shapes, and also addresses the handling of cartographic quantities (Earth datums, projections, and coordinate systems), either by mandating standard quantities or by providing standard declaration mechanisms.

The motivation for a standard geometry language is rooted in the observation that every system, service, or effort that has had to deal with geographic regions has ended up defining its own geometry language. All these languages offer broadly similar capabilities, though to varying degrees, yet each has enough idiosyncrasies to bedevil easy interoperability. It is instructive to compare and contrast the geometry languages embedded in specifications such as:

(A number of additional geometry languages are derived from one or more of the above.) A standard geometry language would facilitate interoperability across different systems, particularly among consumers of geographic regions such as renderers and spatial indexers.

From the perspective of distributed geospatial digital libraries and distributed gazetteer services, which use geometry only for the limited purposes of representing object footprints and query regions and performing spatial comparisons between the two, a geometry language must satisfy three requirements:

  1. The language must support enough possible shapes, and complex enough shapes, that spatial matching over those shapes yields acceptable search precision. For gazetteers, a sufficient set of shapes is not known, but necessary shapes include points for point features such as water wells, polylines for linear features such as rivers, and at least simple polygons for areal features.
  2. The spatial reference system (SRS) in which shapes are defined (i.e., the Earth datum and coordinate system) must not be mandated by the language, but must be declarable in a standard way. Mandating a particular SRS forces language users to translate between SRSs, which can be mathematically complex and can introduce unintended consequences such as the formation of aggregate shapes.
  3. The language must provide a lingua franca that virtually all geometry producers and consumers can operate on; in practice, due to simplicity of implementation, ease of mappability, and general widespread support, the lingua franca is latitude/longitude-aligned minimum bounding rectangles, or bounding boxes for short.
    1. Notwithstanding requirement 2 above, to support interoperability, bounding boxes must be defined in a standard SRS, e.g., WGS84 latitude/longitude coordinates. (It is reasonably easy to compute such bounding boxes from commonly used cylindrical and polar projections.)
    2. In principle, bounding boxes are deterministically computable from primary shapes; nevertheless, bounding boxes must explicitly accompany all primary shapes in instance documents. To fail in this regard is to place the burden of computing bounding boxes on the very geometry consumers that are incapable of doing so: those that rely on bounding boxes because they're incapable of operating on more complex shapes.
    3. Bounding boxes must be defined in a manner that supports geodetic continuity, that is, in a manner that recognizes that the Earth is, topologically, a sphere. In particular, there must be no line that bounding boxes are forbidden to cross, such as the ±180° meridian in many geometry languages.

The Open GIS Consortium's Geography Markup Language (GML), version 3.0, is one well-known attempt to define a standard geometry language. It is a comprehensive specification having many desirable characteristics, but it suffers from two defects that are shared by many of the aforementioned geometry languages. First, in balancing the concerns of consumers of the language, who generally prefer uniformity and simplicity, against those of producers, who generally prefer expressiveness and flexibility, GML weighs heavily in favor of producers. It defines many, many possible shapes and shape-related options. The effect of this imbalance is that, in practice, consumers cannot, and do not, accept more than an idiosyncratic fraction of the entire GML language. The second defect is that GML meets none of the conditions of requirement 3 above.

The XML schema below represents a first effort at defining a geometry language that addresses these concerns. The language is a profile of GML, that is, a subset and logical restriction of GML such that any instance document that adheres to the language below also adheres to GML and can be interpreted by any GML consumer.

This deliberately simple geometry language supports just three possible shapes: points, polylines ("linestrings" in GML parlance), and simple (i.e., self-intersection-free and hole-free) polygons. Each shape is represented in the language by both an XML schema type (e.g., PolygonType) and an XML element (e.g., <Polygon>). However, the intention of the language is that only the schema type AbstractFeatureType be referenced by application schemas; because AbstractFeatureType requires both a <boundedBy> envelope and a <location> shape, this usage forces a bounding box to be associated with every shape in instance documents. SRSs can be declared on any shape using the srsName attribute, which every shape type inherits from AbstractGeometryType.
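As a sketch of the intended usage, an application schema might derive its own feature type from gml:AbstractFeatureType rather than referencing the shape elements directly. The namespace urn:example:gazetteer and the names GazetteerEntryType, GazetteerEntry, and placeName are hypothetical, and the schemaLocation assumes the base schema is saved as ADL-geometry.xsd next to this file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical application schema: the namespace and all names
     declared here are illustrative only. -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:gml="http://www.opengis.net/gml"
  xmlns:gaz="urn:example:gazetteer"
  targetNamespace="urn:example:gazetteer"
  elementFormDefault="qualified">

<import namespace="http://www.opengis.net/gml"
  schemaLocation="ADL-geometry.xsd"/>

<!-- Extending gml:AbstractFeatureType (instead of referencing gml:_Geometry
     directly) inherits the required gml:boundedBy and gml:location
     elements, so every shape travels with its bounding box. -->
<complexType name="GazetteerEntryType">
  <complexContent>
    <extension base="gml:AbstractFeatureType">
      <sequence>
        <!-- Application-specific content follows the inherited elements. -->
        <element name="placeName" type="string"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>

<element name="GazetteerEntry" type="gaz:GazetteerEntryType"/>

</schema>
```

With this design, a schema validator rejects any GazetteerEntry that omits its bounding box, so consumers that understand only bounding boxes never have to compute one themselves.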

ADL-geometry.xsd
<?xml version="1.0" encoding="UTF-8"?>

<schema xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:gml="http://www.opengis.net/gml"
  targetNamespace="http://www.opengis.net/gml"
  elementFormDefault="qualified">

<element name="coordinates" type="string"/>

<!-- needed only by ADL-geometry-extended.xsd -->
<element name="radius">
  <complexType>
    <simpleContent>
      <extension base="double">
        <attribute name="uom" type="anyURI" use="required"/>
      </extension>
    </simpleContent>
  </complexType>
</element>

<complexType name="AbstractGeometryType" abstract="true">
  <attribute name="srsName" type="anyURI"/>
</complexType>

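<!-- Declared abstract, as in GML, so that instance documents must use one of
     the concrete shapes in its substitution group in its place. -->
<element name="_Geometry" type="gml:AbstractGeometryType" abstract="true"/>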

<complexType name="PointType">
  <complexContent>
    <extension base="gml:AbstractGeometryType">
      <sequence>
        <element ref="gml:coordinates"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>

<element name="Point" type="gml:PointType"
  substitutionGroup="gml:_Geometry"/>

<complexType name="LineStringType">
  <complexContent>
    <extension base="gml:AbstractGeometryType">
      <sequence>
        <element ref="gml:coordinates"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>

<element name="LineString" type="gml:LineStringType"
  substitutionGroup="gml:_Geometry"/>

<complexType name="PolygonType">
  <complexContent>
    <extension base="gml:AbstractGeometryType">
      <sequence>
        <element name="exterior">
          <complexType>
            <sequence>
              <element name="LinearRing">
                <complexType>
                  <sequence>
                    <element ref="gml:coordinates"/>
                  </sequence>
                </complexType>
              </element>
            </sequence>
          </complexType>
        </element>
      </sequence>
    </extension>
  </complexContent>
</complexType>

<element name="Polygon" type="gml:PolygonType"
  substitutionGroup="gml:_Geometry"/>

<complexType name="AbstractFeatureType" abstract="true">
  <sequence>
    <element name="boundedBy">
      <complexType>
        <sequence>
          <element name="Envelope">
            <complexType>
              <sequence>
                <element ref="gml:coordinates"/>
              </sequence>
            </complexType>
          </element>
        </sequence>
      </complexType>
    </element>
    <element name="location">
      <complexType>
        <sequence>
          <element ref="gml:_Geometry"/>
        </sequence>
      </complexType>
    </element>
  </sequence>
</complexType>

</schema>
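The following instance document is a hedged illustration of what the profile permits. It assumes a hypothetical application schema declaring a feature element app:Lake (namespace urn:example:app) whose type derives from gml:AbstractFeatureType. The coordinates are made up and written as longitude,latitude pairs, which is an assumption because the schema itself does not fix axis order; the srsName value is likewise just one GML-era spelling of WGS84 (EPSG:4326), not something the schema mandates.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- app:Lake and urn:example:app are hypothetical; any element whose type
     derives from gml:AbstractFeatureType would be structured the same way. -->
<app:Lake xmlns:app="urn:example:app"
  xmlns:gml="http://www.opengis.net/gml">
  <!-- Required bounding box: lower corner, then upper corner
       (assumed longitude,latitude order). -->
  <gml:boundedBy>
    <gml:Envelope>
      <gml:coordinates>-120.0,34.0 -119.0,35.0</gml:coordinates>
    </gml:Envelope>
  </gml:boundedBy>
  <!-- The primary shape: a simple polygon whose exterior ring is closed
       by repeating the first coordinate tuple at the end. -->
  <gml:location>
    <gml:Polygon srsName="http://www.opengis.net/gml/srs/epsg.xml#4326">
      <gml:exterior>
        <gml:LinearRing>
          <gml:coordinates>-120.0,34.0 -119.0,34.0 -119.0,35.0 -120.0,34.5 -120.0,34.0</gml:coordinates>
        </gml:LinearRing>
      </gml:exterior>
    </gml:Polygon>
  </gml:location>
</app:Lake>
```

Because gml:location holds a gml:_Geometry, the polygon could equally be replaced by a gml:Point or gml:LineString; gml:boundedBy would still be required.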

The above geometry language, expressed as a profile of GML, has a number of nice properties, not the least of which is that it weeds out 99% of the 600-plus-page GML specification. However, there are a number of serious deficiencies which are still unresolved:

Finally, below is an extension to the above geometry language that adds a disk shape (defined by center and radius) and several convenience declarations. As an extension, it is necessarily incompatible with GML.

ADL-geometry-extended.xsd
<?xml version="1.0" encoding="UTF-8"?>

<schema xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:adlgml="tag:alexandria.ucsb.edu,2003:geometry"
  xmlns:gml="http://www.opengis.net/gml"
  targetNamespace="tag:alexandria.ucsb.edu,2003:geometry"
  elementFormDefault="qualified">

<import namespace="http://www.opengis.net/gml"
  schemaLocation="ADL-geometry.xsd"/>

<complexType name="DiskType">
  <complexContent>
    <extension base="gml:AbstractGeometryType">
      <sequence>
        <element ref="gml:coordinates"/>
        <element ref="gml:radius"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>

<element name="Disk" type="adlgml:DiskType"
  substitutionGroup="gml:_Geometry"/>

<complexType name="FeatureType">
  <complexContent>
    <extension base="gml:AbstractFeatureType"/>
  </complexContent>
</complexType>

<element name="Feature" type="adlgml:FeatureType"/>
<element name="Footprint" type="adlgml:FeatureType"/>
<element name="QueryRegion" type="adlgml:FeatureType"/>

</schema>
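For illustration, here is what a query region using the extension's disk shape might look like. The center, radius, srsName, and uom values are assumptions chosen for the example (the uom value is intended as the OGC URN for the metre), and coordinates are assumed to be longitude,latitude pairs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<adlgml:QueryRegion xmlns:adlgml="tag:alexandria.ucsb.edu,2003:geometry"
  xmlns:gml="http://www.opengis.net/gml">
  <!-- Bounding box of the disk, computed assuming a spherical Earth
       (1 degree of latitude ~ 111.195 km) and rounded outward to 0.01 degree. -->
  <gml:boundedBy>
    <gml:Envelope>
      <gml:coordinates>-119.96,34.32 -119.74,34.50</gml:coordinates>
    </gml:Envelope>
  </gml:boundedBy>
  <gml:location>
    <!-- adlgml:Disk substitutes for gml:_Geometry: center point plus radius. -->
    <adlgml:Disk srsName="http://www.opengis.net/gml/srs/epsg.xml#4326">
      <gml:coordinates>-119.85,34.41</gml:coordinates>
      <!-- Assumed uom: the OGC URN for the metre (EPSG unit 9001). -->
      <gml:radius uom="urn:ogc:def:uom:EPSG::9001">10000</gml:radius>
    </adlgml:Disk>
  </gml:location>
</adlgml:QueryRegion>
```

The envelope follows from the disk under the spherical approximation: the latitude half-extent is about 10 / 111.195 ≈ 0.090°, and the longitude half-extent is about 0.090° / cos 34.41° ≈ 0.109°. Both are rounded outward so that the box still encloses the disk. This is exactly the computation that requirement 3.2 keeps off the shoulders of bounding-box-only consumers.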

created 2004-08-25; last modified 2009-11-20 10:17