The timing of my talk now couldn't be better. I am going to follow up with what's been going on with standards - and in particular, feature classification standards - and how they relate to this kind of problem of a digital gazetteer. As Linda mentioned, my regular job is as a professor of Urban Studies and Planning at Virginia Commonwealth University down the road a little ways here. But that's not the reason that I'm here. I'm here because in 1982 Joel Morrison decided to move from the University of Wisconsin to the U.S. Geological Survey and left a vacancy of one faculty participant on a standards committee, and I volunteered for that in 1982. Here I am, 17 years later, still struggling with the same issue that Joel left me with then.
The topics that I will be talking about are: first, a review of the gazetteer scope that you've already heard about, and emerging standards for methodology for feature type classification, which is what I have been working on with the International Organization for Standardization (ISO) Technical Committee on Geographic Information. I'm going to talk briefly about the semantic underpinnings to some of the work, and then some comments on the relevance to gazetteers, in particular to searches in gazetteers.
I understand - and we have all seen - that the scope of a gazetteer includes named features, defined locations and searching on one to find the other. Of course, I should now add time to this set of dimensions. Feature classification standards can help a great deal with the questions "what is it?" and "what's there?", and there are other standards as well, in spatial referencing, to help define that "where" element.
Feature classification standards go back a long ways. Many of the issues we were confronting back in 1982 (and probably for many, many years before that) are still problems in terms of classification and harmonization between different sources. There are also a few recently emerged issues resulting from new technologies and from the object-oriented paradigm for information systems, which the ISO committee has recently embraced wholeheartedly. ISO Technical Committee 211 was formed in 1994 to prepare a family of standards that are now nearing the stage of draft international standards. So this is a serious stage of development in that work, because there's very little opportunity for modifying the content of the standards after they become draft international standards. That hasn't happened quite yet. Depending on what happened in Kyoto two weeks ago, which I'm not quite sure of because I wasn't at the latest meeting, there are somewhere between 18 and 20 standards being proposed under this committee. Just to give you an idea of the relevance of some of them: one is spatial referencing by coordinates. Another is spatial referencing by geographic identifier, and Rob Walker (who is here) is the project leader for that item. There are quality principles and quality evaluation procedures, and Fred Broome, who is here, is working very assiduously on those two related standards and can answer questions about them. The metadata standard, where Doug Nebert has done a lot of work on the U.S. input, is another important standard being developed, along with feature cataloging methodology, and I happen to be the project leader for that one. The latest plenary session was in Kyoto, Japan, on September 29-30. Others who were in attendance can update you later on what actually transpired there. As I understand it, the meetings were devoted to harmonizing the general feature model, at least in terms of what affects what we are doing here, and the terminology for this family of standards.
I'm going to show you the current status, but the comments I make on the cataloging methodology may have to be revised based on those harmonization efforts.
The feature cataloging methodology part of this standard is what I'm going to focus on, because I know it best, but you will notice as I go through that the other standards in other aspects of spatial referencing, quality, and so forth all have the same kind of relevance to the gazetteer issues that we have been hearing about. The feature cataloging methodology standard defines a methodology for cataloging feature types; it is not for catalogs of all kinds of information about data sets, but specifically for feature classification. The standard was developed based on a careful review of existing practice and what we knew about research directions back in 1994-95 when we began this effort. We had input from cataloging experts all around the world - about 12 countries in all. It provides a framework for feature classification in a prototype. But this is an important point: about 2 years ago the project team, which had been charged with possibly developing an international standard classification, made an explicit decision to duck that issue - well, not really duck it, but they decided that it was really beyond the scope of what a volunteer technical committee could possibly achieve, even if it were achievable, and I think that is still an open question.
The ISO feature cataloging framework consists of a classification of feature types. If you are going to do cataloging under this proposed standard, you will have a catalog of feature types, and with them their attributes, their relationships to other feature types, and something that has not appeared in other catalogs in the past, but that we think is a future direction - the operations of those features. There are also definitions. Roger Payne remembers this, since he was on that same committee that Joel Morrison had been part of - the feature working group of the National Cartographic Data Standards, which was in operation from 1982 to 1992. One of the first things they noticed was an absence of definitions of what feature types really mean, and one of the big efforts then - and it's still not done in all catalogs - was to have natural language definitions of what those feature types are. A catalog may say that it has roads and railroads, but what's the definition of a road, and when is a road likely to be in there or not, based on that definition? And then in developing this forward-looking catalog framework we've also added the option of defining the operations of features in a functional programming language.
The ISO committee is working in Unified Modeling Language (UML). The structure of the feature cataloging methodology standard is illustrated in this slide. The feature type is at the top. The object off to the left, the "secretary" for feature type, is feature type alias. This means that the ISO standard supports the concept of "included terms" from the Spatial Data Transfer Standard (SDTS), which is also based on the way feature types are grouped into classes in the Geographic Names Information System (GNIS). In Part 2 of SDTS, 1,200 included terms have been grouped into 200 standard types.
As I've said, in the ISO structure feature types have properties that include feature attributes, feature relationships, and feature operations. Feature types are defined independently of how they are referenced spatially. There are other ISO standards being prepared to address the spatial schema for geographic information and spatial referencing, whether by geographic identifiers or by coordinates. The diagram illustrates the distinction between classifying feature types on the one hand, and how the features are referenced spatially on the other hand.
The Alexandria Digital Library gazetteer server can be seen as an example that fits within this model for feature cataloging. For example, if we look up "administrative areas" on a map of Virginia, we see the state as a first order division of the United States, and counties as second order divisions. "State" and "county" are aliases for first and second order divisions respectively, and they are related to each other by the simple relationship "isPartOf." So there's aliasing, a little bit of relationship information, and then the footprints as a kind of spatial referencing. There are a lot more possibilities for doing spatial referencing, of course, and the ISO standards have some of those options spelled out quite precisely.
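The administrative-area example above can be sketched in a few lines of Python. This is only an illustration, not the Alexandria server's data model or the ISO schema: the `FeatureType` class, the `resolve` and `part_of_chain` helpers, and the "country" entry at the top of the chain are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureType:
    """A catalog entry: a standard name, its alias terms, and an isPartOf link."""
    name: str
    aliases: frozenset = frozenset()
    is_part_of: Optional[str] = None  # standard name of the containing type


# Hypothetical three-entry catalog; "country" is added only to close the chain.
CATALOG = {
    ft.name: ft
    for ft in (
        FeatureType("country"),
        FeatureType("first order division", frozenset({"state"}),
                    is_part_of="country"),
        FeatureType("second order division", frozenset({"county"}),
                    is_part_of="first order division"),
    )
}


def resolve(term):
    """Return the standard feature type for a standard name or an alias."""
    for ft in CATALOG.values():
        if term == ft.name or term in ft.aliases:
            return ft
    raise KeyError(term)


def part_of_chain(term):
    """Follow isPartOf links upward and return the standard names passed."""
    chain = []
    ft = resolve(term)
    while ft.is_part_of is not None:
        ft = CATALOG[ft.is_part_of]
        chain.append(ft.name)
    return chain


print(resolve("county").name)
print(part_of_chain("county"))
```

A search on the alias "county" lands on the standard type "second order division", and the isPartOf links then lead up through the first order division.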
A feature type is a class of real world phenomena with common properties. I had a member of my team from Quebec and another one from France, and so the Eiffel Tower is the example we came up with. The feature type is "tower." The properties of such a feature type may include attributes, relationships, operations, and alias terms. Alias terms are only for those other terms that have exactly those same properties; otherwise it's another concept.
A feature attribute has a name, data type, and value domain, and in the ISO model is explicitly associated with one or possibly more feature types. But the attributes are entered not as separate things, as they are in some catalogs, but as belonging to a particular type.
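To make the name, data type, and value domain idea concrete, here is a minimal Python sketch. It is not the ISO schema: the class names, the `accepts` method, the encoding of a value domain as an inclusive numeric range, and the `height_m` attribute with its limits are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureAttribute:
    """An attribute with a name, a data type, and a value domain."""
    name: str
    data_type: type
    value_domain: tuple  # assumed encoding: inclusive (minimum, maximum)

    def accepts(self, value):
        """True if the value has the right type and lies inside the domain."""
        low, high = self.value_domain
        return isinstance(value, self.data_type) and low <= value <= high


@dataclass
class FeatureType:
    """A feature type that owns its attributes rather than sharing a loose list."""
    name: str
    attributes: dict = field(default_factory=dict)

    def add_attribute(self, attribute):
        self.attributes[attribute.name] = attribute


# Hypothetical attribute and limits for the "tower" feature type.
tower = FeatureType("tower")
tower.add_attribute(FeatureAttribute("height_m", float, (0.0, 1000.0)))

print(tower.attributes["height_m"].accepts(300.0))
print(tower.attributes["height_m"].accepts(-5.0))
```

The point of the design is the one made above: the attribute is entered as belonging to a particular feature type, so a value can be checked against that type's own domain.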
Feature relationships included in this ISO proposed standard are generalization, aggregation, and association. The most recent revision includes sub-typing of feature types, thanks to input from Sweden, although that hasn't been finally approved. I need to find out from the chairman of my editing committee, Rob Walker - another hat he wears - whether that's been accepted or not by the editing committee. But that's what we'd like to add based on the input from Sweden.
The inheritance in this proposed standard is open-ended and allows for multiple inheritance.
Operations: for example, for the feature "dam", an operation could be to raise the dam, which affects the level of water in the reservoir or the flow of water in a stream. The dam may also block navigation beyond that point upstream. These are the kinds of things we are getting at with operations. The operations themselves inherently specify attributes and relationships. So to just say that this is a separate category for classification - that is the way it sits right now in the standard, but I don't think that it's a long-term solution. Hopefully in some future time the classification can be based on operations, which then imply relations as well as attributes. They are the key to usefulness of data in a given application. A very important area - at least the committee thinks so.
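The dam example can be written down as a small Python sketch to show how an operation carries attributes and relationships with it. Everything here is hypothetical: the `Dam` and `Reservoir` classes, the attribute names, and above all the rule that the water level rises by exactly the amount the dam is raised, which is a deliberate oversimplification.

```python
class Reservoir:
    """The water body impounded by a dam."""

    def __init__(self, water_level_m):
        self.water_level_m = water_level_m


class Dam:
    """A feature type whose operation implies an attribute and a relationship."""

    def __init__(self, crest_height_m, reservoir):
        self.crest_height_m = crest_height_m  # attribute the operation needs
        self.reservoir = reservoir            # relationship the operation needs

    def raise_crest(self, delta_m):
        """Raise the dam; in this toy model the reservoir rises by the same amount."""
        self.crest_height_m += delta_m
        self.reservoir.water_level_m += delta_m

    def blocks_navigation(self):
        """Real-world behavior: vessels cannot pass upstream of the dam."""
        return True


reservoir = Reservoir(water_level_m=25.0)
dam = Dam(crest_height_m=30.0, reservoir=reservoir)
dam.raise_crest(2.0)
print(dam.crest_height_m, reservoir.water_level_m)
```

Writing the operation first forces the crest height attribute and the link to the reservoir into the definition, which is the sense in which operations imply attributes and relationships.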
This concept of feature operations, which is new, is frequently misunderstood, and I will just mention that it does not have to do with internal database operations like finding the length of an arc or the area of a polygon. It's not intended to be a functional classification in the sense of a purpose - transportation or something like that - but rather to capture the actual behavior in the real world of a type of feature, what a feature type really does. Currently, in most catalogs, that information only appears in the natural language part of the definition.
Let me say a little bit about the work that Andrea Rodriguez has been doing at the University of Maine using the WordNet prototype. Semantic concepts are analogous to the object model for feature classification. In WordNet there are nouns that correspond to feature types. Adjectives are associated with attribute values. For example, "the car is red" - the red color is an attribute value. A verb normally corresponds to a feature operation. So we have natural language foundations for many of these concepts, and hopefully there is an automated way to retrieve some of that content when you are doing feature classification. And of course what Andrea is working on particularly is attribute similarity in GIS databases: not finding terms that are exactly equivalent, but measuring how close they are in meaning to something that you will find somewhere else. There are also synonyms (alias terms) and different senses of the same term, part-whole relations and "is_a" relations in the semantics of WordNet. WordNet is a project of Princeton University; it is much broader than GIS and has to do with all terms.
The application of this to gazetteer information: there is a need, I think, for explicit definitions of feature types. Some gazetteers have them and some don't. The GNIS has very useful definitions, and they intentionally say that maybe this isn't your definition of what this feature type is, but it's how we are using it in this gazetteer. I think that some of these concepts from the standard can enrich the content of feature classification in a gazetteer, and in a similar way it's important to apply standards for spatial location and quality to gazetteer systems.
In terms of enriching content, the attributes of feature types - generalization, multiple inheritance, part-whole relations, feature operations, and terms that have multiple meanings - can be included in a gazetteer, I think. For the attributes we probably want to develop templates. Gazetteers now have population in them, but if you look you don't see - at least, I didn't see - much else there in terms of attributes. There could be a whole lot more attribute information, of course, in such a system.
Every time someone sits down to do a feature classification, they come out with some sort of hierarchical structure. The SDTS standard committee intentionally tried to resist that tendency - unsuccessfully I guess. But one thing that we can look at more carefully is whether the properties are exactly inherited between the subclasses in a hierarchical scheme. If they are, then you have some more information by having a hierarchy, and if they're not, the hierarchy doesn't really add any information beyond what you would have otherwise. Unless someone is familiar with that hierarchy and knows where to look for something, the hierarchy should be rigorous. Multiple inheritance: a feature type "canal" is a "water feature", a "cultural feature", and also a "transportation feature", depending on what you are interested in, and you should be able to find it under any of those categories.
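The canal case maps directly onto multiple inheritance in an object-oriented language. The Python sketch below is an assumption-laden illustration: the class names follow the three categories just mentioned, but the property names, the `Stream` type, and the `all_properties` and `types_under` helpers are invented for the example and come from no standard.

```python
class FeatureType:
    """Base class; each subclass declares only the properties it adds."""
    properties = frozenset()

    @classmethod
    def all_properties(cls):
        """Union of properties declared by this type and all of its ancestors."""
        inherited = set()
        for klass in cls.__mro__:
            inherited |= set(getattr(klass, "properties", ()))
        return inherited


class WaterFeature(FeatureType):
    properties = frozenset({"contains_water"})      # hypothetical property


class CulturalFeature(FeatureType):
    properties = frozenset({"built_by_people"})     # hypothetical property


class TransportationFeature(FeatureType):
    properties = frozenset({"carries_traffic"})     # hypothetical property


class Canal(WaterFeature, CulturalFeature, TransportationFeature):
    properties = frozenset({"navigable_depth"})     # hypothetical property


class Stream(WaterFeature):
    pass


def types_under(category, candidates):
    """Names of the candidate feature types that fall under a category."""
    return [t.__name__ for t in candidates if issubclass(t, category)]


print(types_under(WaterFeature, [Canal, Stream]))
print(types_under(TransportationFeature, [Canal, Stream]))
print(sorted(Canal.all_properties()))
```

Because the properties are inherited exactly, finding "canal" under any of the three categories tells you something definite about it, which is the rigor argued for above.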
Part-whole relations: we have some examples of political subdivisions and transport networks. These are not classification hierarchies. You shouldn't confuse the part-whole relations with the hierarchical classification scheme. Some of these part-whole structures are spatially defined and others are not. Feature operations can help with searching for features relevant to a given purpose. For example, that dam is a barrier to navigation. There are other things that don't have anything else to do with dams as they are being classified, but may also be barriers to navigation. Of course, someone with a navigation application would be very interested in that aspect of them.
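Searching by operation rather than by class can be sketched as a simple lookup. The table below is hypothetical: only the dam as a barrier to navigation comes from the talk, while the other feature types, the operation labels, and the `feature_types_that` helper are assumptions made to show the idea.

```python
# Hypothetical catalog fragment: feature type -> operations (real-world behavior).
FEATURE_OPERATIONS = {
    "dam": {"impound water", "block navigation"},
    "low bridge": {"carry road traffic", "block navigation"},
    "lock": {"raise or lower vessels"},
}


def feature_types_that(operation):
    """Return, in alphabetical order, the feature types with the given operation."""
    return sorted(
        name for name, operations in FEATURE_OPERATIONS.items()
        if operation in operations
    )


print(feature_types_that("block navigation"))
```

A navigation application would query on "block navigation" and get back feature types that sit in unrelated branches of the classification.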
In SDTS we had some "included terms", or the alias list, so that a "bank" could be either a building or a land formation, depending on what you were looking for or interested in. "Place": I don't know if this is still true, but at that time the USGS and the Census did not agree on the definition of "place", and yet to both of them it is a very critical concept. So we need in some cases to have multiple references or some sort of catalog of senses of these terms. So you can say that you mean "place" in the Census Bureau's sense of that term or the USGS sense of it, and have those senses be explicitly defined.
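A catalog of senses could be as simple as keying each definition by both the term and the sense or authority it belongs to. The Python sketch below assumes that layout; the `SENSES` table and the `senses_of` and `define` helpers are hypothetical, and the definition strings are placeholders, not the agencies' actual wording.

```python
# Hypothetical sense catalog keyed by (term, sense). Definitions are placeholders.
SENSES = {
    ("bank", "building"): "<definition of bank as a building>",
    ("bank", "land formation"): "<definition of bank as a land formation>",
    ("place", "Census Bureau"): "<definition of place as used by the Census Bureau>",
    ("place", "USGS"): "<definition of place as used by the USGS>",
}


def senses_of(term):
    """List the recorded senses of a term, in alphabetical order."""
    return sorted(sense for (t, sense) in SENSES if t == term)


def define(term, sense):
    """Return the definition for one explicit sense of a term."""
    try:
        return SENSES[(term, sense)]
    except KeyError:
        raise LookupError(f"no sense {sense!r} recorded for {term!r}") from None


print(senses_of("place"))
print(define("bank", "land formation"))
```

With this layout a user has to say which "place" is meant, and the gazetteer can state which definition it is applying.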
So, in conclusion, emergent international standards do provide, I think, frameworks for representing feature classifications and locations and time - there is also a temporal schema in the ISO standards. Richer and more useful information content can result from the application of this kind of standard to gazetteer design. And then one more thought that I'd like to leave with you, and I'll be looking forward to this and listening and trying to learn as much as I can for the next couple of days: what lessons can we learn as standards developers from digital gazetteer work for the design and the future of standards for geographic information?
Thank you.