Reports from Breakout Group Discussions

Group 4: Charles Falkenberg

Profile

We started with the initial questions that Linda gave us, but after the first round of conversation, we fell into a conversation about the definition of gazetteers. We encountered some differences. We started out with a concern for expanding the talk beyond just an index of names. We began discussing functions and definitions that really belong to GIS systems or larger database systems.

We did discuss the two definitions of gazetteers that were available to us, the UN definition and the ISO 211 definition. The UN definition uses names specifically as part of its gazetteer definition. ISO 211 says that a gazetteer is "a directory of instances of a class or classes of real world phenomena, containing some information about their location". There was some debate about whether or not a gazetteer had to have a name in it and, since names are not part of the ISO 211 definition, whether that was cause for concern. We moved forward with the idea that a gazetteer should have a name, while recognizing that there may be a broad definition of name. In other words, a FIPS code itself might be a name, or a ZIP code might be a name. How do you reconcile these issues?

One of the things that came out of this conversation was that feature type taxonomy is not really part of the gazetteer, but part of the metadata that is processed by the gazetteer or stored alongside the gazetteer. That was a clarification that I thought was quite interesting.

We identified several uses. Surely one of the prime interests here has been geocoding of input data: taking legacy data of one type or another and using gazetteers to provide geolocations for the source documents. Reviving legacy data - I think somebody mentioned that today, and I think that is a good way to look at it. The flip side is geographic information retrieval, using a gazetteer as perhaps an initial step, or the only step, in getting data out. Information retrieval might consist of finding names for coordinates as well as finding locations for names. Locations for names are useful primarily in two ways. One is as part of a larger query process, where the gazetteer resolves a name to a location of interest, but additional metadata elements are entered as part of that query. The other is strictly learning about places: going in and looking at the names and locations of various places to learn about them in their own right, staying within the gazetteer.
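The geocoding use described above can be sketched in a few lines. This is only an illustration under stated assumptions, not a method the group proposed: the function name `geocode_text`, the plain dictionary standing in for a gazetteer, and the approximate sample coordinates are all made up for the example. It scans a legacy text for known names and attaches a location to each one it finds.

```python
import re


def geocode_text(text, name_to_coords):
    """Find gazetteer names in free text and attach coordinates.

    name_to_coords is a hypothetical stand-in for a gazetteer:
    a dict mapping a place name to a (lat, lon) pair.
    Returns a list of (offset, name, (lat, lon)) tuples in text order.
    """
    if not name_to_coords:
        return []
    # Longer names first, so a longer name wins over a shorter one
    # that is a prefix of it.
    names = sorted(name_to_coords, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    canonical = {n.lower(): n for n in names}
    hits = []
    for m in pattern.finditer(text):
        name = canonical[m.group(0).lower()]
        hits.append((m.start(), name, name_to_coords[name]))
    return hits


# Illustrative sample data; coordinates are approximate.
SAMPLE_NAMES = {"Dhaka": (23.81, 90.41), "Chittagong": (22.36, 91.78)}
SAMPLE_TEXT = "Flooding was reported in Dhaka and later in Chittagong."

for offset, name, coords in geocode_text(SAMPLE_TEXT, SAMPLE_NAMES):
    print(offset, name, coords)
```

Exact string matching like this ignores the hard parts of the problem, such as ambiguous names and variant spellings, which is where a real gazetteer would earn its keep.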

Also, we identified the gazetteer as a list of official names. It can stand alone as an official registry of names, a catalog of variants, or a thesaurus of synonyms and loosely related words; that is, a controlled vocabulary of those things. And finally, a gazetteer can be used to relate places spatially and temporally. This, I think, is a little more vague, but certainly a key use of a gazetteer is to do some of the things the last speaker described, where you ask when something changed, or what the relationships are between two elements of the gazetteer.

We went through some communities. This may have been a useless exercise. We ended up saying everybody. We started with research institutions; someone pointed out news organizations - organizations as a group that do a lot of data archiving, and need to be extremely accurate in their use of names, international names in particular. And then a host of government agencies and the general public. We mentioned motorists, but there is also use of EPA archives that need gazetteers for finding hazardous waste dumps in your own neighborhood, this kind of thing. Utilities, private sector, transportation services could make use of them. Retail services of one type or another. Then non-governmental agencies as well - environmental groups or relief groups.

Moving on to the services, which was the second question: we outlined the current list of services, starting with the basic translation between places and coordinates, that is, retrieving the coordinates of a place name, or a name from a set of coordinates. Next is retrieving a name and its variants, as in a thesaurus. Then retrieving feature types, so that questions can be asked not only about names and coordinates but about feature types as well. Then there was some conversation about how much hierarchy should be revealed in the gazetteer itself. There was some agreement that it ought to at least reveal administrative hierarchy. But we were really focusing on the gazetteer as a repository of instances, and not necessarily of classifications. One of the prime goals is to retrieve instances of features as well as the administrative structures.
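To make the current services concrete, here is a minimal sketch of them as one small in-memory class. Everything in it is an assumption for illustration: the `Entry` fields, the method names, and the approximate sample coordinates are not drawn from any existing gazetteer. It covers name to coordinates, coordinates to names, variants, and retrieval by feature type.

```python
from dataclasses import dataclass, field
from math import asin, cos, radians, sin, sqrt
from typing import List


@dataclass
class Entry:
    name: str
    lat: float
    lon: float
    feature_type: str
    variants: List[str] = field(default_factory=list)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = radians(lat1), radians(lat2)
    dlat, dlon = p2 - p1, radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


class Gazetteer:
    def __init__(self, entries):
        self.entries = list(entries)
        self._by_name = {}
        # Index every entry under its primary name and all variants.
        for e in self.entries:
            for n in [e.name, *e.variants]:
                self._by_name.setdefault(n.lower(), []).append(e)

    def coordinates_for(self, name):
        """Name (or variant) -> list of (lat, lon)."""
        return [(e.lat, e.lon) for e in self._by_name.get(name.lower(), [])]

    def variants_for(self, name):
        """Name (or variant) -> primary name followed by its variants."""
        out = []
        for e in self._by_name.get(name.lower(), []):
            out.extend([e.name, *e.variants])
        return out

    def names_near(self, lat, lon, radius_km):
        """Coordinates -> primary names within radius_km, nearest first."""
        scored = [(haversine_km(lat, lon, e.lat, e.lon), e.name)
                  for e in self.entries]
        return [n for d, n in sorted(scored) if d <= radius_km]

    def names_of_type(self, feature_type):
        return [e.name for e in self.entries if e.feature_type == feature_type]


# Illustrative entries; coordinates are approximate.
gaz = Gazetteer([
    Entry("Dhaka", 23.81, 90.41, "populated place", ["Dacca"]),
    Entry("Mumbai", 19.08, 72.88, "populated place", ["Bombay"]),
    Entry("Mount Everest", 27.99, 86.93, "mountain"),
])

print(gaz.coordinates_for("Dacca"))
print(gaz.names_near(23.8, 90.4, 50.0))
```

Note that the feature type here is just a string stored on each entry, which fits the group's point that the taxonomy itself lives alongside the gazetteer rather than inside it.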

The future services: we started this conversation as a wish list, so we got kind of excited about a lot of these things. A clearinghouse of gazetteers was identified early on, as well as other locator services that might be available over the Web, along with the need for the clearinghouse to hold metadata about the quality and character of the gazetteers in it.

The question of retrieving more complex hierarchies moved from our current services into the future services. The question is whether there are other hierarchies, besides administrative hierarchies, that you could expose through gazetteers: river networks, mountain chains, various hierarchies based on feature types. So the idea is that we retrieve stored instances and the instance hierarchy by type, or multiple instances in the case of multiple hierarchies. Interoperability for distributed gazetteers: this is the gazetteer server idea. And then the point was made, and I think this is a good one, that some of the gazetteer services, some of the core services, might not change, but that the creative uses of those basic services will expand dramatically. So, the question of future services may not need to focus as much on gazetteers but rather on the use of gazetteers.
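One way to picture hierarchies by type is to keep a separate parent link per kind of hierarchy. The sketch below is only an illustration: the `HierarchyIndex` class, its method names, and the hierarchy labels "administrative" and "river network" are all assumptions made for the example.

```python
from collections import defaultdict


class HierarchyIndex:
    """Child -> parent links, stored separately for each kind of hierarchy."""

    def __init__(self):
        self._parent = defaultdict(dict)  # kind -> {child: parent}

    def add(self, kind, child, parent):
        self._parent[kind][child] = parent

    def ancestors(self, kind, name):
        """Walk upward within one hierarchy; returns nearest parent first."""
        links = self._parent.get(kind, {})
        chain, seen = [], {name}
        while name in links:
            name = links[name]
            if name in seen:  # guard against accidental cycles in the data
                break
            seen.add(name)
            chain.append(name)
        return chain

    def children(self, kind, name):
        links = self._parent.get(kind, {})
        return sorted(c for c, p in links.items() if p == name)

    def hierarchies_of(self, name):
        """Which kinds of hierarchy mention this instance at all."""
        return sorted(k for k, links in self._parent.items()
                      if name in links or name in links.values())


idx = HierarchyIndex()
idx.add("administrative", "Santa Barbara", "Santa Barbara County")
idx.add("administrative", "Santa Barbara County", "California")
idx.add("administrative", "California", "United States")
idx.add("river network", "Missouri River", "Mississippi River")
idx.add("river network", "Ohio River", "Mississippi River")

print(idx.ancestors("administrative", "Santa Barbara"))
print(idx.children("river network", "Mississippi River"))
```

Keeping the links outside the entries themselves means the gazetteer stays a repository of instances, and each hierarchy can be added or dropped without touching them.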

Some of the other issues: tools for geocoding. This is clearly a problem in a lot of respects.

Ways to add names to gazetteers: this is a sort of sensitive issue in terms of the authorization of various names, but certainly departmental gazetteers or local gazetteers might need easy ways to add names or maybe to submit names to authorities.

The problem of hardware dependence, and of various pieces that don’t work, was brought up, along with the need for hardware-independent tools. Also machine-readable interfaces between gazetteers: standard formats and protocols, the various ways that servers can be built to talk to one another and expand gazetteer services. There is the hope of using gazetteers as part of a larger information retrieval process across the World Wide Web, where you enter a place and a characteristic and get information that might be anywhere on the Web. The example here is the geology of Bangladesh. This might be agent-type technology that goes out and searches the Web using gazetteer data.

Visualization tools were talked about, not only in a GIS context, which is an obvious way to visualize gazetteer data, but maybe information structures, the hierarchies that might be implied in the classes. The photos or images that might go along with the source data. Ways to visualize fuzzy names and locations; ways to try to display fuzzy or ambiguous names. Also, the visualization we saw this morning of California with the third dimension being the number of references in a particular document seemed like an effective visualization.

Gazetteer servers: the idea that we don’t replicate data. We don’t have CDs of gazetteers, but go out on the Web to retrieve gazetteers that maintain their data.

We need to build a seamless interface to those gazetteers and, finally, have the gazetteers perform some filtering of the data, ranking perhaps by scale or by the data quality of the names. We ran out of steam when we got to collaboration - we weren't sure we could make too many assumptions about who was able to collaborate.
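The filtering and ranking idea mentioned above could look something like the following. It is a sketch under assumptions, not a description of any existing server: the `Hit` fields, the 0 to 1 `quality` score, the use of a map scale denominator, and the server labels are all hypothetical, and the result lists are passed in directly rather than fetched over the Web.

```python
from dataclasses import dataclass
from itertools import chain


@dataclass(frozen=True)
class Hit:
    name: str
    source: str     # which gazetteer server returned it (hypothetical label)
    quality: float  # hypothetical data-quality score, 0.0 to 1.0
    scale: int      # map scale denominator, e.g. 24000 for 1:24,000


def merge_and_rank(result_lists, min_quality=0.0):
    """Merge hits from several servers, filter by quality, then rank.

    Higher quality comes first; ties go to the larger-scale source
    (the smaller scale denominator).
    """
    merged = chain.from_iterable(result_lists)
    kept = [h for h in merged if h.quality >= min_quality]
    return sorted(kept, key=lambda h: (-h.quality, h.scale))


server_a = [Hit("Springfield", "server A", 0.9, 100000)]
server_b = [Hit("Springfield", "server B", 0.9, 24000),
            Hit("Springfield", "server C", 0.4, 24000)]

for hit in merge_and_rank([server_a, server_b], min_quality=0.5):
    print(hit.source, hit.quality, hit.scale)
```

Whether quality or scale should dominate the ordering is a design choice; the group only suggested that one or the other might be used.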

Folks in Group 4 - is there anything that I missed?