Reports from Breakout Group Discussions

Group 2: Fred Broome

Profile

I have heard geographers defined in many ways. I am a geographer, so I can do this. I have heard geographers defined as people who make an intensive study of the obvious. I felt a little bit like that today. Why?  Because I think it’s obvious that we need something like a gazetteer, even in this digital world. So it is interesting to have a workshop on the obvious. We did take an informal vote partway through, however, and came to the conclusion that maybe there should not be anything called a gazetteer. Maybe the term gazetteer carries too much baggage. Maybe the term gazetteer is too unknown to anyone outside of our little clique. I don’t know. I will have to tell you that the vote was not unanimous, but later I will come back to this particular topic.

What we did is, we tried to address the five topics. The first thing we did was to examine the uses of gazetteers and key applications. But, as we started examining the uses, someone made a very wise observation. Many times people come in to use a gazetteer, and they don’t bother to tell you why they are using it. So how can you examine the uses when you don’t know why the customer is using the gazetteer?  In the electronic field I think that would even be truer than it is perhaps in the manual field.

So then we decided, let’s look at communities of users. Maybe we can derive something out of that. We started a list: emergency response, environment, archives, mapping, natural sciences, and so on. I can’t go through the whole list because we had a page and a half up there. We said, "What are they really doing?" Well, maybe the outgrowth of examining it is that we need to concentrate not so much on what the user is using it for, but on whether they are being satisfied, and maybe concentrate on the basic and expanded components of the information that is exchanged when one queries a gazetteer. We thought there might be at least some primitives: the name, clearly, because that’s what it is all about, and the footprint. We agreed with the other groups that perhaps there should be temporal and source or authorization components to these, and some other supporting information. Now, in doing this, we did not do it blindly. We of course referred to the document that was handed out to us; that was part of our reference, and it actually initiated some conversation and discussion on its own.

So we said that concentrating on what information should be exchanged would be a better thing than trying to figure out who all the users are out there, because we know there are users, and we know there is a wide variety of them.

Then we went to the topic of data gathering and data sharing. Because, if you’ve got users and you have got a distributed group of gazetteers (I am going to continue to use that term), then clearly you want to be able to exchange data, you want to be able to give data to people who can update the gazetteer, and you want gazetteers to talk to each other in a digital form. So we suggested that the first thing that must be addressed is determining the interface in the gazetteer service. Now, for purposes of discussion, again not clearly and not unanimously, we sort of separated out two things. One is that in the digital world you do have a dataset, and you might call it a gazetteer. There are other datasets that can serve a function equivalent to a gazetteer function, such as a standard dataset that you would put into a GIS, like a TIGER file. You can, through GIS software, make queries that to the outside world look like a gazetteer query. Then you can also have a dataset that is strictly structured as a gazetteer, as a link between a name and a location, and that provides a mechanism for then referring to other things.

But in the digital world you also have to consider the gazetteer service. We are defining the gazetteer service as that little interface layer, if you will, that allows information in your gazetteer, in your database, to be queried. It doesn’t make any difference what the underlying dataset is. The query comes in a standard form. That little interface then maps it into your structure, comes out with an answer, puts it into the standard form, and sends it either to the person performing the query or to another gazetteer that you are exchanging information with. The interface service seemed to be the most important thing there.

Then, after you have gotten this interface service, you can start concentrating on implementation agreements. You’ve got an agreement between two federal agencies to let their gazetteers talk to each other. You’ve got an agreement with private industry, a commercial firm, to be able to point to their dataset, even if it’s a for-charge query. There are also agreements on what an interface should look like when it’s presented on the screen: does it always have to be something that a human would interact with, or can we think of systems where my GIS would generate a query that I would never see, send it out to somebody’s gazetteer, and get a response back to act on?
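As a rough illustration of the interface service described above, the following Python sketch puts the same standard query in front of two differently structured datasets: one built strictly as a gazetteer and one ordinary GIS feature layer. Every name in it (GazetteerQuery, GazetteerAnswer, DedicatedGazetteer, GisLayerGazetteer, ask_all, the field names, and the sample coordinates) is hypothetical and is not drawn from any standard discussed at the workshop.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple

BBox = Tuple[float, float, float, float]  # assumed order: west, south, east, north


@dataclass(frozen=True)
class GazetteerQuery:
    """Hypothetical standard form of an incoming query."""
    name: str


@dataclass(frozen=True)
class GazetteerAnswer:
    """Hypothetical standard form of an outgoing answer."""
    name: str
    footprint: BBox
    source: str


class GazetteerService(Protocol):
    """The 'little interface': standard query in, standard answers out."""
    def query(self, q: GazetteerQuery) -> List[GazetteerAnswer]: ...


class DedicatedGazetteer:
    """A dataset structured strictly as a link from name to footprint."""
    def __init__(self, table):
        self._table = table  # dict: name -> bounding box

    def query(self, q):
        key = q.name.lower()
        return [GazetteerAnswer(name, fp, "dedicated")
                for name, fp in self._table.items() if name.lower() == key]


class GisLayerGazetteer:
    """An ordinary GIS feature layer that answers the same standard query."""
    def __init__(self, features):
        self._features = features  # list of dicts using local field names

    def query(self, q):
        key = q.name.lower()
        return [GazetteerAnswer(f["FULLNAME"],
                                (f["xmin"], f["ymin"], f["xmax"], f["ymax"]),
                                "gis-layer")
                for f in self._features if f["FULLNAME"].lower() == key]


def ask_all(services, q):
    """Send one standard query to several services and pool the answers."""
    answers = []
    for service in services:
        answers.extend(service.query(q))
    return answers


# Illustrative data only; the coordinates are made up.
dedicated = DedicatedGazetteer({"Springfield": (-90.0, 39.0, -89.0, 40.0)})
gis_layer = GisLayerGazetteer([
    {"FULLNAME": "Springfield", "xmin": -73.0, "ymin": 42.0, "xmax": -72.0, "ymax": 43.0},
    {"FULLNAME": "Riverside", "xmin": -118.0, "ymin": 33.0, "xmax": -117.0, "ymax": 34.0},
])
answers = ask_all([dedicated, gis_layer], GazetteerQuery("springfield"))
for a in answers:
    print(a.source, a.name, a.footprint)
```

The caller never sees how either dataset is organized. The same call could just as well be issued by a GIS with no human in the loop, or by another gazetteer.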

It’s fascinating. Here we are in the digital world, and I am doing the manual technique of writing it down and flipping it. I know you people in the back can’t read it. Trust me, I am reading it accurately, to the best of the ability of my trifocals.

Then we looked at the proposed content standard again. As I stated earlier, this was one of the things that brought about a lot of discussion, and I said I would return to it. There were suggested changes that the various members promised to give Linda over the time period. And we decided also that a gazetteer should be based on a content standard. We started looking at it and saying, what’s it really doing? We decided that it is really a locator service. It is a thing that relates names to locations, back and forth. You come in with a location and you get names; you come in with names and you get locations. There was one suggestion that was made, and I think everybody agreed to it, or at least no one really objected to it strenuously, and it was that the gazetteers that we present to the outside world should be read-only. Use your traditional database management software; use your other techniques in the background to do your own updating. Don’t let other people do your updating unless you really trust them. Would you trust them to marry your sister? If so, then you might trust them to get into your gazetteer and make changes. But I don’t necessarily believe that you want to do that. I think you want to have some kind of filtering and protection system for your own dataset.
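The two-way, read-only locator idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name ReadOnlyLocator, its methods, the bounding-box order (west, south, east, north), and the sample places are all hypothetical, and a real service would do its updating in a separate back-end database as described above.

```python
from types import MappingProxyType


class ReadOnlyLocator:
    """Relates names to locations and back; offers no update methods."""

    def __init__(self, entries):
        # Copy the data, then expose it only through a read-only view.
        self._by_name = MappingProxyType(dict(entries))

    @property
    def entries(self):
        """Read-only view of name -> (west, south, east, north)."""
        return self._by_name

    def locations_for(self, name):
        """Come in with a name, get locations."""
        footprint = self._by_name.get(name)
        return [] if footprint is None else [footprint]

    def names_at(self, lon, lat):
        """Come in with a location, get the names whose footprint contains it."""
        return sorted(
            name
            for name, (west, south, east, north) in self._by_name.items()
            if west <= lon <= east and south <= lat <= north
        )


# Illustrative data only; the places and coordinates are made up.
locator = ReadOnlyLocator({
    "Example Lake": (-120.5, 38.0, -120.25, 38.25),
    "Example County": (-121.0, 37.5, -120.0, 38.5),
})

print(locator.locations_for("Example Lake"))
print(locator.names_at(-120.4, 38.1))

try:
    locator.entries["New Place"] = (0.0, 0.0, 1.0, 1.0)
except TypeError:
    print("updates are rejected on the public view")
```

The read-only view protects against casual changes from outside callers; it is not a substitute for the filtering and access control a production dataset would need.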

Finally, we got to research topics. There has to be a minimal level of content. Think about it: you’ve got one name, you’ve got one location, maybe represented by a bounding box, so that is 5 entries; you’ve got one feature type, that’s 6 entries. So let’s say there are 6 lines of information. But if you look at the ADL Gazetteer Content Standard, there are over 188 lines of information here that you could easily fill in. In some cases, you could fill in multiple lines. So that’s a pretty heavy overkill. Not overkill, sorry, that’s my mistake. That is a pretty heavy data burden to carry about each name. It is not unimportant. That is very important information; it is desirable to carry it. But what is the minimal set? That was a topic that we kicked around for a while, and we said, that’s still a legitimate research topic. Now we can’t answer this just sitting around this room. The people that can answer that are the people who are both using gazetteers and building gazetteers. They are the only ones that can answer that.
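The count of six can be made concrete with a small Python sketch: one name, four bounding-box coordinates, and one feature type. The class name MinimalEntry, its field names, and the sample values are assumptions made for illustration; they are not taken from the ADL Gazetteer Content Standard.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class MinimalEntry:
    """Hypothetical minimal gazetteer record."""
    name: str          # 1 entry
    west: float        # bounding box: 4 entries
    south: float
    east: float
    north: float
    feature_type: str  # 1 entry


# Illustrative values only.
entry = MinimalEntry("Example Lake", -120.5, 38.0, -120.25, 38.25, "lake")
values = astuple(entry)
print(len(values), "values:", values)
```

Anything beyond these six values, such as temporal, source, or authorization information, would be an extension layered on top of this core.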

What is the minimal set of feature types?  We think it is wise that at least this community come up with a minimal set of feature types. Now there may be extensions to those that map into your specific definitions of whether this is an "airport" or not. But there should be a minimal set of feature types. And there were some suggestions, and I will come back to those, about how we can achieve this. But that is a perfectly legitimate research topic. And again, this is a topic that is better handled by people who are already working on it, or people who think it is a challenge to compare the needs of different groups on this. We need a minimal set of feature types. We don’t want every type in the world, and we certainly want some kind of standardization.

The next one was natural language query. As you heard from the earlier speaker, natural language query should be supported. There are perfectly legitimate reasons why you would want to be able to use natural language in your form of query. Sometimes your data sources happen to be structured in a way that makes it easier just to give it that way, rather than make up a formally structured query.

OK, then we want to determine if there is a level of metric accuracy that is sufficient for a geo-query. And what does this mean? Well, basically this is an excellent research topic. Just how accurate do the coordinates that you have in your dataset have to be? Strictly again, we are talking about gazetteer-type datasets, defining a feature: an area, line, or point feature. Just how accurate do those coordinates have to be? And equally, how accurate do the coordinates have to be if you are doing a coordinate query of a dataset? Do you stop at whole degrees? Do you have to say that the coordinates in your dataset, even if they are envelopes, are plus or minus a certain level of accuracy? Or have so many significant places in your numbers? There should be some way to put that kind of a parameter in there. But, before we do that, we also need to know what minimal level is sufficient for a typical query. There is some discussion that says there is no minimal level, that it depends on the kind of query. That might be legit. There are also others that say, hey, we are going to take your query, we’re going to expand it by a certain factor, and we are going to give you everything inside of that whole expanded envelope anyhow. So there was some discussion, but that is a perfectly legitimate research topic.
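The "expand the query by a certain factor" approach can be sketched as follows. This is a minimal illustration, assuming bounding boxes in (west, south, east, north) order and scaling about the center of the query envelope; the function names and sample features are hypothetical, and the sketch ignores wrap-around at the antimeridian and clamping at the poles.

```python
def expand_envelope(bbox, factor):
    """Scale a (west, south, east, north) envelope about its center."""
    west, south, east, north = bbox
    cx, cy = (west + east) / 2.0, (south + north) / 2.0
    half_w = (east - west) / 2.0 * factor
    half_h = (north - south) / 2.0 * factor
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def intersects(a, b):
    """True if two envelopes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def geo_query(features, bbox, factor=1.0):
    """Return the names of all features whose envelope meets the expanded query."""
    search = expand_envelope(bbox, factor)
    return [name for name, footprint in features if intersects(footprint, search)]


# Illustrative data only; the coordinates are made up.
features = [
    ("Inside", (0.5, 0.5, 1.0, 1.0)),
    ("Nearby", (2.5, 2.5, 2.6, 2.6)),
    ("Far", (10.0, 10.0, 11.0, 11.0)),
]

print(geo_query(features, (0.0, 0.0, 2.0, 2.0)))              # exact envelope
print(geo_query(features, (0.0, 0.0, 2.0, 2.0), factor=2.0))  # expanded envelope
```

With an expansion like this in place, a small error in the stored or queried coordinates matters less, which is one reason the sufficient level of accuracy may depend on the kind of query.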

Now this is not a list of all possible topics. These just happen to be the ones that came up the most. How do we then go about collaborating on this? Well, we only have two kinds of collaboration, because right now we are talking about two particular issues. One is to use an OGC-type process to develop the content and service components, parameters, etc.: the minimal content and/or a reasonable extended content, for which this (the ADL Gazetteer Content Standard) is a good start, and the service interface. The OGC-type procedure is an interactive type of procedure. It involves a lot of people who are already in the business sitting down and prototyping and carrying it on from there.

We thought, perhaps, there might be at least four communities of people that would be willing to get involved in this type of OGC activity.

Then the other topic was, how do we push the research issue of coming up with feature types? One suggestion was to start with the SDTS, Spatial Data Transfer Standard, Part II, as a starting point for feature types, and to work with people who are already involved, who already have a list of feature types in their own gazetteers. And put up a matrix, if you will, and start comparing and seeing what is a minimal set. What differences are there? What can we live with? What can we negotiate, what do we have to change?  But this is a perfectly legitimate research topic, and this is one that has to be solved before you can go on.
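The matrix comparison suggested above might look something like this in Python. The three gazetteers and their feature-type lists are invented for illustration; they are not taken from SDTS Part II or from any real gazetteer, and the function names are hypothetical.

```python
def build_matrix(type_lists):
    """Rows are feature types, columns are gazetteers, cells say present or absent."""
    all_types = sorted(set().union(*type_lists.values()))
    return {
        ftype: {source: ftype in types for source, types in type_lists.items()}
        for ftype in all_types
    }


def common_types(matrix):
    """Feature types that every gazetteer already carries: a candidate minimal set."""
    return [ftype for ftype, row in matrix.items() if all(row.values())]


def disputed_types(matrix):
    """Feature types carried by some gazetteers but not all: items to negotiate."""
    return [ftype for ftype, row in matrix.items() if not all(row.values())]


# Hypothetical feature-type lists from three gazetteers.
type_lists = {
    "Gazetteer A": {"airport", "lake", "stream", "populated place"},
    "Gazetteer B": {"airport", "lake", "river", "city"},
    "Gazetteer C": {"airport", "lake", "stream", "city"},
}

matrix = build_matrix(type_lists)
for ftype, row in matrix.items():
    marks = "  ".join("x" if present else "." for present in row.values())
    print(f"{ftype:<16} {marks}")

print("common:", common_types(matrix))
print("to negotiate:", disputed_types(matrix))
```

An exact string match like this only finds the easy agreements. Deciding whether one list's "stream" and another's "river" are the same type is the part that needs the people who maintain those lists.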

At that point we sort of ended. But there is still one open issue. If you are going to do away with gazetteers, we still have to name it something. Some suggested replacements for "gazetteer" are: nom de loc, digitteer, geoindex, repertoire, and then we ran out. With that, that’s the end of my talk.

Mike Goodchild -  Any questions, any further comments from Group 2?

Doug - "Geographic locator" is a possible name.