I'm going to offer a few comments at this point on some of the issues, but I'm very much aware that this is still the beginning of the workshop and this is very much a personal list. I hope to stimulate ideas about some of the other issues that are in your own minds.
Let me start with a few comments about needs or motivation. I think it's certainly clear to me that there is a tremendous need for what we are about to do in the next couple of days. Geographic information is increasingly important in all walks of life, and its expression in the particular form of gazetteers is also increasingly important. Let me explore that theme for just a moment.
First, what is geographic information? I like to think of geographic information as a linkage between a place, perhaps a time, and a property. How that place is specified, how we specify geographic location, is of course the primary focus here, because there are many, many different ways of specifying location. So if I were to say, for example, that it is hot today in Santa Barbara, I have linked a property "hot" with a place "Santa Barbara", but I've done it using a fairly informal expression of location "Santa Barbara", which we will have to translate into more precise and more formal terms for certain purposes, depending on what we want to do with it. If I say that the White House is across the Mall, then I've used an expression of place "across the Mall" which is even more difficult, even more informal and imprecise, and I've raised an issue about how we translate that into something precise for various purposes.
So there are very many ways of referring to location - Linda has already used the distinction between direct and indirect georeferences. We usually think of direct georeferences as being coordinate based, latitude and longitude, and hopefully precise, versus indirect, which might be placename geography, it might be street addresses, it might be casual terms like "downtown" which have very fuzzy representations on the Earth. So we have a variety of ways of referring to place and therefore linking places to properties, which is what geographic information is all about.
Time in this context is increasingly important, and certainly GIS has often been accused of ignoring time. I think time is something that we should very definitely consider in the next couple of days. Time is relevant to phenomena that change on the Earth's surface, place names that change their meaning, etc. I think time is going to be a tremendously important dimension.
Many applications of geographic data are in the form of queries like "where is some property z?" The way we specify "where" raises all kinds of questions about placename geography and about coordinate systems, and that's in the guts of gazetteers. Or, if we reverse the question and say "what is at x?", then again how we specify x is critical, and its relevance to gazetteers is obviously critical also.
These kinds of queries are, of course, increasingly important in emergencies and resource use and navigation. I will talk about some of the implications of emergencies later. Another tremendously important aspect of geographic information is its ability to link two properties at one place. So if I take two of these linkages, a link between a place and property-1 and a linkage between the same place and property-2, then I'm able to link those two properties. This is the "geographic spike" that we talk about in GIS. The ability to link phenomena and information based on common location is a tremendously important concept in growing applications of geographic information.
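As a minimal sketch of this linkage idea, each piece of geographic information can be held as a record of place, time, and property, and two properties can then be linked by joining on the shared place. The record layout, the function name `link_by_place`, and the sample values are illustrative assumptions, not taken from any real dataset or system.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class GeoFact(NamedTuple):
    """One piece of geographic information: a place, perhaps a time, and a property."""
    place: str            # however the location is specified (name, coordinates, ...)
    time: Optional[str]   # None when no time is recorded
    prop: str
    value: object


def link_by_place(facts, prop_a, prop_b):
    """Return {place: (value_a, value_b)} for places that carry both properties."""
    by_place = defaultdict(dict)
    for fact in facts:
        by_place[fact.place][fact.prop] = fact.value
    return {
        place: (props[prop_a], props[prop_b])
        for place, props in by_place.items()
        if prop_a in props and prop_b in props
    }


# Hypothetical sample records, for illustration only.
facts = [
    GeoFact("Santa Barbara", "today", "temperature", "hot"),
    GeoFact("Santa Barbara", None, "land_use", "urban"),
    GeoFact("Boston", "today", "temperature", "cold"),
]

linked = link_by_place(facts, "temperature", "land_use")
```

The join only works here because both records spell the place the same way; when one source says "Santa Barbara" and another gives coordinates, something has to translate between the two before the properties can be linked.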
Here's an extract from the digital orthophoto quadrangle (DOQ) of downtown Boston. I want to use it to make a couple of points about how we organize geographic information, and how geographic information combines what is in the system with what is in the head. To make effective use of this image I would have to combine it with what I know about Boston, because the image itself tells me nothing. It says that in a particular spot there is dark gray and in another spot there is light gray, but it's my head that allows me to identify downtown Boston in that image because I know the gazetteer - the gazetteer is in my head, in this case. It's me and my head that allow me to identify Commonwealth Avenue or Massachusetts Avenue or MIT or any other feature on the map of Boston. The point here is a coupling that goes on between the geographic information that is resident in my head - my general geographic knowledge, which is very often gazetteer-type information - and the information that is present in the database. I think the only way to see contemporary applications of GIS is in that sense of coupling between the database and the human mind. It's virtually impossible to do anything useful with GIS without something in the human mind. Someone who knows nothing about geography has a great deal of difficulty doing useful things with GIS.
The other point is the way in which we organize information, and the fact that this DOQ comes from a server. It's the MIT server, and that server serves digital ortho-photos. So our methods for organizing geographic data emphasize the horizontal. They emphasize being able to go to one server and extract two DOQs. The reverse, which is to go to one server and extract two different kinds of information about the same place, is very poorly supported. We make a great deal of fuss in GIS about the ability to layer the world and to combine layers of information. But our production arrangements, in fact, work quite the opposite way. It's very difficult to combine information vertically when you have to go to one site to find a DOQ, to another site to find another kind of information, and to yet another site to find a third kind. So I think gazetteers have a tremendously important role to play in reorganizing the production and arrangement of geographic information, so that it's much easier to link information vertically than horizontally. I think it's much more important for many applications to be able to find two kinds of information about the same place than to find the same kind of information about two places. It seems to me to be one theme that we should emphasize here. We can make the same point about the globe. I need to be able to recognize South Africa in order to make much sense of it.
Here, just to illustrate this point, is a rather nice site that Dan Gustavson at Montana State worked up. All he has done is to illustrate what could be done with vertical integration and multiple placename geographies. This is the US, and you can see that it has a digital elevation model, hydrography, and state boundaries. There are several datasets there, although I don't know that, of course - to me they seem to be just one dataset. On this website, which is called the Graphical Locator and is very easy to find on the web, I can go into any particular part of the United States - here's downtown Santa Barbara - and I can point to any location and get "That location is …" There are many placename geographies of that location: the hydrographic unit number - the hydrographic placename geography - has been integrated. Topographic quad geography has been integrated. The geographic names file has been integrated as well. To me as a user this is just one dataset. It's not a separate dataset of elevation data, hydrographic data, what have you; it's fully integrated and it appears to me as a vertically integrated dataset. Legal description is integrated also - township and range, UTM zone number, etc. It's a nice illustration, I think, of the difference between horizontal organization of information and vertical, driven by multiple placename geographies which are in this example superimposed.
Let me just make a couple more points about the many ways of specifying location. I want to emphasize that I think one of the things we should be doing here is integrating the vague and the precise. We as humans work in a vague world. We work in a world with words like "hot" and "cold", which have very vague meanings, and our vernacular geography is also vague. I come from Santa Barbara, and I don't worry about the precise definition of the footprint of Santa Barbara. In fact, I don't come from the municipality of Santa Barbara, if you wanted to be fussy about it, but you know what I mean when I say that I come from Santa Barbara. On the other hand, our GISs are, of course, exact and "scientific"; they are in a very different, precise world. In many ways a gazetteer provides the linkage between those two worlds. It provides the linkage between the vernacular terms we use to talk about the world and the precise coordinate systems we use when we need to be exact.
There is an issue here, of course, of scale. When location is specified, scale often provides a great deal of context. So when I say "Santa Barbara" at a global scale, you attach a different meaning to that than if I use the term at a local scale. "Santa Barbara" at a local scale probably means the municipal boundaries; "Santa Barbara" at a global scale probably means something much more vague - some blot on the map of Southern California.
Also, application domains are important, and many of our placename geographies are application dependent. So the placename geography of surface hydrography is different from the placename geography of mail delivery. One of the issues for this meeting, then, I think, is how to integrate these application domains, how to make it possible to talk from the placename geography of mail delivery to the placename geography of hydrography.
Interoperability is, I think, a tremendously important issue because one specification of location needs to map to another specification of location. That mapping can be from one domain to another, from hydrology to mail delivery; it can be from one scale to another, from a local scale to a global scale, and it can be across a number of divides, from one technology to another, for example. If we think about the traditional gazetteer in this context, the traditional gazetteer provides a mapping of place names to coordinates; from a world that is somewhat vague, in the sense that place names are vague, to the precise world of coordinates, and the traditional gazetteer does it typically at global scales. Most of our gazetteers work at the atlas, global scale rather than at the very local scale. How do we generalize that idea, and what is that likely to say about the future of gazetteers? Let me suggest the following: in the future, gazetteers should map between all specifications of location. Not just between the vague and the precise, not just between place names and coordinates. They should be considered as mappings between all specifications of location. That means in both directions. Not only from place names to coordinates, but also from coordinates to place names. Not only from vernacular to precise, but also from vernacular to vernacular, if you like, or from precise to precise. That means, I think, that if one wants to combine vague and precise, then one has to deal explicitly with accuracy. The accuracy of a placename specification has to be an explicit part of the future gazetteer. We need to know how vague "vague" is. We need to know the vagueness involved with the place name "Santa Barbara," for example.
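To make the idea of a two-way mapping with explicit accuracy a little more concrete, here is a minimal sketch. It assumes, purely for illustration, that the vagueness of a place name can be summarized as a single radius in metres around a representative point; the class names, the method names, and the coordinate and radius values are all hypothetical and approximate, not drawn from any existing gazetteer.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    lat: float            # degrees
    lon: float            # degrees
    uncertainty_m: float  # how vague the name is, as a radius in metres (assumption)


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula, spherical Earth)."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


class Gazetteer:
    def __init__(self, entries):
        self._entries = list(entries)

    def to_coordinates(self, name):
        """Name -> list of (lat, lon, uncertainty_m); a name may be non-unique."""
        return [(e.lat, e.lon, e.uncertainty_m) for e in self._entries if e.name == name]

    def to_names(self, lat, lon):
        """Coordinates -> names whose fuzzy footprint covers the point, most specific first."""
        hits = [e for e in self._entries
                if distance_m(lat, lon, e.lat, e.lon) <= e.uncertainty_m]
        return [e.name for e in sorted(hits, key=lambda e: e.uncertainty_m)]


# Approximate, illustrative values only.
gaz = Gazetteer([
    Entry("Santa Barbara", 34.42, -119.70, 8_000.0),
    Entry("Southern California", 34.0, -117.5, 300_000.0),
])

forward = gaz.to_coordinates("Santa Barbara")   # vague name -> coordinates plus vagueness
reverse = gaz.to_names(34.43, -119.71)          # coordinates -> candidate names
```

A single radius is a crude stand-in for "how vague is vague"; a real design might use a footprint polygon or a probability surface instead, which is exactly the kind of choice a common specification would have to settle.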
I think future gazetteers should support time. That means supporting names that vary from time to time, and it means supporting the effects of time, for example when something vague becomes something precise. That has happened repeatedly in the past: a word that began as a vague specification has taken on a precise meaning. Place names, of course, also vary through time as definitions change.
I think the future gazetteer needs to be customizable, because it needs to be able to work in different application domains, and different application domains will have their own gazetteers. I've already noted that the hydrographic world, for example, has its own gazetteer. We may need to consider personal, departmental, and corporate gazetteers. What is a gazetteer to an oil and gas corporation? And should that be supported along with a gazetteer for society as a whole? Gazetteers may need to be regional, they may need to be state, and they may need to be national. Again, as we consider scale varying from local to global, the concept of a gazetteer at a county level may need to be integrated with the concept of a gazetteer at a national level.
Then finally, and this is a point that comes out very clearly in the Distributed Geolibraries report that has already been referred to, I think gazetteers defined this way are a major component of spatial data infrastructure. If you have followed the history of the National Spatial Data Infrastructure as a concept, starting with the work of the Federal Geographic Data Committee with an executive order in 1994 and a variety of developments since then, one of the things that has been conspicuously missing from NSDI, certainly to my mind, has been a gazetteer. There are seven datasets in the National Spatial Data Infrastructure, and they are very much the layered view of the world, in which each of those seven is a separate perspective on the world. The one thing that is conspicuously missing is a gazetteer. There is no recognition within the National Spatial Data Infrastructure of the importance of this particular geographic dataset.
Finally, let me raise some issues. As I said earlier, this is a very personal list and my purpose in doing it is to encourage you to find the gaps and suggest areas that I have ignored.
First, we need to make a common specification. So what does a specification of a gazetteer look like? Does it include time? Should it include feature type? There are a variety of issues that come up as soon as one starts to explore the question of a standard - a common specification for gazetteers.
Secondly, a number of issues are related to accuracy and uncertainty. If we are to translate between the vague world and the precise world, some attention to this is essential. It comes down to how do you measure the uncertainty associated with a place name? How do you represent it in a system in digital form? How do you visualize it? What does a map of vague place names look like, if that map explicitly portrays the uncertainty associated with place names?
Production arrangements: there is a series of questions about how gazetteers are produced, and about how we can make that production more integrated and more effective in this future world of gazetteers.
A series of questions about system support: what are the appropriate database models for gazetteers? What are the appropriate indexing schemes to make gazetteers work efficiently?
Questions about application domains: are we fully cognizant of all of the potential application domains of future gazetteers?
Questions of uniqueness: within domains, there are obvious instances where place names are non-unique, and there are instances of place names that are non-unique within geographic domains.
There are a number of questions about ambiguity. At Santa Barbara we have a project looking at the communication of location in the context of intelligent transport systems. The classic example is: you are driving a Cadillac; it's involved in an accident; you, the unconscious driver, are unable to report the accident, but your Cadillac does so automatically using the OnStar system. It transmits a GPS location of the accident to an emergency management center. You would think that the transmitted GPS location is unambiguous. Unfortunately, it's not. It has an inaccuracy associated with it that may be as high as 100 meters; it may fail to transmit the vertical coordinate, so that it may be impossible to tell which ramp of the freeway you are on. The emergency management system will match the GPS coordinates to its own database, which may be off by as much as 50 meters. As a result, the accident appears to be on the wrong street. The emergency unit is dispatched to the wrong street. It takes 10 minutes for it to go around the freeway, etc., etc. So a major delay is introduced.
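The arithmetic behind that example can be sketched as follows. The 100-meter and 50-meter figures are the ones quoted above; the street layout (a freeway with a frontage road 80 meters away), the one-dimensional model, and the function names are hypothetical assumptions chosen only to show how the two errors combine.

```python
def worst_case_offset_m(gps_error_m, database_error_m):
    """Worst case: the two errors compound, so their magnitudes add."""
    return gps_error_m + database_error_m


def matched_street(reported_m, streets_m):
    """Snap a reported position to the nearest street centreline (1-D model)."""
    return min(streets_m, key=lambda name: abs(streets_m[name] - reported_m))


# Figures quoted in the talk.
GPS_ERROR_M = 100.0
DATABASE_ERROR_M = 50.0

# Hypothetical layout: positions measured across the roads, in metres.
true_streets = {"freeway": 0.0, "frontage road": 80.0}
true_position_m = 0.0  # the vehicle is actually on the freeway

# The GPS fix is displaced one way, the database centrelines the other way.
reported_m = true_position_m + GPS_ERROR_M
database_streets = {name: pos - DATABASE_ERROR_M for name, pos in true_streets.items()}

matched = matched_street(reported_m, database_streets)
combined = worst_case_offset_m(GPS_ERROR_M, DATABASE_ERROR_M)
```

In this layout the combined displacement of 150 meters is larger than the 80-meter spacing between the roads, so the nearest-street match lands on the frontage road even though the vehicle is on the freeway.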
There are within ITS many proposals for alternative ways of reporting location. You can report location, for example, by street address. It has tremendous ambiguities associated with it. You can report by linear offset - I'm so many feet from this intersection. Again there are massive ambiguities. One of the things we are trying to do in an effort to be unambiguous about place is to digitally model what actually happens between humans, because what happens between humans is a process of negotiation. If my first specification of location isn't precise enough, the question comes back to me "What do you mean by…?" So for example, if I say that I'm involved in an accident in Santa Barbara, the response is "Whereabouts in Santa Barbara?" And the negotiation may continue for several rounds, as we try to clarify unambiguously where I am. Trying to do the same thing in a digital world is, I think, an important research question, because inevitably any of these methods of placename specification will be ambiguous to some degree: in fact, in many cases, ambiguous to a disastrous degree.
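One very simple way to picture that negotiation in digital form is as successive rounds that narrow a set of candidate locations. This is only a sketch under stated assumptions: the candidate locations, their descriptive terms, and the function `negotiate` are all hypothetical, and a real system would have to decide which question to ask next rather than receive the answers in a fixed order.

```python
def negotiate(candidates, answers):
    """Narrow a set of candidate locations one clarification round at a time.

    candidates: dict mapping a location id to the set of descriptive terms it matches.
    answers: successive terms supplied in reply to "What do you mean by...?"
    Returns (sorted remaining ids, rounds used); stops once one candidate is left.
    """
    remaining = set(candidates)
    rounds = 0
    for term in answers:
        if len(remaining) <= 1:
            break
        rounds += 1
        narrowed = {loc for loc in remaining if term in candidates[loc]}
        if narrowed:  # ignore an answer that would rule out every candidate
            remaining = narrowed
    return sorted(remaining), rounds


# Hypothetical candidate locations and the terms that describe them.
candidates = {
    "A": {"Santa Barbara", "downtown", "State Street"},
    "B": {"Santa Barbara", "downtown", "freeway ramp"},
    "C": {"Santa Barbara", "harbor"},
}

result, rounds = negotiate(candidates, ["Santa Barbara", "downtown", "freeway ramp"])
```

Here "Santa Barbara" alone leaves all three candidates, "downtown" leaves two, and only the third round resolves the location, which mirrors the "Whereabouts in Santa Barbara?" exchange described above.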
There are questions about distributed systems. How do we work a system of gazetteers in which the gazetteer information is distributed across many servers or across a network? Then, of course, there are questions of interoperability - how do we make gazetteer-1 interoperate with gazetteer-2?
So those are just a few questions, and I'm sure there are others in your minds; raising them is what we're here to do in the next two days.
Thanks for your attention.