Roger said I was going to explain everything about the Board's activities with regard to foreign geographic names; that's not the case - that would take up the rest of the workshop and then some. I'll say a little bit about what the Board is doing with regard to foreign geographic names, but what I want to spend most of my time talking about this afternoon is the two topics that were associated with this particular part of the workshop; namely toponymic authorities as one topic, and issues associated with sharing data as a second topic.
To give you a little bit of background on myself, I've spent almost 21 years in the federal government as what I'm calling a toponymic regulator. I am associated with the BGN, and a basic function of the BGN is, as Roger has explained, to regulate, for purposes of the federal government, place name spellings for use in federal publications and databases. My side of the regulatory house is the foreign names side. So these are my confessions.
Let me talk a little bit about toponymic authorities. First, when you're dealing with foreign geographic names, the first question that comes to mind is - there must be, or is there some international authority for what we call places around the world, and the answer to that is yes and no. Roger has already talked a little bit about the United Nations and the role that it plays in standardizing geographic names spellings. The UN does not prescribe geographic names, it does not tell you what you call places around the world, not even at the level of country names, although it does issue lists of terminology that are helpful in that regard. Rather the UN's role, as it has been constructed over the years, is to promote programs of national geographic names standardization to benefit both individual countries and the creation of their internal infrastructures, as well as to promote ease of communication internationally.
The UN holds conferences on geographic names standardization at five-year intervals, most recently in 1998 in New York, and, as Roger mentioned, the executive agent for the UN, so to speak, in carrying out its promotional activities is the UN Group of Experts on Geographical Names. They meet every two years. The next meeting will be in January 2000 in New York, if New York is functional in January 2000.
Now as far as international authorities are concerned, yes, there are authorities in two other cases that are actual authorities that you can go to and get fairly authoritative rulings for names of certain kinds of places. In the case of high seas features and to a certain extent undersea features, you can go to the International Hydrographic Organization in Monaco and they can advise you on what you should call high seas features, such as oceans, significant seas, international straits and so on. They issue a publication called Special Publication No. 23: Limits of Oceans and Seas, that identifies, for purposes of maritime safety, the names and limits of these bodies of water, so that everybody understands, when you talk about the Java Sea, this is the limit of the Java Sea and the name of it. That kind of information is included in Special Pub 23, and what they do is reach an international consensus on what these bodies of water should be called.
We've already talked a little bit about what the International Astronomical Union does. They have a working group on nomenclature that deals with naming features on the various planets, asteroids and other bodies, and they have elaborated fairly detailed rules on how you go about proposing names for these features and the avenues that you go through to finally get approval.
So, internationally, there's a kind of a spectrum of approval authority that you can refer to. If you go to the UN, you're going to get something that's more advisory in nature, promotional in the sense of - don't come to us for a specific spelling, you should be working at building a national names authority in your own country and, by the way, we can refer you to other national names authorities in other countries if you're interested in talking with them. The other end of the spectrum is more towards the IAU end, where they are prescriptive; they actually prescribe what you call a particular feature. Now, as with Roger's presentation, if you have questions or comments you want to make as I go along here, please feel free to speak up.
Now going down a level to national authorities, you've gotten a good introduction now to the activities of the U.S. Board on Geographic Names at the domestic level. Do such boards and authorities exist in other countries? Disappointingly, not to the extent that we on the Foreign Names side of the Board would like to see. If you go through the list of the approximately 195 independent states that are out there, you'll find that only about 15% of them actually have explicit geographic names authorities that would be analogous to our own U.S. Board on Geographic Names. By explicit authorities, I mean ones that have been instituted by a law in that country or a presidential decree of some sort, an authority that has explicit standing.
Another way of looking at it is to look at publications that have come out of the UN's Geographic Names Standardization Promotion programs. One of those programs is to promote nations to issue toponymic guidelines, basically publications that advise geographic names users on how to treat place names in that particular country. Again, you go through the list and you find that only about 15% of the countries out there have taken the trouble of compiling toponymic guidelines for their countries and issuing them through UN auspices.
Speaking from the foreign names side of the U.S. Board, this presents a real problem to us. Our business is to be the authority for the U.S. government on foreign place name spellings. Most of you, if not all of you, are probably familiar with our website, our database, which is based on the geographic names processing system at NIMA. If our job is to go out and collect, analyze and standardize foreign geographic names spellings, and only 15% of the countries out there have national names authorities, then consider the difficulties that we face in dealing with the toponymies of the other 85% of the countries that are out there. Our basic job is to attempt to ferret out information from whatever implicit authorities might exist in those countries, and there may be several within a particular country. There may be a national mapping agency that implicitly has some naming authority, there may be a census office that implicitly has some other aspect of naming authority, and so on and so forth, each of those offices issuing its own publications and its own databases, which frequently do not agree. The task that we face is coming up with policies and procedures for dealing with discrepant information issued by specific countries, and coming up, to the best extent we can, with a single name spelling for these features in foreign countries.
The rest of the briefing is more touching on a few ideas regarding the topics for this part of the workshop, and one of those ideas is, what is explicit authority and what is implicit authority? If we look at the law that put the U.S. Board in place, it gave to the Board certain explicit authorities having to do with principles, policies and procedures for the standardization of place names within the U.S., and to quote from the law itself: "the responsibility for developing uniformity in geographic nomenclature and orthography for use by the federal government". There were some questions during Roger's presentation that were getting at aspects of implicit authority, items that people look to the Board or look to a naming authority for, assuming that that authority is probably the appropriate authority for that kind of information. I mention a few examples here that may not be the first ones that come to mind.
Romanization: on the foreign side of the Board when we deal with standardization of geographic names in a country where a non-Roman writing system is either official or national or in predominant use or in occasional use, it's our obligation to develop a Romanization system that conveys that non-Roman writing system into the Roman alphabet that we would then use to convey geographic names spellings to the federal user community. That implicit authority is given to us not through the law that put the BGN into existence but simply as a matter of something that we had to do in order to carry out our mission. So, as such, we work with our British counterparts, the Permanent Committee on Geographical Names for British Official Use, to develop systematic and scientific (we hope) Romanization systems for the non-Roman writing systems that we encounter in processing foreign geographic names, and we've developed a little over thirty such systems. In 1994 we published the most recent edition of our Romanization guide, which describes in tabular form how to apply those systems. If you're interested in getting copies, just let me know and I'll be happy to send them to you.
Another aspect of authority that is implicit was brought up during Roger's presentation; that when we act as an authority to maintain a database of geographic names which by its very nature also has to include information about the named features, about their locations, we become an implicit authority on what is a feature. We become an implicit authority on where that feature is located officially. OK, we don't have that authority given to us explicitly in the law, but because we systematically collect and maintain that kind of information we are looked to as authorities in those areas.
The last implicit authority is an interesting one from the foreign perspective, and that's as a cultural gatekeeper. It's something that I don't think we see as much of here in the U.S. as we see in some other countries. By cultural gatekeeper, what I'm getting at is the development of standardization principle and policy that might go on in a foreign country that is multi-ethnic in nature, multi-lingual in nature, and that naming authority will inevitably reach conclusions about what language you can use when coming up with the name for a feature; what language is official, which ones are unofficial, what cultural aspects of society are to be considered when naming geographic features. This is an aspect of cultural geography that we see over and over again in dealing with foreign geographic names. You can cite any number of examples; the most obvious one in recent news would be, of course, the situation in the Balkans. We did a lot of work over the last year and then some to collect and analyze geographic names information in the former Yugoslavia, an obviously multi-ethnic and multi-lingual situation. The government in power in that area has issued laws that explicitly state what you can use the various languages for, how they apply in toponymy, and in that case the authorities in that area no longer have an implicit authority to be the cultural gatekeeper, they have an explicit authority to carry out that role.
Roger talked about the toponymic police. In the U.S. and in the federal government we don't get too hard on that; we try to educate people as best we can and point them to the standardized names information that they need to use in their documents and in their publications. That's not necessarily the case in other countries; I would suggest that it would be an interesting case study for someone to investigate the enforcement provisions for toponymy that you might find in some other countries, such as Sweden. This is an example that was related to me by a fellow named Hans Ringstamm, who is a geographic names authority in Sweden, and it involves a case where some people owned an estate that they named Winsta. Winsta is a dialectal form of a word in Swedish. It doesn't conform to the norms of current Swedish orthography; the correct form would be Vinstad, as you see here. There are provisions in the law that put the Swedish Geographic Names Authorities in place, whereby those Geographic Names Authorities can look at a situation where a community or group of people is attempting to use a geographic name that does not meet the regulations of standard Swedish orthography, and they can litigate. We're a little bit easier on things here, but there are cases in other countries where the situation is much more serious.
A couple of words about gazetteers: there have been a lot of interesting statements made today, interesting to me, regarding what a gazetteer is. Working with the Board we are in the business of promulgating geographic names information, so we see ourselves as very much in the business of publishing gazetteers, and we do that. But there are two ways of looking at gazetteers from our standpoint. When we publish a gazetteer of foreign geographic names information under Board auspices, essentially what we are doing is publishing a code of regulations from our point of view, which is a very narrow point of view with our blinders on. That is the publication that you refer to as an employee of the federal government to find the correct name, i.e., the Board approved name for the feature you're referring to in your report or your publication.
I think that the user community has a much broader view of gazetteer, and that was reflected in a lot of the discussion earlier today, and that to my mind is reflected in looking at a gazetteer as an entry portal to a much wider range of geospatial information. The geographic name is the basic identifier that unlocks the door to a much wider realm of other types of geospatial information associated with that name or associated with that feature. That other information would include, by the way, all other place name spellings that might have been collected for that particular feature.
In both cases, though, as a blinkered producer of gazetteers and those with the wider point of view, receiving the information and expecting to be able to use a gazetteer to get at richer funds of information, what's implicit in both cases is that there's a sharing of information going on. OK, we've collected and analyzed as the national authority mandated to do so, we're going to share that with you, principally so that you have the right spelling to use, but if you want to use it for other purposes to get at other information, then that's fine too. So, in that context, what's a digital gazetteer? In my narrow point of view, a digital gazetteer is one of our old paper DMA BGN gazetteers that are on the shelves of libraries everywhere, converted into digital form, and maybe given to you on a floppy disk or a CD-ROM or a file ftp'd from a website or the result of a query to the Geonet Name server on the NIMA website.
If you go to the UN and look at the work of their group of experts, they have actually compiled a manual of toponymic terminology. They include two definitions of gazetteer: one for index gazetteer, the other for toponymic gazetteer. Index gazetteer is very simple, it's the kind of gazetteer or name listing that you find at the back of an atlas, that takes a named spelling and refers you to an atlas plate, and maybe a location on an atlas grid for where that feature's located in that book. All other kinds of gazetteers fall under the rubric of toponymic gazetteer, but these are the definitions that have been adopted by the UN group of experts on geographic names covering gazetteer qua gazetteer.
When you dig a little bit deeper into digital gazetteers, what you find are a couple of issues, and I won't talk about this very much because I think other people have addressed it earlier today probably a lot better than I could. I'll state only that, at the geographic name level, we're dealing with building some kind of conceptual data model for geographic name as an entity in itself. Another level we're dealing with is that conceptual data model of geographic name plugged into a much broader information architecture that is at the basis of some information community or specific information activity. The ones I've listed in quotes there are discipline-specific; they are simply ones that I pulled out of some of the biographical sketches of the participants in this workshop as areas that they're interested in. So our challenge as collectors and purveyors of toponymic information is to be sure that we factor into the way we organize our geographic names information some kind of adaptivity, so that what we collect and the database itself are adaptive to perhaps other conceptual data models for geographic names, and most certainly adaptive to other information domains, so that we get back to the buzzword of interoperability that we've heard several times today.
Let me talk about a couple of issues about sharing data. One issue is certainly a content standard. Linda talked this morning about taking a stab at defining a content standard for digital geographic names information that really could be generalized into any kind of geographic names information. Another technical issue arises specifically with foreign geographic names - Roger told you the rule for place names in the U.S.: the Roman alphabet. With foreign geographic names you have a fairly serious text-encoding issue: how do you deal with a consistent representation of multi-lingual and multi-scriptural geographic names information in a single data set? There's been an evolution in this regard, you know. Things started out very simply with 7-bit encoding that gave you 128 possible characters, which covered you for the 52 characters in the basic Roman alphabet. That's known as ASCII, as we're familiar with it. A little while later, that evolved into extended ASCII, which gave you a matrix of 256 code slots where you could fill in a particular character beyond the 52 that you find in the English language Roman alphabet. So you started finding parts of that code table filled with characters that are required for French, for German, for Czech - you found parts of that code table filled with Russian Cyrillic. But they were filled by many different organizations and entities, so with 8-bit encoding you ran into the problem of many different proprietary and industrial standards, such as the IBM code pages that you may be familiar with. If you want to use that upper end of ASCII for Russian Cyrillic, then you point to code page xxx, if you want to use it for Greek it's code page yyyyy, and you needed to tell the user that that's what you were using so that they could display the data properly on their end.
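To make the code page problem concrete, here is a minimal sketch in Python. The codec names are real 8-bit Cyrillic code tables shipped with Python's standard library; the choice of the name "Москва" (Moscow) and the helper `decode_with` are illustrative assumptions, not anything taken from the Board's database. It shows that one name produces a different byte sequence under each table, and that reading the bytes with the wrong table gives the wrong characters.

```python
# Four real 8-bit Cyrillic code tables available in Python's standard library.
CODECS = ["cp866", "koi8_r", "iso8859_5", "cp1251"]

# Illustrative place name (Moscow in Russian Cyrillic); not from any BGN file.
name = "Москва"

# The same name becomes a different byte sequence under each code table.
encoded = {codec: name.encode(codec) for codec in CODECS}


def decode_with(raw: bytes, codec: str) -> str:
    """Interpret raw bytes using the given code table (hypothetical helper)."""
    return raw.decode(codec, errors="replace")


# A data provider ships KOI8-R bytes; the user has to guess the table.
raw = encoded["koi8_r"]
for codec in CODECS:
    print(f"{codec:10} -> {decode_with(raw, codec)}")
```

Only the line labelled with the table the provider actually used shows the intended name, which is why the provider had to tell the user which code page was in play.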
There's been a lot of work on international standards for 8-bit encoding, principally the ISO standard 8859, which now has upwards of fifteen different code tables with 256 entries each, covering a number of different languages and language groups. You've got national standards like the Russian standard for Cyrillic. Again you have a communication problem between the data provider and the data user - if you're passing foreign geographic names information, what text encoding standard did you use? Things have now gotten to the point where 16-bit encoding standards are actually beyond development; they're being put into practice. There are some proprietary and industrial standards that cover 16-bit encoding, but what's most promising is ISO 10646, which is now an international standard for 16-bit text encoding. It gives you approximately 65,000 slots to insert characters, and you can take an application like good old PowerPoint 97 and with a few clicks implement that standard and come up with Russian Cyrillic characters that you can pass around to many different applications that are compliant with that standard. Yes, it covers Chinese - the Unicode standard has upwards of 32-33,000 code table elements representing Chinese Han characters. So that single standard will take you everywhere from the 52 characters in the English Roman alphabet all the way to one of the most complex writing systems that's out there.
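As a rough illustration of what a single code table buys you, the following Python sketch keeps Roman, Cyrillic and Han names in one list and looks up each character's slot. The three sample names and the helper `utf16_units` are assumptions chosen for the example; `unicodedata` and the `utf-16-be` codec are part of Python's standard library.

```python
import unicodedata

# Illustrative names in three scripts, held together in one data set.
names = ["Köln", "Москва", "北京"]


def utf16_units(text: str) -> int:
    """Count the 16-bit units needed to store the text (hypothetical helper)."""
    return len(text.encode("utf-16-be")) // 2


for name in names:
    print(name, "uses", utf16_units(name), "16-bit units")
    for ch in name:
        # Every character has one slot, regardless of script.
        print(f"  U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

Each character in these samples occupies exactly one 16-bit slot, so no code page has to be named alongside the data; the provider and the user only have to agree on the one standard.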
Finally, a couple of words about data exchange. One of the things we do most frequently at NIMA to support our customers is, we cut CDs with user-specific geographic names data sets on them and we ship them off to customers who then take the data and load it into their databases. We wish they wouldn't do that, but they really haven't got much of an alternative at this point. We wish they wouldn't do it, not because we're not willing to share the data, we'll share it with anyone, but what it imposes on them is a fairly significant overhead cost in maintaining that. Our database like Roger's is changing daily. When we cut you a CD and we send it to you, you've got a snapshot in time, and when are you going to come back to us to ask for another CD over the same area? We're doing that with thousands of customers, they've got overhead in maintaining the data that they're trying to keep in their systems, we've got overhead in trying to keep up with providing them the CDs that they're asking for.
Is there a better idea? Well, we hope so, and an example that I'll use here is a project that we've been working on recently at NIMA that gets at some of the vertical integration of data that Mike was talking about first thing this morning. We have a product that we call "NIMA in a Box", and there have been a couple of newspaper articles about it. I can tell you more offline if you're interested, but very simply what we do is, we package up what we consider to be fairly static geospatial data onto a compact disk - static would be basic vector data of features that don't change very much, digital elevation data, things that we can rather safely put on a CD and not have to update for some period of time. Do we put names information on the CD? Yes, but a little bit. A little bit in the sense that we don't copy everything - all of the elements of data associated with the geographic name - and put it on the disk, because that's highly variable. What we do is, we put in some geographic names information and a URL. Where does the URL point? It doesn't point simply to our Geonet name server, but it points directly to the record in our database that links all of the information pertaining to that feature and to that geographic named entity. So if you're a user in the field with NIMA in a Box, and you're in a situation where you're doing mission planning over an area where there's a lot of, say, geopolitical activity going on that's leading to a lot of name changes, and you're not so certain that the name that you have on your CD is up-to-date, or you're not so sure where the source was that it came from, you can click the name and it will take you directly to our database (if you're networked), and you can pull up the latest information about that feature. 
I believe this has the opportunity to get at some fairly serious savings in overhead, where instead of copying data sets, loading them into different systems and then running into a maintenance problem, you have a direct link back to our database to all of the information that we can give you. Linda?
Linda Hill - The data set we got from you which we were working with was on CD, and it was a snapshot, but it didn't include an identifier that we could use to do a direct link back to get the current version of that record. Do you publish a particular identifier that would be used for that kind of access?
Randy - We are beginning to publish two such identifiers. One would be the URL that I just spoke of, and we're beginning to be able to generate data sets that include that URL. The other is a static unique feature identifier that we have for each named feature in our database, so that you can manually, or perhaps semi-automatedly, insert that unique feature ID into the query screen on the Geonet name server and get an update.
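The two identifiers described here can be sketched together in a few lines of Python. Everything specific in this sketch is an assumption made for illustration: the base URL `https://names.example.gov/record`, the query parameter `feature_id`, the `SnapshotName` record, and the sample values are hypothetical and do not describe the actual name server.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint; the real server's address and parameters may differ.
BASE_URL = "https://names.example.gov/record"


@dataclass(frozen=True)
class SnapshotName:
    """One name entry as it might appear on a CD snapshot (assumed layout)."""
    feature_id: int       # static unique feature identifier
    name: str             # spelling at the time the CD was cut
    snapshot_date: date   # when the snapshot was taken


def record_url(entry: SnapshotName) -> str:
    """Build a link from the snapshot entry back to the live record."""
    return f"{BASE_URL}?{urlencode({'feature_id': entry.feature_id})}"


def feature_id_from_url(url: str) -> int:
    """Recover the feature identifier from a record link."""
    return int(parse_qs(urlparse(url).query)["feature_id"][0])


entry = SnapshotName(12345, "Example Town", date(1999, 10, 1))
print(record_url(entry))
```

The point of the design is that the CD carries only the stable key and a spelling; anything that changes daily is fetched through the link rather than copied.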
Linda - We were able to do that for GNIS because that unique identifier came with the data set, but the data set that we got from you did not have that unique identifier in it.
Randy - We can get you a better data set, and anybody else who wants it. Last thing I'll mention is an issue that we find again with dealing with foreign geographic names information. We try to identify the authorities in other countries for geographic names information and work collaboratively with them to share data. There are a number of geographic names authorities in other countries, and not just geographic names authorities but also geospatial data authorities, who look at geospatial information as a commodity, not just because there are potential dollar signs behind it, but because of issues where judgements are being made regarding the intellectual property value of that information. We are finding this to be an obstacle in data exchange and data sharing. We may be able to enter into licensing agreements with these authorities whereby we can get the information, but we can't share it with anybody else. That doesn't support our mission, so this is an issue that we're trying to face, and a challenge for today and the future. That's all I'm going to say. Are there any questions for discussion? Linda?
Linda - I just want to make one comment, and that is the degree to which you are swayed by what we call in the library world "literary warrant" - that is, the way in which the public is referring to a place, and I have an example. When the newspapers were reporting on the earthquake that happened in Turkey, they referred to that place as Ismet. When I looked it up in your gazetteer, that was a variant name; the official name was another name, but that official name was not what everybody was using to refer to it. This must be somewhat of a problem for you for setting that name, and for people following the use of the official name. So my question is, to what degree are you swayed by popular uses, popular names?
Randy - We have a policy and a methodology for approving what we call conventional names, and these are the kind that you speak of, that are not official in the country involved but are in widespread current usage in the vernacular. We have a set of about 8-10 of what I'll call popular geographic sources that we refer to in determining whether a particular vernacular name meets the criteria of being in current and widespread usage. If it does, then the Board has the opportunity to approve that name as a conventional name that may be used in addition to or in lieu of the official local name for that feature. In the case of Ismet, I don't believe there was a conventional name approved for that feature. But we do have others, such as the one mentioned earlier today, Köln and Cologne - the Board approves both names for use, because Cologne is in such widespread usage. Yes?
Allen Hittelman - I'm with NOAA, and quite often we superimpose political boundaries on satellite images, and it's not uncommon for the State Department to chastise us for not using the politically correct boundary files. Is there a source in the government where the current politically correct boundary files exist?
Randy - This sounds like an implicit authority that we've just gotten, which is to deal with boundary information. I just so happen to be the guy at NIMA who has that responsibility. There is no single source of authoritative boundary information in terms of delineation that you can go to reliably and get what you're looking for. We can give you some delineations that we hope adhere to the guidance that's issued by the Office of the Geographer at the State Department, but we would have to do some further checks on that before we could vouch for that. We are about to issue as a standard product for unlimited distribution a product that we're calling World Vector Shoreline Plus. It's not just shoreline, but that's the title of it. It also includes international boundaries at a scale equivalent to 1:1,000,000, and maritime claims, including claim baselines and the resultant territorial seas that derive from them. They have had some review in the case of the maritime limits from the Office of the Navy Judge Advocate General. It's not been a complete review; there's been some review of both the maritime boundaries and the land boundaries by the appropriate authorities at the State Department. But it's going to turn out to be a single data set that you can go to for consistently derived boundaries, hopefully in accordance with guidance, again at a resolution equivalent to a scale of 1:1,000,000. Question?
[someone from the audience] - Aside from the commodity issue, do you incorporate all the 15% of the world that does have standardized names into your database, or should I continue to assume that I really ought to continue to use the database associated through the Ceonet server that Canada has?
Randy - The Board's policy on names in Canada is to refer users to the Canadian authorities. We have a Canada file in our database, it has somewhere in the neighborhood of 30,000 entries in it. It was built in the early 1950s, if I'm not mistaken. It is not where you want to go for authoritative Canadian place name information, you want to go to the CPCGN, and perhaps beyond that - Kathleen will explain it - to the provincial authorities. Mr. Payne?
Roger Payne - I'd like to, if I may, jump back to Linda's question and your comment about conventional names. It might be useful for this group to know that, yes, we do have a policy on conventional names, but it dances around a particularly ticklish issue at the UN and that's endonyms versus exonyms. As it happens, most of the exonyms that are used throughout the world conventionally are English equivalents, and this is a topic that is constantly discussed at the UN and everybody's promised faithfully to use endonyms whenever possible. That is to say, instead of Rome, whenever possible we should use Roma and on and on. Now whether we'll use Nuevo York I don't know, but that's what the UN by resolution encourages, that endonyms should be used, not exonyms. Randy, you may want to say a little more about that?
Randy - Yes, what Roger says is true. With respect to specific decisions by the Board's foreign names committee, I think we're tending to see, and Betsy [Mangan] can correct me if she sees differently, I think we're tending to see a gradual increase in the number of conventional names that we're recognizing as official, and there are a variety of factors that enter into that. The principal factor, I believe, would be the geo-political changes that have occurred since 1989 whereby, if we followed strict policy and used local names only, we would have lost a conventional nomenclature in Eastern Europe and in the former Soviet Union that is extremely widespread. So for a lot of areas, a lot of features, a lot of political entities, we're retaining certain names that were familiar from Russian as conventional names.
[someone from the audience] - The same is true of India now.
Randy - Excellent point. With the rise of certain ethnic groups in India, we're seeing changes in long-standing names of towns in India from what we had considered to be conventional norms to the local language forms.
[someone from the audience] - About the static unique feature identifiers, Roger's portion of the database, do you use those as well? We were talking about two types of identifiers for places, some kind of key or ID.
Randy - We have a unique identifier, yes.
[someone from the audience] - OK. Do your identifiers all work together; are they unique across the two?
Randy - That's an embarrassing question.
Roger - No, the answer is embarrassing. (Laughter)
Randy - Go ahead.
Roger - No, go ahead.
Randy - One would have thought that a professional organization of such long standing as the Board on Geographic Names would at some point in time have gotten its act together, and taken the domestic side of a database and the foreign side of a database and come up with a consistent conceptual data model for a geographic name, a content standard for the database, that was unified across the two areas that the Board is responsible for, so on and so forth, developed a common user interface that would take you either to domestic names or to foreign names or to undersea feature names, whatever you're looking for, but that hasn't happened. However, there are elements of the two databases that do overlap, and in the case of unique identifiers that is the case. They have their own set of identifiers, we have our own set of identifiers, there may be duplicates, I don't know, but …
Roger - It's not really relevant, but it would have happened this way if domestic and foreign had not been divided in 1961. Because they were divided, sent to two different agencies, they developed differently.
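One common way to live with two independently assigned identifier sets, sketched here in Python under stated assumptions, is to qualify each identifier with the database it came from. The source labels "domestic" and "foreign", the `FeatureKey` type, the `merge` helper and the sample records are all hypothetical; nothing here describes how either database is actually keyed.

```python
from typing import Dict, NamedTuple


class FeatureKey(NamedTuple):
    """Composite key: which database issued the ID, plus the ID itself."""
    source: str    # hypothetical label, e.g. "domestic" or "foreign"
    local_id: int  # identifier as assigned within that database


# Illustrative records; both databases happen to have issued ID 1001.
domestic = {1001: "Springfield"}
foreign = {1001: "Köln"}


def merge(**datasets: Dict[int, str]) -> Dict[FeatureKey, str]:
    """Combine data sets without letting duplicate local IDs collide."""
    merged: Dict[FeatureKey, str] = {}
    for source, records in datasets.items():
        for local_id, name in records.items():
            merged[FeatureKey(source, local_id)] = name
    return merged


combined = merge(domestic=domestic, foreign=foreign)
print(combined[FeatureKey("foreign", 1001)])
```

With the source carried in the key, it no longer matters whether the two sets contain duplicates, at the cost of users having to know which side of the house a feature belongs to.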
Randy - Anything else? We've got time now for some comments by some of the other participants on the panel. Kathleen? Wayne?