Nancy Blair (USGS) - Traditional Library Applications of Digital Gazetteers

When we are talking about traditional libraries, we are, of course, talking about libraries that started out with books, and then some had maps - a lot didn’t. Then we started expanding to audio-visual materials. In our library, because we lend to educational institutions, we even catalog and barcode boxes of dinosaurs, and gold pans, and magnifying glasses and posters for schools. Then the Internet started, and now we catalog websites, we catalog publications online. We are trying to fit all of these things within the same context, so that you could conceivably search for a subject such as erosion and pull up all kinds of materials in different formats, including Internet or World Wide Web sites.

Traditionally, libraries had a form of classification because they dealt with people who walked into the library and wanted to know “where are the books on this subject?” And so you had classification systems. The most common one used in academic institutions is, of course, the Library of Congress classification system. As in most library classifications, maps, geography, and history are sorted by region. For all the other subjects, the geographic part comes after the initial subject. So if you had an Atlas of New York State, you would give it this call number, which is unique to that book. No other book is going to have that call number. It is a unique location. Conceivably, the person walks in, and unless it has been mis-shelved or something, they’re going to be able to find that map. Now, with other subjects, the geographic part comes second. So you would have to go to the section on public health or public medicine, and if you wanted to find a book on public health in Havana, you would have to go to a call number like this. You could not walk to a Cuba shelf area and find everything that is on Cuba.

Another common classification system is the Dewey Decimal system, which public libraries use. Again, if you wanted history or geography, you go to a 900 number. If you want Somalia, you go to 960, and then you find the specific number for Somalia. If you want another subject, you have to use these decimal additions to the number. So, conceivably, if you had a book on handcrafts of Alexandria, the cataloger would give it a call number and then add a six-digit decimal number after it for the area. You would find the general history of Somalia in the 900s, but if you wanted tribal history, if you wanted legal history, if you wanted economics, you would have to go to other parts of the library. Now this is from the days when people walked into a library, went to a general area, and kind of searched around for what they wanted.

Nowadays, most of our customers are not on site. Within the Survey we have field offices, and we get inquiries from all over the world; so the shelf classification doesn’t work. In our library, because we always felt we were unique, we are one of the few that still have our own classification system. We have these kinds of numbers. As far as I know, we are the only ones that ever used it. I think there are a couple of misguided libraries in the Middle East that picked it up, because they thought we knew what we were doing. But again, in our system, geology is often arranged just by the area. Maps are arranged by the area. But then, if you want the hydrology of Puerto Rico, you have to go to that number and then sub-divide it.

All of these classifications had the problem that every time there was a political sub-division of the country, or somebody else took them over, or they changed their names, then you had to decide - are we going to leave all those books there, or are we going to move them. It was because you were dealing with physical objects. It was not easy to change.

Now, people who were map catalogers, and there are very few of them, always were interested in geographic coordinates. On library records there is a field where you can put in the coordinates described this way. I can tell you that on library records it appears only for maps and some atlases. Even if you had a handbook of the birds of California, it’s not going to have those geographic coordinates in the field that is set up for them.

Linda Hill - Are you going to say why that is so? There’s a field where you can put coordinates, but they aren’t applied to anything but maps. Why is that?

Nancy - What reason do you have?

Linda - It is actually just cataloging practice, I believe.

[someone from the audience] - It is also not generally searchable.

Linda - It is searchable, but isn’t it just cataloging practice?

Nancy - It is cataloging practice not to put it in. But with the advent of online systems and with searching on the Web, we would like to have this on the records of all types of materials.

One of the problems that librarians have is describing materials, producing records for them, so that the people coming in with inquiries can find them; linking the books or the maps with the people, or the websites with the people. So, very few records, if you search, have these geographic coordinates in them. One of the reasons is that librarians, and I don’t want to disparage my species, but very few of them understand coordinates. Most of the people we deal with who come into the library don’t understand coordinates. And you don’t want to give them a half hour lesson on how to find the coordinates of their location to do this. But it would be nice, with automatic searching, if we could go back and put this field in a lot of records. That is where a gazetteer would help. At least for a title such as a Handbook of the Birds of California, you could add the coordinates for California. You don’t have to go and pull that book and try to figure out what the coordinates are for each item.
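[Editorial illustration] The idea of going back and adding coordinates to existing records from a gazetteer could be sketched roughly as follows. This is only a minimal sketch, not a description of any actual library system: the `GAZETTEER` table, the `CatalogRecord` and `BoundingBox` types, and the `add_footprint` function are all hypothetical, and the bounding box for California is an approximate, illustrative value.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BoundingBox:
    """Rectangle in decimal degrees (west/east are longitudes, south/north latitudes)."""
    west: float
    south: float
    east: float
    north: float


# Hypothetical gazetteer: place name -> approximate, illustrative bounding box.
GAZETTEER: Dict[str, BoundingBox] = {
    "california": BoundingBox(west=-124.5, south=32.5, east=-114.1, north=42.0),
}


@dataclass
class CatalogRecord:
    title: str
    place_headings: List[str]
    footprint: Optional[BoundingBox] = None  # empty on most non-map records


def add_footprint(record: CatalogRecord,
                  gazetteer: Dict[str, BoundingBox] = GAZETTEER) -> CatalogRecord:
    """Copy coordinates from the gazetteer for the first place heading it knows.

    Records that already carry coordinates are left untouched.
    """
    if record.footprint is None:
        for heading in record.place_headings:
            box = gazetteer.get(heading.strip().lower())
            if box is not None:
                record.footprint = box
                break
    return record


birds = add_footprint(
    CatalogRecord("Handbook of the Birds of California", ["California"])
)
print(birds.footprint)
```

The point of the sketch is that the cataloger never has to pull the book: the place heading already on the record is enough to look up a footprint, and a heading the gazetteer does not know simply leaves the field empty.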

This is a record that you would see online if you went to our web site. This field is marked "map scale", and it includes the coordinates. There is a number for that, and I will show you in a MARC record where it is. Most online systems don’t search coordinates very well. Most people don’t really understand them. But if you could have a gazetteer where this is translated automatically, this would help considerably. In a traditional library record there is usually only one subject entry; for example, one starting with "Geology." You could not look up Monterey County in the card catalog and find the record, because the subject heading would be "Geology-Monterey County". With computers you can search all of the words in the subject heading. And you see that, if you did a keyword search on title, you could pick up the quadrangle names of Monterey Peninsula which the person might use. But a person is very likely to walk into my library and search for Monterey Bay; and they would never pull up this record. If there were a way, with a gazetteer, to translate that into Monterey County, that would be another way of searching. Or if they were looking for the cities in Monterey - oh, that’s Monterey County, so you could search that way. The translation would help considerably.

This is the MARC record. This is what the cataloger produces and sees. The Library of Congress developed the MARC record format for us. They took our traditional fields and the collation and the imprint and all the terms we used, and put them in specific fields. It also allowed for a lot of inputting of symbols that indicate something about the book which isn't displayed to the public, such as format, language, or even who did the cataloging. These are the added entries. Field 255 is where the scale and the coordinates go into this record. The subject field is 650.

These are some of the subject headings that the Library of Congress calls “Geographic Names”. These include seas, and stars, and hazardous waste places, and military installations, and all the other things that we have been talking about in terms of geographic names. In libraries we found long ago that, for the purpose of bringing things together that are about the same topic, the kind of random keywords that people use won’t work. You have to have a structured language in order to bring more things on the same subject together. There are cataloging rules to guide how to enter geographic names. Now, the problem with online searching is, of course, that you eventually need a gazetteer to translate what a person might use when they’re searching into what the librarian is going to use for the subject heading. A person who has traveled probably talks about Fujiyama, but the Library of Congress heading is Mt. Fuji. So there needs to be a way in which you don’t have to tell the person, "OK, don’t use the way it’s used in the country, use it the way the Library of Congress wants you to use it." There needs to be translation of those terms so that you can gather as much information as you can.
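[Editorial illustration] The translation from a searcher's term to the cataloger's heading can be pictured as a lookup in a table of variant names. The sketch below is an assumption-laden illustration rather than how any authority file is actually implemented: `VARIANT_TO_HEADING`, `normalize`, `authorized_heading`, and `subject_query` are hypothetical names, and the heading text "Mt. Fuji" is taken from the talk as spoken.

```python
from typing import Dict, Optional

# Hypothetical variant-name table: normalized user term -> heading used in the catalog.
VARIANT_TO_HEADING: Dict[str, str] = {
    "fujiyama": "Mt. Fuji",
    "mount fuji": "Mt. Fuji",
    "mt. fuji": "Mt. Fuji",
}


def normalize(term: str) -> str:
    """Lowercase the term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def authorized_heading(term: str) -> Optional[str]:
    """Return the catalog heading for a variant name, or None if it is unknown."""
    return VARIANT_TO_HEADING.get(normalize(term))


def subject_query(term: str) -> str:
    """Return the heading to search on, falling back to the user's own term."""
    return authorized_heading(term) or term


print(subject_query("Fujiyama"))
```

With a table like this behind the search box, the person can type the name they know and the system, not the person, does the translation into the structured heading.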

These are the kinds of subject headings that are used when you use place name subject headings. These are established headings; the secondary parts are called floating subdivisions. For example, you can use "history" as a subdivision under any place name. And then it is set up to show the more common periods of that history that are known for the country. It’s what people write books on.

Now, my problem is that I work in a geology library. You have somebody come in and they don’t care about all the cataloging rules, they don’t care about classifications. They are buying a house in an area in Cupertino, which is a fast growing area (at least it was when this was published). They want a map on the faults and the geology, and they think they are going to find out everything they need to know from the map. Well, if they search for a map on Cupertino in our catalog - nowadays you’ll find a couple of other things - at that time they would not have found anything, because the geological map that includes Cupertino does not have "Cupertino" as a subject heading. The indexer has picked up the ends of the map. The map is between Los Gatos and Los Altos Hills. The catalog record does not show what is in between: Cupertino, Saratoga, and Anderson Dam are all in between there. And those are things that people might be wanting. With this record there is no way the person can really find the map unless they search for everything in Santa Clara County.

When people would come into the library, I would tell them, "Okay, first you need to find the county, and then the 7.5' quadrangle for that area." I would have one of these little hand indexes of what the 15' quadrangle was for that area. "And then if you know of any geographical term, like Monterey Bay, put that in too." They could have done all that in this search for information about Cupertino and they’d still have missed the map. I knew as a librarian that this was the best map at the time for the geology of that area, and that it showed the fault line that everybody was worried about. I would say, "I know a map, let me get it for you." But they had to find someone like me who knew. If they encountered some other person and asked that question, they would not have found that map. If they were at a remote site and they didn’t happen to call, they would never have found this map. And yet it was the best map for the area. With a gazetteer and with coordinates, they should have been able to find it. Today we hope they would be able to find it. But the problem is that searching is so limited using traditional methods.
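[Editorial illustration] The Cupertino case can be sketched as the difference between matching on subject headings and matching on coordinates. Everything in this sketch is hypothetical: the record titles are placeholders, the bounding boxes and the point for Cupertino are rough illustrative values in decimal degrees, and `search_by_heading` and `search_by_footprint` are invented names, not functions of any real catalog.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BoundingBox:
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        """True if the point lies inside or on the edge of the rectangle."""
        return self.west <= lon <= self.east and self.south <= lat <= self.north


@dataclass(frozen=True)
class MapRecord:
    title: str
    subject_places: Tuple[str, ...]
    footprint: BoundingBox


# Hypothetical gazetteer entry: place name -> approximate (longitude, latitude).
PLACE_POINTS: Dict[str, Tuple[float, float]] = {
    "cupertino": (-122.03, 37.32),
}

# Placeholder records; only the ends of the first map appear as headings.
CATALOG: List[MapRecord] = [
    MapRecord(
        "Geologic map, Los Gatos to Los Altos Hills (placeholder)",
        ("Los Gatos", "Los Altos Hills", "Santa Clara County"),
        BoundingBox(west=-122.20, south=37.20, east=-121.90, north=37.40),
    ),
    MapRecord(
        "Geologic map, Monterey Peninsula (placeholder)",
        ("Monterey County",),
        BoundingBox(west=-122.00, south=36.50, east=-121.75, north=36.65),
    ),
]


def search_by_heading(place: str) -> List[MapRecord]:
    """Traditional search: the place must appear among the subject headings."""
    wanted = place.strip().lower()
    return [r for r in CATALOG
            if wanted in (p.lower() for p in r.subject_places)]


def search_by_footprint(place: str) -> List[MapRecord]:
    """Gazetteer search: look up the place's point, then test each map's footprint."""
    point = PLACE_POINTS.get(place.strip().lower())
    if point is None:
        return []
    lon, lat = point
    return [r for r in CATALOG if r.footprint.contains(lon, lat)]


print([r.title for r in search_by_heading("Cupertino")])
print([r.title for r in search_by_footprint("Cupertino")])
```

In this toy catalog the heading search returns nothing for Cupertino, while the footprint search returns the map whose rectangle covers it, which is the behavior the speaker is hoping a gazetteer with coordinates would make possible.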

Again, this is the MARC record for that map. Again, the coordinates are not given on this...

(Break in text between tapes)

Betsy Mangan - As soon as you take a picture of that file, it is out of date. Things are changing. A distributed gazetteer, where authoritative agencies maintain their authority files and you can access them, is one thing. To copy those off and replicate them, so that they are automatically out of date and inaccurate, is not the way to go.

[someone from the audience] - What is an authority file?

Nancy Blair - An authority file is where you establish the structured headings. Libraries traditionally had those so that they were using the same heading in the same way.

Betsy - We do authority files for personal names, corporate names, and natural features. We have both a name authority file and a subject authority file.

Nancy - I just want to add one thing about arrangement. It really depends on the library. So, even in our library, atlases have kind of wandered between subject and geographical arrangements.

Fred Broome (Census) - This is also by way of clarification. For the last day and a half, I have heard a number of references to the need to have coordinates as part of your gazetteer. I would perhaps speak the obvious, and that is: please be careful about coordinate boundaries that go beyond merely a point reference, because a number of things occur. First, such things as the Rocky Mountains - where is the outer edge of it? So fuzzy boundaries are a topic of discussion. But the one that I am more familiar with is the change of political boundaries. There are some 20 county boundary changes in this nation every year. Those are not stable. There are thousands of place boundary changes every year. I know, because my division, the Geography Division, surveys this on almost an annual basis and finds this out. There are state boundary changes. Then, of course, when you get to such things as cultural definitions of boundaries, like the U.S. South or something of that nature, you have even more difficulty in trying to decide what is an authoritative boundary. I don’t believe you want gazetteers to have to be continuously updated with polygon boundaries in all cases. I am not suggesting you shouldn’t have them in some cases, I am just suggesting that you may want to temper any rush towards coordinates for everything with knowledge of how changeable and fuzzy they are.

Nancy - The only thing I can say in response is that, of course, with an online system it is easier to change the boundaries when they do change.

Fred - Think of it in historical searches.