It's nice to follow Allen because, as for many of us, National Geographic was my introduction to maps, even as a very small boy. The magazines used to come in the mail and we'd open those things up. That probably started my fascination with all this. I also want to thank Mike and Linda for organizing this; I think it is a great opportunity. There is a small core of people in this room whom I have seen regularly over the last 5-10 years, worked with, or sat in meetings with, but there is a larger group I haven't, and that's what is exciting about this opportunity - the chance to cross-fertilize.
I also need to make a qualification whenever I speak from a Museum perspective, because I am not a scientist. I am a librarian and I've been doing this for about 25 years. There are scientists in this room who will no doubt be able and willing to correct me and to provide the kind of perspective that I may not be able to provide.
We at the Museum of Natural History in New York are doing a large digital library project. It's something that I've thought about for a long time. We are focusing initially on one expedition that took place between 1909 and 1915 in the Belgian Congo. A group of philanthropists in New York sent a couple of very young scientists to the Congo; they thought they were going for a couple of years and they ended up staying for six. The region we are focusing on is in the northeastern part of the Congo, in the Congo River Basin, and as part of the digital library project we have already begun a pilot that plots many of the locations where specimens were collected during this expedition. The expedition was called the Lang-Chapin Expedition. This area is our initial focus; it gives us a relatively well-defined testbed for what we want to develop.
What we have here is a geological map. Some American geo-prospectors were in the Congo during the period we are focusing on, and we found some of their maps in a museum in Belgium. This is actually a very site-specific map, and this level of resolution is exactly what interests us. This, for example, is a map not from the Congo but one that our scientists created in Peru; you can see a very detailed depiction of the collecting effort that was occurring at this site. What we're attempting to do as part of our digital library project is to go back and start to depict, at relatively high resolution, what was actually happening on the ground in that 1909-1915 period.
This is what happens on the ground, among other things. This is a low-level mist net used for capturing bats and birds flying close to the ground; this is obviously a rain forest. This is a drift barrier with pit traps at its base, another way that specimens are captured in the wild. One of the things we are trying to do is find ways to depict these various types of collecting effort on the ground.
That's what is collected among many, many other things.
That's actually one of Lang's field notes. In the field, scientists typically carry little diaries or log books and make notes as they work, so this is part of the raw material that describes what was actually happening on the ground.
This is a transcription of that; we are experimenting with OCR and other techniques to try to get at the problem.
We think about 13,000 pages of formal scientific publication came out of that one expedition. Here is an example: "Tapeworms of the Rhinoceros".
So what is natural history? That was just one instance of it. One possible definition is "a study of how organisms and cultures vary over time and space". That's a very high-level and broad definition, but it is one way of describing what we do, and it is carried out through the collection, description, comparison, and classification of specimens or artifacts. That's actually what happens in these big museums. I ran across a paper not too long ago about how the banding of birds first started, and I think it only really got going in earnest in the 1950s. Prior to that, birds were identified with a shotgun; nowadays we're able to do it with a variety of observational techniques.
Beyond collecting or observing, there is also the effort expended to observe or collect specimens, which is significant in its own right: the inability to find or collect something in a locale - in other words, a null result - can help establish that an organism has become locally extinct, or indicate its relative distribution, rarity, and so on.
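To make the point about null results concrete, here is a minimal Python sketch, assuming each collecting event is recorded with its locality, its effort (in hypothetical mist-net hours), and the set of taxa found. The class and function names are illustrative only, not part of any existing museum system. Because effort is stored even when nothing is found, the sketch can tell "looked and did not find" apart from "never looked", which specimen records alone cannot do.

```python
from dataclasses import dataclass, field


@dataclass
class CollectingEvent:
    """One collecting effort at one locality (hypothetical record layout)."""
    locality: str
    effort_hours: float                      # e.g. mist-net hours; the unit is an assumption
    taxa_found: set = field(default_factory=set)


def status_by_locality(events, taxon, localities):
    """Classify each locality for one taxon as detected, not detected, or not surveyed."""
    effort = {}
    detected = set()
    for e in events:
        # Effort is accumulated even for events where nothing was found.
        effort[e.locality] = effort.get(e.locality, 0.0) + e.effort_hours
        if taxon in e.taxa_found:
            detected.add(e.locality)

    result = {}
    for loc in localities:
        if loc in detected:
            result[loc] = "detected"
        elif effort.get(loc, 0.0) > 0:
            # A null result: the locality was surveyed but the taxon was not found.
            result[loc] = f"not detected after {effort[loc]:g} h"
        else:
            result[loc] = "not surveyed"
    return result
```

For example, with one event at "Site A" that found "Taxon X" and a 120-hour event at "Site B" that found nothing, "Site B" comes back as a null result while an unvisited "Site C" is reported as not surveyed. Dropping the zero-catch event would make Site B indistinguishable from Site C.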
So here are the numbers. An article in Nature last July suggested that there are 3 billion specimens in the world's natural history collections, and estimated that there might be 6,500 natural history museums holding such collections. This gives some sense of the order of the problem we are dealing with. While we don't have exact numbers, it is fair to say that only a very small percentage of this data is actually captured in electronic form. Then, of course, there is observational data that is not included in this number, and the collecting event data that I referred to earlier would be a separate piece again.
I didn't mention it, but one of the other hats I wear is as a member of the World Commission on Protected Areas of the IUCN - the World Conservation Union. I've been working with the IUCN for 7-8 years, and I'm also co-chair of the Biodiversity Conservation Information System, which is an international consortium of major organizations involved in biodiversity. So I focus not only on museums but also on how this data can be applied in support of biodiversity research and conservation. Our museum data provides a unique source of baseline information. For one thing, this is vouchered data: there are specimens you can go back and examine to see what the thing really is, which is not true of most other kinds of data you can imagine. To give you an example: we did a study in Modoc County, California, where we had about 12,000 data points across disciplines and from several museums, and we found that one third of our data predated 1940 - baseline information that may not be available from any other source.
Major support is available for biodiversity. The World Bank and the Global Environment Facility continue to make major grants for biodiversity, usually on a national or regional scale. For example, Russia has a $27 million biodiversity grant, and Pakistan was just awarded $10 million to focus on biodiversity in the Himalayas. There is money there, but to my knowledge very, very little museum data has actually been used for those purposes, which is a real gap in my view.
What about the museum environment? Museum departments and collections are usually defined by discipline, so that you have an anthropology collection, a botany collection, entomology, paleontology, and so forth. Very often they are highly decentralized, and they are seldom organized into multi-disciplinary projects. Usually these departments function with more or less autonomy. Data capture and processing typically occurs at the individual researcher or departmental level, as opposed to any higher level. So that's the environment in which we have been operating for a long time, probably 150-200 years.
I'd suggest that there is a core dataset which corresponds to most of the artifacts and specimens that we have. None of this is rocket science:
Typically, the problem with our place-name data is that it consists of alphanumeric strings, often with no coordinates and with fuzzy approximations using words like "about", "near", and so on. Often the place names are obsolete, so we, along with a larger community of art historians, archeologists, and historians, have an interest in retrospective place names. I personally have a fair amount of experience digging those things out - going back into old maps. Incidentally, museum libraries, unlike most other libraries in the world, keep all editions of maps; we carry every edition of a map that we can find.
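As a rough illustration of a first step in cleaning such strings, the Python sketch below splits a locality into its hedging qualifiers and a core place string, and flags whether a decimal latitude/longitude pair is present. The qualifier list, the coordinate pattern, and the function names are hypothetical and deliberately minimal; real locality text would need a far larger vocabulary and human review.

```python
import re

# Hypothetical, deliberately short list of hedging words found in locality text.
QUALIFIERS = ["vicinity of", "approximately", "about", "near", "ca."]
# Decimal "lat, lon" pair such as "2.70, 27.62"; other coordinate styles are ignored here.
DECIMAL_PAIR = re.compile(r"-?\d+\.\d+\s*,\s*-?\d+\.\d+")


def qualifier_pattern(q):
    # Whole-word, case-insensitive match, so "near" does not match "nearby".
    return re.compile(r"\b" + re.escape(q) + r"(?!\w)", re.IGNORECASE)


def parse_locality(text):
    """Split a free-text locality into hedging qualifiers and a core place string."""
    found = []
    core = text
    for q in QUALIFIERS:
        pat = qualifier_pattern(q)
        if pat.search(core):
            found.append(q)
            core = pat.sub(" ", core)  # remove the qualifier from the core string
    core = " ".join(core.split()).strip(" ,;")
    return {
        "qualifiers": found,
        "core": core,
        "has_coordinates": bool(DECIMAL_PAIR.search(text)),
    }
```

Keeping the qualifiers rather than discarding them matters: a later georeferencing step can use them to widen the uncertainty attached to the point.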
Named features often serve as benchmarks: reference points from which the place where something was collected is described. For example, a compass direction from a feature: north-northeast of Washington D.C. - that sort of thing. Distances are often expressed in road miles, river miles, or trail miles - three and a half miles downstream from …; a mile and a half on this trail; 3 highway miles from … - those kinds of things. Slope and aspect can be important; in botany they are often recorded. We often deal with localities given only as a body of water, with nothing more specific - sometimes that's an issue. Geologic features or formations, like the Burgess Shale, are often important in paleontology. Township-range-section designations are used by some collections to express location.
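For the simplest case, a straight-line distance and compass direction from a named feature, a point can be computed with the standard great-circle destination formula on a spherical Earth. The Python sketch below assumes a hypothetical in-memory gazetteer (a dict mapping feature name to latitude/longitude) and only handles phrases of the form "<distance> km|mi <direction> of <feature>"; the pattern, function names, and sample data are illustrative assumptions. Road, river, and trail miles cannot be resolved this way, because they follow a path rather than a straight line, so the sketch returns None for anything it cannot parse.

```python
import math
import re

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation
KM_PER_UNIT = {"km": 1.0, "mi": 1.609344}
# 16-point compass rose, clockwise from north in 22.5-degree steps.
COMPASS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]
# Hypothetical pattern for straight-line offsets like "10 mi NNE of Some Place".
OFFSET_RE = re.compile(
    r"^\s*(?P<dist>\d+(?:\.\d+)?)\s*(?P<unit>km|mi)\s+"
    r"(?P<dir>[NSEW]{1,3})\s+of\s+(?P<feature>.+?)\s*$",
    re.IGNORECASE,
)


def destination(lat, lon, distance_km, direction):
    """Point reached by travelling distance_km from (lat, lon) along a great circle."""
    bearing = math.radians(COMPASS.index(direction) * 22.5)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    phi1, lam1 = math.radians(lat), math.radians(lon)
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(bearing))
    lam2 = lam1 + math.atan2(math.sin(bearing) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    lon2 = (math.degrees(lam2) + 540) % 360 - 180  # normalise to [-180, 180)
    return math.degrees(phi2), lon2


def georeference(text, gazetteer):
    """Resolve '<distance> <unit> <direction> of <feature>' against a gazetteer dict."""
    m = OFFSET_RE.match(text)
    if m is None:
        return None  # not a straight-line offset phrase (e.g. trail or river miles)
    direction = m["dir"].upper()
    if direction not in COMPASS or m["feature"] not in gazetteer:
        return None  # unknown direction or feature missing from the gazetteer
    lat, lon = gazetteer[m["feature"]]
    km = float(m["dist"]) * KM_PER_UNIT[m["unit"].lower()]
    return destination(lat, lon, km, direction)
```

For example, `georeference("10 mi NNE of Washington D.C.", {"Washington D.C.": (38.9072, -77.0369)})` (approximate coordinates, for illustration only) returns a point about 16 km from the reference. A fuller treatment would also attach an uncertainty radius reflecting vague qualifiers and the extent of the reference feature itself.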
Some suggestions as to what features gazetteers might need:
That's it for me. I would be happy to talk to anyone who's interested and to answer questions when we get there.