The loading strategy developed
for building the initial collections of ADL
to be made available over the WWW
involves the loading of both information
already in digital form and information
being digitized as part of the ADL project.
Conditions that materials already in digital form
should satisfy in order to be eligible for loading
into ADL include the following.
-
The materials should currently be in the collections
of the MIL or the new ADL clone at SDSC.
Since the long-term objectives of ADL
include the current objectives of the MIL,
this is a useful criterion.
-
In general, the materials should be unencumbered in terms
of intellectual property rights. Nevertheless,
we will attempt to load some materials that are
encumbered to demonstrate our ability to handle
intellectual property rights issues,
as in the case of items for which
the collection of fees is mandated
or where data has been licensed to UC,
as in the case of California SPOT imagery.
-
In terms of the choice of collection items,
we believe that, in order of importance,
the materials should initially be chosen to:
-
serve particular
target user groups within the context of ADL;
-
be of value as ``experimental'' datasets for ADL's research,
such as evaluating how users respond to digital items
of various classes or observing what
unexpected uses are made of geospatial information
when it is made available in digital form;
-
satisfy criteria for general use,
as long as ADL does not unnecessarily duplicate
other collection efforts.
In particular, the experimental datasets could be used
to exploit ADL's comparative advantage which
is the use of spatial footprints to make
surprising linkages among information items.
Conceptual examples of such datasets
are the Bible, the works of Shakespeare,
and information on palaeoenvironments.
-
The materials should be ingestible into ADL
at relatively low cost.
We believe that an important initial focus
for ADL's collection should be information that supports
basic science, including the Earth and Social Sciences.
We have therefore begun the process of identifying important
datasets in certain areas of science, such as seismology
(see below).
It is also clear that an important goal for ADL
is to support collections in a manner
that allows users to integrate diverse information
concerning specified geographic areas.
We have identified several candidate datasets that each satisfy at
least some of these criteria.
These datasets, together with their approximate sizes,
include:
-
items of value to specific communities, such as
-
Sierra Nevada Ecologic Project data: 3 GB;
-
Mojave Ecologic Project data: 4 GB
(this database is currently under construction);
-
Experimental datasets, such as
-
Two dates of scanned aerial photographs of Santa Barbara County
and the associated texture/color characterizations of features in the
photographs: 30 GB;
-
Generally useful items, such as
-
US Geological Survey Digital Raster Graphics
(topographic maps, production in process): 30 GB;
-
US Geological Survey Digital Elevation Models
at 3 arc seconds and 1 arc second for California: 17 GB;
-
US Bureau of the Census TIGER and STF files (in vector format): 10 GB;
-
US Geological Survey Digital Line Graphs: 10 GB;
-
ETOPO5: 0.6 GB;
-
Digital Chart of the World (DCW): 2.5 GB.
These datasets have an aggregate volume of approximately 100 GB.
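As a quick check on the aggregate figure, the sizes listed above can be summed directly. The following is only a sketch: the dictionary keys are shortened labels chosen for illustration, and the sizes are the approximate values quoted in the list.

```python
# Approximate sizes in GB, copied from the candidate dataset list above.
# The keys are shortened labels used only for this illustration.
dataset_sizes_gb = {
    "Sierra Nevada Ecologic Project data": 3.0,
    "Mojave Ecologic Project data": 4.0,
    "Santa Barbara County aerial photographs": 30.0,
    "USGS Digital Raster Graphics": 30.0,
    "USGS Digital Elevation Models (California)": 17.0,
    "Census TIGER and STF files": 10.0,
    "USGS Digital Line Graphs": 10.0,
    "ETOPO5": 0.6,
    "Digital Chart of the World": 2.5,
}

# Sum the individual estimates to get the aggregate volume.
total_gb = sum(dataset_sizes_gb.values())
print(f"Aggregate volume: {total_gb:.1f} GB")
```

The itemized sizes add up to roughly 107 GB, which is consistent with the rounded figure of approximately 100 GB.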
In addition to the items in the preceding loading plan,
the UIE team at UCSB has been compiling
a list of items that it would be desirable to load,
based on discussions with three classes
of ``target'' user groups. These target
user groups (TUGs) included panels of
earth scientists, information intermediaries
(including librarians), and education specialists.
The collections identified in some cases
overlap those in the current ADL loading plan.
Others will have to be considered as resources become available.
The questions to which the user groups responded are discussed below.
We present a small sampling of the materials that were
indicated as useful by these groups.
-
-
Materials of general interest
-
APSRS (Metadata base);
-
extended Gazetteer (extraterrestrial items?);
-
external sources of data such as the GeoRef thesaurus;
-
spatially indexed text, such as geoindexing of MELVYL catalog;
-
user-definable content / local library group/user contributed data;
-
the ability to index/query other sites;
-
travel information or catalog links to existing sites
such as Excite and Yahoo;
-
content to support full GIS functionality online;
-
National Geographic/Britannica (and other packaged sets of
information suitable for classroom use).
-
-
Materials of interest to an Earth Science target user group
-
Digital Orthophoto Quadrangles (DOQs);
-
CIA maps;
-
Soils;
-
Air photos - historical;
-
High-level index to regional holdings, e.g. National Spatial Data
Infrastructure (NSDI), CERES, ...;
-
Faults/seismic: coarse and fine;
-
Precipitation data;
-
GeoRef metadata;
-
Land use and land cover maps (LULC);
-
Knowledge of and connection with other spatial data sites and projects.
-
-
Materials of interest to an information intermediary target user group
-
An interesting subset of MIL's aerial photography with catalog records;
-
Census statistics (US and foreign);
-
Historic maps (pre-1950?; pre-1900; world history;
local area (2,2). E.g. Times Atlas of World History) ;
-
A selection of historic texts (e.g. History of Mongolia);
-
Foreign country 1:200K/1:250K topos.
-
-
Materials of interest for an education target user group
-
"Here is what is available": synoptic view of ADL's holdings
specified in terms of geographic footprint;
-
Historical (dates): place name lineage;
-
Types of data: cultural (maps for race, religion, custom, dress, food,
etc. - with pointers to other sites when not in
ADL), historical, economics (industry, earnings, agriculture),
population (demographic information for Census);
-
distribution of income in city of SB; shading to show population
density, political, geology, land forms: roads,
-
local rivers, mountains, water source and consumption, flora and fauna;
-
Directions: e.g. from Santa Barbara to Los Angeles; produce a road map.
We note that a significant proportion of these materials
is not currently available in digital form
and would have to be digitized before it could
be loaded into ADL.
The digitization of a significant collection
of these materials would require a larger scanning
capability than currently exists within ADL.
For example, the current daily rate of production
of digital materials for ingest into ADL is
only 1 GB/day from the digitization of
airphotos and 0.5 GB/day from the downloading of AVHRR data,
giving a current collection growth rate of 1.5 GB/day.
Given these limits, we believe that an important component
of the ADL loading strategy in this regard
should be, at least initially, to digitize
only the most interesting subsets
of large analog collections.
Because much of the relevant material
is not currently in digital form
and because of our currently limited ability
to scan such materials,
the project has focused significant attention
on the issue of increasing the growth rate
of ADL's collections of maps and images
through an augmented scanning capability.
As part of this activity,
the project investigated the scanning of analog geo-spatial materials,
such as maps of various sizes and imagery in various formats.
The MIL does not currently support production-level scanning
of spatial materials, and the current scanning
technology consists of a small flatbed unit supported by
a PC workstation connected to the MIL internal network.
A successful proposal was presented to NSF for supplementary funds
to investigate content-based, intelligent
searching of images based on texture patterns and a visual thesaurus.
This subproject involved
ADL, the University of Arizona, and the University of Illinois.
Since digitized imagery was needed to support the research,
it was decided to use images scanned from the
MIL collection as part of the database.
The images did not have to be scanned at a preservation level
(approximately 5-10 microns), but only at a 600 dpi gray-scale level.
MIL committed its one scanning station (a JX-610 scanner
connected to a networked PC) for the 90-day experiment.
University of Arizona researcher Dr. Hsinchun Chen hired a local person
who had a GIS and systems background to scan the images, write
scripts for file management, determine geo-coordinate
information, and generate metadata. ArcInfo is used to generate
geo-coordinate footprints and ADL's Access metadata form is used for
cataloging.
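To put the two scanning resolutions mentioned above on a common scale, a sample spacing in microns can be converted to dots per inch and back. This is a minimal sketch that assumes the micron figures refer to the spacing between scan samples; the function names are illustrative only.

```python
# One inch is exactly 25.4 mm, i.e. 25,400 microns.
MICRONS_PER_INCH = 25_400


def dpi_to_microns(dpi):
    """Return the sample spacing in microns for a resolution in dots per inch."""
    return MICRONS_PER_INCH / dpi


def microns_to_dpi(microns):
    """Return the resolution in dots per inch for a sample spacing in microns."""
    return MICRONS_PER_INCH / microns


# The 600 dpi scans used in the experiment.
print(f"600 dpi    = {dpi_to_microns(600):.1f} microns per sample")
# The preservation-level range quoted in the text.
print(f"10 microns = {microns_to_dpi(10):.0f} dpi")
print(f"5 microns  = {microns_to_dpi(5):.0f} dpi")
```

Under that assumption, 600 dpi corresponds to a spacing of about 42 microns, several times coarser than the 5-10 micron preservation range.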
The scanning started in January 1997 with procedures that were written
to include a set of scanning instructions for the black and white
National Aerial Photography Program (NAPP) imagery.
The following statistics characterize the first month
of the scanning operation:
-
each scanned black and white image file is approximately 32 MB
when scanned at 600 dpi;
-
total time for scanning and file management is
15 to 19 minutes per image, not including footprint generation;
-
extrapolating to an 8-hour work-day, approximately 1 GB is generated;
-
extrapolating to a 12-hour work-day, approximately 1.5 GB is generated.
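The daily figures above follow from the per-image statistics. The sketch below reproduces the arithmetic under two assumptions that are not stated in the text: only whole images are counted, and 1 GB is taken as 1024 MB. The function name and parameters are illustrative.

```python
def daily_output_gb(hours_per_day, minutes_per_image, mb_per_image=32, mb_per_gb=1024):
    """Estimate one station's daily scanning output in GB.

    Assumes a single operator works continuously and that only
    completed images count toward the total.
    """
    images_per_day = (hours_per_day * 60) // minutes_per_image
    return images_per_day * mb_per_image / mb_per_gb


# Best case (15 minutes per image) and worst case (19 minutes per image).
for hours in (8, 12):
    best = daily_output_gb(hours, 15)
    worst = daily_output_gb(hours, 19)
    print(f"{hours}-hour day: {worst:.2f} to {best:.2f} GB")
```

With these assumptions, the quoted 1 GB and 1.5 GB figures correspond to the faster rate of 15 minutes per image; at 19 minutes per image the output drops to roughly 0.8 GB and 1.2 GB respectively.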
One person could run two
scanning stations configured as above, in which
case the output would double to about 3 GB per day.
Extrapolating these figures to an annual rate of production,
approximately 750 GB of data could be scanned and loaded.
We note, however, that ADL currently has only approximately 500 GB of available storage
(375 GB of disk and 100 GB of optical).
Hence if two scanning stations were operated for 12 hours a day,
it would only take about six months to fill the existing storage array.
Excess full-resolution data could be stored to tape temporarily with only
browse images and metadata made available through ADL.
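The annual production and storage estimates above can be reproduced with a short calculation. This is a sketch only: the figure of 250 working days per year is an assumption chosen because it reproduces the quoted annual total, and the function names are illustrative.

```python
def annual_output_gb(gb_per_day, working_days_per_year=250):
    """Annual output, assuming a fixed number of working days (an assumption)."""
    return gb_per_day * working_days_per_year


def days_to_fill(storage_gb, gb_per_day):
    """Number of scanning days needed to fill the given storage."""
    return storage_gb / gb_per_day


two_station_rate = 3.0  # GB per day: two stations, 12-hour days

print(f"Annual output: {annual_output_gb(two_station_rate):.0f} GB")
# The quoted total and the sum of the itemized disk and optical figures.
for storage_gb in (500, 375 + 100):
    days = days_to_fill(storage_gb, two_station_rate)
    print(f"{storage_gb} GB fills in about {days:.0f} scanning days")
```

At 3 GB per day the storage fills in roughly 160 to 170 scanning days, in line with the estimate of about six months.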
The ADL project is currently planning
an augmentation of its digitizing capability
based on the preceding analysis.
Terence R. Smith
Thu Feb 20 13:50:53 PST 1997