Three major sets of interactions with the San Diego Supercomputer
Center (SDSC) have occurred over the past year, apart
from the membership of Reagan Moore on the Advisory Board of ADL.
These activities include
(1) the cloning of ADL at SDSC as a first step in moving towards
a distributed version of ADL;
(2) the organization of working groups on metadata and
scientific dataset collections, as a first step
in moving towards the support of scientific dataset collections
in ADL;
and (3) participation in a proposal for the renewal of
SDSC that involves significant DL functionality.
The Cloning of ADL at SDSC
ADL and SDSC have completed an agreement whereby SDSC will eventually
maintain a complete mirror of the ADL catalog and holdings. The mirror
will actually be a proper superset, in that SDSC will add metadata and
storage for their own collections. SDSC's unrivaled Internet
connectivity will allow it to function as an alternative ADL server, and
its immense tertiary storage capacity will serve as the default backup
for ADL holdings.
To date, we have set up a mirror ADL catalog site at SDSC, and populated
it with a California-only metadata subset. This initial experiment has
been a complete success; it is now possible to retarget the WP user
interface to the SDSC catalog with no impact on the functioning of the
WP.
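The mirroring arrangement described above can be illustrated with a minimal sketch. It is not the ADL or SDSC implementation: the `Catalog`, `Client`, and `make_mirror` names, the in-memory record layout, and the sample records are all hypothetical. The sketch only shows the two properties the text claims, namely that the mirror holds a superset of the primary catalog and that a client can be retargeted without changing its behavior for the mirrored records.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical in-memory stand-in for a catalog server."""
    name: str
    records: dict = field(default_factory=dict)  # item id -> metadata dict

    def search(self, region):
        # Return the ids of all records tagged with the given region, sorted.
        return sorted(i for i, m in self.records.items() if m["region"] == region)


def make_mirror(primary, name, extra_records=None):
    # Copy every primary record, then add the mirror site's own collections,
    # so the mirror ends up holding a superset of the primary catalog.
    records = dict(primary.records)
    records.update(extra_records or {})
    return Catalog(name, records)


class Client:
    """Hypothetical user interface that talks only to `self.catalog`."""

    def __init__(self, catalog):
        self.catalog = catalog

    def retarget(self, catalog):
        # Switching servers is a single assignment; query code is untouched.
        self.catalog = catalog

    def find(self, region):
        return self.catalog.search(region)


# Sample data (invented for illustration only).
adl = Catalog("ADL", {"map-1": {"region": "California"},
                      "img-7": {"region": "California"}})
sdsc = make_mirror(adl, "SDSC", {"sdsc-3": {"region": "Nevada"}})

ui = Client(adl)
before = ui.find("California")
ui.retarget(sdsc)
after = ui.find("California")
print(before == after)
```

Because the client depends only on the catalog's search interface, the retargeting step does not touch any query logic, which is the sense in which the switch has no impact on the interface.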
Working Groups to Develop Metadata Standards
and Collection Strategies for Scientific Datasets
Proposal to Coordinate the Activities of Four DLI Partners in
a National Partnership for Advanced Computational Infrastructure
ADL has joined a large and potentially important partnership,
led by the San Diego Supercomputer Center,
that is focused on writing a proposal
for a National Partnership for Advanced Computational Infrastructure.
In particular, ADL is coordinating a set of DL activities
that will play an important role in the proposed infrastructure.
Four of the DLI projects are involved in this coordinated
set of activities, including University of California at Berkeley,
University of California at Santa Barbara,
University of Michigan, and Stanford University.
The success of the proposal would have a very
important impact on the DLI projects. Hence it is of value to
summarize the main DL activities that are being proposed.
The proposed activities for the first year relating to DL support
for data-handling environments in general and for a DL for
Earth Systems Science in particular will include:
-
The design, implementation, and initial testing of a user interface to
provide Internet,
and particularly World Wide Web (WWW),
access to the DL components of the NPACI.
The design will initially reflect the needs of scientific investigators
involved in the development of the infrastructure. This work will be based
on the existing DL technology developed at UCSB and UCB and will be centered
at UCSB.
-
The design, implementation, and initial testing of a meta-information
environment to support access to information from the area of Earth Systems
Science. This will involve: the extension of metadata models that have
already been developed for many classes of geo-referenced information to
scientific datasets; the development of a catalog component at UCSB that
supports modeling of users, user queries, user workspace requirements, and
matching of models of user queries and models of documents;
the development of and store data at SDSC. This work will be based on the
existing DL technology developed at UCSB and will be centered at UCSB.
-
The design, implementation, and initial testing of work-centered
information services for scientific applications. This work will be based on
the DL technology developed at UCB and will be centered at UCB.
-
The initial design, implementation, and testing of information access
protocols based on the CORBA specifications for distributed object
frameworks. This component will underlie the interoperability aspects of
the DL services for NPACI. This work will be based on the existing DL
technology developed at Stanford and will be centered at Stanford and
SDSC.
-
The construction of a limited set of collection items in the Earth
Systems Sciences, including examples of scientific datasets, images,
digitized maps, and digitized text documents. The construction of this
collection will involve the extraction of meta-information, the formatting
of the items as appropriate (e.g. formatting of image data into wavelets),
and the placement on secondary and tertiary storage devices at SDSC and
UCSB. This work will be based on technology developed at SDSC and UCSB and
will be centered at these two sites and at UCB. In particular, software
developers at UCSB will design and implement the system in collaboration
with a digital librarian at UCSB and a mass storage expert at SDSC.
Based on this initial development, there will be, at the end of the first
year, a limited testbed DL. Users will be able to employ the WWW in
searching and/or browsing for data in this testbed, using the catalog of
the Alexandria Digital Library at UCSB; to have their search extended to
other catalogs (such as that at UCB) on the basis of the technology
developed at Stanford; to download the datasets, or other documents, in
which they have an interest from the mass storage system at SDSC; and to
employ some of the workspace services provided by the UCB developments.
The interface to the system will provide a "transparent" look and feel,
so that users do not need to be aware of the location of either the data or
the catalogue.
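The location transparency described above can be sketched as a federated search that queries several catalogs and hides the site of each hit from the user. This is an illustrative sketch only: the function names, the dict-based catalog layout, and the sample items are assumptions, and plain function calls stand in for the distributed object protocols that the proposal assigns to the Stanford work.

```python
def search_catalog(catalog, keyword):
    # A catalog here is a plain dict: {"site": str, "items": {id: title}}.
    # Matching is a case-insensitive substring test on the title.
    return [
        {"id": item_id, "title": title, "site": catalog["site"]}
        for item_id, title in sorted(catalog["items"].items())
        if keyword.lower() in title.lower()
    ]


def federated_search(catalogs, keyword):
    # Query each catalog in turn and merge the hits, keeping the first
    # occurrence of any item id that appears in more than one catalog.
    hits, seen = [], set()
    for catalog in catalogs:
        for hit in search_catalog(catalog, keyword):
            if hit["id"] not in seen:
                seen.add(hit["id"])
                hits.append(hit)
    return hits


def present(hits):
    # The "transparent" view: the user sees ids and titles, not sites.
    return [(h["id"], h["title"]) for h in hits]


# Sample catalogs (invented for illustration only).
ucsb = {"site": "UCSB", "items": {"m1": "Sierra Nevada relief map",
                                  "d4": "Snowpack dataset 1995"}}
ucb = {"site": "UCB", "items": {"t9": "Sierra Nevada watershed report",
                                "m1": "Sierra Nevada relief map"}}

results = federated_search([ucsb, ucb], "sierra")
print(present(results))
```

The site is still recorded on each hit, so a later download step could use it to fetch the item from the right store while the user-facing listing stays free of location details.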
The proposed resources for these developments include
two software engineers and one digital librarian at UCSB;
a mass storage expert at SDSC;
a mass storage system installed at SDSC and available for experimentation
(less than 5 Terabytes of storage will be needed initially);
a system installed at UCSB to support the catalogue and software
development activities;
a system engineer at Stanford to coordinate the application of distributed
object technology underlying the application;
and a system engineer at UCB to coordinate work-centered library aspects of
the application.
The second year of development will focus on moving towards a limited, but
true, operational phase for a DL facility supporting NPACI activities.
Many of the activities for the second year, relating to DL support for
data-handling environments in general and for a DL for Earth Systems
Science in particular, will involve extensions and continuations of the
activities commenced in the first year.
-
The design, implementation, and testing of the user interface(s) will be
extended to include a wider range of functionality, and to move the
interface towards a true "workspace" environment that supports a large
range of scientific modeling activities.
This work will be centered at both UCSB and UCB, with the integration of a
broader set of work-centered information services, such as the integration
of a variety of scientific modeling tools.
-
The design, implementation, and testing of the meta-information
environment to support
access to information from the area of Earth Systems Science will be
extended with a variety of both metadata types and query/document matching
services. Specific attention will be focused on providing meta-information
about procedures that may be applied
in the DL workspace. Meta-information will also be developed to indicate
"authentication" and levels of "approval" for both declarative
information objects
and for procedural information objects. Software will be designed and
built that supports the automated extraction of meta-information. This
software will include format converters, automatic metadata extractors, and
indexing software. While it is anticipated that the final process will
involve some human effort, the intent is that large parts of the process
will be automated and that facilities will be provided to allow the
construction of self-describing information objects.
-
The continuing development of information access protocols based on the CORBA
specifications for distributed object frameworks. Proxies for a variety of
distributed DL services will be constructed within this framework. This
work will be centered at Stanford and SDSC.
-
The construction of a reasonably full collection of items to support
research activities in the Earth Systems Sciences, including many classes
of datasets, images, digitized maps, and digitized text documents. This
construction will involve the acquisition of datasets and other information
objects from several different sources, the conversion of these objects
into formats suitable for inclusion in the mass storage system, and the
actual insertion of the data into the mass storage system with appropriate
references to the data from the catalogue. This activity will involve
interactions with a variety of groups operating data servers within the
NPACI framework, and employing the metadata models developed as part of the
DL catalog component. In loading the information objects into the
secondary and tertiary datastores at SDSC, the data may be too large to
transfer over the network, so the final process may involve putting the
processed data on tape and shipping it to SDSC to insert into the mass
storage system. Developments will continue on providing computing support
for filtering of information objects at SDSC prior to their transfer to
interested users. This work will be centered at SDSC and UCB, but will
involve a variety of secondary sites.
Under the terms of the proposal, the end of the second year should
see the completion of an operational distributed DL, available
to researchers within the NPACI framework. This DL will be the kernel of a
full, operational DL that will exist at the end of the five-year period.
This DL facility is intended to provide support for ingesting data in several
formats, converting it to a format suitable for inclusion in the Alexandria
Digital Library, indexing it into the catalogue at UCSB, and inserting the
data into the mass storage system at SDSC. It will also provide a variety
of work-centered services and support interoperability between different DL
operations, such as at UCSB, UCB, and Stanford, using a distributed object
framework.
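The ingest path outlined above (accept data in several formats, convert it, insert it into mass storage, and index it in the catalogue) can be sketched as follows. Everything here is a hypothetical stand-in: the two converters, the `store://` location scheme, and the dicts used for the catalogue and the mass store are assumptions made for illustration, not part of the proposed system.

```python
# Hypothetical converters: each turns one input format into raw bytes,
# which serve here as the single internal storage format.
CONVERTERS = {
    "text": lambda raw: raw.encode("utf-8"),
    "hex": lambda raw: bytes.fromhex(raw),
}


def ingest(item_id, raw, source_format, metadata, catalog, store):
    # Step 1: convert the incoming data to the internal format.
    if source_format not in CONVERTERS:
        raise ValueError(f"unsupported format: {source_format}")
    payload = CONVERTERS[source_format](raw)

    # Step 2: insert the payload into the (stand-in) mass storage system.
    location = f"store://{item_id}"  # hypothetical location scheme
    store[location] = payload

    # Step 3: index the item in the catalogue, pointing at the stored data.
    # Storing first means a catalogue entry never refers to missing data.
    catalog[item_id] = {**metadata, "location": location, "size": len(payload)}
    return catalog[item_id]


catalog, store = {}, {}
entry = ingest("ds-1", "6869", "hex", {"title": "Sample dataset"}, catalog, store)
print(entry)
```

Keeping the catalogue and the store as separate arguments mirrors the split described in the text, where the catalogue resides at one site and the bulk data at another.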
Terence R. Smith
Thu Feb 20 13:50:53 PST 1997