
San Diego Supercomputer Center (SDSC)

Three major sets of interactions with the San Diego Supercomputer Center (SDSC) have occurred over the past year, apart from the membership of Reagan Moore on the ADL Advisory Board. These activities include (1) the cloning of ADL at SDSC as a first step in moving towards a distributed version of ADL; (2) the organization of working groups on metadata and scientific dataset collections, as a first step in moving towards the support of scientific dataset collections in ADL; and (3) participation in a proposal for the renewal of SDSC that involves significant DL functionality.

The Cloning of ADL at SDSC

ADL and SDSC have completed an agreement whereby SDSC will eventually maintain a complete mirror of the ADL catalog and holdings. The mirror will actually be a proper superset, in that SDSC will add metadata and storage for their own collections. SDSC's unrivaled Internet connectivity will allow it to function as an alternative ADL server, and its immense tertiary storage capacity will serve as the default backup for ADL holdings.

To date, we have set up a mirror ADL catalog site at SDSC and populated it with a California-only metadata subset. This initial experiment has been a complete success; it is now possible to retarget the WP user interface to the SDSC catalog with no impact on the functioning of the WP.
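The retargeting described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `CatalogClient` class, the two backend functions, and the record identifiers are hypothetical and are not taken from the ADL software. The point is only that the interface depends on a configured catalog name rather than on a particular server.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a catalog server: maps a query to record ids.
CatalogBackend = Callable[[str], List[str]]


def ucsb_catalog(query: str) -> List[str]:
    # Illustrative records only; not real ADL holdings.
    records = {"california": ["ca-001", "ca-002"], "nevada": ["nv-001"]}
    return records.get(query.lower(), [])


def sdsc_mirror(query: str) -> List[str]:
    # The mirror in this sketch holds only the California metadata subset.
    records = {"california": ["ca-001", "ca-002"]}
    return records.get(query.lower(), [])


@dataclass
class CatalogClient:
    """User-interface side: knows only the name of the configured catalog."""
    backends: Dict[str, CatalogBackend]
    target: str

    def search(self, query: str) -> List[str]:
        return self.backends[self.target](query)


client = CatalogClient({"ucsb": ucsb_catalog, "sdsc": sdsc_mirror}, target="ucsb")
before = client.search("California")
client.target = "sdsc"  # retarget the interface; no other code changes
after = client.search("California")
print(before == after)
```

Queries that fall inside the mirrored subset return the same records from either site, while a query outside the subset (for example "Nevada" in this sketch) returns nothing from the mirror until it holds the full catalog.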

Working Groups to Develop Metadata Standards and Collection Strategies for Scientific Datasets

Proposal to Coordinate the Activities of Four DLI Partners in a National Partnership for Advanced Computational Infrastructure

ADL has joined a large and potentially important partnership, led by the San Diego Supercomputer Center, that is focused on writing a proposal for a National Partnership for Advanced Computational Infrastructure (NPACI). In particular, ADL is coordinating a set of DL activities that will play an important role in the proposed infrastructure. Four of the DLI projects are involved in this coordinated set of activities: those at the University of California at Berkeley, the University of California at Santa Barbara, the University of Michigan, and Stanford University. The success of the proposal would have an important impact on the DLI projects, so it is of value to summarize the main DL activities that are being proposed.

The proposed activities for the first year relating to DL support for data-handling environments in general and for a DL for Earth Systems Science in particular will include:

  1. The design, implementation, and initial testing of a user interface to provide Internet, and particularly World Wide Web (WWW), access to the DL components of the NPACI. The design will initially reflect the needs of scientific investigators involved in the development of the infrastructure. This work will be based on the existing DL technology developed at UCSB and UCB and will be centered at UCSB.
  2. The design, implementation, and initial testing of a meta-information environment to support access to information from the area of Earth Systems Science. This will involve: the extension of metadata models that have already been developed for many classes of geo-referenced information to scientific datasets; the development of a catalog component at UCSB that supports modeling of users, user queries, user workspace requirements, and matching of models of user queries and models of documents; the development of and store data at SDSC. This work will be based on the existing DL technology developed at UCSB and will be centered at UCSB.
  3. The design, implementation, and initial testing of work-centered information services for scientific applications. This work will be based on the DL technology developed at UCB and will be centered at UCB.
  4. The initial design, implementation, and testing of information access protocols based on the CORBA specifications for distributed object frameworks. This component will underlie the interoperability aspects of the DL services for NPACI. This work will be based on the existing DL technology developed at Stanford and will be centered at Stanford and SDSC.
  5. The construction of a limited set of collection items in the Earth Systems Sciences, including examples of scientific datasets, images, digitized maps, and digitized text documents. The construction of this collection will involve the extraction of meta-information, the formatting of the items as appropriate (e.g. formatting of image data into wavelets), and the placement on secondary and tertiary storage devices at SDSC and UCSB. This work will be based on technology developed at SDSC and UCSB and will be centered at these two sites and at UCB. In particular, software developers at UCSB will design and implement the system in collaboration with a digital librarian at UCSB and a mass storage expert at SDSC.

Based on this initial development, there will be a limited testbed DL at the end of the first year. Users will be able to employ the WWW in searching and/or browsing for data in this testbed, using the catalog of the Alexandria Digital Library at UCSB; to have their search extended to other catalogs (such as that at UCB) on the basis of the technology developed at Stanford; to download the datasets, or other documents, in which they have an interest from the mass storage system at SDSC; and to employ some of the workspace services provided by the UCB developments. The interface to the system will provide a "transparent" look and feel, so that users do not need to be aware of the location of either the data or the catalogue.

The proposed resources for these developments include two software engineers and one digital librarian at UCSB; a mass storage expert at SDSC; a mass storage system installed at SDSC and available for experimentation (less than 5 Terabytes of storage will be needed initially); a system installed at UCSB to support the catalogue and software development activities; a system engineer at Stanford to coordinate application of distributed object technology to the underlying application; and a system engineer at UCB to coordinate work-centered library aspects of the application.
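The location-transparent search described for the first-year testbed can be sketched as follows. This is a toy model under stated assumptions, not the proposed design: the catalog names, the records, and the `mss://` storage addresses are all hypothetical, and in-memory dictionaries stand in for the distributed object layer that the proposal bases on CORBA.

```python
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    title: str
    catalog: str      # which catalog holds the metadata
    storage_url: str  # hypothetical address of the data in mass storage


# Hypothetical catalogs; in the proposal these would be remote services.
CATALOGS: Dict[str, List[Record]] = {
    "ucsb-adl": [
        Record("Santa Barbara aerial photo", "ucsb-adl", "mss://sdsc.example/adl/0001"),
    ],
    "ucb": [
        Record("Santa Barbara water report", "ucb", "mss://sdsc.example/ucb/0042"),
    ],
}


def search(term: str, primary: str = "ucsb-adl") -> List[Record]:
    """Query the primary catalog first, then extend the search to the others."""
    order = [primary] + [name for name in CATALOGS if name != primary]
    hits: List[Record] = []
    for name in order:
        hits.extend(r for r in CATALOGS[name] if term.lower() in r.title.lower())
    return hits


def display(hits: List[Record]) -> List[str]:
    """What the user sees: titles only, with catalog and storage site hidden."""
    return [r.title for r in hits]


print(display(search("santa barbara")))
```

The user-facing result carries no catalog or storage location, while the full `Record` retains both so that a download request can be routed to the mass storage system.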

The second year of development will focus on moving towards a limited, but true, operational phase for a DL facility supporting NPACI activities. Many of the activities for the second year, relating to DL support for data-handling environments in general and for a DL for Earth Systems Science in particular, will involve extensions and continuations of the activities commenced in the first year.

  1. The design, implementation, and testing of the user interface(s) will be extended to include a wider range of functionality, and to move the interface towards a true "workspace" environment that supports a large range of scientific modeling activities. This work will be centered at both UCSB and UCB, with the integration of a broader set of work-centered information services, such as the integration of a variety of scientific modeling tools.
  2. The design, implementation, and testing of the meta-information environment to support access to information from the area of Earth Systems Science will be extended with a variety of both metadata types and query/document matching services. Specific attention will be focused on providing meta-information about procedures that may be applied in the DL workspace. Meta-information will also be developed to indicate "authentication" and levels of "approval" for both declarative information objects and procedural information objects. Software will be designed and built that supports the automated extraction of meta-information. This software will include format converters, automatic metadata extractors, and indexing software. While it is anticipated that the final process will involve some human effort, the intent is that large parts of the process will be automated and that facilities will be provided to allow the construction of self-describing information objects.
  3. The continuing development of information access protocols based on the CORBA specifications for distributed object frameworks. Proxies for a variety of distributed DL services will be constructed within this framework. This work will be centered at Stanford and SDSC.
  4. The construction of a reasonably full collection of items to support research activities in the Earth Systems Sciences, including many classes of datasets, images, digitized maps, and digitized text documents. This construction will involve the acquisition of datasets and other information objects from several different sources, the conversion of these objects into formats suitable for inclusion in the mass storage system, and the actual insertion of the data into the mass storage system with appropriate references to the data from the catalogue. This activity will involve interactions with a variety of groups operating data servers within the NPACI framework, and will employ the metadata models developed as part of the DL catalog component. In loading the information objects into the secondary and tertiary datastores at SDSC, the data may be too large to transfer over the network, so the final process may involve putting the processed data on tape and shipping it to SDSC for insertion into the mass storage system. Developments will continue on providing computing support for filtering of information objects at SDSC prior to their transfer to interested users. This work will be centered at SDSC and UCB, but will involve a variety of secondary sites.
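The concern that data may be too large to transfer over the network can be made concrete with a rough calculation. The figures below are assumptions chosen for illustration: the 5-terabyte volume is the upper bound quoted for the initial mass storage requirement, and the T1 (1.544 Mbit/s) and T3 (44.736 Mbit/s) line rates are standard nominal values, not link speeds stated in the proposal.

```python
SECONDS_PER_DAY = 86_400
TERABYTE = 10**12   # bytes, decimal convention (an assumption)
T1_BPS = 1.544e6    # nominal T1 line rate in bits per second
T3_BPS = 44.736e6   # nominal T3 line rate in bits per second


def transfer_days(size_bytes: float, link_bps: float, efficiency: float = 1.0) -> float:
    """Days needed to move size_bytes over a link running at link_bps.

    efficiency is the fraction of the nominal rate actually achieved;
    1.0 is an optimistic assumption that ignores protocol overhead.
    """
    seconds = size_bytes * 8 / (link_bps * efficiency)
    return seconds / SECONDS_PER_DAY


for name, rate in [("T1", T1_BPS), ("T3", T3_BPS)]:
    print(f"{name}: {transfer_days(5 * TERABYTE, rate):.1f} days")
```

Under these assumptions a 5-terabyte load takes on the order of 300 days over a fully dedicated T1 line and about 10 days over a T3 line, which is the kind of gap that makes shipping tapes a reasonable fallback.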

Under the terms of the proposal, the end of the second year should see the completion of an operational distributed DL, available to researchers within the NPACI framework. This DL will be the kernel of a full, operational DL that will exist at the end of the five-year period. This DL facility is intended to provide support for ingesting data in several formats, converting it to a format suitable for inclusion in the Alexandria Digital Library, indexing it into the catalogue at UCSB and inserting the data into the mass storage system at SDSC. It will also provide a variety of work-centered services and support interoperability between different DL operations, such as at UCSB, UCB, and Stanford, using a distributed object framework.
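The ingest path described above (accept data in several formats, convert it, index it in the catalogue, and insert it into mass storage) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: the `Library` class, the format names, the converters, and the catalogue fields are hypothetical, and in-memory dictionaries stand in for the UCSB catalogue and the SDSC mass storage system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Library:
    # In-memory stand-ins (assumptions) for the two sites named in the text.
    catalog: Dict[str, dict] = field(default_factory=dict)        # catalogue at UCSB
    mass_storage: Dict[str, bytes] = field(default_factory=dict)  # mass storage at SDSC


# Hypothetical converters, keyed by input format name.
CONVERTERS: Dict[str, Callable[[bytes], bytes]] = {
    "ascii-grid": lambda raw: raw.strip(),  # e.g. trim padding
    "raw-image": lambda raw: raw,           # already in the target format
}


def ingest(lib: Library, item_id: str, fmt: str, raw: bytes, title: str) -> dict:
    """Convert one item, index it in the catalogue, then store the data."""
    if fmt not in CONVERTERS:
        raise ValueError(f"unsupported input format: {fmt}")
    data = CONVERTERS[fmt](raw)
    entry = {
        "title": title,
        "source_format": fmt,
        "size_bytes": len(data),
        "storage_key": item_id,  # reference from the catalogue to the stored data
    }
    lib.catalog[item_id] = entry
    lib.mass_storage[item_id] = data
    return entry


library = Library()
print(ingest(library, "item-1", "ascii-grid", b"  1 2 3\n", "Sample elevation grid"))
```

Keeping the storage key in the catalogue entry mirrors the requirement that the catalogue carry references to data held in the mass storage system; a production system would also need to handle a failure between the indexing and storage steps.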



Terence R. Smith
Thu Feb 20 13:50:53 PST 1997