Annual Report Of The Alexandria Digital Earth Prototype Project

June 1, 2000

 

 

PROJECT PARTICIPANTS

 

The main participants in the project include faculty, staff, and students at the University of California at Santa Barbara (UCSB), the University of California at Los Angeles (UCLA), the San Diego Supercomputer Center (SDSC), the University of Georgia (UGA), and Georgia Tech (GT).

 

UCSB Personnel

 

Personnel who have worked on the project at UCSB include the following faculty researchers:

 

           Terence R. Smith (Project Director)

           Michael F. Goodchild (Project Associate Director)

           Anurag Acharya (Computer Science)

           Divyakant Agrawal (Computer Science)

           Kevin Almeroth (Computer Science)

           Omer Egecioglu (Computer Science)

           Amr El Abbadi (Computer Science)

           James Frew (Bren School of Environmental Science and Management)

           Richard Mayer (Psychology)

           Oscar Ibarra (Computer Science)

           Richard Kemmerer (Computer Science)

           Ambuj Singh (Computer Science)

           Jianwen Su (Computer Science)

           Yuan-Fang Wang (Computer Science)

           Tao Yang (Computer Science).

 

These faculty are supported by approximately 12 graduate students. The project also has a staff of full- and part-time employees, including:

 

           Greg Janee (full-time, lead software engineer),

           Gary Monheit (full-time, visualization; client development),

           Linda Hill (part-time, gazetteers; thesauri; metadata; system evaluation),

           Qi Zheng (full-time, database and server development; left project Dec 1999),

           Ray Smith (part-time, executive associate director),

           Karen Andersen (full-time, project manager),

 

as well as other part-time programmers. The now-operational Alexandria Digital Library (ADL), whose services provide critical support for ADEPT, is supported by the staff of the Map and Imagery Laboratory (MIL) of UCSB's Davidson Library, including:

 

           Larry Carver (MIL Director),

           Mary Larsgaard (Map Librarian),

           Catherine Masi (database management),

           Kim Philpot (system administration),

           Jason Simpson (system administration),

 

and other support staff. The MIL staff also play a central role in the development of ADEPT.

 

Other individuals and groups at UCSB who have been involved include instructors in Geography (Oliver Chadwick, Stuart Sweeney), whose courses are being supported by ADEPT, and members of UCSB's Instructional Development Program (Stanley Nicholson and Rick Johnson).

 

UCLA Personnel

 

Faculty from UCLA who have worked on the project include:

 

            Christine Borgman (Director of UCLA activities),

            Greg Leazer,

            Anne Gilliland-Swetland,

 

and 4 graduate students.

 

UGA Personnel

 

Faculty from UGA who have worked on the project include:

 

            Amit Sheth (Computer Science),

            Tarcisio De Souza Lima,

 

and associated graduate students.

 

GT Personnel

 

Faculty from GT who have worked on the project include:

 

             Nick Faust

 

and associated graduate students.  The Implementation Team interacted closely with Dr. Faust and is currently reviewing his multiresolution whole-Earth database and visualization technology for possible incorporation into the ADEPT testbed.

 

SDSC Personnel

 

Members of SDSC who have worked on the project include:

 

             Reagan Moore,

             Chaitanya Baru,

             Richard Marciano,

             Bertram Ludaescher.

 

Two students who are supported by NSF REU grants also participated.

 

Other Organizations Involved As Partners

 

A close working partnership has been established with the following members of the InterLib Project:

 

             University of California at Berkeley (UCB) DLI-2 project

             (Robert Wilensky, Director)

 

             Stanford University DLI-2 project

             (Hector Garcia-Molina, Director)

 

             California Digital Library (CDL)

             (John Ober, chief liaison).

 

With SDSC forming a fifth member of InterLib, the projects hold regular meetings and coordinate closely on research and development activities. Two all-day meetings involving significant representation from all five InterLib teams have been held during the first six months of the project, the most recent at UCSB on March 22nd, at which agreement was reached on further InterLib interactions and on an InterLib presentation at the DLI-2 meeting in England (June 12-13th). In terms of activities with specific InterLib partners, the Implementation Team interacted with Howard Foster from UCB's e-lib project and has set specific goals for collaboration: UCB is retargeting some of its client applications to access the ADL/ADEPT library and gazetteer, and UCSB is working with UCB to expose a UCB collection to ADL/ADEPT. With respect to Stanford, a UCSB student is developing a server-side implementation for ADL/ADEPT of Stanford's Simple Digital Library Interoperability Protocol (SDLIP). With respect to SDSC, the Implementation Team has fully integrated SDSC's Storage Resource Broker (SRB) into the ADL/ADEPT architecture. Discussions continue between the Team and SDSC on the possible uses of other SDSC systems and architectures (e.g., XML-based databases). Finally, the ADEPT Implementation Team developed a Web-based browser according to a specification of the CDL and participated on the CDL's Technical Architecture and Services (TAS) Board.

 

An especially close relationship has been developed with the NSF-supported PAGE (Program for the Advancement of Geoscience Education) Project, which is developing a Geoscience Digital Library (GDL) at the University Corporation for Atmospheric Research (UCAR), under the direction of Mary Marlino. A series of meetings at both sites led to a memorandum of understanding (MoU) between PAGE and ADEPT, and to specific agreements for PAGE and the Implementation Team to jointly develop and evaluate testbed technologies. PAGE also helped locate hydrography resources for the ADEPT physical geography curriculum. ADEPT and GDL will continue to meet on a regular basis to foster complementary technical developments and research.

 

ADEPT has developed a partnership with the University of Utrecht (Netherlands), under which the University of Utrecht gave the Implementation Team access to the source code for PCRaster, a popular modeling and simulation software package, which is being used to support an initial ADEPT geography curriculum.

 

Active partnerships with private corporations and government agencies involve Silicon Graphics (SGI), SRI, and Oracle. The Implementation Team met with developers from SGI to review their standard API for multiresolution whole-Earth browsing.  A joint proposal was submitted to NSF but was not successful. The Implementation Team has also interacted with researchers from SRI who developed SRI's TerraVision distributed, hierarchical, geographic visualization system.  An MoU was signed between SRI and ADEPT, giving the Implementation Team access to the TerraVision source code.

 

A further close relationship has been developed with the University of Aberdeen, Scotland, and with Mike Freeston (Professor of Computing Science) in particular. Dr. Freeston spends approximately 4-6 months each year with the ADEPT project, working on database and indexing issues.

 

In other partnering relations, the Implementation Team, the UCSB Graduate School of Education, and the Survivors of the Shoah Visual History Foundation (VHF) submitted a joint proposal to the US Department of Commerce to develop the VHF's video archive technology as an educational and digital library resource. The Implementation Team began a joint project with the Donald Bren School of Environmental Science and Management's Earth System Science Workbench (ESSW) project, which will make ESSW's extensive data holdings (historic and real-time satellite remote sensing) accessible as ADL/ADEPT collections.

 

The UCLA researchers have developed partnerships with UCLA's Office of Instructional Development (Larry Loehrer, Ruth Sabean, and Steve Rossen) and interacted closely with both UCLA's Department of Geography (John Agnew, Marilyn Raphael, Antony Orme, and Larry Smith) and its Map Library (David Deckelbaum). The UCLA team has also provided its evaluation plans to UCAR for UCAR's work on the GDL and is participating with the InterLib group.

 

Other Contacts

 

ADEPT has ongoing contacts with numerous other organizations, including ESRI, USGS, OGIS, NPS, and the US Navy.

 

 

ACTIVITIES AND FINDINGS OF THE PROJECT

 

The activities and findings of the ADEPT project may be organized in terms of those of UCSB's research and development teams and those of the (subcontracting) partners at UCLA, SDSC, UGA, and GT.

 

Major Research And Development Activities And Findings

 

UCSB's Implementation Team Activities

 

Among the main development activities of the project, the Implementation Team augmented the architecture of ADL so that it could serve as an engine for providing ADEPT services. In particular, the fourth generation of the ADL architecture was developed. An implementation of the new architecture, entirely rewritten in Java using Java servlet technology, is running at UCSB and supports both ADEPT research and the California Digital Library. The architecture is now a true three-tier model in which the middle tier (the "middleware") defines both client- and server-side interfaces. The middleware's client-side interface now employs a generic, XML-based framework for describing metadata and metadata semantics; this framework closely follows the Resource Description Framework (RDF) proposed standard. The services that comprise the client-side interface now closely follow the Simple Digital Library Interoperability Protocol (SDLIP), an InterLib-developed proposed standard. In addition, the client-side interface now supports multiple transports, including HTTP and HTTP with XML/HTML mediation. The middleware's new server-side interface formalizes the connection between the middleware and collection servers. A generic collection server has been developed for collections based on relational databases with star-shaped schemata. The middleware server itself now supports additional capabilities such as configurable access control, result ranking, and result-set caching. Finally, a simple client that runs entirely in a standard Web browser has been developed. This client was built to specifications of the CDL (one of our InterLib partners) and is now in operational use and available for instructional purposes (see URL below).
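
As a rough illustration of the client-side interface described above, the following is a minimal sketch, not the actual ADL/ADEPT API, of how a thin client might submit an XML-encoded query to the middleware over HTTP. The servlet URL and the XML element names are hypothetical placeholders.

    import java.io.*;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Minimal sketch of a thin client posting an XML query to a middleware
    // servlet over HTTP.  The URL and the XML vocabulary are hypothetical.
    public class MiddlewareQueryClient {
        public static void main(String[] args) throws IOException {
            String query =
                "<query collection=\"adl\">" +
                "  <spatial op=\"overlaps\" xmin=\"-120.5\" ymin=\"34.0\"" +
                "           xmax=\"-119.5\" ymax=\"34.8\"/>" +
                "  <text field=\"title\">aerial photograph</text>" +
                "</query>";

            URL url = new URL("http://middleware.example.edu/adept/search");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "text/xml");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(query.getBytes("UTF-8"));
            }

            // The middleware would stream back an XML description of matching
            // metadata records; here we simply print the raw response.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
                for (String line; (line = in.readLine()) != null; ) {
                    System.out.println(line);
                }
            }
        }
    }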

 

Work has continued on extending and developing the ADL gazetteer as an ADEPT service. The gazetteer now contains 4,099,540 entries that cover the world. It includes a merging of the two U.S. federal gazetteers and other gazetteer sets (it is available at the website listed below). Work during this period has included converting more USGS GNIS gazetteer data and loading it into the ADL gazetteer. Problems that needed solving in order to do this included assigning ADL categories to GNIS data and identifying duplicate GNIS records for the same feature. Assigning ADL categories requires an analysis of the content of the placenames and mappings from GNIS categories to ADL categories. Duplicate detection requires an analysis of names, categories, and similarity of location coordinates for suspected duplicate sets.
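
To make the duplicate-detection step concrete, the sketch below shows one way suspected duplicates could be flagged by comparing names, categories, and coordinate proximity. The record layout and the distance threshold are illustrative assumptions, not the ADL gazetteer's actual rules.

    // Illustrative sketch of gazetteer duplicate detection: two entries are
    // flagged as suspected duplicates when their names match, their categories
    // match, and their coordinates fall within a small distance threshold.
    // The record layout and the 0.01-degree threshold are assumptions.
    public class DuplicateCheck {
        static class Entry {
            String name;
            String category;
            double lat, lon;
            Entry(String name, String category, double lat, double lon) {
                this.name = name; this.category = category;
                this.lat = lat; this.lon = lon;
            }
        }

        static boolean suspectedDuplicate(Entry a, Entry b) {
            boolean sameName = a.name.trim().equalsIgnoreCase(b.name.trim());
            boolean sameCategory = a.category.equalsIgnoreCase(b.category);
            double d = Math.hypot(a.lat - b.lat, a.lon - b.lon);
            return sameName && sameCategory && d < 0.01;   // roughly 1 km at mid-latitudes
        }

        public static void main(String[] args) {
            Entry e1 = new Entry("Goleta Point", "coastal feature", 34.404, -119.842);
            Entry e2 = new Entry("Goleta Point", "coastal feature", 34.405, -119.843);
            System.out.println(suspectedDuplicate(e1, e2));   // true
        }
    }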

 

Work has also begun on the specification of a gazetteer service protocol, working with Doug Nebert of the FGDC, and on an XML specification of the ADL Gazetteer Content Standard to support the export and import of gazetteer data. We are working with several other projects that would like to use the ADL gazetteer data to geospatially reference their collection objects (that currently have only placename referencing), to build local gazetteers, and to support the digitization of valuable historical gazetteer data.

 

L. Hill and M. Goodchild were co-PIs for an NSF-sponsored workshop on Digital Gazetteer Information Exchange (DGIE), hosted by the Smithsonian Institution in October. The workshop was planned by a group representing ADL, the National Geographic Society, the Smithsonian Institution, and a set of federal agencies including the U.S. Geological Survey, the National Imagery & Mapping Agency, and NASA. Participants came from a wide range of communities, including U.S. federal and state governments, commercial and academic organizations, and international organizations. The goals of the two-day workshop were to (1) develop an understanding of the potential of indirect spatial referencing of information resources through geographic names and (2) identify the research and policy issues associated with the development of digital gazetteer information exchange. The highlights from the workshop include acknowledgement of the immediate opportunity and requirement to coordinate the building of shareable digital gazetteer data in the interest of digital earth applications; the importance of the temporal aspects of gazetteer data; and the need for a gazetteer service protocol to support distributed gazetteer services. Complete information about the workshop, the participants, the presentations, and the final report can be accessed at http://www.alexandria.ucsb.edu/gazetteer/dgie/DGIE_website/DGIE_homepage.htm.

 

Work has also begun on a Browse Service for ADEPT and ADL. The Browse Service has load and browse functionality for executing high-performance spatial searches on the metadata of a collection. Hierarchical tiles of data are stored in browse databases and queried via an HTTP interface. Requests are made in XML and results are streamed back as binary records. The browse databases additionally store counts, or aggregate information, representing the density of metadata at any tile. Aggregate information helps the user determine where to search for data, zooming from a top-level tile down to a bottom-level tile. The Browse Service may be accessed by any number of client programs. These clients (or GeoBrowsers) may be written in Java, Perl, or C++, and may display information in 2D or 3D.
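
The sketch below illustrates the aggregate-count idea behind the browse tiles: every record increments a counter in each enclosing tile of a hierarchical tiling, so a client can see the density of metadata before drilling down. The simple power-of-two tiling and class layout are assumptions for illustration, not the Browse Service's actual schema.

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of per-tile aggregate counts in a hierarchical (quadtree-like)
    // tiling: each tile stores how many metadata records it covers.
    public class BrowseTileIndex {
        private final Map<String, Integer> counts = new HashMap<>();

        // Tile key at a given level for a lon/lat point, on a simple 2^level grid.
        static String tileKey(int level, double lon, double lat) {
            int n = 1 << level;
            int x = (int) Math.floor((lon + 180.0) / 360.0 * n);
            int y = (int) Math.floor((lat + 90.0) / 180.0 * n);
            return level + "/" + x + "/" + y;
        }

        // Register one metadata record: increment the count of every enclosing tile.
        void add(double lon, double lat, int maxLevel) {
            for (int level = 0; level <= maxLevel; level++) {
                counts.merge(tileKey(level, lon, lat), 1, Integer::sum);
            }
        }

        int density(int level, double lon, double lat) {
            return counts.getOrDefault(tileKey(level, lon, lat), 0);
        }

        public static void main(String[] args) {
            BrowseTileIndex index = new BrowseTileIndex();
            index.add(-119.85, 34.41, 8);   // a record near Santa Barbara
            index.add(-118.24, 34.05, 8);   // a record near Los Angeles
            System.out.println(index.density(4, -119.0, 34.3));  // 2: both fall in one level-4 tile
        }
    }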

 

UCSB's Research Team Activities

 

ADEPT Geoservices

 

ADEPT Geoservices focused on developing materials useful in learning environments, in particular on dynamic simulation and the problems of implementing simulation models within a digital library environment. PCRaster was acquired from the Netherlands, and we were successful in acquiring its source code through a memorandum of understanding to share research results. We have acquired a high level of expertise in PCRaster and in developing novel applications. Current work is concentrating on developing PCRaster applications for the modules to be introduced into courses in the Spring and Fall quarters. We have successfully implemented the Clarke urban growth model, a form of cellular automaton, and believe that this is the first model of social processes to be implemented in PCRaster. We are also pursuing an ecological model of plant communities in a vernal pool complex, using PCRaster to model the annual hydrologic cycle, and developing a model of evapotranspiration over a large agricultural region. We have built a new WWW interface to PCRaster that allows simulations to be controlled and viewed through a standard browser.

 

At this time we are beginning work on metadata for models, that is, formal methods for describing simulation models of physical and social processes. These models will form the foundation for ADEPT's services for storing, sharing, accessing, and using simulation models. We are pursuing three approaches at this time: (1) examination of the ways in which scientists describe models to each other, and development of methods that capture and formalize such information; (2) use of a standard language, such as the scripting language of PCRaster; and (3) formalization of models as transformations between input and output data sets, which allows models to be characterized as sets of metadata records. We plan to hold a small workshop on the topic in the summer, at which we will bring together researchers from a wide range of backgrounds to share ideas.
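
To make the third approach concrete, the sketch below shows one hypothetical way a simulation model might be characterized as a metadata record mapping input data sets to output data sets. The field names are illustrative assumptions, not an ADEPT metadata standard.

    import java.util.Arrays;
    import java.util.List;

    // Hypothetical metadata record characterizing a simulation model as a
    // transformation from input data sets to output data sets (approach 3 above).
    public class ModelMetadata {
        String modelName;
        String scriptLanguage;        // e.g., the PCRaster scripting language
        List<String> inputDataSets;   // descriptions or identifiers of inputs
        List<String> outputDataSets;  // descriptions or identifiers of outputs
        String processDescription;    // informal description of the process modeled

        ModelMetadata(String modelName, String scriptLanguage,
                      List<String> inputs, List<String> outputs, String description) {
            this.modelName = modelName;
            this.scriptLanguage = scriptLanguage;
            this.inputDataSets = inputs;
            this.outputDataSets = outputs;
            this.processDescription = description;
        }

        public static void main(String[] args) {
            ModelMetadata m = new ModelMetadata(
                "annual hydrologic cycle (vernal pool complex)",
                "PCRaster script",
                Arrays.asList("elevation grid", "soil map", "monthly precipitation"),
                Arrays.asList("monthly ponding-depth grids"),
                "Cell-based water balance driven by precipitation and evapotranspiration");
            System.out.println(m.modelName + ": " + m.inputDataSets + " -> " + m.outputDataSets);
        }
    }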

 

Development of Course Materials

 

Research and development was carried out on the discovery, form, and presentation of materials for initial ADEPT-supported course offerings in Introductory Physical Geography at UCSB and UCLA in the Spring Quarter of 2000. It was decided to organize the materials around the idea that an appropriate model of the notion of a concept provides a suitable framework for both creating and presenting scientific and mathematical knowledge. Adopting a model of concepts developed by T. Smith and associates in relation to computational modeling systems, together with a set of fundamental concepts, a series of lectures for the Geography courses and an associated set of presentation materials, particularly involving dynamic displays, has been developed.

 

The model of concepts employed involves: an abstract representation of the concept (i.e., concept names); a concrete representation of the concept that can be manipulated; a set of transformations that can be applied to the concept; and examples of the concept. The set of concepts developed includes those covering the hydrological cycle and fluvial erosion and development. The discovery of appropriate presentations of different aspects of these concepts (e.g., transformations of a ``drainage basin'' that illustrate the effects of erosional processes) was aided by members of the PAGE Project. The Implementation Team played a leading role in developing a mockup of this ADEPT module on fluvial processes, which includes animated models built from hydrographic data. In particular, they prototyped an Information Landscape (Iscape) for ADEPT. The approach was to collect geographical data files and geographical modeling software and to simulate a hydrological model of the state of Nebraska for a 12-month period. The input data files were run with the geographical modeling software to produce multiple output files (images and animations). Metadata was collected on the input and output files and on the model itself. The entire Iscape is visually presented using a Web browser. We have documented the steps undertaken to collect the data, organize it, and present it. Software utilities were written to automate the process of running the model and producing visual images that are Internet-ready.
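
The following is a minimal sketch of the sort of automation utility described above: it runs a model script once per month and converts each output grid to a Web-ready image. The command names ("run_hydro_model.sh", "grid2png") and the file layout are hypothetical placeholders, not the project's actual scripts.

    import java.io.File;
    import java.io.IOException;

    // Sketch of a utility that automates a 12-month model run and converts each
    // monthly output grid to an Internet-ready image.  Command names and file
    // layout are hypothetical placeholders.
    public class IscapeModelRunner {
        static void run(String... command) throws IOException, InterruptedException {
            Process p = new ProcessBuilder(command).inheritIO().start();
            if (p.waitFor() != 0) {
                throw new IOException("command failed: " + String.join(" ", command));
            }
        }

        public static void main(String[] args) throws Exception {
            new File("output").mkdirs();
            for (int month = 1; month <= 12; month++) {
                String grid = String.format("output/runoff_%02d.map", month);
                String image = String.format("output/runoff_%02d.png", month);
                // 1. run the model for this month (hypothetical wrapper script)
                run("run_hydro_model.sh", "nebraska_hydro.mod", Integer.toString(month));
                // 2. convert the resulting grid to a Web-ready image (hypothetical tool)
                run("grid2png", grid, image);
            }
        }
    }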

 

The Implementation Team also experimented with using thesaurus software to build a concept map for fluvial processes. This thesaurus framework was extended to link to "library objects," including animations and illustrations of fluvial processes. Three sets of terminology (concepts) were derived from an organization of the domain by a subject expert (T. Smith). Each term has its own HTML page, with links to relevant resources as well as to other terminology, through hierarchical, equivalence, and associative relationships. This framework is proposed for organizing concepts (and associated illustrations) into classroom presentation sequences. A set of PowerPoint slides with pointers (URLs) to concrete representations, transformations, and examples has been developed for classroom presentations. The educational value of such an approach is being evaluated by the ADEPT evaluation team.

 

Resource Discovery in ADEPT

 

One of the goals of the ADEPT project is to extend the current ADL infrastructure to incorporate a large number of autonomous, heterogeneous online data sources. Most existing work can be categorized into two approaches: the search-engine approach and the data-integration approach. The search-engine approach tries to crawl all the data sources and to store/index the discovered information in a centralized database. While this approach enjoys the advantage of requiring minimal support from the data sources, it suffers from several drawbacks: (1) the search capability is limited, which often leads to poor results; (2) updates are slow; and (3) data behind a query interface cannot be indexed, for technical or administrative reasons. The data-integration approach utilizes the search capabilities of the individual data sources. A mediator, which sits between the user and the data sources, translates a query, broadcasts the translated queries to all the sources, merges the results, and sends them back to the user. Well-designed data integration systems can return high-quality results, but they do not scale well, even for homogeneous data sources (e.g., the FGDC Clearinghouse interface), because of their broadcasting nature: (1) the number of queries is multiplied by the number of sources, which places a heavy burden on the mediator and the network; and (2) the response time is limited by the response time of the slowest source.

 

An important research activity at UCSB is focused on building another layer, the resource discovery layer, on top of the data integration techniques. The resource discovery layer is responsible for finding and ranking the data sources that are relevant to a particular query, and for automatically or interactively directing the query to the best data source(s). At the syntactic level, we are investigating various multi-dimensional histogram- and sampling-based techniques to summarize the data sources efficiently while at the same time providing accurate estimates of data source relevance. At a more semantic level, we continue to work on the framework established in the Pharos project. One of our current experiments is automatic/semi-automatic ontology translation using search engine directory data.
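
The sketch below illustrates the general idea of summary-based source ranking: each source is summarized compactly (here, by simple term counts), and a query is routed to the sources whose summaries predict the most matching records. The summary structure and the independence-based scoring are assumptions for illustration, not the project's actual estimator.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch of histogram-based source relevance estimation.
    public class SourceRelevance {
        static class SourceSummary {
            String name;
            long totalRecords;
            Map<String, Long> termCounts = new HashMap<>();
            SourceSummary(String name, long totalRecords) {
                this.name = name; this.totalRecords = totalRecords;
            }
        }

        // Estimated number of matching records, assuming term independence.
        static double estimateMatches(SourceSummary s, String[] queryTerms) {
            double selectivity = 1.0;
            for (String term : queryTerms) {
                long c = s.termCounts.getOrDefault(term, 0L);
                selectivity *= (double) c / s.totalRecords;
            }
            return selectivity * s.totalRecords;
        }

        public static void main(String[] args) {
            SourceSummary a = new SourceSummary("hydrography-catalog", 50_000);
            a.termCounts.put("drainage", 4_000L);
            a.termCounts.put("nebraska", 1_500L);
            SourceSummary b = new SourceSummary("general-imagery", 2_000_000);
            b.termCounts.put("drainage", 8_000L);
            b.termCounts.put("nebraska", 5_000L);

            String[] query = {"drainage", "nebraska"};
            System.out.printf("%s: %.1f expected matches%n", a.name, estimateMatches(a, query));
            System.out.printf("%s: %.1f expected matches%n", b.name, estimateMatches(b, query));
        }
    }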

 

In support of searching collections, the Implementation Team designed and implemented collection meta-information services. Ongoing research in support of such services relates to aggregation and summarization for large data collections. Such collections (e.g., ADL or other digital libraries) contain a huge number of data items; nevertheless, users expect fast answers to their queries. Also, it cannot be assumed that each user is aware of the exact contents of a collection. To make the query process more efficient, the user should therefore be supported with summarized information about the contents of the collection. Hence we have developed a prototype of a data aggregation tool that provides users with progressively refined approximate answers to their queries. This tool has two main applications. First, it can be used to answer queries directly. The user gets fast responses; those responses are approximate results with absolute error bounds, and the longer the query executes, the more exact the result becomes. The query processing can be interrupted as soon as the answer is exact enough for the user or application. Second, it can be used to provide the user with a quick approximate overview of the contents of a collection. The user thus gets support in building a query that retrieves exactly what he or she wants.
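
As a minimal illustration of progressive refinement with error bounds, the sketch below refines a count over a large (here simulated) collection chunk by chunk, reporting an estimate and absolute bounds after each chunk so that processing can be stopped early. The data, chunking, and bounding rule are illustrative assumptions, not the prototype's algorithm.

    // Sketch of progressive approximate aggregation with absolute error bounds.
    public class ProgressiveCount {
        public static void main(String[] args) {
            int totalItems = 1_000_000;
            int chunkSize = 100_000;
            java.util.Random rng = new java.util.Random(42);

            long matchesSoFar = 0;
            for (int processed = 0; processed < totalItems; processed += chunkSize) {
                // Process the next chunk (here: simulate a 10% match rate).
                for (int i = 0; i < chunkSize; i++) {
                    if (rng.nextDouble() < 0.10) matchesSoFar++;
                }
                int done = processed + chunkSize;
                int remaining = totalItems - done;

                // Extrapolate the match rate seen so far; unseen items contribute
                // at most `remaining` and at least 0, which bounds the error.
                double estimate = matchesSoFar * ((double) totalItems / done);
                long lower = matchesSoFar;                // all unseen items miss
                long upper = matchesSoFar + remaining;    // all unseen items match
                System.out.printf("after %,d items: estimate %.0f (bounds %,d .. %,d)%n",
                                  done, estimate, lower, upper);
            }
        }
    }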

 

In order to support rapid access to educational materials involving images, UCSB researchers are currently developing object-based representations of aerial images in order to facilitate: (1) faster access to data in the context of object-based querying; (2) relating maps and aerial images; and (3) the study of spatio-temporal relationships in and among aerial images. Aerial images contain information that is of much use for geography and for other spatially related disciplines such as geology and anthropology. The useful information in most images, including aerial images, is concentrated in smaller, often homogeneous, portions of the image, which we term 'objects'. For example, in an aerial image, a typical list of interesting objects would be: lakes, fields, parks, highways, and housing colonies.

 

We are currently addressing the issues of extracting relevant objects out of aerial images and finding efficient descriptors for objects. At present, we are using the following descriptors to represent objects isolated from larger aerial images: (1) oriented rectangular bounding box; (2) dominant colors; and (3) dominant texture (Gabor) feature vectors.
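
A hypothetical container for the descriptors listed above is sketched below (oriented bounding box, dominant colors, and a Gabor texture feature vector). The field layout, vector length, and the texture-only distance measure are illustrative assumptions.

    // Sketch of an object descriptor for regions extracted from aerial images.
    public class AerialObjectDescriptor {
        // Oriented rectangular bounding box: center, extents, rotation (radians).
        double centerX, centerY, width, height, orientation;

        // Dominant colors as packed 24-bit RGB values, most dominant first.
        int[] dominantColors;

        // Gabor texture features (e.g., mean and std. dev. per scale/orientation).
        double[] gaborFeatures;

        // Simple similarity on texture alone: Euclidean distance between vectors.
        static double textureDistance(AerialObjectDescriptor a, AerialObjectDescriptor b) {
            double sum = 0;
            for (int i = 0; i < a.gaborFeatures.length; i++) {
                double d = a.gaborFeatures[i] - b.gaborFeatures[i];
                sum += d * d;
            }
            return Math.sqrt(sum);
        }

        public static void main(String[] args) {
            AerialObjectDescriptor lake = new AerialObjectDescriptor();
            lake.gaborFeatures = new double[] {0.12, 0.30, 0.05, 0.22};
            AerialObjectDescriptor field = new AerialObjectDescriptor();
            field.gaborFeatures = new double[] {0.45, 0.10, 0.33, 0.18};
            System.out.println(textureDistance(lake, field));
        }
    }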

 

We intend to address the following issues in the near future: (1) fusion of map and aerial image data/information; (2) study of spatial relationships between objects in an image; and (3) study of temporal relationships between objects in aerial images taken at different points of time.

 

During the past six months, we have digitized a collection of aerial photographs (about 100 images, each about 100 MB, digitized with the help of the Implementation Team) that contain time-varying data. This is being used as a test data set to develop and validate our algorithms. In addition, we are in the process of acquiring images from the National Institute for Space Research (INPE), Brazil. These are satellite images of the Amazon basin, and one of the objectives is to identify deforestation from these multi-date images.

 

Collaborative Work Environments

 

An important attribute of learning environments is support for distributed collaborative work, allowing multiple, geographically distributed ADEPT users to participate collaboratively in tasks such as data visualization, information discovery, distance learning, education, and training. Central to these types of collaboration is the ability to create, deliver, represent, and visualize multimedia information, including text, graphics (both 2D and 3D), audio, and video. An important thrust of UCSB research has therefore been the development of essential tools and techniques to support such distributed collaborative work environments, and the development of help desks in the context of ADEPT.

 

One focus of UCSB research in this area is on the use of graphics as a means of communication. 2D and 3D graphics allow users to communicate beyond textual means and provide a powerful way of representing information in a form that can be more illuminating and intuitive. This is particularly true in the ADEPT application environment, which emphasizes geographical data. We are investigating a modeling and visualization subsystem that enables distributed users to interactively construct, manipulate, visualize, and animate 2D and 3D graphics models on the Web. This will be based on our existing prototype that allows distributed 3D model construction on the Web.

 

Currently the model is based on VR objects that represent a physical scene.  Manipulation of these objects and communication of these changes through a network corresponds to a collaborative reconstruction of a physical environment.  Any educational activity benefiting from using a common 3-D representation is a candidate for this type of software support.

 

A software system has been designed and implemented for creating this VR world and for allowing users to join a session. Some multimedia components are being added to enhance the ways in which users can alter the common environment. The software appears to be an effective way for multiple participants to work on a common task using a shared representation. As one example, we used a set of molecular structures downloaded from various biochemistry departments, using the standard "wrl" format. Using these and other files, it was evident that manipulation of these structures could be accomplished in a distributed environment. Using locking mechanisms to avoid conflicting changes, multiple users could change the molecular structures in a collaborative fashion. The integration of better locking mechanisms has to be explored to allow simultaneous, yet non-conflicting, manipulation of the scene.
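
The sketch below illustrates the per-object locking idea described above, in which a user must hold an object's lock before changing it so that edits to different objects proceed in parallel while conflicting edits are rejected. The class names and locking policy are illustrative assumptions, not the system's implementation.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch of per-object locking for collaborative scene editing.
    public class SceneLockManager {
        private final Map<String, String> locks = new ConcurrentHashMap<>();

        // Try to lock an object for a user; returns true if the lock was granted
        // (or the user already holds it).
        boolean acquire(String objectId, String userId) {
            return userId.equals(locks.merge(objectId, userId,
                    (holder, requester) -> holder));
        }

        void release(String objectId, String userId) {
            locks.remove(objectId, userId);   // only removes if userId holds the lock
        }

        public static void main(String[] args) {
            SceneLockManager mgr = new SceneLockManager();
            System.out.println(mgr.acquire("molecule-42", "alice"));  // true
            System.out.println(mgr.acquire("molecule-42", "bob"));    // false: alice holds it
            mgr.release("molecule-42", "alice");
            System.out.println(mgr.acquire("molecule-42", "bob"));    // true
        }
    }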

 

Another feature that has been explored is real-time media streaming between participants, weighing its benefits against its drawbacks given the bandwidth and processing requirements necessary for real-time collaboration.

 

Research issues related to integrating networking, graphics, and concurrency control still need to be addressed.  We hope to finish a prototype in the spring quarter with graphics, audio, and video communication capability. An appropriate demonstration will then be the use of this system for a distributed educational setting that allows dynamic information sharing to connect otherwise isolated students.

 

Another research thrust in the area of collaborative environments involves advanced tools for a ``help desk'' in particular and collaborative environments in general. The goal is to investigate software technology to support the communication, interaction, and collaboration among a group of people and to develop prototype software components that demonstrate the usefulness and efficacy of the technology in focused applications in distance learning and in a help desk for digital libraries. The initial result of this work is the identification of a tool, currently named the "sticky note," which can be used to enhance the information exchange between users in a collaborative environment. Specifically, we have identified three tools to develop: (1) the "glass pane," which provides a means to facilitate user-specific viewing and organization of data; (2) the "sticky note," which is an organization utility for annotations; and (3) the "active note," which automates the monitoring of the environment and the execution of certain predefined tasks. This effort was later expanded to the entire group. Currently, the group is working on an initial implementation of a prototype using Wang-Koppel's current collaborative model construction system.

 

Security Issues

 

As systems such as ADEPT, which support access to and creation of heterogeneous learning materials, evolve, supporting security at a variety of levels becomes a critical issue. The ADEPT project has been developing a Safe Area of Computation (SAC) approach that uses a collection of trusted devices to enforce the protection of users from the insecurity of specific applications. Each of these devices is called a Safe Area of Computation. The goal of these devices is to provide islands of security that interact with an ocean of insecurity.

 

The main goal of the SAC approach is to provide security for client-server applications. The approach can also be used to protect stand-alone applications; however, we describe here only its use for protecting client-server applications (a simplified sketch of the access-control idea follows the list below). In a client-server configuration, Safe Areas of Computation provide:

1) Strong authentication. A client SAC exchanges cryptographic messages with a server SAC in order to perform mutual authentication. At the end of the exchange the client SAC and the server SAC will have agreed on a secret key.

2) Secure channels. Both the client and the server SAC will use the secret key that was agreed on in the authentication step to encrypt and decrypt messages that are exchanged, thus providing a secure channel.

3) Access control. The SAC approach uses three types of access control lists (ACLs) to implement various security policies based on the security required to access data. The access may be based on the security label associated with the data or on a more complex set of parameters, like the current date and/or number of accesses allowed.
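
The following is a simplified sketch of item (3): an ACL entry grants access to data carrying a given security label, optionally limited by an expiration date and a remaining number of accesses. The entry fields and policy are illustrative assumptions, not the SAC implementation.

    import java.time.LocalDate;
    import java.util.HashMap;
    import java.util.Map;

    // Sketch of label-based access control with expiration and access counts.
    public class SacAccessControl {
        static class AclEntry {
            String label;          // hierarchical security label this entry covers
            LocalDate expires;     // null = no expiration
            int accessesLeft;      // remaining accesses (tokens); -1 = unlimited
            AclEntry(String label, LocalDate expires, int accessesLeft) {
                this.label = label; this.expires = expires; this.accessesLeft = accessesLeft;
            }
        }

        private final Map<String, AclEntry> entries = new HashMap<>();

        void grant(AclEntry e) { entries.put(e.label, e); }

        // Decide whether access to data with the given label is allowed now,
        // decrementing the remaining-access counter when it is.
        boolean checkAccess(String dataLabel) {
            AclEntry e = entries.get(dataLabel);
            if (e == null) return false;
            if (e.expires != null && LocalDate.now().isAfter(e.expires)) return false;
            if (e.accessesLeft == 0) return false;
            if (e.accessesLeft > 0) e.accessesLeft--;
            return true;
        }

        public static void main(String[] args) {
            SacAccessControl acl = new SacAccessControl();
            acl.grant(new AclEntry("restricted-imagery", LocalDate.of(2099, 12, 31), 2));
            System.out.println(acl.checkAccess("restricted-imagery"));  // true
            System.out.println(acl.checkAccess("restricted-imagery"));  // true
            System.out.println(acl.checkAccess("restricted-imagery"));  // false: tokens exhausted
            System.out.println(acl.checkAccess("unrestricted-map"));    // false: no entry
        }
    }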

 

An important goal of SAC is to provide strong security even for users who are not concerned with security. The price paid for this security is that the SAC approach requires the user to deal with a hardware token, which is the client SAC. Although this requires an initial adaptation, the benefits of the additional security provided outweigh the initial discomfort. The only requirement for a user is to insert the client SAC into a reader before performing a secure transaction, in the same way that ATM cards are currently used for interactions with bank ATMs.

 

The SAC approach provides access control in a generic way. The approach uses generic hierarchical security labels and access control lists. The access control lists are used to decide on a user's permission to access data. The attributes used for this decision are not predefined and only have a meaning in a particular context implemented for a particular security policy. This enables contextual interpretation and the implementation of more than one security policy concurrently. A bank can, for example, use attributes to specify what banking transactions a user can perform, while a digital library can use attributes to specify what types of documents a user can access, how long he or she has access to the documents, how many times, and at what cost.

 

The current requirement of the Alexandria Digital Library is to be able to provide access to a particular subset of its data to only a selected group of users. ADL does not require fine-granularity control; the data is either restricted or not restricted. In the current implementation the restricted data is composed of compressed files. Each of these files is a collection of data that must be transmitted in compressed format to the user's computer. Programs on the user's computer uncompress the downloaded data. The SAC approach can meet these simple requirements very easily. Each privileged user has a smart card, which implements a client SAC. Each client SAC has the same set of permissions; it has an access control list entry that gives the user access to the one and only type of protected data.

 

The current requirements of the Alexandria Digital Library (ADL) are very simple and easily fulfilled using the SAC approach as described above. ADL does, however, have some additional requirements that are being considered. A major change that the ADL Internet interface is undergoing is the move from using JiGI, a proprietary user interface, for all interactions to using standard Web browsers. This does not affect the design of the test bed, which has always assumed pure Web browsers and never used any functionality of JiGI; therefore, no change is needed.

 

ADL is also concerned with the possibility of having to terminate access to particular data by non-members of the UC community after providing it for some time. This may occur due to legal constraints not considered initially, or simply the desire to no longer offer the service. This is easily accommodated using the delete-label operation, which deletes an access entry from the complex ACL.

 

ADL wants to be able to report to non-UC users how many tokens remain on their smart cards. This is easily accomplished by having the server request this value using a field in the security attribute of a data item.

 

The ADL test bed uses the Siemens SLE66CX160S integrated circuit and the CardOS M3.0 operating system. The functionality of the client SAC is implemented by applications loaded into the EEPROM memory.

 

The current test bed implementations impose some particular requirements for the client and server computers. The client platform requirements are:

  a) Windows 98 or NT system;

  b) A smart card reader;

  c) A standard browser that supports ActiveX controls; e.g., Microsoft Internet Explorer or Netscape Navigator;

  d) A set of programs that may be downloaded as an ActiveX package, which consists of the necessary controls and the Client Communication Package.

 

The server platform requirements are:

  a) The server platform must be tamper resistant. This is a very strong assumption and as such requires very specific servers;

  b) Windows NT server;

  c) Microsoft Internet Information Service (MIIS);

  d) A particular ISAPI extension that communicates with the server SAC;

  e) The server SAC application;

  f) A secure connection to the database that stores classified data.

 

These requirements can be easily met by a Windows platform, which is the platform used for the implementation. In addition, the server could be easily migrated to a Unix platform.

 

Performance Issues

 

There are a significant number of performance issues that systems such as ADEPT will have to resolve in bringing large amounts of electronic materials into learning environments. An important set of such issues relates to server clustering and distributed caching for scalable and reliable Web services, particularly for improving the scalability and availability of large Web systems, with applications in ADL and ADEPT. In ADEPT research on adaptive load sharing for clustered digital library servers, we are investigating load-balancing strategies for clustered ADL servers. ADEPT/ADL involves intensive database I/O and heterogeneous CPU activities. Clustering servers can improve the scalability of the ADL system in response to a large number of simultaneous access requests. One difficulty addressed is that clustered workstation nodes may be non-uniform in terms of CPU and I/O speeds. We have developed an optimization scheme that dynamically monitors resource availability, uses a low-cost communication strategy for updating load information among nodes, and schedules requests based on both I/O and computation load indices. Since accurate cost estimation for processing database-searching requests is difficult, we have proposed a sampling and prediction scheme to identify the relative efficiency of nodes for satisfying the I/O and CPU demands of these requests. We have provided analytic results to bound the performance of our scheme on this cluster environment and have conducted a set of experiments using ADL traces to verify the effectiveness of the proposed strategies.
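
The sketch below illustrates the general idea of dispatching a request to the node with the lowest combined I/O and CPU load index, with the weighting chosen according to the estimated demands of the request class. The node fields, weights, and scoring are illustrative assumptions, not the project's scheduling algorithm.

    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.List;

    // Sketch of request scheduling on combined I/O and CPU load indices.
    public class LoadSharingScheduler {
        static class Node {
            String name;
            double cpuLoad;   // e.g., run-queue length normalized by CPU speed
            double ioLoad;    // e.g., pending disk requests normalized by I/O speed
            Node(String name, double cpuLoad, double ioLoad) {
                this.name = name; this.cpuLoad = cpuLoad; this.ioLoad = ioLoad;
            }
        }

        // cpuWeight/ioWeight reflect the estimated demands of the request class
        // (obtained, for example, by sampling past requests of the same type).
        static Node pick(List<Node> nodes, double cpuWeight, double ioWeight) {
            return nodes.stream()
                    .min(Comparator.comparingDouble(n -> cpuWeight * n.cpuLoad + ioWeight * n.ioLoad))
                    .orElseThrow(IllegalStateException::new);
        }

        public static void main(String[] args) {
            List<Node> cluster = Arrays.asList(
                    new Node("node-a", 0.8, 0.2),
                    new Node("node-b", 0.3, 0.9),
                    new Node("node-c", 0.5, 0.5));
            // An I/O-heavy database-search request goes to node-a;
            // a CPU-heavy request goes to node-b.
            System.out.println(pick(cluster, 0.2, 0.8).name);   // node-a
            System.out.println(pick(cluster, 0.8, 0.2).name);   // node-b
        }
    }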

 

In relation to scheduling optimization for resource-intensive Web requests on server clusters, we have investigated a two-level scheduling framework with a master/slave architecture for clustered Web servers. Such an architecture has advantages in dynamic resource recruitment and fail-over management, and it can also improve server performance compared to a flat architecture. The key methods we propose to make this architecture efficient are the separation of static and dynamic content processing, low-overhead remote execution, and reservation-based scheduling that considers both I/O and CPU utilization. Our study provides a comparison of several scheduling approaches using experimental evaluation and analytic modeling, and the results show that proper optimization in resource management can lead to over 65% performance improvement for a fixed number of nodes, and can achieve more substantial improvement when idle resource recruitment is considered.

 

In relation to Web caching for dynamic content delivery, as many Web sites, such as ADEPT, evolve to provide sophisticated information manipulation services, dynamic content generation becomes more common. Server-side or wide-area caching can provide significant additional benefit by reducing server load, end-to-end latency, and bandwidth requirements. We classify locality in dynamic Web content into three kinds: identical requests, equivalent requests, and partially equivalent requests. Equivalent requests are not identical to previous requests but result in the generation of identical dynamic content. The documents generated for partially equivalent requests are not identical but can be used as temporary placeholders for each other while the real document is being generated. We present a new protocol, which we refer to as the Dynamic Content Caching Protocol (DCCP), to allow individual content-generating applications to exploit query semantics and specify how their results should be cached and/or delivered. We illustrate the usefulness of DCCP for several applications and evaluate its effectiveness using traces from the Alexandria Digital Library and the NASA Kennedy Space Center as case studies.
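
As a minimal illustration of the "identical" versus "equivalent" locality classes, the sketch below normalizes query strings before cache lookup, so that requests differing only in content-irrelevant parameters hit the same cached page. Which parameters matter is something the content-generating application would declare; the normalization shown here (dropping a "session" parameter and sorting the rest) is an illustrative assumption, not the DCCP specification.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.TreeMap;

    // Sketch of caching dynamic content under a normalized (equivalence) key.
    public class DynamicContentCache {
        private final Map<String, String> cache = new HashMap<>();

        static String normalize(String queryString) {
            Map<String, String> params = new TreeMap<>();   // sorted -> order-insensitive
            for (String pair : queryString.split("&")) {
                String[] kv = pair.split("=", 2);
                if (!kv[0].equals("session")) {              // assumed irrelevant to content
                    params.put(kv[0], kv.length > 1 ? kv[1] : "");
                }
            }
            return params.toString();
        }

        String lookup(String queryString) { return cache.get(normalize(queryString)); }

        void store(String queryString, String generatedPage) {
            cache.put(normalize(queryString), generatedPage);
        }

        public static void main(String[] args) {
            DynamicContentCache c = new DynamicContentCache();
            c.store("region=nebraska&theme=hydrography&session=111", "<html>...</html>");
            // Equivalent request: different session id, different parameter order.
            System.out.println(c.lookup("theme=hydrography&session=222&region=nebraska") != null);
        }
    }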

 

In relation to page partitioning and lazy invalidation for caching dynamic Web content, we note that caching dynamic pages at a server is beneficial in reducing server resource demands at a busy Web site such as ADEPT, and it also helps dynamic page caching at proxy sites. Previous work has used fine-grain dependence graphs among individual dynamic pages and underlying data sets to enforce result consistency. Such an approach can be cumbersome or inefficient for managing a cache with an arbitrarily large number of dynamic pages. Our work studies partitioning dynamic pages into classes based on URL patterns, and the proposed scheme allows an application to specify cachability and data dependence and to invoke invalidation for a class of dynamic pages. To make this scheme time-efficient with a small space requirement, lazy invalidation is proposed to minimize slow disk accesses when the URLs of dynamic pages are stored in memory in a digest format. We have developed a prototype of a caching system for dynamic Web content, and our initial experimental data indicate that the proposed techniques can reduce server response times, with up to a seven-fold speedup for the tested applications.
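
The sketch below illustrates the class-based lazy-invalidation idea in its simplest form: invalidating a class only bumps a version counter, and a cached page is discarded lazily, at lookup time, when the version it was stored under is stale. Grouping pages by URL prefix (rather than general URL patterns or digests) is a simplifying assumption for illustration.

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of class-based lazy invalidation for cached dynamic pages.
    public class LazyInvalidationCache {
        static class CachedPage {
            String body;
            long classVersion;
            CachedPage(String body, long classVersion) {
                this.body = body; this.classVersion = classVersion;
            }
        }

        private final Map<String, Long> classVersions = new HashMap<>();   // prefix -> version
        private final Map<String, CachedPage> pages = new HashMap<>();     // URL -> page

        private String classOf(String url) {
            for (String prefix : classVersions.keySet()) {
                if (url.startsWith(prefix)) return prefix;
            }
            return null;
        }

        void invalidateClass(String prefix) {
            classVersions.merge(prefix, 1L, Long::sum);    // cheap: no pages touched
        }

        void store(String url, String body) {
            String cls = classOf(url);
            long v = cls == null ? 0L : classVersions.get(cls);
            pages.put(url, new CachedPage(body, v));
        }

        String lookup(String url) {
            CachedPage p = pages.get(url);
            if (p == null) return null;
            String cls = classOf(url);
            long current = cls == null ? 0L : classVersions.get(cls);
            if (p.classVersion < current) {    // stale: discard lazily
                pages.remove(url);
                return null;
            }
            return p.body;
        }

        public static void main(String[] args) {
            LazyInvalidationCache cache = new LazyInvalidationCache();
            cache.invalidateClass("/search?");            // registers the class (version 1)
            cache.store("/search?theme=rivers", "<html>rivers</html>");
            System.out.println(cache.lookup("/search?theme=rivers") != null);  // true
            cache.invalidateClass("/search?");            // underlying data changed
            System.out.println(cache.lookup("/search?theme=rivers"));          // null (stale)
        }
    }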

 

UCLA Activities and Findings

 

The UCLA portion of the ADEPT project is focused on supporting and evaluating the services of ADEPT that support learning.  ADEPT offers an important opportunity to evaluate learning activities and integrate the assessment results into the design of the system.  The classroom evaluation component of ADEPT is focusing on assessing learning outcomes as a result of implementation of successive ADEPT prototypes in undergraduate classrooms, first in geography and subsequently in other subject areas where geo-referenced information may be useful (for example, urban planning, environmental studies, archaeology, and public health).

 

We are employing a variety of research methods, including intensive analyses of individual users and large-scale studies of entire classrooms, using multiple dependent measures, including analyses of problem-solving processes, quantitative analyses of learning outcomes, and qualitative descriptions of user misconceptions. These converge on an understanding of how people learn using the ADEPT system. ADL prototypes already developed have been instrumented for sophisticated data collection, including transaction logging and surveys. We are now extending these capabilities in ADEPT. Results of the usability and evaluation studies will provide continuous feedback to the design of ADEPT services, functionality, and choice of collections.

 

The starting point for the classroom studies has been the gathering of baseline data about the performance and demographics of classes in the same subject area for the five preceding years, as well as information relating to faculty teaching practices and pedagogical objectives. This forms a part of a needs analysis designed to identify faculty and student users, their tasks, task context, and what tools, content, collections, and metadata might be usable in their environment. Formative evaluation is building upon the needs analyses and will continue throughout the project, since needs will change as the system develops and becomes increasingly integrated into classroom instruction. Summative evaluation will begin midway through the project, triangulating quantitative and qualitative methods to assess short- and long-term learning and instructional methods. We are evaluating ADEPT in geography and other science classrooms where the use of ADEPT is integral to the curriculum, and in humanities classrooms such as history and classics where use of ADEPT is an important supplementary resource for student information seeking and curricular enrichment. Faculty in multiple disciplines at both UCSB and UCLA have agreed to participate, for classroom and distance learning, and we will recruit additional faculty and classrooms over the course of the study, as user needs evolve, as capabilities of ADEPT expand, and as the success of ADEPT attracts other participants.

 

To date, we have planned our research strategy, designed the instruments for baseline data collection and received campus human subjects approval (the latter process took 6 months).  These activities have been done in collaboration with UCSB.  We collected pilot data at UCLA in winter term and are scheduled to collect full baseline data at UCLA and UCSB in spring term, 2000.  We conducted the first round of Iscape evaluation at UCLA with geography instructors and the Office of Instructional Development.

 

We note that the first-year efforts have been devoted to research design, baseline data collection, and evaluation of the initial ADEPT Iscapes.

 

SDSC Activities and Findings

 

The San Diego Supercomputer Center staff has worked on three projects in support of the InterLib project: 1) development of an Art Museum Image Consortium (AMICO) image library in collaboration with the CDL; 2) evaluation of advanced XML-based databases for use in digital libraries; and 3) support for archiving of the Alexandria Digital Library data sets in the High Performance Storage System (HPSS) at SDSC.

 

The AMICO data collection consists of 55,000 images of art objects, including 800 MBytes of catalog metadata and 180 GBytes of high-resolution art imagery, assembled from 26 art museums. SDSC has created an XML Document Type Definition (DTD) for the metadata of the AMICO collection and has converted all of the AMICO metadata into the XML representation. The data sets have been loaded onto a 100 GB disk farm, except for the images from the Fine Arts Museums of San Francisco (FAMSF). SDSC is acquiring additional disk space to support the FAMSF images. The eXcelon XML-based database was used to assemble the collection. This required proper indexing of the collection and also identifying the correct way to distribute the data collection across multiple tables and servers in order to achieve reasonable performance. A user interface has been constructed for the collection that provides the ability to query metadata, view thumbnails, and extract full-sized art images.

 

The AMICO collection is being used to compare the performance of XML databases with traditional object-relational databases such as Oracle and IBM UDB. The entire collection is being instantiated in all three databases. This is both a test of the ability of each database to manipulate the collection and a demonstration of the ability to migrate the collection between multiple databases using the XML DTD that was defined for the AMICO collection.

 

In collaboration with the CDL, the AMICO collection will be made available for art classes within the UC system.  A major design issue is the ability to support access to the high-resolution images provided by the collection.  Currently, the images are stored on disk to minimize latency of access.  The long-term goal is to store the images in an archive, and provide a disk cache for frequently requested images.  The latency of access to data stored within the archive is of concern.

 

SDSC, in collaboration with UCSB, is supporting the ADL data collection within the HPSS archival storage system. The ADL collection is over a terabyte in size and exceeds the capacity of the disk cache at SDSC. HPSS uses IBM 3590 cartridges held within Storage Technology robots to store the data. The nominal retrieval time is 2-4 minutes, depending upon whether a tape must be unloaded from a drive before the desired tape is mounted. Depending upon the user load, request queues for access to data on tape can become long and further delay access to the tape drives. Access times on the order of 20 minutes can then occur.

 

There are multiple approaches for improving the latency of access to archived digital library data. They include: (1) caching of frequently requested images on a server separate from the archive, which requires a migration policy determining which data sets to keep in the cache; (2) aggregation of images in containers, to improve the probability that other desired images will be retrieved at the same time; and (3) prefetching or staging of images into a cache to prepare for subsequent use.

 

The technology to support all three approaches is being developed at SDSC as part of the SDSC Storage Resource Broker (SRB) data handling system. Although the software development is being funded by other projects, the resulting system will be applied to the InterLib project. The SRB provides containers for aggregation of data sets. When users reference a desired data set, the data handling system copies the associated container from the archive onto a disk cache and then returns the desired data set. Policy mechanisms are being designed to control which containers remain on the disk cache under heavy user load. Staging commands are being added to the SRB to support prefetch of containers from the archive.

 

GT Activities and Findings

 

GT has been investigating various alternatives for ADEPT visualization and working with UCSB to integrate California Landsat and elevation data into the GT Virtual Geographic Information System (GT-VGIS). The UNIX version was installed on a loaner SGI Onyx at UCSB, and demonstrations were performed at an ADEPT meeting. Descriptions of several Application Programmer Interfaces were also provided. GT has been restructuring the GT-VGIS rendering system to work on Windows NT so that PCs may be used as a delivery platform. An early version of the Windows code was delivered to ADEPT for demonstration in December. Integration of the GIS query/select functionality into the Windows NT GT-VGIS is proceeding, with a socket interface to ArcView for attribute handling. The NT version is also being upgraded to handle 3D objects such as buildings.

 

In a parallel effort, GT is implementing the concept of a 3D server whereby a very lightweight client (a Web browser) is able to interact with 3D images that are generated on a UNIX or NT server with high-end graphics capability and transmitted as JPEG-compressed images to the client. Modes of interaction are still being defined, but a user should be able to point to an area on the globe and have an AVI movie generated to fly to that point. At that point the viewer is immersed in a 3D terrain, and simple navigation functions would allow limited movement capability. A user might click on a mountain and request an AVI movie flying around that point. The user should also be able to select objects and query a GIS database interactively.

 

Other alternatives are also being considered including the SRI visualization software system, other commercial visualization systems, and VRML applications.

 

UGA Activities and Findings

 

UGA has been playing a role in the construction of Information Landscapes (Iscapes). To accomplish this, we started with an Iscape working definition based upon the Iscape concept model. In constructing Iscapes, we deal with a collection of semantically related information assets that may not only be heterogeneous in syntax/format, structure, and media, but may also be obtained from different locations (web sites, repositories, databases, data collections) using a variety of query languages, information retrieval techniques, and access methods. In order to achieve our goal, we have identified certain focus areas on which we have been concentrating: (a) designing and implementing a metabase for our target geographical domain; (b) implementing a diverse range of extractors for different web sites providing information pertaining to the target domain, allowing relevant metadata to be extracted to populate the metabase; (c) designing and implementing ontologies relevant for the purpose; (d) designing and creating a few representative Iscapes for performing queries on the metabase; and (e) designing and implementing an agent architecture to process Iscape requests. A preliminary Iscape specification was constructed on top of Web-centric XML (Extensible Markup Language) and RDF (Resource Description Framework) based infrastructures. A few Iscape scenarios were designed, along with information requests that were implemented (see Web site referenced below), as a first realization of the Iscape concept.

 

A demonstration version is also available for trial with classroom applications focusing on finding relevant information on the Digital Earth (http://lsdis.cs.uga.edu/ADEPT/IscapeDemos/Version1/version1.html). Extensions of the architecture are now being considered as a substrate for the next Iscape demonstration (version 2, to be released soon), for classroom applications built on top of contextual information and operation simulations, taking advantage of the correlation of information across the Digital Earth.

 

An example of an information request to be processed as an Iscape that will facilitate learning about the Digital Earth involves a city council making decisions about the planning of a new landfill. Landfills are a common practice worldwide and are by far the most common waste disposal method in the United States, probably accounting for more than 90 percent of the nation's municipal refuse. This example scenario comes in support of one of the suggestions for Digital Earth scenarios sampled by the "First Inter-Agency Digital Earth Working Group," an effort on behalf of NASA's inter-agency Digital Earth Program. This is a simplified yet coherent hands-on learning interaction exercise with the Digital Earth, and it should be refined through actual experience. The starting point would be to find the best location for the landfill. The figure shows a high-level Iscape and the corresponding intermediate refinements that would occur during processing. This kind of prototype is what is being applied to the ADEPT Project.

 

Research Training Provided By The Project

 

The project has provided research training to a large number of graduate students, including 15 at UCSB, 4 at UCLA, 2 at UCSD (SDSC), 2 at UGA, and 2 at GT. Most of these students are PhD candidates. The graduate students received training in research methods in a variety of fields (e.g., geography, computer and information science, psychology, and education), and some have gained contextual background in geography and geography education. The co-PIs at UCLA are attending courses in geography to gain more domain knowledge.

 

Educational And Outreach Activities

 

Support for educational activities lies at the heart of the project, and activities in this area have been described above. In particular, the current products of our research are being taken into classrooms at UCSB and UCLA during the Spring Quarter.

 

PUBLICATIONS, PRODUCTS, AND CONTRIBUTIONS OF THE PROJECT

 

Publications Of The Project

 

Frew, J., M. Freeston, N. Freitas, L. Hill, G. Jane'e, K. Lovette, R. Nideffer, T. Smith, Q. Zheng (2000) The Alexandria Digital Library architecture. Int. J. Digit. Libr. 3:1, 1-10.

 

Hill, L., Dolin, R., Rae, M.A., Carver, L., Frew, J., Larsgaard, M. (2000). Alexandria Digital Library: User evaluation studies and system design. Journal of the American Society for Information Science (Special Issue on Digital Libraries).

 

Leazer, G.L., Gilliland-Swetland, A.J., Borgman, C.L. (in review). Classroom Evaluation of the Alexandria Digital Earth Prototype (ADEPT). American Society for Information Science, 2000 Annual Conference, Chicago, November, 2000.

 

Moore, R., Baru, C., Rajasekar, A., Ludaescher, B., Marciano, R., Wan, M., Schroeder, W., Gupta, A. (2000). Collection-Based Persistent Digital Archives, Part 1. D-Lib Magazine, Volume 6, Number 3.

 

Ben Smith, Anurag Acharya, Tao Yang, Huican Zhu. Caching Equivalent and Partial Results for Dynamic Web Content. In Proceedings of the 1999 USENIX Symposium on Internet Technologies and Systems (USITS '99), pp. 209-220.

 

H. Zhu, T. Yang, Q. Zheng, D. Watson, O. Ibarra and T. Smith. Adaptive Load Sharing for Clustered Digital Library Servers. Accepted for publication in International Journal of Digital Libraries, 2000.

 

Huican Zhu, B. Smith, and T. Yang. Scheduling Optimization for Resource-Intensive Web Requests on Server Clusters. In Proceedings of the Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA '99), pp. 13-22.

 

Submitted Publications

 

Leazer, G.L., Gilliland-Swetland, A.J., Borgman, C.L. (in press). Evaluating the use of a geographic digital library in undergraduate classrooms: the Alexandria Digital Earth Prototype (ADEPT). DL '00: Association for Computing Machinery Conference on Digital Libraries, San Antonio, Texas, June, 2000.

 

Unpublished Reports and Papers

 

Zhu, H., and T. Yang, Fast Invalidation for Caching Dynamic Web Content, 2000 (under revision).

 

Presentations

 

A number of presentations about the project have been given both nationally and internationally.

 

The nature and activities of the project may be found at the following Web sites:

 

           http://www.alexandria.ucsb.edu

                   Alexandria Digital Library Project home page

 

           http://webclient.alexandria.ucsb.edu

                   HTML browser client for CDL

 

           http://www.alexandria.ucsb.edu/gazetteer/gazserver

                   gazetteer server

 

           http://is.gseis.ucla.edu/adept (currently password protected)

 

           http://lsdis.cs.uga.edu/ADEPT/IscapeDemos/Version0/version0.html

 

Specific Products Of The Project

 

The specific products of the project, apart from its research outputs described above, include the augmented ADL system and services, the gazetteer, and courseware.

 

Contributions Of The Project

 

The project is multidisciplinary (e.g., geography, computer science, psychology, education) and is making contributions in each of these fields, as well as to the specific field of digital libraries. In particular, we are contributing to: (1) the investigation and development of a large array of computational methods for the construction of personalized digital libraries, providing services that can be used to construct and employ such systems, particularly in educational environments; (2) the representation of scientific concepts as a basis for electronic representations of scientific knowledge for educational purposes; and (3) the development of methods for the formative and summative evaluation of digital libraries in educational applications.

 

By the nature of the ADEPT project, we are making major contributions to the use of digital library services and collections in undergraduate and graduate level education, and simultaneously to the physical, institutional, and information resources for science and technology. In particular we are contributing to university infrastructure for IT-based delivery and use of digital libraries, which will have broad applications in telelearning and distance education. Since in principle ADEPT can be employed in any learning setting, and has obvious applications outside of learning (such as emergency response), it is also contributing to the public welfare beyond science and engineering. In particular, it will provide services that allow arbitrary groups of individuals to construct personalized digital libraries with spatial search capabilities in any area of knowledge.