Annual Report Of
The Alexandria Digital Earth Prototype Project
June 1, 2000
The main participants in the
project include faculty, staff, and students at the University of California at
Santa Barbara (UCSB), the University of California at Los Angeles (UCLA), the San Diego
Supercomputer Center (SDSC), the University of Georgia (UGA), and Georgia Tech (GT).
Personnel who have worked on
the project at UCSB include the following faculty researchers:
Terence R. Smith (Project Director)
Michael F. Goodchild (Project Associate Director)
Anurag Acharya (Computer Science)
Divyakant Agrawal (Computer Science)
Kevin Almeroth (Computer Science)
Omer Egecioglu (Computer Science)
Amr El Abbadi
(Computer Science)
James Frew (Bren School of Environmental Science)
Richard Mayer (Psychology)
Oscar Ibarra (Computer Science)
Richard Kemmerer (Computer Science)
Ambuj Singh (Computer Science)
Jianwen Su (Computer Science)
Yuan-Fang Wang (Computer Science)
Tao Yang (Computer Science).
These faculty are supported
by approximately 12 graduate students. The project also has a staff of full-
and part-time employees, including:
Greg Janee (full-time, lead software engineer),
Gary Monheit (full-time, visualization; client
development),
Linda Hill (part-time, gazetteers;
thesauri; metadata; system evaluation),
Qi Zheng (full-time, database and server development;
left project Dec 1999),
Ray Smith (part-time, executive associate director),
Karen Andersen (full-time, project manager),
as well as other part-time
programmers. The now-operational Alexandria Digital Library (ADL), whose
services provide critical support for ADEPT, is supported by the staff of the
Map and Imagery
Laboratory (MIL) of UCSB's
Davidson Library, including:
Larry Carver (MIL Director),
Mary Larsgaard (Map Librarian),
Catherine Masi (database management),
Kim Philpot (system administration),
Jason Simpson (system administration),
and other support staff. The
MIL staff also play a central role in the development of ADEPT.
Other individuals and groups
at UCSB who have been involved include instructors in Geography (Oliver
Chadwick, Stuart Sweeney), whose courses are being supported by ADEPT, and
members of UCSB's Instructional Development Program (Stanley Nicholson and Rick
Johnson).
Faculty from UCLA who have
worked on the project include:
Christine Borgman (Director of UCLA activities),
Greg Leazer,
Anne Gilliland-Swetland,
and 4 graduate students.
Faculty from UGA who have
worked on the project include:
Amit Sheth (Computer Science),
Tarcisio De Souza Lima,
and associated graduate
students.
Faculty from GT who have
worked on the project include:
Nick Faust
and associated graduate
students. The Implementation Team
interacted closely with Dr. Faust and is currently reviewing his
multiresolution whole-Earth database and visualization technology for possible
incorporation into the ADEPT testbed.
Members of SDSC who have
worked on the project include:
Reagan Moore,
Chaitanya Baru,
Richard Marciano,
Bertram Ludaescher.
Two students who are
supported by NSF REU grants also participated.
A close working partnership
exists with the following members of the InterLib Project:
University of California at Berkeley (UCB) DLI-2
project
(Robert Wilensky, Director)
Stanford University DLI-2 project
(Hector Garcia-Molina, Director)
California Digital Library (CDL)
(John Ober, chief liaison).
With SDSC forming a fifth
member of InterLib, the projects hold regular meetings and support close
research and development activities. Two all-day meetings have been held during
the first six months of the project that involved significant representation of
all five InterLib teams, the most recent being at UCSB on March 22, during
which agreement was reached on further InterLib interactions and an InterLib
presentation at the DLI-2 meeting in England (June 12-13). In terms of
activities with specific InterLib partners, the Implementation Team interacted
with Howard Foster from UCB's e-lib project and has set specific goals for
collaboration: UCB is retargeting some of its client applications to
access ADL/ADEPT's library and gazetteer, and UCSB is working with UCB to
expose a UCB collection to ADL/ADEPT. With respect to Stanford, a UCSB student
is developing a server-side implementation for ADL/ADEPT of Stanford's Simple
Digital Library Interoperability Protocol (SDLIP). With respect to SDSC, the
Implementation Team has fully integrated SDSC's Storage Resource Broker (SRB)
into the ADL/ADEPT architecture. Discussions continue between the Team and SDSC
on the possible uses of other SDSC systems and architectures (e.g. XML-based databases). Finally, the ADEPT
Implementation Team developed a Web-based browser according to a specification
of CDL and served on CDL's Technical Architecture and Services (TAS)
Board.
An especially close
relationship has been developed with the NSF-supported PAGE (Program for the
Advancement of Geoscience Education) Project, which is developing a Geoscience
Digital Library (GDL) at the University Corporation for Atmospheric Research (UCAR),
under the direction of Mary Marlino. A series of meetings at both sites led to
a memorandum of understanding (MoU) between PAGE and ADEPT, and specific agreements
for PAGE and the Implementation Team to jointly develop and evaluate testbed
technologies. PAGE also helped locate hydrography resources for the ADEPT
physical geography curriculum. ADEPT
and GDL will continue to meet on a regular basis to foster complementary
technical developments and research.
ADEPT has developed a
partnership with the University of Utrecht (Netherlands) under which the
University of Utrecht gave the Implementation Team access to the source code
for PCRaster, a popular modeling and simulation software package that is being
used to support an initial ADEPT geography curriculum.
Active partnerships with
private corporations and government agencies involve Silicon Graphics (SGI),
SRI, and Oracle. The Implementation Team met with developers from SGI to review
their standard API for multiresolution whole-Earth browsing. A joint proposal was submitted to NSF but
was not successful. The Implementation Team has also interacted with
researchers from SRI who developed SRI's TerraVision distributed, hierarchical,
geographic visualization system. An MoU
was signed between SRI and ADEPT, giving the Implementation Team access to the
TerraVision source code.
A further close relationship
has been developed with the University of Aberdeen, Scotland, and with Mike
Freeston (Professor of Computing Science) in particular. Dr. Freeston spends
approximately 4-6 months each year with the ADEPT project, working on database
and indexing issues.
In other partnering
relations, the Implementation Team, the UCSB Graduate School of Education, and
the Survivors of the Shoah Visual History Foundation (VHF) submitted a joint
proposal to the US Department of Commerce to develop the VHF's video archive
technology as an educational and digital library resource. The Implementation
Team began a joint project with the Donald Bren School of Environmental Science
and Management's Earth System Science Workbench (ESSW) project, which will make
ESSW's extensive data holdings (historic and real-time satellite remote sensing)
accessible as ADL/ADEPT collections.
The UCLA researchers have
developed partnerships with UCLA's Office of Instructional Development (Larry
Loehrer, Ruth Sabean, and Steve Rossen) and interacted closely with both UCLA's
Department of Geography (John Agnew,
Marilyn Raphael, Antony Orme, and Larry Smith) and its Map Library (David
Deckelbaum). The UCLA team has also provided their evaluation plans to UCAR for
their work on the GDL and is participating with the InterLib group.
ADEPT has ongoing contacts
with numerous other organizations, including ESRI, USGS, OGIS, NPS, and the US
Navy.
The activities and findings
of the ADEPT project may be organized in terms of those of UCSB's research and
development teams and those of the (subcontracting) partners at UCLA, SDSC,
UGA, and GT.
Among the main development
activities of the project, the Implementation Team augmented the architecture
of ADL so that it could serve as an engine for providing ADEPT services. In
particular, the fourth generation of the ADL architecture was developed. An
implementation of the new architecture, entirely rewritten in Java using Java servlet
technology, is running at UCSB and supports both ADEPT research and the
California Digital Library. The
architecture is now a true three-tier model in which the middle tier (the
"middleware") defines both client- and server-side interfaces. The middleware's client-side interface now
employs a generic, XML-based framework for describing metadata and metadata
semantics; this framework closely follows the Resource Description Framework
(RDF) proposed standard. The services
that comprise the client-side interface now closely follow the Simple Digital
Library Interoperability Protocol (SDLIP), an InterLib-developed proposed
standard. In addition, the client-side
interface now supports multiple transports, including HTTP and HTTP with
XML/HTML mediation. The middleware's new server-side interface formalizes the
connection between the middleware and collection servers. A generic collection server has been
developed for collections based on relational
databases with star-shaped
schemata. The middleware server itself
now supports some additional capabilities such as configurable access control,
result ranking, and result set caching.
Finally, a simple
client that runs entirely in
a standard Web browser has been developed. This was built to specifications of
the CDL (one of our InterLib partners), and is now in operational use and
available for instructional purposes (see URL below).
Work has continued on
extending and developing the ADL gazetteer as an ADEPT service. The gazetteer now contains 4,099,540 entries
that cover the world. It includes a
merging of the two U.S. federal gazetteers and other gazetteer sets (it is
accessible via the website listed below). Work during this period has included
converting more USGS GNIS gazetteer data and loading it into the ADL gazetteer.
Problems that needed solving in order to do this included assigning ADL
categories to GNIS data and identifying duplicate GNIS records for the same
feature. Assigning ADL categories requires an analysis of the content of the
placenames and mappings from GNIS categories to ADL categories. Duplicate
detection requires an analysis of names, categories, and similarity of location
coordinates for suspected duplicate sets.
Work has also begun on the
specification of a gazetteer service protocol, working with Doug Nebert of the
FGDC, and on an XML specification of the ADL Gazetteer Content Standard to
support the export and import of gazetteer data. We are working with several
other projects that would like to use the ADL gazetteer data to geospatially
reference their collection objects (that currently have only placename
referencing), to build local gazetteers, and to support the digitization of
valuable historical gazetteer data.
L. Hill and M. Goodchild
were co-PIs for an NSF-sponsored workshop on Digital Gazetteer Information
Exchange (DGIE), hosted by the Smithsonian Institution in October. The workshop
was planned by a group representing ADL, the National Geographic Society, the
Smithsonian Institution, and a set of federal agencies including the U.S.
Geological Survey, the National Imagery & Mapping Agency, and NASA.
Participants came from a wide range of communities from U.S. federal and state
governments, commercial and academic organizations, and international
organizations. The goals of the two-day workshop were to (1) develop an
understanding of the potential of
indirect spatial referencing of information resources through geographic names
and (2) identify the research and policy issues
associated with the
development of digital gazetteer information exchange. The highlights from the workshop include
acknowledgement of the immediate opportunity and requirement to coordinate the
building of shareable digital gazetteer data in the interest of digital earth
applications; the importance of the temporal aspects of gazetteer data; and the
need for a gazetteer service protocol to support distributed gazetteer
services. Complete information about the workshop, the participants, the
presentations, and the final report can be accessed at http://www.alexandria.ucsb.edu/gazetteer/dgie/DGIE_website/DGIE_homepage.htm.
Work has also begun on a
Browse Service for ADEPT and ADL. The Browse Service has load and browse
functionality for executing high-performance spatial searches on the metadata of
a collection. Hierarchical tiles of data are stored in browse databases and
queried via an HTTP interface. Requests are expressed in XML, and results are
streamed back as binary records. The browse databases additionally store counts
or aggregate information representing the density of metadata at any tile.
Aggregate information helps the user determine where to search for data when
zooming from a top-level tile down to a bottom-level tile. The Browse Service is accessed via any
number of client programs. These clients (or GeoBrowsers) may be written in
Java, Perl, or C++, and may display information in 2D or 3D.
ADEPT Geoservices
ADEPT Geoservices focused on
developing materials useful in learning environments, and in particular on dynamic
simulation, and the problems of implementing simulation models within a digital
library environment. PCRaster was acquired from the Netherlands; we obtained its
source code through a memorandum of understanding to
share research results. We have developed a high level of expertise in PCRaster
and in developing novel applications. Current work is concentrating on
developing PCRaster applications for the modules to be introduced into courses
in the Spring and Fall quarters. We have successfully implemented the Clarke
urban growth model, a form of cellular automaton, and believe that this is the
first model of social processes to be implemented in PCRaster. We are also
pursuing an ecological model of plant communities in a vernal pool complex,
using PCRaster to model the annual hydrologic cycle; and developing a model of
evapotranspiration over a large agricultural region. We have built a new WWW
interface to PCRaster that allows simulations to be controlled and viewed
through a standard browser.
At this time we are
beginning work on metadata for models, or formal methods of description of
simulation models of physical and social processes. These models will form the
foundation for ADEPT's services for storing, sharing, accessing, and using
simulation models. We are pursuing three approaches at
this time: 1) examination of
the ways in which scientists describe models to each other, and development of
methods that capture and formalize such information; 2) use of a standard
language, such as the scripting language of PCRaster; 3) formalization of
models as transformations between input and output data sets, which allows
models to be characterized as sets of metadata records. We plan to hold a small
workshop on the topic in the summer, at which we will bring together
researchers from a wide range of backgrounds to share ideas.
Development of Course
Materials
Research and development was
carried out on the discovery, form, and presentation of materials for initial
ADEPT-supported course offerings in Introductory Physical Geography at UCSB and
UCLA in the Spring Quarter of 2000. It was decided to organize the materials
around the idea that an appropriate model of the notion of a concept provides a
framework for both creating and presenting scientific and mathematical
knowledge. Adopting a model of concepts developed by T. Smith and associates in
relation to computational modeling systems and a set of fundamental concepts, a
series of lectures for the Geography courses and an associated set of
presentation materials, particularly involving dynamic displays, has been
developed.
The model of concepts
employed involves: an abstract representation of the concept (i.e., concept
names); a concrete representation of the concept that can be manipulated; a set
of transformations that can be applied to the concept; and examples of the
concept. The set of concepts developed includes those covering the hydrological
cycle and fluvial erosion and development. The discovery of appropriate
presentations of different aspects of these concepts (e.g., transformations of
a ``drainage basin'' that illustrate the effects of erosional processes) was
aided by members of the PAGE Project.
The Implementation Team played a
leading role in developing a mockup of this ADEPT module on fluvial processes, which includes
animated models built from hydrographic data. In particular, they prototyped an
Information Landscape (Iscape) for ADEPT. The approach was to collect
geographical data files and geographical modeling software and to simulate a
hydrological model of the state of Nebraska for a 12-month period. The input
data files were run with the geographical modeling software to produce multiple
output files (images and animations). Metadata was collected on the input and
output files and on the model itself. The entire Iscape is visually presented
using a Web browser. We have documented the steps undertaken to collect the
data, organize it, and present it. Software utilities were written to automate
the process of running the
model and producing visual images that are Internet-ready.
The Implementation Team also
experimented with using thesaurus software to build a concept map for fluvial
processes. This thesaurus framework was
extended to link to "library objects," including animations and
illustrations of fluvial processes.
Three sets of terminology (concepts) were derived from an organization
of the domain by a subject expert (T. Smith).
Each term has its own HTML page, with links to relevant resources as
well as to other terminology, through hierarchical, equivalent, and associated
relationships. This framework is
proposed for organizing concepts (and associated illustrations) into classroom
presentation sequences. A set of PowerPoint slides with pointers (URLs) to
concrete representations, transformations, and examples has been developed for
classroom presentations. The educational value of such an approach is being
evaluated by the ADEPT evaluation team.
Resource Discovery in ADEPT
One of the goals of the
ADEPT project is to extend current ADL infrastructure to incorporate a large
number of autonomous, heterogeneous online data sources. Most existing
work can be categorized into two approaches: the search engine approach and the data
integration approach. The search engine approach tries to crawl all the data
sources and store/index the harvested information in a centralized database. While
this approach has the advantage of requiring minimal support from the
data sources, it suffers from several drawbacks: 1) the search capability is
limited, which usually leads to poor results; 2) updates are slow; and 3) data behind a
query interface cannot be indexed for technical or administrative reasons. The data
integration approach utilizes the search capabilities of the individual data
sources. A mediator, which sits between the user and the data sources,
translates a query, broadcasts the translated queries to all the sources,
merges the results, and sends them back to the user. Well-designed data integration
systems can return high-quality results, but they do not scale well even for
homogeneous data sources (e.g., the FGDC Clearinghouse Interface) because of their
broadcasting nature: 1) the number of queries is multiplied by the number of
sources, which places a heavy burden on the mediator and the network; and 2) the
response time is limited by the response time of the slowest source.
An important research
activity at UCSB is focused on building another layer, namely, the resource
discovery layer, on top of the data integration techniques. The resource discovery
layer is responsible for finding and ranking the data sources that are relevant
to a particular query, and automatically/interactively directing the query to the
best data source(s). At the syntactic level, we are investigating various
multi-dimensional histogram- and sampling-based techniques to summarize the data
sources efficiently while at the same time providing accurate data source
relevance estimation. At a more semantic level, we continue to work on the
framework established in the Pharos project. One of our current experiments is
automatic/semi-automatic ontology translation using search engine directory
data.
In supporting search of
collections, the Implementation Team designed and implemented collections
meta-information services. Research that is ongoing in support of such services
relates to aggregation and summarization for large data collections. Such
collections (e.g. ADL, or other digital libraries) contain a huge number of
data items. Nevertheless, users expect fast answers to their queries. Also, it
cannot be assumed that each user is aware of the exact contents of a
collection. To make the query process more efficient, the user should therefore
be supported with summarized information about the contents of the collection.
Hence we have developed a prototype of a data aggregation tool that provides
users with progressively refined approximate answers to their queries. This
tool has two main applications. First, it can be used to directly answer
queries. The user gets fast responses. Those responses are approximate results
with absolute error bounds. The longer the query is executed, the more exact the
result. The query processing can be interrupted as soon as the answer is exact
enough for the user or application. Second, it can be used to provide the user
with a quick approximate overview of the contents of a collection. Thus the
user gets support in building a query that retrieves exactly what he/she wants.
In order to support rapid
access to educational materials involving images, UCSB researchers are
currently involved in developing object-based representation of aerial images
in order to facilitate: (1) faster access of data in the context of
object-based querying; (2) relating maps and aerial images; and (3) study of
spatio-temporal relationships in/among aerial images. Aerial images contain
information that is of much use for geography and other spatially related
disciplines such as geology and anthropology. The useful information in most
images, including aerial images, is concentrated in smaller, often homogeneous
portions of the image, which we term 'objects'. For example, in an aerial image,
a typical list of interesting objects would be: lakes, fields, parks, highways,
and housing developments.
We are currently addressing
the issues of extracting relevant objects out of aerial images and finding
efficient descriptors for objects. At present, we are using the following
descriptors to represent objects isolated from larger aerial images: (1)
oriented rectangular bounding box; (2) dominant colors; and (3) dominant
texture (Gabor) feature vectors.
We intend to address the
following issues in the near future: (1) fusion of map and aerial image
data/information; (2) study of spatial relationships between objects in an
image; and (3) study of temporal relationships between objects in aerial images
taken at different points of time.
During the past six months,
we have digitized a collection of aerial photographs (about 100, each about
100 MB, with the help of the Implementation Team) that contain time-varying
data. This is being used as a test data set to develop and validate our
algorithms. In addition, we are also in the process of acquiring images from
the National Institute for Space Research (INPE), Brazil. These are satellite
images of the Amazon basin and one of the objectives is to identify the
deforestation from these multi-date images.
Collaborative Work
Environments
An important attribute of
learning environments is support for distributed collaborative work
environments allowing multiple, geographically-distributed ADEPT users to
participate collaboratively in tasks such as data visualization, information
discovery, distance learning, education, and training. Central to these types
of collaborations is the ability to create, deliver, represent, and visualize
multimedia information including text, graphics (both 2D and 3D), audio, and
video. An important thrust of UCSB research has been developing essential
tools and techniques to support such distributed collaborative work
environments, and developing help desks in the context of ADEPT.
One focus of UCSB research
in this area is on the use of graphics as a means of communication. 2D and 3D graphics allow users to
communicate beyond textual means. They are a powerful way of representing
information in a form that can be more illuminating and intuitive. This is particularly true in the ADEPT
application environment, which emphasizes geographical data. We are investigating
a modeling and visualization subsystem that enables distributed users to
interactively construct, manipulate, visualize, and animate 2D and 3D graphics
models on the Web. This will be based on our existing prototype that allows
distributed 3D model construction on the Web.
Currently the model is based
on VR objects that represent a physical scene.
Manipulation of these objects and communication of these changes through
a network corresponds to a collaborative reconstruction of a physical
environment. Any educational activity
benefiting from using a common 3-D representation is a candidate for this type
of software support.
A software system has been
designed and implemented for creating this VR world and for allowing users
to join a session. Some multimedia
components are being added to enhance the ways in which users can alter the
common environment. The software seems
to be an effective way for multiple participants to work on a common task,
using a shared representation. As one
example, we used a set of molecular structures downloaded from various
biochemistry departments, using the standard "wrl" format. Using these and other files, it was evident
that manipulation of these structures could be accomplished in a distributed
environment. Using locking mechanisms
to avoid conflicting changes, multiple users could change the molecular
structures in a collaborative fashion.
The integration of better locking mechanisms has to be explored to allow
simultaneous, yet non-conflicting manipulation of the scene.
Another feature that has
been explored is real-time media streaming between participants, weighing its
benefits against its drawbacks given the bandwidth and processing
requirements necessary for real-time collaboration.
Research issues related to
integrating networking, graphics, and concurrency control still need to be
addressed. We hope to finish a
prototype in the spring quarter with graphics, audio, and video communication
capability. An appropriate demonstration will then be the use of this system
for a distributed educational setting that allows dynamic information sharing
to connect otherwise isolated students.
Another research thrust in
the area of collaborative environments involves advanced tools for a
``helpdesk'' in particular and collaborative environments in general. The goal is to investigate software
technology to support the communication, interaction, and collaboration among a
group of people and develop prototype software components to demonstrate the
usefulness and efficacy of the technology in focused applications in distance
learning and a help desk for digital libraries. The initial result of this work is the identification of a tool
currently named the "sticky note", which can be used to enhance the
information exchange between users in a collaborative environment.
Specifically, we have identified three components to develop: (1) the "glass pane",
which provides a means to facilitate user-specific viewing and organization of
data; (2) the "sticky note", which is an organization utility for
annotations; and (3) the "active note", which automates the monitoring
of the environment and the execution of certain predefined tasks. This team was later expanded to the entire
group. Currently, the group is in the
process of an initial implementation of a prototype using Wang-Koppel's current
collaborative model construction system.
Security Issues
As systems such as ADEPT
that support access to and creation of heterogeneous learning materials evolve,
supporting security at a variety of levels becomes a critical issue. The ADEPT
project has been developing a Safe Area of Computation (SAC) approach that uses
a collection of trusted devices that enforce the protection of users from the
insecurity of specific applications. Each of these devices is called a Safe
Area of Computation. The goal of these devices is to provide islands of
security that interact with an ocean of insecurity.
The main goal of the SAC
approach is to provide security for client-server applications. The approach
can be used to protect stand-alone applications as well; however, here we describe
only its use for protecting client-server
applications. In a client-server configuration, Safe Areas of Computation provide:
1) Strong authentication. A
client SAC exchanges cryptographic messages with a server SAC in order to
perform mutual authentication. At the end of the exchange the client SAC and
the server SAC will have agreed on a secret key.
2) Secure channels. Both the
client and the server SAC will use the secret key that was agreed on in the
authentication step to encrypt and decrypt messages that are exchanged, thus
providing a secure channel.
3) Access control. The SAC
approach uses three types of access control lists (ACLs) to implement various
security policies based on the security required to access data. The access may
be based on the security label associated with the data or on a more complex
set of parameters, like the current date and/or number of accesses allowed.
An important goal of SAC is
to provide strong security even for users that are not concerned with
security. The price paid for this
security is that the SAC approach requires the user to deal with a hardware
token, which is the client SAC. Although this requires an initial adaptation to
its use, the benefits of the additional security provided outweigh the initial
discomfort. The only requirement for a user is to insert the client SAC in a
reader before performing a secure
transaction, in the same way that ATM cards are currently used for interactions
with bank ATMs.
The SAC approach provides
access control in a generic way. The approach uses generic hierarchical
security labels and access control lists. The access control lists are used to
decide on a user's permission to access data. The attributes used for this
decision are not predefined and only have a meaning in a particular context
implemented for a particular security policy. This enables contextual
interpretation and implementation of more than one security policy
concurrently. A bank can, for example, use attributes to
specify what banking
transactions a user can perform, while a digital library can use attributes to
specify what type of documents a user can access, how long he/she has access to
the documents, how many times and at what cost.
The current requirement of
the Alexandria Digital Library is to be able to provide access to a particular
subset of its data to only a selected group of users. ADL does not require fine
granularity control; the data is either restricted or not restricted. In the
current implementation the restricted data is composed of compressed files.
Each of these files is a collection of data that must be transmitted in the
compressed format to the user's computer. Programs on the user's computer
uncompress the downloaded data. The SAC
approach can meet these simple requirements very easily. Each privileged user
has a smart card, which implements a client SAC. Each client SAC has the same
set of permissions. It has an access control list entry that gives the user
access to the one and only type of protected data.
The current requirements of
the Alexandria Digital Library (ADL) are very simple and easily fulfilled using
the SAC approach as described above. ADL has, however, some additional
requirements that are being considered. A major change that the ADL Internet
interface is undergoing is the change from using JiGI, a proprietary user
interface, for all interactions to using standard web browsers for interactions.
This does not affect any of the designs of the test bed. The reason for this is
that the design of the test bed always used only pure web browsers. That is,
the test bed never used any functionality of JiGI; therefore, no change is
needed.
ADL is also concerned with
the possibility of having to terminate access to particular data to non-members
of the UC community after providing it for some time. This may occur due to
legal constraints not considered initially or simply the desire to no longer offer
the service. This is easily accommodated using the delete-label operation,
which removes an access entry from the complex ACL.
ADL wants to be able to
report to non-UC users how many tokens remain in their smart cards. This
is easily accomplished by having the server request this value using a field in
the security attribute of a data item.
The ADL test bed uses the
Siemens SLE66CX160S integrated circuit and the CardOS M3.0 operating system.
The functionality of the client SAC is implemented by applications loaded on
the EEPROM memory.
The current test bed
implementations impose some particular requirements for the client and server
computers. The client platform requirements are:
a) Windows 98 or NT system;
b) A smart card reader;
c) A standard browser that supports ActiveX controls; e.g.,
Microsoft Internet Explorer or Netscape Navigator;
d) A set of programs that may be downloaded as an ActiveX
package, which
consists of the necessary
controls and the Client Communication Package.
The server platform
requirements are:
a) The server platform must be tamper resistant. This is a very
strong assumption and as such requires very specific servers;
b) Windows NT server;
c) Microsoft Internet Information Service (MIIS);
d) A particular ISAPI extension that communicates with the server
SAC;
e) The server SAC application;
f) A secure connection to the database that stores classified
data.
These requirements can be
easily met by a Windows platform, which is the platform used for the implementation.
In addition, the server could be easily migrated to a Unix platform.
There are a significant
number of performance issues that systems such as ADEPT will have to resolve in
taking large amounts of electronic materials into learning environments. An
important set of such issues relate to server clustering and distributed
caching for scalable and reliable Web
services, particularly for improving scalability and availability of large Web
systems with applications in ADL and ADEPT. In ADEPT research on adaptive load
sharing for clustered DL servers, we are investigating load balancing
strategies for clustered ADL servers.
ADEPT/ADL involves intensive database I/O and heterogeneous CPU activities. Clustering servers can improve the scalability of the ADL system in response to a
large number of simultaneous access requests.
One difficulty addressed is that clustered workstation nodes may be
non-uniform in terms of CPU and I/O speeds.
We have developed an optimization scheme that dynamically monitors the
resource availability, uses a low-cost
communication strategy for updating load information among nodes, and schedules
requests based on both I/O and computation load indices. Since the accurate
cost estimation for processing database-searching requests is difficult, we
have proposed a sampling and prediction
scheme to identify the relative efficiency of nodes for satisfying I/O and CPU
demands of these requests. We have
provided analytic results to bound the performance of our scheme on this
cluster environment and have conducted a set of experiments using the ADL
traces to verify the effectiveness of the proposed strategies.
In relation to scheduling
optimization for resource-intensive Web requests on server clusters, we have
investigated a two-level scheduling framework with a master/slave architecture
for clustering Web servers. Such an architecture has advantages in dynamic
resource recruitment and fail-over management, and it
can also improve server
performance compared to a flat architecture. The key methods we propose to make
this architecture efficient are the separation of static and dynamic content
processing, low overhead remote execution, and reservation-based scheduling
which considers both I/O and CPU utilization. Our study provides a comparison
of several scheduling approaches using experimental evaluation and analytic
modeling and the results show that proper optimization in resource management
can lead to over 65% performance improvement for a fixed number of nodes, and
can achieve more substantial improvement when considering idle resource
recruitment.
In relation to Web caching
for dynamic content delivery, as many Web sites such as ADEPT evolve to provide
sophisticated information manipulation services, dynamic content generation
becomes more popular. Server or wide-area caching can provide significant
additional benefit by reducing server load, end-to-end latency, and bandwidth
requirements. We classify locality in
dynamic web content into three kinds: identical requests, equivalent requests,
and partially equivalent requests. Equivalent requests are not identical to
previous requests but result in generation of identical dynamic content. The
documents generated for partially equivalent requests are not identical but can
be used as temporary place holders for each other while the real document is
being generated. We present a new protocol, which we refer to as Dynamic
Content Caching Protocol (DCCP), to allow individual content generating
applications to exploit query semantics and specify how their results should be
cached and/or delivered. We illustrate the usefulness of DCCP for several
applications and evaluate its effectiveness using traces from the Alexandria
Digital Library and NASA Kennedy Center as case studies.
In relation to page
partitioning and lazy invalidation for caching dynamic Web content, we note
that the caching of dynamic pages at a server is beneficial in reducing server
resource demands in a busy Web site such as ADEPT and it also helps dynamic page
caching at proxy sites. Previous work
has used fine-grain dependence graphs among individual dynamic pages and
underlying data sets to enforce result consistency. Such an approach can be cumbersome or inefficient to manage a
cache in dealing with an arbitrarily large number of dynamic pages. Our work studies partitioning dynamic pages
into classes based on URL patterns and the proposed scheme allows an application to specify cachability and data
dependence and invoke validation for a class of dynamic pages. To make this
scheme time-efficient with small space requirement, lazy invalidation is
proposed to minimize slow disk accesses when URLs of dynamic pages are stored
in memory with a digest format. We have
developed a prototype of a caching system for dynamic Web content and our
initial experimental data indicate that the proposed techniques can reduce server response times, with up to a seven-fold speedup for tested applications.
The UCLA portion of the
ADEPT project is focused on supporting and evaluating the services of ADEPT
that support learning. ADEPT offers an
important opportunity to evaluate learning activities and integrate the
assessment results into the design of the system. The classroom evaluation component of ADEPT is focusing on
assessing learning outcomes as a result of implementation of successive ADEPT
prototypes in undergraduate classrooms, first in geography and subsequently in
other subject areas where geo-referenced information may be useful (for
example, urban planning, environmental studies, archaeology, and public
health).
We are employing a variety
of research methods, including intensive analyses of individual users and
large-scale studies of entire classrooms, using multiple dependent measures
including analyses of problem-solving processes, quantitative analyses of
learning outcomes, and qualitative descriptions of user misconceptions. These converge on understanding how people
learn using the ADEPT system. ADL prototypes already developed have been
instrumented for sophisticated data collection, including transaction logging
and surveys. We are now extending these
capabilities in ADEPT. Results of the
usability and evaluation studies will provide continuous feedback to the design
of ADEPT services, functionality, and choice of collections.
The starting point for the
classroom studies has been the gathering of baseline data about the performance
and demographics of classes in the same subject area for the five preceding
years, as well as information relating to faculty teaching practices and
pedagogical objectives. This forms a
part of a needs analysis designed to identify faculty and student users, their
tasks, task context, and what tools, content, collections, and metadata might
be usable in their environment. Formative evaluation is building upon the needs
analyses and will continue throughout the project, since needs will change as
the system develops and becomes increasingly integrated into classroom
instruction. Summative evaluation will begin midway through the project, by
triangulating quantitative and qualitative methods to assess short and
long-term learning and instructional methods. We are evaluating ADEPT in
geography and other science classrooms where the use of ADEPT is integral to
the curriculum, and in humanities classrooms such as history and classics where
use of ADEPT is an important supplementary resource for student information
seeking and curricular enrichment. Faculty in multiple disciplines at both UCSB
and UCLA have agreed to participate, for classroom and distance-learning, and
we will recruit additional faculty and classrooms over the course of the study,
as user needs evolve, as capabilities of ADEPT expand, and as the success of
ADEPT attracts other participants.
To date, we have planned our
research strategy, designed the instruments for baseline data collection and
received campus human subjects approval (the latter process took 6
months). These activities have been
done in collaboration with UCSB. We
collected pilot data at UCLA in winter term and are scheduled to collect full
baseline data at UCLA and UCSB in spring term, 2000. We conducted the first round of Iscape evaluation at UCLA with
geography instructors and the Office of Instructional Development.
We note that the first year
efforts have been devoted to research design, baseline data collection, and
evaluation of the initial ADEPT Iscapes.
SDSC Activities and Findings
The San Diego Supercomputer
Center staff has worked on three projects in support of the InterLib project:
1) development of an Art Museum Image Consortium (AMICO) image library in
collaboration with the CDL; 2) evaluation of advanced XML-based databases for
use in digital libraries; and 3) support for archiving of the Alexandria
Digital Library data sets in the High Performance Storage System at SDSC.
The AMICO data collection
consists of 55,000 images of art objects including 800 MBytes of catalog
metadata and 180 GBytes of high-resolution art imagery, assembled from 26 art
museums. SDSC has created an XML
Document Type Definition for the metadata for the AMICO collection, and has
converted all of the AMICO metadata into the XML representation. The data sets have been loaded onto a 100 GB
disk farm, except for the images from the Fine Arts Museums of San
Francisco. SDSC is acquiring additional
disk space to support these images.
The eXcelon XML-based database was used to assemble the collection. This required proper indexing of the
collection and also identifying the correct way to distribute the data
collection across multiple tables and servers in order to achieve reasonable
performance. A user interface has been constructed for the collection which
provides the ability to query metadata, view thumbnails, and extract full-sized
art images.
The AMICO collection is
being used to compare the performance of XML databases with traditional
object-relational databases such as Oracle and IBM UDB. The entire collection is being instantiated
in all three databases. This is both a
test of the ability of each database to manipulate the collection, as well as a
demonstration of the ability to migrate the collection between multiple
databases using the XML DTD that was defined for the AMICO collection.
In collaboration with the
CDL, the AMICO collection will be made available for art classes within the UC
system. A major design issue is the
ability to support access to the high-resolution images provided by the
collection. Currently, the images are
stored on disk to minimize latency of access.
The long-term goal is to store the images in an archive, and provide a
disk cache for frequently requested images.
The latency of access to data stored within the archive is of concern.
SDSC, in collaboration with
UCSB, is supporting the ADL data collection within the HPSS archival storage
system. The ADL collection is over a
terabyte in size and exceeds the capacity of the disk cache at SDSC. HPSS uses IBM 3590 cartridges that are held
within Storage Technology robots to store the data. The nominal retrieval time is 2-4 minutes, depending upon whether
a tape must be unloaded from a drive before the desired tape is mounted. Depending upon the user load, request queues
for access to data on tape can become long and further delay access to the tape
drives. Access times on the order of 20
minutes can then occur.
There are multiple
approaches for improving latency of access to archived digital library
data. They include: 1) caching of
frequently requested images on a separate server from the archive, which requires a migration policy for deciding which
data sets to keep in the cache; 2) aggregation of images in containers to
improve the probability that other desired images will be retrieved at the same
time; and 3) prefetching or staging of images into a cache to prepare for
subsequent use.
The technology to support
all three approaches is being developed at SDSC as part of the SDSC Storage
Resource Broker (SRB) data handling system.
Although the software development is being funded by other projects, the
resulting system will be applied to the InterLib project. The SRB provides containers for aggregation
of data sets. When users reference a desired data set, the data handling system
copies the associated container from the archive onto a disk cache, and then
returns the desired data set. Policy
mechanisms are being designed to control which containers remain on the disk
cache under heavy user load. Staging
commands are being added to the SRB to support prefetch of containers from the
archive.
GT has been investigating
various alternatives for ADEPT visualization and working with UCSB to integrate California Landsat and
elevation data into the GT Virtual Geographic Information System (GT-VGIS). The UNIX version was installed on an SGI
Onyx (on loan) at UCSB and demonstrations were performed at an ADEPT
meeting. Descriptions of several
Application Programmer Interfaces were also provided. GT has been restructuring the GT-VGIS rendering system to work
on Windows NT so that PCs may be used as a delivery platform. An early version of the Windows code was
delivered to ADEPT for demonstration in December. Integration of the GIS query/select functionality into the
Windows NT GT-VGIS is proceeding, with a socket interface to ArcView for
attribute handling. The NT version is
also being upgraded to handle 3D objects such as buildings.
In a parallel effort, GT is
implementing the concept of a 3D server whereby a very lightweight client (a
Web browser) is able to interact with 3D images that are generated on a UNIX
or NT server with high-end graphics capability and transmitted as
JPEG-compressed images to the client. Modes of interaction are still being defined,
but a user should be able to point to an area on the globe and have an AVI
movie generated to fly to that point.
At this point the viewer is immersed in a 3D terrain and simple
navigation functions would allow limited movement capability. A user might click on a mountain and request
an AVI movie flying around that point.
The user should also be able to select objects and query a GIS database
interactively.
Other alternatives are also
being considered including the SRI visualization software system, other
commercial visualization systems, and VRML applications.
UGA has been playing a role
in the construction of Information Landscapes (Iscapes). To accomplish
this, we started with an Iscape working definition based upon the Iscape
concept model. In constructing Iscapes, we deal with a collection of
semantically related information assets that may not only be heterogeneous in
syntax/format, structure, and media, but also may be obtained from different
locations (web sites, repositories, databases, data collections) using a
variety of query languages and information retrieval techniques and access
methods. In order to achieve our goal, we have identified certain focus areas
that we have been concentrating on: (a) designing and implementing a metabase
for our target geographical domain; (b) implementing a diverse range of
extractors for different web sites providing information pertaining to the
target domain, allowing for extracting relevant metadata to populate the
metabase; (c) designing and implementing ontologies relevant for the purpose;
(d) designing and creating a few representative Iscapes for performing queries
on the metabase; and (e) designing and implementing an agent architecture to
process the Iscape request. A
preliminary Iscape specification was constructed on top of Web-centric XML (Extensible
Markup Language) and RDF (Resource Description Framework) based
infrastructures. A few Iscape scenarios
were designed along with information requests that were implemented (see Web
site referenced below) as a first realization of the Iscape concept.
A demonstration version is
also available for trial with classroom applications focusing on finding
relevant information on the Digital Earth
(http://lsdis.cs.uga.edu/ADEPT/IscapeDemos/Version1/version1.html). Extensions
of the architecture are now being considered as a substrate for the next Iscape
demonstration version (version 2, to be released soon), for classroom
applications built on top of contextual information and operation simulations,
taking advantage of the correlation of information across the Digital Earth.
An example of an information
request to be processed as an Iscape that will facilitate learning about the
Digital Earth involves a city council making decisions over the planning of a
new landfill. Landfilling is a common practice worldwide and by far the most
common waste disposal method in the
United States, probably accounting for more than 90 percent of the
nation's municipal refuse. This example scenario comes in support of one of the
suggestions for Digital Earth scenarios sampled by the "First Inter-Agency
Digital Earth Working Group", an effort on behalf of NASA's inter-agency
Digital Earth Program. This is a simplified yet coherent hands-on learning
interaction exercise with the Digital Earth. It should be refined through
actual experience. The starting point would be to find the best location for
the landfill. The figure shows a high-level Iscape and corresponding
intermediate refinements that would occur during processing. This kind of prototype is what is being
applied to the ADEPT Project.
The project has provided
research training to a large number of graduate students including 15 at UCSB,
4 at UCLA, 2 at UCSD (SDSC), 2 at UGA, and 2 at GT. Most of these students are PhD candidates. The graduate students
received training in research methods in a variety of fields (e.g., geography,
computer and information science, psychology, and education) and some have
gained contextual background in geography and geography education. The co-PIs at UCLA are attending courses in
geography to gain more domain knowledge.
Support for educational
activities lies at the heart of the project, and activities in this area have
been described above. In particular, the current products of our research are
being taken into classrooms at UCSB and UCLA during the Spring Quarter.
Frew, J., M. Freeston, N.
Freitas, L. Hill, G. Janée, K. Lovette, R. Nideffer, T. Smith, Q. Zheng (2000)
The Alexandria Digital Library architecture. Int. J. Digit. Libr. 3:1, 1-10.
Hill, L., Dolin, R., Rae,
M.A., Carver, L., Frew, J., Larsgaard, M. Alexandria Digital Library: User
evaluation studies and system design. Journal of the American Society for
Information Science (Special issue on Digital Libraries), 2000.
Leazer, G.L.,
Gilliland-Swetland, A.J., Borgman, C.L.
(in review). Classroom Evaluation of the Alexandria Digital Earth
Prototype (ADEPT). American Society for Information Science, 2000 Annual
Conference, Chicago,
November, 2000.
Moore, R., Baru, C.,
Rajasekar, R., Ludaescher, B., Marciano, R., Wan, M., Schroeder, W., Gupta, A.,
"Collection-Based Persistent Digital Archives", D-Lib Magazine, Part 1, Volume 6,
Number 3.
Ben Smith, Anurag Acharya,
Tao Yang, Huican Zhu, Caching Equivalent and Partial Results for Dynamic Web
Content. In Proceedings of the 1999 USENIX Symposium on Internet Technologies and
Systems (USITS'99), pp. 209-220.
H. Zhu, T. Yang, Q. Zheng,
D. Watson, O. Ibarra and T. Smith, Adaptive Load Sharing for Clustered Digital
Library Servers, accepted for publication in the International Journal of Digital
Libraries, 2000.
Huican Zhu, B. Smith, and T.
Yang, Scheduling Optimization for Resource-Intensive Web Requests on Server
Clusters. In Proceedings of the Eleventh Annual ACM Symposium on Parallel
Algorithms and
Architectures (SPAA'99),
pp. 13-22.
Submitted Publications
Leazer, G.L.,
Gilliland-Swetland, A.J., Borgman, C.L.
(in press). Evaluating the use of a geographic digital library in
undergraduate classrooms: the Alexandria Digital Earth Prototype (ADEPT). DL '00: Association for Computing Machinery
Conference on Digital Libraries, San Antonio, Texas, June, 2000.
Unpublished Reports and
Papers
Zhu, H., and T. Yang, Fast
Invalidation for Caching Dynamic Web Content, 2000 (under revision).
Presentations
A number of presentations
about the project have been given both nationally and internationally.
The nature and activities of
the project may be found on the following web sites:
http://www.alexandria.ucsb.edu
Alexandria Digital Library Project home page
http://webclient.alexandria.ucsb.edu
HTML browser client for CDL
http://www.alexandria.ucsb.edu/gazetteer/gazserver
gazetteer server
http://is.gseis.ucla.edu/adept (currently password
protected)
http://lsdis.cs.uga.edu/ADEPT/IscapeDemos/Version0/version0.html
The specific products of the
project, apart from its research outputs described above, included the
augmented ADL system and services, the gazetteer, and courseware.
The project is
multidisciplinary (e.g., geography, computer science, psychology, education)
and is making contributions in each of these fields, as well as to the specific
field of digital libraries. In particular, we are contributing to: (1)
investigation and development of a large array of computational methods for the
construction of personalized digital libraries providing services that can be
used to construct and employ such systems, particularly in educational
environments; (2) the representation of scientific concepts as a basis for
electronic representations of scientific knowledge for educational purposes;
(3) the development of methods for the formative and summative evaluation of
digital libraries in educational applications.
By the nature of the ADEPT
project, we are making major contributions to the use of digital library
services and collections in undergraduate and graduate level education, and
simultaneously to the physical, institutional,
and information resources for science and technology. In particular we
are contributing to university infrastructure for IT-based delivery and use of
digital libraries, which will have broad applications in telelearning and
distance education. Since in principle
ADEPT can be employed in any learning setting, and has obvious applications outside
of learning (such as emergency response), it is also contributing to the public
welfare beyond science and engineering. In particular, it will provide services
that allow arbitrary groups of individuals to construct personalized digital
libraries with spatial search capabilities in any area of knowledge.