QUARTERLY REPORT: ALEXANDRIA DIGITAL LIBRARY PROJECT

 

NSF Program: CISE (IRIS), NSF 03-141

Award Number: IRI94-11330

PI Name: Terence R. Smith

Period Covered By This Report: 05/01/97-07/31/97

PI Institution: UCSB & Date: August 1st, 1997

PI Address: Department of Computer Science, UCSB, Santa Barbara, CA 93106

 

Summary of Overall Progress: The Alexandria project team structure has been modified to better reflect the current working environment. Research-centered teams include the original Information Systems, Parallel Processing and Image Processing teams, and a new Geospatial Information team has been added. Development-centered teams are Collection Development, Systems Engineering, User Interface Design and Implementation, and Evaluation. In addition, there is an overlying Digital Library Requirements team that defines the overall project direction. This new organization is more fully described in the Management section and will be referred to throughout this report.

 

Information Systems team research on the Pharos system included construction of a working system with a Web interface that automatically classifies Internet newsgroups within the Library of Congress Classification system. Research also continued in the area of database performance tuning: outer joins were introduced into the materialized view approach, and experiments with the approach were conducted on the ADL gazetteer database using the Sybase DBMS. NSF funding support for a new research project called ALASKA (A Large Scale Knowledge Repository) has been received. Within the next three months the ALASKA project intends to benchmark its spatial point and extent indexing implementations against the commercial systems currently supporting ADL. The Image Processing team continued its work on wavelet analysis and progressive delivery, and developed a theoretical framework based on the average-interpolation subdivision point of view. The Parallel Processing team conducted research in three areas:

1) SWEB++ system for supporting dynamic client-server partitioning and scheduling;

2) incorporation of SWEB with the ADL server; and

3) evaluation of ADL server performance.

 


 

Transition of the ADL testbed to an operational system is in progress. A new Java user interface was implemented and underwent extensive internal evaluation.

 

 

2.0 Management Issues

 

2.1 Organizational Structure - The Alexandria team structure has been modified to better reflect the current Alexandria project working environment.

 

2.1.1 Information Systems Research Team

Membership: Singh (leader), Agrawal, El Abbadi, Kothuri, Prakhabar, Smith, Su, Wu

 

Mission: The Team's research is focused on a number of issues pertaining to the architecture of digital libraries, including resource discovery, extensible data store, and the implications of building a geographically referenced database. In relation to geographically referenced databases, issues under investigation include multi-dimensional indexing, data placement, tertiary storage, content-based retrieval, and performance tuning.

 

2.1.2 Performance and Parallel Processing Research Team

Membership: Yang (leader), Andresen, Egecioglu, Ibarra, Poulakidas, Watson, Zheng

 

Mission: The goal of the Performance and Parallel Processing Team is to identify and investigate aspects of ADL that will benefit from high-performance computing on multi-computers. Current investigation is focused on various performance issues arising from the ADL environment in terms of both space and time complexity, including WWW applications. The Team is also developing algorithms and software techniques for efficient image processing for high-performance digital libraries.

 

2.1.3 Image Processing Research Team

Membership: B. Manjunath (leader), Y. Ma, S. Mitra, N. Stroebel, Y. Wang

 

Mission: The goal of the Image Processing Team is to investigate issues concerning the representation, storage, and access of image-related data. The Team also aids the implementation team in adapting its research findings and recommendations for the testbed system. Particular foci of activity for the Team are wavelet decompositions for storage, manipulation, and transmission of images, and access of images by content.

 

2.1.4 Geospatial Information Research Team

Membership: Goodchild (leader), Carver, Geffner, Hill, Kemp, Kothuri, Larsgaard, Smith

Mission: Responsible for investigating a variety of issues relating to the integration of spatially referenced information objects into ADL.

 

2.1.5 Collection Development Team

Membership: Carver (leader), Goodchild, Hajic, Larsgaard, Simpson, Smith

Mission: Oversee ADL collection development plans, including criteria for new additions to the collection and the prioritization and schedule for loading new data into ADL. Track issues related to data formats, data ingest and analysis tools, and catalog maintenance.

 

2.1.6 Systems Engineering Team

Membership: Frew (leader), Andresen, Colucci, Freeston, Hajic, Lovette, Nideffer, Simpson, Zheng

Mission: Responsible for the design and implementation of ADL on-line query and retrieval services, and for the overall integration of all ADL services into the operational and research testbed system.

 

2.1.7 User Interface Design & Implementation Team

Membership: Nideffer (leader), Lovette, Freitas

Mission: Design and implement ADL user interface.

 

2.1.8 User Interface Evaluation Team

Membership

UCSB Subteam: Hill (leader), Carver, Dolin, Frew, Larsgaard, Rae, Simpson

Colorado Subteam: Buttenfield (leader), Larsen, Reitsma, Kersky, Tsou, Rokoske, Rock, Smith, Kole

Mission: Evaluate the effectiveness of the ADL system from the perspective of potential users of the system. Knowledge gained from these evaluation activities is used to inform the design and implementation of ADL, and to document in detail the effectiveness of ADL and areas calling for improvement, both in the interface design and in the underlying system functionality and content.

 

2.1.9 Digital Library Requirements Team

Membership: Smith (leader), Carver, Colucci, Frew, Hill

Mission: Define the set of library services that will characterize current and future ADL systems.

 

2.2 Personnel

 

2.2.1 UCSB - Joy Colucci has recently joined the Alexandria staff as Project Manager. In response to a recent Alexandria Advisory Board recommendation that we hire a project manager, we immediately set in motion a search process. We offered the position to Joy, who has had significant experience in managing large teams involved in digital library-type projects in the context of the EOSDIS program of Hughes. Joy is working for us in this capacity as a consultant from Hughes for at least two days per week. She will be concentrating on helping Alexandria field an operational version of the testbed this coming fall.

 

The ADL Systems Engineering team has hired a CS graduate student Chen Ding as a research assistant for the summer months. We have also hired a CS graduate student Rong Hua as a research assistant to assist in Gazetteer development.

 

In the Information Systems team, two CS graduate students have been hired as research assistants for the ALASKA project: Steve Geffner and Mike Hoerhammer. The team also re-hired a graduate student in Statistics (Peter Karcher) to examine the potential of Bayesian networks as a technology to support intelligent help desks for ADL.

 

On the Parallel Processing team, Daniel Andresen, Huican Zhu, David Watson, and Dianyuan Xiong worked on the project this summer. Athanassios S. Poulakidas left the project after graduating in July 1997. Both Andresen and Poulakidas obtained their Ph.D. degrees.

 

On the Image Processing team, Wei-Ying Ma completed the thesis defense in June and moved to the Bay Area to work for HP research labs. Yining Deng replaced Ma as the student on the ADL project.

 

Randy Kemp (ADL Collection Development) has left the project for graduate school in Colorado. Randy was ADL’s primary data/metadata loader.

 

2.2.2 University of Colorado - Sara Fabrikant, a doctoral student in Geography, joined the project for a summer month to develop strategies for subject testing of spatialized views. Michael Rock left the CU graduate program in August to intern at Joshua Tree National Park. James Kole (undergraduate hourly) has left the project; he was working on metadata entry and digitizing. We will hire another undergraduate hourly later in the Fall, once the UCSB metadata ingest staff slot is filled.

 

3.0 Research Progress and Plans

 

3.1 Information Systems Team - Pharos: The team constructed a working system with a Web interface that automatically classifies Internet newsgroups within the Library of Congress Classification system. A full description of this system will appear in D-Lib.

 

Database performance tuning: Outer joins were introduced into our materialized view approach, and experiments with the approach were conducted on the ADL gazetteer database using the Sybase DBMS. With a natural join, a view cannot be used to answer its sub-queries because the natural join does not contain the dangling tuples (tuples with no matching partner in the other relation). An outer join contains both the natural join part and the dangling tuples, so a materialized view of the outer join can be used to answer more queries. However, duplicate removal with an outer join is quite costly, so the team introduced a tagged outer join to avoid duplicate removal. This research has been reported in a paper accepted by WITS'97. The experiments show a significant performance improvement: the average query execution time was reduced from 112.8 seconds to 23.8 seconds, a speedup of roughly 4.7 times.
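The role of dangling tuples can be illustrated with a small sketch. This is not the team's implementation: the sample rows, the function names, and in particular the tagging rule (mark the first view row in which each base tuple appears) are assumptions chosen to show how a tag could let a base-table query be answered from the outer-join view without a separate duplicate-removal pass.

```python
def tagged_outer_join(left, right, key):
    """Materialize a full outer join as rows (l, r, first_left, first_right).

    A side is None for a dangling tuple. The two boolean tags mark the first
    row in which each base tuple appears (an assumed tagging rule).
    """
    view = []
    right_seen = [False] * len(right)
    for l in left:
        first_left = True
        for j, r in enumerate(right):
            if r[key] == l[key]:
                view.append((l, r, first_left, not right_seen[j]))
                first_left = False
                right_seen[j] = True
        if first_left:                     # no partner: dangling left tuple
            view.append((l, None, True, False))
    for j, r in enumerate(right):
        if not right_seen[j]:              # dangling right tuple
            view.append((None, r, False, True))
    return view


def scan_left(view):
    """Recover the left base table from the view, with no DISTINCT step."""
    return [l for (l, _r, first_left, _fr) in view if first_left]


def scan_right(view):
    """Recover the right base table from the view, with no DISTINCT step."""
    return [r for (_l, r, _fl, first_right) in view if first_right]


def natural_join(view):
    """The inner-join part of the view: rows where both sides are present."""
    return [(l, r) for (l, r, _fl, _fr) in view if l is not None and r is not None]


# Hypothetical gazetteer-like rows, for illustration only.
places = [{"id": 1, "name": "Santa Barbara"},
          {"id": 2, "name": "Goleta"},
          {"id": 3, "name": "Ventura"}]
alt_names = [{"id": 1, "alt": "SB"},
             {"id": 1, "alt": "Santa Barbara, CA"},
             {"id": 4, "alt": "Orphan"}]

view = tagged_outer_join(places, alt_names, "id")
```

A view built from the natural join alone would lose Goleta, Ventura, and the orphan alternate name, so it could not answer a query on either base table; the tagged outer-join view answers both with a single filtered scan.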

 

We investigated cyclic approaches for allocating two-dimensional data and proposed new schemes that result in significant improvements over existing schemes. We also investigated the performance of the schemes with actual disks, as well as bounds on the performance of the new cyclic approaches.
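As a rough illustration of what a cyclic allocation looks like, the sketch below assigns tile (i, j) of a two-dimensional grid to disk (i + skip * j) mod M. The report does not give the schemes themselves, so this formula, the `skip` parameter, and the cost measure (tiles read from the busiest disk for a range query) are assumptions; a skip of 1 corresponds to the familiar disk-modulo placement.

```python
from collections import Counter


def cyclic_disk(i, j, num_disks, skip):
    """Disk for tile (i, j) under an assumed cyclic rule: (i + skip * j) mod M."""
    return (i + skip * j) % num_disks


def response_time(rows, cols, origin, num_disks, skip):
    """Tiles fetched from the busiest disk for a rows x cols range query."""
    i0, j0 = origin
    load = Counter(
        cyclic_disk(i, j, num_disks, skip)
        for i in range(i0, i0 + rows)
        for j in range(j0, j0 + cols)
    )
    return max(load.values())


M = 5  # hypothetical number of disks
modulo_cost = response_time(2, 2, (0, 0), M, skip=1)  # disk modulo
cyclic_cost = response_time(2, 2, (0, 0), M, skip=2)  # a different skip value
```

For this 2 x 2 query on five disks, a skip of 1 places two tiles on the same disk, while a skip of 2 spreads the four tiles over four disks, which is the best any scheme can do here.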

 

3.1.1 Research Plans of Team for Next Six Months - Pharos: The three-month plan is to formally evaluate the Pharos system. The six-month plan is to write up the entire Pharos component.

 

View selection algorithm: An important issue in using materialized views is to have an efficient "optimal" view selection algorithm. Since it is impossible to materialize all possible views, an efficient algorithm is needed to select a set of views to materialize so that the resulting "benefit" is maximal. Such an algorithm has many applications in data cube and data warehousing systems. The problem is believed to be very hard, and some approximate algorithms have been proposed. The team is studying the properties of the view selection problem and trying to find a better approximate view selection algorithm.
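For orientation, the sketch below shows one of the existing approximate approaches, the greedy benefit heuristic familiar from the data-cube literature. It is not the team's algorithm. The cost model (a query costs as many rows as the smallest selected view that can answer it) and the view sizes are illustrative assumptions.

```python
def greedy_view_selection(sizes, top, k):
    """Pick k views (besides `top`) greedily by marginal benefit.

    sizes maps each view (a frozenset of grouping attributes) to its row count.
    A view v can answer a query on view w when w is a subset of v.
    """
    cost = {w: sizes[top] for w in sizes}   # cheapest selected ancestor so far
    selected = []
    for _ in range(k):
        def benefit(v):
            # Total rows saved over every query that v could answer.
            return sum(max(0, cost[w] - sizes[v]) for w in sizes if w <= v)

        candidates = [v for v in sizes if v != top and v not in selected]
        best = max(candidates, key=benefit)
        selected.append(best)
        for w in sizes:
            if w <= best:
                cost[w] = min(cost[w], sizes[best])
    return selected


# Illustrative sizes for a three-attribute cube (p, s, c); not measured data.
sizes = {
    frozenset("psc"): 6_000_000,
    frozenset("pc"): 6_000_000,
    frozenset("ps"): 800_000,
    frozenset("sc"): 6_000_000,
    frozenset("p"): 200_000,
    frozenset("s"): 10_000,
    frozenset("c"): 100_000,
    frozenset(): 1,
}

picked = greedy_view_selection(sizes, top=frozenset("psc"), k=2)
```

With these numbers the heuristic first materializes the (p, s) view, which is small and answers four of the eight queries, and then the (c) view, which is the cheapest way to serve the remaining expensive query.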

 

Semantic query optimization: As shown in our materialized view work for ADL systems, one has to use application-specific semantic information, including statistics on the data in the database and the data distribution, to further optimize queries. The semantic information will be obtained both at system design time and during operation. The team plans to study the issues of query optimization based on semantic information. The main thrust is to investigate how to maintain useful statistics on the data, and how to use them to select an optimal query evaluation plan. Building on the earlier work on materialized views, we plan to study the following issues:

1. How to eliminate more joins

2. How to let the system learn such semantic information and use it in query optimization.

 

Distributed query optimization: In global information systems, a single program performing global query optimization using a cost-based optimizer will not work well. This traditional distributed relational database approach assumes static data allocation, a single administrative structure, and uniformity in the architecture, none of which remain valid in a non-uniform, multi-administrator global information environment. How to effectively perform query optimization in such an environment is the key to the success of global information systems. Research on this issue is being done at several institutions; examples include fusion query optimization in Stanford's TSIMMIS, the Information Manifold at AT&T Bell Labs, and Mariposa at UC Berkeley. It is still too early to say which approach is more appropriate. We are also working on this issue using techniques from constraint databases, data warehousing, and information integration.

 

We are investigating cyclic approaches for allocating two-dimensional data and extending cyclic approaches to multiple dimensions. We are investigating replication for achieving load balance in a dynamic setting with multiple servers and skewed load, as well as the benefits of prefetching for applications that have connected path traversals on two-dimensional data.

 

ALASKA: Funding has just arrived for a new research project called ALASKA (A Large Scale Knowledge Repository). Within the next three months the ALASKA project intends to benchmark its spatial point and extent indexing implementations against the commercial systems currently supporting ADL. Three specific applications of the multi-dimensional indexing technology will then be developed over the subsequent three months: a) support for the Digital Chart of the World as the base map for the ADL map browser; b) the use of index levels in the index tree structure hierarchy to generate a set of raster images of different resolution (and to compare this with wavelet technology); and c) projections of multi-dimensional indexes onto two-dimensional visualizations, as a data mining tool for studying the distribution of various attributes of library collections. In addition, a comprehensive and active web demonstrator is being prepared to explain the theory and practice behind the ALASKA indexing technology.

 

3.2 The Image Processing Team - The team continued its work on wavelet analysis and progressive delivery. A theoretical framework based on the average-interpolation subdivision point of view was developed. Together with multiresolution analysis, it establishes a consistent framework for transformations well suited to (lossless) progressive-resolution image transmission and visualization. The deep connection between interpolation subdivision, scaling functions, and wavelets indicates that subdivision methods will become increasingly important in the near future.
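To make the idea of lossless progressive-resolution transmission concrete, the sketch below uses the simple integer S-transform (pairwise floor-average plus difference) on one row of pixel values. This is a stand-in chosen for brevity, not the team's average-interpolation framework; the function names and sample row are assumptions.

```python
def s_transform(signal):
    """One level of the integer S-transform on an even-length list of ints."""
    averages, details = [], []
    for a, b in zip(signal[0::2], signal[1::2]):
        averages.append((a + b) // 2)   # floor of the pair average
        details.append(a - b)           # difference restores what the floor lost
    return averages, details


def inverse_s_transform(averages, details):
    """Exactly invert s_transform using integer arithmetic only."""
    signal = []
    for s, d in zip(averages, details):
        a = s + (d + 1) // 2
        signal.extend([a, a - d])
    return signal


def decompose(row, levels):
    """Return the coarsest approximation and the details, finest level first."""
    approx, details_by_level = list(row), []
    for _ in range(levels):
        approx, d = s_transform(approx)
        details_by_level.append(d)
    return approx, details_by_level


def progressive_reconstruct(approx, details_by_level):
    """Yield successively finer versions, ending with the exact original."""
    current = approx
    yield current
    for d in reversed(details_by_level):
        current = inverse_s_transform(current, d)
        yield current


row = [12, 10, 7, 7, 3, 8, 255, 0]          # hypothetical pixel values
coarse, details = decompose(row, levels=2)
stages = list(progressive_reconstruct(coarse, details))
```

A client can display `stages[0]` as soon as the coarse values arrive and refine the picture as each batch of details follows; after the last batch the result is bit-for-bit identical to the original row.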

 

The team has also continued its work on image segmentation and is extending it to video sequences. Initial results are quite promising. A hardware convolution board is being designed by a Master's thesis student to facilitate texture feature extraction.

 

3.2.1 Research Plans of Team for Next Six Months: The team will work on progressive multiresolution image delivery, including subregion retrieval; conclude theoretical analysis and experimental work (lossless compression of color and satellite imagery); and extend the visual thesaurus to integrate multispectral, texture, and shape attributes as well as spatial relationships.

 

3.3 The Parallel Processing Team - The team conducted research in three areas: 1) SWEB++ system for supporting dynamic client-server partitioning and scheduling; 2) incorporation of SWEB with the ADL server; and 3) evaluation of ADL server performance.

 

The team completed the experiments for verifying the results of dynamic client-server partitioning and scheduling. A software prototype (SWEB++), which assists WWW application programmers in using our partitioning and scheduling techniques, has also been developed. In such an environment, a WWW programmer first creates the task binary executables for the client and server. The programmer then describes each task and the task chain using the task definition language. After that, the SWEB++ composer takes the task specification as input and generates the C++ code that links together the necessary modules used for the client and server site libraries. It extracts the task attribute information from the specification and produces C++ stubs for the scheduler's resource requirement estimation. The composer also creates a CGI shell script program for the run-time chain executor to invoke when a split point is provided. This system can substantially reduce the programming effort involved in incorporating the dynamic optimization features introduced in our earlier work.
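The notion of a split point in a task chain can be sketched as follows. This is only an illustration under an assumed cost model (server time, plus transfer time for whatever crosses the network, plus client time); the `Task` fields, the task names, and all numbers are hypothetical and do not come from SWEB++.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    server_secs: float   # estimated run time if executed on the server
    client_secs: float   # estimated run time if executed on the client
    output_bytes: int    # size of the data this task hands to the next one


def estimate(chain, split, source_bytes, bandwidth):
    """Estimated response time when tasks [0, split) run on the server."""
    server = sum(t.server_secs for t in chain[:split])
    client = sum(t.client_secs for t in chain[split:])
    # With split == 0 the raw source data itself must cross the network.
    shipped = chain[split - 1].output_bytes if split > 0 else source_bytes
    return server + shipped / bandwidth + client


def best_split(chain, source_bytes, bandwidth):
    """Try every split point and keep the one with the lowest estimate."""
    return min(range(len(chain) + 1),
               key=lambda s: estimate(chain, s, source_bytes, bandwidth))


# Hypothetical image-delivery chain; all figures are made up.
chain = [
    Task("extract_subregion", 0.2, 1.0, 400_000),
    Task("wavelet_reconstruct", 1.5, 0.5, 1_600_000),
    Task("render", 0.4, 0.3, 1_600_000),
]
split = best_split(chain, source_bytes=8_000_000, bandwidth=1_000_000)
```

Here the estimate favors running only the first task on the server: the extracted subregion is small to ship, and the client is assumed to be faster than the loaded server for the remaining steps. A scheduler would re-evaluate the choice as load and bandwidth estimates change.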

 

The team has also installed the ADL server on our local Ultra-SPARC SCI cluster. The goal is to evaluate the performance of ADL in the presence of a large number of concurrent accesses and to improve the performance using SWEB. The installation process took much longer than expected because of some bugs in the ADL Web server. The team is currently gathering test data and simulating real-world concurrent requests. Evaluation of ADL performance in responding to the selected workload was initiated this fall.

 

3.3.1 Research Plans of Team for Next Six Months - The team will evaluate the performance of an ADL server in responding to the selected workload, incorporate and extend SWEB techniques, and study web caching in such an environment.

 

3.5 User Interface Evaluation Team - The UCSB team developed a user registration questionnaire for the next ADL interface, based on experience with the beta tester registration and the ADL user evaluation studies. It is designed to collect the information needed to correlate user characteristics with session logs and with user feedback gathered through on-line comments, surveys, and questionnaires.

 

During the week of August 18th, the team planned and carried out an internal evaluation of the interim ADL interface, which is scheduled for October release to the University of California campuses. Twenty-three people participated in this week-long evaluation on the UCSB campus. They were given three questions and asked to use the interface to find the answers. Most of the sessions were videotaped; notes were kept of the paths participants took and the comments they made. A report of the results was written at the end of the week and given to the interface development team. The report includes (1) the most consistent and important problems observed; (2) other comments and observations; (3) an evaluation of the requirements for this phase of development and whether they were met; and (4) more detailed notes from the evaluation process. The interface development team has reviewed the comments and responded to each point. In addition, we developed a research plan for future work in the areas of user needs analysis, use of user registration/profiles/usage patterns in adaptive information systems, user help services, and user evaluation of a digital library system.

 

The Colorado team has continued its work in the areas of spatial interaction modeling of the Web transaction logs and in three metadata pilot projects. Additional work on spatialized views began to make real progress, and forms a new thrust in the Colorado efforts. Spatial interaction modeling of user behavior patterns on the Web interface was a focus of research in mid-summer. The team employs spatial interaction models to estimate user patterns of moving between different ADL Web pages. The intent is to monitor use patterns for a user set of (mostly) unknown individuals, and to determine whether use patterns vary with adjustments to the interface design. Work from the spring demonstrated that behavior patterns for the aggregated ADL user group are non-random and stable over time. The team additionally determined from the models that use of the tutorials and help files does not interact much with use of the catalog, gazetteer, and map browser. This summer, the team disaggregated the three target user groups, as well as the system designers and the Alexandria Design Review (ADR) Beta Testers, developing spatial interaction models for each group. It took a good part of the summer to decide how to allocate 174,000 individual transactions to one or another group, and to re-process and re-summarize the transaction logs. Early in September, we began analyzing results. Preliminary indications are that the librarians as a group have somewhat different use patterns than any of the other disaggregated groups. This work will continue in coming months, as will longitudinal monitoring of the ADR group.
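A spatial interaction model takes as its input a matrix of flows between origins and destinations, here moves between Web pages. The sketch below shows how such per-group flow counts could be tallied from a transaction log. The log layout (group, session, page) and the page names are assumptions; the report does not describe the actual log format or the model fitting.

```python
from collections import Counter, defaultdict


def flow_matrix(log):
    """Count page-to-page moves per user group.

    log: iterable of (group, session_id, page) tuples in time order.
    Returns {group: Counter({(from_page, to_page): count})}.
    Session ids are assumed to be unique across groups.
    """
    flows = defaultdict(Counter)
    last_page = {}
    for group, session, page in log:
        prev = last_page.get(session)
        if prev is not None:
            flows[group][(prev, page)] += 1
        last_page[session] = page
    return flows


# Hypothetical log entries, for illustration only.
log = [
    ("librarian", "s1", "catalog"),
    ("librarian", "s1", "gazetteer"),
    ("designer", "s2", "map_browser"),
    ("librarian", "s1", "catalog"),
    ("designer", "s2", "catalog"),
    ("librarian", "s3", "catalog"),
    ("librarian", "s3", "gazetteer"),
]
flows = flow_matrix(log)
```

Comparing the resulting matrices across groups, or across time for one group, is the kind of disaggregated and longitudinal analysis the paragraph above describes.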

 

A new area of work applies spatial metaphors to visually organize large collections of information. Termed 'spatialization', the method utilizes metadata keywords to construct visual representations of queried items, where items whose metadata are more similar lie closer together in the visualization than items whose metadata are dissimilar. This work will continue in coming months.
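The core of spatialization is a distance between items derived from their metadata keywords. The sketch below uses the Jaccard distance between keyword sets; the report does not say which similarity measure is used, so this choice, the item names, and the keywords are all assumptions. Turning the distances into screen positions (for example with a multidimensional scaling step) is omitted.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; 0.0 means identical keyword sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_table(items):
    """Pairwise keyword distances for a dict of item name -> keyword set."""
    return {
        (x, y): jaccard_distance(items[x], items[y])
        for x, y in combinations(sorted(items), 2)
    }


# Hypothetical query results with metadata keywords.
items = {
    "dem_la": {"elevation", "california", "raster", "urban"},
    "dem_sb": {"elevation", "california", "raster"},
    "photo_1929": {"aerial", "photograph", "historic"},
}
table = distance_table(items)
```

The two elevation models share most of their keywords and end up close together, while the historic photograph shares none and lies at the maximum distance from both.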

 

3.5.1 Research Plans of Team for Next Six Months - The UCSB team plans to conduct another internal evaluation of the new ADL interface in September, prior to the October release. The team will also plan and implement a user evaluation plan for the new ADL interface, which will be a follow-on to the evaluation studies we did on the beta web interface. Data and analysis will be fed to the ADL development teams and will result in an overall report near the end of the project. The Evaluation Team (UCSB) paper on our user evaluation studies will be presented at the annual conference of the American Society for Information Science in November in Washington, D.C.

 

To follow on the user pattern modeling, the Colorado team will continue the disaggregation analysis, and focus a larger proportion of our time on publishing the results, which are now coming in quickly. In preparation for monitoring transaction logs for the new Java-based interface, Tsou and Buttenfield are developing strategies for transaction logging agents that transmit patterns of use or delays from the Java client. To follow on the georeferencing project with the Denver Public Library, the team should complete GPS positioning of individual photographs for the pilot group before the snows return. Smith and will work on this effort. We expect that GIPSY will be completely operational soon, allowing the probabilistic coordinates to be generated.

 

Metadata reconstruction will continue for the Niwot Ridge air photo series. The Colorado and UCSB teams are working together to streamline the instructions for Access metadata ingest, and to dovetail them more directly with procedures that cataloging librarians are familiar with. In this way, we can prepare for other cataloging librarians to begin ingesting metadata according to ADL specifications in coming years.

 

4.0 ADL System Development Progress and Plans

 

4.1 Transition of ADL testbed to operational system - The Alexandria project is dedicated to releasing an operational version of the ADL system because it will serve as real proof that the collections, services, and underlying technology that we have in place form a usable and attractive system for a significant number of users. In addition, we view this effort as the first step towards having a full DL facility at UCSB and in UC in general. In regard to the latter, the University is moving ahead with its plans to form the California Digital Library, and we anticipate that ADL will become a component of such a library.

 

Fall Release: The Alexandria Project has defined an operational build target for release in the fall of 1997, based upon the current testbed system. The fall release will be limited to users within the University of California system and members of the existing ADL design review board.

 

This release of the ADL Search Interface is being implemented entirely in the Java programming language. It incorporates a vector-based map tool, a multi-tabbed text search area, and a hierarchical tree tool for displaying the results of the user's queries. There is also a workspace, where users can manipulate their result sets, create a personal gazetteer, and save search parameters for future queries. The size and color of the interface, as well as its internal components, are extremely flexible, so as to accommodate users on many different platforms. The interface can be distributed as an applet or an application, and will run on any platform that supports Java.

 

In addition to working on the fall release, the Alexandria project is also exploring new and enhanced functionality to be added to the system for future builds to be released during 1998. The ADL development staff plans to work very closely with users and system evaluators in order to continue generating system requirements based upon user feedback.

 

System Requirements: User requirements have been generated based upon testbed evaluation. In order to ensure that ADL evolves around a user-centered design process, this system release is largely based upon requirements that were generated from feedback gathered during user evaluation of the original testbed. User-centered requirements have been drafted for the following categories of functions:

 

* Search Functions

* Session Management

* Result Display

* User Workspace

* Holdings Visualization

* Gazetteer Functions

* User Help Functions

* User Registration

* Usability Features

 

Other system requirements have been drafted to cover areas such as system performance, system security, data ingest, data distribution and other features necessary to support system operations.

 

In addition, requirements for post-fall release(s) of the system are being defined and prioritized. The Alexandria project plans to release its next version of the system in the spring of 1998.

4.2 Systems Engineering Team - The team performed an exhaustive database query performance evaluation for the ADL databases. Based on the evaluation findings, adjustments were made to both the data schema and the database setup.

 

We also continued research on developing a gazetteer metadata content standard. A first version of the logical implementation was completed in 8/97. (See the Partners section for more details of this effort.)

 

In addition, the team accomplished the following:

 

* Added 300,000 more collection items to the testbed database.

* Redesigned the catalog "buckets" schema and the high-level query interface.

* Installed and set up the Oracle 7.3.3 RDBMS.

* Migrated all data from Sybase to the new Oracle RDBMS.

* Implemented a testing schema for the gazetteer database.

* Identified new gazetteer data sources.

* Formalized the metadata reporting structure and template.

4.3 User Interface Design and Implementation Team - The team redesigned and implemented the ADL testbed system. The new system has a Java-based user interface and more efficient back-end database support. The new system will open for UC access at the end of 10/97. We also started integration of CMS (Computational Modeling Tool) created by the CMS project, under the direction of Terry Smith.

 

4.4 Development Plans for Next Six Months - Integration of research projects: wavelet image delivery, image recognition searching, VRML-based 3D map implementations, the OPS (On-line Public Spaces) project, and transitioning to the new Java Specification. In addition, the Image Processing team's stand-alone version of the airphoto search based on texture is now on the web, and plans are being made to integrate it with the testbed. See: http://vivaldi.ece.ucsb.edu/AirPhoto.

 

4.5 Testbed Developments Involving Other Teams - All research teams are involved in monthly meetings to define and prioritize new ADL system requirements.

 

The Parallel Processing team continues to work on the JAVA implementation of the wavelet color image browsing scheme. It uses multi-threading to improve the client computation and communication performance and the code is universally executable in any standard WWW browser.

 

5.0 ADL Operations

 

5.1 Collection Development: Populating the database: Upon advice from the ADL board, renewed efforts have been devoted to adding materials to the ADL testbed. As of 8/21/97, the database contained 754,333 meta-records and 5,601 data objects, for a total of 15.7 GB of usable information. Examples of data loaded were California GIS data with a bio-geographical theme, Digital Orthophoto Quadrangles, Digital Raster Graphics, scanned historic aerial photography, and compiled data sets from the Sierra Nevada project.

Also added, but licensed only for U.C. use, were complete Spot Image Corp data at 10-meter resolution for the state of California. Although scanning of historic California imagery is still underway, loading has slowed until the position vacated by Randy Kemp is filled.

 

Three pilot projects on metadata ingest have been undertaken by the University of Colorado team. The first is a pilot project with the Boulder LTER (Long Term Ecological Research) site for Arctic and Alpine environments, whose very large, longitudinal data studies provide a good example of environmental data that should be distributed widely, are already in digital form, and are not available elsewhere. LTER metadata include open-ended descriptions of environmental conditions at collection sites where data are gathered at very fine-grained spatial and temporal resolution. The granularity issue is compounded by the metadata not being stored in MARC format.

 

A second pilot project collaborates with the Denver Public Library (DPL) to georeference a set of half a million historical photographs for Internet distribution -- here, the metadata are cataloged (in USMARC format) and photos are scanned and archived at DPL. The challenge is how to automate the georeferencing process. This summer, we took GPS readings for a number of photographic sites, and will compare those coordinates with probabilistic georeferencing output from the GIPSY algorithm. GIPSY has been acquired, along with Postgres, and the Colorado portion of the GNIS gazetteer. GIPSY must be able to access GIRAS landuse/landcover data as well, and this installation is nearly complete. Andrew Smith and Barbara Buttenfield will deliver a paper at GIS/LIS '97 on this topic in November.

 

A third pilot project involves metadata ingest for a series of historical air photos held by the CU Library. These cover the Front Range and Niwot Ridge, which is an active study site for many research projects in glaciation, snowpack modeling, biogeography, and fluvial geomorphology. These photos are originally from federal and state agencies, which no longer have the originals. Our intention was to begin scanning and metadata ingest (following UCSB guidelines), not only to ingest the photos and scan them, but also to determine how we can teach non-ADL staff to catalog their own metadata. For example, we want the Denver Public Library to do its own metadata ingest, once the georeferencing problem is worked out. This pilot project will continue through the fall.

 

5.2 Operational Policies and Procedures: A first draft of the operational concept for the fall release has been completed, and covers the following topics:

 

*Target User Groups for Fall Release:

Restricted access - UC only

*Metadata:

Update and Change of Metadata

Changes to ADL Schema (add/delete fields, changes to fields)

Population of Mandatory Metadata Fields

QA procedures

*Data Ingest:

QA procedures

Policies for data with no metadata

Procedures for cataloging

Procedures for data loading

*Data Distribution:

Policies for proprietary data

Distribution ‘packages’ - what software, readfiles, ancillary etc need to accompany data

*System Maintenance:

Backup procedures

Trouble-shooting

System security audits

*User Support:

Maintenance of user logs

Assisting users

Update of holdings screen on ADL system user interface

Hours for help desk, and email, 800 number for user support

Providing data expertise

 

6.0 Partnerships and Resources

 

6.1 Interactions with Partners

 

6.1.1 Hughes/Gazetteer Activities: Our work on the gazetteer implementation has resulted in the development of a new thesaurus of feature types for use with gazetteers. This thesaurus is based on the Guidelines for the Construction, Format, and Management of Monolingual Thesauri (ANSI/NISO Standard Z39.19-1993) and is designed primarily for information retrieval. The advantage of such a thesaurus is in the relationships represented between terms: hierarchical, associative, and equivalence. This allows navigation among terms by the users and also links from the user's own terminology to the terms used by the system. The terms in the thesaurus have been drawn from feature types in existing gazetteers (USGS GNIS and NIMA's GNPS) and other sources such as NATO's Feature and Attribute Coding Catalog (FACC). The USGS Circular 1048 entitled "An Enhanced Digital Line Graph Design", various earth science and regular dictionaries, and thesauri from related information indexing and retrieval systems have been used as references for terms and their definitions and relationships.
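The three relationship types can be illustrated with a small data structure. This is a sketch under stated assumptions, not the ADL implementation: the class and method names are hypothetical, and apart from the top term "hydrographic features" the sample terms are invented for illustration rather than taken from the actual thesaurus.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Term:
    """A valid (preferred) term and its links to other valid terms."""
    name: str
    broader: Optional[str] = None                    # hierarchical link
    related: Set[str] = field(default_factory=set)   # associative links


class Thesaurus:
    def __init__(self) -> None:
        self.terms: Dict[str, Term] = {}  # valid terms by name
        self.use: Dict[str, str] = {}     # entry term -> valid term (equivalence)

    def add_term(self, name: str, broader: Optional[str] = None) -> None:
        if broader is not None and broader not in self.terms:
            raise KeyError(f"unknown broader term: {broader}")
        self.terms[name] = Term(name, broader)

    def add_entry_term(self, entry: str, valid: str) -> None:
        """Register a non-preferred term that points to a valid term."""
        if valid not in self.terms:
            raise KeyError(f"unknown valid term: {valid}")
        self.use[entry] = valid

    def relate(self, a: str, b: str) -> None:
        """Record a symmetric associative link between two valid terms."""
        self.terms[a].related.add(b)
        self.terms[b].related.add(a)

    def resolve(self, word: str) -> Optional[str]:
        """Map a user's word to a valid term, or None if it is unknown."""
        if word in self.terms:
            return word
        return self.use.get(word)

    def narrower(self, name: str) -> List[str]:
        return sorted(t.name for t in self.terms.values() if t.broader == name)

    def path_to_top(self, name: str) -> List[str]:
        """Walk the hierarchy from a term up to its top term."""
        path = [name]
        while self.terms[path[-1]].broader is not None:
            path.append(self.terms[path[-1]].broader)
        return path


thesaurus = Thesaurus()
thesaurus.add_term("hydrographic features")                      # top term from the report
thesaurus.add_term("streams", broader="hydrographic features")   # hypothetical term
thesaurus.add_term("lakes", broader="hydrographic features")     # hypothetical term
thesaurus.add_entry_term("creeks", "streams")                    # hypothetical equivalence
thesaurus.relate("streams", "lakes")                             # hypothetical association
```

In this sketch, `resolve` covers the link from a user's own terminology to the system's terms, while `narrower` and `path_to_top` support navigation among terms.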

 

The draft thesaurus has been reviewed by Giulietta Fargion (Hughes), UCSB faculty and students, staff of the Global Change Master Directory, and by NOAA staff. In all cases, the extensive comments received were reviewed and integrated into the thesaurus as appropriate. It has also been reviewed by Bob Rugg, who is the chair of the ISO working group within TC211 working on "Feature Cataloguing - Methodology." This review resulted in an invitation to Linda Hill to participate in a panel that Dr. Rugg is organizing for the Association of American Geographers annual meeting on the topic of Feature Cataloging. Linda has also been appointed as a U.S. expert for TC211 Working Group 3 on Indirect Reference Systems (Gazetteers) as a result of this project work.

 

In addition, the feature type thesaurus is being compared to the feature types of the NATO Feature and Attribute Coding Catalog (FACC) used by NIMA and to the feature types used by the Canadian Permanent Committee on Geographical Names (CPCGN), and terms missing from the thesaurus are being added. Needed terminology is also discovered while writing conversion rules to move the current ADL gazetteer feature types to the new set of terminology. We plan to continue this process of review, comparison, and frequent revision until we are ready for the actual conversion. As of this date, the thesaurus contains a total of 392 terms (177 valid terms and 215 invalid terms for reference). It is available for review at the following URL: http://www.alexandria.ucsb.edu/~lhill/html/index.html. The structure of six (6) top terms (administrative areas, hydrographic features, land parcels, manmade features, physiographic features, and regions) is working out very well as a basis for the organization of the terminology.

 

6.1.2 Other Partner Interactions - The Systems Engineering team worked with Oracle to install and set up the Oracle RDBMS. They also had interactions with ESRI to evaluate the Java-based vector map generator, and contacted Orbix to get an evaluation copy of Orbix CORBA systems.

 

The Parallel Processing team continued interaction with Navy NRaD for implementing adaptive client-server scheduling code on their Convex machine with applications in image document browsing.

 

In June, Barbara Buttenfield met with John Moeller and Mike Domaratz of FGDC to begin planning a workshop on generalization for fusing digital cartographic data from multiple scales. Several phone conferences over the summer have advanced workshop plans, and the two-day meeting will be held (probably in North Carolina) in early December.

 

6.2 Additional Resources to Support Research and Development - The substantially delayed funding for ALASKA (A Large Scale Knowledge Repository) has finally arrived. ALASKA has been funded separately by NSF as a research program to develop a new indexing technology. ADL, however, provides an excellent testbed, and initial applications of ALASKA will be geared to enhancing the performance and functionality of ADL.

 

7.0 Presentations and Promotional Activities - The Alexandria project has developed a new set of informational web pages for the general public. These contain information about the ADL project goals, organization, members and missions of the research and development teams, lists of publications, descriptions of partnerships with corporations and other groups, and will also provide access to ADL project documents. The site is being designed by Robert Nideffer, leader of the ADL User Interface Design and Implementation team, and will be consistent with the design of the new system user interface. The new pages may be accessed at: http://alexandria.ucsb.edu

 

The following people paid visits to UCSB and received ADL demos:

 

5/97

Chris Erickson, CEO of Red Brick Systems, a data warehouse firm

 

6/97

Visitors to UCSB/ADL from NSF: Nora Sabelli; Chris Dede; Andres Henriquez

 

Dr. Shiro Sakata, Assoc. General Manager, NEC; Dr. Yoshinari Hara, Senior Research Manager, NEC

 

Dr. Edward A. Fox, Virginia Tech

 

Dr. Hiroshi Murakami, Head of Research, Geographic Survey Institute, Ministry of Construction, Government of Japan

 

Dr. Greg Smith, NIMA Senior Scientist, NURI Program; Dr. Richard A. Berg, NIMA Senior Scientist, NURI Program

 

Mike Goodchild made the following presentations:

 

"The New Generation of Scholars: Do They Really Need Us?" Invited presentation, Association of College and Research Libraries, American Library Association, San Francisco, June 1997.

 

"Views from the U.S. Mapping Science Committee: Past, present, and future". University Consortium for Geographic Information Science 1997 Annual Assembly and Summer Retreat, Bar Harbor, ME, June.

 

"An Update on the US National Center for Geographic Information and Analysis" and "GIS and Its Impacts on Organizations: Effects of New Technologies". ESIG '97, Lisbon, June.

 

"The GIS Research Agenda". Keynote presentation, SCANGIS '97, Stockholm, June.

 

The Information Systems team presented at the DL '97 Workshop on Metadata and Thesauri and the SIGIR '97 Workshop on Networked Information Retrieval, July/August in Philadelphia.

 

Babs Buttenfield gave the following presentations:

 

Buttenfield, B.P. and Tsou, M. H. Encapsulated Operators for Processing Geographic Information. Second Workshop on Progress in Automated Map Generalization, Gavle, Sweden, 19-21 June 1997.

 

Buttenfield, B.P. Overview of the Alexandria Digital Library Project. Invited Lecture given 28 May 97, Institut fur Geomatik und Geoinformation, Technische Universitaet Wien, Vienna, Austria.

 

Buttenfield, B.P. Overview of the Alexandria Digital Library Project. Invited Lecture given 28 May 97, Institut fur Geographie, Wirtschaftsuniversitaet, Vienna, Austria.

 

Buttenfield, B.P. Delivering Maps to the Information Society: A Digital Library for Cartographic Data. Proceedings, 17th Conference of the International Cartographic Association, June 1997, Stockholm, Sweden: 1409-1416. (Paper also listed under Publications.)

 

Buttenfield, B.P. Metadata Ingest in Principle and in Practice. Colloquium presented to the National Snow and Ice Data Center (NSIDC), University of Colorado-Boulder, 5 Sept 97 (hosted by Professor Roger Barry).

 

In addition, ADL staff attended the following meetings/conferences:

 

Joint Workshop on Metadata Registries, July 8-10, in Berkeley

 

ACM Digital Libraries '97, July 23-26, in Philadelphia

 

ACM SIGIR (Information Retrieval), July 27-31, in Philadelphia

 

FGDC/ISO Advisory Group for TC211 Metadata Standard, August 5-6, in Reston

 

5th International Symposium on Large Spatial Databases, Berlin, Germany, July

 

23rd International Conference on Very Large Databases, Athens, Greece, August

 

Dan Andresen attended the CMU DLI workshop in June and Andresen, Ibarra, and Yang attended IPPS'97 (International Symp. on Parallel Processing) to present the results of client-server scheduling for wavelet image browsing.

 

8.0 Publications

 

Andresen, D. and T. Yang, "Multiprocessor Scheduling with Client Resources to Improve the Response Time of WWW Applications" Proceedings of the 11th ACM International Conference on Supercomputing (ICS97), Vienna, July 7-11, 1997.

 

Beard, M.K., Smith, T.R., "A Framework for Meta-Information in Digital Libraries." In Managing Multimedia Data: Using Metadata to Integrate and Apply Digital Data. A. Sheth and W. Klaus (Eds.) McGraw Hill. (forthcoming).

 

Beard, M.K., Sharma, V., "Multidimensional Ranking in Digital Spatial Libraries." Special Issue of Metadata. Journal of Digital Libraries. (forthcoming).

 

Beatty M., and Manjunath, B.S., "Dimensionality Reduction Using Multi-Dimensional Scaling for Image Search", to appear in the Proceedings of IEEE International Conference on Image Processing, Santa Barbara, CA, October, 1997.

 

Buttenfield, B.P. "Talking in the Tree House: Communication and Representation in Cartography." Cartographic Perspectives (forthcoming).

 

Buttenfield, B.P., "Why Don't We Do It on the Web? Distributing Geographic Information Via the Internet." Transactions in GIS, Invited Editorial (forthcoming).

 

Buttenfield, B.P., "Delivering Maps to the Information Society: A Digital Library for Cartographic Data." Proceedings, 17th Conference of the International Cartographic Association, Stockholm, Sweden: 1409-1416, June, 1997.

 

Buttenfield, B.P., "The Future of the Spatial Data Infrastructure: Delivering Geospatial Data." GeoInfo Systems, June 1997: 18-21.

 

Chandrasekaran, S., Manjunath, B.S., Wang, Y.F., Winkeler, J., Zhang, H., "An Eigenspace update algorithm for image analysis," (CS TR, April 1996) to appear in CVGIP: Graphical models and image processing, 1997.

 

Cheng, X., Dolin, R., Kothuri, R., Neary, M., Prabhakar, S., Wu, D., Agrawal, D., El Abbadi, A., Freeston, M., Singh, A., Smith, T., Su, J., "Scalable Access Within the Context of Digital Libraries," Proceedings of the IEEE Forum on Research and Technology Advances in Digital Libraries--ADL '97, Washington, DC, May 1997, pp. 70-81.

 

Deng, Y., Manjunath, B.S., "Content-based Search of Video Using Color, Texture, and Motion", to appear in the Proceedings of IEEE International Conference on Image Processing, Santa Barbara, CA, October, 1997.

 

Dolin, R., Agrawal, D., Dillon, L., El Abbadi, A., "Pharos: A Scalable Distributed Architecture for Locating Heterogeneous Information Sources," Proceedings of the Sixth CIKM Conference, Las Vegas, Nevada, November, 1997.

 

Fabrikant, S. I. and Buttenfield, B.P. Envisioning User Access to a Large Data Archive. Proceedings, GIS/LIS '97, Cincinnati, Ohio, 27-29 November, 1997.

 

Hill, L.L., Dolin, R., Frew, J., Kemp, R.B., Larsgaard, M., Montello, D.R., Rae, M-A, Simpson, J, "User Evaluation: Summary of the Methodologies and Results for the Alexandria Digital Library, University of California at Santa Barbara," Proceedings of the American Society for Information Science (ASIS) Annual Meeting, Washington D.C., November 1997. (in press)

 

Lee, C. H., Y. F. Wang, and Tao Yang, "Global Optimization for Mapping Parallel Image Processing Tasks on Distributed Memory Machines," Accepted for publication in Journal of Parallel and Distributed Processing.

 

Liang, P., and Y. F. Wang, "Local Scale Controlled Anisotropic Diffusion with Local Noise Estimate for Image Smoothing and Edge Detection," to appear in International Conference on Computer Vision, Bombay, India, January, 1998.

Ma, W. Y. and B. S. Manjunath, "A pattern thesaurus for browsing large aerial photographs," accepted for publication, Journal of American Society for Information Science, 1997.

 

Ma, W.Y., Manjunath, B.S., "NETRA: A Toolbox for Navigating Large Image Databases," to appear in the Proceedings of IEEE International Conference on Image Processing, Santa Barbara, CA, October, 1997. A detailed version has been accepted for publication in the Multimedia Systems journal.

 

Ma, W.Y., Manjunath, B.S., "Edge Flow: A Framework of Boundary Detection and Image Segmentation," Proceedings IEEE Conference on Computer Vision and Pattern Recognition, pp. 744-749, San Juan, Puerto Rico, June, 1997.

 

Poulakidas, A. S., A. Srinivasan, O. Egecioglu, O. Ibarra, T. Yang, "A Compact Storage Scheme for Fast Wavelet-Based Subregion Retrieval," (Journal of Theoretical Computer Science), invited submission, in preparation, 1997.

 

Prabhakar, S., D. Agrawal, A. El Abbadi, A. Singh, and T. Smith, "Browsing and Placement of Multiresolution Images on Parallel Disks" To Appear in 5th Annual Workshop on I/O in Parallel and Distributed Systems, IOPADS 97, San Jose, CA, Nov 17 1997.

Prabhakar, S., D. Agrawal, A. El Abbadi, and A. Singh, "Scheduling Tertiary I/O in Database Applications" To Appear in 8th International Workshop on Database and Expert Systems Applications, DEXA 97, Toulouse, France, Sept 1-5 1997.

 

Prabhakar, S., D. Agrawal, A. El Abbadi, A. Singh and T. Smith, "Browsing and Placement of Images on Secondary Storage" IEEE International Conference of Multimedia Computing and Systems (ICMSC'97), Ottawa, Canada, June 1997.

 

S. Prabhakar, D. Agrawal, A. El Abbadi, A. Singh and T. Smith, "Browsing and Placement of Multiresolution Images on Multiple Disks", To Appear in 5th Annual Workshop on I/O in Parallel and Distributed Systems, IOPADS 97, San Jose, CA, Nov 17 1997, pages 102-113.

 

S. Prabhakar, D. Agrawal, A. El Abbadi and A. Singh, "A Survey of Tertiary Storage Systems and Research", Submitted to International Journal of Digital Libraries.

 

Smith, A. and Buttenfield, B.P. Georeferencing Historical Photographs for a Digital Library. Proceedings, GIS/LIS '97, Cincinnati, Ohio, 27-29 November, 1997.

 

Strobel, N., Mitra, S.K., Manjunath, B.S., "Model-Based Detection and Correction of Corrupted Wavelet Coefficients," to appear in Proceedings of IEEE International Conference on Image Processing, Santa Barbara, CA, October, 1997.

 

Strobel, N., Sanjit K. Mitra, and B.S. Manjunath, "Progressive-Resolution Transmission and Lossless Compression of Color Images for Digital Image Libraries", 13th International Conference on Digital Signal Processing, Santorini, Greece, July 1997.

 

Strobel, N., Li, C.S., Castelli, V., "Texture-Based Image Segmentation and MMAP for Digital Libraries," to appear in Proceedings of IEEE International Conference on Image Processing, Santa Barbara, CA, October, 1997.

 

Wang, Y. F., and P. Liang, "3D Shape and Motion Analysis from Image Blur and Smear: A Unified Approach," to appear in International Conference on Computer Vision, Bombay, India, January, 1998.

 

Wang, Y. F., and Ronald-Bryan O. Alfreze, "A Unified Framework for Image-Derived Invariants," to appear in the 3rd Asian Conference on Computer Vision, Hong Kong, January, 1998.

 

Zhan, F. and B.P. Buttenfield, "Multi-Scale Representations of Digital Cartographic Lines" Cartography and GIS Volume 23 (4): 206 - 228, 1996.

 

 

I certify that to the best of my knowledge

 

  1. the statements herein (excluding scientific hypotheses and scientific opinions) are true and complete, and

  2. the text and graphics in this report as well as any accompanying publications or other documents, unless otherwise indicated, are the original work of the signatories or individuals working under their supervision. I understand that the willful provision of false information or concealing a material fact in this report(s) or any other communication submitted to NSF is a criminal offense (U.S. Code, Title 18, Section 1001.)

 

 

Terence R. Smith

Director, Alexandria Digital Library Project