5 RESEARCH ACTIVITIES AND PROGRESS

Alexandria Digital Library: ANNUAL REPORT

As noted previously, it is virtually impossible to separate in a meaningful manner the progress in our research activities from the progress of our development activities, since the two sets of activities are highly coupled. In particular, much of the output of research efforts has taken the form of implementations in the various testbed systems.

While some of the results of our research were described above in relation to testbed development, we now summarize our main research results in terms of the activities of each of the research and development teams.

Before describing our research activities, it is important to note that a major contribution of our research in the first full year of the Project was the identification of important research issues. This output is laid out in our plans for research and development in the Annual Program Plan (see below).

5.1 LIBRARY TEAM

Membership: M. Goodchild (leader), Carver, Geffner, Gottsegen, Kemp, Kothuri, Larsgaard, Simpson, Smith

The Library Team is responsible for investigating a variety of issues relating to the nature of the ADL collection and to its characterization in terms of metadata in the catalog component. These issues include important problems relating to the integration of spatially referenced information objects into ADL. Subteams of the Library Team are investigating issues relating to the design and construction of an "Alexandria Atlas" and issues relating to the nature and representation of metadata, as well as catalog interoperability.

User requirement specifications

The Team examined the specification of user requirements for a digital library providing access to spatial materials. An initial concern was ensuring that the WP preserved as much of the functionality of the RP as possible while extending it significantly. In addition to a specification of user requirements for a gazetteer, the Team investigated other requirements concerning user search, in particular the expansion and narrowing of searches in relation to the definition of themes. This research was stimulated by work in other DLI projects investigating sources of information that can be used to build networks and spaces that capture the relationships between themes and feature types, and that might support search expansion. Expansion and narrowing were also investigated in a geographic context based on geographic hierarchies.

Research on gazetteers for ADL

An important requirement determined by the Team concerned the addition of a second method for specifying the locational component of a query: namely, through the use of a gazetteer and named places. Hence the specification of user requirements included those for the functionality and design of a gazetteer (defined as an index connecting names of features to geographic locations). A survey of existing digital gazetteers was undertaken as a basis for constructing a prototype digital gazetteer for the Web Presence. Basic issues researched included: defining the gazetteer; examining the suitability of existing sources; determining appropriate extensions of the concept of a gazetteer in a DL context; models of feature extent that can be used to drive base map displays and queries; and hierarchical structures that can be coded into digital gazetteers to enhance functionality. A draft specification for a gazetteer implementation was developed.

Several major sources of digital gazetteers were found to be deficient in two areas: lack of information on the physical extents of features that can be used to help define the area of search; and lack of features that are important for querying but not well enough defined for traditional purposes. These "fuzzy" features include informally defined geographic areas such as neighborhoods and regions. The Team investigated a range of compromises that would allow the Web Presence prototype to offer functionality in these areas.
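
A gazetteer of the kind defined above, an index connecting feature names to geographic locations with a footprint usable for search, can be sketched as follows. The entry fields and the sample place data are illustrative assumptions, not the ADL gazetteer schema.

```python
from dataclasses import dataclass

@dataclass
class GazetteerEntry:
    name: str
    feature_type: str            # e.g. "populated place", "region"
    bbox: tuple                  # (min_lon, min_lat, max_lon, max_lat) footprint

class Gazetteer:
    """Index connecting feature names to geographic locations."""
    def __init__(self):
        self._entries = {}

    def add(self, entry):
        # Place names are ambiguous, so each name maps to a list of entries.
        self._entries.setdefault(entry.name.lower(), []).append(entry)

    def lookup(self, name):
        return self._entries.get(name.lower(), [])

# Hypothetical entry; the footprint is an approximate, illustrative bbox.
gaz = Gazetteer()
gaz.add(GazetteerEntry("Santa Barbara", "populated place",
                       (-119.85, 34.39, -119.64, 34.46)))
print(gaz.lookup("SANTA BARBARA")[0].feature_type)  # populated place
```

Storing a bounding-box footprint with each entry addresses the first deficiency noted above: the extent can seed the spatial component of a search rather than a bare point.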

Fuzzy footprints

In relation to the research on gazetteers, the Team investigated the concept of fuzzy footprints. In particular, the concept was investigated by Dan Montello from the perspective of methods for eliciting geometric definitions of fuzzy regions from human subjects; methods for storing geometric definitions in digital form; and methods for executing queries based on fuzzy regions.

Survey of standards and protocols

A survey of standards and protocols impacting Alexandria was undertaken. A document detailing such standards and protocols is being revised on a continuing basis.

Survey of Web applications for geospatial data

A comprehensive survey of existing Web applications in the general area of geospatial data was developed.

User/librarian discourse analysis

The Team investigated methods for capturing knowledge about the discourse between user and librarian, as a source of information on likely queries for the Information Systems Team, and as a basis for user interface design for the Interface Design and Evaluation Team. A specification document was developed.

5.1.1 METADATA AND CATALOG INTEROPERABILITY SUBTEAM

Membership: Smith (leader), Geffner, Gottsegen, Gritton, Hill, Larsgaard

A relatively new subteam of the Library Team has been investigating two issues of major importance for the ADL catalog. The first concerns the construction of fundamental models of catalog metadata and the second concerns the development of models for catalog interoperability in terms of exchange.

Knowledge representation and a general model of metadata

The subteam has conducted a survey of knowledge representation languages to determine their suitability for representing and exchanging metadata in DL catalogs. The subteam has constructed a general model of metadata based upon the knowledge representation schema of representational structures, and has produced an initial catalog design based upon this model. The initial investigations of this subteam led to a successful proposal to the Central Imagery Office (CIO) to examine both general models for metadata and the semantic exchange of metadata in greater depth over the next three years.

5.1.2 ALEXANDRIA ATLAS SUBTEAM

Membership: Carver (leader), Frew, Goodchild, Kemp, Larsgaard, Simpson, Smith

Another relatively new subteam of the Library Team has been investigating the design and functionality of an "atlas" that would support graphical/geographical access to library materials, in a manner that greatly generalizes our current map browsers.

Design and construction of an "Alexandria Atlas"

The Team has defined the requirements and functionality of an "atlas" that would act as a graphical/geographical interface supporting direct access to a large variety of materials by geographical reference. As well as identifying requirements, the Team has acquired selected datasets, identified high-level tools, and constructed the beginnings of an "electronic atlas requirement." The datasets already acquired are: the Digital Chart of the World, the Vector Shoreline Dataset, the Digital Line Graphs for the United States, and the three-arc-second Digital Elevation Models for the United States. Other datasets are being sought. The atlas gazetteer is close to completion and has more than 6.5 million entries. A live link between the map interface and the gazetteer will be constructed as part of the implementation. Extents from the above vector datasets will also be added to the gazetteer database so that gazetteer features may be accessed directly from the map interface. The design for this component is underway. A draft requirement for atlas functionality is near completion. Work assignments will be made when this document is complete.

5.2 INTERFACE DESIGN AND EVALUATION TEAM

Membership: Montello (co-leader, UCSB), Buttenfield (co-leader, Colorado), Carver, Dillon, Dolin, Green, Kumler, Larsen, Larsgaard, Nishikawa, Rae, Simpson

The main function of the Team is to investigate issues relating to the design of the user interface, the functionality of the system, and the evaluation of the system from the users' point of view.

The original Interface Design Team and the User Evaluation Team were combined to form the Interface Design and Evaluation Team during the preceding year. Although they operated as two separate teams for over half of the year, we have combined the descriptions of their research activities into a single description.

Survey of Commercial GUI development packages

The Team conducted a detailed survey and comparative study of many commercial GUI development packages. The purposes of the study were to: formulate a set of evaluation criteria; determine suitable metrics for evaluation; evaluate a number of GUI development packages to determine their strengths and weaknesses and their suitability for adoption in the Alexandria Project; and provide a recommendation and justification of the development platforms. A report was prepared.

User interface requirements

The Team investigated the issue of user interface requirements in four categories (conceptual, functional, operational, and developmental) and formulated an ADL interface requirements document. The document is based on the requirements and functionality of the RP and includes as core functionality:

  1.  the composition of queries relating to spatial search;
  2.  the display of spatially-indexed materials of both raster and vector type;
  3.  the browsing of metadata and spatially-indexed items returned by search;
  4.  the user configuration of the UI defaults and options;
  5.  the efficient retrieval of data items in various native formats.

The WP design was checked against the requirements document to ensure that all functionality requirements were met.

Concepts and constructs for user-defined browsing

The team has investigated and developed initial concepts and constructs for user-defined browsing activities. Unlike many existing ad hoc approaches, the framework designed here unifies both information retrieval (queries) and presentation (browsing as a special case). In particular, the framework supports incremental query construction, together with automated assistance in specifying some query parameters. The framework is being integrated with the constructs developed in the data modeling work into a concrete data model suitable for Alexandria collections and operations, as well as for a more general class of applications involving spatial data.

Concepts for an ADL OO data model

The team has investigated a conceptual data model based on an object-oriented paradigm for the Alexandria collections. It integrates metadata and data, and is conformant with the hierarchical structure of many collections. To accommodate any diversity in the underlying coordinate system of spatial data collections (e.g., non-geospatial data), the model allows a partition of objects into a set of "worlds," each supporting its own coordinate system. Initial designs of the associated query language feature flexibility, ease of use, and high expressive power. The query language supports incremental query construction, together with automated assistance in specifying some query parameters.
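
The partition of objects into "worlds" described above can be sketched minimally as follows; the class and attribute names are hypothetical, not the ADL model.

```python
# Sketch of the "worlds" partition: each World fixes its own coordinate
# system, and every object lives in exactly one world, so geospatial and
# non-geospatial collections can coexist. Names are hypothetical.
class World:
    def __init__(self, name, coordinate_system):
        self.name = name
        self.coordinate_system = coordinate_system
        self.objects = []

    def add(self, title, footprint):
        # The footprint is interpreted in this world's coordinate system.
        self.objects.append({"title": title, "footprint": footprint})

geo = World("geospatial", "WGS84 lat/lon")
scan = World("scanned-page", "pixel rows/columns")
geo.add("DEM tile", (34.0, -120.0, 35.0, -119.0))
scan.add("manuscript page 3", (0, 0, 2048, 3072))
print(len(geo.objects), len(scan.objects))  # 1 1
```

The point of the partition is that a query footprint is only ever compared against objects whose footprints share its coordinate system.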

Model of a "virtual library"

The Team constructed a "virtual" library model that supports uniform access to (possibly foreign) library collections on the Web and permits spawning of advanced UI features as needed. The virtual library model includes provisions for the construction of personal catalogs that refer to an individual's items of greatest interest.

Redesign of Original User Evaluation Plan

Following the February site visit at UCSB, reviewer comments indicated concern about the intended user evaluation program. Specific points were raised about creating a separate version of the Rapid Prototype to test user interface modules. The site evaluation team suggested direct user testing of the UNIX version be undertaken. After discussion, we accepted this position, discarding the original approach. Much of the spring activity involved an entire revamping of the user evaluation plan, and expansion of the User Evaluation Team at UCSB. We feel that the redesign provides a much more robust user evaluation program.

The revised user evaluation effort is a multi-phased approach integrating and expanding upon library requirements research and upon quantitative paradigms commonly applied in other disciplines (for example, Education, Human-Computer Interaction, and Information Science). These paradigms include real-time transaction logging of system use, online user surveys and tutorials for the Alexandria software, and the embedding of user-activated buttons in the Alexandria testbed to annotate specific events in user sessions. All three types of tools have been successfully pre-tested in Buffalo, and the more informative tools are being implemented in the Web version. At UCSB, an ethnographic study of library patron behavior has begun at the Map and Imagery Library (MIL). Details on these efforts are described below. New members have joined the User Evaluation Team, including two key personnel at Santa Barbara and three key personnel at the University of Colorado-Boulder (CU).

In December, User Evaluation Team Leader Barbara Buttenfield moved from Buffalo to Colorado, taking the Buffalo subcontract of the Alexandria Project with her. CU has provided temporary lab space while construction of permanent lab space is completed. CU has also met and exceeded the match originally offered by SUNY-Buffalo, including a reduced teaching load. Colorado granted Dr. Buttenfield the first semester on leave with full pay to ensure that project momentum is not hampered by the move. Other matching funds from CU include two graduate research assistant positions for the duration of the Project, laboratory space, and equipment. The CU Library is providing partial release time for the Map Librarian to work on Alexandria, and will purchase a small amount of computer equipment to support Alexandria testing sites in one or more libraries on campus.

Interactive Transaction Logging

The Buffalo team coded interactive transaction logging functions and initially embedded them in the UNIX Rapid Prototype running locally at Buffalo and at UCSB. Following several months of subject pre-testing, the transaction logs have been refined, and we have begun to implement them in the Web testbed this winter. The logs record and timestamp the sequence of specific icons and tools called by the user. The transaction log is tied not to screen pixels but to system commands and objects on the screen, thus capturing a higher level of user behavior than proposed a year ago. We additionally log which Alexandria windows are active, and record the names of archived image files as they are opened. We created a brief tutorial to guide new users through the library, and can inspect user logs to determine whether the tutorial is being utilized and what the patterns of use and of user error are. For example, we determined that many users are confused when using the selection pad. The transaction logs show users "clicking twice" on thumbnail icons, applying the Macintosh metaphor (clicking twice on an icon to open the file associated with that icon). The user interface utilized a separate menu tool for opening files, and transaction logs showed users consistently repeating the double-click sequence in lieu of the correct menu tool. Thus we discovered a specific place to streamline the interface design.
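
The logging approach described here, timestamped records of system commands and active windows rather than raw screen pixels, can be sketched as follows; the event and file names are hypothetical, not the actual ADL log format.

```python
# Sketch of a UI transaction log: timestamped records of the tool invoked,
# the active window, and optional detail (e.g. the archived image opened).
import time

class TransactionLog:
    def __init__(self):
        self.events = []

    def record(self, tool, window, detail=None):
        self.events.append({
            "t": time.time(),   # timestamp for sequencing and timing analysis
            "tool": tool,       # system command/object, not a pixel location
            "window": window,   # which application window was active
            "detail": detail,   # e.g. name of an archived image file opened
        })

    def uses_of(self, tool):
        return [e for e in self.events if e["tool"] == tool]

log = TransactionLog()
log.record("thumbnail-double-click", "selection pad")
log.record("thumbnail-double-click", "selection pad")
log.record("open-file-menu", "selection pad", detail="ortho_34119.img")
# Repeated double-clicks preceding the menu tool are exactly the
# Macintosh-metaphor confusion pattern described above.
print(len(log.uses_of("thumbnail-double-click")))  # 2
```

Because events are ordered and timestamped, patterns such as "double-click, double-click, then the menu tool" fall out of a simple scan over the log.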

Results of early pretests indicated that users would utilize cognitive affect buttons in the Alexandria menu. We embedded three such buttons initially: a Good button, a Bad button, and a NotePad button. Users were instructed in the tutorial to click on the affect buttons when they particularly liked or disliked something about the interface, and (optionally) to insert a comment annotating their opinion. Use of the affect buttons is included in the transaction log, thus we can monitor where in a sequence of events a user is delighted, confused, or frustrated by the system interface, network response, data holdings, and so forth. The cognitive affect buttons are currently embedded in the Alexandria Web testbed.

Ethnographic Studies of Library Patron Behavior

Judith Green has begun an ethnographic analysis of library users. We see the work as a formative evaluation that informs the other teams as they revise and redefine the interface and the system. Three analyses have been undertaken.

First, the team has analyzed the demonstration protocol. We identified issues of accessibility in the language and content of the information currently on the Web. We have identified "insider" language, used by members of the ADL culture, that is not accessible to "outside" audiences: actual and potential ADL users. We did this by giving a group of potential users the material and asking them to identify terms that were problematic or strange to them, concepts or phrasing that was troublesome to them as readers, and information that they needed. This information will be given to the development team by Monday. The study is called "ADL-Speak" and has ten participants. We have already reported some of our findings to the users group, who have used them as scenarios to begin revising the front matter to make it more user-friendly. The outcome of this study was discussion of, and agreement on, the need to include "hot buttons" giving people access to a glossary, and the need for information that provides the user with an overview of system capacity.

Second, the team is collecting data on users with different levels of expertise, transcribing their comments as they think aloud about what they are doing during their search, and creating tapes that illustrate the problems. The tapes will provide feedback to the user group. We transcribe the tapes to identify problems, successes, and strategies needed in building an accessible interface. The first tape was a two-hour session with someone who has knowledge of the Web and Web searching but no knowledge of ADL. This tape provided insight into problems of access, areas where information is needed, and issues of needed tutorials; it will provide input into the development of the tutorials. We are currently collecting, transcribing, and analyzing data on knowledgeable users (members of the development team and a reference librarian). These tapes will allow us to identify insider knowledge and what sophisticated users understand, expect, and do as they search. The contrast across user groups will provide a basis for identifying insider knowledge so that we can make the library content accessible to a range of users.

The third study is of the reference librarians and how they conduct actual reference interviews. The librarians are taping actual reference interviews for us. These will be transcribed and analyzed, and the findings used to inform the development of the interface and the work of the library access group (UIE). This phase is just beginning.

We plan to bring in a broad range of users from grade 5 through adults in order to develop an interface and system that supports access to a broad range of users.

Preparation of a CD-ROM version of the Rapid Prototype

A different research activity has been supported by efforts at ESRI, one of our corporate partners. ESRI ported a subset version of the Alexandria Rapid Prototype to a Windows platform, to capture the user audience lacking access to UNIX (as in many public libraries and elementary schools). The Windows version was burned on CD-ROM in the Fall. The Buffalo team designed a Windows version of the UNIX tutorial and a questionnaire, which were included on the CD-ROM. Unfortunately, no interactive logging was included on the CD-ROM version, as several ARC VIEW commands available on UNIX are not available on the Windows port. UCSB and Buffalo collaborated to compile a list of names for roughly 2500 copies of the CD to be distributed across the country. The CDs went out in late Fall, and to date we have received a few dozen responses by regular mail, and many users have sent comments and questions to the Buffalo electronic mail account.

5.3 INFORMATION SYSTEMS TEAM

Membership: El Abbadi (leader), Agrawal, Frew, Kothuri, Prakhabar, Singh, Smith, Su, Wu

Taxonomy of user queries

The Team investigated a taxonomy of user queries.

Survey and evaluations of data models

The Team investigated and evaluated the suitability of a variety of existing data models in the relational and object-oriented paradigms that provide support for spatial data, and also examined the associated query languages. An investigation was completed of a generalized relational model that uses geometric constraints to provide a finite representation for infinite sets of points in space (lines, regions, polyhedra). In evaluating query languages for this model, the use of aggregation operators to compute areas and volumes was examined.

Suitability of O2 OODBMS

The Team investigated the suitability of the O2 OODBMS for implementing these structures. The goal of the investigation is to examine the suitability of implementing the Alexandria catalog in O2 and integrating it with the planned Web server for Alexandria. The reason for this approach is twofold. First, the metadata standards are intrinsically amenable to an OODBMS. Second, an OODBMS implementation of the Alexandria catalog will provide the opportunity to deal with multiple as well as heterogeneous servers. The Team has completed a preliminary object-oriented modeling of the FGDC and USMARC metadata standards for spatial data and investigated the implementation of the OO schema in O2.

Unifying browsing and retrieval

The Team investigated various concepts in relation to user-defined browsing activities. The goal was to find a framework that unifies both information retrieval (queries) and presentation (browsing as a special case) and supports incremental, assisted query specification. The framework was integrated with the constructs developed in the data modeling investigations into a concrete data model suitable for Alexandria collections and operations, as well as for a more general class of applications involving spatial data. This model supports the notion that browsing is simply a display of totally ordered elements.

Search of gazetteer

The Team investigated the gazetteer in relation to the issue of providing rapid access to collection items that contain named instances of specific classes of features. The gazetteer for the WP was initially implemented in an RDBMS (SYBASE). Although the translation from exact feature names to geographic locations is fast, SYBASE provided limited functionality ("like" predicates and the "soundex" function) for dealing with fuzziness in feature names; this functionality is either too slow ("like" predicates) or returns too much unnecessary information ("soundex"). To deal effectively with fuzziness in query specification, the gazetteer is now based on the text-processing package ConQuest. The initial experience of the Team is that this provides much better performance.
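
A sketch of the standard soundex algorithm illustrates the over-return problem noted above: phonetically similar but entirely distinct place names collapse to a single code, so every soundex match must still be inspected.

```python
# Standard soundex sketch: first letter plus codes for following consonants,
# vowels dropped, adjacent duplicate codes collapsed, padded to 4 characters.
def soundex(name):
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    out = name[0].upper()
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            out += code
        if ch not in "hw":          # h and w do not reset the previous code
            prev = code
    return (out + "000")[:4]

# Two different California place names, one code: the query returns both.
print(soundex("Salem"), soundex("Selma"))  # S450 S450
```

This is exactly the "too much unnecessary information" failure mode: a gazetteer query for one place retrieves every name sharing its code.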

Content based retrieval

The Team investigated mechanisms to facilitate content-based retrieval in image databases. In particular, evaluations were made of two approaches to reducing the dimensionality of multidimensional data: the Fourier Transform and Singular Value Decomposition. Feature extraction is used to summarize image content in terms of multidimensional vectors. Unfortunately, the dimensionality of these vectors is typically quite large, ranging from 24 to 120. None of the existing index structures (e.g., R-trees and their variants) can cope with this dimensionality for both point queries and range queries. Moreover, when moving to the image data domain, similarity search (range queries) often becomes necessary.

One criterion for the goodness of content-based retrieval is that there should be no false dismissals, while minimizing the returned set to avoid false hits. Fourier transforms and Singular Value Decompositions are being used to reduce the dimensionality of the image vectors from, say, 24 to perhaps 4, 6, or 8. The reduced-dimension data, organized using R*-trees and clustering, will be used for fast retrieval. Exhaustive tests were performed to determine the most suitable technique to implement in the WP.
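
The SVD route can be sketched on synthetic feature vectors; the dimensions (24 reduced to 4) follow the text. Because projecting onto the leading singular vectors is non-expansive, distances in the reduced space never exceed the originals, which is what guarantees no false dismissals for range queries.

```python
# SVD-based dimensionality reduction of (synthetic) image feature vectors.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 24))   # 500 synthetic feature vectors, dim 24

Xc = X - X.mean(axis=0)              # center before decomposition
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 4                                # reduced dimensionality for indexing
Z = Xc @ Vt[:k].T                    # coordinates in the top-k subspace

# Projection onto an orthonormal basis is non-expansive, so reduced-space
# distances never exceed the originals: a range query in the reduced space
# misses nothing, and its false hits are filtered in the full dimension.
print(Z.shape)  # (500, 4)
```

The reduced 4-d vectors are what would be inserted into an R*-tree; the full 24-d vectors are consulted only to discard false hits.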

Indexing methods for spatially-indexed data

In the area of search structures, the Team investigated various multidimensional index structures such as R-trees, R*-trees, R+-trees, and BV-trees, and completed a preliminary qualitative analysis of these structures. Although hierarchical structures are prevalent in spatial data domains, the issue of indexing such nested data has received little attention in the database and indexing community. Several issues in this regard have been investigated while designing index structures for hierarchical data. B-trees and related structures can only index unidimensional "point" data; the Team extended B-trees (to IB-trees) to handle data objects that span a range of values rather than single-valued points in the data space.

Two different approaches were investigated for indexing multidimensional hierarchical data. The first decomposes the d-dimensional data objects into d intervals, one per dimension, and indexes the intervals in each dimension separately. The second approach organizes all data objects at the same level together using standard spatial indexing schema.

The Team experimentally investigated the new indexing scheme that it designed, called "Level-Based Interval B-trees." Such trees are well suited to containment queries. This is the first index structure with logarithmic worst-case bounds (in level of nesting and size of data) for single-dimensional interval data. In experiments, it has proved to be up to 10 times more efficient than existing index structures. The proposed index structure also generalizes very easily to higher dimensions, and it is possible to obtain good speedup on parallel machines, as shown through experiments on the Meiko parallel machine. The inherent simplicity of the design allowed it to be more efficient than a parallel implementation of R*-trees.
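
The tree structure itself is beyond a short example, but the containment query it accelerates can be illustrated with a naive linear scan; the Level-Based Interval B-tree answers the same query with logarithmic worst-case bounds instead of the linear cost shown here.

```python
# Containment query: report every stored interval that lies entirely
# inside the query interval [q_lo, q_hi]. Naive O(n) reference version.
def contained_in(intervals, q_lo, q_hi):
    return [(lo, hi) for (lo, hi) in intervals if q_lo <= lo and hi <= q_hi]

data = [(1, 4), (2, 3), (5, 9), (6, 7), (0, 10)]
print(contained_in(data, 1, 8))  # [(1, 4), (2, 3), (6, 7)]
```

Note that (5, 9) is excluded because it extends past the query's right end, and (0, 10) because it starts before the left end; only fully nested intervals qualify.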

Content-based placement for "wavelets" on secondary storage

The Team investigated content-based image placement and browsing, and evaluated several strategies for storing wavelet coefficients on multiple parallel disks so that thumbnail browsing as well as image reconstruction can be done efficiently. These strategies can be classified into two broad classes depending on whether or not the content of the images is used in the placement of the image coefficients. The simulation results indicate that if content-based retrieval is used to access the images, then this information should also be used for the placement of images on disk. In particular, when content-based placement is used to store image coefficients on disk, performance improvements of up to 40% are achieved using as few as four disks.

5.4 IMAGE PROCESSING TEAM

Membership: B. Manjunath (leader), Y. Ma, S. Mitra, N. Stroebel, Y. Wang

The Image Processing Team is responsible for investigating issues concerning the representation, storage, and access of image related data. The Team also aids the development team in adapting their research findings and recommendations for the testbed system. Particular foci of activity for the Team are wavelet decompositions for storage, manipulation and transmission of images, and access of images by content.

Image browser

The Team developed and implemented a stand-alone image browser. It was primarily designed to demonstrate the functionality of progressive and selective image reconstruction at multiple resolutions. In addition, different image enhancement methods for zooming and interpolation were investigated. A more efficient browser which can be used on an arbitrarily-sized image (or image segment) was developed for the WP.

An investigation of the browsing tool indicated that fast system response is more important than accurate image reconstruction at the intermediate levels. The accuracy of the intermediate representation depends both on the particular image data as well as the choice of wavelet filters. A basic problem under investigation is to quantify the optimality of a given representation.

Optimal wavelets

Although many good wavelet filters for our application have already been found and tested, the choice of an "optimal" wavelet remains difficult. As a basic problem, there is a need to establish the criteria defining optimality in the context of the Alexandria Project. The Team has investigated the performance of an optimal uniform mean-square quantizer in representing all wavelet coefficients, to ensure that the disk space necessary for storing a wavelet-based multiresolution representation does not exceed that of the original image. In addition, popular wavelet filters have been compared with respect to their reconstruction performance and computational complexity. Based on this work, the Team has concluded that, for the ADL application, the Haar wavelet filters offer an appropriate compromise between reconstruction performance and computational effort. Extension of the previous quantization scheme to incorporate lossless reconstruction is an ongoing activity.
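
The appeal of the Haar filters can be illustrated in one dimension: averages of adjacent samples form the coarse (thumbnail) band, differences form the detail band, and the pair reconstructs the original exactly, which is the basis of progressive reconstruction. Images apply the same step along rows and columns.

```python
# One level of the 1-D Haar transform and its exact inverse.
import numpy as np

def haar_level(x):
    a = (x[0::2] + x[1::2]) / 2.0       # coarse approximation (half length)
    d = (x[0::2] - x[1::2]) / 2.0       # detail coefficients
    return a, d

def haar_inverse(a, d):
    x = np.empty(2 * a.size)
    x[0::2], x[1::2] = a + d, a - d     # undo averages/differences exactly
    return x

x = np.array([9.0, 7.0, 3.0, 5.0])
a, d = haar_level(x)
print(a)                   # [8. 4.]  half-resolution version
print(haar_inverse(a, d))  # [9. 7. 3. 5.]  perfect reconstruction
```

Repeating the step on each coarse band yields the multiresolution pyramid: a browser transmits the coarsest band first and refines it with successive detail bands.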

Storage of wavelets

While designing the uniform quantization scheme, the Team discovered that one could store an encoded error image instead of the first-level wavelet coefficients. The Team therefore investigated the advantages of this storage modification: perfect reconstruction of the original image within the original storage limits, without affecting progressive data transmission.

Texture features for browsing and retrieval

Research on content-based retrieval in image databases has focused on using image properties, such as color, texture, histograms, and shape, for searching through images. The Alexandria Project has made considerable progress in developing algorithms for texture-based search. These algorithms are being used to implement content-based search in the Web prototype, using image texture as the measure of content.

The Team has investigated and developed an effective texture feature extraction scheme based on the multiresolution Gabor wavelet decomposition. Simple statistical moments, such as the mean and standard deviation of the filtered outputs, can then be used as indices to search the database. The Team has compared the performance of different texture features in terms of retrieval accuracy and efficiency. These evaluations were performed using the Brodatz texture album, with over 100 different textures. Gabor filters demonstrably offer the best performance among the multiresolution texture features compared (i.e., the tree-structured wavelet transform, the conventional orthogonal and bi-orthogonal transforms, and the multiresolution autoregressive model). To reduce the image processing time, the Team developed an adaptive filtering scheme that reduces the image-processing computations while maintaining retrieval accuracy.
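
The indexing scheme can be sketched as follows. Simple directional difference filters stand in for the actual Gabor bank, which is an assumption for brevity; the index key is the mean and standard deviation of each filtered output, as described above.

```python
# Texture feature sketch: filter the image with a small bank, then index
# each image by the mean and standard deviation of each filtered output.
# Difference filters below are stand-ins for the Gabor bank.
import numpy as np

def texture_features(img, filters):
    feats = []
    for f in filters:
        # 'valid' 2-D correlation via explicit sliding windows
        h, w = f.shape
        windows = np.lib.stride_tricks.sliding_window_view(img, (h, w))
        out = np.einsum('ijkl,kl->ij', windows, f)
        feats += [out.mean(), out.std()]
    return np.array(feats)

bank = [np.array([[1.0, -1.0]]),        # horizontal difference
        np.array([[1.0], [-1.0]])]      # vertical difference

rng = np.random.default_rng(1)
img = rng.random((32, 32))              # synthetic stand-in for a photo
fv = texture_features(img, bank)
print(fv.shape)  # (4,)  two moments per filter, used as the index key
```

Retrieval then reduces to nearest-neighbor search among these short feature vectors, which is why their dimensionality (and the indexing issues discussed with the database researchers) matters.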

In relation to the catalog component, methods for indexing library items using these wavelet based texture features are being investigated. The Team conducted extensive experiments on the entire Brodatz texture album and is using the developed methodology in searching satellite image data. In collaboration with the database researchers, the team is addressing issues related to indexing and search in the feature space.

For the WP, the Team has created a design for a database of aerial photographs that can be searched using texture templates. At the time of ingest, these images are analyzed and texture information is extracted. A small set of texture templates is created to represent the different textures that may occur in these photographs. At the time of user-initiated search, the user can choose a region of interest, and search the database based on the texture information within the region.
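The matching step can be sketched as a nearest-template lookup in feature space. The template names and two-dimensional feature vectors below are purely illustrative (real texture features would be higher-dimensional, as in the Gabor scheme above); the point is that a region of interest is assigned to whichever precomputed template lies closest by Euclidean distance.

```python
import numpy as np

def nearest_template(roi_features, templates):
    # templates: dict of name -> feature vector; return the template
    # whose feature vector is closest to the region of interest.
    dists = {name: np.linalg.norm(roi_features - vec)
             for name, vec in templates.items()}
    return min(dists, key=dists.get)

# Hypothetical templates extracted at ingest time.
templates = {
    "water":  np.array([0.10, 0.05]),
    "urban":  np.array([0.80, 0.40]),
    "forest": np.array([0.50, 0.60]),
}
print(nearest_template(np.array([0.75, 0.35]), templates))  # → urban
```

At search time, only the small template set needs to be compared against the user's region; images are then retrieved via an index keyed by template.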

The Team is currently investigating the use of neural network based learning algorithms for unsupervised clustering and for learning suitable distance metrics for image comparisons. Future research emphases will be on integrating different visual cues (such as texture, shape, color, etc.) for image retrieval.

5.5 PERFORMANCE AND PARALLEL PROCESSING TEAM

Membership: Yang (leader), Andresen, Egecioglu, Ibarra, Poulakidas, Srinivasan, Zheng

It is clear that the success of DLs in general, and of ADL in particular, is heavily dependent on high-performance computing. It is our belief that parallel processing, particularly in the form of networks of workstations, has an important role to play in achieving this high performance. The responsibility of the Performance and Parallel Processing Team is to identify and investigate aspects of ADL that will benefit from high-performance computing on multicomputers. In particular, the Team is investigating various performance issues arising from the ADL environment in terms of both space and time complexities. It is also developing algorithms and software techniques for high-performance digital libraries.

A scalable WWW server on multicomputers

The Team has investigated issues involved in developing a scalable WWW server on a cluster of workstations and parallel machines, using the Hypertext Transport Protocol (HTTP). The main objective is to improve the processing capabilities of the ADL server by utilizing the power of multicomputers to match the demands of simultaneous access requests from the WWW.

The Team has developed and implemented a system called SWEB on a distributed memory machine, the Meiko CS-2, and on networked SUN and DEC workstations. Each processing unit is a workstation linked to a local disk. The disks are NFS-mounted to all processing units. Scalability of the server is achieved through effective resource utilization: the system actively monitors the run-time CPU, disk I/O, and network loads of the system's resource units, and dynamically schedules user HTTP requests to an appropriate workstation for efficient processing.

The distinguishing feature of the scheduling scheme is that it considers the aggregate impact of multiple resource load factors (e.g. CPU, I/O channels and interconnection network) on the choice of processor assignment. Previous work typically considered one resource load factor in the scheduling scheme.
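The multi-factor idea can be illustrated with a minimal sketch. The weights and per-node load figures below are hypothetical, not SWEB's actual parameters: each candidate node's CPU, disk I/O, and network loads are combined into a single weighted aggregate, and the request is assigned to the node with the lowest aggregate, rather than cycling round-robin or considering one factor alone.

```python
# Pick the server node with the lowest weighted aggregate load across
# CPU, disk I/O, and network (weights here are illustrative).
def pick_node(nodes, w_cpu=0.5, w_io=0.3, w_net=0.2):
    def aggregate(n):
        return w_cpu * n["cpu"] + w_io * n["io"] + w_net * n["net"]
    return min(nodes, key=aggregate)["name"]

# Hypothetical run-time load readings for three workstations.
nodes = [
    {"name": "ws1", "cpu": 0.9, "io": 0.2, "net": 0.1},  # aggregate 0.53
    {"name": "ws2", "cpu": 0.3, "io": 0.4, "net": 0.5},  # aggregate 0.37
    {"name": "ws3", "cpu": 0.5, "io": 0.9, "net": 0.2},  # aggregate 0.56
]
print(pick_node(nodes))  # → ws2
```

A CPU-only scheduler would also pick ws2 here, but with different load mixes (e.g. a node that is CPU-idle yet I/O-saturated) the single-factor and aggregate choices diverge, which is the case that matters for ADL's large image requests.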

The Team has conducted extensive experiments to examine the overall performance of this system and tested several performance factors that affect scalability. Among the issues examined, for example, was how many requests per second could be processed in delivering regular files, such as image thumbnails or text, and in accessing subregions of compressed wavelet data. The Team also studied the improvement of response time and drop ratios as the number of server nodes is varied.

Experiments with SWEB indicate that the system provides sustained round-trip performance when the number of requests reaches 5 to 30 million per week. These results have been compared with those of other approaches. NCSA, for example, has built a multi-workstation HTTP server based on round-robin domain name resolution to assign requests to workstations. The round-robin technique is effective when HTTP requests access HTML information in chunks of relatively uniform size. For ADL, however, the computational and I/O demands of requests may vary dramatically because of the large images and metadata files of variable sizes, and the round-robin approach cannot effectively utilize resources. The round-robin approach has been compared to our load-balancing approach for processing different ADL-related requests, and a 20% to 50% improvement in performance has been observed.

Fast subregion retrieval, image compression, and parallel wavelet transforms

The Team has investigated parallel wavelet transforms and related I/O storage schemes, as well as parallel scheduling techniques for supporting parallel image processing. The forward and reverse transforms have been coded and tested, yielding superlinear speedup on large images.

Experimental results arising from an implementation of a prototype of parallel wavelet transformations (forward and reverse) with support for parallel I/O facilities indicated that the storage scheme has a significant influence on the design of the algorithm. Hence a storage and compression scheme was developed in which compression techniques used by the EPIC group (MIT) are combined with quadtrees. The EPIC group uses a variation of run-length and Huffman encoding methods to compress the quantized coefficient matrices created by the forward wavelet transformation.

The reason for using a hybrid coding technique based on quadtree and Huffman coding methods is not only to achieve effective image data compression to save disk space, but also to minimize the time spent in retrieving subregions. The new scheme supports decompression/retrieval of image subregions in multiresolution data, because subregion access is required for browsing large images that cannot be viewed on a single screen. Thus the hybrid code involves trade-offs between the compression ratio and retrieval times.
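The subregion-retrieval benefit can be illustrated with a simplified sketch. Here the coefficient matrix is partitioned into fixed-size tiles (standing in for quadtree leaves) and each tile is compressed independently; `zlib` stands in for the run-length/Huffman coder used by the actual scheme. Because tiles are independent, a subregion request decompresses only the tiles it overlaps rather than the whole image.

```python
import zlib
import numpy as np

TILE = 8  # tile size; corresponds to a quadtree leaf in the actual scheme

def compress_tiled(coeffs):
    # Compress each tile independently (zlib is a stand-in for the
    # run-length/Huffman coder) so subregions can be decoded without
    # touching the rest of the image.
    tiles = {}
    for r in range(0, coeffs.shape[0], TILE):
        for c in range(0, coeffs.shape[1], TILE):
            tiles[(r, c)] = zlib.compress(coeffs[r:r+TILE, c:c+TILE].tobytes())
    return tiles

def read_subregion(tiles, shape, r0, r1, c0, c1, dtype=np.int16):
    # Decompress only the tiles overlapping rows [r0, r1), cols [c0, c1).
    out = np.zeros(shape, dtype=dtype)
    for (r, c), blob in tiles.items():
        if r < r1 and r + TILE > r0 and c < c1 and c + TILE > c0:
            out[r:r+TILE, c:c+TILE] = np.frombuffer(
                zlib.decompress(blob), dtype=dtype).reshape(TILE, TILE)
    return out[r0:r1, c0:c1]

coeffs = np.arange(32 * 32, dtype=np.int16).reshape(32, 32)
tiles = compress_tiled(coeffs)
sub = read_subregion(tiles, coeffs.shape, 4, 20, 10, 30)
assert np.array_equal(sub, coeffs[4:20, 10:30])
```

The trade-off mentioned above shows up directly here: smaller tiles mean fewer wasted bytes per subregion read, but per-tile coding overhead lowers the overall compression ratio.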

The Team conducted experiments with sample satellite images from the ADL collection. Current results indicate that a 70-90% space reduction ratio can be achieved for quantized image coefficient data, while the time for accessing a subregion is less than a few seconds using a SPARC 5 with a SCSI-2 disk. These methods have been incorporated with the quadtree representation so that a subregion of an image can be reconstructed efficiently while still sustaining a good compression ratio. These code components were combined in an implementation of the storage and compression scheme, together with a scheme to schedule parallel I/O accesses and wavelet transforms.



Last modified on 1996-03-08 at 16:41 GMT by the Alexandria Web Team