Performance is critical to the success of ADL. Bottlenecks include both network bandwidth and server processing capability. While we expect network communication technology to improve steadily, particularly with the advent of ATM and BISDN, improvements in the performance of Internet servers will have to match huge increases in expected access requests. For example, in 1993 alone, the weekly access rate for NCSA's HTTP server at UIUC increased from 91K to 1.5M. The peak request arrival rate at NCSA exceeds 20 requests/second, while current high-end workstations can handle only about 4 requests/second [8]. The requests to the NCSA server typically involve only simple file retrieval operations, whereas requests to the ADL server will involve much more intensive I/O and CPU activity, for example run-time wavelet transforms and content-based searching on metadata and images.
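As a rough back-of-the-envelope check on the figures above, the following sketch converts a weekly access count into an average request rate and estimates how many workstations would be needed to absorb the quoted peak. It assumes perfectly even load distribution and no coordination overhead, which a real server will not achieve; the function names are illustrative only.

```python
import math

SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800 seconds


def average_rps(requests_per_week):
    """Average request rate (requests/second) implied by a weekly count."""
    return requests_per_week / SECONDS_PER_WEEK


def min_workstations(peak_rps, per_node_rps):
    """Lower bound on the node count, assuming ideal load distribution."""
    return math.ceil(peak_rps / per_node_rps)


if __name__ == "__main__":
    print(f"average rate at 1.5M/week: {average_rps(1_500_000):.2f} requests/second")
    print(f"nodes for a 20 rps peak at 4 rps each: {min_workstations(20, 4)}")
```

Under these assumptions the average load is about 2.5 requests/second, well below the peak of 20, so it is the peak rate that calls for at least five such workstations.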
The Alexandria Project, like many other projects, is investigating parallel computation [2] to address performance issues, including scheduling on multiprocessors and parallel I/O, a parallel forward wavelet transform for image ingest, and a parallel reverse wavelet transform for efficient browsing of multi-scale images. Although the sequential wavelet transform has a complexity proportional to the size of the image, the computing time is still long for a large image (e.g. 2 minutes on a SPARC-10 workstation for a 1024x1024 image), so parallelization of wavelet transforms is beneficial. The main research focus is the efficient compression and retrieval of wavelet coefficients and image reconstruction with the support of parallel I/O. In this paper, we focus on improvements in HTTP server performance.
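To illustrate why the sequential transform is linear in the input size, here is a minimal sketch of a one-dimensional, unnormalized Haar transform, the simplest wavelet. The text does not say which wavelet family ADL uses, so Haar is an assumption chosen for brevity, and the function names are hypothetical. Each level processes half as much data as the previous one, so the total work is n + n/2 + n/4 + ... < 2n operations.

```python
def haar_forward(signal):
    """Multi-level unnormalized Haar transform (length must be a power of two)."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(x) for x in signal]
    length = n
    while length > 1:
        half = length // 2
        # Pairwise averages form the coarser signal; differences are the details.
        averages = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:length] = averages + details
        length = half
    return out


def haar_inverse(coeffs):
    """Invert haar_forward, rebuilding the signal from coarse to fine."""
    out = list(coeffs)
    n = len(out)
    length = 1
    while length < n:
        averages = out[:length]
        details = out[length:2 * length]
        merged = []
        for a, d in zip(averages, details):
            merged.extend([a + d, a - d])
        out[:2 * length] = merged
        length *= 2
    return out
```

Stopping the inverse loop early yields a lower-resolution approximation of the signal, which is the property that multi-scale browsing relies on.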
A goal of the Alexandria Project is to develop high performance DL servers based on clusters of workstations, parallel machines, and parallel I/O facilities. The system we are developing includes networked Sun and DEC workstations and a Meiko CS-2 (distributed memory machine). As depicted in Fig. 4, the main component of the system is a distributed scheduler that executes information retrieval and processing operations in response to HTTP requests.
Figure 4: The parallel server.
We have developed a preliminary version of SWEB, a parallel HTTP server consisting of a set of collaborating processing units, each of which is capable of handling a user request. The distinguishing feature of SWEB is resource optimization through close collaboration among multiple processing units. Each processing unit is a workstation (e.g. a Sun SPARC or a Meiko CS-2 node) linked to a local disk. The disks are NFS-mounted on all processing units. The resource constraints affecting the performance of the server are: the CPU speed and memory size of a processing unit; the background load imposed by non-SWEB processes; the I/O bandwidth between a processing unit and its local disk; the network latency and bandwidth between a processing unit and a remote disk, when the accessed files are not stored on the local disk; and disk contention when multiple I/O requests access the same disk. Scalability of the server is achieved by actively monitoring the CPU, disk I/O, and network loads of the system resource units, and then dynamically scheduling user HTTP requests to an appropriate workstation for efficient processing.
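The scheduling idea described above can be sketched as a cost estimate per processing unit followed by a minimum search. This is only an illustration under stated assumptions: the additive cost model, the field names, and the default network parameters are hypothetical and are not taken from the SWEB implementation.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_ops_per_s: float  # raw CPU speed (hypothetical unit)
    cpu_load: float       # fraction of CPU taken by non-server processes, 0..1
    disk_mb_per_s: float  # bandwidth to the node's local disk
    disk_queue: int       # I/O requests already pending on that disk


@dataclass
class Request:
    file_mb: float  # size of the file to be read
    cpu_ops: float  # CPU work needed to process the request
    home: str       # name of the node whose local disk holds the file


def estimated_time(node, req, nodes, net_mb_per_s, net_latency_s):
    """Assumed additive cost model: CPU time + disk time + remote-access time."""
    cpu_time = req.cpu_ops / (node.cpu_ops_per_s * max(1e-6, 1.0 - node.cpu_load))
    owner = nodes[req.home]
    # Disk contention: pending requests share the owning disk's bandwidth.
    disk_time = req.file_mb * (owner.disk_queue + 1) / owner.disk_mb_per_s
    # A remote (NFS) read additionally pays network latency and transfer time.
    net_time = 0.0 if node.name == req.home else net_latency_s + req.file_mb / net_mb_per_s
    return cpu_time + disk_time + net_time


def choose_node(req, nodes, net_mb_per_s=1.0, net_latency_s=0.01):
    """Return the name of the node with the smallest estimated completion time."""
    best = min(nodes.values(),
               key=lambda n: estimated_time(n, req, nodes, net_mb_per_s, net_latency_s))
    return best.name
```

With a heavily loaded node `a` that stores the file and an idle node `b`, this model keeps a plain file fetch on `a` to avoid the remote read, but moves a CPU-intensive request to `b`, where the saved CPU time outweighs the network cost.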
Our experiments indicate that SWEB sustains its round-trip response time when the number of requests reaches 5 to 30 million per week. The following table shows the performance of SWEB on 6 Meiko CS-2 nodes, compared with single-node server performance, in accessing image files of average size 1.5MB when the number of requests varies from 2 to 20 per second. The total round-trip response time (in seconds) is improved significantly by using multiple processing units, and the response time does not change significantly when the number of requests per second (rps) increases. We have observed similar speedups using a multi-node server when varying the size of image files.
NCSA [8] has built a multi-workstation HTTP server based on round-robin domain name resolution to assign requests to workstations. This technique is effective when HTTP requests return relatively uniform-sized chunks of HTML. For ADL, however, the computational and I/O demands of requests may vary dramatically because of large images and variable-sized metadata. We have compared the round-robin approach to our load-balancing approach for different file sizes and have observed a 20% to 50% improvement in performance.
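The difference between the two assignment policies can be illustrated with a toy model. The sketch below assumes that all requests arrive as one batch and that the cost of each request is known in advance; neither holds for a real server, and the greedy least-loaded rule merely stands in for a load-aware scheduler rather than reproducing the SWEB algorithm.

```python
def round_robin(costs, num_nodes):
    """Assign request i to node i mod num_nodes, ignoring its cost."""
    loads = [0.0] * num_nodes
    for i, cost in enumerate(costs):
        loads[i % num_nodes] += cost
    return loads


def least_loaded(costs, num_nodes):
    """Assign each request to the node with the least accumulated work."""
    loads = [0.0] * num_nodes
    for cost in costs:
        target = loads.index(min(loads))  # ties go to the lowest-numbered node
        loads[target] += cost
        # A real scheduler would use measured load rather than known costs.
    return loads


if __name__ == "__main__":
    # Two expensive requests (e.g. large images) mixed with cheap ones.
    costs = [10, 1, 1, 10, 1, 1]
    print("round-robin loads: ", round_robin(costs, 3))
    print("least-loaded loads:", least_loaded(costs, 3))
```

For this input the busiest node accumulates 20 units of work under round-robin and 11 under the load-aware rule. The numbers are purely illustrative and unrelated to the measurements reported above, but they show why uniform request sizes favor round-robin while highly variable sizes do not.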