Web Server Organization

Harvest

See the Chankhunthod et al. paper on Harvest from USENIX'96, a widely cited paper on early Web server experience.

Squid

A commonly used proxy server and the next version of Harvest. Why the name Squid? ``All the good ones are taken'' (Harris' Lament). http://www.squid-cache.org/

Like Harvest it:

Available for most (all?) Unix platforms.

Versions 1.0 and 1.1 have lots of implementation features. Some of the highlights:

Private (single client) vs. public objects. Only public objects are saved on disk as part of cache.

Can use ICMP (ping) to determine nearest parent cache.

Cache Coherency

If-Modified-Since (IMS) GET. If an IMS GET is received, it is handled in one of three ways:

Object Purge Policy

Uses an LRU replacement algorithm for cached (on disk) objects. Its aggressiveness in purging depends on how much store swap space is available--the less space, the more aggressive it is in purging objects.

All objects within a hash bucket that exceed an LRU age threshold are purged. The entire cache is scanned every 24 hours.
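
The purge policy above can be sketched as follows. This is only an illustration, not Squid's code: the function names, the linear threshold rule, and the default ages are assumptions, since these notes do not give the actual formula.

```python
def lru_age_threshold(used_fraction, min_age=3600.0, max_age=30 * 86400.0):
    """Return the age (seconds) beyond which an unreferenced object is purged.

    Hypothetical linear rule: an empty store tolerates max_age, a full
    store only min_age, so less free space means more aggressive purging.
    """
    used_fraction = min(max(used_fraction, 0.0), 1.0)
    return max_age - used_fraction * (max_age - min_age)


def purge_bucket(bucket, now, threshold):
    """Purge every entry in one hash bucket whose LRU age exceeds threshold.

    bucket maps URL -> time of last reference (seconds).
    Returns the list of purged URLs.
    """
    stale = [url for url, last_ref in bucket.items() if now - last_ref > threshold]
    for url in stale:
        del bucket[url]
    return stale
```

A full scan would call purge_bucket() on each bucket in turn, spread over the scan period, recomputing the threshold as the store fills or drains.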

Memory Use

Rough allocation of memory (assuming machine only runs the Squid server):

Multithreading

What about using real multithreading? Could then be used effectively on a multi-processor.

Theoretically it looks straightforward, but the code is now too complex to seriously consider rewriting. It would trade one set of problems for another.

Cache Digests

Caches share compact digests with other caches. Based on Bloom Filters. Paper: fan:sigcomm98.
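
A minimal sketch of the Bloom filter idea behind a digest. The class name, the bit-array size, the number of hash functions, and the use of SHA-256 are all assumptions made for illustration; they are not taken from Squid or from the paper.

```python
import hashlib


class CacheDigest:
    """Compact summary of the URLs a cache holds (a Bloom filter)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, url):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{url}".encode("utf-8")).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, url):
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, url):
        # False means definitely absent; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(url))
```

A cache would send the bits array to its neighbors, who can then test a URL locally before deciding whether to query that cache. A false positive costs one wasted request; there are no false negatives as long as the digest is up to date.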

2.0 Features

Web Server Benchmarks

Look at WebSTONE and SPECweb.

Generating Server Load

See the Banga and Druschel paper from USITS97. They found that clients do not scale because they back off while waiting for a TCP connection, so they built more scalable clients. Slides:
http://www.cs.wpi.edu/~cs535/s03/banga:usits97/

Operating Systems Support for Busy Internet Servers

Paper in the HotOS'95 conference by Jeff Mogul. Based on Digital's experience in running large Internet information servers (IISs), particularly the 1994 California election service.

Characteristics of IISs distinguishing them from distributed systems applications:

Makes a comparison with transaction-processing systems in handling lots of short-duration requests, but notes that IISs don't require frequent, fast and synchronized updates of stable storage.

Other Observations

Lack of benchmarks (particularly high load).

How does the cost of a fork() compare with the cost of a select() handling a large number of file descriptors? The latter approach is used in Harvest/Squid.
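
To make the select() alternative concrete, here is a small sketch of a single process servicing whichever descriptors are ready, instead of forking a process per connection. It is not Harvest or Squid code: serve_ready() and demo() are hypothetical names, and upper-casing the data stands in for real request handling.

```python
import selectors
import socket


def serve_ready(sel, timeout=1.0):
    """Service every registered descriptor that is readable; return the count."""
    handled = 0
    for key, _events in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            conn.sendall(data.upper())  # stand-in for real request handling
        else:
            # Peer closed the connection: stop watching this descriptor.
            sel.unregister(conn)
            conn.close()
        handled += 1
    return handled


def demo():
    """Drive one request through the loop using a local socket pair."""
    sel = selectors.SelectSelector()  # select()-based, as discussed above
    client, server_side = socket.socketpair()
    try:
        sel.register(server_side, selectors.EVENT_READ)
        client.sendall(b"get /")
        serve_ready(sel)
        return client.recv(4096)
    finally:
        sel.close()
        client.close()
        server_side.close()
```

A real server would call serve_ready() in a loop with many registered connections. The cost question is then how the per-call work of select() grows with the number of descriptors, versus paying for a fork() on every request.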

Notes potential problems of IIS systems in overload situations, leading to livelock.

Lack of scaling of some OS facilities--such as linear search for PCB entries versus more efficient approaches.
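
The PCB lookup point can be illustrated by contrasting a linear scan with a hash table keyed on the connection 4-tuple. Hashing is assumed here as one example of a "more efficient approach"; these notes do not say which alternative the paper favors, and the record layout and function names below are hypothetical.

```python
def find_pcb_linear(pcbs, key):
    """Scan the PCB list front to back: O(n) work per incoming packet."""
    for pcb in pcbs:
        if pcb["key"] == key:
            return pcb
    return None


def build_pcb_index(pcbs):
    """Map each connection 4-tuple to its PCB: O(1) expected lookup."""
    return {pcb["key"]: pcb for pcb in pcbs}
```

With thousands of open connections on a busy server, the linear scan touches on average half the list for every packet, while the indexed lookup stays roughly constant.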

Operating System ``Wish List''

Apache Web Server

http://www.apache.org. Comparisons of servers at http://webcompare.internet.com

Based on the NCSA httpd 1.3 (early 1995) server. It is a Unix-based HTTP server. Most popular WWW server on the Internet.

The result of a group of core contributors applying patches to the original server. The core group continues to control development.

A PAtCHy server--hence Apache.

Lots of features.