Look at the Chankhunthod et al. paper on Harvest from USENIX '96. Widely cited paper on early Web server experience.
Squid is a commonly used proxy server. It is the next version of Harvest.
Why Squid? "All the good ones are taken" (Harris' Lament).
http://www.squid-cache.org/
Like Harvest it:
Available for most (all?) Unix platforms.
Lots of implementation features in versions 1.0 and 1.1. Some of the highlights:
Private (single client) vs. public objects. Only public objects are saved on disk as part of cache.
Can use ICMP (ping) to determine nearest parent cache.
If-Modified-Since (IMS) GET. If an IMS GET is received, it is handled in one of three ways:
Uses an LRU replacement algorithm for cached (on disk) objects. Its aggressiveness in purging depends on how much store swap space is available--the less space, the more aggressive it is in purging objects.
All objects within a hash bucket that exceed an LRU age threshold are purged. The entire cache is scanned every 24 hours.
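The purge policy above can be sketched roughly as follows. This is a minimal illustration, not Squid's code: the function names, the linear mapping from free space to age threshold, and the `min_age`/`max_age` defaults are all assumptions made for the example.

```python
def lru_age_threshold(free_fraction, max_age=30 * 24 * 3600.0, min_age=3600.0):
    """Hypothetical policy: the less free swap space, the smaller the age
    threshold, so purging becomes more aggressive as the store fills up."""
    free_fraction = min(max(free_fraction, 0.0), 1.0)
    return min_age + (max_age - min_age) * free_fraction


def purge_bucket(bucket, now, threshold):
    """Purge every object in one hash bucket whose last reference is older
    than the threshold. `bucket` maps URL -> last-reference time (seconds).
    Returns the list of purged URLs."""
    stale = [url for url, last_ref in bucket.items() if now - last_ref > threshold]
    for url in stale:
        del bucket[url]
    return stale
```

Walking the buckets one at a time spreads the scan out, so a full pass over the cache can be completed within the 24-hour period without one long pause.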
Rough allocation of memory (assuming machine only runs the Squid server):
What about using real multithreading? Could then be used effectively on a multi-processor.
Theoretically looks straightforward. Code is too complex now to seriously consider re-writing. Trade one set of problems for another.
Caches share compact digests with other caches. Based on Bloom Filters. Paper: fan:sigcomm98.
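A digest of this kind can be sketched as a small Bloom filter. The class name, the bit-array size, the number of hash functions, and the use of SHA-256 are assumptions for illustration only; they are not the digest format Squid actually exchanges.

```python
import hashlib


class CacheDigest:
    """Toy Bloom filter summarizing which URLs a cache holds."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, url):
        # Derive num_hashes bit positions by salting the URL with an index.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{url}".encode("utf-8")).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, url):
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, url):
        # False means definitely absent; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(url))
```

A neighbor that receives `bits` can test a URL locally before sending a query. A false positive costs one wasted request to the peer, while a negative answer is always correct for the digest as it stood when it was built.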
Look at WebSTONE and SPECweb.
Look at the Banga and Druschel paper from USITS '97. They found that benchmark clients do not
scale because they back off while waiting for TCP connections. Their answer is to build more
scalable clients.
Slides:
http://www.cs.wpi.edu/~cs535/s03/banga:usits97/
Paper in the HotOS '95 conference by Jeff Mogul. Based on Digital's experience in running large Internet information servers (IISs), particularly the 1994 California election service.
Characteristics of IISs distinguishing them from distributed systems applications:
Makes comparison with transaction-processing systems in handling lots of short-duration requests, but notes they don't require frequent, fast and synchronized updates of stable storage.
Lack of benchmarks (particularly high load).
How does the cost of a fork() per request compare with the cost of a select() handling a large number of file descriptors? The latter approach is used in Harvest/Squid.
Notes potential problems of IIS systems in overload situations, leading to livelock.
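For comparison, the select() style can be sketched as a single process that answers whichever connections have a request waiting. The function name `serve_ready` and the `handle` callback are hypothetical, and a real server would also accept new connections and cope with partial reads and writes.

```python
import select
import socket


def serve_ready(connections, handle, timeout=1.0):
    """One pass of a select() loop: answer every connection that has a
    request waiting, all within a single process (no fork per request).
    Returns the number of connections that were readable."""
    readable, _, _ = select.select(connections, [], [], timeout)
    for conn in readable:
        request = conn.recv(4096)
        if request:
            conn.sendall(handle(request))
    return len(readable)
```

The trade-off the question points at: fork() pays process creation for each request, while select() pays a scan over the descriptor set on each call, which grows with the number of open connections.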
Lack of scaling of some OS facilities--such as linear search for PCB entries versus more efficient approaches.
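The PCB (protocol control block) point can be illustrated with a toy lookup keyed on the connection 4-tuple. The dictionary layout and function names are assumptions for the sketch and do not reflect any particular kernel's data structures.

```python
def linear_lookup(pcbs, key):
    """Scan the whole PCB list; cost grows with the number of connections."""
    for pcb in pcbs:
        if pcb["key"] == key:
            return pcb
    return None


def build_index(pcbs):
    """Hash the PCBs by their 4-tuple so a lookup takes roughly constant time."""
    return {pcb["key"]: pcb for pcb in pcbs}


# Hypothetical key: (local addr, local port, remote addr, remote port).
pcbs = [{"key": ("10.0.0.1", 80, "10.0.0.2", 1024 + i), "state": "ESTABLISHED"}
        for i in range(1000)]
index = build_index(pcbs)
wanted = ("10.0.0.1", 80, "10.0.0.2", 2023)
```

With thousands of short-lived connections, every arriving packet pays for the lookup, so a per-packet linear scan is the kind of cost that stays hidden at low load and dominates at high load.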
http://www.apache.org. Comparisons of servers at
http://webcompare.internet.com
Based on the NCSA httpd 1.3 server (early 1995). It is a Unix-based HTTP server. Most popular WWW server on the Internet.
Grew out of patches made to the original server by a group of core contributors. The core group continues to control development.
A PAtCHy server--hence Apache.
Lots of features.