CS 502 Operating Systems                                          WPI, Spring 2006
Hugh C. Lauer                                                Project 4 (30 points)
Assigned: Monday, March 27, 2006                       Due: Monday, April 17, 2006
This assignment is an opportunity for you to use Unix sockets to build a real web server. You may use the sample code provided below as a starting point. Your server will actually serve Web pages. It will respond to real HTTP requests and reply with appropriate responses. You can test your server using a standard web browser. However, a part of this project is to also build a simple web client for testing your server and seeing what it is doing.
Your server will be started from a command line as follows:–
% server <directory name> [optional port #]
That is, the first argument is the name of a directory where the server will look for web pages. The second argument is optional; it is the port number by which a browser or your web client contacts the server. If you do not specify the second argument, your server should use a default port programmed into it. (However, see the note below.)
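As an illustration of handling the optional port argument, here is a minimal sketch. The function name parse_port and the default value 4242 are assumptions made for this example (4242 is simply the port used in the examples in this handout); your server may choose any default it likes.

```c
#include <errno.h>
#include <stdlib.h>

#define DEFAULT_PORT 4242   /* assumed default for this sketch; pick your own */

/* Returns the port to use, or -1 if the argument is not a valid port.
 * arg may be NULL, meaning "no port was given on the command line". */
int parse_port(const char *arg)
{
    char *end;
    long value;

    if (arg == NULL)
        return DEFAULT_PORT;

    errno = 0;
    value = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return -1;              /* not a number, or trailing junk */
    if (value < 1 || value > 65535)
        return -1;              /* outside the valid TCP port range */
    return (int)value;
}
```

A caller might write parse_port(argc > 2 ? argv[2] : NULL) and print a usage message when the result is -1.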
The server should first allocate a socket, bind() it to the port, and start listening using listen(). It should then go into a simple loop as follows:–
1. Wait for and accept() the next connection from a client and read the client’s request
2. Send information back to the client on the accepted connection
3. Close the accepted connection
4. Go back to step 1.
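The setup and the four-step loop above can be sketched roughly as follows. This is not the provided sample code; the names open_listener and serve_forever, the backlog of 5, and the caller-supplied handle function are all assumptions made for illustration.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a TCP socket bound to `port` on all interfaces and starts
 * listening. Returns the listening descriptor, or -1 on failure. */
int open_listener(int port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons((unsigned short)port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Steps 1-4: accept, handle, close, repeat. `handle` is a placeholder
 * for your own request-handling routine. Returns only when accept()
 * fails for a reason other than an interrupted system call. */
void serve_forever(int listen_fd, void (*handle)(int conn_fd))
{
    for (;;) {
        int conn_fd = accept(listen_fd, NULL, NULL);
        if (conn_fd < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal; try again */
            break;
        }
        handle(conn_fd);        /* read the request, send the reply */
        close(conn_fd);         /* step 3 */
    }
}
```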
The client will send an HTTP “GET” request specifying the web page that it wants. If you can find the web page, you will send it back to the client and then close the connection. If not, you must respond with an error before closing the connection.
Since your server will not be using the standard HTTP port, your client must explicitly specify the port that your server is serving. For example, if your port is 4242, and if your server is running on CCC4, then the URL for accessing the WPI admissions page would be
            http://ccc4.wpi.edu:4242/admissions.html
There are two HTTP rules that you must implement (and many others that you may ignore). First, each requested web page must be prefixed with the directory name in the argument of the server command line. For example, WPI’s web pages are stored in the directory
/www/docs
If this is specified on your command line, then you would look for the file
/www/docs/admissions.html
to serve the URL above. (Try this one; it seems to work.)
Second, if the requested web page either ends with a “/” character or resolves to the name of a directory, you must add “index.html” to the path name and search for that file. For example, if the client or browser specifies either of the URLs
http://ccc4.wpi.edu:4242/News or http://ccc4.wpi.edu:4242/News/Features
your server should serve one of the pages
/www/docs/News/index.html
/www/docs/News/Features/index.html
You will only be responsible for serving web pages that actually map to files. Some web pages invoke scripts – for example, WPI’s home page at
/www/docs/index.html
Your server will be able to respond with the html text in the file, but it will not know how to react to the further communication that the script initiates.
You should return an error for all requests that do not map to regular files after following these two mapping rules.
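One possible way to express the two mapping rules is sketched below. The function name map_request and its return convention are assumptions for this example. The rejection of ".." is an extra precaution that the rules above do not require, added here because the server should not trust its input.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Builds the file name for a requested URL path under `root`.
 * Returns 0 and fills `out` if the result is a regular file, else -1. */
int map_request(const char *root, const char *url_path,
                char *out, size_t out_size)
{
    struct stat st;
    int n;

    /* Extra precaution (not one of the two rules): refuse paths that
     * are not absolute or that could climb out of root via "..". */
    if (url_path[0] != '/' || strstr(url_path, "..") != NULL)
        return -1;

    /* Rule 1: prefix the request with the server's directory name. */
    n = snprintf(out, out_size, "%s%s", root, url_path);
    if (n < 0 || (size_t)n >= out_size)
        return -1;

    /* Rule 2: a trailing "/" or a directory means "use index.html". */
    if (out[n - 1] == '/' ||
        (stat(out, &st) == 0 && S_ISDIR(st.st_mode))) {
        const char *suffix =
            (out[n - 1] == '/') ? "index.html" : "/index.html";
        if ((size_t)n + strlen(suffix) >= out_size)
            return -1;
        strcat(out, suffix);
    }

    /* Only requests that map to regular files are served. */
    if (stat(out, &st) != 0 || !S_ISREG(st.st_mode))
        return -1;
    return 0;
}
```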
Note that Unix and Linux have a rule about programs that bind sockets to ports, namely that port numbers may not be re-used in rapid succession. I.e., if your program binds to port #4242, then after it terminates, you cannot immediately rerun it and bind to the same port again. This rule is instituted to allow time for stale references to the port to flush themselves from the network.
One other thing you have to do is to figure out a way of exiting from your server cleanly.
You may use as a starting point the sample code on
http://www.cs.wpi.edu/~cs502/s06/CodeFragments/sockserver.c
The relevant socket functions are socket() to create the socket, bind() to bind the socket to a port, listen() to create a request queue and to start listening for requests, and accept() to accept a connection and create a new socket on which to reply to that connection.
Once you have accepted a connection, your server needs to read and handle an HTTP request. The following is an example generated by a browser for the page index-t.html located on ccc1.wpi.edu at port #4242. The first line of the request contains the type of request. You will only need to recognize and handle the GET request. Following the GET request is the name of the object being requested and the HTTP version. You must extract the name of the object, and you may ignore the HTTP version.
The remaining lines are HTTP request headers. You may ignore them, but you still need to read them. Your server should keep reading lines until it encounters a blank line.
GET /index-t.html HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.7 [en] (X11; U; SunOS 5.7 sun4u)
Host: ccc1.wpi.edu:4242
Accept: image/gif, image/x-xbitmap, image/jpeg, image/png, */*
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8
To aid you in reading the request a line at a time, the routine sockreadline() has been provided. You may find this at
http://www.cs.wpi.edu/~cs502/s06/CodeFragments/sockreadline.c
This routine receives a character at a time from a given socket and stores these characters in a NULL-terminated character buffer. It returns when the newline (\n) character is reached. The HTTP specification expects that all lines are terminated with a CR (carriage return) character followed by a LF (line-feed) character. In C and C++, these are represented as “\r” and “\n,” respectively, and they are referred to below as “CR/LF.”
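Once a line has been read into a buffer (for example by sockreadline(), whose exact interface is not assumed here), the object name can be extracted along the following lines. The names parse_request_line and is_blank_line, and the 1023-character limit on the object name, are assumptions for this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Extracts the object name from a request line such as
 * "GET /index-t.html HTTP/1.0\r\n". The HTTP version is ignored.
 * Returns 0 on success, -1 if the line is malformed or not a GET. */
int parse_request_line(const char *line, char *path, size_t path_size)
{
    char method[8];
    char target[1024];

    /* The field widths keep sscanf from overrunning either buffer;
     * %s stops at whitespace, so a trailing CR/LF is never copied. */
    if (sscanf(line, "%7s %1023s", method, target) != 2)
        return -1;
    if (strcmp(method, "GET") != 0)
        return -1;
    if (target[0] != '/' || strlen(target) >= path_size)
        return -1;
    strcpy(path, target);
    return 0;
}

/* The request headers end at the first line that is empty once the
 * line terminator is ignored. */
int is_blank_line(const char *line)
{
    return line[0] == '\0' ||
           strcmp(line, "\r\n") == 0 ||
           strcmp(line, "\n") == 0;
}
```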
There are many HTTP response codes, but for this project we will use only two: 200 and 404. If you receive a GET request for an object that can be successfully mapped to a file, then you should open the file for reading using the system call open(). If you can successfully open the file, then your server should first send the HTTP response
HTTP/1.0 200 OK\r\n\r\n
indicating success followed by a blank line. Note that two sets of CR/LF characters are sent, the first to terminate the HTTP line and the second to represent the blank line. Subsequently, you should use the read() function to read the contents of the file and send it on to the connection. (The reason for using read() rather than text-based I/O routines is that not all of the content is guaranteed to be text.) When you have completed reading, use close() to close the file and to close the socket connection. Your server is now done handling the request, and it is ready to wait for the next request.
If the request is not valid or the object cannot be mapped to a file and successfully opened, then your server should send back to the client the response
HTTP/1.0 404 Not Found\r\n\r\n
This indicates failure. You should then close the socket connection and wait for the next connection.
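The two responses might be produced as in the sketch below. The names write_all and send_file, the 4096-byte buffer, and the convention of returning the status code are assumptions for illustration. The helper loop is there because write() on a socket may accept fewer bytes than requested.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Writes exactly len bytes, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sends "200 OK" followed by the file contents, or "404 Not Found" if
 * the file cannot be opened. Returns the status code that was sent,
 * or -1 on an I/O error. The caller closes conn_fd afterwards. */
int send_file(int conn_fd, const char *filename)
{
    static const char ok[] = "HTTP/1.0 200 OK\r\n\r\n";
    static const char not_found[] = "HTTP/1.0 404 Not Found\r\n\r\n";
    char buf[4096];             /* local, so nothing is shared */
    ssize_t n;
    int file_fd = open(filename, O_RDONLY);

    if (file_fd < 0)
        return write_all(conn_fd, not_found, strlen(not_found)) == 0
                   ? 404 : -1;

    if (write_all(conn_fd, ok, strlen(ok)) < 0) {
        close(file_fd);
        return -1;
    }
    /* read() rather than text I/O: the content may be binary. */
    while ((n = read(file_fd, buf, sizeof(buf))) > 0) {
        if (write_all(conn_fd, buf, (size_t)n) < 0) {
            close(file_fd);
            return -1;
        }
    }
    close(file_fd);
    return n < 0 ? -1 : 200;
}
```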
To test your server, you can use a standard web browser. However, to make testing easier and to be able to see the response headers, you should create a simple Web client. This client should connect to a given port on a given host and send a minimal request string. Use command line arguments to control your client. For example:–
% webclient ccc1 4242 /News/index.html
can be used to request the object from port 4242 on the machine CCC1. Your simple client will need to connect to the port and send the GET line, patterned after the one above and ending with CR/LF. It should then follow the request with a blank line — i.e., a standalone CR/LF. You may use “HTTP/1.0” as the version. Your client should then receive back the response headers and content from the server and print them to the standard output stream.
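The connection and request steps of such a client could look roughly like this. It is a sketch, not the sample client: the names connect_to and build_request are hypothetical, and getaddrinfo() is used here for host lookup as one reasonable choice.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Connects to host:port over TCP. Returns a connected socket, or -1. */
int connect_to(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* Try each returned address until one connects. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Formats the GET line plus the blank line into buf.
 * Returns the request length, or -1 if it does not fit. */
int build_request(char *buf, size_t size, const char *path)
{
    int n = snprintf(buf, size, "GET %s HTTP/1.0\r\n\r\n", path);

    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

After sending the request, the client would loop on read() from the socket and write each chunk to standard output until read() returns 0.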
Beware of requesting images with your simple client, because the content will likely not print very well. You may test your web client with any standard web server by sending to the well-known web server port 80.
For your reference, a sample web client can be found on
http://www.cs.wpi.edu/~cs502/s06/CodeFragments/sockclient.c
This does not do exactly what is requested, but it should serve as guidance for how to build your web client.
The web server and client together are worth 20 points of the 30 points of this project.
For an additional five points on the project, modify your web server to fork a new process to handle each request. Let the child process use the connection socket returned by accept() while the parent process returns to the top of the loop to wait for the next connection. This will allow multiple requests to be handled in parallel.
This modified web server should be invoked the same way as your original server, but with the optional additional argument fork, i.e.,
% server <directory name> [optional port #] [fork]
This part should be very straightforward, but you need to be sure that the child processes exit cleanly and close their own sockets. Also, child processes that terminate become “zombies” until the parent waits for them. Use wait3() for this purpose. An example code fragment is
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>

pid_t pid;
int status;
struct rusage usage;
/* Reap every child that has already exited, without blocking. */
while ((pid = wait3(&status, WNOHANG, &usage)) > 0)
   /* loop */ ;
Test your server by having several web clients or browsers requesting different pages at the same time from different windows.
For the final five points on the project, modify your web server to spawn a new thread to handle each request. Let the spawned thread use the connection socket returned by accept() while the main thread returns to the top of the loop to wait for the next connection. This will allow multiple requests to be handled in parallel. Your modified web server would be invoked by the command line
% server <directory name> [optional port #] [thread]
Remember that sockets and file descriptors belong to the process, not to individual threads. A thread must remember to close its own socket and its own file, but it must not close any other sockets or files and it must not leave any sockets or files lying around. Also, since the threads all run in the same address space, they must avoid sharing buffers or variables.
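A thread-per-connection version might be sketched with POSIX threads as below. The names handle_in_thread and connection_thread are hypothetical, and handle_connection is again only a stand-in that replies 404. Each thread receives its own heap-allocated copy of the socket descriptor, so the main thread's variable can be reused for the next accept() without being shared.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Placeholder handler for this sketch: always replies 404. */
static void handle_connection(int conn_fd)
{
    static const char reply[] = "HTTP/1.0 404 Not Found\r\n\r\n";

    if (write(conn_fd, reply, sizeof(reply) - 1) < 0) {
        /* A real handler would report the error. */
    }
}

/* Thread entry point: owns the argument block and the socket. */
static void *connection_thread(void *arg)
{
    int conn_fd = *(int *)arg;

    free(arg);                  /* private copy made by the main thread */
    handle_connection(conn_fd);
    close(conn_fd);             /* close only this thread's socket */
    return NULL;
}

/* Starts a detached thread to serve conn_fd. Returns 0 on success.
 * On failure returns -1 and closes conn_fd. */
int handle_in_thread(int conn_fd)
{
    pthread_t tid;
    int *arg = malloc(sizeof(*arg));

    if (arg == NULL) {
        close(conn_fd);
        return -1;
    }
    *arg = conn_fd;
    if (pthread_create(&tid, NULL, connection_thread, arg) != 0) {
        free(arg);
        close(conn_fd);
        return -1;
    }
    pthread_detach(tid);        /* no join needed; resources free on exit */
    return 0;
}
```

Depending on the system, the program may need to be linked with the pthread library (for example, the -pthread option of gcc).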
This project is NOT a team project; it is to be done individually.
As always, you should assume that you are writing code suitable for inclusion in an operating system. For example, you should never assume that the user or the user’s browser submits correct input.
All code must be clearly commented. All output and printouts must be easy to understand and cleanly formatted.
When you do later parts of the project, be sure that you do not corrupt earlier, previously working parts. You may do this by making a copy of the code before developing the later part or by retesting the original code on the earlier part.
For this assignment, please use the turnin program, i.e., the command line tool for turning in assignments on CCC computers. Information about this tool can be found on
http://web.cs.wpi.edu/Help/turnin.html
This class is ‘cs502’, and the assignment is ‘project4’. Therefore, the “turnin” command would be
            /cs/bin/turnin submit cs502 project4 <your files>
Your submission should include
1. The files containing the code for all parts of this assignment.
2. The makefiles for building the executable programs and information on the system on which your program is built.
3. The test files or input that you use.
4. Files that capture the input and output for building and running and testing the programs.