CS3013 Project 1

Slicing Time

Due date: Friday, September 12th by 11:59pm

Context switches occur when the operating system switches from running one process to running another. You are to modify the Linux kernel to record context-switch information for each process. You are then to design experiments to measure the time slice quantum, the context-switch time, and the time to create and destroy processes in the Linux kernel.

Description

Getting Resource Usages

The getrusage() system call returns system resource information about a process. The rusage structure it uses has a number of fields, such as time used, messages sent, page faults, and context switches, not all of which are filled in by a given operating system. In particular, the Linux kernel does not record per-process context switches. In this project, you will extend the Linux kernel implementation so that the getrusage() system call returns information about context switches.

The getrusage() system call is located in linux/kernel/sys.c. The system call itself is sys_getrusage() which calls the internal function getrusage(). The filled in fields of the rusage structure come from the structure task_struct. The struct task_struct contains information about each process (task) in the system and is located in linux/include/linux/sched.h. The task_struct fields are modified when a process is created and exits, in linux/kernel/fork.c and linux/kernel/exit.c respectively. They are also modified in linux/kernel/timer.c and in the directory arch/i386/mm.

You need to extend the functionality of getrusage() to return meaningful values for voluntary and involuntary context switches. You will need to add fields to the struct task_struct to keep track of context switches for each process, both for the process itself and for its children. You can model your changes on how minor and major page faults are handled, although the method of counting them is different (see below).

Counting Context Switches

The scheduler is a kernel function called schedule() (located in linux/kernel/sched.c), that gets called from other system call functions (for example, when a process goes to sleep waiting for I/O), after every system call and after some interrupts. When invoked, the scheduler:

Performs some basic periodic tasks, like handling interrupt service routines (not a concern of this project)
Chooses one process to execute according to the scheduling policy
Dispatches the chosen process to run

You do not need to be concerned about specific scheduling policies for this project.

Linux maintains a global counter, kstat.context_swtch, that is incremented whenever a context switch occurs. This increment occurs in the schedule() function, when the process identified by the task_struct pointer variable prev switches to the process identified by the task_struct pointer variable next (where prev and next are different). At this point you can insert a statement to increment the total number of context switches (both voluntary and involuntary) for the process pointed to by prev. Use prev rather than next because you are keeping track of the number of context switches from a process rather than to a process. In Linux, processes that are running or ready to run have the state in their task_struct set to TASK_RUNNING. So, at this point, a voluntary context switch means the process is in a state other than TASK_RUNNING (most likely waiting for I/O). Therefore, if prev->state is not TASK_RUNNING, the voluntary context switch count can be incremented, too.

Once you have your kernel changes implemented, you should be able to verify that they work by writing some user level programs and using getrusage(), or fork() and exec() if you wish to measure the context switches of other system programs.

Experiments

After you have implemented your getrusage() changes and debugged them carefully, you will then design experiments to measure: 1) the amount of time required to perform a context switch, 2) the amount of time required to create and destroy a process, and 3) how long a time slice is (for a CPU-intensive process). Since the time scales for these operations are very small (typically smaller than the clock granularity of 10 ms), you will measure the time for many operations and then divide by the number of operations performed. Also, you will need to do multiple runs in order to account for any variance in the data between runs.

To measure the time for context switching, consider using a basic CPU-intensive process that counts up to a large number (about 2000000000 on a typical Fossil client). If you run two or more such processes at the same time, they will together take longer to finish than the same counts run one after another. This extra time is the overhead contributed by the context switches.

Process creation in Linux is done via the fork() system call (check out the sample fork.c). You can use the wait() system call to have the parent process block until a child process has exited. Do a man fork or man 2 wait for more information.

When your experiments are complete, you must turn in a brief (1-2 page) write-up with the following sections:

Design - describe your experiments, including: a) what programs/scripts you ran (use pseudo-code); b) how many runs you performed; c) how you recorded your data; d) what the system conditions were like; e) and any other details you think are relevant.
Results - clearly depict your results using a series of tables or graphs. Provide statistical analysis including at least mean and standard deviation.
Analysis - interpret the results. Briefly describe what the results mean and what you think is happening and any subjective opinions you may have.

Hints

You will probably need to modify include/linux/sched.h to add information to struct task_struct. When you add to struct task_struct, you also need to change the INIT_TASK macro (also in sched.h) to be sure the initial values are in place. Also, note that sched.h has a lot of files depending upon it, meaning there will be a lot that need recompilation every time you modify it. So, change sched.h as few times as possible (design twice, compile once).

When writing kernel code, you will want to print messages, as you do with printf() in user programs. Since many parts of the kernel may not have access to the stdio library, kernel developers wrote their own version of printf() called printk(). printk() basically behaves the same as printf(), in terms of formatting. Furthermore, printk() also writes messages to the log file /var/log/messages, so you can view output there in case your modified OS crashes. You might add prefixes to your printk() messages, such as "MLC: " or "Fossil: " so you can more easily pick out your messages from the log file (using grep, perhaps). But be careful! If you have printk() messages in a part of the kernel that is accessed frequently (like the scheduler) it can fill up your log file quickly. When this happens, your system can become unstable. Check the size of your log file (using ls -l) and the free disk space (using df) frequently.

Remember to save your work frequently in case you crash your machine or need to "roll-back" to a previous working source code version! Refer to http://fossil.wpi.edu/ for more information on how to do this and general use of the Fossil lab and other useful Linux links.

The following system calls might be useful:

fork() -- to create a new process.
getrusage() -- to get information about resource utilization.
gettimeofday() -- to get the wall-clock time.
wait() -- to wait for a process to terminate.
execve() -- to execute a file. The call execvp() may be particularly useful.

When running your experiments, you need to be careful of processes in the background (say, a Web browser downloading a page or a compilation of a kernel) that may influence your results. While multiple data runs will help spot periods of extra system activity, try to keep your system "quiet" so the results are consistent (and reproducible). You may consider running your experiments on a system with a minimal number of other system processes. An easy way to do this on Linux (and other versions of Unix) is via the runlevels. The command telinit (run as root by using sudo) will put the computer into different runlevels, with level "1" (single user mode) being a good candidate. Do a man telinit or a man inittab for more information.

If you find yourself struggling, you might proceed carefully through the following steps:

Write a test program that correctly executes the default, unmodified getrusage() system call. You might make several versions of the test program that do different amounts of computation vs. I/O to observe how the getrusage() values vary.
Familiarize yourself with the getrusage() system code and related routines that modify rusage values. Use printk() statements as needed to build up confidence where to add your modifications.
Add fields into the struct task_struct to record context switches. You need fields for both the process itself and its children. Once you have the structure changes in place, just initialize the values to a fixed, non-zero value, such as one, so you can verify your code is working. When you call getrusage() at this point it will just return this fixed value. Your code should accumulate values for child processes when these processes exit (as done for other fields in linux/kernel/exit.c). Test your code with a process that creates many child processes and you should see the number of context switches increase for each forked child process.
Modify linux/kernel/sched.c to properly record context switches. You may use printk() statements here to build up confidence, but this code is a core part of the operating system and will result in numerous log messages so pay attention to the size of the log file.
Proceed with the project experiments.
Turn in the project. Relax!

Hand In

You must hand in the following:

All modified source code files for your solution (for example, the entire sched.c and sys.c files).
A compiled version of your kernel.
Instructions on how to incorporate your code into the kernel tree and compile it.
Your experiments. Include:
1. source code of experiment programs/scripts and brief instructions on how to run them
2. a table or graph of your measured results
3. your writeup of design and analysis

The turnin (/cs/bin/turnin) name for this project is "proj1". When you turn in, also include a file "group.txt" that contains the following:

        group_name
        login_name1  last_name1, first_name1
        login_name2  last_name2, first_name2
        ...

Also, before you turn in, tar up (with gzip) your files. For example:

        mkdir proj1
        cp * proj1    # copy all your files to submit to proj1 directory
        tar czf proj1.tgz proj1

then:

        scp proj1.tgz login_name@ccc:~/
        ssh login_name@ccc    # will ask for your ccc password
        /cs/bin/turnin submit cs3013 proj1 proj1.tgz

Send all project questions to the TA mailing list.

Send all Fossil administrative questions to the Fossil mailing list.