CS3013 Project 4a

Process/Thread Accounting

Due date: Tuesday, October 11th by 11:59pm

This project is designed to give you in-depth experience modifying the parts of the Linux kernel that handle process and thread creation.

Index

Process/Thread Accounting
Hints
Hand In

Process/Thread Accounting

Counting New Processes and New Threads

You are to add code to the Linux kernel that allows tracking of the number of copies a process makes of itself and the number of threads it creates.

In Linux, a new process is created with fork(), and a new thread is created with clone(). The clone() system call is not designed to be called directly by a typical user process; it is normally used through a user-level thread library, such as the POSIX pthreads library.

Internally, the kernel routine do_fork() implements both fork() and clone(). The do_fork() routine is found in linux/kernel/fork.c.

In do_fork(), near the top of the function, the new task_struct is copied from the parent's task_struct. At this point, you can add code to keep track of duplicates. Later on, there is a comment reading "copy all the process information". Here, helper routines copy the process information that is not shared (if the child is a new process). In particular, copy_mm() determines whether the child gets its own copy of the address space (a new process) or shares the parent's address space (a new thread). You can add code there to track thread creation.

Getting Resource Usages

The getrusage() system call returns information about the system resources used by a process. The rusage structure it fills in has a number of fields, such as CPU time used, messages sent, page faults, and context switches (not every operating system fills in all of them). In particular, the Linux kernel does not record per-process context switches, so you can reuse the context switch fields of getrusage() to return the number of processes and threads created. Make the voluntary context switch field (ru_nvcsw) represent the number of processes created and the involuntary context switch field (ru_nivcsw) represent the number of threads created.

The getrusage() system call is implemented in linux/kernel/sys.c. The system call entry point, sys_getrusage(), calls the internal function getrusage(). The filled-in fields of the rusage structure come from struct task_struct, which holds information about each process (task) in the system and is defined in linux/include/linux/sched.h. The task_struct fields are updated when a process is created and when it exits, in linux/kernel/fork.c and linux/kernel/exit.c respectively. They are also updated in linux/kernel/timer.c and in the arch/i386/mm directory, but you should not need to change those files for this project.

You need to extend getrusage() to return meaningful values for the number of threads and processes created. You will need to add fields to struct task_struct to keep track of the threads and processes created by each process and by its children. You can model your changes to getrusage() on how minor and major page faults are handled, although the events themselves are counted in a different place (in do_fork(), as described above).

Once your kernel changes are implemented, you should be able to verify that they work by writing some user-level programs that call getrusage() together with fork() and pthread_create(). See the samples section on the cs3013 Web page.

Write a small program that uses fork() and exec() together with getrusage() to observe the number of processes and threads that various user applications create. In particular, try running make on the kernel (or another large software project) and running a Web browser. Describe the tasks you performed and the number of processes and threads created for each.

Hints

To track the number of processes and threads created, you will probably need to modify include/linux/sched.h to add information to struct task_struct. When you add to struct task_struct, you also need to change the INIT_TASK macro (also in sched.h) to make sure the initial values are in place. Also, note that a large number of files depend on sched.h, so many files must be recompiled every time you modify it. Change sched.h as few times as possible (design twice, compile once).

The following system calls and library functions might be useful:

fork() -- to create a new process.
getrusage() -- to get information about resource utilization.
pthread_create() -- to create a new thread.

If you find yourself struggling, you might proceed carefully through the following steps:

Write a test program that correctly calls the default, unmodified getrusage() system call. You might make several versions of the test program that do different amounts of computation vs. I/O to observe how the getrusage() values vary.
Familiarize yourself with the getrusage() system call code and the related routines that modify rusage values. Use printk() statements as needed to build confidence about where to add your modifications.
Add fields to struct task_struct to record the number of processes and threads created, both by the process itself and by its children. Once the structure changes are in place, initialize the values to a fixed, non-zero value, such as one, so you can verify that your code is working; at this point, getrusage() will simply return this fixed value. Your code should accumulate the values of child processes when those processes exit (as is done for other fields in linux/kernel/exit.c, in the release() function). Test your code with a process that creates many child processes; you should see the reported number of context switches increase for each forked child process.
Modify linux/kernel/fork.c to properly record processes and threads. You may use printk() statements here to build up confidence. Verify your changes with simple test programs.
Proceed with the questions/evaluation.
Turn in the project. Relax!

Hand In

You must hand in the following:

All modified source code files for your solution (for example, the entire fork.c).
A compiled version of your kernel.
Instructions on how to incorporate your code into the kernel tree and compile it.
All user-level programs written in your evaluations/tests.
Answers to your questions/evaluation.

The turnin (/cs/bin/turnin) project name is "proj4". When you turn in, also include a file named "group.txt" that contains the following:

        group_name
        login_name1  last_name1, first_name1
        login_name2  last_name2, first_name2
        ...

Also, before you turn in, tar up your files and compress them with gzip. For example:

        mkdir proj4
        cp * proj4    # copy all the files you are submitting into the proj4 directory
        tar czf proj4.tgz proj4

then:

        scp proj4.tgz login_name@ccc:~/
        ssh login_name@ccc    # prompts for your ccc password
        /cs/bin/turnin submit cs3013 proj4 proj4.tgz


Send all project questions to the cs3013-staff at cs.wpi.edu mailing list.

Send all Fossil administrative questions to the fossil at cs.wpi.edu mailing list.