CS3013 Project 3

Memory Guru

Due date: Thursday, October 10th by 11:59pm




Overview

You are to write a Linux kernel module to help you monitor virtual and physical memory usage. You will then write a user-level program that uses your kernel module to log data for analysis. Using your logging tool, you will analyze several carefully selected programs to understand how they use memory.


/proc/mguru

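You will write a kernel module that registers itself in /proc/mguru and produces memory information when queried via a read() system call. Specifically, reading /proc/mguru should produce one line of output for each process, where each line has:

  pid      the process id
  Hard     the number of hard page faults
  Soft     the number of soft page faults
  Pages    the number of (virtual) pages
  Frames   the number of (physical) frames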

The values for pages and frames are kept in the memory management structure for the process, struct mm_struct, which is declared in linux/include/linux/sched.h. Each process's struct task_struct has a pointer (its mm field) to its struct mm_struct; kernel threads have no struct mm_struct, so this pointer is NULL for them. The rss field of struct mm_struct holds the number of frames for a process, while total_vm holds the number of pages.
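While developing the module, it can help to cross-check the Pages and Frames columns against values the kernel already exports. The sketch below assumes a Linux /proc that provides /proc/<pid>/statm, whose first two fields are the total program size and the resident set size, both measured in pages. These should track total_vm and rss closely, although the exact accounting can differ between kernel versions. The helper name read_statm is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read the size (pages) and resident (frames)
 * fields from /proc/<pid>/statm. Pass pid 0 to read /proc/self/statm.
 * Returns 0 on success, -1 on error. */
int read_statm(int pid, unsigned long *pages, unsigned long *frames)
{
    char path[64];
    FILE *fp;
    int n;

    if (pid == 0)
        snprintf(path, sizeof path, "/proc/self/statm");
    else
        snprintf(path, sizeof path, "/proc/%d/statm", pid);

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    /* Only the first two fields are needed: size and resident. */
    n = fscanf(fp, "%lu %lu", pages, frames);
    fclose(fp);
    return (n == 2) ? 0 : -1;
}
```

For example, read_statm(1393, &p, &f) would give values to compare with the pid 1393 line of /proc/mguru.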

The counts of hard and soft page faults for a process are kept in its struct task_struct, in the fields maj_flt (hard) and min_flt (soft); see getrusage() in linux/kernel/sys.c.
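The same two counters are visible from user space for the calling process through getrusage(). This makes it a convenient way to sanity-check the numbers your module reports for a test program. In the struct rusage that getrusage() fills in, ru_minflt corresponds to soft (minor) faults and ru_majflt to hard (major) faults. The helper name self_faults is hypothetical.

```c
#include <sys/resource.h>

/* Hypothetical helper: fetch the calling process's soft (minor) and
 * hard (major) page fault counts. Returns 0 on success, -1 on error. */
int self_faults(long *soft, long *hard)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    *soft = ru.ru_minflt;   /* faults serviced without disk I/O */
    *hard = ru.ru_majflt;   /* faults that required disk I/O */
    return 0;
}
```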

The first line of /proc/mguru output should contain a descriptive header. The last line of /proc/mguru should present totals for the number of processes, hard page faults, soft page faults, pages and frames.
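One way to structure the module's read routine is to print the header, then one row per process while accumulating totals, and finally the totals row. The sketch below shows only that formatting and accumulation logic. It is written as ordinary user-level C so it can be tried outside the kernel, and the struct and function names are hypothetical. Inside the module, the rows would come from walking the task list (for example with for_each_task() on 2.4 kernels or for_each_process() on later ones) rather than from an array.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical per-process record mirroring the /proc/mguru columns. */
struct mguru_row {
    int pid;
    unsigned long hard, soft, pages, frames;
};

/* Format a header, one line per row, and a TOTAL line into buf.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if buf is too small. */
int format_mguru(const struct mguru_row *rows, size_t n,
                 char *buf, size_t size)
{
    unsigned long hard = 0, soft = 0, pages = 0, frames = 0;
    size_t used, i;
    int w;

    w = snprintf(buf, size, "%-7s %-7s %-7s %-7s %s\n",
                 "pid", "Hard", "Soft", "Pages", "Frames");
    if (w < 0 || (size_t)w >= size)
        return -1;
    used = (size_t)w;

    for (i = 0; i < n; i++) {
        w = snprintf(buf + used, size - used, "%-7d %-7lu %-7lu %-7lu %lu\n",
                     rows[i].pid, rows[i].hard, rows[i].soft,
                     rows[i].pages, rows[i].frames);
        if (w < 0 || (size_t)w >= size - used)
            return -1;
        used += (size_t)w;
        hard += rows[i].hard;       /* accumulate the totals */
        soft += rows[i].soft;
        pages += rows[i].pages;
        frames += rows[i].frames;
    }

    /* Totals line: the process count comes first, as in the sample. */
    w = snprintf(buf + used, size - used, "%-7zu %-7lu %-7lu %-7lu %-7lu TOTAL\n",
                 n, hard, soft, pages, frames);
    if (w < 0 || (size_t)w >= size - used)
        return -1;
    used += (size_t)w;
    return (int)used;
}
```

The explicit space after each padded field keeps the columns separated even when a count is wider than seven digits.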

Here is a possible sample output (middle truncated for brevity) from running cat /proc/mguru:

   pid     Hard    Soft    Pages   Frames
   1       127     106     87      51
   72      0       2       89      51
   231     104     12      254     106
   233     54      1152    1424    369
   1393    778     1756    6461    1765
   ...
   18030   114     75      677     368
   18031   317     1706    557     335
   18084   96      26      257     90
   41      3451    25260   18974   7862    TOTAL
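On the mlog side, each data line of this output can be parsed with sscanf(). The minimal sketch below assumes the five whitespace-separated numeric columns shown above, and its helper name and struct are hypothetical. A header line fails to parse as numbers, and the TOTAL line is recognized by its trailing keyword, so mlog can skip both.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one /proc/mguru data line. */
struct mguru_entry {
    int pid;
    unsigned long hard, soft, pages, frames;
};

/* Parse one line of /proc/mguru output.
 * Returns 1 for a per-process line (stored in *e), 0 for the header
 * or TOTAL line, and -1 for anything that does not match. */
int parse_mguru_line(const char *line, struct mguru_entry *e)
{
    if (strstr(line, "TOTAL") != NULL)
        return 0;                       /* totals line */
    if (sscanf(line, "%d %lu %lu %lu %lu", &e->pid, &e->hard, &e->soft,
               &e->pages, &e->frames) == 5)
        return 1;                       /* per-process line */
    if (strstr(line, "pid") != NULL)
        return 0;                       /* descriptive header */
    return -1;
}
```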


mlog

You are to write a stand-alone, user-level utility called mlog that helps capture data from your /proc/mguru. By default, mlog reads /proc/mguru once per second and appends the information to a log file named mlog.log (mlog should clear any old contents of the file when it is first invoked). Each line should be prepended with the time, in milliseconds since mlog started, at which it was recorded. Note that mlog.log should not include the descriptive header from /proc/mguru or the TOTAL line.
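The millisecond timestamp can be taken from a monotonic clock captured once at startup, so that changes to the wall-clock time do not disturb the log. The sketch below assumes POSIX clock_gettime() with CLOCK_MONOTONIC, which Linux provides (older glibc versions need -lrt when linking). The function names elapsed_ms and log_line are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Whole milliseconds elapsed between two monotonic timestamps. */
long long elapsed_ms(const struct timespec *start, const struct timespec *now)
{
    long long ns = (long long)(now->tv_sec - start->tv_sec) * 1000000000LL
                 + (now->tv_nsec - start->tv_nsec);
    return ns / 1000000LL;
}

/* Append one /proc/mguru data line to the log, prefixed with the
 * elapsed time in milliseconds since start. Returns 0 on success. */
int log_line(FILE *log, const struct timespec *start, const char *line)
{
    struct timespec now;

    if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
        return -1;
    return fprintf(log, "%lld %s", elapsed_ms(start, &now), line) < 0 ? -1 : 0;
}
```

For example, mlog could open the log once with fopen("mlog.log", "w"), which truncates any old contents, record its start time with clock_gettime(CLOCK_MONOTONIC, &start), and then call log_line() for each data line it reads.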

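The following command line switches should be supported:

  -p<pid>    log information only for process <pid>
  -t<sec>    log for <sec> seconds, then exit
  -s<msec>   read /proc/mguru every <msec> milliseconds (default 1000)
  -h         display a help message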

You can use usleep() or setitimer() to periodically wake up the process so that it can read from /proc/mguru. See the man pages for their usage.
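Here is a sketch of the setitimer() approach, assuming Linux/POSIX signal semantics. An interval timer delivers SIGALRM every interval, a minimal handler only sets a flag, and the main loop does its work each time it wakes. Here that work is just counting ticks, at the point where mlog would read /proc/mguru. The function name run_ticks is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t tick;

/* SIGALRM handler: only record that the timer fired. */
static void on_alarm(int sig)
{
    (void)sig;
    tick = 1;
}

/* Wake up every interval_ms milliseconds until n wake-ups have been
 * handled, then disarm the timer. Returns n, or -1 on bad arguments
 * or setup failure. */
int run_ticks(long interval_ms, int n)
{
    struct sigaction sa;
    struct itimerval it;
    int handled = 0;

    if (interval_ms <= 0 || n < 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    /* First expiry and repeat interval are both interval_ms. */
    it.it_value.tv_sec = interval_ms / 1000;
    it.it_value.tv_usec = (interval_ms % 1000) * 1000;
    it.it_interval = it.it_value;
    tick = 0;
    if (setitimer(ITIMER_REAL, &it, NULL) != 0)
        return -1;

    while (handled < n) {
        pause();                 /* sleep until a signal arrives */
        if (tick) {
            tick = 0;
            handled++;           /* mlog would read /proc/mguru here */
        }
    }

    memset(&it, 0, sizeof it);   /* a zero it_value disarms the timer */
    setitimer(ITIMER_REAL, &it, NULL);
    return handled;
}
```

The flag is checked after pause() returns, so a tick that arrives just before pause() is only noticed at the following tick. For mlog that costs at most one sample. If that matters, sigsuspend() with SIGALRM blocked outside the wait closes the gap.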

Here are some sample mlog invocations:

  mlog (default)
  mlog -p1425 -t60 (log only information on process 1425 for 60 seconds)
  mlog -s100 (log information every 100 msec)
  mlog -h (display the help message)


Evaluation

You are to conduct experiments to test some proposed hypotheses related to memory performance.

  1. Hypothesis: Under "typical" load conditions, there are many more soft page faults than hard page faults. To test this hypothesis, you must design experiments that run what you can justify as a "typical" load and measure faults in a reproducible manner. Then monitor the hard and soft page faults for the system. You should be able to report, quantitatively, a "typical" ratio of soft to hard page faults on your Linux system.

  2. Hypothesis: For "typical" processes, the number of physical frames is significantly less than the number of logical pages. To test this hypothesis, you will run experiments similar to those used in testing the first hypothesis but you will analyze pages and frames instead.

  3. Hypothesis: A user process incurs no page faults when it requests virtual memory with a malloc() call, but it does incur page faults when it initializes that memory. To test this hypothesis, you need to design experiments that let you measure the page faults over time as a program allocates memory with malloc() and then uses it.

  4. Hypothesis: When a process incurs a hard page fault, it is "charged" with multiple hard page faults as the kernel brings in adjacent pages. To test this hypothesis, design experiments that induce a small, known number of page faults (say, 2) in a controlled manner, and measure how many hard and soft page faults are actually recorded.

  5. Hypothesis: running the code segment:

          for(i=0; i < MAX; i++)
             for(j=0; j < MAX; j++)
                array[j][i] = 0;
    
    produces significantly more page faults than the same code with array[i][j] instead of array[j][i] for some values of MAX. To test this hypothesis, design experiments that run the code for different values of MAX, then repeat the runs with [i] and [j] swapped. In your analysis, you should be able to estimate the values of MAX at which the page fault differences become significant.
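The measurements for hypotheses 3 and 5 can also be taken from inside the test program itself with getrusage(), which is a useful cross-check on what mlog records. The sketch below assumes glibc on Linux, where a request well above glibc's default 128 KB mmap threshold is normally served by a fresh anonymous mapping. The function names are hypothetical. Stores go through volatile pointers so the compiler cannot drop them. traversal_faults() maps fresh memory with mmap() on every run, so memory the C library reuses from an earlier run cannot hide faults. With demand paging, each page is normally faulted in on its first touch whichever order is used, so any difference between the two orders would be expected only once the array no longer fits in the memory available. Varying MAX is how you probe that. Transparent huge pages, if enabled, can also reduce fault counts sharply.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Soft + hard page faults incurred so far by the calling process. */
static long faults_now(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt + ru.ru_majflt;
}

/* Hypothesis 3: faults charged to malloc() itself versus to the first
 * write into each page of the block. Returns 0 on success. */
int malloc_faults(size_t bytes, long *on_malloc, long *on_init)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    long t0, t1, t2;
    volatile char *p;
    size_t off;

    t0 = faults_now();
    p = malloc(bytes);
    t1 = faults_now();
    if (p == NULL)
        return -1;
    for (off = 0; off < bytes; off += pg)
        p[off] = 1;                 /* first write to each page */
    t2 = faults_now();

    *on_malloc = t1 - t0;
    *on_init = t2 - t1;
    free((void *)p);
    return 0;
}

/* Hypothesis 5: zero a max x max int array in column order
 * (array[j][i]) or row order (array[i][j]) and return the faults
 * incurred, or -1 on error. */
long traversal_faults(size_t max, int column_order)
{
    size_t bytes = max * max * sizeof(int);
    volatile int *a;
    void *mem;
    long t0, t1;
    size_t i, j;

    /* A fresh, zero-filled anonymous mapping for every run. */
    mem = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    a = mem;

    t0 = faults_now();
    for (i = 0; i < max; i++)
        for (j = 0; j < max; j++) {
            if (column_order)
                a[j * max + i] = 0;   /* array[j][i] */
            else
                a[i * max + j] = 0;   /* array[i][j] */
        }
    t1 = faults_now();

    munmap(mem, bytes);
    return t1 - t0;
}
```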

For each set of experiments, you must provide details on


Hints

The most important hint is to read all the hints in this section. Refer back to these hints as your project progresses.

See Project 2 for hints on loadable modules.

The Linux page size is 4 Kbytes (4096 bytes).
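Rather than hard-coding 4096, a test program can ask for the page size at run time with sysconf(_SC_PAGESIZE). It can then use that value to predict the minimum number of faults that touching a buffer should cost. The helper names in the sketch below are hypothetical, and the estimate ignores alignment (a buffer that does not start on a page boundary can span one extra page) and huge pages.

```c
#include <stddef.h>
#include <unistd.h>

/* Page size reported by the system, in bytes. */
long page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}

/* Number of pages needed to hold `bytes` bytes, rounding up. */
size_t pages_for(size_t bytes, size_t pagesz)
{
    return (bytes + pagesz - 1) / pagesz;
}
```

For example, a 1 MB buffer spans pages_for(1 << 20, 4096) = 256 pages.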

In running your experiments, you can use calls like fgets() or scanf() to pause the program in order to clearly separate events in the memory log.

Remember that if you run the same workload several times in a row, memory caching effects may appear. For example, a page the workload accesses (say, a page of the program itself) may already be in memory on the second run, so it will not cause a page fault then, even though it did the first time. This may influence your results.
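For file-backed pages, one way to make hard faults more reproducible across runs (useful for hypothesis 4) is to evict the file from the page cache before each measurement. The sketch below rests on several assumptions. It creates a scratch file, flushes it with fsync(), and asks the kernel to drop its cached pages with posix_fadvise(POSIX_FADV_DONTNEED), which is only advice and may be ignored. It then maps the file and reads one byte from each of a chosen number of pages while watching ru_majflt and ru_minflt. The function name and parameters are hypothetical. If the scratch file lives on tmpfs (as /tmp often does), its pages never leave memory and no hard faults will be seen, so use a directory on a disk-backed file system for real experiments. Kernel read-ahead around the faulting page is exactly the effect the hypothesis asks about, so compare touching one page against touching several widely spaced ones.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Create a scratch file of npages pages at path, try to evict it from
 * the page cache, then read one byte from up to `touch` pages spaced
 * `stride` pages apart through a mapping. Reports the hard and soft
 * faults the reads cost. Returns 0 on success, -1 on error. */
int cold_read_faults(const char *path, size_t npages, size_t stride,
                     size_t touch, long *hard, long *soft)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * pg;
    struct rusage r0, r1;
    volatile char sink = 0;
    char *buf, *map;
    size_t i;
    int fd, rc = -1;

    if (npages == 0 || stride == 0)
        return -1;
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    buf = calloc(1, pg);
    if (buf == NULL)
        goto out_close;
    for (i = 0; i < npages; i++)
        if (write(fd, buf, pg) != (ssize_t)pg)
            goto out_free;

    /* Flush dirty pages, then advise the kernel to drop the cached copy. */
    if (fsync(fd) != 0)
        goto out_free;
    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out_free;

    getrusage(RUSAGE_SELF, &r0);
    for (i = 0; i < touch && i * stride < npages; i++)
        sink += map[i * stride * pg];   /* first access to each page */
    getrusage(RUSAGE_SELF, &r1);

    *hard = r1.ru_majflt - r0.ru_majflt;
    *soft = r1.ru_minflt - r0.ru_minflt;
    rc = 0;
    munmap(map, len);
out_free:
    free(buf);
out_close:
    close(fd);
    unlink(path);
    return rc;
}
```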

For help in parsing command line parameters, you might look at get-opt.c, which uses the user-level library call getopt(). See the manual pages on getopt for more information.

If you want to use mlog to monitor only a specific process workload, you can use fork() to create a process and pass the process id that fork() returns to an exec() of mlog.
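The sketch below assumes that mlog is in the current directory as ./mlog and accepts the -p switch described above, and the function names are hypothetical. The child becomes the workload and the parent becomes mlog, watching the child's pid. The child sleeps for a second before starting the workload to give mlog a chance to begin logging first. That delay is an assumption, and a pipe or signal would synchronize more precisely.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Format the -p<pid> switch for mlog into buf. Returns 0 on success. */
int make_pid_arg(pid_t pid, char *buf, size_t size)
{
    int n = snprintf(buf, size, "-p%ld", (long)pid);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Fork: the child execs the workload, the parent execs ./mlog watching
 * the child's pid. Returns -1 only on error. */
int run_under_mlog(char *const workload_argv[])
{
    char pidarg[32];
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        sleep(1);   /* assumption: give mlog time to start logging */
        execvp(workload_argv[0], workload_argv);
        perror("execvp workload");
        _exit(127);
    }
    if (make_pid_arg(pid, pidarg, sizeof pidarg) != 0)
        return -1;
    execl("./mlog", "mlog", pidarg, (char *)NULL);
    perror("execl mlog");
    return -1;
}
```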

The tool grep is useful for parsing text-based log files in Unix. For example: grep ^1345 mlog.log will display all the log entries with 1345 as the process id (the '^' matches the beginning of the line, only). See the man pages for more details.

Visualizations, such as graphs or charts, even simple ones, are typically much better representations of data than just tables of numbers. All graphs should include:

Graph Tips

If you are using Windows, MS Excel has good support for drawing graphs. You might try this tutorial http://www.urban.uiuc.edu/Courses/varkki/msexcel/graphs/Default.html to get started.

For Unix (including Linux), gnuplot has good support for drawing graphs. You might see http://www.gnuplot.info/ for more information.


Hand In

You will be graded on:

You must hand in the following:

The turnin (/cs/bin/turnin) project name for proj3 is "proj3". When you use turnin, also include a file named "group.txt" that contains the following:

        group_name
        login_name1  last_name1, first_name1
        login_name2  last_name2, first_name2
        ...

Before you use turnin, tar up (with gzip) your files. For example:

        mkdir proj3
        cp * proj3        # copy all the files to submit into the proj3 directory
        tar -czf proj3.tgz proj3

Then copy the archive from your Fossil client to your CCC account, log in there, and submit it:

        scp proj3.tgz login_name@ccc:~/  # will ask for your ccc passwd
        ssh login_name@ccc               # will ask for your ccc passwd
        /cs/bin/turnin submit cs3013 proj3 proj3.tgz


Send all project questions to the TA mailing list.

Send all Fossil administrative questions to the Fossil mailing list.