CS3013 Project 3

Memory Guru

Due date: Thursday, October 10th by 11:59pm



You are to write a Linux kernel module to help you monitor virtual and physical memory usage. You will then write a user-level program that uses your kernel module to effectively log data for analysis. Using your log tool, you will analyze the memory use of several carefully selected programs to gain an understanding of their memory use.


You will write a kernel module that registers itself in /proc/mguru and produces memory information when queried via a read() system call. Specifically, reading /proc/mguru should produce one line of output for each process where each line has:

   the process id (pid)
   the number of hard page faults
   the number of soft page faults
   the number of logical pages
   the number of physical frames

The values for pages and frames are kept in the memory management structure for the process, struct mm_struct, which is declared in linux/include/linux/sched.h. Each process's struct task_struct has a pointer to its struct mm_struct. Within struct mm_struct, the rss field contains the number of frames for a process, while total_vm contains the number of pages.

The values of hard and soft page faults for a process are kept in the Linux struct task_struct (see getrusage() in linux/kernel/sys.c).

The first line of /proc/mguru output should contain a descriptive header. The last line of /proc/mguru should present totals for the number of processes, hard page faults, soft page faults, pages and frames.

Here is a possible sample output (middle truncated for brevity) from running cat /proc/mguru:

   pid     Hard    Soft    Pages   Frames
   1       127     106     87      51
   72      0       2       89      51
   231     104     12      254     106
   233     54      1152    1424    369
   1393    778     1756    6461    1765
   18030   114     75      677     368
   18031   317     1706    557     335
   18084   96      26      257     90
   41      3451    25260   18974   7862    TOTAL


You are to write a stand-alone, user-level utility called mlog that helps capture data from your /proc/mguru. By default, when run, mlog will read from /proc/mguru once per second and append this information to a log file named mlog.log (mlog should clear the file of any old logs when it is first invoked). Each line should be prepended with the time (in milliseconds) at which it was recorded, measured from the start of mlog. Note that the mlog.log file should not include the descriptive header from /proc/mguru nor the line with the TOTAL.
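The per-sample step described above can be sketched as below. This is an illustrative sketch only: the function names are made up, and the caller is assumed to have opened /proc/mguru and mlog.log itself.

```c
/* Sketch of mlog's per-sample step: copy one snapshot of /proc/mguru
 * into the log, skipping the header and TOTAL lines and prepending a
 * millisecond timestamp.  Function names are illustrative, not part
 * of the assignment spec. */
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

/* Milliseconds elapsed since the given start time. */
static long elapsed_ms(const struct timeval *start) {
    struct timeval now;
    gettimeofday(&now, NULL);
    return (now.tv_sec - start->tv_sec) * 1000L +
           (now.tv_usec - start->tv_usec) / 1000L;
}

/* Append one snapshot from src (an open /proc/mguru stream) to dst,
 * prepending the given millisecond timestamp to each data line. */
static void log_snapshot(FILE *src, FILE *dst, long ms) {
    char line[256];
    int first = 1;
    while (fgets(line, sizeof line, src) != NULL) {
        if (first) { first = 0; continue; }          /* skip header  */
        if (strstr(line, "TOTAL") != NULL) continue; /* skip totals  */
        fprintf(dst, "%ld %s", ms, line);
    }
}
```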

The following command line switches should be supported:

   -p<pid>    log information only for the process with the given pid
   -t<secs>   log for the given number of seconds and then exit
   -s<msec>   log every msec milliseconds (default: 1000)
   -h         display a help message

You can use usleep() or setitimer() to periodically "wake-up" the process and read from /proc/mguru. See man pages for their usage.

Here are some sample mlog invocations:

  mlog (default)
  mlog -p1425 -t60 (log only information on process 1425 for 60 seconds)
  mlog -s100 (log information every 100 msec)
  mlog -h (display the help message)


You are to conduct experiments to test some proposed hypotheses related to memory performance.

  1. Hypothesis: Under "typical" load conditions, there are many more soft page faults than hard page faults. To test this hypothesis, you must design experiments that allow you to run what you can justify as a "typical" load and measure faults in a reproducible manner. You then monitor the hard and soft page faults for the system. You should be able to quantitatively determine a "typical" ratio between hard and soft page faults on your Linux system.

  2. Hypothesis: For "typical" processes, the number of physical frames is significantly less than the number of logical pages. To test this hypothesis, you will run experiments similar to those used in testing the first hypothesis but you will analyze pages and frames instead.

  3. Hypothesis: A user process incurs no page faults for requesting virtual memory with a malloc() call, but it does incur page-faults when it initializes that memory. To test this hypothesis, you need to design experiments that let you measure the page faults over time during malloc() memory use.
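     One way to structure the hypothesis-3 experiment is with getrusage(), sampling the soft (minor) fault count before malloc(), after malloc(), and after the memory is first initialized. The 16 MB size in the test and the helper names are illustrative choices, not part of the assignment.

```c
/* Experiment sketch for hypothesis 3: count soft (minor) page faults
 * around a malloc() and around the first write to that memory. */
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Current soft (minor) fault count for this process. */
static long minflt_now(void) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

/* Fills in the faults charged during allocation and during first touch. */
static void malloc_fault_experiment(size_t sz, long *alloc_faults,
                                    long *touch_faults) {
    long before = minflt_now();
    char *p = malloc(sz);          /* reserve virtual memory only */
    if (p == NULL) {
        *alloc_faults = *touch_faults = -1;
        return;
    }
    long after_alloc = minflt_now();
    memset(p, 1, sz);              /* first touch: faults pages in */
    long after_touch = minflt_now();
    *alloc_faults = after_alloc - before;
    *touch_faults = after_touch - after_alloc;
    free(p);
}
```

     Note that transparent huge pages, if enabled, can make the touch-phase fault count much smaller than one fault per 4 KB page.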

  4. Hypothesis: When a process incurs a hard page fault, it is "charged" with multiple hard page faults as the kernel brings in adjacent pages. To test this hypothesis, you should design experiments that induce a few (say, 2) page faults in a controlled manner and measure how many hard and soft page faults are actually recorded.

  5. Hypothesis: running the code segment:

          for(i=0; i < MAX; i++)
             for(j=0; j < MAX; j++)
                array[j][i] = 0;
    produces significantly more page faults than the same code with array[i][j] instead of array[j][i] for some values of MAX. To test this hypothesis, you should design experiments that run the code for different values of MAX. You should then repeat the runs with the [i] and [j] indices swapped. In your analysis, you should be able to approximate for what values of MAX the page-fault differences become significant.
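     A possible skeleton for the hypothesis-5 experiment is below. N stands in for MAX and is deliberately small; note that without memory pressure both loop orders touch every page exactly once, so a difference in fault counts is expected to appear only for values of MAX large enough to cause paging.

```c
/* Experiment sketch for hypothesis 5: count the soft (minor) faults
 * incurred while zeroing a 2-D array in row order vs column order.
 * N is a stand-in for MAX; the assignment asks you to vary it. */
#include <stdlib.h>
#include <sys/resource.h>

#define N 512

/* Returns the minor-fault count incurred while zeroing a freshly
 * allocated array (0 = row order a[i][j], 1 = column order a[j][i]).
 * A fresh allocation per run ensures first-touch faults are counted. */
static long zero_faults(int column_order) {
    struct rusage before, after;
    int (*a)[N] = malloc(sizeof(int[N][N]));
    if (a == NULL)
        return -1;
    getrusage(RUSAGE_SELF, &before);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (column_order) a[j][i] = 0; else a[i][j] = 0;
    getrusage(RUSAGE_SELF, &after);
    long faults = after.ru_minflt - before.ru_minflt;
    free(a);
    return faults;
}
```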

For each set of experiments, you must provide details on


The most important hint is to read all the hints in this section. Refer back to these hints as your project progresses.

See Project 2 for hints on loadable modules.

The Linux page size is 4 Kbytes (4096 bytes).

In running your experiments, you can use calls like gets() or scanf() to pause the program in order to clearly separate events in the memory log.

Remember, if you run the same workload consecutive times you may see memory caching effects. In this case, a page accessed by the program may already be in memory and will not cause a page fault the second time the workload is run, whereas it did the first time. This may influence your results.

For help in parsing command line parameters, you might see get-opt.c, which uses the user-level library call getopt(). See the manual pages on getopt for more information.

If you want to use mlog to monitor only a specific process workload, you can use fork() to create the workload process and pass the returned process id to mlog (via its -p switch) when you exec() it.
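That fork()/exec() arrangement can be sketched as below. The program paths are parameters so the same skeleton works with any stand-in binaries; error handling is minimal for brevity.

```c
/* Sketch: run a workload and mlog concurrently.  The parent forks the
 * workload, builds a -p<pid> argument from the returned pid, forks a
 * second child that execs mlog with it, and waits for both. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* workload is a NULL-terminated argv; mlog_path locates the mlog
 * binary.  Returns 0 if both children exit with status 0. */
static int monitor_workload(char *workload[], const char *mlog_path) {
    pid_t work = fork();
    if (work < 0)
        return -1;
    if (work == 0) {                    /* child 1: the workload */
        execv(workload[0], workload);
        _exit(127);                     /* exec failed */
    }
    char parg[32];
    snprintf(parg, sizeof parg, "-p%d", (int)work);
    pid_t mon = fork();
    if (mon < 0)
        return -1;
    if (mon == 0) {                     /* child 2: the logger */
        execl(mlog_path, "mlog", parg, (char *)NULL);
        _exit(127);
    }
    int ws, ms;
    waitpid(work, &ws, 0);
    waitpid(mon, &ms, 0);
    return (WIFEXITED(ws) && WEXITSTATUS(ws) == 0 &&
            WIFEXITED(ms) && WEXITSTATUS(ms) == 0) ? 0 : -1;
}
```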

The tool grep is useful for parsing text-based log files in Unix. For example: grep ^1345 mlog.log will display all the log entries with 1345 as the process id (the '^' matches the beginning of the line, only). See the man pages for more details.

Visualizations, such as graphs or charts, even simple ones, are typically much better representations of data than just tables of numbers. All graphs should include:

Graph Tips

If you are using Windows, MS Excel has good support for drawing graphs. You might try this tutorial http://www.urban.uiuc.edu/Courses/varkki/msexcel/graphs/Default.html to get started.

For Unix (including Linux), gnuplot has good support for drawing graphs. You might see http://www.gnuplot.info/ for more information.

Hand In

You will be graded on:

You must hand in the following:

The turnin (/cs/bin/turnin) for proj3 is "proj3". When you turn in, also include a file "group.txt" which contains the following:

        login_name1  last_name1, first_name1
        login_name2  last_name2, first_name2

Also, before you use turnin, tar up (with gzip) your files. For example:

        mkdir proj3
        cp * proj3 /* copy all your files to submit to the proj3 directory */
        tar -czf proj3.tgz proj3

then copy your files from your Fossil client to your CCC account:

        scp proj3.tgz login_name@ccc:~/  /* will ask for your ccc passwd */
        ssh login_name@ccc               /* will ask for your ccc passwd */
        /cs/bin/turnin submit cs3013 proj3 proj3.tgz

Return to the 3013 Home Page

Send all project questions to the TA mailing list.

Send all Fossil administrative questions to the Fossil mailing list.