Due date: Tuesday, March 25th, 2014 at 11:59pm
You've been working on your MQP for the last 2 terms are well on
your way to writing the final report. At 4am, fuzzy-brained after
pulling 2 all-nighters, you accidentally type rm mqp.tex
(you are using LaTeX, of course). Argh, months of effort wasted! You
vow not to be bitten the same way at the end of your MQP. But how to
avoid this? Then, inspiration strikes you ... you'll use your
new-found file system knowledge to build a way of doing automatic file
backups instead of just deleting them!
Your dumpster diving will consist of several utilities that can
take the place of existing system utilities. Basically, you will have
a version of rm
that puts deleted files into a separate
"trash can" (a directory) rather than actually deleting them. Users
can then recover these "deleted" files with another utility, named
dv
(Unix-speak for "dive"). A dump
program
completely removes all files in the trash can. All utilities should
work silently if successful but should report appropriate error
messages when unsuccessful.
It is expected that each user has a trash directory they specify with the TRASH environment variable. Your programs can expect this directory to be created ahead of time. If the trash directory is not created, your utilities should display and appropriate error message and exit.
You must do your coding in C/C++ in Linux, such that it runs on the WPI CCC machines. You can develop in Cygwin in Windows, if that is an easier development environment, but must function on the CCC machines when you turn it in. Cygwin is installed on all the Zoo lab computers.
The three utilities are presented in more detail next.
You will write a program named rm
(intended to
transparently replace /bin/rm
) that moves files to a
directory specified by the user using the rename()
system
call. (Note, you do not (and should not) actually replace the
/bin/rm
call. Rather, a user can just be sure your new
rm
appears first in their path.) If the file to be
removed is on the same partition as the trash directory, the file
should not be copied but instead should be renamed (or hard-linked).
If the file being removed is not on the same partition as the trash
directory, your rm
will copy the file and then delete it
(via the unlink()
or remove()
system call).
If there is a file with the same name already in the trash can, the
new file should receive a .num
extension, where
num
is 1, 2, 3 ...
The file permissions, including access times, should be preserved.
You can use the stat()
system call to gather these values
when a new entry must be made, and chmod()
and touch()
to set them.
rm
should support the following command line
options:
-f
: force a complete remove, do not move to trash can
-h
: display a basic usage message
-r
: recurse directories
file [file ...]
- file(s) to be removed
-t
option that
allows removal to a different trash can that is specified by name.
If the file to be removed is a directory (and the -r
flag is given), rm
should move the directory and its
contents to the trash directory (if the file to be removed is a
directory but the -r flag is not given, rm
should return
an appropriate error message).
You will write a program named dv
to (potentially)
restore any file that has been "deleted". Your dv
looks
in the trash directory and, if the indicated file is found, moves it
to the current working directory. You can use getcwd()
to obtain the current working directory. If dv is called on a
directory, the directory and all of the files inside should be
restored.
If a file or directory with the same name already exists in the
current directory, dv
should report an appropriate error
message and exit.
The file permissions, including access times, should be preserved.
You can use the stat()
system call to obtain this
information when a new entry must be made.
dv
should support the following command line
options:
-h
: display a basic usage message
file [file ...]
: file(s) to be restored
-a
option
that lists (or restores) all files with the same name, but also those
with a .1, .2, ... extension.
You will write a program named dump
to permanently
remove files from the trash directory. To actually remove files,
dump
should use unlink()
and for directories,
rmdir()
(or use remove()
for either).
dump
should recursively remove files and directories, as
needed, until the trash directory is empty.
dv
should support the following command line
options:
-h
- display a basic usage message
You cannot use the system()
call at all.
For help in parsing command line parameters, you might see get-opt.c
which uses
the user-level library call getopt()
. See the manual
pages on getopt
for more information.
See stat.c
for
sample code on how to get status information (using
stat()
) about a file, including permissions, access time,
directory information.
See env.c
for sample
code on how to (more easily) access environment variables (using
getenv()
), such as TRASH.
See ls.c
for sample
code on how to do a directory listing (say, for recursive removal
of a directory).
See touch.c
for
sample code on how to set the modification times (using
utime()
) on a file.
Other system calls that may be useful:
chmod()
- change (user, group, other) permissions
chown()
- change ownership
link()
- add a hard link to a file
unlink()
- delete a file
rmdir()
- remove a directory
remove()
- delete a file or directory
rename()
- change the name or location of a file
getcwd()
- get current working directory for a process
access()
- can be used to check for existence of a file
basename()
and dirname()
- split full
path to dir and filename (warning! can modify incoming string so copy
first)
You can use perror()
to print appropriate text-based
strings for system call errors, or strerror()
to more
generally get the same error string. The include files <errno.h>
and <linux/errno.h> may be useful for using error codes.
Lastly, although not required for this project, you could easily
put an entry into your
crontab
to execute dump
periodically (say,
once per week).
If you are having trouble getting started, you might start with rm and consider the following notes:
In general:
malloc()
is
fine to start).
stat()
call, looking
at st_dev
or by doing the rename()
call,
when an errno of EXDEV could be set.
Suggested development steps:
argv[1]
) and TRASH set using getenv()
.
rename()
.
access()
, adding
extension .num.
/tmp
), using
dirname()
and basename()
.
stat()
with
st_dev
or EXDEV
error set by rename()
call.
unlink()
once done.
stat()
and utime()
.
stat()
and
S_ISDIR()
.
opendir()
and readdir()
, providing each file name for moving.
getopt()
.
After you have implemented and debugged your utilities, you will
then design experiments to measure: 1) the amount of time required
(the latency, in milliseconds) to perform either a
link()+unlink()
or a rename()
system call;
2) measure the throughput (in bytes per second) in copying a large
file across separate partitions; 3) use data from 1 and 2 to estimate
the time for doing an rm -r
(using your rm
)
on a large directory (10s of directories with 10s+ of files) in a
separate partition, then measure this time. Note, you should look
into using
sync()
to ensure your disk operations are actually
flushed to the disk and not cached. For both sets of measurements,
you will need to do multiple runs in order to account for any variance
in the data between runs.
To measure the rename()
or (un)link()
system calls, since the time scale for test is very small, you will
measure the time for many operations and then divide by the number of
operations performed. You will want to build a harness (a program,
shell script, perl script or something similar) to make repeated
requests.
In order to record the time on your computer (instead of, say,
looking at the clock on the wall) you should use the
gettimeofday()
system call from a program,
localtime()
from a perl script or
/usr/bin/time
from a shell (note, some shells have a
built-in time
command, too).
The exact means you use to gather timing is up to you. You could put in timing hooks in your program, perhaps that can be turned on or off via compile options or command line programs, that you use for measurements. Another recommendation is to leave as much of your programs intact as you can. In this case, you might initiate timing from a shell script that calls your program, or from a separate timing program you write program that calls your program. Whatever method you use should be specified in your writeup.
When your experiments are complete, you must turn in a brief (1-2 page) writeup with the following sections:
You must hand in the following:
rm.c
, dv.c
, and dump.c
.
Note! Make sure your code is well-structured and commented
if you expect to receive partial credit.
.h
files.
If you do not have your files on the CCC machines, then copy your
entire working directory to your CCC account. Then, login to the CCC
machines (using slogin
or putty
).
Use tar
with gzip
to archive your files.
For example:
mkdir lastname-project1 cp * lastname-project1 /* copy all the files you want to submit */ tar czvf project1-lastname.tgz lastname-project1 /* compress */Submit your assignment (
project1-lastname.tgz
):
/cs/bin/turnin submit cs4513 project1 project1-lastname.tgz
Not needed, but you can verify your submission via:
/cs/bin/turnin verify cs4513 project1
If you need more information, see Using the turnin Program for additional help with turnin.
A grading guide shows the point breakdown for the individual project components. A more general rubric follows:
100-90. All three utilities meet all specified requirements. Programs are robust in the face of possible errors, such as insufficient file permissions or user errors. Code builds and runs cleanly without errors or warnings. Experiments effectively test all required measurements. Experimental writeup has the three required sections, with each clearly written and the results clearly depicted.
89-80. The three utilities meet most of the specified requirements, but a few features may be missing. Programs are robust in the face of most errors. Code builds cleanly and runs mostly without errors or warnings. Experiments mostly test all required measurements. Experimental writeup has the three required sections, with details on the methods used and informative results.
79-70. The three utilities are in place, but with only the minimal functionality and without all required features. Code compiles, but may exhibit warnings. Programs may fail ungracefully under some conditions. Experiments are incomplete and/or the writeup does not provide clarity on the methods or results.
69-60. Not all of the utilities are in place and/or significant parts are not functional. Code compiles, but may exhibit warnings. Programs may fail ungracefully under many conditions. Experiments are incomplete and the writeup does not provide clarity on the methods or results.
59-0. Not all of the utilities are in place and for the code that is there, significant parts are not functional. Code may not compiles without fixes. Programs may not run under even slightly more advance conditions. Experiments are incomplete with a minimal writeup.
Send all questions to the TA mailing list (cs4513-staff at cs.wpi.edu).