CS4513 Project 1

Dumpster Diving

Due date: Sunday, January 24th, 2016 at 11:59pm


You've been working on your MQP for the last 2 terms are well on your way to writing the final report. At 4am, fuzzy-brained after pulling 2 all-nighters, you accidentally type rm mqp.tex (you are using LaTeX, of course). Argh, weeks of effort wasted! You vow not to be bitten the same way at the end of your MQP. But how to avoid this? Then, inspiration strikes you ... you'll use your new-found file system knowledge to build a way of doing automatic file backups instead of just deleting them!


Description | Hints | Experiments | Turn In | Grading

Description

Your dumpster diving will consist of several utilities that can take the place of existing system utilities. Basically, you will have a version of rm that puts deleted files into a separate "dumpster" (a directory) rather than actually deleting them. Users can then recover these "deleted" files with another utility, named dv (Unix-speak for "dive"). A dump program completely removes all files in the dumpster. All utilities should work silently if successful but should report appropriate error messages when unsuccessful.

It is expected that each user has a dumpster directory specified with the DUMPSTER environment variable. Your programs can expect this directory to be created ahead of time. If the dumpster directory is not created, your utilities should display and appropriate error message and exit.

You must do your coding in C/C++ in Linux, such that it runs on the WPI CCC machines. You can develop in Cygwin in Windows, if that is an easier development environment, but must function on the CCC machines when you turn it in. Cygwin can be installed on all the Zoo lab computers.

The three utilities are presented in more detail next.

RM

You will write a program named rm (intended to transparently replace /bin/rm) that moves files to a directory specified by the user using the rename() system call. (Note, you do not (and should not) actually replace the /bin/rm call. Rather, a user can just be sure your new rm appears first in his/her path.) If the file to be removed is on the same partition as the dumpster directory, the file should not be copied but instead should be renamed (or hard-linked). If the file being removed is not on the same partition as the dumpster directory, your rm will copy the file and then delete it (via the unlink() or remove() system call).

If there is a file with the same name already in the dumpster, the new file should receive a .num extension, where num is the next consecutive number (e.g., 1, 2, 3 ...) up to a maximum of 9.

The file permissions, including access times, should be preserved. You can use the stat() system call to gather these values when a new entry must be made, and chmod() and touch() to set them.

rm should support the following command line options:

Feel free to support other command line options that you think are useful. For example, you might want a -d option that allows removal to a different dumpster that is specified by name.

If the file to be removed is a directory (and the -r flag is given), rm should move the directory and its contents to the dumpster (if the file to be removed is a directory but the -r flag is not given, rm should return an appropriate error message). As for files, permissions and access times should be preserved when removing a directory.

Note, paths are relative to the running rm process, with the exception of that a path that begins with a "/" is absolute. For example, "/tmp/a" is absolute, but "tmp/a" is relative.

DV

You will write a program named dv to (potentially) restore any file that has been "deleted". Your dv looks in the dumpster directory and, if the indicated file is found, moves it to the current working directory (not to the original directory from which it was deleted). You can use getcwd() to obtain the current working directory. If dv is called on a directory, the directory and all of the files inside should be restored.

If a file or directory with the same name already exists in the current directory, dv should report an appropriate error message and exit.

The file permissions, including access times, should be preserved. You can use the stat() system call to obtain this information when a new entry must be made.

dv should support the following command line options:

Feel free to support other command line options that you think are useful. For example, you may want to support a -a option that lists (or restores) all files with the same name, but also those with a .1, .2, ... extension.

DUMP

You will write a program named dump to permanently remove files from the dumpster directory. To actually remove files, dump should use unlink() and for directories, rmdir() (or use remove() for either). dump should recursively remove files and directories, as needed, until the dumpster is empty.

dv should support the following command line options:

Feel free to support other command line options that you think are useful.


Hints

You cannot use the system() call at all, nor fork() nor any flavor of exec()! Likewise, no third-party libraries are allowed (e.g., you cannot use the boost libraries.)

For development questions, consider the cs4513 question-answer forum. Both the professor and TA will look to answer all questions there, but students can also answer each other's questions. You may even find your question has been answered already!

For help in parsing command line parameters, you might see get-opt.c which uses the user-level library call getopt(). See the manual pages on getopt for more information.

See stat.c for sample code on how to get status information (using stat()) about a file, including permissions, access time, and directory information.

See env.c for sample code on how to (more easily) access environment variables (using getenv()), such as DUMPSTER.

See ls.c for sample code on how to do a directory listing (say, for recursive removal of a directory).

See touch.c for sample code on how to set the modification times (using utime()) on a file.

Other system calls that may be useful:

You can use perror() to print appropriate text-based strings for system call errors, or strerror() to more generally get the same error string. The include files <errno.h> and <linux/errno.h> may be useful for using error codes.

Lastly, although not required for this project, you could easily put an entry into your crontab to execute dump periodically (say, once per week).

If you are having trouble getting started, you might start with rm and consider the following notes:

In general:

Suggested development steps for rm:

The final overall rm flow design may look something like:

  Parse command line arguments
  Check environment setup
  For each file to be removed
    If file does not exist
      Report error
    If file is directory
      Create directory in dumpster
      For each file to be removed ... (recursive)
    If file is on another partition
      Copy file to dumpster
      Change permissions & access times on file
    If file is on same partition
      Link file to dumpster
    Unlink old file
  End for

Diving and dumping should be relatively easy once rm is complete.


Experiments

After you have implemented and debugged your utilities, you will then design experiments to measure: 1) the amount of time required (the latency, in milliseconds) to perform either a link()+unlink() or a rename() system call; 2) measure the throughput (in bytes per second) in copying a large file across separate partitions; 3) use data from 1 and 2 to estimate the time for doing an rm -r (using your rm) on a large directory (10s of directories with 10s+ of files) in a separate partition, then measure this time. Note, you should look into using sync() to ensure your disk operations are actually flushed to the disk and not cached. For both sets of measurements, you will need to do multiple runs in order to account for any variance in the data between runs.

To measure the rename() or (un)link() system calls, since the time scale for a single test is very small, you will measure the time for many operations and then divide by the number of operations performed. You will want to build a harness (a program, shell, perl or python script, or something similar) to make repeated requests.

In order to record the time on your computer (instead of, say, looking at the clock on the wall) you should use the gettimeofday() system call from a program, localtime() from a perl script or /usr/bin/time from a shell (note, some shells have a built-in time command, too).

The exact means you use to gather timing is up to you. You could put in timing hooks in your program, perhaps that can be turned on or off via compile options or command line programs, that you use for measurements. Another recommendation is to leave as much of your programs intact as you can. In this case, you might initiate timing from a shell script that calls your program, or from a separate timing program you write program that calls your program. Whatever method you use should be specified in your writeup.

When your experiments are complete, you must turn in a brief (1-2 page) writeup with the following sections:

  1. Design - describe your experiments, including: a) what programs/scripts you ran and what they did (use pseudo-code); b) how many runs you performed; c) how you recorded your data; d) what the system conditions were like; e) and any other details you think are relevant.
  2. Results - depict your results clearly using a series of tables or graphs. Provide statistical analysis including at least mean and standard deviation.
  3. Analysis - interpret the results. Briefly describe what the results mean and what you think is happening and any subjective opinions you may have.


Turn In

You must use a Makefile for this project (it is good for you and good for us since it makes grading easier). The program make is a useful program for maintaining large programs that have been broken into many software modules is make. The program uses a file, usually named Makefile, that resides in the same directory as the source code. This file describes how to compile a number of targets specified. To use, simply type "make" at the command line. WARNING: The operation line(s) for a target MUST begin with a TAB (do not use spaces). See the man page for make and gcc for additional information.

You must turn in the following:

Before submitting, "clean" your code (i.e., do a "make clean") removing the binaries (executables and .o files).

Use zip to archive your files. For example:
mkdir lastname-proj1 cp * lastname-proj1 /* copy all the files you want to submit */ zip -r proj1-lastname.zip lastname-proj1 /* package and compress */

To submit your assignment (proj1-lastname.zip), log into the Instruct Assist website:

https://ia.wpi.edu/cs4513/

Use your WPI username and password for access. Visit:

Tools → File Submission

Select "Project 1" from the dropdown and then "Browse" and select your assignment (proj1-lastname.zip).

Make sure to hit "Upload File" after selecting it!

If successful, you should see a line similar to:

 Creator    Upload Time             File Name        Size    Status   Removal
 Claypool 2016-01-22 21:40:07  proj1-claypool.zip   3208 KB  On Time  Delete


Grading

A grading guide shows the point breakdown for the individual project components. A more general rubric follows:

100-90. All three utilities meet all specified requirements. Programs are robust in the face of possible errors, such as insufficient file permissions or user errors. Code builds and runs cleanly without errors or warnings. Experiments effectively test all required measurements. Experimental writeup has the three required sections, with each clearly written and the results clearly depicted.

89-80. The three utilities meet most of the specified requirements, but a few features may be missing. Programs are robust in the face of most errors. Code builds cleanly and runs mostly without errors or warnings. Experiments mostly test all required measurements. Experimental writeup has the three required sections, with details on the methods used and informative results.

79-70. The three utilities are in place, but with only the minimal functionality and without all required features. Code compiles, but may exhibit warnings. Programs may fail ungracefully under some conditions. Experiments are incomplete and/or the writeup does not provide clarity on the methods or results.

69-60. Not all of the utilities are in place and/or significant parts are not functional. Code compiles, but may exhibit warnings. Programs may fail ungracefully under many conditions. Experiments are incomplete and the writeup does not provide clarity on the methods or results.

59-0. Not all of the utilities are in place and for the code that is there, significant parts are not functional. Code may not compiles without fixes. Programs may not run under even slightly more advance conditions. Experiments are incomplete with a minimal writeup.


Description | Hints | Experiments | Turn In | Grading

Return to 4513 Home Page

Send all questions to the staff mailing list (cs4513-staff at cs.wpi.edu).