
Programming
Assignment #6 —
Strings and Dynamic Memory Allocation
Due: Friday, May 1, at 11:59 PM
Write a program to scan one or more text files and count the number of occurrences of each word. Use a binary tree to keep track of all words. When all input files have been scanned, print out the results to a file. For extra credit, instrument your program to get information about how long it takes to search the binary tree as a function of the number of entries stored in it.
After successfully completing this assignment, you should be able to:–
· Build a massive, recursive data structure of structs using malloc() and free().
· Search for an item in that data structure and, if it is not found, add it to the structure.
· Create a file for your output and write to it.
· Put together a non-trivial program in C from algorithms and tools that are already available to you.
Re-read §6.5 regarding self-referential data structures and §7.5 regarding file open and creation.
The program you must create in this assignment is a little bit simpler than Programming Assignment 5. It should comprise multiple files, and it should include a makefile that builds the entire program and all of its components. If you are feeling truly adventurous, you may develop this application in Visual Studio rather than on Linux and gcc, in which case it will not have a makefile.
Unlike Programming Assignment 5, the partitioning of the problem is not given to you. Instead, you have to divide it up into component .c and .h files implementing small groups of related functions. The only requirement is that the executable program be named PA6.
Your program must accept an indeterminate number of arguments on the command line, the first of which specifies the output file and the remaining of which specify the input files. Thus, to invoke your program, the user would type
./PA6 outputFilePath
inputFilePath1 inputFilePath2 ...
Under this command, the program would open and read each of the input files in turn, building up a binary tree of words and counts as it progresses. Once all files have been read and closed, it must create the output file and write out the words in the tree in alphabetical order, one word per line, along with the number of occurrences of that word. Your program should ignore the case of the words, so that “This” and “this” are considered the same. A sample output should look like the following:–
166 a
25 and
11 as
3 command
15 each
2 file
4 files
109 in
4 input
98 it
99 of
3 open
6 program
18 read
152 the
41 this
3 under
30 would
-------------
17 Total
number of different words
To allow for very long input files, the field width of the number of occurrences of each word should be at least six decimal digits. You should also total and print the number of distinct words.
For debugging, you may use the same text files as in Programming Assignment 5 — i.e.,
These files are also available on the CCC systems in the directory /cs/cs2301-d09.
Your project submission must include a file called makefile that enables make to build the target PA6, and individual components, and the target clean, which removes the executable and all intermediate files. The graders will type the command
make
to build your program and the command
make clean
to delete it and the intermediate files. Don’t forget to include a README file.
Visual Studio will provide its own project files, which must be part of your submission. You will have to learn how to provide command line arguments in Visual Studio. Before submitting, you should clean your project, as provided in Visual Studio. You should then zip all of your files together into a single zip file and submit that, along with your README file.
In §6.5, Kernighan and Ritchie describe a struct called tnode that defines a binary tree and two functions called addtree() and treeprint() to use and print that tree. Note both are recursive functions. You should base your program on these, but you may make changes if you feel it necessary to do so.
The example program of §6.5 also refers to a function getword(), which is described on page 136, and talloc(), which is described later. (Note that getword() is more general than you need; you may shorten it if you wish so that it does not “put back” the first character after each word.) For file I/O, you should use fopen(), as described in §7.5. See especially the top of page 161, which explains that if a nonexistent file is opened for writing, it is created if possible.
The main purpose of this assignment is not to discover or invent new things but rather to bring together tools, techniques, algorithms that are already available to you into a new program. Therefore, you should freely copy and adapt any functions you can find in Kernighan & Ritchie — or from other resources — provided that you cite them in your project write-up.
Your project submission should include:–
· A write-up (i.e., a README file) describing the principal files, functions, and data structures of your program, including citations of material you used or adapted, and also a summary of the results of your testing;
· The .c and .h files of your program;
· At least one sample output file showing that it works; and
· The makefile needed to build your program or the Visual Studio project files.
Submit your files using the following turnin command (all on one line):–
/cs/bin/turnin submit cs2301 PA6 README
<your list of files>
Note that you can submit additional files at any time; for example, you can add additional samples of output, or you can modify a previously submitted file by simply resubmitting it again.
Be sure to put your name at the top of ALL files (except, obviously, the sample output files).
Submissions dated after 11:59 pm on May 1 will be automatically tagged as late, and will be subject to the late homework policy.
This assignment is worth thirty points (plus five extra points for Visual Studio), allocated as follows:–
· A properly structured README file – 5 points
· correctness of execution, particularly on the grader’s test cases – 5 points
· correct use of malloc() and correctly freeing all malloc’ed data – 5 points
· correctly building and managing dynamically allocated data structures – 5 points
· partitioning the problem into multiple .c files and setting up appropriate .h files – 5 points
· A properly working makefile to build your program – 5 points; or a suitable set of Visual Studio project files – 10 points
Programs must build successfully in order to receive any credit.