Programming Assignment #6 —
Strings and Dynamic Memory Allocation

Due: Friday, May 1, at 11:59 PM

Abstract

Write a program to scan one or more text files and count the number of occurrences of each word. Use a binary tree to keep track of all words. When all input files have been scanned, print out the results to a file. For extra credit, instrument your program to get information about how long it takes to search the binary tree as a function of the number of entries stored in it.

Outcomes

After successfully completing this assignment, you should be able to:–

·        Build a massive, recursive data structure of structs using malloc() and free().

·        Search for an item in that data structure and, if it is not found, add it to the structure.

·        Create a file for your output and write to it.

·        Put together a non-trivial program in C from algorithms and tools that are already available to you.

Before Starting

Re-read §6.5 regarding self-referential data structures and §7.5 regarding file open and creation.

This Assignment                                                             

The program you must create in this assignment is a little bit simpler than Programming Assignment 5. It should comprise multiple files, and it should include a makefile that builds the entire program and all of its components. If you are feeling truly adventurous, you may develop this application in Visual Studio rather than on Linux and gcc, in which case it will not have a makefile.

Unlike Programming Assignment 5, the partitioning of the problem is not given to you. Instead, you have to divide it up into component .c and .h files implementing small groups of related functions. The only requirement is that the executable program be named PA6.

Your program must accept an indeterminate number of arguments on the command line, the first of which specifies the output file and the remaining of which specify the input files. Thus, to invoke your program, the user would type

./PA6 outputFilePath inputFilePath1 inputFilePath2 ...

Under this command, the program would open and read each of the input files in turn, building up a binary tree of words and counts as it progresses. Once all files have been read and closed, it must create the output file and write out the words in the tree in alphabetical order, one word per line, along with the number of occurrences of that word. Your program should ignore the case of the words, so that “This” and “this” are considered the same. A sample output should look like the following:–

    166 a
     25 and
     11 as
      3 command
     15 each
      2 file
      4 files
    109 in
      4 input
     98 it
     99 of
      3 open
      6 program
     18 read
    152 the
     41 this
      3 under
     30 would
-------------
     17 Total number of different words

To allow for very long input files, the field width of the number of occurrences of each word should be at least six decimal digits. You should also total and print the number of distinct words.

For debugging, you may use the same text files as in Programming Assignment 5 — i.e.,

·        Kennedy.txt

·        MartinLutherKing.txt

·        Obama.txt

These files are also available on the CCC systems in the directory /cs/cs2301-d09.

If you are using Linux, gcc, and make

Your project submission must include a file called makefile that enables make to build the target PA6, and individual components, and the target clean, which removes the executable and all intermediate files. The graders will type the command

make

to build your program and the command

make clean

to delete it and the intermediate files. Don’t forget to include a README file.

If you are using Visual Studio

Visual Studio will provide its own project files, which must be part of your submission. You will have to learn how to provide command line arguments in Visual Studio. Before submitting, you should clean your project, as provided in Visual Studio. You should then zip all of your files together into a single zip file and submit that, along with your README file.

Algorithms for your Use

In §6.5, Kernighan and Ritchie describe a struct called tnode that defines a binary tree and two functions called addtree() and treeprint() to use and print that tree. Note both are recursive functions. You should base your program on these, but you may make changes if you feel it necessary to do so.

The example program of §6.5 also refers to a function getword(), which is described on page 136, and talloc(), which is described later. (Note that getword() is more general than you need; you may shorten it if you wish so that it does not “put back” the first character after each word.) For file I/O, you should use fopen(), as described in §7.5. See especially the top of page 161, which explains that if a nonexistent file is opened for writing, it is created if possible.

The main purpose of this assignment is not to discover or invent new things but rather to bring together tools, techniques, algorithms that are already available to you into a new program. Therefore, you should freely copy and adapt any functions you can find in Kernighan & Ritchie — or from other resources — provided that you cite them in your project write-up.

Deliverables

Your project submission should include:–

·        A write-up (i.e., a README file) describing the principal files, functions, and data structures of your program, including citations of material you used or adapted, and also a summary of the results of your testing;

·        The .c and .h files of your program;

·        At least one sample output file showing that it works; and

·        The makefile needed to build your program or the Visual Studio project files.

Submit your files using the following turnin command (all on one line):–

/cs/bin/turnin submit cs2301 PA6 README <your list of files>

 Note that you can submit additional files at any time; for example, you can add additional samples of output, or you can modify a previously submitted file by simply resubmitting it again.

Be sure to put your name at the top of ALL files (except, obviously, the sample output files).

Submissions dated after 11:59 pm on May 1 will be automatically tagged as late, and will be subject to the late homework policy.

Grading

This assignment is worth thirty points (plus five extra points for Visual Studio), allocated as follows:–

·        A properly structured README file – 5 points

·        correctness of execution, particularly on the grader’s test cases – 5 points

·        correct use of malloc() and correctly freeing all malloc’ed data – 5 points

·        correctly building and managing dynamically allocated data structures – 5 points

·        partitioning the problem into multiple .c files and setting up appropriate .h files – 5 points

·        A properly working makefile to build your program – 5 points; or a suitable set of Visual Studio project files – 10 points

Programs must build successfully in order to receive any credit.