Project 4 (45 points)
 Assigned: Thursday, February 11, 2010
Due: Friday, February 19, 2010, 11:59 PM



Programming Assignment #4 — Binary Trees

Abstract

Write a program in C++ and Visual Studio to scan one or more text files and count the number of occurrences of each word in those files. Use a binary tree to keep track of all words. When all input files have been scanned, print out the results to another file.

Outcomes

After successfully completing this assignment, you should be able to:–

·        Define a class of objects and operations on those objects.

·        Build a massive, recursive data structure comprising those objects.

·        Search for an item in that data structure and, if it is not found, add it to the structure.

·        Create a file for your output and write to it.

·        Put together a non-trivial program in C++.

Before Starting

Read Chapter 19 of Deitel & Deitel, especially §19.7-19-9. Also read §21.6 regarding new and delete.

Review the concept of binary tree from your previous courses. You may also refer to §6.5 of Kernighan & Ritchie or §12.7 of Deitel & Deitel about binary trees in C.

In Visual Studio, create a Console Application Project (similar to that you created for Laboratory Assignment #4) called PA4 to hold the program and interface files for this project.

This Assignment

Your program must accept an indeterminate number of arguments on the command line, the first of which specifies the output file and the remaining of which specify the input files. Thus, a user could invoke your program in a Windows Command Prompt by typing

./PA4 outputFile inputFile1 inputFile2 ...

Under this command, the program would open and read each of the input files in turn, building up a binary tree of words and counts as it progresses. Once all files have been read and closed, it must create the output file and write out the words in the tree in alphabetical order, one word per line, along with the number of occurrences of that word. Your program should ignore the case of the words, so that “This” and “this” are considered the same. However, words that are actually spelled differently — such as “car” and “cars” — are considered to be different words.

A sample output might look like the following:–

    166    a
     25    and
     11    as
      3    command
     15    each
      2    file
      4    files
    109    in
      4    input
     98    it
     99    of
      3    open
      6    program
     18    read
    152    the
     41    this
      3    under
     30    would
-------------
     17    Total number of different words

To allow for very long input files, the field width of the number of occurrences of each word should be at least six decimal digits. You should also total and print the number of distinct words.

For debugging, you may use the following text files:–

http://www.cs.wpi.edu/~cs2303/c10/Common/Kennedy.txt
http://www.cs.wpi.edu/~cs2303/c10/Common/Obama.txt
http://www.cs.wpi.edu/~cs2303/c10/Common/MartinLutherKing.txt

For fun, you may also any of Shakespeare’s plays, which can be downloaded from the Internet.

Implementation in C++

You must implement this project in an object-oriented style. Think, for example, how you might have done it in Java, and then use that approach as a guideline for your C++ program.

At the very minimum, you should define one or more Class Interface files (see §19.9 of D&D) and a corresponding Class Implementation file for each Class Interface. In addition, you should have one or more.cpp files for your main() function, input and output, and anything else that is appropriate.

For example, you might define a treenode class to represent an individual word in the input text as a node of the binary tree. This class might have four members:–

·        One pointer each to the left and right subtrees. Each would be of type treenode * — i.e., pointer to treenode.

·        An int containing the count of the number of occurrences of the word.

·        A string containing the word itself.

The treenode class would also contain methods to increment and access the count, access the string, and add or access the left and right subtrees. Of particular importance are the constructor and destructor for a treenode. Note that since a treenode contains a string, and since the string is a class itself, you must pay attention to the order of creation and destruction of these objects.

In addition to the treenode class, you might also have a binaryTree class that provides search, insertion, construction, destruction, and traversal methods for your binary tree.

Note:  The C++ Standard Template Library contains templates for trees. You should ignore these for this project. They are likely to make it more difficult and time-consuming..

Note:  Don’t forget to invoke the destructor of your tree. This is tantamount to freeing memory in C. Failure to do so can result in memory leaks. The graders will be looking for this.

Individual Project

This is an individual project. You may consult classmates and others on the algorithms for creating and manipulating binary trees, but you must implement them in your program on your own.

Deliverables

Before submitting your project, be sure to clean it in Visual Studio.

You must submit the following:–

·        A zip file containing all of the Visual Studio files of a clean project, including your .cpp and .h files. The project name should be PA4.

·        A document called README.txt, README.pdf, or README.doc summarizing your program, how to run it, and detailing any problems that you had. Also, if you borrowed all or part of the algorithm for this assignment, be sure to cite your sources.

·        The output of at least two different test cases.

From your CCC Linux shell, submit your C program using the following turnin command:–

/cs/bin/turnin submit cs2303 PA4 <your files>

Programs submitted after 11:59 pm on due date (February 19) will be tagged as late, and will be subject to the late homework policy.

Grading

This assignment is worth forty-five (45) points. Your program must compile without errors in Visual Studio 2008 in order to receive any credit. If you do your work on any other system, it is strongly suggested that you compile and test it on Visual Studio at least one day prior to submitting it, in order to flush out any differences in C++ implementation.

·        Correctly build in Visual Studio 2008 (Debug mode) without warnings – 5 points

·        Correct definition of class and methods for nodes of binary tree – 5 points

·        Correct definition of class and methods for the binary tree as a whole – 5 points

·        Correct construction of binary tree and insertion of nodes – 5 points

·        Correct traversal of binary tree and output of information according to specified format – 5 points

·        Proper destruction of tree and all of its objects before exiting – 10 points

·        Correct execution with graders’ test cases – 5 points

·        Satisfactory README file, including output of two test cases – 5 points

 

·        Penalty of five points if your project is not clean before creating the zip file or for submitting your programs in something other than a zip file.

Additional Notes

Command Line Arguments

Command line arguments are the same in C++ as in C. That is, the function prototype of main() is

int main(int argc, char *argv[]);

The elements of the argv array in this problem are file names. They can be used in the ifstream and ofstream constructors (see below).

Input and Output Streams

Figure 26.2 in Deitel & Deitel shows the Stream-I/O hierarchy, but it does not seem to show how to open a stream on a file. The following is from Stroustrup, The C++ Programming Language, 3rd edition, §21.5.1:–

#include <fstream>

...

ofstream output(argv[1]);
if (!output) error(
"Cannot open output file", argv[1]);

This declares and creates an ofstream object called output connected to the file named by the command line argument. It can be used in the same way as cout.

ifstream input(argv[i]);
if (!input) error(
"Cannot open input file", argv[i]);

This declares and creates an ifstream object called input connected to the file named by the ith command line argument. It can be used in the same way as cin.

Formatted Output

Section 26.6 of Deitel and Deitel discusses how to format output so that it prints correctly and neatly. These are controlled by a set of stream manipulators that are inserted into the stream along with the characters to be printed. You can work out how to make the output of this project look like the specification on the top of page 2.