WPI Worcester Polytechnic Institute

Computer Science Department
------------------------------------------

CS4341 Introduction to Artificial Intelligence 
Project 4 - D 2001

By Chris Shoemaker and Carolina Ruiz 

DUE DATE: Saturday, April 14 at 5 pm. 
------------------------------------------


Project Goal:

To understand decision trees and the algorithms that create them. To implement a program that creates a decision tree and uses it to make classification predictions for data it was not trained on. Note: in this assignment, the term "decision tree" is used in the practical sense, that is, to mean the type of decision tree that Winston calls an "identification tree".
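For reference, Winston builds identification trees greedily. At each node he picks the test whose branches have the lowest average disorder, where n_t is the number of examples at the node, n_b the number sent down branch b, and n_bc the number of those in branch b that belong to class c:

```latex
\text{average disorder} = \sum_{b} \frac{n_b}{n_t}
  \left( \sum_{c} -\frac{n_{bc}}{n_b} \log_2 \frac{n_{bc}}{n_b} \right)
```

A branch containing only one class contributes 0. With the three classes used here (win, lose, draw), a single branch's disorder ranges from 0 up to log2 3, about 1.585 bits.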


Assignment Description:

Construct the most accurate decision tree you can for predicting whether a game of Connect 4 whose first 8 plies are given will be won, lost, or drawn. Relevant files are available in the /cs/cs4341/Proj4 directory on the CCC Unix machines and also at ftp://ftp.cs.wpi.edu/pub/courses/cs4341/project4/ 
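A minimal sketch of the greedy construction, in the spirit of ID3 and Winston's procedure, follows. It splits on the attribute with the lowest average disorder and recurses until a branch is homogeneous or no attributes remain. It assumes each example is a tuple of attribute values, that attributes are referred to by position, and that build_tree and classify are hypothetical names. It does no pruning.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Disorder, in bits, of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def average_disorder(rows, labels, attribute):
    """Weighted disorder of the branches produced by splitting on `attribute`."""
    branches = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[attribute], []).append(label)
    return sum(len(b) / len(labels) * entropy(b) for b in branches.values())


def build_tree(rows, labels, attributes):
    """Return a class label (leaf) or a node tuple (attribute, majority_label, children)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority
    # Greedy choice: the attribute whose split leaves the least average disorder.
    best = min(attributes, key=lambda a: average_disorder(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    children = {}
    for value in {row[best] for row in rows}:
        subset = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree([rows[i] for i in subset],
                                     [labels[i] for i in subset], remaining)
    return (best, majority, children)


def classify(tree, row):
    """Follow the branches matching `row`; unseen values fall back to the node's majority."""
    while isinstance(tree, tuple):
        attribute, majority, children = tree
        if row[attribute] not in children:
            return majority
        tree = children[row[attribute]]
    return tree
```

Typical use would be `tree = build_tree(train_rows, train_labels, list(range(len(train_rows[0]))))`, followed by `classify(tree, row)` for each test example. Storing the majority class at every node is one simple way to handle attribute values that never reached that node during training.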
The files contained in those directories were derived from files taken from the Univ. of California-Irvine Machine Learning Repository.

The directory contains the following files:

As an illustration of the kind of results decision trees obtain on this data set, see Chris Shoemaker's decision tree results using See5, a commercial decision tree tool developed by Ross Quinlan. See5 is the Windows version of C5 (the successor to C4.5).

Input Specifications:

You may accept the training and testing filenames as command-line arguments, or you may hard-code them. Relevant information about the file format is found in the connect-4.names file. You may use the files in their current locations or make copies of them, but your program must operate on files formatted exactly like the original files.
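One way to support both options is sketched below, with some assumptions. In the original UCI version of this data, each line lists the 42 board squares (x, o, or b for blank) followed by the outcome, separated by commas. The sketch assumes that layout, with the class as the last field, so check it against connect-4.names. The default filenames and the helper names are hypothetical.

```python
# Hypothetical default names; replace them with the actual training and testing files.
DEFAULT_TRAIN = "connect-4.train"
DEFAULT_TEST = "connect-4.test"


def data_filenames(argv):
    """Use the first two command-line arguments if given, else the hard-coded defaults."""
    if len(argv) >= 3:
        return argv[1], argv[2]
    return DEFAULT_TRAIN, DEFAULT_TEST


def parse_examples(lines):
    """Split comma-separated lines into attribute tuples and class labels.

    ASSUMPTION: the class is the last field, as in the UCI connect-4 data;
    confirm the layout against connect-4.names.
    """
    rows, labels = [], []
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if fields == [""]:  # skip blank lines
            continue
        rows.append(tuple(fields[:-1]))
        labels.append(fields[-1])
    return rows, labels


def load_examples(path):
    """Read and parse one data file."""
    with open(path) as handle:
        return parse_examples(handle)
```

From a script you would call `train_path, test_path = data_filenames(sys.argv)` (after `import sys`) and then `load_examples` on each path. Keeping the parsing in `parse_examples` means it can be checked on a few literal lines without touching the real files.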


Output Specifications:

The only formal output specification is that, at minimum, your program should display:

                           (actual)
                     win   lose   draw  total
              win     12      1      0     13
(predicted)  lose      2     13      1     16
             draw      3      2      4      9
            total     17     16      5     38   <- (For you, this number will be 22519)
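The table is a confusion matrix. Each cell counts the test examples predicted as the row's class whose actual class is the column's, and the bottom-right cell is the total number of test examples. A minimal sketch for tallying and printing it follows. The class spellings are taken from the table and are an assumption, because the data files may spell them differently (the UCI data, for instance, uses "loss"). The function names are hypothetical.

```python
CLASSES = ("win", "lose", "draw")  # assumed spelling; match the class names in the data files


def confusion_matrix(predicted, actual, classes=CLASSES):
    """counts[p][a] = number of test examples predicted as p whose actual class is a."""
    counts = {p: {a: 0 for a in classes} for p in classes}
    for p, a in zip(predicted, actual):
        counts[p][a] += 1  # raises KeyError for a label not listed in `classes`
    return counts


def format_matrix(counts, classes=CLASSES):
    """Render predicted classes as rows and actual classes as columns, with totals."""
    cell = "{:>7}"
    lines = [" " * 27 + "(actual)",
             " " * 17 + "".join(cell.format(c) for c in (*classes, "total"))]
    column_totals = [0] * len(classes)
    for p in classes:
        row = [counts[p][a] for a in classes]
        column_totals = [t + n for t, n in zip(column_totals, row)]
        tag = "(predicted)" if p == classes[len(classes) // 2] else ""
        lines.append(f"{tag:<12}{p:>5}" + "".join(cell.format(n) for n in (*row, sum(row))))
    lines.append(" " * 12 + "total"
                 + "".join(cell.format(n) for n in (*column_totals, sum(column_totals))))
    return "\n".join(lines)
```

Feeding it predictions and actual classes that match the counts in the example table above yields the same rows and totals. Overall accuracy is the sum of the diagonal cells divided by the bottom-right total.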



Your Code:

The following are requirements for the construction and testing of your decision tree:

Report and Project Submission

Project 4 is due on Saturday, April 14 at 5:00 pm. Your system should follow the CS Department Documentation Standard.

Graduate Credit Problem

Extend and (hopefully) improve your decision tree generating program to avoid overfitting.
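The assignment does not prescribe a technique. One common option is reduced-error pruning (Quinlan): hold back part of the training data as a validation set and, working bottom-up, replace a subtree with a leaf whenever the leaf classifies the validation examples reaching it at least as well. The sketch below assumes a hypothetical tree representation in which a leaf is a class label and an internal node is a tuple `(attribute, majority_label, children)`, with `majority_label` the majority class of the training examples at that node.

```python
def classify(tree, row):
    """Follow matching branches; an unseen value falls back to the node's majority label."""
    while isinstance(tree, tuple):
        attribute, majority, children = tree
        if row[attribute] not in children:
            return majority
        tree = children[row[attribute]]
    return tree


def prune(tree, rows, labels):
    """Reduced-error pruning against validation examples (rows, labels) reaching `tree`."""
    if not isinstance(tree, tuple):
        return tree
    attribute, majority, children = tree
    pruned_children = {}
    for value, child in children.items():
        # Prune each child using only the validation examples routed to it.
        idx = [i for i, row in enumerate(rows) if row[attribute] == value]
        pruned_children[value] = prune(child, [rows[i] for i in idx], [labels[i] for i in idx])
    node = (attribute, majority, pruned_children)
    subtree_correct = sum(classify(node, r) == y for r, y in zip(rows, labels))
    leaf_correct = sum(majority == y for y in labels)
    # Ties (including nodes no validation example reaches) favor the simpler leaf.
    return majority if leaf_correct >= subtree_correct else node
```

The validation examples must not also be used to grow the tree, or the comparison says nothing about overfitting. Other options include stopping early when a split is not statistically significant (for example, a chi-square test).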