CS539 MACHINE LEARNING. SPRING 99
PROJECT 1
Decision Trees for Predicting Income Level
PROF. CAROLINA RUIZ
Department of Computer Science
Worcester Polytechnic Institute
PROJECT DESCRIPTION
Construct the most accurate decision tree you can for predicting whether
the income of a given person is >50K or <=50K, using the census-income dataset
from the US Census Bureau, which is available at the
Univ. of California Irvine Repository.
The census-income dataset contains census information for 48,842
people. It has 14 attributes for each person
(age, workclass, fnlwgt, education, education-num, marital-status,
occupation, relationship, race, sex, capital-gain, capital-loss,
hours-per-week, and native-country)
and a boolean attribute class classifying the income
of the person as belonging to one of two categories: >50K or <=50K.
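As a rough illustration of the record layout described above, the following Python sketch parses one line of the dataset into a dictionary. It is only a starting point under stated assumptions: that each record is a comma-separated line listing the 14 attributes in the order given above followed by the class, that missing values are marked with "?", and that six of the attributes are numeric. The names ATTRIBUTES, NUMERIC, MISSING, and parse_record are hypothetical, and the sample line is an illustrative record in the assumed format.

```python
# Attribute names as listed in the project description; the class label comes last.
ATTRIBUTES = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
]
# Assumption: these six attributes are numeric; the rest are categorical.
NUMERIC = {"age", "fnlwgt", "education-num",
           "capital-gain", "capital-loss", "hours-per-week"}
MISSING = "?"  # Assumption: missing values are marked with a question mark.


def parse_record(line):
    """Turn one comma-separated line into a dict of attribute -> value.

    Missing values become None, numeric attributes become int, and the
    class label is stored under the key "class".
    """
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != len(ATTRIBUTES) + 1:
        raise ValueError(
            f"expected {len(ATTRIBUTES) + 1} fields, got {len(fields)}")
    record = {}
    for name, raw in zip(ATTRIBUTES, fields):
        if raw == MISSING:
            record[name] = None
        elif name in NUMERIC:
            record[name] = int(raw)
        else:
            record[name] = raw
    # Assumption: some files may end the label with a period (">50K.").
    record["class"] = fields[-1].rstrip(".")
    return record


# Illustrative line in the assumed format.
sample = ("39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
          "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K")
parsed = parse_record(sample)
print(parsed["age"], parsed["class"])
```

Check the assumed separator, attribute order, and missing-value marker against the files you actually download before relying on this.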
PROJECT ASSIGNMENT
The following are guidelines for the construction of your decision tree:
- Code: You can use
Your code must run on the CS or CCC Unix machines.
- Training Instances:
Use the census-income dataset.
You can restrict your experiments to a subset of the dataset if
your system cannot handle the whole dataset. But remember that the
more accurate your system is, the better.
Also, note that this dataset has missing values. It is up to you how to
fill in appropriate data for those missing values.
- Test Instances:
Test data are also available at the UCI repository.
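The choice of how to fill in missing values is left open above. One simple option, sketched here in Python, is to replace each missing value with the most common value of that attribute in the training data. This is only one possible strategy, not a required one. The function names most_common_values and fill_missing are hypothetical, missing values are assumed to be represented as None, and the records are tiny made-up examples, not real census rows.

```python
from collections import Counter


def most_common_values(records, attributes):
    """Return the most frequent non-missing value of each attribute."""
    fill = {}
    for name in attributes:
        counts = Counter(r[name] for r in records if r[name] is not None)
        if counts:  # skip attributes that are missing in every record
            fill[name] = counts.most_common(1)[0][0]
    return fill


def fill_missing(records, fill):
    """Return copies of the records with None replaced by the fill value."""
    return [
        {k: (fill.get(k) if v is None else v) for k, v in r.items()}
        for r in records
    ]


# Made-up records for illustration; None marks a missing value.
train = [
    {"workclass": "Private", "occupation": "Sales"},
    {"workclass": None, "occupation": "Sales"},
    {"workclass": "Private", "occupation": None},
    {"workclass": "State-gov", "occupation": "Tech-support"},
]
fill = most_common_values(train, ["workclass", "occupation"])
filled = fill_missing(train, fill)
print(filled[1]["workclass"], filled[2]["occupation"])  # prints: Private Sales
```

If you use this approach, compute the fill values from the training instances only and reuse them for the test instances, so that no information from the test data leaks into the tree.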
REPORT AND DUE DATE
Project 1 is due on Tuesday, February 16 at 5:30 pm.
Your system should follow the Departmental Documentation Standard.
- Program and Decision Tree.
You should submit (1) the source code of your program and
(2) the most accurate decision tree you obtained, by email to
ruiz@cs.wpi.edu.
- Written Report.
Please bring your report to my office (FL232) or to class by the due date/time.
Your report should discuss the following issues:
- adaptation of the code (if any) or a description of your own code,
- the description of the (subset of the) dataset used by your program,
- the experiments you ran with the system,
- the most accurate decision tree constructed by your system,
- any pre- or post-processing done to improve the accuracy of your tree,
- evaluation of your tree using the test data,
- strengths and weaknesses of your system.
Your report should also include a short user manual explaining how to install,
run, and use your system.
- Oral Report.
We will discuss the results from the individual projects during the class on Feb. 16.
Be ready to show your results (prepare transparencies if you want to)
and to discuss your project solution in class.
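For the report item on evaluating your tree using the test data, accuracy can be computed as the fraction of test instances the tree classifies correctly. The Python sketch below shows one way to do this. The tree representation (nested dictionaries with "attribute", "branches", and "default" keys), the function names classify and accuracy, and the toy tree and records are all hypothetical and made up for illustration. Your own system may represent trees differently.

```python
def classify(tree, record):
    """Follow the tree from the root to a leaf and return the leaf's label.

    Hypothetical representation: a leaf is a plain string label; an internal
    node is a dict with keys "attribute" (name to test), "branches"
    (value -> subtree), and "default" (label used when no branch matches).
    """
    while isinstance(tree, dict):
        value = record.get(tree["attribute"])
        tree = tree["branches"].get(value, tree["default"])
    return tree


def accuracy(tree, records):
    """Fraction of records whose predicted label equals record["class"]."""
    if not records:
        raise ValueError("no records to evaluate")
    correct = sum(1 for r in records if classify(tree, r) == r["class"])
    return correct / len(records)


# Toy tree and made-up test records, not results from the real dataset.
toy_tree = {
    "attribute": "education",
    "branches": {"Bachelors": ">50K", "HS-grad": "<=50K"},
    "default": "<=50K",
}
toy_test = [
    {"education": "Bachelors", "class": ">50K"},   # predicted >50K, correct
    {"education": "HS-grad", "class": "<=50K"},    # predicted <=50K, correct
    {"education": "Masters", "class": ">50K"},     # no branch, default is wrong
    {"education": "Bachelors", "class": "<=50K"},  # predicted >50K, wrong
]
print(accuracy(toy_tree, toy_test))  # prints: 0.5
```

Report the accuracy on the test data separately from the accuracy on the training data, since a tree can fit its training instances much better than unseen ones.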