CS 539 Fall 2000 - Assignment Ch. 8

Computer Science Department

CS539 Machine Learning
Assignment Chapter 8 - Fall 2000

PROF. CAROLINA RUIZ

Due: Thursday, November 9, 2000 at 6:00 pm.

Project Description
Project Assignment
Report Submission and Due Date

PROJECT DESCRIPTION

Construct the most accurate instance-based classifier you can for predicting whether a given person's income is >50K or <=50K, using the census-income dataset from the US Census Bureau, which is available at the University of California, Irvine (UCI) Repository.

I have downloaded the dataset into the following directory: /cs/courses/cs539/f00/Projects/Census_Income_Data
You can access the dataset from there.

The census-income dataset contains census information for 48,842 people. It has 14 attributes for each person (age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, and native-country) and a binary class attribute that classifies the income of the person as belonging to one of two categories: >50K or <=50K.

PROJECT ASSIGNMENT

Construct the most accurate instance-based classifier you can to predict the salary attribute (the class attribute described above) of the census-income data. Follow these guidelines when constructing your instance-based classifier:

Code: You should write code that implements the k-nearest neighbor algorithm described in Chapter 8 of the textbook.
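As a starting point, the basic k-nearest neighbor idea can be sketched as follows. This is only a minimal illustration, assuming purely numeric feature vectors, Euclidean distance, and an unweighted majority vote; the function names and the toy data are invented for this example and are not part of the assignment or the dataset.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k):
    """Return the majority label among the k training instances nearest to query.

    train is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(train, key=lambda inst: euclidean(inst[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data (invented for illustration): (age, hours-per-week) -> class
train = [
    ((25.0, 20.0), "<=50K"),
    ((30.0, 35.0), "<=50K"),
    ((28.0, 40.0), "<=50K"),
    ((45.0, 50.0), ">50K"),
    ((50.0, 60.0), ">50K"),
    ((52.0, 45.0), ">50K"),
]
print(knn_classify(train, (48.0, 55.0), k=3))
```

Sorting the whole training set for every query is simple but slow on tens of thousands of records; it is used here only to keep the sketch short. Using an odd k with two classes avoids tied votes.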

Training Instances: Use the census-income dataset. You can restrict your experiments to a subset of the dataset if your system cannot handle the whole dataset, but remember that the more accurate your system is, the better. Note that this dataset has missing values; it is up to you how to fill in appropriate data for those missing values. It is also up to you to decide whether it is a good idea to discretize continuous attributes and, if so, how.
Test Instances: Test data are also available at the UCI Repository. YOU MUST USE AT LEAST THE FIRST 1000 TEST RECORDS FROM THAT TEST DATA IN YOUR EXPERIMENTS.
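One possible way to prepare the data is sketched below. It is only an illustration of two common choices, not a required approach: it assumes that missing values are marked with "?", fills them with the most frequent observed value of that attribute, and rescales a numeric attribute to [0, 1] so that attributes with large ranges do not dominate a distance computation. The helper names and the toy rows are hypothetical.

```python
from collections import Counter

MISSING = "?"  # assumption: marker used for unknown values in the data files

def fill_missing_with_mode(rows, col):
    """Replace MISSING in column col with the most frequent observed value."""
    observed = [r[col] for r in rows if r[col] != MISSING]
    mode = Counter(observed).most_common(1)[0][0]
    return [r[:col] + [mode] + r[col + 1:] if r[col] == MISSING else r
            for r in rows]

def min_max_scale(rows, col):
    """Rescale numeric column col to the range [0, 1]."""
    values = [float(r[col]) for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant column
    return [r[:col] + [(float(r[col]) - lo) / span] + r[col + 1:] for r in rows]

# Toy rows (invented for illustration): [age, workclass]
rows = [[39, "Private"], [50, MISSING], [28, "Private"], [61, "State-gov"]]
cleaned = min_max_scale(fill_missing_with_mode(rows, col=1), col=0)
for row in cleaned:
    print(row)
```

Whatever scaling you compute from the training data (here, the minimum and maximum) should also be applied to the test records, so that both are measured on the same scale.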

REPORT AND DUE DATE

Written Report. Please bring your report to my office (FL232) or to class by the due date/time. Your report should contain the following sections, which discuss these issues:
1. Code Description: Describe any adaptations of the code that you made.
2. Experiments: You should run several experiments varying the value of k, the training data, and perhaps the test data. For each experiment you ran, describe:
  - Training Data: What data did you use to construct your instance-based classifier?
  - Test Data: What data did you use to test your instance-based classifier?
  - Pre/Post-processing: Any pre- or post-processing done to improve the accuracy of your classifier.
  - Value of k used for the experiment.
  - Accuracy of the resulting instance-based classifier.
3. Summary of Results
  - What was the accuracy of the most accurate instance-based classifier you obtained?
  - Discuss how this accuracy compares with that of your most accurate decision tree, neural network, and naive Bayes classifier from the previous assignments.
  - Discuss the strengths and the weaknesses of your system.
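The experiments described in item 2 amount to measuring accuracy for several values of k on fixed training and test data. A minimal sketch of such a loop is given below, assuming numeric features already scaled to [0, 1] and accuracy defined as the fraction of test instances classified correctly. All names and data points are made up for illustration.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training points closest to query.

    Uses squared Euclidean distance, which gives the same ordering
    as Euclidean distance.
    """
    nearest = sorted(
        train,
        key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of test instances whose predicted label matches the true label."""
    correct = sum(1 for x, y in test if knn_predict(train, x, k) == y)
    return correct / len(test)

# Toy data (invented for illustration)
train = [((0.1, 0.2), "<=50K"), ((0.2, 0.1), "<=50K"), ((0.3, 0.3), "<=50K"),
         ((0.8, 0.9), ">50K"), ((0.9, 0.7), ">50K")]
test = [((0.15, 0.15), "<=50K"), ((0.85, 0.8), ">50K"), ((0.6, 0.6), ">50K")]

results = {k: accuracy(train, test, k) for k in (1, 3, 5)}
for k, acc in results.items():
    print(f"k={k}: accuracy={acc:.2f}")
```

In this toy run, k=5 equals the size of the training set, so every query receives the majority class and accuracy drops. This is one reason to report results for several values of k.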
Oral Report. We will discuss the results from the individual projects during the class on November 9. Be ready to show your results (prepare transparencies on your results) and to discuss your project solution in class.
