CS539 Machine Learning
Assignment 3 - Fall 2000
Due:
First Part: Thursday, September 28, 2000 at 6:00 pm.
Second Part: Thursday, October 05, 2000 at 6:00 pm.
PROJECT DESCRIPTION
Construct the most accurate neural network you can for predicting
whether the income of a given person is >50K or <=50K, using the
census-income dataset from the US Census Bureau, which is available at
the Univ. of California Irvine Repository.
I have downloaded the dataset into the following directory:
/cs/courses/cs539/f00/Projects/Census_Income_Data
You can access the dataset from there.
The census-income dataset contains census information for 48,842
people. It has 14 attributes for each person
(age, workclass, fnlwgt, education, education-num, marital-status,
occupation, relationship, race, sex, capital-gain, capital-loss,
hours-per-week, and native-country)
and a boolean class attribute that classifies the income
of the person into one of two categories: >50K or <=50K.
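As an illustration of the record layout described above, here is a minimal sketch of reading one record in C. It assumes each record is stored as a single line of 15 comma-separated fields (the 14 attributes followed by the class) and that a missing value is written as "?"; check both assumptions against the files in the course directory. The names Record and parse_record are hypothetical and are not part of the textbook code.

```c
#include <assert.h>
#include <string.h>

#define N_FIELDS 15   /* 14 attributes + the class label */
#define FIELD_LEN 32  /* assumed upper bound on the length of one field */

typedef struct {
    char field[N_FIELDS][FIELD_LEN]; /* raw text of each field, blanks trimmed */
    int  missing[N_FIELDS];          /* 1 if the field is "?" (assumed marker) */
    int  over_50k;                   /* 1 if the class starts with ">50K", else 0 */
} Record;

/* Parse one comma-separated line into rec.
   Returns 0 on success, -1 if the line is malformed. */
int parse_record(const char *line, Record *rec)
{
    const char *p = line;

    assert(line != NULL && rec != NULL);
    for (int i = 0; i < N_FIELDS; i++) {
        while (*p == ' ' || *p == '\t')
            p++;                                   /* skip leading blanks */
        size_t len = strcspn(p, ",\r\n");          /* field ends at comma or end of line */
        const char *next = p + len;
        while (len > 0 && (p[len - 1] == ' ' || p[len - 1] == '\t'))
            len--;                                 /* trim trailing blanks */
        if (len == 0 || len >= FIELD_LEN)
            return -1;                             /* empty or oversized field */
        memcpy(rec->field[i], p, len);
        rec->field[i][len] = '\0';
        rec->missing[i] = (strcmp(rec->field[i], "?") == 0);
        p = next;
        if (i < N_FIELDS - 1) {
            if (*p != ',')
                return -1;                         /* too few fields */
            p++;
        }
    }
    if (*p == ',')
        return -1;                                 /* too many fields */
    /* Compare only the first 4 characters so trailing punctuation is ignored. */
    rec->over_50k = (strncmp(rec->field[N_FIELDS - 1], ">50K", 4) == 0);
    return 0;
}
```

A typical use would be to read the file line by line with fgets, call parse_record on each line, and skip or repair the lines for which it returns -1. The fields are kept as text so that the decision of how to encode them is left to a later step.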
PROJECT ASSIGNMENT
This project consists of two parts:
Part 1: Due September 28 at 6:00 pm.
Find all mistakes in the
C code for the neural network error backpropagation algorithm
provided with Chapter 4 of the textbook. The main files you need to use are:
backprop.c, backprop.h, and facetrain.c
(note that other files, used to run the facial expression
classification experiment described in the chapter, are also provided).
To the best of my
knowledge, the code has ONE mistake. Submit a written report
pointing out this mistake as well as all
other mistakes you find in the code.
Part 2: Due October 5th at 6:00 pm.
Construct, train (using error backpropagation), and test
the most accurate neural network you can to predict the income class
(>50K or <=50K) of each person in the Census-Income data.
The following are guidelines to construct and train your neural net:
- Code: You must use the
C code for the neural network error backpropagation algorithm
from Chapter 4 of the textbook. The main files you need to use are:
backprop.c and backprop.h
- Topology of your Neural Net:
Your net should consist of 1 input layer, 1 hidden layer,
and 1 output layer. Every node should be connected to every node
in the adjacent layer (input to hidden, and hidden to output),
and no nodes on the same layer should be connected.
- Training Instances:
Use the census-income dataset.
You can restrict your experiments to a subset of the dataset if
your system cannot handle the whole dataset. But remember that the
more accurate your system is, the better.
Also, note that this dataset has missing values. It is up to you how to
fill in appropriate data for those missing values. It is also up to you
to decide whether it is a good idea to discretize continuous attributes,
and if so, how.
- Test Instances:
Test data are also available at the UCI repository.
YOU MUST USE AT LEAST THE FIRST 1000 TEST RECORDS FROM THAT TEST
DATA IN YOUR EXPERIMENTS.
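The guidelines above leave the handling of missing values, the treatment of continuous attributes, and the scoring of test records up to you. The following is a minimal sketch of one possible set of helpers, not a required approach. All function names are hypothetical and independent of backprop.c. The sketch assumes a missing value appears as the string "?", that a missing or unknown categorical value is replaced by a fallback index you choose (for example, the most frequent value in the training data), and that a network output above 0.5 is read as >50K.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scale a continuous attribute into [0, 1]. lo and hi would normally be
   the minimum and maximum seen in the training data. */
double scale01(double v, double lo, double hi)
{
    assert(hi > lo);
    if (v < lo) v = lo;   /* clamp values outside the training range */
    if (v > hi) v = hi;
    return (v - lo) / (hi - lo);
}

/* Discretize a continuous value using ascending cut points.
   Returns a bin index in 0..n_cuts. */
size_t discretize(double v, const double *cuts, size_t n_cuts)
{
    size_t bin = 0;
    while (bin < n_cuts && v >= cuts[bin])
        bin++;
    return bin;
}

/* Look up a categorical value in names[0..n-1]. A missing value ("?")
   or an unknown name maps to the caller-supplied fallback index. */
size_t category_index(const char *value, const char *const *names,
                      size_t n, size_t fallback)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(value, names[i]) == 0)
            return i;
    return fallback;
}

/* One-of-n encoding: write n network inputs, 1.0 at index, 0.0 elsewhere. */
void one_hot(size_t index, size_t n, double *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (i == index) ? 1.0 : 0.0;
}

/* Fraction of records whose thresholded output matches the 0/1 class label
   (1 for >50K, 0 for <=50K). Returns 0.0 when n is 0. */
double accuracy(const double *output, const int *label, size_t n)
{
    size_t correct = 0;
    for (size_t k = 0; k < n; k++)
        if ((output[k] > 0.5) == (label[k] != 0))
            correct++;
    return n ? (double)correct / (double)n : 0.0;
}
```

Whatever scaling ranges, cut points, and fallback values you choose should be computed from the training data only and then applied unchanged to the test records, so that the reported accuracy reflects data the network has not seen.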
REPORT AND DUE DATE
- Written Report.
Please bring your report to my office (FL232) or to class by the due date/time.
Your report should contain the following sections:
- Code Description:
Describe any adaptations of the code that you made.
- Experiments: For each experiment you ran, describe:
- Training Data: What data did you use to construct your neural
network?
- Test Data: What data did you use to test your neural network?
- The topology (number of units in the hidden layer),
initial weights, number of iterations of the error backpropagation
algorithm, and final weights of each of your neural nets.
- Any pre- or post-processing done to improve the accuracy of your net.
- Accuracy of the resulting neural network.
- Summary of Results
- What was the accuracy of the most accurate neural network
you obtained?
- Discuss how this accuracy compares with that of your
most accurate decision tree from the previous assignment.
- Include the most accurate neural network you obtained in your
report.
- Discuss the strengths and the weaknesses of your system.
- Oral Report.
We will discuss the results from the individual projects during the class
on October 5th.
Be ready to show your results (prepare transparencies on your results)
and to discuss your project solution in class.