CS 525M KNOWLEDGE DISCOVERY AND DATA MINING
This project is due on Thursday Nov. 15, 2001 at 1 pm.
PROJECT 4 - Inductive Logic Programming. Fall 2001
After selecting one of the attributes in your dataset as the target
attribute for classification,
use FOIL or a FOIL-like algorithm
to construct the best set of rules you can
for predicting that target attribute.
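As a reminder of how FOIL chooses which literal to add to a rule, here is a minimal sketch of the FOIL information gain measure for attribute-value data. The function name foil_gain and the toy counts are illustrative assumptions, not part of any particular FOIL distribution. It assumes each example has a single binding, so the number of positives that remain covered equals the positive count after adding the literal.

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL information gain for adding one literal to a rule.

    p0, n0: positive / negative examples covered before adding the literal.
    p1, n1: positive / negative examples covered after adding it.
    Assumption: one binding per example, so the count of positives that
    stay covered (usually written t) is simply p1.
    """
    if p0 == 0 or p1 == 0:
        return 0.0  # no positives covered, so the literal gains nothing
    info_before = -math.log2(p0 / (p0 + n0))
    info_after = -math.log2(p1 / (p1 + n1))
    return p1 * (info_before - info_after)

# Toy counts (made up): the rule covers 4 positives and 4 negatives;
# a candidate literal narrows that to 2 positives and 0 negatives.
print(foil_gain(4, 4, 2, 0))  # 2.0
```

A FOIL-like learner would evaluate this measure for every candidate literal and add the one with the highest gain, repeating until the rule covers no negative examples.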
If your dataset is not suitable for classification, use
FOIL or a FOIL-like algorithm
to predict whether the income of a given person is >50K or <=50K,
using the census-income dataset from the US Census Bureau, which is
available at the
Univ. of California Irvine Repository.
I have previously downloaded the dataset into the following directory:
You can access the dataset from there.
The census-income dataset contains census information for 48,842
people. It has 14 attributes for each person
and a boolean class attribute classifying the income
of the person as belonging to one of two categories: >50K or <=50K.
Construct, using a FOIL-like algorithm,
the most accurate hypothesis (i.e. set of rules) you can to predict the target
attribute of the dataset.
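One simple way to measure the accuracy of a set of rules is sketched below. It assumes that each rule is a conjunction of attribute = value conditions and that a record is predicted to belong to the target class when at least one rule covers it. The function names, attribute names, and records are made up for illustration; they are not results on any real dataset.

```python
def covers(rule, record):
    """A rule is a dict of attribute -> required value; it covers a
    record when every one of its conditions holds."""
    return all(record.get(attr) == value for attr, value in rule.items())

def predict(rules, record):
    """Predict True (the target class) if any rule covers the record."""
    return any(covers(rule, record) for rule in rules)

def accuracy(rules, records, target):
    """Fraction of records whose predicted class equals the actual class."""
    correct = sum(predict(rules, r) == r[target] for r in records)
    return correct / len(records)

# Hypothetical rule set and toy records, for illustration only.
rules = [
    {"education": "Masters"},
    {"marital": "Married", "hours": "high"},
]
records = [
    {"education": "Masters", "marital": "Single", "hours": "low", "high_income": True},
    {"education": "HS-grad", "marital": "Married", "hours": "high", "high_income": True},
    {"education": "HS-grad", "marital": "Single", "hours": "high", "high_income": False},
    {"education": "Masters", "marital": "Married", "hours": "low", "high_income": False},
]
print(accuracy(rules, records, "high_income"))  # 0.75
```

The last toy record is covered by the first rule but is actually negative, which is why the accuracy is 3 out of 4.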
The following are guidelines to use a FOIL-like algorithm to mine your patterns:
- Code: I strongly encourage you to use a version of FOIL available online,
for instance the one available
at Quinlan's Webpage or a
more recent one if you find one.
However, you can implement your own code if you prefer.
You can restrict your experiments to a subset of the dataset if
your system cannot handle the whole dataset. But remember that the
more accurate your system is, the better.
If you use the Census-income data,
note that this dataset has missing values. It is up to you how to fill in
appropriate data for those missing values. Also, it is up to you
to decide if it's a good idea to discretize continuous attributes, and if
YOU MUST USE AT LEAST THE FIRST 1000 TEST RECORDS FROM THE
IN YOUR EXPERIMENTS.
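The two preprocessing decisions mentioned above (filling in missing values and discretizing continuous attributes) could be handled along the following lines. This is only one possible approach, not a required one: it assumes missing values are marked with "?", fills them with the most frequent value of the attribute, and uses equal-width binning. The function names, bin labels, and toy rows are hypothetical.

```python
from collections import Counter

MISSING = "?"  # assumption: missing values are marked with "?"

def fill_missing_with_mode(records, attr):
    """Replace missing values of a nominal attribute with its most
    frequent known value. Assumes at least one value is known."""
    counts = Counter(r[attr] for r in records if r[attr] != MISSING)
    mode = counts.most_common(1)[0][0]
    return [{**r, attr: mode} if r[attr] == MISSING else r for r in records]

def equal_width_bins(records, attr, k):
    """Replace a numeric attribute with one of k equal-width bin labels."""
    values = [r[attr] for r in records]
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # fall back to 1.0 if all values are equal
    binned = []
    for r in records:
        # The maximum value would land in bin k, so clamp it to the last bin.
        idx = min(int((r[attr] - lo) / width), k - 1)
        binned.append({**r, attr: f"bin{idx}"})
    return binned

# Toy rows, made up for illustration.
rows = [
    {"workclass": "Private", "age": 20},
    {"workclass": "?", "age": 40},
    {"workclass": "Private", "age": 60},
    {"workclass": "State-gov", "age": 30},
]
filled = fill_missing_with_mode(rows, "workclass")
binned = equal_width_bins(filled, "age", 2)
print([r["workclass"] for r in filled])  # ['Private', 'Private', 'Private', 'State-gov']
print([r["age"] for r in binned])        # ['bin0', 'bin1', 'bin1', 'bin0']
```

Other choices, such as dropping records with missing values or using equal-frequency bins, may give different accuracy, so it is worth reporting which option each experiment used.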
REPORT AND DUE DATE
- Written Report.
Your written report is due at 1 pm. Please leave
a hardcopy of your report in my mailbox (CS Office, FL231) by the due date/time.
Please note that my mailbox is the one BELOW the label marked with my last name RUIZ.
In the EXCEPTIONAL case that you cannot go to the CS office,
email your report to me by noon. Electronic submissions will be
accepted only under EXCEPTIONAL circumstances.
- Code Description:
Describe the code that you used/wrote. Remember to acknowledge any sources
of information/code you used for the implementation of your system.
- Experiments: For each experiment you ran describe:
- Your system parameters.
- Instances: What data did you use for the experiments?
- Any pre- or post-processing done to improve the accuracy of your results.
- Accuracy of the resulting rules.
- Summary of Results
- What was the accuracy of the most accurate set of rules you obtained?
- Discuss how this accuracy compares with that of your
most accurate results from the previous assignments.
- Include the most accurate set of rules you obtained in your report.
- Discuss the strengths and the weaknesses of your project.
- Oral Report.
We will discuss the results from the individual projects during the class
on November 15th.
Be ready to show your results
and to discuss your project solution in class.
PREPARE OVERHEAD TRANSPARENCIES SHOWING YOUR WORK.