
CS 525M KNOWLEDGE DISCOVERY AND DATA MINING
PROJECT 1 - Association Rule Mining. Fall 2001
DUE DATE:
This project is due on Thursday, Oct. 25, 2001, at 12 noon.

PROJECT DESCRIPTION
Use the association rule mining module of the Weka system
to mine association rules from the dataset that you have
selected for your course projects.
PROJECT ASSIGNMENT
Mine, using the Weka system,
association rules from your selected dataset.
Keep in mind that due to the representation of
frequent itemsets in Weka, this system may run
out of memory when mining datasets with as few as
a dozen attributes.
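To get a feel for why a dozen attributes can already be a problem, the
following minimal sketch counts how many distinct itemsets are possible
when every item is an attribute=value pair and an itemset holds at most
one value per attribute. It is only a back-of-the-envelope upper bound,
not a description of how Weka stores itemsets; the function name and the
example dataset (12 attributes with 5 values each) are assumptions made
for illustration.

```python
from math import prod


def count_possible_itemsets(values_per_attribute):
    """Upper bound on the number of non-empty itemsets.

    Each attribute either is absent from the itemset or contributes
    exactly one of its values, giving (k + 1) choices per attribute.
    Subtracting 1 removes the empty itemset.
    """
    return prod(k + 1 for k in values_per_attribute) - 1


# Hypothetical dataset: 12 nominal attributes, 5 distinct values each.
example = [5] * 12
print(count_possible_itemsets(example))  # 6**12 - 1 = 2176782335
```

Only a fraction of these itemsets will be frequent at a given minimum
support, but the bound shows how quickly the search space grows as
attributes are added.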
Run several experiments with your data, varying the
system parameters until you obtain a collection
of association rules that represents your data well.
The following are guidelines for your experiments:
- Code:
Use the Weka system to mine the association rules,
to prepare the data, and to present
the results.
Write your own code for any data-manipulation
functionality that you need and that is not offered in the
Weka system.
- Data:
- You can restrict your experiments to a subset of the dataset if
Weka cannot handle the whole dataset. But remember that the
more representative the association rules you mine from the
data, the better.
- Use the preprocessing techniques discussed in class to select,
clean, and normalize the data.
- Define concept hierarchies over the different attributes so that
you can analyze your data at different levels of generality.
- Experiments:
After you have cleaned and selected a subset of your data (if
necessary), mine association rules using different parameter
(confidence, support, etc.) settings.
Analyze the resulting rules and repeat the experiment with
other "views" of the data given by generalizing/specializing
your data according to the concept hierarchies and/or by selecting
different portions of the data.
- Results:
Assume that you, as the user/miner, want to obtain association
rules for decision support, for understanding the data better,
and/or for increasing your company's profit. Mine rules until
you obtain a collection of rules that satisfies this objective.
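As a reminder of what the support and confidence parameters measure, and
of how generalizing the data along a concept hierarchy changes them, here
is a minimal sketch in plain Python. It does not use Weka; the function
names, the toy transactions, and the two-level hierarchy are all
hypothetical and chosen only for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    s_antecedent = support(antecedent, transactions)
    if s_antecedent == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / s_antecedent


def generalize(transactions, hierarchy):
    """Replace each item by its parent concept; items with no parent stay."""
    return [frozenset(hierarchy.get(item, item) for item in t)
            for t in transactions]


# Hypothetical toy data: four market-basket transactions.
transactions = [frozenset(t) for t in [
    {"skim milk", "wheat bread"},
    {"whole milk", "white bread"},
    {"skim milk", "white bread"},
    {"whole milk", "eggs"},
]]

# Hypothetical one-step concept hierarchy (child -> parent).
hierarchy = {
    "skim milk": "milk", "whole milk": "milk",
    "wheat bread": "bread", "white bread": "bread",
}

general = generalize(transactions, hierarchy)

print(support({"skim milk", "wheat bread"}, transactions))  # 0.25
print(support({"milk", "bread"}, general))                  # 0.75
print(confidence({"milk"}, {"bread"}, general))             # 0.75
```

In this toy example the specific itemset appears in only one transaction,
while its generalized form appears in three of four, so a rule that fails
a minimum-support threshold at the detailed level may pass it at a more
general level.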
REPORT AND DUE DATE
- Written Report.
Your written report is due at 12 noon. Please leave
a hardcopy of your report in my mailbox (CS Office, FL231) by the due date/time.
In the exceptional case that you cannot go to the CS office,
email your report to me by noon.
Your report should contain the following sections with the corresponding discussions:
- Code Description:
Describe the code that you used/wrote. Remember to acknowledge any sources
of information/code you used.
- Data:
Describe the dataset that you selected in terms of the attributes
present in the data, the number of instances, missing values, and
other relevant characteristics.
- Experiments:
- Describe what the objective of your analysis is. Is it to understand
the data better? If so, what about the data do you want to understand?
Or is it for decision support? If so, what decisions do you need to make
based on the data? Or is it for classification/characterization/discrimination
purposes? Explain.
- For each experiment you ran describe:
- Instances: What data did you use for the experiments?
- Any pre-processing done to improve the quality of your results.
- Your system parameters.
- Any post-processing done to improve the quality of your results.
- Analysis of results of the experiment and their significance.
- Summary of Results
- What was the best collection of association rules that
you obtained? Describe.
- Discuss the strengths and the weaknesses of your project.
- Oral Report.
We will discuss the results from the individual projects during the class
on October 25th.
Be ready to show your results
and to discuss your project in class.
PREPARE OVERHEAD TRANSPARENCIES SHOWING YOUR WORK.