DUE DATE: Thursday Dec. 3rd, 2009.
- Slides: Submit by email by 1:00 pm.
- Written report: Hand in a hardcopy by 2:00 pm.
- Oral Presentation: during class that day.
Project Description
[1000 points: 100 points for each of the 4 clustering methods per dataset
(4 methods x 2 datasets x 100 = 800 points), and 25 points for meaningful
interpretation of the resulting clusters for each of the 4 clustering
methods per dataset (4 x 2 x 25 = 200 points).]
See Project Guidelines for the detailed distribution of these points.
- Project Instructions:
Thoroughly read and follow the Project Guidelines.
These guidelines contain detailed information about how to structure your
project and how to prepare your written and oral reports.
- Data Mining Technique(s):
We will run experiments using the following clustering methods available
in Weka:
- Partitioning methods: Simple K-Means
- Hierarchical methods: COBWEB
- Density-based methods: DBSCAN
- Probabilistic methods: EM
- Dataset(s):
In this project, we will use two datasets:
- Performance Metric(s):
A major part of this project (as reflected in the grade distribution
above) is to find meaningful ways of evaluating and interpreting the
resulting clusters.
Devise a variety of approaches to do so, including but not limited
to visualization of the resulting clusters, inspection of the
clusters' members to find commonalities, use of clustering-specific
performance metrics, etc.
The more creative/ingenious your approaches, the better.
You might want to extend the Weka code to provide the
evaluation/interpretation functionality you need.
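As one illustration of a clustering-specific performance metric, the sketch below computes the silhouette coefficient for a set of points and their cluster labels. It is a minimal, standalone Python example and not part of Weka; the function names (euclidean, silhouette_scores, mean_silhouette) and the toy data are assumptions made for illustration. It assumes numeric attributes and Euclidean distance, and it assigns a score of 0 to points in singleton clusters.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_scores(points, labels):
    """Return the silhouette value s(i) for every point.

    s(i) = (b - a) / max(a, b), where a is the mean distance from point i
    to the other members of its own cluster and b is the smallest mean
    distance from point i to the members of any other cluster.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Singleton cluster: score defined as 0 in this sketch.
            scores.append(0.0)
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean_silhouette(points, labels):
    """Average silhouette over all points; closer to 1 is better."""
    s = silhouette_scores(points, labels)
    return sum(s) / len(s)


# Hypothetical toy data: two well-separated groups of two points each.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print("sensible labels:", mean_silhouette(toy_points, [0, 0, 1, 1]))
print("scrambled labels:", mean_silhouette(toy_points, [0, 1, 0, 1]))
```

To use it on Weka output, the cluster assignments would first have to be exported (for example, saved alongside the instances) and loaded as the labels list; how that export is done is left open here.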
- General Comments
Focus on experimenting with different ways of preprocessing
the data, varying the parameters of the clustering algorithms, and
providing in-depth evaluation and interpretation of the results.
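The kind of experiment described above (preprocess, vary a parameter, compare) can be sketched outside Weka as follows. This is a minimal illustration under stated assumptions, not Weka's Simple K-Means implementation: it uses min-max normalization as the preprocessing step, a plain Lloyd's-style k-means with a fixed random seed, and the within-cluster sum of squared errors (SSE) as the quantity to compare across values of k. The function names and the sample rows are hypothetical.

```python
import random


def min_max_normalize(rows):
    """Rescale every numeric column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        tuple((v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans))
        for row in rows
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(rows, k, seed=0, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, within-cluster SSE)."""
    rng = random.Random(seed)
    centroids = rng.sample(rows, k)  # initial centroids: k random rows
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(r, centroids[c])) for r in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    sse = sum(sq_dist(r, centroids[lab]) for r, lab in zip(rows, labels))
    return labels, sse


# Hypothetical rows with attributes on very different scales.
raw = [(1.0, 1000.0), (1.2, 5000.0), (0.9, 9000.0),
       (5.0, 1200.0), (5.1, 5200.0), (4.9, 8800.0)]
scaled = min_max_normalize(raw)
for k in (1, 2, 3):
    _, sse_raw = k_means(raw, k)
    _, sse_scaled = k_means(scaled, k)
    print(f"k={k}  SSE raw={sse_raw:.3f}  SSE normalized={sse_scaled:.3f}")
```

SSE values on raw and normalized data are not directly comparable because the scales differ; the point of the loop is to show how the same sweep can be repeated under each preprocessing choice and then interpreted separately.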