CS 525D Fall 2009 - Project 5

Computer Science Department

CS 525D KNOWLEDGE DISCOVERY AND DATA MINING - Fall 2009
Project 5: Clustering

PROF. CAROLINA RUIZ

DUE DATE: Thursday Dec. 3rd, 2009.

Slides: Submit by email by 1:00 pm.
Written report: Hand in a hardcopy by 2:00 pm.
Oral Presentation: during class that day.

Project Description

[1000 points: 100 points for each of the 4 clustering methods per dataset, and 25 points for meaningful interpretation of the resulting clusters for each of the 4 clustering methods per dataset.] See Project Guidelines for the detailed distribution of these points.

Project Instructions: Thoroughly read and follow the Project Guidelines. These guidelines contain detailed information about how to structure your project, and how to prepare your written and oral reports.
Data Mining Technique(s): We will run experiments using the following clustering methods available in Weka:
- Partitioning methods: Simple K-Means
- Hierarchical methods: COBWEB
- Density-based methods: DBSCAN
- Probabilistic-based methods: EM
Dataset(s): In this project, we will use two datasets:
- The Major League Baseball Hall of Fame Dataset
- The Communities and Crime Data Set from The UCI Machine Learning Repository
Performance Metric(s): A major part of this project (as reflected in the grade distribution above) is to find meaningful ways of evaluating and interpreting the resulting clusters. Devise a variety of approaches to do so, including but not limited to visualization of the resulting clusters, inspection of the clusters' members to find commonalities, use of clustering-specific performance metrics, etc. The more creative/ingenious your approaches, the better. You might want to extend the Weka code to provide the evaluation/interpretation functionality you need.
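As one possible example of a clustering-specific performance metric, the silhouette coefficient compares each instance's mean distance to the members of its own cluster with its mean distance to the nearest other cluster. The sketch below is a minimal standalone Python illustration and is not part of Weka. The function names, the choice of Euclidean distance, and the toy data are assumptions made for illustration only; in practice the points and labels would come from your own clustering runs.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_scores(points, labels):
    """Return one silhouette value per point; each value lies in [-1, 1]."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores


def mean_silhouette(points, labels):
    """Average silhouette over all points; higher suggests tighter, better separated clusters."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data: two obvious groups, labeled sensibly and then badly.
    toy = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(mean_silhouette(toy, [0, 0, 1, 1]))
    print(mean_silhouette(toy, [0, 1, 0, 1]))
```

Values near 1 suggest an instance sits well inside its cluster, values near 0 suggest it lies between clusters, and negative values suggest it may have been assigned to the wrong cluster. A single number like this is best read alongside visualization and inspection of cluster members rather than in place of them.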
General Comments: Focus on experimenting with different ways of preprocessing the data, varying the parameters of the clustering algorithms, and providing in-depth evaluation and interpretation of the results.
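To illustrate what combining a preprocessing step with a parameter sweep can look like, the sketch below min-max normalizes the attributes and then runs a plain k-means for several values of k, recording the within-cluster sum of squared errors for each. It is a simplified Python stand-in written for this illustration, not Weka's Simple K-Means. The function names, the seeded random initialization, and the toy data are all assumptions.

```python
import random


def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        tuple((v - lo) / span if span else 0.0
              for v, lo, span in zip(row, lows, spans))
        for row in rows
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(rows, k, seed=0, max_iter=100):
    """Plain Lloyd-style k-means; returns (labels, centroids, sse)."""
    rng = random.Random(seed)          # seeded so that runs are repeatable
    centroids = rng.sample(rows, k)    # k distinct rows as starting centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break                      # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:                # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    sse = sum(sq_dist(row, centroids[lab]) for row, lab in zip(rows, labels))
    return labels, centroids, sse


def sweep_k(rows, ks, seed=0):
    """Normalize once, then map each k to its within-cluster SSE."""
    data = min_max_normalize(rows)
    return {k: k_means(data, k, seed=seed)[2] for k in ks}


if __name__ == "__main__":
    # Hypothetical toy data with attributes on very different scales.
    toy = [(1, 100), (2, 110), (1, 105),
           (50, 900), (51, 910), (49, 905),
           (100, 100), (101, 110), (99, 105)]
    for k, sse in sweep_k(toy, [1, 2, 3, 4]).items():
        print(k, round(sse, 4))
```

Because the result of k-means depends on its starting centroids, it is worth repeating such a sweep with several seeds and with different preprocessing choices, then comparing how the resulting clusters change.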
