The precise prediction task is described in the "classification/collaborative filtering task" that accompanies the dataset. For this prediction task, you are allowed to employ any of the data mining techniques that we have studied during the semester, or (better yet!) a combination of them. As usual, the more ideas you explore and the more robust your experimentation is, the better your grade on the project will be.
The dataset is also accompanied by references to Breese, Heckerman, and Kadie's work on this dataset, and you're encouraged to read their paper and/or the Microsoft Technical Report that is available on the dataset's webpage.
Students are free to work individually on this project or in groups of two. If you decide to work with another student in the class on this project, please let me know by email by Friday, April 16th (midnight).
The following are guidelines for the analysis of the data:
You may restrict your experiments to a subset of the dataset IF Weka cannot handle your whole dataset (this is unlikely though).
As usual, a main part of the project is the PREPROCESSING of the dataset. You should consider applying relevant concept hierarchies and generalizations (e.g. using the results of previous mining tasks) to your dataset. Your report should contain a detailed description of the preprocessing of your dataset and justifications of the steps you followed. If Weka does not provide the functionality you need to preprocess your data in the way required to obtain useful patterns, preprocess the data yourself by writing the necessary filters (you can incorporate them in Weka if you wish).
Provide a detailed description of the preprocessing of your data. Justify the preprocessing you applied and explain why the resulting data is appropriate for mining.
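As an illustration of the kind of generalization mentioned above, here is a minimal sketch of a hand-written preprocessing step that replaces specific items with their parent concept in a concept hierarchy. It is only one possible approach, written in Python rather than as a Weka filter. The hierarchy, the item names, and the helper functions generalize and generalize_all are all hypothetical and are not taken from the dataset; you would need to build your own hierarchy and justify it in your report.

```python
from collections import Counter

# Hypothetical concept hierarchy: specific item -> broader category.
# These names are invented for illustration; they do not come from the dataset.
HIERARCHY = {
    "excel_support": "support",
    "word_support": "support",
    "excel_download": "downloads",
    "games_download": "downloads",
}


def generalize(items, hierarchy, default="other"):
    """Map each item to its parent concept, keeping each concept once
    in first-seen order. Items missing from the hierarchy map to `default`."""
    seen = []
    for item in items:
        parent = hierarchy.get(item, default)
        if parent not in seen:
            seen.append(parent)
    return seen


def generalize_all(sessions, hierarchy):
    """Apply `generalize` to every user's list of items."""
    return {user: generalize(items, hierarchy) for user, items in sessions.items()}


# Toy input: user id -> list of items associated with that user.
sessions = {
    "u1": ["excel_support", "word_support"],
    "u2": ["excel_download", "unknown_page"],
}

generalized = generalize_all(sessions, HIERARCHY)

# Count how many users are associated with each generalized concept.
counts = Counter(c for cats in generalized.values() for c in cats)
```

Generalizing in this way reduces the number of distinct attribute values, which may make patterns easier to find, at the cost of losing detail. Whether that trade-off is appropriate is part of what your justification should address.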