WPI Worcester Polytechnic Institute

Computer Science Department
------------------------------------------

CS 525D KNOWLEDGE DISCOVERY AND DATA MINING - Fall 2009  
Project 3: Regression

PROF. CAROLINA RUIZ 

DUE DATE: Thursday Nov 5, 2009.
------------------------------------------

This assignment consists of two parts:
  1. A homework part in which you will focus on the construction and/or pruning of the models.
  2. A project part in which you will focus on the experimental evaluation and analysis of the models.

I. Homework Part

[100 points] In this part of the assignment, we will investigate the pruning techniques used for regression trees and for model trees.

Consider the dataset below. This dataset is a small subsample of the Red Wine Quality Dataset.

@relation 'sample-winequality-red.arff-weka.filters.unsupervised.attribute.Remove-R1-3,5-8,10-weka.filters.unsupervised.instance.Resample-S1-Z1.0-no-replacement'

@attribute 'residual sugar' numeric
@attribute pH numeric
@attribute alcohol numeric
@attribute quality numeric

@data
2.3,3.52,9.7,5
1.8,3.35,10.1,5
2,3.33,9.5,5
2.6,3.37,10.5,6
3.1,3.17,10.5,5
4,3.36,10.7,6
1.7,3.41,9.5,6
2.9,3.23,12.5,6
3.2,3.56,12.7,6
4.25,3.63,10.7,3
2.2,3.2,9.2,4
2.4,3.13,11.9,7
3.6,3.23,12,7
1.8,3.35,11,7
1.8,3.42,9.5,6
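
As a quick sanity check before working in Weka, the rows above can be read with a few lines of Python. This is only a sketch: it relies on the ARFF convention that each @data row lists values in the order the attributes are declared, and the names ATTRIBUTES, DATA, and parse_rows are illustrative choices, not part of Weka.

```python
from statistics import mean

# Attribute names in the order they are declared in the ARFF header.
ATTRIBUTES = ["residual sugar", "pH", "alcohol", "quality"]

# The @data section, copied verbatim from the dataset above.
DATA = """\
2.3,3.52,9.7,5
1.8,3.35,10.1,5
2,3.33,9.5,5
2.6,3.37,10.5,6
3.1,3.17,10.5,5
4,3.36,10.7,6
1.7,3.41,9.5,6
2.9,3.23,12.5,6
3.2,3.56,12.7,6
4.25,3.63,10.7,3
2.2,3.2,9.2,4
2.4,3.13,11.9,7
3.6,3.23,12,7
1.8,3.35,11,7
1.8,3.42,9.5,6
"""


def parse_rows(text):
    """Turn comma-separated @data lines into one dict per instance."""
    rows = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split(",")]
        rows.append(dict(zip(ATTRIBUTES, values)))
    return rows


rows = parse_rows(DATA)
quality = [row["quality"] for row in rows]

# The mean of the target is the constant prediction of a one-leaf tree.
print("instances:", len(rows))
print("mean quality:", mean(quality))
print("quality range:", min(quality), "to", max(quality))
```

The mean quality printed here is the value a regression tree consisting of a single leaf would predict for every instance, which gives a baseline to compare the trees against.
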

For this homework, we want to predict the quality attribute (the prediction target) from the three predicting attributes: residual sugar, pH, and alcohol.
  1. [5 points] Build a regression tree for this dataset in Weka using M5P with the following parameters:
      build regression tree: True
      unpruned: True
      useUnsmoothed: True
      default values for the remaining parameters
    Record the tree in your report.

  2. [45 points] Build a regression tree for this dataset in Weka using M5P with the following parameters:
      build regression tree: True
      unpruned: False
      useUnsmoothed: True
      default values for the remaining parameters
    Record the tree in your report [5 points]. Follow the regression tree pruning procedure by hand (read the corresponding Weka code in detail for this) so that you can describe in your report each of the steps followed by the pruning procedure. Include in your report all the necessary formulas and a description of the calculations done to prune the regression tree in Part 1 above to obtain the resulting pruned regression tree of this part [40 points].

  3. [5 points] Build a model tree for this dataset in Weka using M5P with the following parameters:
      build regression tree: False
      unpruned: True
      useUnsmoothed: True
      default values for the remaining parameters
    Record the tree in your report.

  4. [45 points] Build a model tree for this dataset in Weka using M5P with the following parameters:
      build regression tree: False
      unpruned: False
      useUnsmoothed: True
      default values for the remaining parameters
    Record the tree in your report [5 points]. Follow the model tree pruning procedure by hand (read the corresponding Weka code in detail for this) so that you can describe in your report each of the steps followed by the pruning procedure. Include in your report all the necessary formulas and a description of the calculations done to prune the model tree in Part 3 above to obtain the resulting pruned model tree of this part [40 points].
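
The hand calculations in parts 2 and 4 can be cross-checked numerically. The sketch below follows the textbook description of M5 pruning, not the Weka source: the error estimate at a node is the average absolute error over the training instances reaching it, multiplied by (n + v)/(n - v), where n is the number of those instances and v is the number of parameters in the node's model; the estimate for a subtree is the average of its branches' estimates, weighted by the fraction of instances going down each branch. The function names, the choice v = 1 for a constant (mean) leaf, and the split alcohol <= 10.6 are assumptions made for illustration and are not output from Weka. Weka's M5P code may differ in its details (for example in the error measure or in how the parameter count is weighted), so the formulas in your report should be taken from the code itself.

```python
from statistics import mean

# (alcohol, quality) pairs copied from the @data section of the dataset.
ROWS = [
    (9.7, 5), (10.1, 5), (9.5, 5), (10.5, 6), (10.5, 5),
    (10.7, 6), (9.5, 6), (12.5, 6), (12.7, 6), (10.7, 3),
    (9.2, 4), (11.9, 7), (12.0, 7), (11.0, 7), (9.5, 6),
]


def inflation(n, v):
    """Textbook compensation factor (n + v) / (n - v)."""
    if n <= v:
        raise ValueError("need more instances than model parameters")
    return (n + v) / (n - v)


def constant_leaf_error(targets, v=1):
    """Estimated error of a leaf that predicts the mean of its targets.

    v = 1 is an assumption: the constant model has one parameter.
    """
    prediction = mean(targets)
    mean_abs_error = mean(abs(y - prediction) for y in targets)
    return mean_abs_error * inflation(len(targets), v)


def subtree_error(branches):
    """Instance-weighted average of the branch error estimates."""
    total = sum(len(branch) for branch in branches)
    return sum(len(branch) / total * constant_leaf_error(branch)
               for branch in branches)


# Hypothetical split chosen for illustration only (not Weka's choice).
THRESHOLD = 10.6
left = [q for alcohol, q in ROWS if alcohol <= THRESHOLD]
right = [q for alcohol, q in ROWS if alcohol > THRESHOLD]

node_estimate = constant_leaf_error([q for _, q in ROWS])
split_estimate = subtree_error([left, right])

# Under this sketch, the split is kept only if it lowers the estimate.
keep_subtree = split_estimate < node_estimate

print("single-leaf estimate:", round(node_estimate, 4))
print("two-leaf estimate:   ", round(split_estimate, 4))
print("keep the split:      ", keep_subtree)
```

For a model tree, the same comparison applies with the leaf's linear model in place of the mean and with v set to the number of parameters in that model.
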

II. Project Part

[250 points: 50 points for linear regression, 100 points for regression trees, and 100 points for model trees. See Project Guidelines for the detailed distribution of these points.]