Project 3: Regression

- Slides: Submit by email by 1:00 pm.
- Written report: Hand in a hardcopy by 2:00 pm.
- Oral Presentation: during class that day.

- A homework part in which you will focus on the construction and/or pruning of the models.
- A project part in which you will focus on the experimental evaluation and analysis of the models.

Consider the dataset below. This dataset is a small subsample of the RED Wine Quality Dataset.

@relation 'sample-winequality-red.arff-weka.filters.unsupervised.attribute.Remove-R1-3,5-8,10-weka.filters.unsupervised.instance.Resample-S1-Z1.0-no-replacement'

@attribute 'residual sugar' numeric
@attribute pH numeric
@attribute alcohol numeric
@attribute quality numeric

@data
2.3,3.52,9.7,5
1.8,3.35,10.1,5
2,3.33,9.5,5
2.6,3.37,10.5,6
3.1,3.17,10.5,5
4,3.36,10.7,6
1.7,3.41,9.5,6
2.9,3.23,12.5,6
3.2,3.56,12.7,6
4.25,3.63,10.7,3
2.2,3.2,9.2,4
2.4,3.13,11.9,7
3.6,3.23,12,7
1.8,3.35,11,7
1.8,3.42,9.5,6

For this homework, we want to predict the
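As a rough aid for checking hand calculations on this sample, the sketch below loads the 15 instances and computes two quantities that come up when constructing a regression tree: the mean of `quality` (the value a single-leaf regression tree would predict) and the standard deviation reduction (SDR) of a candidate binary split. It assumes the last attribute, `quality`, is the target, uses the textbook form SDR = sd(T) - Σ |Tᵢ|/|T| · sd(Tᵢ), and uses the population standard deviation. The helper names `sdr` and `best_split` are made up for illustration, and Weka's M5P code may compute its split criterion differently, so treat the numbers as a cross-check rather than as Weka's output.

```python
from statistics import mean, pstdev

# The 15 instances from the @data section:
# (residual sugar, pH, alcohol, quality)
DATA = [
    (2.3, 3.52, 9.7, 5), (1.8, 3.35, 10.1, 5), (2.0, 3.33, 9.5, 5),
    (2.6, 3.37, 10.5, 6), (3.1, 3.17, 10.5, 5), (4.0, 3.36, 10.7, 6),
    (1.7, 3.41, 9.5, 6), (2.9, 3.23, 12.5, 6), (3.2, 3.56, 12.7, 6),
    (4.25, 3.63, 10.7, 3), (2.2, 3.2, 9.2, 4), (2.4, 3.13, 11.9, 7),
    (3.6, 3.23, 12.0, 7), (1.8, 3.35, 11.0, 7), (1.8, 3.42, 9.5, 6),
]
ATTRS = ["residual sugar", "pH", "alcohol"]


def sdr(rows, attr, threshold):
    """Textbook SDR for the binary split rows[attr] <= threshold.

    Assumption: population standard deviation (pstdev); Weka may differ.
    """
    target = [r[-1] for r in rows]
    left = [r[-1] for r in rows if r[attr] <= threshold]
    right = [r[-1] for r in rows if r[attr] > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing reduces nothing
    weighted = (len(left) * pstdev(left) + len(right) * pstdev(right)) / len(rows)
    return pstdev(target) - weighted


def best_split(rows, attr):
    """Return the midpoint threshold with the largest SDR for one attribute."""
    values = sorted({r[attr] for r in rows})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda t: sdr(rows, attr, t), default=None)


# Mean of the target over all instances: the single-leaf prediction.
root_prediction = mean(r[-1] for r in DATA)
print(f"mean quality over all instances: {root_prediction}")

for i, name in enumerate(ATTRS):
    t = best_split(DATA, i)
    print(f"{name}: best threshold {t}, SDR {sdr(DATA, i, t):.4f}")
```

Comparing the printed thresholds with the splits Weka reports is one way to see where the implementation departs from the textbook description.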

- Part 1 [5 points]: Build a regression tree for this dataset in Weka using M5P with the following parameters:
  - build regression tree: True
  - unpruned: True
  - useUnsmoothed: True
  - default values for the remaining parameters
- Part 2 [45 points]: Build a regression tree for this dataset in Weka using M5P with the following parameters:
  - build regression tree: True
  - unpruned: False
  - useUnsmoothed: True
  - default values for the remaining parameters

  **Read the corresponding Weka code in detail for this** so that you can describe in your report each of the steps followed by the pruning procedure. Include in your report all the necessary formulas and a description of the calculations done to prune the regression tree in Part 1 above to obtain the resulting pruned regression tree of this part [40 points].
- Part 3 [5 points]: Build a model tree for this dataset in Weka using M5P with the following parameters:
  - build regression tree: False
  - unpruned: True
  - useUnsmoothed: True
  - default values for the remaining parameters
- Part 4 [45 points]: Build a model tree for this dataset in Weka using M5P with the following parameters:
  - build regression tree: False
  - unpruned: False
  - useUnsmoothed: True
  - default values for the remaining parameters

  **Read the corresponding Weka code in detail for this** so that you can describe in your report each of the steps followed by the pruning procedure. Include in your report all the necessary formulas and a description of the calculations done to prune the model tree in Part 3 above to obtain the resulting pruned model tree of this part [40 points].
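If you prefer to run the four M5P configurations from the command line rather than the Weka Explorer, the sketch below shows one way to derive the option string for each part. It relies on M5P's documented command-line flags (`-R` for a regression tree instead of a model tree, `-N` for an unpruned tree, `-U` for unsmoothed predictions) and Weka's general `-t` training-file option. The file names `weka.jar` and `sample-winequality-red.arff` and the helper `m5p_command` are placeholders chosen for illustration, so adjust them to your setup and confirm the flags against the option listing of your Weka version.

```python
# Settings for the four homework parts; useUnsmoothed is True in all of them.
PARTS = {
    1: {"regression_tree": True, "unpruned": True},
    2: {"regression_tree": True, "unpruned": False},
    3: {"regression_tree": False, "unpruned": True},
    4: {"regression_tree": False, "unpruned": False},
}


def m5p_command(regression_tree, unpruned,
                arff="sample-winequality-red.arff",  # hypothetical file name
                jar="weka.jar"):                     # hypothetical jar path
    """Build a Weka command line for M5P with the given settings."""
    flags = ["-U"]              # unsmoothed predictions (all four parts)
    if regression_tree:
        flags.append("-R")      # regression tree rather than model tree
    if unpruned:
        flags.append("-N")      # skip pruning
    base = ["java", "-cp", jar, "weka.classifiers.trees.M5P", "-t", arff]
    return " ".join(base + flags)


for part, settings in PARTS.items():
    print(f"Part {part}: {m5p_command(**settings)}")
```

Running the unpruned and pruned variants side by side makes it easier to see exactly which subtrees the pruning procedure replaced.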

**Project Instructions:** Thoroughly read and follow the Project Guidelines. These guidelines contain detailed information about how to structure your project, and how to prepare your written and oral reports.

**Data Mining Technique(s):** We will run experiments using regression techniques. You need to use:

- Linear Regression (under "functions" in Weka)
- Regression Trees: M5P (under "trees" in Weka)
- Model Trees: M5P (under "trees" in Weka)

**Dataset(s):** In this project, we will use the WHITE Wine Quality Dataset.

Use the WHITE wine dataset and the attribute **quality** as the prediction target. After you run experiments predicting this attribute you may, if you wish, run additional experiments using a different prediction target of your choice.

**Performance Metric(s):** Use the metrics listed in Table 5.8 (page 178) of the textbook to measure the goodness of your models.

A major part of this project is to try to make sense of these performance metrics and to become familiar with them. When comparing the performance of different models, use tables like Table 5.9 (page 179) of the textbook.
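To build intuition for these metrics, it can help to compute them by hand on a few predictions. The sketch below assumes Table 5.8 lists the usual measures for numeric prediction (mean squared error, root mean squared error, mean absolute error, relative squared error, root relative squared error, relative absolute error, and the correlation coefficient); check the table itself for the authoritative definitions. The function name `regression_metrics` is made up for illustration, and the relative measures here use the mean of the supplied actual values as the baseline predictor, which may not match how Weka picks its baseline.

```python
from math import sqrt


def regression_metrics(predicted, actual):
    """Common numeric-prediction metrics for paired predicted/actual values."""
    n = len(actual)
    a_bar = sum(actual) / n      # baseline: always predict the mean actual value
    p_bar = sum(predicted) / n

    sq_err = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    abs_err = sum(abs(p - a) for p, a in zip(predicted, actual))
    sq_dev = sum((a - a_bar) ** 2 for a in actual)    # baseline squared error
    abs_dev = sum(abs(a - a_bar) for a in actual)     # baseline absolute error

    # Sample covariance and variances for the correlation coefficient.
    s_pa = sum((p - p_bar) * (a - a_bar) for p, a in zip(predicted, actual)) / (n - 1)
    s_p = sum((p - p_bar) ** 2 for p in predicted) / (n - 1)
    s_a = sq_dev / (n - 1)
    # Correlation is undefined when either series is constant.
    corr = s_pa / sqrt(s_p * s_a) if s_p > 0 and s_a > 0 else float("nan")

    return {
        "mean squared error": sq_err / n,
        "root mean squared error": sqrt(sq_err / n),
        "mean absolute error": abs_err / n,
        "relative squared error": sq_err / sq_dev,
        "root relative squared error": sqrt(sq_err / sq_dev),
        "relative absolute error": abs_err / abs_dev,
        "correlation coefficient": corr,
    }


# Toy example with three made-up predictions (not results from the wine data).
for name, value in regression_metrics([5.0, 6.0, 7.0], [5.0, 5.0, 8.0]).items():
    print(f"{name}: {value:.4f}")
```

A relative error below 1 (or below 100% in Weka's output) means the model does better than always predicting the mean, which is a useful reference point when filling in a comparison table.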