
Computer Science Department

CS539 Machine Learning - Spring 2005
Project 4 - Evaluating Hypotheses

PROF. CAROLINA RUIZ

Due Date: Thursday, Feb. 24, 2005, at 4 pm.

Study Chapter 5 in detail.
Solve all of the following exercises at the end of Chapter 5:

5.1, 5.2, 5.3, 5.4, 5.5, and 5.6.
Use stratified sampling to select two different subsets of 1000 data instances each from the Cover Type dataset. Piotr's script can be used for this purpose (thanks, Piotr!). Denote these subsets S1 and S2.
1. Learn a J4.8 decision tree t over S1 using a 75% split. That is, use 75% of S1 to build the tree and the remaining 25% of S1 (call it S1') to calculate the sample error error_S1'(t). Use error_S1'(t) to construct a 95% confidence interval for error_D(t), the true error of t over the entire distribution D of Cover Type instances.
2. Train a neural network nn with 1 hidden layer and otherwise default parameters over S2 using a 75% split. That is, use 75% of S2 to train the network and the remaining 25% of S2 (call it S2') to calculate error_S2'(nn). Compare the decision tree t from above with the neural network nn by constructing a 95% confidence interval for the difference d = error_D(t) - error_D(nn) between the true errors of the two hypotheses, using error_S1'(t) and error_S2'(nn).
3. Compare J4.8 decision trees and neural networks as learning methods over the Cover Type dataset by estimating the difference in their errors with an approximate 95% confidence interval. Do this with a paired t test with k = 11, using the data subset S1 from above as D₀.
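The interval calculations behind items 1 to 3 can be sketched as follows, together with a simple stratified sampler for selecting S1 and S2. This is a minimal Python sketch under stated assumptions, not a required implementation: the function names are hypothetical, the error rates would come from your own Weka runs, n = 250 assumes the 25% held-out part of a 1000-instance subset, and the hard-coded t value 2.228 is the standard two-sided 95% critical value for k - 1 = 10 degrees of freedom (i.e. it assumes k = 11).

```python
import math
import random
from collections import defaultdict

Z_95 = 1.96        # two-sided 95% standard normal quantile
T_95_DF10 = 2.228  # two-sided 95% t value for 10 degrees of freedom (assumes k = 11)


def stratified_sample(instances, labels, size, seed=0):
    """Draw about `size` instances so that each class keeps its original proportion.

    Hypothetical helper; assumes size <= len(instances).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(instances, labels):
        by_class[y].append(x)
    total = len(instances)
    sample = []
    for members in by_class.values():
        quota = round(size * len(members) / total)  # per-class share, rounded
        sample.extend(rng.sample(members, quota))
    return sample


def error_interval(error_s, n, z=Z_95):
    """Approximate CI for error_D(h): error_S(h) +/- z * sqrt(e(1 - e) / n)."""
    half = z * math.sqrt(error_s * (1 - error_s) / n)
    return error_s - half, error_s + half


def difference_interval(e1, n1, e2, n2, z=Z_95):
    """Approximate CI for d = error_D(h1) - error_D(h2) from independent test sets."""
    d_hat = e1 - e2
    half = z * math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    return d_hat - half, d_hat + half


def paired_t_interval(deltas, t=T_95_DF10):
    """CI for the mean of the k per-fold differences delta_i.

    delta_i = error_Ti(h_A) - error_Ti(h_B), where h_A (J4.8) and h_B (neural
    network) are both trained on D0 minus the test fold Ti.
    """
    k = len(deltas)
    mean = sum(deltas) / k
    s_mean = math.sqrt(sum((d - mean) ** 2 for d in deltas) / (k * (k - 1)))
    return mean - t * s_mean, mean + t * s_mean
```

For instance, a made-up sample error of 0.20 on 250 held-out instances gives error_interval(0.20, 250) of roughly (0.150, 0.250). For item 3, splitting the 1000 instances of S1 into 11 folds gives test sets of about 90 instances each, which satisfies the usual guideline of at least 30 test examples per fold.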

Please turn in written solutions to these problems at the beginning of class on Thursday, February 24th and be ready to discuss your solutions and the chapter in class.
