Practice Exam 1

Department of Computer Science

Worcester Polytechnic Institute

| No. | STATUS | FLOOR | DEPT. | OFFICE SIZE | RECYCLING BIN? |
|-----|--------|-------|-------|-------------|----------------|
| 1 | faculty | four | cs | medium | yes |
| 2 | faculty | four | ee | medium | yes |
| 3 | student | four | cs | small | no |
| 4 | faculty | five | cs | medium | yes |

Assume that each attribute takes only the values that appear in the table.

- What is the size of the set of instances for this example?
- What is the size of the hypothesis space?
- Give the sequence of S and G boundary sets computed by the CANDIDATE-ELIMINATION algorithm if it is given the examples above in the order in which they appear in the table.
- Suggest an ordering of the examples that would minimize the sizes of the intermediate S and G sets. Explain your answer.
- Suggest a query guaranteed to reduce the size of the Version Space regardless of how the trainer classifies it.
- (Extra Credit) Complete the proof of Theorem 2.1. from the textbook. That is, prove that every hypothesis that is consistent with the training data lies between the most specific and the most general boundaries S and G in the partially ordered hypothesis space.
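As a study aid for the first two counting questions, the sketch below tallies the spaces from the attribute values in the table, following the textbook's convention for conjunctive hypotheses (a specific value, "?", or the empty set per attribute, with all empty-set hypotheses collapsing into one all-rejecting hypothesis). This is a sketch of the counting argument, not a substitute for your own derivation.

```python
from math import prod

# Attribute values exactly as they appear in the janitor table above.
attribute_values = {
    "status": {"faculty", "student"},
    "floor": {"four", "five"},
    "dept": {"cs", "ee"},
    "office_size": {"medium", "small"},
}

# Instance space: one independent choice per attribute.
instance_space = prod(len(v) for v in attribute_values.values())

# Semantically distinct conjunctive hypotheses: per attribute, each
# specific value or "?", plus the single hypothesis that rejects everything.
semantic_hypotheses = prod(len(v) + 1 for v in attribute_values.values()) + 1

print(instance_space, semantic_hypotheses)
```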

| No. | Student | First last year? | Male? | Works hard? | Drinks? | First this year? |
|-----|---------|------------------|-------|-------------|---------|------------------|
| 1 | Richard | yes | yes | no | yes | yes |
| 2 | Alan | yes | yes | yes | no | yes |
| 3 | Alison | no | no | yes | no | yes |
| 4 | Jeff | no | yes | no | yes | no |
| 5 | Gail | yes | no | yes | yes | yes |
| 6 | Simon | no | yes | yes | yes | no |

- What is the entropy of this collection of training examples with respect to the target function classification?
- Construct a minimal decision tree for this dataset. Show your work.
- Use your decision tree from the previous part to classify the following instances:
| No. | Student | First last year? | Male? | Works hard? | Drinks? | First this year? |
|-----|---------|------------------|-------|-------------|---------|------------------|
| 7 | Matthew | no | yes | no | yes | ?? |
| 8 | Mary | no | no | yes | yes | ?? |

- Suppose that the trainer tells you that your classifications of these instances are incorrect (i.e., the right classification of each instance is "yes" if you said "no", and "no" if you said "yes"). Discuss a way to post-process your tree (without constructing it from scratch) so that the resulting decision tree better represents the training data together with the two test instances.
- (Extra Credit) Suppose that we are trying to learn a concept C and that A is one of the predicting attributes. Let n be the number of values that C can assume and k the number of values that A can assume. (For instance, if C=PlayTennis and A is Outlook, then n = 2 (yes,no) and k=3 (sunny, overcast, rainy).) What is the interval of real numbers in which the value of the entropy of A with respect to S lies? In other words, find real numbers a and b such that a <= entropy(A) <= b.
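For checking your entropy arithmetic in this problem, a small helper like the one below computes the Shannon entropy of a label column. The example call uses the "First this year?" column from the student table above; treat the printed value as a check on your hand computation, not as the shown work the question asks for.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# "First this year?" column of the table above.
target = ["yes", "yes", "yes", "no", "yes", "no"]
print(round(entropy(target), 4))
```

A uniformly split binary column (e.g. `["yes", "no"]`) gives exactly 1.0 bit, which is a handy sanity check.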

| A | B | ~A v B |
|---|---|--------|
| 1 | 1 | 1 |
| 1 | 0 | 0 |
| 0 | 1 | 1 |
| 0 | 0 | 1 |

- Can this function be represented by a perceptron? Explain your answer.
- If yes, construct a perceptron that represents the function. Otherwise construct a multilayer neural network that represents it.
- Explain informally why the delta training rule in Equation (4.10) is only an approximation to the true gradient descent rule of Equation (4.7).
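One quick way to test a candidate answer to the representability question is to compare a threshold unit against the truth table above. The weights below (w_a = -1, w_b = 1, bias 0.5) are one hand-picked candidate, not the unique choice, and the check itself is not the explanation the question asks for.

```python
def perceptron(a, b, w_a=-1.0, w_b=1.0, bias=0.5):
    """Threshold unit: output 1 iff w_a*a + w_b*b + bias > 0.
    These weights are one hand-picked candidate for ~A v B."""
    return 1 if w_a * a + w_b * b + bias > 0 else 0

# Compare against every row of the truth table for ~A v B.
for a, b in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(a, b, perceptron(a, b), int((not a) or b))
```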

- What is the standard deviation in error_S(h)?
- What is the 90% confidence interval (two-sided) for the true error rate?
- What is the 95% one-sided interval?
- What is the 90% one-sided interval?

- Construct a Bayesian network that represents all attributes in the janitor example, assuming that the predicting attributes are pairwise independent. Provide the probability table for each of the predicting attributes.
- Show how a naive Bayesian classifier would classify the following instance:
| STATUS | FLOOR | DEPT. | OFFICE SIZE | RECYCLING BIN? |
|--------|-------|-------|-------------|----------------|
| student | four | cs | small | ?? |

- Assume now that OFFICE SIZE depends on STATUS.
- Construct a Bayesian network that represents all attributes.
Assume that the conditional probability table for OFFICE SIZE is the following:
| Office Size | faculty | student | staff |
|-------------|---------|---------|-------|
| small | 1/8 | 1/2 | 1/4 |
| medium | 1/4 | 1/4 | 2/4 |
| large | 5/8 | 1/4 | 1/4 |

- Compute P(Recycling-bin?=yes | Office-size = medium, Dept = cs, Floor = four)
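The naive Bayes machinery used in this problem can be sketched as below, with the janitor table as training data. This uses plain maximum-likelihood (count-based) estimates with no smoothing, so a zero count zeroes out a class; the exam may instead expect m-estimates, in which case the numbers will differ.

```python
from collections import defaultdict

# Janitor training data from the table above:
# (status, floor, dept, office size) -> recycling bin?
rows = [
    (("faculty", "four", "cs", "medium"), "yes"),
    (("faculty", "four", "ee", "medium"), "yes"),
    (("student", "four", "cs", "small"), "no"),
    (("faculty", "five", "cs", "medium"), "yes"),
]

def naive_bayes_scores(rows, instance):
    """Unnormalized P(class) * prod_i P(attr_i = value_i | class),
    with all probabilities estimated by raw counts (no smoothing)."""
    by_class = defaultdict(list)
    for attrs, label in rows:
        by_class[label].append(attrs)
    scores = {}
    for label, examples in by_class.items():
        score = len(examples) / len(rows)          # prior P(class)
        for i, value in enumerate(instance):
            matches = sum(1 for e in examples if e[i] == value)
            score *= matches / len(examples)       # P(attr_i = value | class)
        scores[label] = score
    return scores

print(naive_bayes_scores(rows, ("student", "four", "cs", "small")))
```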

- According to Dietterich, T., "Machine-Learning Research: Four Current Directions," AAAI Magazine, Winter 1997, what are the assumptions under which a majority vote in an ensemble of classifiers improves upon the individual classifications?
- What is the main idea of the procedure for training neural nets described in the "Nature" paper that we discussed in class? (Note: I don't have the paper with me now, but I'll add the complete reference here later on.)
- From the paper "Combining Neural Networks and Context-Driven Search for On-Line, Printed Handwriting Recognition in the Newton" by L. Yaeger et al., select one of the techniques for helping a neural network better encode class probabilities for underrepresented classes and writing styles, and explain the technique in your own words.