### Problem 1 (20 Points): Concept Learning

Consider the following set of training examples (taken from Dean, Allen, and Aloimonos' book) to train a robot janitor to predict whether or not an office contains a recycling bin.

 STATUS FLOOR DEPT. OFFICE SIZE RECYCLING BIN? 1. faculty four cs medium yes 2. faculty four ee medium yes 3. student four cs small no 4. faculty five cs medium yes

Assume that each of the attributes assumes just the values that appear on the table.

1. What is the size of the set of instances for this example?
2. What is the size of the hypothesis space?
3. Give a sequence of S and G boundary sets computed by the CANDIDATE-ELIMINATION algorithm if it is given the sequence of examples above in the order in which they appear on the table.
4. Suggest an ordering of the examples that would minimize the sizes of the intermediate S and G sets. Explain your answer.
5. Suggest a query guaranteed to reduce the size of the Version Space regardless of how the trainer classifies it.
6. (Extra Credit) Complete the proof of Theorem 2.1. from the textbook. That is, prove that every hypothesis that is consistent with the training data lies between the most specific and the most general boundaries S and G in the partially ordered hypothesis space.

### Problem 2 (20 Points): Decision Trees

The following table containing Student Exam Performance Data is taken from section 7.4 of A. Cawsey, "The Essence of Artificial Intelligence", Prentice Hall Europe, 1998.

 No. Student First last year? Male? Works hard? Drinks? First this year? 1 Richard yes yes no yes yes 2 Alan yes yes yes no yes 3 Alison no no yes no yes 4 Jeff no yes no yes no 5 Gail yes no yes yes yes 6 Simon no yes yes yes no

1. What is the entropy of this collection of training examples with respect to the target function classification?
2. Construct a minimal decision tree for this dataset. Show your work.
3. Use your decision tree from the previous part to classify the following instances:
 No. Student First last year? Male? Works hard? Drinks? First this year? 7 Matthew no yes no yes ?? 8 Mary no no yes yes ??
4. Suppose that the trainer tells you that your classifications of the instances are incorrect (i.e. the right classification of each of the instances is "yes" if you said "no", and "no" if you said "yes"). Discuss a way to post-processig your tree (without constructing it from scratch again) so that the resulting decision tree better represent the training data together with the two test instances.
5. (Extra Credit) Suppose that we are trying to learn a concept C and that A is one of the predicting attributes. Let n be the number of values that C can assume and k the number of values that A can assume. (For instance, if C=PlayTennis and A is Outlook, then n = 2 (yes,no) and k=3 (sunny, overcast, rainy).) What is the interval of real numbers in which the value of the entropy of A with respect to S lies? In other words, find real numbers a and b such that a <= entropy(A) <= b.

### Problem 3 (15 Points): Neural Networks

Consider the following Boolean function

 A B ~A v B 1 1 1 1 0 0 0 1 1 0 0 1

1. Can this function be represented by a perceptron? Explain your answer.
2. If yes, construct a perceptron that represents the function. Otherwise construct a multilayer neural network that represents it.
3. Explain informally why the delta training rule in Equation (4.10) is only an approximation to the true gradient descent rule of Equation (4.7).

### Problem 4 (15 Points): Evaluating Hypotheses

Suppose that hypothesis h commits r=10 errors over a sample of n=65 independently drawn examples.
1. What is the standard deviation in ERRORs(h)?
2. What is the 90% confidence interval (two-sided) for the true error rate?
3. What is the 95% one-sided interval?
4. What is the 90% one-sided interval?

### Problem 5 (15 Points): Bayesian Learning

For the janitor example of Problem 1,
1. Construct a Bayesian network that represents all attributes in the janitor example, assuming that the predicting attributes are pairwise independent. Provide the probability table for each of the predicting attributes.
2. Show how a naive Bayesian classifier would classify the following instance:

 STATUS FLOOR DEPT. OFFICE SIZE RECYCLING BIN? student four cs small ??

3. Assume now that OFFICE SIZE depends on STATUS.
1. Construct a Bayesian network that represents all attributes. Assume that the conditional probability table for OFFICE SIZE is the following:
```          Office Size    faculty  student staff
--------------------------------------
small          1/8       1/2    1/4
medium         1/4       1/4    2/4
large          5/8       1/4    1/4
```
2. Compute P(Recycling-bin?=yes | Office-size = medium, Dept = cs, Floor = four)

### Problem 6 (10 Points): Papers

1. According to Dietterich, T. "Machine-Learning Research: Four Current Directions" AAAI Magazine, Winter 1997, what are the assumptions under which a majority vote in an ensemble of classifiers improve upon individual classifications?
2. What is the main idea of the procedure for training neural nets described on the "Nature" paper that we discussed in class? (Note: I don't have the paper with me now, but I'll add the complete reference here later on).
3. From the paper "Combining Neural Networks and Context-Driven search for Online, printed Handwriting Recognition in the NEWTON" by L. Yaeger et al., select one of the techniques for helping a neural network better encode class probabilities for underrepresented classes and writing styles and explain the technique in your own words.