CS539 MACHINE LEARNING. SPRING 99
Practice Exam 1
PROF. CAROLINA RUIZ
Department of Computer Science
Worcester Polytechnic Institute
Problem 1 (20 Points): Concept Learning
Consider the following set of
training examples (taken from Dean, Allen, and Aloimonos' book)
to train a robot janitor to predict whether or not
an office contains a recycling bin.
No. | STATUS  | FLOOR | DEPT. | OFFICE SIZE | RECYCLING BIN?
----|---------|-------|-------|-------------|---------------
 1. | faculty | four  | cs    | medium      | yes
 2. | faculty | four  | ee    | medium      | yes
 3. | student | four  | cs    | small       | no
 4. | faculty | five  | cs    | medium      | yes
Assume that each attribute takes on only the values that appear in the table.
- What is the size of the set of instances for this example?
- What is the size of the hypothesis space?
- Give the sequence of S and G boundary sets computed by the CANDIDATE-ELIMINATION algorithm when it is given the examples above in the order in which they appear in the table. (A sketch of the algorithm appears after this problem.)
- Suggest an ordering of the examples that would minimize the sizes of the intermediate S and G sets. Explain your answer.
- Suggest a query guaranteed to reduce the size of the version space regardless of how the trainer classifies it.
- (Extra Credit) Complete the proof of Theorem 2.1 from the textbook. That is, prove that every hypothesis that is consistent with the training data lies between the most specific and most general boundaries S and G in the partially ordered hypothesis space.
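Hint: the following is a minimal Python sketch of the CANDIDATE-ELIMINATION algorithm for conjunctive hypotheses, in the style of Chapter 2 of Mitchell's textbook, applied to the janitor data above. The representation ('?' for "any value", '0' for the empty hypothesis) and all names are illustrative choices, not part of the problem statement; you may use its trace to check your S and G boundary sets.

    DOMAINS = [('faculty', 'student'),   # STATUS
               ('four', 'five'),         # FLOOR
               ('cs', 'ee'),             # DEPT.
               ('medium', 'small')]      # OFFICE SIZE

    def matches(h, x):
        # True if hypothesis h covers instance x.
        return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

    def more_general_or_equal(h1, h2):
        # True if h1 >=_g h2, i.e. h1 covers every instance h2 covers.
        return all(a == '?' or a == b for a, b in zip(h1, h2))

    def generalize(s, x):
        # Minimal generalization of s that covers the positive example x.
        if s == ('0',) * len(x):          # empty hypothesis: adopt x itself
            return x
        return tuple(sv if sv == xv else '?' for sv, xv in zip(s, x))

    def specializations(g, x):
        # Minimal specializations of g that exclude the negative example x.
        for i, gv in enumerate(g):
            if gv == '?':
                for v in DOMAINS[i]:
                    if v != x[i]:
                        yield g[:i] + (v,) + g[i + 1:]

    def candidate_elimination(examples):
        S = [('0',) * 4]     # most specific boundary: covers nothing
        G = [('?',) * 4]     # most general boundary: covers everything
        for x, label in examples:
            if label == 'yes':
                G = [g for g in G if matches(g, x)]
                S = [generalize(s, x) for s in S]
                # keep only S members still covered by some member of G
                S = [s for s in S
                     if any(more_general_or_equal(g, s) for g in G)]
            else:
                S = [s for s in S if not matches(s, x)]
                G = [h for g in G
                     for h in (specializations(g, x) if matches(g, x) else [g])]
                # keep only maximally general G members that cover some s in S
                G = [g for g in G
                     if any(more_general_or_equal(g, s) for s in S)
                     and not any(h != g and more_general_or_equal(h, g)
                                 for h in G)]
            print('S =', S, '  G =', G)

    examples = [(('faculty', 'four', 'cs', 'medium'), 'yes'),
                (('faculty', 'four', 'ee', 'medium'), 'yes'),
                (('student', 'four', 'cs', 'small'), 'no'),
                (('faculty', 'five', 'cs', 'medium'), 'yes')]
    candidate_elimination(examples)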
Problem 2 (20 Points): Decision Trees
The following table containing Student Exam Performance Data is taken from section 7.4 of
A. Cawsey, "The Essence of Artificial Intelligence", Prentice Hall Europe, 1998.
No. | Student | First last year? | Male? | Works hard? | Drinks? | First this year?
----|---------|------------------|-------|-------------|---------|-----------------
 1  | Richard | yes              | yes   | no          | yes     | yes
 2  | Alan    | yes              | yes   | yes         | no      | yes
 3  | Alison  | no               | no    | yes         | no      | yes
 4  | Jeff    | no               | yes   | no          | yes     | no
 5  | Gail    | yes              | no    | yes         | yes     | yes
 6  | Simon   | no               | yes   | yes         | yes     | no
- What is the entropy of this collection of training examples with respect to the target function classification?
- Construct a minimal decision tree for this dataset. Show your work. (A sketch of the entropy and information-gain computations appears after this problem.)
- Use your decision tree from the previous part to classify the following instances:

  No. | Student | First last year? | Male? | Works hard? | Drinks? | First this year?
  ----|---------|------------------|-------|-------------|---------|-----------------
   7  | Matthew | no               | yes   | no          | yes     | ??
   8  | Mary    | no               | no    | yes         | yes     | ??
- Suppose that the trainer tells you that your classifications of these instances are incorrect (i.e., the right classification of each instance is "yes" if you said "no", and "no" if you said "yes"). Discuss a way to post-process your tree (without constructing it from scratch again) so that the resulting decision tree better represents the training data together with the two test instances.
- (Extra Credit) Suppose that we are trying to learn a concept C and that A is one of the predicting attributes. Let n be the number of values that C can assume and k the number of values that A can assume. (For instance, if C = PlayTennis and A = Outlook, then n = 2 (yes, no) and k = 3 (sunny, overcast, rainy).) What is the interval of real numbers in which the value of the entropy of A with respect to S lies? In other words, find real numbers a and b such that a <= entropy(A) <= b.
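Hint: the following is a small Python sketch of the entropy and information-gain computations used by ID3, applied to the student table above. The tuple encoding of the rows is an illustrative choice, not part of the problem statement.

    from math import log2

    def entropy(labels):
        # H(S) = - sum_v p_v * log2(p_v), over the class proportions p_v.
        total = len(labels)
        return -sum((labels.count(v) / total) * log2(labels.count(v) / total)
                    for v in set(labels))

    # Rows: (First last year?, Male?, Works hard?, Drinks?, First this year?)
    data = [('yes', 'yes', 'no',  'yes', 'yes'),   # 1 Richard
            ('yes', 'yes', 'yes', 'no',  'yes'),   # 2 Alan
            ('no',  'no',  'yes', 'no',  'yes'),   # 3 Alison
            ('no',  'yes', 'no',  'yes', 'no'),    # 4 Jeff
            ('yes', 'no',  'yes', 'yes', 'yes'),   # 5 Gail
            ('no',  'yes', 'yes', 'yes', 'no')]    # 6 Simon

    def information_gain(rows, i):
        # Gain(S, A) = H(S) - sum_v (|S_v| / |S|) * H(S_v).
        remainder = 0.0
        for v in set(r[i] for r in rows):
            subset = [r[-1] for r in rows if r[i] == v]
            remainder += (len(subset) / len(rows)) * entropy(subset)
        return entropy([r[-1] for r in rows]) - remainder

    print('H(S) =', round(entropy([r[-1] for r in data]), 3))
    for i, name in enumerate(('First last year?', 'Male?',
                              'Works hard?', 'Drinks?')):
        print('Gain(S,', name, ') =', round(information_gain(data, i), 3))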
Problem 3 (15 Points): Neural Networks
Consider the following Boolean function:

A | B | ~A v B
--|---|-------
1 | 1 |   1
1 | 0 |   0
0 | 1 |   1
0 | 0 |   1
- Can this function be represented by a perceptron? Explain your answer.
- If yes, construct a perceptron that represents the function; otherwise, construct a multilayer neural network that represents it. (A sketch appears after this problem.)
- Explain informally why the delta training rule in Equation (4.10) is
only an approximation to the true gradient descent rule of Equation (4.7).
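Hint: the following is an illustrative Python check that ~A v B is linearly separable. The weights below are one valid choice among many, not the unique answer.

    def perceptron(a, b, w0=0.5, wa=-1.0, wb=1.0):
        # Threshold unit: output 1 iff w0 + wa*a + wb*b > 0.
        return 1 if w0 + wa * a + wb * b > 0 else 0

    for a in (1, 0):
        for b in (1, 0):
            print(a, b, '->', perceptron(a, b))   # reproduces the truth table of ~A v B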
Problem 4 (15 Points): Evaluating Hypotheses
Suppose that hypothesis h commits r=10 errors over a sample of n=65
independently drawn examples.
- What is the standard deviation of error_S(h)?
- What is the 90% confidence interval (two-sided) for the true error rate?
- What is the 95% one-sided interval?
- What is the 90% one-sided interval?
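Hint: the following Python sketch applies the normal-approximation confidence intervals of Chapter 5 of Mitchell's textbook. Recall that a one-sided N% bound uses the z value of the two-sided (2N - 100)% interval.

    from math import sqrt

    r, n = 10, 65
    p = r / n                        # sample error error_S(h)
    sd = sqrt(p * (1 - p) / n)       # standard deviation of error_S(h)

    print('error_S(h) =', round(p, 4), ' sd =', round(sd, 4))
    # z values taken from the standard normal table
    print('90% two-sided:', (round(p - 1.64 * sd, 4), round(p + 1.64 * sd, 4)))
    print('95% one-sided upper bound:', round(p + 1.64 * sd, 4))  # same z as 90% two-sided
    print('90% one-sided upper bound:', round(p + 1.28 * sd, 4))  # z of 80% two-sided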
Problem 5 (15 Points): Bayesian Learning
For the janitor example of Problem 1:
- Construct a Bayesian network that represents all attributes in the janitor example,
assuming that the predicting attributes are pairwise independent. Provide the
probability table for each of the predicting attributes.
- Show how a naive Bayesian classifier would classify the following instance (a sketch of the computation appears after this problem):

  STATUS  | FLOOR | DEPT. | OFFICE SIZE | RECYCLING BIN?
  --------|-------|-------|-------------|---------------
  student | four  | cs    | small       | ??
- Assume now that OFFICE SIZE depends on STATUS.
  - Construct a Bayesian network that represents all attributes. Assume that the conditional probability table for OFFICE SIZE is the following:

    OFFICE SIZE | faculty | student | staff
    ------------|---------|---------|------
    small       | 1/8     | 1/2     | 1/4
    medium      | 1/4     | 1/4     | 2/4
    large       | 5/8     | 1/4     | 1/4

  - Compute P(Recycling-bin? = yes | Office-size = medium, Dept = cs, Floor = four).
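Hint: the following is a Python sketch of the naive Bayes computation for the first part of this problem, estimating every probability by relative frequency over the four training examples (no smoothing, so a zero count zeroes out a product). The tuple encoding is an illustrative choice, not part of the problem statement.

    data = [('faculty', 'four', 'cs', 'medium', 'yes'),
            ('faculty', 'four', 'ee', 'medium', 'yes'),
            ('student', 'four', 'cs', 'small',  'no'),
            ('faculty', 'five', 'cs', 'medium', 'yes')]

    def score(x, label):
        # P(label) * product_i P(x_i | label), with frequency estimates.
        rows = [r for r in data if r[-1] == label]
        prior = len(rows) / len(data)
        likelihood = 1.0
        for i, v in enumerate(x):
            likelihood *= sum(1 for r in rows if r[i] == v) / len(rows)
        return prior * likelihood

    x = ('student', 'four', 'cs', 'small')
    for label in ('yes', 'no'):
        print(label, score(x, label))
    # the classifier outputs the label with the larger score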
Problem 6 (10 Points): Papers
- According to T. Dietterich, "Machine-Learning Research: Four Current Directions", AI Magazine, Winter 1997, what are the assumptions under which a majority vote in an ensemble of classifiers improves upon the individual classifications?
- What is the main idea of the procedure for training neural nets described in the "Nature" paper that we discussed in class? (Note: I don't have the paper with me now, but I'll add the complete reference here later on.)
- From the paper "Combining Neural Networks and Context-Driven Search for Online, Printed Handwriting Recognition in the NEWTON" by L. Yaeger et al., select one of the techniques for helping a neural network better encode class probabilities for underrepresented classes and writing styles, and explain the technique in your own words.