Assume that each of the attributes assumes just the values that
appear on the table.

What is the size of the set of instances for this example?
Attributes: STATUS FLOOR DEPT. OFFICE SIZE
# values: 2 x 2 x 2 x 2 = 16

What is the size of the hypothesis space?
Attributes: STATUS FLOOR DEPT. OFFICE SIZE
# values: 3 x 3 x 3 x 3 + 1 = 82
 
(= 2 + "?") [null,null,null,null]

Give a sequence of S and G boundary sets computed by the
CANDIDATEELIMINATION algorithm if it is given the sequence of
examples above in the order in which they appear on the table.
S0 = {[null,null,null,null]}
S1 = {[faculty,four,cs,medium]}
S2 = {[faculty,four,?,medium]}
S3 = S2
S4 = {[faculty,?,?,medium]}
G4 = G3
G3 = {[faculty,?,?,?], [?,?,?,medium]}
G2 = G1
G1 = G0
G0 = {[?,?,?,?]}

Suggest an ordering of the examples that would minimize
the sizes of the intermediate S and G sets. Explain your answer.
Note that the sum of the sizes of the intermediate S and G sets
is at least 2*(# of examples + 1) = 10, counting S0 and G0, since
each S and G has at least one element.
For the trace shown above the sum of the sizes is 12. Since
the final G contains 2 elements, then the minimal sum of the sizes
is obtained when the "last G" (i.e. G4) is the only one containing
2 elements. That can be achieved if we process all the positive
examples before processing the negative examples. In that case
the sum of the sizes is 11.

Suggest a query guaranteed to reduce the size of the
Version Space regardless of how the trainer classifies it.
Query = [faculty,four,cs,small]
If this is a positive instance,
then the boundaries can be updated to:
S = [faculty,?,?,?]
G = [faculty,?,?,?]
If this is a negative instance,
then the boundaries can be updated to:
S = [faculty,?,?,medium]
G = [?,?,?,medium]
 (Extra Credit)
Complete the proof of Theorem 2.1. from the textbook.
That is, prove that every hypothesis that is consistent with the
training data lies between the most specific and the most general
boundaries S and G in the partially ordered hypothesis space.
By way of contradiction, assume that there are hypotheses in H
that are consistent w.r.t. D, but do not lie between S and G.
Let K be the set of all those hyotheses. That is,
K = { h in H  Consistent(h,D) and h does not lie between
S and G in the partially ordered hypothesis space}
Let h be a maximally specific hypothesis in K (i.e. h has the
property that there is no h' in K such that h' is more specific
than h).
Case 1: If there is no hypothesis h' in H consistent with D that
is more specific than h then, by definition h would belong to s
contradiction our assumption that h belongs to K.
Case 2: If there is a hypothesis h' in H consistent with D that is
more specific than h then h' belongs to VS, since h is maximally
specific in K. Now we just need to show that we can find some
h'' in VS such that h'' is more general than or equal to h,
which would imply that h belongs to VS with would be a contradiction
with our assumption.
Case 2.1: If there is no consistent hypothesis h' in H more
general than h, then by definition h belongs to G.
Case 2.2: If there is a consistent hypothesis h' in H more
general than h, then:
Case 2.2.1: If h' belongs to VS then we're done.
Case 2.2.2: If h' belongs to K, then construct a chain
h <=g h1 <=g h2 <=g ... <=g h_n where h_n is a maximally
general hypothesis in K. Following the same arguments of
Case 1 and beginning of Case 2, one can show that either
h'' = h_n belongs to G or there is some h'' in VS such that
h_n <=g h''. In both cases, h'' <=g h, and hence h would
belong to VS, yielding a contradiction.
Since all the cases above yield a contradiction, then K must be
empty and so every hypothesis that is consistent with the
training data lies between the most specific and the most general
boundaries S and G in the partially ordered hypothesis space.
 What is the entropy of this collection of training examples with respect to
the target function classification?
entropy(S) =  (4/6)log_2(4/6)  (2/6)log_2(2/6)
here log_2 denotes logarithm in base 2.
 Construct a minimal decision tree for this dataset. Show your work.
entropyS(1stlastyear)
= (3/6)[(3/3)log_2(3/3)  0] + (3/6)[(1/3)log_2(1/3)  (2/3)log_2(2/3)]
= 0.459
entropyS(male)
= (4/6)[(1/2)log_2(1/2)  (1/2)log_2(1/2)] + (2/6)[(2/2)log_2(2/2)  0]
= 0.666
entropyS(workshard)
= (4/6)[(3/4)log_2(3/4)  (1/4)log_2(1/4)] + (2/6)[(1/2)log_2(1/2)  (1/2)log_2(1/2)]
= 0.8741
entropyS(drinks)
= (4/6)[(1/2)log_2(1/2)  (1/2)log_2(1/2)] + (2/6)[(2/2)log_2(2/2)  0]
= 0.666
1stlastyear is the attribute with the smallest entropy and so we use it for the
root of the tree:
1stlastyear?
/ \
yes / \ no
/ \
1+, 2+, 5+ 3+, 4, 6
YES
We need to split the rightmost node which contains examples 3, 4, 6 only.
Let S' be:
No.  Student  First last year?  Male?  Works hard?  Drinks?
 First this year?

3  Alison  no  no  yes  no  yes

4  Jeff  no  yes  no  yes  no

6  Simon  no  yes  yes  yes  no

Now we need to select from male?, workshard?, and drinks? the attribute
with the smallest entropy w.r.t. S'
entropyS'(drinks?)
= (2/3)[(2/2)log_2(2/2)  0] + (1/3)[(1/1)log_2(1/1)  0]
= 0
Since this is the smallest possible entropy, then drinks? is choosen.
1stlastyear?
/ \
yes / \ no
/ \
1+, 2+, 5+ drinks?
YES / \
no / \ yes
/ \
3+ 4, 6
YES NO
 Use your decision tree from the previous part to classify the following instances:
No.  Student  First last year?  Male?  Works hard?  Drinks?
 First this year?

7  Matthew  no  yes  no  yes  ??

8  Mary  no  no  yes  yes  ??

Both instances are classified as negative by the previous decision tree.
 Suppose that the trainer tells you that your classifications of the instances
are incorrect (i.e. the right classification of each of the instances is "yes"
if you said "no", and
"no" if you said "yes"). Discuss a way to postprocess your tree (without
constructing it from scratch again) so that the resulting decision tree
better represents the training data together with the two test instances.
If the correct answer for each instance is "yes", then the drinks? node
can be pruned and replaced by a majority vote of the remaining instances.
That is,
1stlastyear?
/ \
yes / \ no
/ \
1+, 2+, 5+ 3+, 4, 6, 7+, 8+
YES Majority vote = YES
The accuracy of the tree remains the same (6/8), but the tree is now smaller.
Another possibility is to replace drinks? by male? (which also has
entropy 0 with respect to S'). In that case only one instance is misclassified,
and hence the accuracy of the decision tree increases to 7/8.
 (Extra Credit)
Suppose that we are trying to learn a concept C and
that A is one of the predicting attributes.
Let n be the number of values that C can assume and
k the number of values that A can assume.
(For instance, if C=PlayTennis and A is Outlook, then n = 2 (yes,no)
and k=3 (sunny, overcast, rainy).)
What is the interval of real numbers in which the value of
the entropy of A with respect to S lies?
In other words, find real numbers a and b such that
a <= entropy(A) <= b.
0 <= entropy(A) <= log_2(k)