@relation contact-lenses-weka.filters.AttributeFilter-R2

@attribute age {young,pre-presbyopic,presbyopic}
@attribute astigmatism {no,yes}
@attribute tear-prod-rate {reduced,normal}
@attribute contact-lenses {soft,hard,none}

@data
young, no, normal, soft
young, yes, reduced, none
young, yes, normal, hard
pre-presbyopic, no, reduced, none
pre-presbyopic, no, normal, soft
pre-presbyopic, yes, normal, hard
pre-presbyopic, yes, normal, none
pre-presbyopic, yes, normal, none
presbyopic, no, reduced, none
presbyopic, no, normal, none
presbyopic, yes, reduced, none
presbyopic, yes, normal, hard
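As an aside (not part of the original solution), this ARFF file can also be loaded programmatically. A minimal sketch in Python using scipy, assuming the listing above is saved under the placeholder name contact-lenses-train.arff:

    # Sketch: load the training ARFF with scipy; nominal values come back as bytes.
    from scipy.io import arff

    data, meta = arff.loadarff('contact-lenses-train.arff')   # assumed file name
    rows = [[rec[name].decode() for name in meta.names()] for rec in data]
    print(meta.names())   # ['age', 'astigmatism', 'tear-prod-rate', 'contact-lenses']
    print(rows[0])        # ['young', 'no', 'normal', 'soft']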
Approximate base-2 logarithms used in the entropy calculations:

      x     |  1/2 |  1/3 |  1/4 |  3/4 |  1/5 |  2/5 |  3/5 |  1/6 |  5/6 |  1/7 |  2/7 |  3/7 |  4/7 |  1
    log2(x) |  -1  | -1.6 |  -2  | -0.4 | -2.3 | -1.3 | -0.7 | -2.6 | -0.3 | -2.8 | -1.8 | -1.2 | -0.8 |  0
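These tabulated logarithms are only a lookup aid. As a sanity check, the entropies below can also be computed directly, for example with a small Python helper (a sketch, not part of the original exercise):

    # Entropy of a class-count distribution: H = -sum(p_i * log2(p_i)), with 0*log2(0) = 0.
    from math import log2

    def entropy(counts):
        total = sum(counts)
        return -sum((c / total) * log2(c / total) for c in counts if c > 0)

    print(round(entropy([1, 1, 1]), 4))   # 1.585 -> age=young: 1 soft, 1 hard, 1 none
    print(round(entropy([1, 1, 3]), 4))   # 1.371 -> age=pre-presbyopic: 1 soft, 1 hard, 3 none

The exact values differ slightly from those obtained with the two-decimal table (e.g. 1.371 instead of 1.34 for age = pre-presbyopic), but not enough to change which attribute is selected at any step.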
First we have to choose the root of the tree. In ID3 the root is the attribute with the lowest weighted entropy of its splits (equivalently, the highest information gain). In each bracket below the three terms correspond to the class counts for soft, hard and none.

AGE:    young      (3/12)*[ -(1/3)*log2(1/3) - (1/3)*log2(1/3) - (1/3)*log2(1/3) ] = (3/12)*1.5847
        pre-presb  (5/12)*[ -(1/5)*log2(1/5) - (1/5)*log2(1/5) - (3/5)*log2(3/5) ] = (5/12)*1.34
        presb      (4/12)*[ -(0/4)*log2(0/4) - (1/4)*log2(1/4) - (3/4)*log2(3/4) ] = (4/12)*0.8
                                                                                     -------------
                                                                                     = 1.222

ASTIGM: yes        (7/12)*[ -(0/7)*log2(0/7) - (3/7)*log2(3/7) - (4/7)*log2(4/7) ] = (7/12)*0.9704
        no         (5/12)*[ -(2/5)*log2(2/5) - (0/5)*log2(0/5) - (3/5)*log2(3/5) ] = (5/12)*0.94
                                                                                     -------------
                                                                                     = 0.957

TEAR-P: normal     (8/12)*[ -(2/8)*log2(2/8) - (3/8)*log2(3/8) - (3/8)*log2(3/8) ] = (8/12)*1.56
        reduced    (4/12)*[ -(0/4)*log2(0/4) - (0/4)*log2(0/4) - (4/4)*log2(4/4) ] = (4/12)*0
                                                                                     -------------
                                                                                     = 1.04

Since the attribute astigmatism has the lowest weighted entropy, the root of the ID3 tree will be astigmatism:

             ASTIGMATISM
               /     \
          yes /       \ no
             /         \

Now for astigmatism = yes (7 examples) we measure the entropy of the two remaining attributes, age and tear-prod-rate:

AGE:    young      (2/7)*[ 0 - (1/2)*log2(1/2) - (1/2)*log2(1/2) ] = 0.286
        pre-presb  (3/7)*[ 0 - (1/3)*log2(1/3) - (2/3)*log2(2/3) ] = 0.394
        presb      (2/7)*[ 0 - (1/2)*log2(1/2) - (1/2)*log2(1/2) ] = 0.286
                                                                     -------------
                                                                     = 0.966

TEAR-P: normal     (5/7)*[ 0 - (3/5)*log2(3/5) - (2/5)*log2(2/5) ] = 0.694
        reduced    (2/7)*[ 0 - 0 - (2/2)*log2(2/2) ]               = 0
                                                                     -------------
                                                                     = 0.694

So the next node on this branch will be tear-prod-rate, and age comes last. So far the tree looks like

             ASTIGMATISM
               /     \
          yes /       \ no
             /         \
         TP-rate
          /   \
  normal /     \ reduced
        /       \

According to ID3, for the branch reduced all examples have class none, so this branch ends in the leaf none. The branch normal does not give a single class, so we have to add the last attribute, age, to this branch. For age = PP the examples are still not pure (1 hard, 2 none), but there are no attributes left, so that leaf takes the majority class none (PP = pre-presbyopic, P = presbyopic):

             ASTIGMATISM
               /     \
          yes /       \ no
             /         \
         TP-rate
          /   \
  normal /     \ reduced
        /       \
      AGE       none
     / | \
    /  |  \
young  PP  P
  |    |   |
hard none hard

Now for astigmatism = no (5 examples) we measure the entropy of the two remaining attributes, age and tear-prod-rate:

AGE:    young      (1/5)*[ -(1/1)*log2(1/1) - 0 - 0 ]               = 0
        pre-presb  (2/5)*[ -(1/2)*log2(1/2) - 0 - (1/2)*log2(1/2) ] = 0.4
        presb      (2/5)*[ 0 - 0 - (2/2)*log2(2/2) ]                = 0
                                                                      -------------
                                                                      = 0.4

TEAR-P: normal     (3/5)*[ -(2/3)*log2(2/3) - 0 - (1/3)*log2(1/3) ] = 0.55
        reduced    (2/5)*[ 0 - 0 - (2/2)*log2(2/2) ]                = 0
                                                                      -------------
                                                                      = 0.55

Since age gives the smaller weighted entropy (i.e. the larger information gain), we add this attribute to the tree:

                 ASTIGMATISM
                  /       \
             yes /         \ no
                /           \
            TP-rate         AGE
            /    \         / | \
    normal /      \ red.  /  |  \
          /        \   young P   PP
        AGE       none
       / | \
      /  |  \
  young  PP  P
    |    |   |
   hard none hard

Now for the branch (astigmatism = no, age = young) all examples have class soft, so this branch gets the leaf soft. For the branch (astigmatism = no, age = P) all examples have class none, so this branch gets the leaf none. The branch (astigmatism = no, age = PP) does not give a single class, so we have to add the last attribute, tear-prod-rate, to it. So we obtain the final tree:

                 ASTIGMATISM
                  /       \
             yes /         \ no
                /           \
            TP-rate         AGE
            /    \         / | \
    normal /      \ red.  /  |  \
          /        \   young P   PP
        AGE       none   |   |    \
       / | \           soft none  TP-rate
      /  |  \                      /   \
  young  PP  P              reduced     normal
    |    |   |                 |           |
   hard none hard            none        soft
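The whole construction above can be reproduced in a few lines of Python. The following is only a sketch (names such as TRAIN and id3 are mine, not part of the exercise); it mirrors the hand calculation by selecting, at each node, the attribute with the lowest weighted entropy of its splits:

    # Sketch of ID3 on the 12 training instances listed at the top.
    from collections import Counter
    from math import log2

    ATTRS = ['age', 'astigmatism', 'tear-prod-rate']
    TRAIN = [  # (age, astigmatism, tear-prod-rate, contact-lenses)
        ('young', 'no', 'normal', 'soft'),
        ('young', 'yes', 'reduced', 'none'),
        ('young', 'yes', 'normal', 'hard'),
        ('pre-presbyopic', 'no', 'reduced', 'none'),
        ('pre-presbyopic', 'no', 'normal', 'soft'),
        ('pre-presbyopic', 'yes', 'normal', 'hard'),
        ('pre-presbyopic', 'yes', 'normal', 'none'),
        ('pre-presbyopic', 'yes', 'normal', 'none'),
        ('presbyopic', 'no', 'reduced', 'none'),
        ('presbyopic', 'no', 'normal', 'none'),
        ('presbyopic', 'yes', 'reduced', 'none'),
        ('presbyopic', 'yes', 'normal', 'hard'),
    ]

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def weighted_entropy(rows, attr_idx):
        total = 0.0
        for v in {r[attr_idx] for r in rows}:
            subset = [r[-1] for r in rows if r[attr_idx] == v]
            total += len(subset) / len(rows) * entropy(subset)
        return total

    def id3(rows, attrs):
        labels = [r[-1] for r in rows]
        if len(set(labels)) == 1:                    # pure node -> leaf
            return labels[0]
        if not attrs:                                # no attributes left -> majority class leaf
            return Counter(labels).most_common(1)[0][0]
        best = min(attrs, key=lambda i: weighted_entropy(rows, i))
        branches = {v: id3([r for r in rows if r[best] == v],
                           [i for i in attrs if i != best])
                    for v in {r[best] for r in rows}}
        return (ATTRS[best], branches)

    print({ATTRS[i]: round(weighted_entropy(TRAIN, i), 3) for i in range(3)})
    # -> {'age': 1.238, 'astigmatism': 0.979, 'tear-prod-rate': 1.041}
    print(id3(TRAIN, [0, 1, 2]))                     # root: 'astigmatism', as derived above

Running it prints the weighted entropies 1.238 / 0.979 / 1.041 for age / astigmatism / tear-prod-rate (the exact counterparts of the approximate 1.222 / 0.957 / 1.04 above) and returns a nested structure equivalent to the final tree.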
Evaluating the tree on the four test instances:

young, no, reduced, none              our decision tree predicts: soft  (wrong: the test instance's class is none)
pre-presbyopic, yes, reduced, none    our decision tree predicts: none  (correct: the test instance's class is none)
presbyopic, no, normal, soft          our decision tree predicts: none  (wrong: the test instance's class is soft)
presbyopic, yes, normal, none         our decision tree predicts: hard  (wrong: the test instance's class is none)

So the accuracy of this decision tree on this test dataset is 1/4 = 25%.
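The same accuracy can be checked mechanically. A sketch that hard-codes the final tree as nested (attribute, {value: subtree}) pairs and scores the four test instances (again, the names TREE, TEST and classify are mine):

    # Sketch: encode the final tree and score the 4-instance test set.
    ATTRS = ['age', 'astigmatism', 'tear-prod-rate']

    TREE = ('astigmatism', {
        'no':  ('age', {'young': 'soft',
                        'pre-presbyopic': ('tear-prod-rate', {'reduced': 'none',
                                                              'normal': 'soft'}),
                        'presbyopic': 'none'}),
        'yes': ('tear-prod-rate', {'reduced': 'none',
                                   'normal': ('age', {'young': 'hard',
                                                      'pre-presbyopic': 'none',
                                                      'presbyopic': 'hard'})}),
    })

    TEST = [  # (age, astigmatism, tear-prod-rate, true class)
        ('young', 'no', 'reduced', 'none'),
        ('pre-presbyopic', 'yes', 'reduced', 'none'),
        ('presbyopic', 'no', 'normal', 'soft'),
        ('presbyopic', 'yes', 'normal', 'none'),
    ]

    def classify(tree, instance):
        while isinstance(tree, tuple):               # descend until we reach a leaf label
            attr, branches = tree
            tree = branches[instance[ATTRS.index(attr)]]
        return tree

    correct = sum(classify(TREE, t[:3]) == t[3] for t in TEST)
    print(correct / len(TEST))                       # 0.25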
Pruning a decision node consists of removing the subtree rooted at that node, making it a leaf node, and assigning it the most common classification of the training examples affiliated with that node. Nodes are removed only if the resulting pruned tree performs no worse than the original over the test set. So the deepest "age" node, the one under astigmatism = yes and tear-prod-rate = normal, is the first candidate for pruning.
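A minimal sketch of this pruning step, reusing TREE, TEST, ATTRS and classify from the previous sketch: the candidate subtree is replaced by a leaf with the majority training class at that node (hard, since the five training instances reaching it are 3 hard / 2 none), and the change is kept only if the test accuracy does not drop.

    # Prune the 'age' subtree under (astigmatism=yes, tear-prod-rate=normal) to a leaf.
    import copy

    PRUNED = copy.deepcopy(TREE)                 # TREE, TEST, classify: see previous sketch
    PRUNED[1]['yes'][1]['normal'] = 'hard'       # replace subtree by its majority class

    acc_before = sum(classify(TREE, t[:3]) == t[3] for t in TEST) / len(TEST)
    acc_after  = sum(classify(PRUNED, t[:3]) == t[3] for t in TEST) / len(TEST)
    print(acc_before, acc_after)                 # 0.25 0.25 -> the prune is accepted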
Implementing this pruning technique and pruning the first candidate gives the following tree:

                 ASTIGMATISM
                  /       \
             yes /         \ no
                /           \
            TP-rate         AGE
            /    \         / | \
    normal /      \ red.  /  |  \
          /        \   young P   PP
        hard      none   |   |    \
                       soft none  TP-rate
                                   /   \
                            reduced     normal
                               |           |
                             none        soft

The node is pruned because the obtained tree performs no worse than the original tree on the test data (i.e. the accuracy of the pruned decision tree is 25% as well).