Prof. Carolina Ruiz
Department of Computer Science
Worcester Polytechnic Institute

Solutions - Problem I. Decision Trees

Consider the following subset of Weka's contact lenses dataset.
@relation contact-lenses-weka.filters.AttributeFilter-R2

@attribute age {young,pre-presbyopic,presbyopic}
@attribute astigmatism {no,yes}
@attribute tear-prod-rate {reduced,normal}
@attribute contact-lenses {soft,hard,none}

@data

young,          no,  normal,  soft
young,          yes, reduced, none
young,          yes, normal,  hard
pre-presbyopic, no,  reduced, none
pre-presbyopic, no,  normal,  soft
pre-presbyopic, yes, normal,  hard
pre-presbyopic, yes, normal,  none
pre-presbyopic, yes, normal,  none
presbyopic,     no,  reduced, none
presbyopic,     no,  normal,  none
presbyopic,     yes, reduced, none
presbyopic,     yes, normal,  hard


  1. (25 points) Construct the full ID3 decision tree using entropy to rank the predictive attributes (age, astigmatism, tear-prod-rate) with respect to the target/classification attribute (contact-lenses). Show all the steps of the calculations. For your convenience, the base-2 logarithms of selected values are provided below.

    x        1/2  1/3   1/4  3/4   1/5   2/5   3/5   1/6   5/6   1/7   2/7   3/7   4/7   1
    log2(x)   -1  -1.5   -2  -0.4  -2.3  -1.3  -0.7  -2.5  -0.2  -2.8  -1.8  -1.2  -0.8  0
    
    First we have to choose the root of the tree. The root will be the attribute whose
    split yields the lowest weighted average entropy (equivalently, the highest
    information gain).
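    To make the arithmetic below reproducible, here is a minimal Python sketch (not part
    of the original solution; the helper names entropy and split_entropy are my own) of
    the two quantities used throughout: the entropy of a class distribution, and the
    weighted average entropy of a split on one attribute.

        from collections import Counter
        from math import log2

        def entropy(labels):
            """Entropy (in bits) of a list of class labels."""
            total = len(labels)
            return -sum((n / total) * log2(n / total)
                        for n in Counter(labels).values())

        def split_entropy(rows, attr_index, class_index=-1):
            """Weighted average entropy after splitting rows on one attribute."""
            groups = {}
            for row in rows:
                groups.setdefault(row[attr_index], []).append(row[class_index])
            total = len(rows)
            return sum((len(labels) / total) * entropy(labels)
                       for labels in groups.values())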
    
    
    
    CONTACT LENSES               SOFT               HARD                 NONE
    
    AGE: young       (3/12)*[ -(1/3)*log2(1/3)  - (1/3)*log2(1/3) - (1/3)*log2(1/3) ] = (3/12)*1.5847
         pre-presb   (5/12)*[ -(1/5)*log2(1/5)  - (1/5)*log2(1/5) - (3/5)*log2(3/5) ] = (5/12)*1.34
         presb       (4/12)*[ -(0/4)*log2(0/4)  - (1/4)*log2(1/4) - (3/4)*log2(3/4) ] = (4/12)*0.8
                                                                                        -------------
                                                                                      = 1.222
    
    
    ASTIGM: yes      (7/12)*[ -(0/7)*log2(0/7)  - (3/7)*log2(3/7) - (4/7)*log2(4/7) ] = (7/12)*0.9714
            no       (5/12)*[ -(2/5)*log2(2/5)  - (0/5)*log2(0/5) - (3/5)*log2(3/5) ] = (5/12)*0.94
                                                                                        --------------
                                                                                      = 0.957
     
    TEAR-P: normal   (8/12)*[ -(2/8)*log2(2/8)  - (3/8)*log2(3/8) - (3/8)*log2(3/8) ] = (8/12)*1.56
            reduced  (4/12)*[ -(0/4)*log2(0/4)  - (0/4)*log2(0/4) - (4/4)*log2(4/4) ] = (4/12)*0
                                                                                        ------------
                                                                                      = 1.04
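
    Running the sketch above on the 12 training rows reproduces this ranking. (The rows
    list below is transcribed from the @data section; exact log2 values give slightly
    different numbers than the rounded log table used in the hand calculations, but the
    ordering of the attributes is the same.)

        rows = [
            ("young",          "no",  "normal",  "soft"),
            ("young",          "yes", "reduced", "none"),
            ("young",          "yes", "normal",  "hard"),
            ("pre-presbyopic", "no",  "reduced", "none"),
            ("pre-presbyopic", "no",  "normal",  "soft"),
            ("pre-presbyopic", "yes", "normal",  "hard"),
            ("pre-presbyopic", "yes", "normal",  "none"),
            ("pre-presbyopic", "yes", "normal",  "none"),
            ("presbyopic",     "no",  "reduced", "none"),
            ("presbyopic",     "no",  "normal",  "none"),
            ("presbyopic",     "yes", "reduced", "none"),
            ("presbyopic",     "yes", "normal",  "hard"),
        ]
        for i, name in enumerate(["age", "astigmatism", "tear-prod-rate"]):
            print(name, round(split_entropy(rows, i), 3))
        # age 1.238, astigmatism 0.979, tear-prod-rate 1.041
        # (hand values above: 1.222, 0.957, 1.04 -- same minimum, astigmatism)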
    
    
    Since the attribute astigmatism has the lowest entropy, the root of the ID3 tree will be astigmatism:
    
                       ASTIGMATISM
                          /     \
                    yes  /       \ no
                        /         \
    
    
    
    
    Now for astigmatism = yes we measure the entropy of the two remaining attributes:
    age and tear-prod-rate.
    
    AGE: young       (2/7)*[ 0 - (1/2)*log2(1/2) - (1/2)*log2(1/2) ] = 0.286
         pre-presb   (3/7)*[ 0 - (1/3)*log2(1/3) - (2/3)*log2(2/3) ] = 0.394
         presb       (2/7)*[ 0 - (1/2)*log2(1/2)  - (1/2)*log2(1/2) ] = 0.286
                                                                   -------------
                                                                     = 0.966
    
    
    
    TEAR-P: normal   (5/7)*[ 0-(3/5)*log2(3/5)  - (2/5)*log2(2/5)] = 0.694
            reduced  (2/7)*[ 0-0 - (1)*log2(1) ]                   = 0
                                                                 ------------
                                                                   = 0.694
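
    The same helpers, applied only to the seven astigmatism = yes rows, confirm this
    step (again with exact logs rather than the rounded table):

        yes_rows = [r for r in rows if r[1] == "yes"]
        print(round(split_entropy(yes_rows, 0), 3))  # age: 0.965 (0.966 above)
        print(round(split_entropy(yes_rows, 2), 3))  # tear-prod-rate: 0.694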
    
    
    So the next node under astigmatism = yes will be tear-prod-rate, and age will be used last.
    So far the tree looks like
    
    		
                       ASTIGMATISM
                          /     \
                    yes  /       \ no
                        /         \
    	       Tp-rate
    	      /      \
         normal  /        \ reduced
                /          \
        
    
    Now, following ID3: on the reduced branch all examples have class none, so the tree
    gets the leaf none. The normal branch does not yield a single class, so we add the
    last attribute, age, to this branch. Note that under (astigmatism = yes,
    tear-prod-rate = normal, age = pre-presbyopic) the examples are {hard, none, none};
    no attributes remain, so that leaf takes the majority class, none.
    
    
                        ASTIGMATISM
                          /     \
                    yes  /       \ no
                        /         \
    	       Tp-rate
    	      /      \
         normal  /        \ reduced
                /          \
              Age          none
             / | \
      young /PP|  \ P
           /   |   \
        hard  none  hard
    
    Now for astigmatism = no we measure the entropy of the two remaining attributes:
    age and tear-prod-rate.
    
    AGE: young       (1/5)*[ 0 - 0 - 1*log2(1) ]                     = 0
         pre-presb   (2/5)*[ 0 - (1/2)*log2(1/2) - (1/2)*log2(1/2) ] = 0.4
         presb       (2/5)*[0 - 0  - (2/2)*log2(2/2)]                = 0
                                                                   -------------
                                                                     = 0.4
    
    
    TEAR-P: normal   (3/5)*[ 0-(2/3)*log2(2/3)  - (1/3)*log2(1/3)] = 0.55
            reduced  (2/5)*[ 0-0 - (2/2)*log2(2/2) ]               = 0
                                                                 ------------
                                                                   = 0.55
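
    Again, checking the five astigmatism = no rows with the helpers above:

        no_rows = [r for r in rows if r[1] == "no"]
        print(round(split_entropy(no_rows, 0), 3))  # age: 0.4
        print(round(split_entropy(no_rows, 2), 3))  # tear-prod-rate: 0.551 (0.55 above)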
    
    Since the attribute age has the smaller entropy, we add it to the tree:
    
    	              ASTIGMATISM
                          /         \
                    yes  /           \ no
                        /             \
    	       TP-rate            AGE
    	      /      \            | \ \
         normal  /   red. \      young| P| \PP
                /          \          |  |  \
              AGE          none              \
             / | \
      young /PP|  \ P
           /   |   \
        hard  none  hard
    	   
    Now for the branch (astigmatism = no, age = young) all examples have class soft, so
    this branch gets the leaf soft.
    For the branch (astigmatism = no, age = presbyopic) all examples have class none, so
    this branch gets the leaf none.

    The branch (astigmatism = no, age = pre-presbyopic) does not yield a single class, so
    we add the last attribute, tear-prod-rate, to this branch. This gives the final tree:
    	                            
                          ASTIGMATISM
                          /         \
                    yes  /           \ no
                        /             \
    	       TP-rate            AGE
    	      /      \            | \ \
         normal  /   red. \      young| P| \PP
                /          \          |  |  \
              AGE        none     soft none  \
             / | \                           TP-rate
      young /PP|  \ P                        /      \
           /   |   \               reduced  /        \ normal
        hard  none  hard                   /          \
    	                            none         soft
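
    To check the tree mechanically, one can transcribe it as nested dictionaries and
    walk it. This is a sketch of my own (the names tree and predict are not part of the
    solution); it encodes exactly the final tree drawn above:

        tree = {"astigmatism": {
            "yes": {"tear-prod-rate": {
                "reduced": "none",
                "normal": {"age": {"young":          "hard",
                                   "pre-presbyopic": "none",
                                   "presbyopic":     "hard"}}}},
            "no": {"age": {
                "young":      "soft",
                "presbyopic": "none",
                "pre-presbyopic": {"tear-prod-rate": {"reduced": "none",
                                                      "normal":  "soft"}}}}}}

        def predict(node, instance):
            """Walk the nested-dict tree; instance maps attribute name -> value."""
            while isinstance(node, dict):
                attr, branches = next(iter(node.items()))
                node = branches[instance[attr]]
            return node  # a class label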
    
                                     
    
    
    

  2. (10 points) Compute the accuracy of the decision tree you constructed on the following test data instances:
    young,          no,  reduced, none     our decision tree predicts: soft  (wrong: test instance's class is none)
    pre-presbyopic, yes, reduced, none     our decision tree predicts: none  (correct: test instance's class is none)
    presbyopic,     no,  normal,  soft     our decision tree predicts: none  (wrong: test instance's class is soft)
    presbyopic,     yes, normal,  none     our decision tree predicts: hard  (wrong: test instance's class is none)
    
    
    So the accuracy of this decision tree on this test dataset is: 1/4 = 25%
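
    Feeding the four test instances through the predict sketch above reproduces the
    25% figure:

        test = [
            ({"age": "young",          "astigmatism": "no",  "tear-prod-rate": "reduced"}, "none"),
            ({"age": "pre-presbyopic", "astigmatism": "yes", "tear-prod-rate": "reduced"}, "none"),
            ({"age": "presbyopic",     "astigmatism": "no",  "tear-prod-rate": "normal"},  "soft"),
            ({"age": "presbyopic",     "astigmatism": "yes", "tear-prod-rate": "normal"},  "none"),
        ]
        correct = sum(predict(tree, x) == y for x, y in test)
        print(correct / len(test))  # 0.25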
    
    

  3. (10 points) Consider the subtree replacement pruning technique to make your decision tree more accurate over this test data.