CS444X Data Mining and Knowledge Discovery in Databases. D-2003
Solutions Exam 1 - April 4, 2003

Solutions by Prof. Carolina Ruiz, Alek Icev, and Senthil Palanisamy
Department of Computer Science
Worcester Polytechnic Institute

Problem I. Decision Trees (45 points)

Consider the following subset of Weka's contact lenses dataset.
@relation contact-lenses-weka.filters.AttributeFilter-R2

@attribute age {young,pre-presbyopic,presbyopic}
@attribute astigmatism {no,yes}
@attribute tear-prod-rate {reduced,normal}
@attribute contact-lenses {soft,hard,none}

@data

young,          no,  normal,  soft
young,          yes, reduced, none
young,          yes, normal,  hard
pre-presbyopic, no,  reduced, none
pre-presbyopic, no,  normal,  soft
pre-presbyopic, yes, normal,  hard
pre-presbyopic, yes, normal,  none
pre-presbyopic, yes, normal,  none
presbyopic,     no,  reduced, none
presbyopic,     no,  normal,  none
presbyopic,     yes, reduced, none
presbyopic,     yes, normal,  hard


  1. (25 points) Construct the full ID3 decision tree using entropy to rank the predictive attributes (age, astigmatism, tear-prod-rate) with respect to the target/classification attribute (contact-lenses). Show all the steps of the calculations. For your convenience, the base-2 logarithms of selected values are provided:

    x        1/2   1/3   1/4   3/4   1/5   2/5   3/5   1/6   5/6   1/7   2/7   3/7   4/7    1
    log2(x)   -1  -1.6    -2  -0.4  -2.3  -1.3  -0.7  -2.6  -0.3  -2.8  -1.8  -1.2  -0.8    0
    
    First we have to choose the root of the tree. The root will be the attribute with the lowest weighted average entropy (equivalently, the highest information gain).
    
    
    
    CONTACT LENSES               SOFT               HARD                 NONE
    
    AGE: young       (3/12)*[ -(1/3)*log2(1/3)  - (1/3)*log2(1/3) - (1/3)*log2(1/3) ] = (3/12)*1.5847
         pre-presb   (5/12)*[ -(1/5)*log2(1/5)  - (1/5)*log2(1/5) - (3/5)*log2(3/5) ] = (5/12)*1.34
         presb       (4/12)*[ -(0/4)*log2(0/4)  - (1/4)*log2(1/4) - (3/4)*log2(3/4) ] = (4/12)*0.8
                                                                                        -------------
                                                                                      = 1.222
    
    
    ASTIGM: yes      (7/12)*[ -(0/7)*log2(0/7)  - (3/7)*log2(3/7) - (4/7)*log2(4/7) ] = (7/12)*0.9704
            no       (5/12)*[ -(2/5)*log2(2/5)  - (0/5)*log2(0/5) - (3/5)*log2(3/5) ] = (5/12)*0.94
                                                                                        --------------
                                                                                      = 0.957
     
    TEAR-P: normal   (8/12)*[ -(2/8)*log2(2/8)  - (3/8)*log2(3/8) - (3/8)*log2(3/8) ] = (8/12)*1.56
            reduced  (4/12)*[ -(0/4)*log2(0/4)  - (0/4)*log2(0/4) - (4/4)*log2(4/4) ] = (4/12)*0
                                                                                        ------------
                                                                                      = 1.04
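
    These weighted entropies can be double-checked programmatically. Below is a
    minimal Python sketch (not part of the original solution; the helper names
    entropy and weighted_entropy are ours):

    import math
    from collections import Counter

    # The 12 training instances: (age, astigmatism, tear-prod-rate, class).
    DATA = [r.split(",") for r in (
        "young,no,normal,soft", "young,yes,reduced,none", "young,yes,normal,hard",
        "pre-presbyopic,no,reduced,none", "pre-presbyopic,no,normal,soft",
        "pre-presbyopic,yes,normal,hard", "pre-presbyopic,yes,normal,none",
        "pre-presbyopic,yes,normal,none", "presbyopic,no,reduced,none",
        "presbyopic,no,normal,none", "presbyopic,yes,reduced,none",
        "presbyopic,yes,normal,hard")]
    ATTRS = {"age": 0, "astigmatism": 1, "tear-prod-rate": 2}

    def entropy(labels):
        # Entropy of a list of class labels; classes with count 0 never appear
        # in the Counter, so the 0*log2(0) = 0 convention is automatic.
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    def weighted_entropy(rows, attr):
        # Average class entropy after splitting rows on attr, weighted by subset size.
        i = ATTRS[attr]
        total = 0.0
        for v in {r[i] for r in rows}:
            subset = [r[3] for r in rows if r[i] == v]
            total += len(subset) / len(rows) * entropy(subset)
        return total

    for a in ATTRS:
        print(a, round(weighted_entropy(DATA, a), 3))
    # age 1.238, astigmatism 0.979, tear-prod-rate 1.041: slightly off from the
    # hand-computed 1.222 / 0.957 / 1.04, which use the rounded log table, but
    # the ranking agrees, so astigmatism is chosen as the root either way.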
    
    
    Since the attribute astigmatism has the lowest entropy, the root of the ID3 tree will be astigmatism:
    
                       ASTIGMATISM
                          /     \
                    yes  /       \ no
                        /         \
    
    
    
    
    Now, for astigmatism=yes, we measure the entropy of the two remaining attributes: age and tear-prod-rate.
    
    AGE: young       (2/7)*[ 0 - (1/2)*log2(1/2) - (1/2)*log2(1/2) ] = 0.286
         pre-presb   (3/7)*[ 0 - (1/3)*log2(1/3) - (2/3)*log2(2/3) ] = 0.394
         presb       (2/7)*[ 0 - (1/2)*log2(1/2)  - (1/2)*log2(1/2) ] = 0.286
                                                                   -------------
                                                                     = 0.966
    
    
    
    TEAR-P: normal   (5/7)*[ 0-(3/5)*log2(3/5)  - (2/5)*log2(2/5)] = 0.694
            reduced  (2/7)*[ 0-0 - (1)*log2(1) ]                   = 0
                                                                 ------------
                                                                   = 0.694
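
    As a check, the weighted_entropy helper from the sketch above, restricted to
    the astigmatism=yes instances, reproduces these numbers (exact values; the
    hand calculation's 0.966 reflects the rounded log table):

    yes_rows = [r for r in DATA if r[1] == "yes"]
    print(round(weighted_entropy(yes_rows, "age"), 3))             # 0.965
    print(round(weighted_entropy(yes_rows, "tear-prod-rate"), 3))  # 0.694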
    
    
    Since tear-prod-rate has the lower entropy, it becomes the next node on this
    branch; age will be used last.
    So far the tree looks like
    
    		
                       ASTIGMATISM
                          /     \
                    yes  /       \ no
                        /         \
    	       Tp-rate
    	      /      \
         normal  /        \ reduced
                /          \
        
    
    Following ID3, all examples on the reduced branch have class none, so that
    branch ends in a leaf labeled none. The normal branch does not yield a single
    class, so we add the last remaining attribute, age, to it.
    
    
                        ASTIGMATISM
                          /     \
                    yes  /       \ no
                        /         \
    	       Tp-rate
    	      /      \
         normal  /        \ reduced
                /          \
              Age          none
             / | \
      young /PP|  \ P
           /   |   \
        hard  none  hard

    (For age=pre-presbyopic the three covered instances are mixed, 1 hard and
    2 none; since no attributes remain, the leaf takes the majority class, none.)
    
    Now, for astigmatism=no, we measure the entropy of the two remaining attributes: age and tear-prod-rate.
    
    AGE: young       (1/5)*[ 0 - 0 - 1*log2(1) ]                     = 0
         pre-presb   (2/5)*[ 0 - (1/2)*log2(1/2) - (1/2)*log2(1/2) ] = 0.4
         presb       (2/5)*[0 - 0  - (2/2)*log2(2/2)]                = 0
                                                                   -------------
                                                                     = 0.4
    
    
    TEAR-P: normal   (3/5)*[ 0-(2/3)*log2(2/3)  - (1/3)*log2(1/3)] = 0.55
            reduced  (2/5)*[ 0-0 - (2/2)*log2(2/2) ]               = 0
                                                                 ------------
                                                                   = 0.55
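
    The same check for the astigmatism=no branch, again reusing the earlier sketch:

    no_rows = [r for r in DATA if r[1] == "no"]
    print(round(weighted_entropy(no_rows, "age"), 3))             # 0.4
    print(round(weighted_entropy(no_rows, "tear-prod-rate"), 3))  # 0.551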
    
    Since the attribute age has the smaller entropy, we add it to the tree:
    
    	              ASTIGMATISM
                          /         \
                    yes  /           \ no
                        /             \
    	       TP-rate            AGE
    	      /      \            | \ \
         normal  /   red. \      young| P| \PP
                /          \          |  |  \
              AGE          none              \
             / | \
      young /PP|  \ P
           /   |   \
        hard  none  hard
    	   
    Now, for the branch (astigmatism=no, age=young), all examples have class soft,
    so this branch ends in the leaf soft.
    For the branch (astigmatism=no, age=presbyopic), all examples have class none,
    so this branch ends in the leaf none.
    
    The branch (astigmatism=no, age=pre-presbyopic) does not yield a single class,
    so we add the last attribute, tear-prod-rate, to this branch and obtain the
    final tree:
    	                            
                          ASTIGMATISM
                          /         \
                    yes  /           \ no
                        /             \
    	       TP-rate            AGE
    	      /      \            | \ \
         normal  /   red. \      young| P| \PP
                /          \          |  |  \
              AGE        none     soft none  \
             / | \                           TP-rate
      young /PP|  \ P                        /      \
           /   |   \               reduced  /        \ normal
        hard  none  hard                   /          \
    	                            none         soft
    
                                     
    
    
    

  2. (10 points) Compute the accuracy of the decision tree you constructed on the following test data instances:
    young,      no,  reduced, none     our decision tree predicts: soft  (wrong: test instance's class is none)
    pre-pre,    yes, reduced, none     our decision tree predicts: none  (correct: test instance's class is none)
    presbyopic, no,  normal,  soft     our decision tree predicts: none  (wrong: test instance's class is soft)
    presbyopic, yes, normal,  none     our decision tree predicts: hard  (wrong: test instance's class is none)
    
    
    So the accuracy of this decision tree on this test dataset is: 1/4 = 25%
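
    As a sanity check, the final tree can be encoded as nested conditionals and
    run over the test set. This is a minimal Python sketch (the function name
    predict is ours, not part of the original solution):

    def predict(age, astig, tear):
        # The final ID3 tree from part 1, written as nested conditionals.
        if astig == "yes":
            if tear == "reduced":
                return "none"
            # tear == "normal": split on age
            return {"young": "hard", "pre-presbyopic": "none",
                    "presbyopic": "hard"}[age]
        # astig == "no"
        if age == "young":
            return "soft"
        if age == "presbyopic":
            return "none"
        # age == "pre-presbyopic": split on tear-prod-rate
        return "none" if tear == "reduced" else "soft"

    test = [
        ("young", "no", "reduced", "none"),
        ("pre-presbyopic", "yes", "reduced", "none"),
        ("presbyopic", "no", "normal", "soft"),
        ("presbyopic", "yes", "normal", "none"),
    ]
    correct = sum(predict(a, s, t) == c for a, s, t, c in test)
    print(f"{correct}/{len(test)}")  # 1/4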
    
    

  3. (10 points) Consider the subtree replacement pruning technique to make your decision tree more accurate over this test data.

    One option: replace the age subtree under astigmatism=no with a single leaf.
    The 5 training instances reaching that node comprise 2 soft and 3 none, so the
    replacement leaf is labeled none. The first test instance (young, no, reduced)
    is then predicted none instead of soft, which is correct, while the other three
    predictions are unchanged. The accuracy on the test data improves from
    1/4 = 25% to 2/4 = 50%.

Problem II. Classification Rules (45 points)

Construct classification rules using the basic PRISM sequential covering algorithm from the 12 instances in the dataset below.
@relation contact-lenses-weka.filters.AttributeFilter-R2

@attribute age {young,pre-presbyopic,presbyopic}
@attribute astigmatism {no,yes}
@attribute tear-prod-rate {reduced,normal}
@attribute contact-lenses {soft,hard,none}

@data

young,          no,  normal,  soft
young,          yes, reduced, none
young,          yes, normal,  hard
pre-presbyopic, no,  reduced, none
pre-presbyopic, no,  normal,  soft
pre-presbyopic, yes, normal,  hard
pre-presbyopic, yes, normal,  none
pre-presbyopic, yes, normal,  none
presbyopic,     no,  reduced, none
presbyopic,     no,  normal,  none
presbyopic,     yes, reduced, none
presbyopic,     yes, normal,  hard
Follow these guidelines in your construction of the rules:

  1. (15 points) Construct perfect classification rules for the class label contact-lenses = soft until all instances with this class label are covered. Use the p/t measure to select conditions to add to the rule. Show all your work!
    
    
    If ?  ==> soft
    
    our choices
    
    age = young  		1/3
    age = pre-presbyopic    1/5
    age = presbyopic 		0/4
    
    astigmatism = no		2/5
    astigmatism = yes       0/7
    
    tear-prod-rate = reduced	0/4
    tear-prod-rate = normal		2/8
    
    pick the attribute-value pair with the largest fraction, which is
    astigmatism=no, and add this condition to the rule
    
    if astigmatism=no ==> soft
    
    covers 5 instances and gets 2 right 
    
    pick another attribute-value to improve accuracy
    
    consider those instances where astigmatism = no
    
    age = young  		1/1
    age = pre-presbyopic    1/2
    age = presbyopic 		0/2
    
    tear-prod-rate = reduced	0/2
    tear-prod-rate = normal		2/3
    
    select age=young (highest p/t)
    
    add this to the rule
    
    if astigmatism=no and age=young ==> soft
    
    covers 1 and classifies it right
    
    We can add this to our rule set
    
    Rule 1: if astigmatism=no and age=young ==> soft
    
    so this rule covers 1 out of the 2 instances with class label equal to soft (we're left with 1 more instance with class label = soft).
    
    let's go for another rule
    
    ? ==> soft
    
    age = young  		0/2
    age = pre-presbyopic    1/5
    age = presbyopic 		0/4
    
    astigmatism = no		1/4
    astigmatism = yes       0/7
    
    tear-prod-rate = reduced	0/4
    tear-prod-rate = normal		1/7
    
    select astigmatism=no to add to the rule
    
    astigmatism=no ==> soft
    
    covers 4 and predicts 1 correctly
    
    let's try adding another attribute
    
    age = young  		0/0
    age = pre-presbyopic    1/2
    age = presbyopic 		0/2
    
    tear-prod-rate = reduced	0/2
    tear-prod-rate = normal		1/2
    
    select age=pre-presbyopic to add to the rule (tied with tear-prod-rate=normal
    at 1/2; the tie may be broken arbitrarily, e.g., by attribute order)
    
    astigmatism=no && age=pre-presbyopic ==> soft
    
    covers 2 and predicts 1 correctly 
    
    let's try adding another attribute
    
    tear-prod-rate = reduced	0/1
    tear-prod-rate = normal		1/1
    
    select tear-prod-rate = normal
    
    add this to the rule
    
    astigmatism=no && age=pre-presbyopic && tear-prod-rate=normal ==> soft
    this covers 1 and classifies it correctly
    
    we're done with all instances with class = soft ( 2 instances )
    
    Rule Set:
    
    Rule 1: if astigmatism=no and age=young ==> soft
    Rule 2: astigmatism = no && age=pre-presbyopic && tear-prod-rate=normal ==> soft
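
    The covering procedure used above can be written out compactly. Below is a
    minimal Python sketch of basic PRISM with the p/t measure (the helper names
    are ours; it assumes the DATA and ATTRS definitions from the sketch in
    Problem I, and breaks p/t ties by the larger count p, then by attribute order):

    def best_condition(rows, target, used):
        # Choose the (attribute, value) pair maximizing p/t; break ties by larger p.
        best, best_key = None, (-1.0, -1)
        for attr, i in ATTRS.items():
            if attr in used:
                continue
            for v in sorted({r[i] for r in rows}):
                covered = [r for r in rows if r[i] == v]
                p = sum(r[3] == target for r in covered)
                key = (p / len(covered), p)
                if key > best_key:
                    best_key, best = key, (attr, v)
        return best

    def prism_rules(rows, target):
        # Grow one rule at a time until every instance of `target` is covered.
        rows, rules = list(rows), []
        while any(r[3] == target for r in rows):
            conds, candidates = {}, list(rows)
            # Refine the rule until it is perfect or all attributes are used.
            while len(conds) < len(ATTRS) and any(r[3] != target for r in candidates):
                attr, v = best_condition(candidates, target, conds)
                conds[attr] = v
                candidates = [r for r in candidates if r[ATTRS[attr]] == v]
            rules.append(conds)
            # Remove the instances covered by the finished rule.
            rows = [r for r in rows if any(r[ATTRS[a]] != v for a, v in conds.items())]
        return rules

    print(prism_rules(DATA, "soft"))
    # [{'astigmatism': 'no', 'age': 'young'},
    #  {'astigmatism': 'no', 'age': 'pre-presbyopic', 'tear-prod-rate': 'normal'}]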
    
    

  2. (15 points) Construct perfect classification rules for the class label contact-lenses = none until all instances with this class label are covered. Use the p/t measure to select conditions to add to the rule. Show all your work!
    
    
    ? ==> none
    
    age = young  		1/3
    age = pre-presbyopic    3/5
    age = presbyopic 		3/4
    
    astigmatism = no		3/5
    astigmatism = yes       4/7
    
    tear-prod-rate = reduced	4/4
    tear-prod-rate = normal		3/8
    
    Select tear-prod-rate=reduced to add to the rule
    
    tear-prod-rate=reduced ==> none
    This rule covers 4 and classifies all of them correctly, therefore we can add it to the rule set.
    
    R1: tear-prod-rate=reduced ==> none
    
    we're left with 3 instances that have none for class label
    
    let's create another rule
    
    ? ==> none
    
    age = young  		0/2
    age = pre-presbyopic    2/4
    age = presbyopic 		1/2
    
    astigmatism = no		1/3
    astigmatism = yes       2/5
    
    tear-prod-rate = reduced	0/0
    tear-prod-rate = normal		3/8
    
    select age = pre-presbyopic (tied with age=presbyopic at 0.5; break the tie
    in favor of the condition with the larger count p, here 2 vs. 1)
    
    age = pre-presbyopic ==> none
    
    this rule covers 4 but classifies only 2 correctly...
    
    let's try improving this rule
    
    astigmatism = no		0/1
    astigmatism = yes       2/3
    
    tear-prod-rate = reduced	0/0
    tear-prod-rate = normal		2/4
    
    we'll select astigmatism = yes
    
    age=pre-presbyopic && astigmatism=yes ==> none
    
    covers 3 but only classifies 2 correctly
    
    let's try adding one more attribute
    
    we have no choice but to add tear-prod-rate
    
    age=pre-presbyopic && astigmatism=yes && tear-prod-rate=normal ==> none
    
    this rule covers 3 but classifies only 2 correctly; it cannot be made perfect,
    since all attributes are now used and the 3 covered instances agree on every
    attribute yet carry different class labels (1 hard, 2 none). We accept it,
    add it to the rule set, and remove the instances it covers.
    
    R1: tear-prod-rate=reduced ==> none
    R2: age=pre-presbyopic && astigmatism=yes && tear-prod-rate=normal ==> none
    
    we're left with 1 instance of class none (5 training instances remain in total)
    
    ? ==> none
    
    age = young             0/2
    age = pre-presbyopic    0/1
    age = presbyopic        1/2
    
    astigmatism = no        1/3
    astigmatism = yes       0/2
    
    tear-prod-rate = reduced    0/0
    tear-prod-rate = normal     1/5
    
    select age = presbyopic
    
    age=presbyopic ==> none
    
    covers 2 and classifies 1 correctly
    
    let's look at the 2 covered instances
    
    astigmatism = no		1/1
    astigmatism = yes       0/1
    
    tear-prod-rate = reduced	0/0
    tear-prod-rate = normal		1/2
    
    select astigmatism = no
    
    age=presbyopic && astigmatism = no ==> none
    
    covers 1 and classifies it correctly, therefore we can add this to the rule set
    
    all instances are covered for class label = none
    
    Rule Set:
    
    R1: tear-prod-rate=reduced ==> none
    R2: age=pre-presbyopic && astigmatism=yes && tear-prod-rate=normal ==> none
    R3: age=presbyopic && astigmatism = no ==> none
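
    (Running the prism_rules sketch from part 1 with target "none" reproduces this
    rule set: R1 is perfect immediately, R2 is accepted even though it classifies
    only 2 of the 3 instances it covers, since no further refinement is possible,
    and R3 finishes off the last uncovered none instance.)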
    

  3. (EXTRA 10 points - attempt only if you have extra time) Construct perfect classification rules for the class label contact-lenses = hard until all instances with this class label are covered. Use the p/t measure to select conditions to add to the rule. Show all your work!
    For reference, here is the full rule set produced by Weka's Prism (the rules
    for contact-lenses=hard are the third, fourth, and fifth):
    
    Prism rules
    ----------
    If astigmatism = no
       and age = young then soft
    If astigmatism = no
       and age = pre-presbyopic
       and tear-prod-rate = normal then soft
    If astigmatism = yes
       and tear-prod-rate = normal
       and age = young then hard
    If astigmatism = yes
       and tear-prod-rate = normal
       and age = presbyopic then hard
    If age = pre-presbyopic
       and astigmatism = yes
       and tear-prod-rate = normal then hard
    If tear-prod-rate = reduced then none
    If age = pre-presbyopic
       and astigmatism = yes
       and tear-prod-rate = normal then none
    If age = presbyopic
       and astigmatism = no then none
    
    
    

  4. (5 points) Compute the accuracy of the classification rules that you constructed on the following test data instances:
    @relation test-contact-lenses-weka.filters.AttributeFilter-R2
    
    @attribute age {young,pre-presbyopic,presbyopic}
    @attribute astigmatism {no,yes}
    @attribute tear-prod-rate {reduced,normal}
    @attribute contact-lenses {soft,hard,none}
    
    @data
    
    young,          no,  reduced, none        your rules prediction: soft
    pre-presbyopic, yes, reduced, none        your rules prediction: none
    presbyopic,     no,  normal,  soft        your rules prediction: none
    presbyopic,     yes, normal,  none        your rules prediction: cannot be classified
    
    
    The accuracy of your classification rules on this test data is: 1/3 = 33%
    (counting only the three instances the rules classify; if the unclassifiable
    instance counts as an error, the accuracy is 1/4 = 25%)

  5. (10 points) Assume you are given a function m defined on rules that measures the probability that a rule occurs by chance. More precisely, if R is a rule that classifies for, say, contact-lenses=none, then m(R) is the probability that the improvement in accuracy from using this rule, relative to 7/12 (the number of instances with contact-lenses=none over the total number of instances), occurs just by chance.

    Explain in detail how you would use this function m to post-prune your classification rules.

    
    In post-pruning, we consider all rules that have more than one condition on
    their left-hand sides. For a rule R, we compute the value m(R). To prune R by
    removing the rightmost condition on its left-hand side, forming a new rule S,
    we require that m(S) < m(R). Here m(S) < m(R) means that the probability that
    the accuracy improvement of S over the default rule (IF "no condition" THEN
    right-hand side of R) occurs by chance is smaller than the corresponding
    probability for R. The process is then repeated on the rightmost condition of
    S, and stops as soon as removing a condition would increase the value of m,
    that is, as soon as the removal would make the rule worse in terms of m.
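
    A minimal sketch of this pruning loop (assuming m is given as a function from
    a rule, represented as an ordered list of conditions, to its chance
    probability; the name post_prune is ours):

    def post_prune(rule, m):
        # Repeatedly drop the rightmost condition while doing so lowers m,
        # i.e. while the shorter rule is LESS likely to be a chance improvement.
        while len(rule) > 1:
            shorter = rule[:-1]
            if m(shorter) < m(rule):
                rule = shorter   # pruning made the rule more significant
            else:
                break            # removing the condition makes the rule worse: stop
        return rule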
    

Problem III. Data Preprocessing and Evaluation of Patterns (20 points)