See solutions to these homework problems.
Consider the following subset of the Contact Lenses Dataset:
ATTRIBUTES | POSSIBLE VALUES |
age | {young, pre-presbyopic, presbyopic} |
astigmatism | {no, yes} |
tear-prod-rate | {reduced, normal} |
contact-lenses | {soft, hard, none} <- classification target |
age | astigmatism | tear-prod-rate | contact-lenses |
young | no | normal | soft |
young | yes | reduced | none |
young | yes | normal | hard |
pre-presbyopic | no | reduced | none |
pre-presbyopic | no | normal | soft |
pre-presbyopic | yes | normal | hard |
pre-presbyopic | yes | normal | none |
pre-presbyopic | yes | normal | none |
presbyopic | no | reduced | none |
presbyopic | no | normal | none |
presbyopic | yes | reduced | none |
presbyopic | yes | normal | hard |
x | 1/2 | 1/3 | 1/4 | 3/4 | 1/5 | 2/5 | 3/5 | 1/6 | 5/6 | 1/7 | 2/7 | 3/7 | 4/7 | 1 |
log2(x) | -1 | -1.5 | -2 | -0.4 | -2.3 | -1.3 | -0.7 | -2.5 | -0.2 | -2.8 | -1.8 | -1.2 | -0.8 | 0 |
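The log2 table above supports the entropy calculations for the decision tree. The sketch below is minimal and assumes the tree is grown ID3-style, so each node splits on the attribute with the highest information gain. It computes the gain of each attribute at the root of the 12-instance subset. The names `SUBSET`, `ATTRS`, and `remainder` are illustrative. The sketch uses exact `math.log2`, while the table gives one-decimal approximations (for example, log2(1/3) ≈ -1.585 is listed as -1.5), so hand calculations may differ slightly in the later decimals.

```python
from collections import Counter
from math import log2

# The 12-instance subset above: (age, astigmatism, tear-prod-rate, contact-lenses)
SUBSET = [
    ("young", "no", "normal", "soft"),
    ("young", "yes", "reduced", "none"),
    ("young", "yes", "normal", "hard"),
    ("pre-presbyopic", "no", "reduced", "none"),
    ("pre-presbyopic", "no", "normal", "soft"),
    ("pre-presbyopic", "yes", "normal", "hard"),
    ("pre-presbyopic", "yes", "normal", "none"),
    ("pre-presbyopic", "yes", "normal", "none"),
    ("presbyopic", "no", "reduced", "none"),
    ("presbyopic", "no", "normal", "none"),
    ("presbyopic", "yes", "reduced", "none"),
    ("presbyopic", "yes", "normal", "hard"),
]
ATTRS = ["age", "astigmatism", "tear-prod-rate"]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def remainder(rows, i):
    """Weighted average class entropy after splitting rows on attribute column i."""
    groups = {}
    for row in rows:
        groups.setdefault(row[i], []).append(row[-1])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())


base = entropy([row[-1] for row in SUBSET])
gains = {name: base - remainder(SUBSET, i) for i, name in enumerate(ATTRS)}
for name, gain in gains.items():
    print(f"{name}: information gain = {gain:.3f}")
print("root split:", max(gains, key=gains.get))
```

Choosing the attribute with the lowest `remainder` gives the same result as choosing the one with the highest gain, because `base` is the same for every attribute.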
age | astigmatism | tear-prod-rate | contact-lenses | your decision tree predicts |
young | no | reduced | none | ______ |
pre-presbyopic | yes | reduced | none | ______ |
presbyopic | no | normal | soft | ______ |
presbyopic | yes | normal | hard | ______ |
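This minimal sketch shows one way to fill in the blanks, assuming plain ID3 with no pruning. The tie-breaking rules are our choices and are not stated in the assignment. So is the majority-vote fallback at leaves that still mix classes, which happens here because the subset contains contradictory pre-presbyopic/yes/normal rows. The sketch grows a tree from the 12-instance subset, classifies the four listed instances, and compares each prediction with its actual contact-lenses value. The helper names (`id3`, `predict`, `majority`, `split`) are hypothetical.

```python
from collections import Counter
from math import log2

# Training subset: (age, astigmatism, tear-prod-rate, contact-lenses)
SUBSET = [
    ("young", "no", "normal", "soft"), ("young", "yes", "reduced", "none"),
    ("young", "yes", "normal", "hard"), ("pre-presbyopic", "no", "reduced", "none"),
    ("pre-presbyopic", "no", "normal", "soft"), ("pre-presbyopic", "yes", "normal", "hard"),
    ("pre-presbyopic", "yes", "normal", "none"), ("pre-presbyopic", "yes", "normal", "none"),
    ("presbyopic", "no", "reduced", "none"), ("presbyopic", "no", "normal", "none"),
    ("presbyopic", "yes", "reduced", "none"), ("presbyopic", "yes", "normal", "hard"),
]
# The four instances listed above, with their actual contact-lenses values
TEST = [
    ("young", "no", "reduced", "none"),
    ("pre-presbyopic", "yes", "reduced", "none"),
    ("presbyopic", "no", "normal", "soft"),
    ("presbyopic", "yes", "normal", "hard"),
]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split(rows, i):
    """Group rows by their value in column i."""
    groups = {}
    for row in rows:
        groups.setdefault(row[i], []).append(row)
    return groups


def majority(rows):
    """Most frequent class; ties go to the label seen first."""
    return Counter(row[-1] for row in rows).most_common(1)[0][0]


def id3(rows, attrs):
    """Return a class label (leaf) or a tuple (column, majority label, {value: subtree})."""
    if len({row[-1] for row in rows}) == 1 or not attrs:
        return majority(rows)

    def remainder(i):
        return sum(len(g) / len(rows) * entropy([r[-1] for r in g])
                   for g in split(rows, i).values())

    best = min(attrs, key=remainder)  # lowest remainder = highest information gain
    rest = [a for a in attrs if a != best]
    children = {v: id3(g, rest) for v, g in split(rows, best).items()}
    return (best, majority(rows), children)


def predict(tree, row):
    """Walk down the tree; fall back to a node's majority label for unseen values."""
    while isinstance(tree, tuple):
        column, default, children = tree
        tree = children.get(row[column], default)
    return tree


tree = id3(SUBSET, [0, 1, 2])
predictions = [predict(tree, row) for row in TEST]
for row, pred in zip(TEST, predictions):
    print(row[:3], "actual:", row[-1], "predicted:", pred)
accuracy = sum(p == row[-1] for p, row in zip(predictions, TEST)) / len(TEST)
print("accuracy:", accuracy)
```

A different tie-breaking rule or a pruned tree could produce different blanks, so treat the output as one possible answer.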
Consider the full Contact Lenses Dataset (24 instances):
ATTRIBUTES | POSSIBLE VALUES |
age | {young, pre-presbyopic, presbyopic} |
spectacle-prescription | {myope, hypermetrope} |
astigmatism | {no, yes} |
tear-prod-rate | {reduced, normal} |
contact-lenses | {soft, hard, none} <- classification target |
age | spectacle-prescription | astigmatism | tear-prod-rate | contact-lenses |
young | myope | no | reduced | none |
young | myope | no | normal | soft |
young | myope | yes | reduced | none |
young | myope | yes | normal | hard |
young | hypermetrope | no | reduced | none |
young | hypermetrope | no | normal | soft |
young | hypermetrope | yes | reduced | none |
young | hypermetrope | yes | normal | hard |
pre-presbyopic | myope | no | reduced | none |
pre-presbyopic | myope | no | normal | soft |
pre-presbyopic | myope | yes | reduced | none |
pre-presbyopic | myope | yes | normal | hard |
pre-presbyopic | hypermetrope | no | reduced | none |
pre-presbyopic | hypermetrope | no | normal | soft |
pre-presbyopic | hypermetrope | yes | reduced | none |
pre-presbyopic | hypermetrope | yes | normal | none |
presbyopic | myope | no | reduced | none |
presbyopic | myope | no | normal | none |
presbyopic | myope | yes | reduced | none |
presbyopic | myope | yes | normal | hard |
presbyopic | hypermetrope | no | reduced | none |
presbyopic | hypermetrope | no | normal | soft |
presbyopic | hypermetrope | yes | reduced | none |
presbyopic | hypermetrope | yes | normal | none |
Important: Remember to add 1 to all the counts so that no probability is equal to 0. For example, the number of instances with astigmatism=yes among the instances with contact-lenses=soft is 0. After adding 1 to all the counts, this count [i.e., count(astigmatism=yes | contact-lenses=soft)] becomes 1. Similarly, count(astigmatism=no | contact-lenses=soft) becomes 5 + 1 = 6.
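Here is a minimal sketch of the add-1 rule for one conditional probability table, using exact fractions. Dividing each smoothed count by the total of the smoothed counts is our reading of how the probabilities follow; it is standard Laplace smoothing. For astigmatism given soft, the total is 6 + 1 = 7, so P(astigmatism=no | soft) = 6/7 and P(astigmatism=yes | soft) = 1/7. The names `DATA`, `DOMAINS`, and `smoothed_cpt` are illustrative.

```python
from fractions import Fraction
from itertools import product

AGES = ["young", "pre-presbyopic", "presbyopic"]
SPECS = ["myope", "hypermetrope"]
ASTIG = ["no", "yes"]
TEARS = ["reduced", "normal"]
DOMAINS = [AGES, SPECS, ASTIG, TEARS]

# The 24 rows of the full dataset enumerate age x spectacle-prescription x
# astigmatism x tear-prod-rate in exactly this order, so only the labels are listed.
LABELS = ("none soft none hard none soft none hard "
          "none soft none hard none soft none none "
          "none none none hard none soft none none").split()
DATA = [(*attrs, label) for attrs, label in zip(product(*DOMAINS), LABELS)]


def smoothed_cpt(column, label):
    """P(attribute value | contact-lenses=label), adding 1 to every count."""
    rows = [r for r in DATA if r[-1] == label]
    counts = {v: sum(r[column] == v for r in rows) + 1 for v in DOMAINS[column]}
    total = sum(counts.values())  # = len(rows) + number of attribute values
    return {v: Fraction(c, total) for v, c in counts.items()}


# Column 2 is astigmatism: with 5 soft instances, the denominator is 5 + 2 = 7.
print(smoothed_cpt(2, "soft"))
```

Calling `smoothed_cpt` for each of the four attribute columns and each of the three classes produces every conditional probability table the Naive Bayes net needs.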
In other words, you need to construct all the conditional probability tables for the Naive Bayes net below based on the 24 data instances:
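[Figure: Naive Bayes net with attribute nodes AGE, SPECTACLE-PRESC, ASTIGMATISM, and TEAR-PROD-RATE]
age | spectacle-prescription | astigmatism | tear-prod-rate | your Naive Bayes model predicts |
presbyopic | hypermetrope | yes | normal | ______ |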
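Here is a minimal sketch of the Naive Bayes prediction for the presbyopic / hypermetrope / yes / normal instance, applying the add-1 rule to every conditional probability table. The assignment does not say whether the class prior P(contact-lenses) should also be smoothed, so the hypothetical `smooth_prior` flag computes it both ways. The scores are unnormalized products, which is enough to pick the most probable class.

```python
from fractions import Fraction
from itertools import product

AGES = ["young", "pre-presbyopic", "presbyopic"]
SPECS = ["myope", "hypermetrope"]
ASTIG = ["no", "yes"]
TEARS = ["reduced", "normal"]
DOMAINS = [AGES, SPECS, ASTIG, TEARS]
CLASSES = ["soft", "hard", "none"]

# Rows follow age x spectacle-prescription x astigmatism x tear-prod-rate order.
LABELS = ("none soft none hard none soft none hard "
          "none soft none hard none soft none none "
          "none none none hard none soft none none").split()
DATA = [(*attrs, label) for attrs, label in zip(product(*DOMAINS), LABELS)]


def naive_bayes_scores(instance, smooth_prior=True):
    """Unnormalized P(class) * prod_i P(attribute_i | class), with add-1 smoothing."""
    scores = {}
    for label in CLASSES:
        rows = [r for r in DATA if r[-1] == label]
        k = 1 if smooth_prior else 0  # whether the class counts are smoothed too
        score = Fraction(len(rows) + k, len(DATA) + k * len(CLASSES))
        for column, value in enumerate(instance):
            count = sum(r[column] == value for r in rows)
            score *= Fraction(count + 1, len(rows) + len(DOMAINS[column]))
        scores[label] = score
    return scores


query = ("presbyopic", "hypermetrope", "yes", "normal")
for smooth in (True, False):
    scores = naive_bayes_scores(query, smooth_prior=smooth)
    print(f"smooth_prior={smooth}:", max(scores, key=scores.get),
          {c: round(float(s), 5) for c, s in scores.items()})
```

For this particular instance, both prior choices lead to the same predicted class.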
Consider the full Contact Lenses Dataset above (24 instances).
[Figure: Bayes net with nodes ASTIGMATISM and TEAR-PROD-RATE]
astigmatism | tear-prod-rate | your Bayes net predicts |
yes | normal | ______ |
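The edges of this Bayes net are not recoverable here. Purely as an illustration, this sketch assumes a hypothetical structure in which ASTIGMATISM and TEAR-PROD-RATE are both parents of the class node (ASTIGMATISM -> CONTACT-LENSES <- TEAR-PROD-RATE). Under that assumption, with both parents observed, the prediction is the most probable class in the single CPT row P(contact-lenses | astigmatism=yes, tear-prod-rate=normal), smoothed with the same add-1 rule. If the actual net has a different structure, its tables and its answer will differ. The name `cpt_given_parents` is hypothetical.

```python
from fractions import Fraction
from itertools import product

AGES = ["young", "pre-presbyopic", "presbyopic"]
SPECS = ["myope", "hypermetrope"]
ASTIG = ["no", "yes"]
TEARS = ["reduced", "normal"]
CLASSES = ["soft", "hard", "none"]

# Rows follow age x spectacle-prescription x astigmatism x tear-prod-rate order.
LABELS = ("none soft none hard none soft none hard "
          "none soft none hard none soft none none "
          "none none none hard none soft none none").split()
DATA = [(*attrs, label) for attrs, label in zip(product(AGES, SPECS, ASTIG, TEARS), LABELS)]


def cpt_given_parents(astig, tear):
    """P(contact-lenses | astigmatism, tear-prod-rate) with 1 added to every count.

    ASSUMPTION: hypothetical structure ASTIGMATISM -> CONTACT-LENSES <- TEAR-PROD-RATE.
    """
    rows = [r for r in DATA if r[2] == astig and r[3] == tear]
    counts = {c: sum(r[-1] == c for r in rows) + 1 for c in CLASSES}
    total = sum(counts.values())
    return {c: Fraction(n, total) for c, n in counts.items()}


# With both parents observed, P(class | evidence) is exactly this CPT row,
# so the prediction is its most probable class.
dist = cpt_given_parents("yes", "normal")
print(dist, "->", max(dist, key=dist.get))
```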
Dataset: Use the cfs_spambase.arff dataset. This dataset is a reduced version (in Weka's "arff" input format) of the spambase dataset available at the University of California, Irvine (UCI) Machine Learning Repository. Read the description of this dataset on UCI's spambase webpage. [Note: I generated the reduced dataset by using Weka's Correlation-based Feature Selection (CFS) to select 15 attributes from the dataset's 58 original attributes.]