This project consists of two parts:
See Amro Khasawneh's HW1 Solutions.
Consider the following dataset, adapted from the Zoo Data Set available at the University of California Irvine (UCI) Machine Learning Data Repository. Visit the repository's webpages to learn more about this dataset.
ATTRIBUTES: POSSIBLE VALUES:
@attribute hair {no,yes} % Does the animal have hair?
@attribute eggs {no,yes} % Does the animal lay eggs?
@attribute toothed {no,yes} % Does the animal have teeth?
@attribute legs {0,2,4,5,6,8} % Number of legs - Assumed to be a nominal attribute
@attribute type {type1,type2,type3,type4,type5,type6,type7} % Type of animal
animal (ignore!) | hair | eggs | toothed | legs | type |
dolphin | no | no | yes | 0 | type1 |
frog | no | yes | yes | 4 | type5 |
gnat | no | yes | no | 6 | type6 |
herring | no | yes | yes | 0 | type4 |
ladybird | no | yes | no | 6 | type6 |
lynx | yes | no | yes | 4 | type1 |
mongoose | yes | no | yes | 4 | type1 |
ostrich | no | yes | no | 2 | type2 |
stingray | no | yes | yes | 0 | type4 |
termite | no | yes | no | 6 | type6 |
toad | no | yes | yes | 4 | type5 |
tuna | no | yes | yes | 0 | type4 |
vole | yes | no | yes | 4 | type1 |
wasp | yes | yes | no | 6 | type6 |
wren | no | yes | no | 2 | type2 |
Show all the steps of the calculations. Make sure you compute log in base b (for the appropriate b) correctly as some calculators don't have a log_b primitive for all b's.
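As one way to check the hand calculations, the change-of-base identity log_b(x) = ln(x) / ln(b) can be scripted. The following is a minimal Python sketch, not part of the assignment: the helper names log_b, entropy and information_gain are hypothetical, the default base b=2 is an assumption (pass a different b if another base is appropriate), and rows are assumed to be dictionaries keyed by attribute name.

```python
import math
from collections import Counter

def log_b(x, b=2):
    """Logarithm of x in base b, via the change-of-base identity."""
    return math.log(x) / math.log(b)

def entropy(labels, b=2):
    """Entropy of a list of class labels, measured in base b (assumed default: 2)."""
    n = len(labels)
    return -sum((c / n) * log_b(c / n, b) for c in Counter(labels).values())

def information_gain(rows, attr, target, b=2):
    """Entropy of `target` minus the weighted entropy after splitting on `attr`."""
    before = entropy([r[target] for r in rows], b)
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    after = sum(len(g) / len(rows) * entropy(g, b) for g in groups.values())
    return before - after

# Three rows copied from the training table, for illustration only.
rows = [
    {"hair": "no", "eggs": "no", "toothed": "yes", "legs": "0", "type": "type1"},   # dolphin
    {"hair": "no", "eggs": "yes", "toothed": "yes", "legs": "4", "type": "type5"},  # frog
    {"hair": "yes", "eggs": "no", "toothed": "yes", "legs": "4", "type": "type1"},  # lynx
]
print(information_gain(rows, "eggs", "type"))
```

To use it on the full exercise, replace the three illustrative rows with all training instances from the table above and call information_gain once per candidate attribute.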
animal (ignore!) | hair | eggs | toothed | legs | type (ignore during classification) (use to calculate accuracy) | YOUR DECISION TREE PREDICTION |
bass | no | yes | yes | 0 | type4 | |
buffalo | yes | no | yes | 4 | type1 | |
chicken | no | yes | no | 2 | type2 | |
crayfish | no | yes | no | 6 | type7 | |
deer | yes | no | yes | 4 | type1 | |
dove | no | yes | no | 2 | type2 | |
goat | yes | no | yes | 4 | type1 | |
pike | no | yes | yes | 0 | type4 | |
toad | no | yes | yes | 4 | type5 | |
vampire | yes | no | yes | 2 | type1 | |
(5 points) The accuracy of your decision tree on this test data is: ________________
(10 points) The confusion matrix of your decision tree on this test data is: ....
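Once the prediction column of the test table is filled in, accuracy and the confusion matrix can be tallied mechanically. A minimal Python sketch follows; the function names are hypothetical, and the predicted list is made-up placeholder data, not the output of any particular tree. The layout (rows are actual classes, columns are predicted classes) is an assumption.

```python
from collections import Counter

def accuracy(actual, predicted):
    """Fraction of instances whose predicted class equals the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def confusion_matrix(actual, predicted, labels):
    """Matrix m where m[i][j] counts instances of actual class labels[i]
    that were predicted as labels[j]."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Placeholder data for illustration only (NOT real predictions).
labels = ["type1", "type2", "type4"]
actual = ["type4", "type1", "type2"]
predicted = ["type4", "type1", "type1"]

print(accuracy(actual, predicted))
for label, row in zip(labels, confusion_matrix(actual, predicted, labels)):
    print(label, row)
```

For the exercise itself, actual would hold the type column of the test table and predicted the corresponding tree outputs, with labels listing every class that appears in either.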
animal (ignore!) | hair | eggs | toothed | legs | YOUR DECISION TREE PREDICTION | EXPLANATION OF YOUR ANSWER |
scorpion | no | no | no | 8 | | |
animal (ignore!) | hair | eggs | toothed | legs | YOUR DECISION TREE PREDICTION | EXPLANATION OF YOUR ANSWER |
no-name | yes | no | no | ? | | |
SEE AMRO'S GRAD CLASS PROJECT REPORT (Part II: Projects part) AS AN ILLUSTRATION.
Remember to experiment with pruning of your J4.8 decision tree: use Weka's J4.8 classifier to see how it performs pre- and/or post-pruning of the decision tree in order to increase the classification accuracy and/or to reduce the size of the decision tree.