This project consists of two parts:
See Solutions to this homework assignment by Yutao Wang.
Consider the zoo.arff dataset, converted to ARFF format from the Zoo Data Set available at the Univ. of California Irvine KDD Data Repository.
Minimum support: 0.35 (35 instances)
...
Size of set of large itemsets L(4): 8
Large Itemsets L(4):
hair=1 milk=1 toothed=1 backbone=1 38
hair=1 milk=1 toothed=1 breathes=1 38
hair=1 milk=1 backbone=1 breathes=1 39
hair=1 toothed=1 backbone=1 breathes=1 38
milk=1 toothed=1 backbone=1 breathes=1 40
milk=1 backbone=1 breathes=1 tail=1 35
toothed=1 backbone=1 breathes=1 legs=4 35
toothed=1 backbone=1 breathes=1 tail=1 38
Size of set of large itemsets L(5): 1
Large Itemsets L(5):
hair=1 milk=1 toothed=1 backbone=1 breathes=1 38
milk=1 backbone=1 breathes=1 tail=1
Use Algorithms 6.2 and 6.3 (pp. 351-352), which are based on Theorem 6.2, to construct all rules with Confidence = 100% from this 4-itemset. Show your work by neatly constructing a lattice similar to the one depicted in Figure 6.15 (but you don't need to expand/include pruned rules).
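The level-wise rule generation asked for above can be sketched in Python. This is a minimal illustration in the spirit of Algorithms 6.2 and 6.3, not a transcription of the textbook pseudocode: it checks confidence at every level (including 1-item consequents) and grows consequents only from those that passed. The function names and the tiny transaction list are hypothetical and are NOT taken from zoo.arff.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def merge_consequents(consequents):
    """Join surviving m-item consequents into (m+1)-item candidates.

    A candidate is kept only if all of its m-item subsets survived,
    which is the pruning that the confidence theorem justifies.
    """
    survivors = set(consequents)
    candidates = set()
    for a, b in combinations(consequents, 2):
        union = a | b
        if len(union) == len(a) + 1 and all(
            frozenset(s) in survivors for s in combinations(union, len(a))
        ):
            candidates.add(union)
    return candidates


def rules_from_itemset(itemset, transactions, min_conf):
    """Return (antecedent, consequent, confidence) for rules meeting min_conf."""
    itemset = frozenset(itemset)
    total = support_count(itemset, transactions)
    if total == 0:  # itemset never occurs, so no rule can be formed
        return []
    rules = []
    candidates = {frozenset([item]) for item in itemset}  # 1-item consequents
    while candidates:
        kept = []
        for consequent in candidates:
            antecedent = itemset - consequent
            if not antecedent:  # the antecedent must be non-empty
                continue
            conf = total / support_count(antecedent, transactions)
            if conf >= min_conf:
                rules.append((antecedent, consequent, conf))
                kept.append(consequent)
        candidates = merge_consequents(kept)
    return rules


# Hypothetical toy data with abstract item names (assumption, not zoo.arff).
toy_transactions = [
    frozenset("abc"), frozenset("abc"), frozenset("b"), frozenset("c"),
]
for antecedent, consequent, conf in rules_from_itemset("abc", toy_transactions, 1.0):
    print(sorted(antecedent), "=>", sorted(consequent), f"conf={conf:.2f}")
```

For the assignment itself, the support counts of the subsets of the 4-itemset would come from the Apriori output rather than from rescanning transactions; the lattice you draw corresponds to the successive values of `candidates`.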
Cost Sensitive Classification. See Textbook Slides - Chapter 4, pp. 74-82. We'll use cost sensitive classification, available in Weka under "More options..." in the Classify tab (under "Test options"). Let's make misclassifying any type of attack OTHER THAN neptune, smurf, and normal 10 times more costly than misclassifying a neptune, smurf, or normal instance. That is, set the cost of misclassifying each neptune, smurf, or normal instance to 1, and the cost of misclassifying an instance of any other attack to 10.
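To make the cost assignment concrete, here is a small Python sketch that builds such a cost matrix and applies it to a confusion matrix. It assumes rows are the actual class and columns the predicted class, with zero cost on the diagonal. The four-class label list (with a single "other_attack" standing in for all remaining attack types) and the confusion counts are invented for illustration only; the real dataset has more classes.

```python
def build_cost_matrix(classes, cheap_classes, cheap_cost=1, expensive_cost=10):
    """Cost matrix with rows = actual class, columns = predicted class.

    Correct predictions cost 0. Misclassifying an instance whose actual
    class is in `cheap_classes` costs `cheap_cost`; any other actual
    class costs `expensive_cost`.
    """
    matrix = []
    for actual in classes:
        row_cost = cheap_cost if actual in cheap_classes else expensive_cost
        matrix.append(
            [0 if predicted == actual else row_cost for predicted in classes]
        )
    return matrix


def total_cost(confusion, cost):
    """Sum of count * cost over every cell of the confusion matrix."""
    n = len(cost)
    return sum(confusion[i][j] * cost[i][j] for i in range(n) for j in range(n))


# Hypothetical class list and confusion counts (assumptions for illustration).
classes = ["normal", "neptune", "smurf", "other_attack"]
cost = build_cost_matrix(classes, cheap_classes={"normal", "neptune", "smurf"})
confusion = [
    [50, 1, 0, 2],   # actual normal
    [0, 40, 1, 0],   # actual neptune
    [0, 0, 30, 0],   # actual smurf
    [3, 0, 0, 5],    # actual other_attack
]
for label, row in zip(classes, cost):
    print(f"{label:>12}", row)
print("total misclassification cost:", total_cost(confusion, cost))
```

In this toy example the three misclassified "other_attack" instances contribute most of the total cost even though they are few, which is the effect the 10-to-1 weighting is meant to produce.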
0 normal
1 probe
2 denial of service (DOS)
3 user-to-root (U2R)
4 remote-to-local (R2L)
as described in the KDD Cup 1999: Results.