This project consists of two parts:
See Solutions to this homework assignment by Yutao Wang.
Consider the zoo.arff dataset, converted to ARFF from the Zoo Data Set
available at the Univ. of California Irvine KDD Data Repository.
Cost Sensitive Classification.
See Textbook Slides - Chapter 4 pp. 74-82.
We'll use the cost-sensitive classification available in Weka
under "More options..." in the Classify tab (under "Test options").
Let's make misclassifying any type of attack OTHER THAN
neptune, smurf, and normal 10 times more costly than misclassifying
neptune, smurf, and normal instances.
That is, set the cost of misclassifying each neptune, smurf, or normal instance
to 1, and the cost of misclassifying any other attack to 10.
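The cost assignment above can be sketched as a small cost matrix. This is only an illustration under stated assumptions: rows are taken to be the actual class and columns the predicted class, correct predictions cost 0, and the class list is a hypothetical subset (portsweep and satan stand in for "other attacks"). The helper name `build_cost_matrix` is made up for this sketch and is not part of Weka.

```python
def build_cost_matrix(classes, cheap_classes, cheap_cost=1, expensive_cost=10):
    """Return a square cost matrix as a list of rows.

    Assumption: rows index the ACTUAL class, columns the PREDICTED class.
    Correct predictions (the diagonal) cost 0.
    """
    matrix = []
    for actual in classes:
        # The cost depends only on which class was misclassified.
        cost = cheap_cost if actual in cheap_classes else expensive_cost
        matrix.append([0 if actual == predicted else cost for predicted in classes])
    return matrix


# Hypothetical subset of class labels, for illustration only.
classes = ["normal", "neptune", "smurf", "portsweep", "satan"]
cost_matrix = build_cost_matrix(classes, {"normal", "neptune", "smurf"})

for label, row in zip(classes, cost_matrix):
    print(f"{label:>10}: {row}")
```

The resulting rows can be typed into Weka's cost matrix editor by hand, once the matrix is resized to the number of classes in the actual dataset.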
Minimum support: 0.35 (35 instances)
...
Size of set of large itemsets L(4): 8
Large Itemsets L(4):
hair=1 milk=1 toothed=1 backbone=1 38
hair=1 milk=1 toothed=1 breathes=1 38
hair=1 milk=1 backbone=1 breathes=1 39
hair=1 toothed=1 backbone=1 breathes=1 38
milk=1 toothed=1 backbone=1 breathes=1 40
milk=1 backbone=1 breathes=1 tail=1 35
toothed=1 backbone=1 breathes=1 legs=4 35
toothed=1 backbone=1 breathes=1 tail=1 38
Size of set of large itemsets L(5): 1
Large Itemsets L(5):
hair=1 milk=1 toothed=1 backbone=1 breathes=1 38
milk=1 backbone=1 breathes=1 tail=1
Use Algorithms 6.2 and 6.3 (pp. 351-352), which are based on Theorem 6.2,
to construct all rules with Confidence = 100% from this 4-itemset.
Show your work by neatly constructing a lattice similar to the one depicted in
Figure 6.15 (but you don't need to expand/include pruned rules).
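As a rough aid for checking a hand-built lattice, the level-wise rule generation can be sketched as follows. This is a minimal sketch, not the textbook's pseudocode: it assumes the pruning idea is that once a rule fails the confidence threshold, no rule with a larger consequent built from it needs to be examined. The function names and the toy transactions are hypothetical; the transactions are NOT the zoo data, so the supports of the 4-itemset's subsets must still be taken from your own Weka output.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def generate_rules(itemset, transactions, min_conf):
    """Generate rules (antecedent, consequent, confidence) from one frequent
    itemset, growing the consequent one item per level and pruning
    consequents whose rule falls below `min_conf`."""
    itemset = frozenset(itemset)
    total = support_count(itemset, transactions)
    rules = []
    if total == 0:
        return rules
    # Level 1: every single item as a consequent.
    consequents = sorted((frozenset([i]) for i in itemset), key=sorted)
    while consequents and len(consequents[0]) < len(itemset):
        kept = []
        for cons in consequents:
            ante = itemset - cons
            conf = total / support_count(ante, transactions)
            if conf >= min_conf:
                rules.append((ante, cons, conf))
                kept.append(cons)
        # Next level: merge surviving consequents, and keep a candidate only
        # if all of its one-item-smaller subsets survived as well.
        kept_set = set(kept)
        size = len(kept[0]) + 1 if kept else 0
        candidates = set()
        for a, b in combinations(kept, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(s) in kept_set for s in combinations(union, size - 1)
            ):
                candidates.add(union)
        consequents = sorted(candidates, key=sorted)
    return rules


# Toy transactions, purely hypothetical.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"b"}, {"c"}]
rules = generate_rules({"a", "b", "c"}, transactions, min_conf=1.0)

for ante, cons, conf in rules:
    print(sorted(ante), "->", sorted(cons), f"conf={conf:.2f}")
```

Each pass of the `while` loop corresponds to one level of the lattice; consequents that drop out of `kept` mark the pruned branches that do not need to be expanded.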
The new technique for this project is Association Rules. Run several experiments with
association rules, but most likely you will need to combine this technique with
techniques from previous projects, given the nature of this dataset.
Just to name two ways you can do this, you could try to use association rules to
select attributes that seem to work together well, or attribute-value pairs that
are common or uncommon. Think of other ways of doing so.
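One way to start looking for common or uncommon attribute-value pairs is simply to count how often each pair occurs. The sketch below assumes the data has already been loaded as a list of dictionaries; the attribute names, values, records, and thresholds are all hypothetical and serve only to show the counting step.

```python
from collections import Counter


def pair_support(records):
    """Return the fraction of records in which each (attribute, value) pair occurs."""
    counts = Counter()
    for record in records:
        counts.update(record.items())
    n = len(records)
    return {pair: count / n for pair, count in counts.items()}


# Hypothetical records, for illustration only.
records = [
    {"protocol": "tcp", "flag": "SF", "label": "normal"},
    {"protocol": "tcp", "flag": "S0", "label": "neptune"},
    {"protocol": "icmp", "flag": "SF", "label": "smurf"},
    {"protocol": "tcp", "flag": "SF", "label": "normal"},
]

support = pair_support(records)
# Thresholds chosen arbitrarily for the toy data.
common = sorted(pair for pair, s in support.items() if s >= 0.75)
rare = sorted(pair for pair, s in support.items() if s <= 0.25)

print("common:", common)
print("rare:", rare)
```

Pairs that turn out to be very common or very rare could then guide attribute selection or discretization before running the association rule miner.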
0 normal
1 probe
2 denial of service (DOS)
3 user-to-root (U2R)
4 remote-to-local (R2L)
as described in the KDD Cup 1999: Results.
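If predictions are made on individual attack names, they need to be collapsed to the five category codes listed above. The following is a minimal sketch of that step. The attack-to-category table is deliberately partial and is an assumption of this sketch; it should be completed and checked against the KDD Cup 1999 documentation. The function name and the handling of a trailing "." on raw labels are also assumptions.

```python
# Category codes as listed above.
CATEGORY_CODES = {"normal": 0, "probe": 1, "dos": 2, "u2r": 3, "r2l": 4}

# Partial, illustrative mapping from attack name to category (assumption:
# verify every entry and add the remaining attacks before using it).
ATTACK_CATEGORY = {
    "normal": "normal",
    "neptune": "dos",
    "smurf": "dos",
    "portsweep": "probe",
    "satan": "probe",
    "buffer_overflow": "u2r",
    "guess_passwd": "r2l",
}


def to_category_code(label):
    """Map a raw label such as 'smurf.' or 'smurf' to its category code.

    Raises KeyError for attack names missing from the partial table, so
    gaps in the mapping are noticed rather than silently mislabeled.
    """
    name = label.strip().rstrip(".").lower()
    return CATEGORY_CODES[ATTACK_CATEGORY[name]]


codes = [to_category_code(x) for x in ["normal.", "smurf.", "satan", "guess_passwd"]]
print(codes)
```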
Since you are testing on a separate test set, you do not need to use 10-fold
cross-validation for this challenge.