This project consists of two parts: Classification Rules (50 points) and Instance-Based Learning (50 points).
[See solutions to this homework assignment by Yutao Wang.]
Both parts use the following training set of 22 instances:
@relation shuttle-landing-control
@attribute STABILITY continuous
@attribute ERROR continuous
@attribute WIND {head,tail}
@attribute VISIBILITY {yes, no}
@attribute Class {noauto,auto}
@data
( 1) 60, 0.5, tail, no, auto
( 2) 75, 1.0, head, yes, noauto
( 3) 40, 0.9, head, no, auto
( 4) 65, 0.0, head, no, auto
( 5) 45, 0.2, head, yes, auto
( 6) 80, 0.1, tail, yes, noauto
( 7) 30, 0.4, head, yes, noauto
( 8) 90, 0.6, head, no, auto
( 9) 65, 0.1, head, no, auto
(10) 85, 0.5, head, yes, noauto
(11) 25, 0.6, tail, yes, auto
(12) 40, 0.4, tail, yes, noauto
(13) 15, 0.6, tail, yes, noauto
(14) 25, 0.8, head, yes, noauto
(15) 30, 0.2, head, yes, auto
(16) 35, 0.4, head, yes, noauto
(17) 70, 0.6, tail, no, auto
(18) 20, 0.5, tail, yes, auto
(19) 75, 0.1, tail, no, auto
(20) 80, 0.2, head, yes, noauto
(21) 85, 0.8, tail, yes, noauto
(22) 60, 0.9, tail, yes, noauto
(50 points) Classification Rules

[See solutions to a similar problem from a previous offering of this course.]

In this part, you will construct classification rules using the sequential covering algorithm (called Prism in Weka). Note that the dataset contains continuous attributes. Handle those continuous attributes as J4.8 would handle them, that is, using binary splits. To reduce the amount of work, consider only the following split points:
split point for STABILITY: 50
split points for ERROR: 0.3 and 0.7
regardless of which values of those attributes actually occur in the (subset of the) dataset under consideration. That is, the only predicates over the continuous attributes that can appear in the rules are:
STABILITY ≤ 50, STABILITY > 50,
ERROR ≤ 0.3, ERROR > 0.3,
ERROR ≤ 0.7, ERROR > 0.7.
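
For reference, the greedy search can be made concrete in code. The following is a minimal Python sketch of Prism-style sequential covering over these predicates (the data encoding, the predicate set, and the tie-breaking rule are assumptions made here for illustration; Weka's Prism implementation differs in its details):

    # Training data as (STABILITY, ERROR, WIND, VISIBILITY, Class) tuples.
    TRAIN = [
        (60, 0.5, "tail", "no", "auto"),    (75, 1.0, "head", "yes", "noauto"),
        (40, 0.9, "head", "no", "auto"),    (65, 0.0, "head", "no", "auto"),
        (45, 0.2, "head", "yes", "auto"),   (80, 0.1, "tail", "yes", "noauto"),
        (30, 0.4, "head", "yes", "noauto"), (90, 0.6, "head", "no", "auto"),
        (65, 0.1, "head", "no", "auto"),    (85, 0.5, "head", "yes", "noauto"),
        (25, 0.6, "tail", "yes", "auto"),   (40, 0.4, "tail", "yes", "noauto"),
        (15, 0.6, "tail", "yes", "noauto"), (25, 0.8, "head", "yes", "noauto"),
        (30, 0.2, "head", "yes", "auto"),   (35, 0.4, "head", "yes", "noauto"),
        (70, 0.6, "tail", "no", "auto"),    (20, 0.5, "tail", "yes", "auto"),
        (75, 0.1, "tail", "no", "auto"),    (80, 0.2, "head", "yes", "noauto"),
        (85, 0.8, "tail", "yes", "noauto"), (60, 0.9, "tail", "yes", "noauto"),
    ]

    # Candidate predicates: the six fixed binary splits plus the nominal tests.
    PREDICATES = {
        "STABILITY <= 50":  lambda r: r[0] <= 50,
        "STABILITY > 50":   lambda r: r[0] > 50,
        "ERROR <= 0.3":     lambda r: r[1] <= 0.3,
        "ERROR > 0.3":      lambda r: r[1] > 0.3,
        "ERROR <= 0.7":     lambda r: r[1] <= 0.7,
        "ERROR > 0.7":      lambda r: r[1] > 0.7,
        "WIND = head":      lambda r: r[2] == "head",
        "WIND = tail":      lambda r: r[2] == "tail",
        "VISIBILITY = yes": lambda r: r[3] == "yes",
        "VISIBILITY = no":  lambda r: r[3] == "no",
    }

    def learn_one_rule(instances, target):
        """Grow one rule: greedily add the predicate with the best accuracy
        p/t on the still-covered instances, breaking ties by larger p."""
        conds, covered = [], list(instances)
        while any(r[4] != target for r in covered):
            scored = []
            for name, test in PREDICATES.items():
                if name in conds:
                    continue
                sub = [r for r in covered if test(r)]
                if sub:
                    p = sum(r[4] == target for r in sub)
                    scored.append((p / len(sub), p, name))
            if not scored:
                break          # no remaining predicate covers anything new
            _, _, best = max(scored)
            conds.append(best)
            covered = [r for r in covered if PREDICATES[best](r)]
        return conds, covered

    def prism(instances, target="noauto"):
        """Sequential covering: learn rules until every instance of the
        target class is covered, removing covered instances each round."""
        rules, remaining = [], list(instances)
        while any(r[4] == target for r in remaining):
            conds, covered = learn_one_rule(remaining, target)
            rules.append(conds)
            remaining = [r for r in remaining if r not in covered]
        return rules

    for conds in prism(TRAIN):
        print("If", " and ".join(conds), "then Class=noauto")

Each pass of prism corresponds to one round of the by-hand construction asked for below: compute p/t for every candidate predicate, add the best one, and repeat until the rule is pure (or no predicate improves it).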
Assume that the first two rules constructed for Class=noauto are:

If ERROR > 0.7
and VISIBILITY = yes
then Class=noauto

If VISIBILITY = yes
and STABILITY > 50
then Class=noauto
Starting from here, follow the sequential covering algorithm to construct "by hand" the
3rd rule for Class=noauto.
Given the rule

If VISIBILITY = yes
and ERROR ≤ 0.7
and ERROR > 0.3
and STABILITY ≤ 50
and WIND = tail
then Class=noauto
and the validation set:
STABILITY ERROR WIND VISIBILITY Class
(v1 ) 35, 0.1, head, no, auto
(v2 ) 80, 0.6, tail, yes, noauto
(v3 ) 35, 0.1, head, no, auto
(v4 ) 10, 0.6, tail, yes, noauto
(v5 ) 40, 0.5, tail, yes, auto
(v6 ) 80, 0.6, tail, yes, noauto
(v7 ) 25, 0.4, tail, yes, auto
(v8 ) 80, 0.6, tail, yes, auto
(v9 ) 20, 0.6, tail, yes, noauto
(v10) 35, 0.1, head, no, auto
(v11) 40, 0.5, head, yes, noauto
(v12) 15, 0.4, head, yes, noauto
show each step of the pruning method used by RIPPER on this rule over the above validation set.
Show your work.
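
As a reference for what "each step" looks like, here is a minimal Python sketch of the pruning step, assuming RIPPER's usual criterion: delete a final sequence of conditions from the rule so as to maximize v = (p - n) / (p + n) over the pruning (validation) set, where p and n count the positive and negative instances the rule covers there. The encoding, the prefix-only search, and the tie-breaking toward shorter rules are simplifications made here; Weka's JRip differs in its details:

    # Validation (pruning) set as (STABILITY, ERROR, WIND, VISIBILITY, Class).
    VALID = [
        (35, 0.1, "head", "no", "auto"),    (80, 0.6, "tail", "yes", "noauto"),
        (35, 0.1, "head", "no", "auto"),    (10, 0.6, "tail", "yes", "noauto"),
        (40, 0.5, "tail", "yes", "auto"),   (80, 0.6, "tail", "yes", "noauto"),
        (25, 0.4, "tail", "yes", "auto"),   (80, 0.6, "tail", "yes", "auto"),
        (20, 0.6, "tail", "yes", "noauto"), (35, 0.1, "head", "no", "auto"),
        (40, 0.5, "head", "yes", "noauto"), (15, 0.4, "head", "yes", "noauto"),
    ]

    # The rule's conditions, in the order in which they were grown.
    CONDS = [
        ("VISIBILITY = yes", lambda r: r[3] == "yes"),
        ("ERROR <= 0.7",     lambda r: r[1] <= 0.7),
        ("ERROR > 0.3",      lambda r: r[1] > 0.3),
        ("STABILITY <= 50",  lambda r: r[0] <= 50),
        ("WIND = tail",      lambda r: r[2] == "tail"),
    ]

    def rule_value(conds, data, target="noauto"):
        """RIPPER's pruning metric v = (p - n) / (p + n) on the prune set."""
        covered = [r for r in data if all(test(r) for _, test in conds)]
        p = sum(r[4] == target for r in covered)
        n = len(covered) - p
        return (p - n) / (p + n) if covered else float("-inf")

    # Deleting a final sequence of conditions leaves a prefix of CONDS, so
    # score every non-empty prefix; max() keeps the first (shortest) prefix
    # encountered on ties.
    for k in range(1, len(CONDS) + 1):
        print(k, "condition(s): v =", round(rule_value(CONDS[:k], VALID), 3))
    best = max(range(1, len(CONDS) + 1),
               key=lambda k: rule_value(CONDS[:k], VALID))
    print("Pruned rule: If",
          " and ".join(name for name, _ in CONDS[:best]),
          "then Class=noauto")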
(50 points) Instance-Based Learning
[See solutions to a similar problem from a previous offering of this course.]
Assume that we want to predict the Class attribute (prediction target)
of the following two new data instances:
STABILITY ERROR WIND VISIBILITY
(23) 35, 0.1, head, no
(24) 80, 0.6, tail, yes
using the k-nearest neighbors algorithm on the same training set of 22 instances above.
For each of the variations of the k-nearest neighbors algorithm listed below, predict the Class of these two instances and show your work.

Variations:

weighted k-nearest neighbors with the following attribute weights:
STABILITY    2
ERROR        1
WIND         4.5
VISIBILITY   3
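
One possible reading of these numbers, sketched below in Python: treat them as per-attribute weights inside a weighted Euclidean distance, min-max rescale the numeric attributes over the training set, score nominal attributes with a 0/1 difference, and take a majority vote among the k = 3 nearest neighbors. All of these choices are illustrative assumptions, not the assignment's specification:

    from collections import Counter

    # Training data as (STABILITY, ERROR, WIND, VISIBILITY, Class) tuples.
    TRAIN = [
        (60, 0.5, "tail", "no", "auto"),    (75, 1.0, "head", "yes", "noauto"),
        (40, 0.9, "head", "no", "auto"),    (65, 0.0, "head", "no", "auto"),
        (45, 0.2, "head", "yes", "auto"),   (80, 0.1, "tail", "yes", "noauto"),
        (30, 0.4, "head", "yes", "noauto"), (90, 0.6, "head", "no", "auto"),
        (65, 0.1, "head", "no", "auto"),    (85, 0.5, "head", "yes", "noauto"),
        (25, 0.6, "tail", "yes", "auto"),   (40, 0.4, "tail", "yes", "noauto"),
        (15, 0.6, "tail", "yes", "noauto"), (25, 0.8, "head", "yes", "noauto"),
        (30, 0.2, "head", "yes", "auto"),   (35, 0.4, "head", "yes", "noauto"),
        (70, 0.6, "tail", "no", "auto"),    (20, 0.5, "tail", "yes", "auto"),
        (75, 0.1, "tail", "no", "auto"),    (80, 0.2, "head", "yes", "noauto"),
        (85, 0.8, "tail", "yes", "noauto"), (60, 0.9, "tail", "yes", "noauto"),
    ]

    WEIGHTS = (2.0, 1.0, 4.5, 3.0)   # STABILITY, ERROR, WIND, VISIBILITY

    # Min-max ranges of the numeric attributes, for rescaling (an assumption).
    S_MIN, S_MAX = min(r[0] for r in TRAIN), max(r[0] for r in TRAIN)
    E_MIN, E_MAX = min(r[1] for r in TRAIN), max(r[1] for r in TRAIN)

    def distance(q, r):
        """Weighted Euclidean distance; nominal attributes differ by 0 or 1."""
        return (WEIGHTS[0] * ((q[0] - r[0]) / (S_MAX - S_MIN)) ** 2
                + WEIGHTS[1] * ((q[1] - r[1]) / (E_MAX - E_MIN)) ** 2
                + WEIGHTS[2] * (0.0 if q[2] == r[2] else 1.0)
                + WEIGHTS[3] * (0.0 if q[3] == r[3] else 1.0)) ** 0.5

    def predict(query, k=3):
        """Majority class among the k nearest training instances."""
        nearest = sorted(TRAIN, key=lambda r: distance(query, r))[:k]
        return Counter(r[4] for r in nearest).most_common(1)[0][0]

    # Instances (23) and (24) from the problem statement.
    for q in [(35, 0.1, "head", "no"), (80, 0.6, "tail", "yes")]:
        print(q, "->", predict(q))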