Run Weka with a maximum Java heap of 768 MB (the -Xmx768m option):

java -Xmx768m -jar weka.jar
DATE     | OUTLOOK       | TEMPERATURE | HUMIDITY | WIND   | PLAYS
02/13/06 | mostly sunny  | 47          | 25       | strong | no
03/10/06 | mostly cloudy | 66          | 57       | weak   | yes
06/28/06 | cloudy        | 91          | 75       | medium | yes
07/12/06 | sunny         | 82          | 27       | strong | no
08/30/06 | rainy         | 76          | 80       | weak   | no
09/23/06 | drizzle       | 66          | 70       | weak   | yes
11/24/06 | sunny         | 52          | 60       | medium | no
12/19/06 | mostly sunny  | 41          | 30       | strong | no
01/12/07 | cloudy        | 36          | 40       | ?      | no
04/13/07 | mostly cloudy | 57          | 40       | weak   | yes
05/20/07 | mostly sunny  | 68          | 50       | medium | yes
06/28/07 | drizzle       | 73          | 20       | weak   | yes
07/06/07 | sunny         | 95          | 85       | weak   | yes
08/20/07 | rainy         | 91          | 60       | weak   | yes
09/01/07 | mostly sunny  | 80          | 10       | medium | no
10/23/07 | mostly cloudy | 52          | 44       | weak   | no
[mean - (k+1)*sd, mean - k*sd) for all integer values k, i.e., k = ..., -4, -3, -2, -1, 0, 1, 2, ...

Assume that the mean of the attribute HUMIDITY above is 48 and that its standard deviation sd is 22.5. Discretize HUMIDITY by hand using this new approach. Show your work.
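To check a hand computation, the bin index can be derived directly: mean - (k+1)*sd <= x < mean - k*sd is equivalent to -(k+1) <= (x - mean)/sd < -k, so floor((x - mean)/sd) = -(k+1). The sketch below assumes sd > 0 and uses the mean (48) and sd (22.5) given in the exercise; the function names sd_bin and bin_interval are hypothetical helpers, not part of Weka or any library.

```python
import math

def sd_bin(x: float, mean: float, sd: float) -> int:
    """Return the integer k with mean - (k+1)*sd <= x < mean - k*sd (assumes sd > 0)."""
    # (x - mean) / sd lies in [-(k+1), -k), hence floor((x - mean) / sd) == -(k+1)
    return -math.floor((x - mean) / sd) - 1

def bin_interval(k: int, mean: float, sd: float) -> tuple[float, float]:
    """Half-open interval [lower, upper) covered by bin k."""
    return (mean - (k + 1) * sd, mean - k * sd)

MEAN, SD = 48.0, 22.5  # values given in the exercise

# HUMIDITY column of the weather table, in row order
humidity = [25, 57, 75, 27, 80, 70, 60, 30, 40, 40, 50, 20, 85, 60, 10, 44]

for h in humidity:
    k = sd_bin(h, MEAN, SD)
    lower, upper = bin_interval(k, MEAN, SD)
    print(f"{h:>3} -> k = {k:>2}, interval [{lower}, {upper})")
```

For example, HUMIDITY = 25 gives k = 1, i.e., the interval [3.0, 25.5). A value equal to the mean (48) lands in bin k = -1, [48.0, 70.5), because each interval is closed on the left and open on the right.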
See the notes on using Matlab and Excel to calculate these matrices. Try to construct a visualization of each matrix (e.g., a heatmap) to make it easier to interpret.
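As an alternative to Matlab or Excel, the sketch below computes a covariance matrix and a Pearson correlation matrix in plain Python. It assumes that "these matrices" are the covariance and correlation matrices of the continuous attributes, and, since the data for the attributes listed below is not included here, it uses the TEMPERATURE and HUMIDITY columns of the weather table as a small stand-in. The helper names are hypothetical.

```python
import math

def covariance_matrix(columns: list[list[float]]) -> list[list[float]]:
    """Sample covariance matrix (divides by n - 1) of equal-length columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    return [
        [sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
         for cb, mb in zip(columns, means)]
        for ca, ma in zip(columns, means)
    ]

def correlation_matrix(cov: list[list[float]]) -> list[list[float]]:
    """Pearson correlation: cov[i][j] / (sd_i * sd_j)."""
    sds = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sds[i] * sds[j]) for j in range(len(cov))]
            for i in range(len(cov))]

# Stand-in data: two numeric columns from the weather table
temperature = [47, 66, 91, 82, 76, 66, 52, 41, 36, 57, 68, 73, 95, 91, 80, 52]
humidity = [25, 57, 75, 27, 80, 70, 60, 30, 40, 40, 50, 20, 85, 60, 10, 44]

cov = covariance_matrix([temperature, humidity])
corr = correlation_matrix(cov)
for name, row in zip(["TEMPERATURE", "HUMIDITY"], corr):
    print(f"{name:<12}", " ".join(f"{v:6.3f}" for v in row))
```

Matlab's cov also divides by n - 1 by default; the correlation matrix does not depend on that choice, since the factor cancels. For a heatmap, matplotlib's imshow (with colorbar) is one option.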
-- population
-- householdsize
-- racepctblack
-- racePctWhite
-- racePctAsian
-- racePctHisp
-- agePct12t21
-- agePct12t29
-- agePct16t24
-- agePct65up
-- numbUrban
-- pctUrban
-- medIncome
-- pctWWage
-- pctWFarmSelf
-- pctWInvInc
-- pctWSocSec
-- pctWPubAsst
-- pctWRetire
-- medFamInc
-- perCapInc

(5 points) If you had to remove 4 of the continuous attributes above from the dataset based on these two matrices, which attributes would you remove and why? Explain your answer.
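One common heuristic (not necessarily the intended answer) is to treat strongly correlated attributes as redundant: repeatedly find the remaining pair with the largest absolute correlation and drop one of its two members. The sketch below assumes a correlation matrix corr whose rows and columns follow the order of names; the function drop_correlated and the tie-breaking rule (drop the member with the larger total |r| to the other remaining attributes) are hypothetical choices made for illustration.

```python
def drop_correlated(names: list[str], corr: list[list[float]], n_drop: int) -> list[str]:
    """Greedily drop n_drop attributes, each time removing one member of the
    remaining pair with the largest absolute correlation."""
    if not 0 <= n_drop < len(names):
        raise ValueError("n_drop must be between 0 and len(names) - 1")
    remaining = list(range(len(names)))
    dropped = []

    def redundancy(k: int) -> float:
        # total |r| between attribute k and every other remaining attribute
        return sum(abs(corr[k][m]) for m in remaining if m != k)

    for _ in range(n_drop):
        pairs = [(a, b) for idx, a in enumerate(remaining) for b in remaining[idx + 1:]]
        i, j = max(pairs, key=lambda p: abs(corr[p[0]][p[1]]))
        # drop whichever member of the pair is more redundant overall
        victim = i if redundancy(i) >= redundancy(j) else j
        remaining.remove(victim)
        dropped.append(names[victim])
    return dropped

if __name__ == "__main__":
    # Hypothetical toy matrix for illustration only, not computed from the dataset
    toy = [[1.0, 0.9, 0.1],
           [0.9, 1.0, 0.5],
           [0.1, 0.5, 1.0]]
    print(drop_correlated(["A", "B", "C"], toy, 1))
```

For the question above, names would be the listed attributes and n_drop would be 4. Correlation alone ignores how useful each attribute is for the target, so the choice still needs the written justification the question asks for.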
MODEL | YEAR | COLOR | SALES
Chevy | 1990 | red   | 5
Chevy | 1990 | white | 87
Chevy | 1990 | blue  | 62
Chevy | 1991 | red   | 54
Chevy | 1991 | white | 95
Chevy | 1991 | blue  | 49
Chevy | 1992 | red   | 31
Chevy | 1992 | white | 54
Chevy | 1992 | blue  | 71
Ford  | 1990 | red   | 64
Ford  | 1990 | white | 62
Ford  | 1990 | blue  | 63
Ford  | 1991 | red   | 52
Ford  | 1991 | white | 9
Ford  | 1991 | blue  | 55
Ford  | 1992 | red   | 27
Ford  | 1992 | white | 62
Ford  | 1992 | blue  | 39