Regarding Weka:
java -Xmx768m -jar weka.jar
Regarding Python:
Attributes SEX and CLASS are discrete; attributes AGE, EDUCATION_NUM and HOURS_PER_WEEK are continuous.

SEX     AGE  EDUCATION_NUM  HOURS_PER_WEEK  CLASS
Male    27   9              40              <=50K
Female  28   13             40              <=50K
Male    29   10             50              <=50K
Male    30   9              40              <=50K
Male    35   11             40              <=50K
Female  36   9              40              <=50K
Female  37   14             40              <=50K
Male    38   9              ?               <=50K
Male    40   16             60              >50K
Female  44   14             40              <=50K
Male    45   14             40              >50K
Female  47   14             50              <=50K
Male    48   9              46              <=50K
Male    49   11             40              >50K
Male    49   9              40              >50K
Female  49   9              40              <=50K
Male    50   13             55              >50K
Male    52   9              45              >50K
Male    52   13             40              <=50K
Male    54   10             60              >50K
[mean - (k+1)*sd, mean - k*sd) for all integer values k, i.e., k = ..., -4, -3, -2, -1, 0, 1, 2, ...
Assume that the mean of the attribute AGE above is 42 and that the standard deviation sd of this attribute is 8. Discretize AGE by hand using this new approach. Show your work.
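The interval rule above can be checked with a short Python sketch. It is meant only as a way to verify hand calculations, not as a replacement for showing the work. The function names interval_index and interval_bounds are illustrative and not part of the assignment. The sketch uses the fact that a value x lies in [mean - (k+1)*sd, mean - k*sd) exactly when k = -(floor((x - mean)/sd) + 1).

```python
import math

def interval_index(x, mean, sd):
    """Return the integer k such that mean - (k+1)*sd <= x < mean - k*sd."""
    j = math.floor((x - mean) / sd)  # x lies in [mean + j*sd, mean + (j+1)*sd)
    return -(j + 1)

def interval_bounds(k, mean, sd):
    """Return (low, high) for the half-open interval [low, high) with index k."""
    return (mean - (k + 1) * sd, mean - k * sd)

MEAN, SD = 42, 8  # values given in the problem statement

for age in (27, 42, 54):  # three AGE values taken from the table
    k = interval_index(age, MEAN, SD)
    low, high = interval_bounds(k, MEAN, SD)
    print(f"AGE={age}: k={k}, interval=[{low}, {high})")
# AGE=27: k=1, interval=[26, 34)
# AGE=42: k=-1, interval=[42, 50)
# AGE=54: k=-2, interval=[50, 58)
```

Note that the intervals are closed on the left, so a value equal to the mean falls in the interval that starts at the mean (k = -1), not in the one that ends there.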
Load this dataset into Weka by opening your arff dataset from the "Explorer" window in Weka. Load it into Python as well.
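For the Python side, one possible way to read a simple arff file using only the standard library is sketched below. The read_arff function is a hypothetical helper, not a library API, and it assumes a dense file with unquoted values and no commas inside values. The relation name census_sample and the four sample rows (taken from the table earlier on this page) are only there to make the sketch self-contained. An established alternative is scipy.io.arff.loadarff, which returns the data together with its metadata.

```python
import io

def read_arff(lines):
    """Parse a simple dense ARFF file into (attribute_names, rows).

    Simplified sketch: ignores quoting, sparse rows and date/string types.
    Missing values written as '?' become None.
    """
    names, types, rows = [], [], []
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if not in_data:
            if lower.startswith("@attribute"):
                _, name, spec = line.split(None, 2)
                names.append(name)
                types.append("nominal" if spec.startswith("{") else spec.lower())
            elif lower.startswith("@data"):
                in_data = True
            continue
        values = [v.strip() for v in line.split(",")]
        row = {}
        for name, kind, value in zip(names, types, values):
            if value == "?":
                row[name] = None
            elif kind in ("numeric", "real", "integer"):
                row[name] = float(value)
            else:
                row[name] = value
        rows.append(row)
    return names, rows

# Illustrative file contents; replace with open("your_file.arff") in practice.
sample = io.StringIO("""\
@relation census_sample
@attribute SEX {Male,Female}
@attribute AGE numeric
@attribute EDUCATION_NUM numeric
@attribute HOURS_PER_WEEK numeric
@attribute CLASS {<=50K,>50K}
@data
Male,27,9,40,<=50K
Female,28,13,40,<=50K
Male,38,9,?,<=50K
Male,40,16,60,>50K
""")

names, rows = read_arff(sample)
print(names)
print(rows[2])
```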
Construct a visualization of each of these matrices (e.g., heatmap) using Python to more easily understand them.
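A heatmap of a square matrix could be drawn with matplotlib roughly as follows. This is a minimal sketch: the function name plot_matrix_heatmap, the attribute labels A, B, C and the matrix values are placeholders made up for illustration, and should be replaced with the matrices computed from the actual dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt

def plot_matrix_heatmap(matrix, labels, title):
    """Draw a square matrix as an annotated heatmap and return (fig, ax)."""
    fig, ax = plt.subplots()
    im = ax.imshow(matrix, cmap="coolwarm")
    ax.set_xticks(range(len(labels)))
    ax.set_yticks(range(len(labels)))
    ax.set_xticklabels(labels, rotation=45, ha="right")
    ax.set_yticklabels(labels)
    # Write each cell's value on top of its colored square.
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            ax.text(j, i, f"{value:.2f}", ha="center", va="center")
    fig.colorbar(im, ax=ax)
    ax.set_title(title)
    fig.tight_layout()
    return fig, ax

# Placeholder labels and values, for illustration only.
labels = ["A", "B", "C"]
example_matrix = [
    [1.0, 0.8, -0.3],
    [0.8, 1.0, 0.1],
    [-0.3, 0.1, 1.0],
]
fig, ax = plot_matrix_heatmap(example_matrix, labels, "Example matrix")
```

To keep the picture, call fig.savefig with a file name of your choice, or drop the Agg line and call plt.show() in an interactive session.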
See Section 2.4.5 of Tan, Steinbach, Karpatne and Kumar's textbook for the definitions and formulas for correlation and covariance.
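As a rough cross-check of hand or Weka results, covariance and correlation can be computed directly in plain Python. The sketch below assumes the usual sample definitions, with n - 1 in the denominator of the covariance and correlation defined as the covariance divided by the product of the standard deviations; confirm against the textbook section before relying on it. The columns x, y and z are made-up toy data, and pairwise_matrix is an illustrative helper name.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """Sample covariance: sum((x_i - mean_x)(y_i - mean_y)) / (n - 1)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation: covariance divided by the product of std devs."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

def pairwise_matrix(columns, func):
    """Apply func to every pair of columns; returns a list of rows."""
    names = list(columns)
    return [[func(columns[a], columns[b]) for b in names] for a in names]

# Toy columns, for illustration only.
columns = {
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],  # y = 2x, so corr(x, y) = 1
    "z": [4, 3, 2, 1],  # z decreases as x increases, so corr(x, z) = -1
}
cov_matrix = pairwise_matrix(columns, covariance)
corr_matrix = pairwise_matrix(columns, correlation)
for name, row in zip(columns, corr_matrix):
    print(name, [round(v, 3) for v in row])
```

Records with missing values need to be handled (for example, dropped or imputed) before calling these functions, since the arithmetic above assumes every entry is a number.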
You must perform each of the parts of this problem both in Weka and separately in Python.
For this part, USE ONLY THE DISCRETE attributes in the dataset. Use the CLASS attribute as the target classification attribute. Apply Correlation Based Feature Selection (CFS) (see Witten and Frank's textbook slides - Chapter 7 Slides 5-6). For this, use Weka's CfsSubsetEval, available under the Select attributes tab, with default parameters. Separately, use Python for the same purpose. Look at the results to determine which attributes were selected by this method, and elaborate on any interesting observations you can make about the results.
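For the Python side, the idea behind CFS on discrete attributes could be sketched as below. This is not Weka's implementation. It assumes the standard CFS merit, k * r_cf / sqrt(k + k*(k-1) * r_ff), with symmetric uncertainty as the correlation measure between discrete attributes, and it searches all subsets exhaustively instead of using a heuristic search, which is only practical for a handful of attributes. The attribute names f1 and f2 and the toy values are made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def entropy(*columns):
    """Joint entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in Counter(rows).values())

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:  # both columns constant
        return 0.0
    return 2.0 * (hx + hy - entropy(x, y)) / (hx + hy)

def cfs_merit(features, target, subset):
    """CFS merit of a subset: high class correlation, low redundancy."""
    k = len(subset)
    r_cf = sum(symmetric_uncertainty(features[f], target) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = 0.0
    if pairs:
        r_ff = sum(symmetric_uncertainty(features[a], features[b])
                   for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def best_subset(features, target):
    """Exhaustive search over all non-empty subsets (small inputs only)."""
    names = list(features)
    candidates = (s for r in range(1, len(names) + 1)
                  for s in combinations(names, r))
    return max(candidates, key=lambda s: cfs_merit(features, target, s))

# Toy data: f1 determines the class, f2 is unrelated to it.
features = {"f1": [0, 0, 1, 1], "f2": [0, 1, 0, 1]}
target = ["no", "no", "yes", "yes"]
selected = best_subset(features, target)
print(selected, cfs_merit(features, target, selected))
```

Because the search strategy and some details differ from Weka's defaults, the subset selected here may not match Weka's output on real data; any such difference is itself worth discussing in the observations.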
Slides Submission: Please submit a PowerPoint or a PDF file containing your presentation slides via Canvas (submission name: Project1) by the deadline stated at the top of this webpage. Only one of the team members needs to submit the slides.