Readings:
Written Report: Your written report should consist of your answers to each of the parts in the assignment below. Both members of the team are expected to be involved in and contribute to each and every problem on this project.
Assignment:
You can find the Weka source code in a file called "weka-src.jar", which should be located in the directory where Weka was installed. This "weka-src.jar" file is a zip archive, so you need to unzip it to extract its contents. Inside, you will find the .java files that implement Weka.
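If no unzip tool is at hand, the archive can also be unpacked with the Java standard library. The following is a minimal sketch, not part of Weka: the class name JarExtractSketch and the command-line paths are placeholders chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class JarExtractSketch {
    /** Extracts every entry of a zip/jar archive into targetDir and returns the number of files written. */
    static int extract(Path archive, Path targetDir) throws IOException {
        int count = 0;
        Path root = targetDir.toAbsolutePath().normalize();
        try (InputStream in = Files.newInputStream(archive);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries that would land outside the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies only the bytes of the current entry.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java JarExtractSketch <archive> <target-dir>");
            return;
        }
        int n = extract(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Extracted " + n + " files");
    }
}
```

For example, running it with the arguments weka-src.jar and weka-src would place the .java files under a weka-src directory, assuming the archive is in the current directory.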
Read the "Explorer Guide" and the "Experimenter Tutorial" provided with the Weka system. Browse through the "Package Documentation" to become familiar with it.
When needed, use the following command to increase the maximum amount of main memory (Java heap) available to Weka. The -Xmx768m option below sets this limit to 768 MB, but you can specify any other size instead of 768 if more memory is needed and available:
java -Xmx768m -jar weka.jar
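To confirm that a -Xmx setting took effect, you can ask the JVM for its heap limit. This is a small sketch using only the standard Runtime class; the class name HeapCheckSketch is made up for illustration, and the reported figure may come out somewhat below the requested value depending on the JVM.

```java
class HeapCheckSketch {
    /** Returns the JVM's maximum heap size in whole megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Run it with the same option you pass to Weka, for example java -Xmx768m HeapCheckSketch.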
In particular,
workclass, education, and sex
age, education-num, and capital-gain
Use Weka's unsupervised ReplaceMissingValues filter to fill in the missing values in the attribute occupation.
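ReplaceMissingValues fills missing nominal values with the attribute's mode and missing numeric values with its mean, and it does so for every attribute in the dataset. The sketch below illustrates only the mode-based idea for a single nominal column in plain Java; it does not use the Weka API. The class name ModeImputeSketch, the use of null to mark a missing value, and the sample values are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class ModeImputeSketch {
    /** Returns the most frequent non-missing value in the column; null marks a missing value. */
    static String mode(String[] column) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String value : column) {
            if (value == null) continue;               // skip missing values
            int c = counts.merge(value, 1, Integer::sum);
            if (c > bestCount) {                       // on ties, the value that reached the count first wins
                bestCount = c;
                best = value;
            }
        }
        return best;
    }

    /** Returns a copy of the column with every missing value replaced by the mode. */
    static String[] fillMissing(String[] column) {
        String m = mode(column);
        String[] out = column.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i] == null) out[i] = m;
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample values, not taken from the project dataset.
        String[] occupation = {"Sales", null, "Tech-support", "Sales", null};
        System.out.println(String.join(", ", fillMissing(occupation)));
    }
}
```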
Apply Principal Components Analysis to reduce the dimensionality of the input dataset. For this, use Weka's PrincipalComponents option from the "Select attributes" tab. Use parameter values: centerData=True, varianceCovered=0.95.
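The varianceCovered parameter controls how many principal components are kept: enough of them to account for the requested share of the total variance. The following plain-Java sketch shows that selection rule on a list of eigenvalues sorted in descending order. It does not call Weka, the class and method names are hypothetical, and Weka's exact stopping rule may differ in detail.

```java
class VarianceCoveredSketch {
    /**
     * Given eigenvalues sorted in descending order, returns how many leading
     * components are needed for their cumulative share of the total variance
     * to reach the target (e.g. 0.95).
     */
    static int componentsNeeded(double[] eigenvalues, double varianceCovered) {
        double total = 0.0;
        for (double ev : eigenvalues) total += ev;

        double cumulative = 0.0;
        for (int i = 0; i < eigenvalues.length; i++) {
            cumulative += eigenvalues[i];
            if (cumulative / total >= varianceCovered) return i + 1;
        }
        return eigenvalues.length;
    }

    public static void main(String[] args) {
        // Made-up eigenvalues whose cumulative shares are 0.60, 0.90, 0.98, 1.00.
        double[] eigenvalues = {6.0, 3.0, 0.8, 0.2};
        System.out.println(componentsNeeded(eigenvalues, 0.95));
    }
}
```

With these made-up eigenvalues, two components cover only 90% of the variance, so a target of 0.95 requires a third.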
Apply Correlation-based Feature Selection (see Witten and Frank's textbook slides, Chapter 7, Slides 5-6) to the input dataset. For this, use Weka's CfsSubsetEval, available under the "Select attributes" tab, with default parameters.
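Correlation-based feature selection scores a subset of k attributes by rewarding high average attribute-class correlation and penalizing high average correlation among the attributes themselves. The sketch below computes the merit heuristic from Hall's formulation of CFS, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), in plain Java. It is meant to illustrate the formula only; it does not reproduce how CfsSubsetEval estimates the correlations or searches the subset space, and the class and parameter names are made up.

```java
class CfsMeritSketch {
    /**
     * Merit of a feature subset under the CFS heuristic.
     *
     * @param k    number of features in the subset
     * @param rCf  average feature-class correlation
     * @param rFf  average feature-feature correlation
     */
    static double merit(int k, double rCf, double rFf) {
        return (k * rCf) / Math.sqrt(k + k * (k - 1) * rFf);
    }

    public static void main(String[] args) {
        // Same feature-class correlation, different redundancy among the features.
        System.out.println(merit(4, 0.5, 0.0)); // uncorrelated features
        System.out.println(merit(4, 0.5, 1.0)); // fully redundant features
    }
}
```

With the same average feature-class correlation, the subset of uncorrelated features scores higher than the fully redundant one, which is the behavior the heuristic is designed to produce.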
Hand in a hard copy of your written report at the beginning of class on the day the project is due. We will discuss the results from the project during class, so be prepared to give an oral presentation.