Readings:
Written Report: Your written report should consist of your answers to each of the parts in the assignment below.
Assignment:
You can find the Weka source code in a file called "weka-src.jar", which should be located in the directory where Weka was installed. This file is a zip archive, so you need to unzip it to extract its contents. Inside, you will find the .java files that implement Weka.
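If you would rather inspect the archive before unzipping it, the following is a minimal sketch that lists the .java entries using only the standard java.util.zip classes. The class name WekaSourceLister and the default path "weka-src.jar" in the current directory are assumptions made for illustration; pass the actual location of the file as the first argument.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, not part of Weka.
class WekaSourceLister {

    // True for archive entry names that end in ".java".
    static boolean isJavaSource(String entryName) {
        return entryName.endsWith(".java");
    }

    // Returns the names of all .java entries in the given zip/jar archive.
    static List<String> listJavaSources(String archivePath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archivePath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && isJavaSource(entry.getName())) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: archive path given as first argument, else the current directory.
        String path = args.length > 0 ? args[0] : "weka-src.jar";
        if (!new File(path).isFile()) {
            System.out.println("Archive not found: " + path);
            return;
        }
        List<String> sources = listJavaSources(path);
        System.out.println(sources.size() + " .java files found in " + path);
        // Print only the first few names to keep the output short.
        sources.stream().limit(10).forEach(System.out::println);
    }
}
```

Any standard unzip tool works just as well for the extraction itself; the sketch only shows that the .jar file can be read like any other zip archive.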
Read the "Explorer Guide" and the "Experimenter Tutorial" provided with the Weka system. Browse through the "Package Documentation" to become familiar with it.
When needed, use the following command to increase the maximum amount of main memory (the Java heap) available to Weka. Here, I'm setting the limit to 768 MB (768m), but you can specify a different size instead of 768 if more memory is needed and available:
java -Xmx768m -jar weka.jar
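To confirm that the -Xmx option took effect, a small check like the one below can be run with the same flag. This is only a sketch: the class name HeapCheck is hypothetical, and the value the JVM reports may differ slightly from the number given on the command line.

```java
// Hypothetical helper class, not part of Weka.
class HeapCheck {

    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() is the most memory the JVM will attempt to use for the heap.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap available to this JVM: " + toMiB(maxBytes) + " MiB");
    }
}
```

For example, compile it and run "java -Xmx768m HeapCheck", then compare the result with a run that omits the flag.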
The following 2 files contain the dataset:
Let's use the nominal AYPProceedingLevel2012 attribute as the classification target. This target attribute has the following possible values (= classes): MadeAYP, SchoolImprovement, CorrectiveAction, MakingProgress, and Warning.
SchoolType, AYPProceedingLevel2004, and AYPProceedingLevel2012
PctAdvancedMath, PctAdvancedReading, PctAdvancedScience, and PctAdvancedWriting
Use Weka's unsupervised ReplaceMissingValues filter to fill in the missing values in the attribute PctAdvancedMath.
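For numeric attributes, ReplaceMissingValues substitutes the attribute's mean for each missing value (nominal attributes get the mode). The sketch below illustrates that idea on a plain array and does not use the Weka API. The class name MeanImputation, the use of NaN to mark a missing value, and the sample numbers are all assumptions made for illustration; they are not taken from the dataset.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of Weka.
class MeanImputation {

    // Returns a copy of values in which every NaN (used here to mark a
    // missing value) is replaced by the mean of the non-missing values.
    static double[] replaceMissingWithMean(double[] values) {
        double sum = 0.0;
        int count = 0;
        for (double v : values) {
            if (!Double.isNaN(v)) {
                sum += v;
                count++;
            }
        }
        double[] result = values.clone();
        if (count == 0) {
            return result; // no observed values, so there is no mean to fill in
        }
        double mean = sum / count;
        for (int i = 0; i < result.length; i++) {
            if (Double.isNaN(result[i])) {
                result[i] = mean;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up values standing in for a column such as PctAdvancedMath.
        double[] column = {10.0, Double.NaN, 20.0, 30.0, Double.NaN};
        System.out.println(Arrays.toString(replaceMissingWithMean(column)));
    }
}
```

Comparing the attribute's mean before and after applying the filter in Weka is a quick way to check that the filled-in values behave as expected.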
Apply Principal Components Analysis to reduce the dimensionality of the input dataset. For this, use Weka's PrincipalComponents option from the "Select attributes" tab. Use parameter values: centerData=True, varianceCovered=0.95.
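To see what varianceCovered=0.95 means, the sketch below counts how many leading principal components are needed for their eigenvalues to account for a given proportion of the total variance. It illustrates the idea only. Weka's exact stopping rule may differ in detail, and the class name VarianceCovered and the eigenvalues are made up for illustration.

```java
// Hypothetical helper class, not part of Weka.
class VarianceCovered {

    // Given eigenvalues sorted in descending order, returns the smallest number
    // of leading components whose cumulative share of the total variance
    // reaches the requested proportion.
    static int componentsToKeep(double[] eigenvalues, double covered) {
        double total = 0.0;
        for (double e : eigenvalues) {
            total += e;
        }
        double cumulative = 0.0;
        for (int i = 0; i < eigenvalues.length; i++) {
            cumulative += eigenvalues[i];
            if (cumulative / total >= covered) {
                return i + 1;
            }
        }
        return eigenvalues.length; // the proportion is only reached with all components
    }

    public static void main(String[] args) {
        // Made-up eigenvalues; their cumulative shares are 0.40, 0.70, 0.90, 0.97, 1.00.
        double[] eigenvalues = {4.0, 3.0, 2.0, 0.7, 0.3};
        System.out.println("Components kept: " + componentsToKeep(eigenvalues, 0.95));
    }
}
```

With these made-up eigenvalues, three components cover only 90% of the variance, so a fourth is needed to reach 95%.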
Apply Correlation-based Feature Selection (see Witten and Frank's textbook slides, Chapter 7, Slides 5-6) to the input dataset. For this, use Weka's CfsSubsetEval, available under the "Select attributes" tab, with default parameters.
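Correlation-based Feature Selection scores a subset of k attributes with a merit heuristic: k times the average attribute-class correlation, divided by the square root of k + k(k-1) times the average attribute-attribute correlation. The sketch below computes only that score. It does not show how Weka measures the correlations or searches over subsets. The class name CfsMerit and the sample correlation values are assumptions made for illustration.

```java
// Hypothetical helper class, not part of Weka.
class CfsMerit {

    // Merit of a subset of k attributes, given the average attribute-class
    // correlation and the average attribute-attribute correlation.
    static double merit(int k, double avgAttrClassCorr, double avgAttrAttrCorr) {
        return k * avgAttrClassCorr
                / Math.sqrt(k + k * (k - 1) * avgAttrAttrCorr);
    }

    public static void main(String[] args) {
        // Same relevance to the class, different redundancy among the attributes.
        System.out.println("Low redundancy:  " + merit(4, 0.5, 0.2));
        System.out.println("High redundancy: " + merit(4, 0.5, 0.8));
    }
}
```

The two calls differ only in how strongly the attributes correlate with each other. The more redundant subset gets the lower merit, which is why the method tends to keep attributes that predict the class well but do not duplicate one another.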
Hand in a hardcopy of your written report at the beginning of class on the day the project is due. We will discuss the results from the project during class, so be prepared to give an oral presentation.