Miscellaneous Notes on Pre-processing:

--------------------------------------------------------------------------------
For Weka:

Use:

    C:\Program Files\Weka-3-7>java -Xmx768m -jar weka.jar

to raise the maximum amount of main memory (Java heap) available to Weka
to 768 MB. You can specify a larger value as needed/available.

--------------------------------------------------------------------------------
For Matlab:

Useful Functions:

Here I used the iris.csv dataset as an example, but you can use any other
dataset. Note that load expects a purely numeric file, so remove any header
row or class-label column first.

    c = load('iris.csv');
    cc = corr(c);
    cc
    ccv = cov(c);
    ccv
    heatmap(cc)

 - I downloaded heatmap from Matlab Central. More specifically,
   http://www.mathworks.com/matlabcentral/fileexchange/24253-customizable-heat-maps
 - Once you get the figure, click on "Figure Properties" under the "Edit"
   menu of the figure window.

----
For Correlation / Covariance matrices, I recommend the following sources:

Tan, Steinbach, and Kumar's textbook slides - Chapter 2. Slides 62-63.
    http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap2_data.pdf
    http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap2_data.ppt

Tan, Steinbach, and Kumar's textbook slides - Chapter 3. Slide 25.
    http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap3_data_exploration.pdf
    http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap3_data_exploration.ppt

--------------------------------------------------------------------------------
For Excel:

To install the Analysis ToolPak in Excel:
    Under "File"
    Go to "Options"
    Go to "Add-Ins"
    Select "Analysis ToolPak"
    Hit OK

To use it:
    Under the "Data" tab,
    Go to "Data Analysis" (at the top right of the ribbon)
    Select "Correlation" and follow the steps

--------------------------------------------------------------------------------
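Cross-check for the correlation matrix (Matlab):

Whichever tool you use above (corr in Matlab or "Correlation" in Excel), it can
help to see how the correlation matrix relates to the covariance matrix. The
sketch below is only an illustration, not part of the original workflow: it
uses a small made-up matrix X (hypothetical values) instead of iris.csv, and it
compares against corrcoef, which is in base Matlab, since corr requires the
Statistics toolbox.

```matlab
% Toy data (hypothetical values, for illustration only):
% 4 observations (rows) of 3 numeric attributes (columns).
X = [2 1 5;
     4 3 4;
     6 2 2;
     8 6 1];

ccv = cov(X);             % covariance matrix; each column is one variable
s   = sqrt(diag(ccv));    % standard deviation of each column

% Correlation(i,j) = Covariance(i,j) / (std(i) * std(j)).
cc_manual = ccv ./ (s * s');

% Reference result from base Matlab. corr(X) should agree with it
% when the Statistics toolbox is installed.
cc_ref = corrcoef(X);

% Largest absolute difference between the two matrices;
% this is expected to be at round-off level.
max_diff = max(abs(cc_manual(:) - cc_ref(:)));
disp(max_diff)
```

To try this on your own data, replace X with the matrix returned by load, as in
the Matlab notes above. A column with zero variance produces NaN entries,
because the division by its standard deviation is undefined.

--------------------------------------------------------------------------------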