PROBLEM SET ASSIGNMENT
Written Report:
Your written report should consist of your answers to each of the
parts in the assignment below.
Assignment:
- Dataset.
The dataset for this project is the same GSE7390_transbig2006affy_demo.txt dataset that we used for Problem Sets 1 and 2.
Apply to this dataset the same pre-processing that we applied in Problem Sets 1 and 2.
The only dataset attributes that you should define as "numeric" in your .arff file are age, size, t.tdm, t.rfs, t.os, t.dmfs, NPI, and AOL_os_10y.
All other attributes are nominal and should be declared as such in your .arff file.
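As an illustration of the attribute typing rule above, here is a minimal Python sketch that emits ARFF @attribute declarations: the eight listed attributes are declared numeric, and every other column is declared nominal with its value set collected from the data. The helper name arff_header and the toy columns and rows are hypothetical. The sketch assumes the pre-processing from Problem Sets 1 and 2 has already been applied, that missing values are already written as ?, and that no nominal value contains spaces or commas (such values would need quoting).

```python
# Attributes the assignment says must be declared numeric; all others are nominal.
NUMERIC_ATTRS = {"age", "size", "t.tdm", "t.rfs", "t.os", "t.dmfs", "NPI", "AOL_os_10y"}


def arff_header(relation, columns, rows, missing="?"):
    """Build an ARFF header (hypothetical helper).

    columns: attribute names, in file order.
    rows: list of rows, each a list of string values aligned with columns.
    Assumes missing values are already written as `missing` and that no
    nominal value contains spaces or commas.
    """
    lines = [f"@relation {relation}", ""]
    for i, col in enumerate(columns):
        if col in NUMERIC_ATTRS:
            lines.append(f"@attribute {col} numeric")
        else:
            # Distinct observed values; '?' marks a missing value in ARFF
            # and must not be listed as a nominal value.
            values = sorted({row[i] for row in rows if row[i] != missing})
            lines.append(f"@attribute {col} {{{','.join(values)}}}")
    lines += ["", "@data"]
    return "\n".join(lines)


# Toy example with hypothetical columns and values.
columns = ["age", "er", "veridex_risk"]
rows = [["40", "1", "Good"], ["55", "0", "Poor"], ["61", "?", "Poor"]]
print(arff_header("transbig_demo", columns, rows))
```

Collecting nominal values from the data keeps the header consistent with the rows; if a value is expected but never observed, it would have to be added to the declaration by hand.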
- Bayesian Models Materials
Study in detail the Bayesian models materials posted on the course webpage.
- Bayesian Models Experiments
We will use Weka's Naive Bayes and Bayesian Net classifiers to construct models for this dataset. Assume that the classification target is "veridex_risk".
During model construction, use 10-fold cross-validation.
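To see why the random seed matters for 10-fold cross-validation, the sketch below shuffles the instances of each class with a seeded generator and deals them round-robin into 10 folds, so every fold keeps roughly the class proportions of the whole dataset. This is an illustrative stratified split written in Python, not Weka's own randomization code, so it will not reproduce Weka's exact folds. The function name stratified_folds and the label counts are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Split instance indices into k stratified folds (illustrative, not Weka's code).

    Instances of each class are shuffled with a generator seeded by `seed`,
    then dealt round-robin into the folds, so each fold gets (nearly) the
    same class proportions as the full dataset.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):  # fixed class order, so only the seed varies
        indices = by_class[label]
        rng.shuffle(indices)
        for idx in indices:
            folds[position % k].append(idx)
            position += 1
    return folds


# Hypothetical labels: 30 "Good" and 70 "Poor" instances.
labels = ["Good"] * 30 + ["Poor"] * 70
for seed in (1, 23, 62):
    folds = stratified_folds(labels, k=10, seed=seed)
    print(seed, sorted(folds[0]))  # test-fold membership changes with the seed
```

The same seed always yields the same folds, while different seeds place different instances in each test fold, so the pooled cross-validation predictions can differ across runs on identical data.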
- (20 points)
Construct Naive Bayes models of the dataset. Click on "More options ..." to select "Output predictions" (choose, say, plain text) and to set the Random seed (initially use value = 1).
Run the experiment 3 times, with seeds 1, 23, and 62.
For each of the 3 experiments, record in your report the conditional probability values output by Weka under "Naive Bayes Classifier", the accuracy (= % of Correctly Classified Instances), and the confusion matrix obtained.
Report any interesting observations about the results of each experiment and across the 3 experiments.
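The conditional probabilities in a Naive Bayes model for a nominal attribute are frequency estimates P(value | class). The sketch below computes them for one attribute, assuming add-one (Laplace) smoothing of both the attribute-value counts and the class counts; we believe this matches Weka's NaiveBayes for nominal attributes, but check it against your own output. Depending on the Weka version, the "Naive Bayes Classifier" section may list smoothed counts with a [total] row rather than probabilities; in that case, dividing each count by its column's [total] gives the probability. Numeric attributes such as age are modeled differently (Weka reports a per-class mean and standard deviation) and are not covered here. The function nb_tables and the toy rows are hypothetical.

```python
from collections import Counter


def nb_tables(rows, attr, target):
    """Laplace-smoothed Naive Bayes estimates for one nominal attribute.

    Returns (priors, cond) where priors[c] = P(class = c) and
    cond[(v, c)] = P(attr = v | class = c). Every count starts at 1
    (add-one smoothing), so unseen value/class pairs get nonzero probability.
    Assumes no missing values and that every declared value appears in rows
    (with an .arff file, the declared values would be used instead).
    """
    classes = sorted({r[target] for r in rows})
    values = sorted({r[attr] for r in rows})
    class_counts = Counter(r[target] for r in rows)
    pair_counts = Counter((r[attr], r[target]) for r in rows)

    n = len(rows)
    priors = {c: (class_counts[c] + 1) / (n + len(classes)) for c in classes}
    cond = {}
    for c in classes:
        total = class_counts[c] + len(values)  # smoothed column total
        for v in values:
            cond[(v, c)] = (pair_counts[(v, c)] + 1) / total
    return priors, cond


# Hypothetical toy rows: 3 "Good" and 4 "Poor" instances.
rows = [
    {"er": "1", "veridex_risk": "Good"},
    {"er": "1", "veridex_risk": "Good"},
    {"er": "0", "veridex_risk": "Good"},
    {"er": "0", "veridex_risk": "Poor"},
    {"er": "0", "veridex_risk": "Poor"},
    {"er": "1", "veridex_risk": "Poor"},
    {"er": "0", "veridex_risk": "Poor"},
]
priors, cond = nb_tables(rows, "er", "veridex_risk")
# P(er=1 | Good) = (2 + 1) / (3 + 2) = 0.6
print(priors)
print(cond)
```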
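The accuracy Weka reports is the number of correctly classified instances divided by the total number of instances, times 100; equivalently, the sum of the confusion matrix diagonal over the sum of all its entries. Below is a minimal Python sketch, assuming rows are actual classes and columns are predicted classes (the layout of Weka's "<-- classified as" matrix). The function names and the toy predictions are hypothetical.

```python
def confusion_matrix(actual, predicted, classes):
    """Count instances per (actual, predicted) pair; rows = actual, columns = predicted."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy_pct(matrix):
    """Percentage of correctly classified instances: diagonal sum / total * 100."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


# Hypothetical predictions for five instances.
classes = ["Good", "Poor"]
actual = ["Good", "Good", "Poor", "Poor", "Poor"]
predicted = ["Good", "Poor", "Poor", "Poor", "Good"]
m = confusion_matrix(actual, predicted, classes)
print(m)                # [[1, 1], [1, 2]]
print(accuracy_pct(m))  # 60.0
```

Recomputing the accuracy from the recorded confusion matrix this way is a quick consistency check on the numbers copied into the report.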