DUE DATE: Friday, Sept. 20, 2013 Slides (by email) by 11 am and Written Report (hardcopy) at the beginning of class (1:00 pm)
** This is an individual problem set **
PROBLEM SET DESCRIPTION
The purpose of this project is to
gain experience with Bayesian modeling.
PROBLEM SET ASSIGNMENT
Your written report should consist of your answers to each of the
parts in the assignment below.
Construct Naive Bayes models of the dataset. Click on "More options ..." to select "Output predictions" (choose, say, plain text) and to choose a value for the random seed (initially use value = 1).
Repeat the experiment 3 times with seeds 1, 23, 62.
For each of the 3 experiments, record in your report the conditional probability values output by Weka under "Naive Bayes Classifier", the accuracy
(= % of Correctly Classified Instances) and the confusion matrix obtained.
Report any interesting observations about the results of each experiment and across the 3 experiments.
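The relationship between the confusion matrix and the reported accuracy can be checked by hand. The following is a minimal Python sketch, assuming the matrix is laid out with rows as actual classes and columns as predicted classes; the function name and the example counts are hypothetical and are not taken from the dataset.

```python
def accuracy_from_confusion(matrix):
    """Return the percentage of correctly classified instances.

    matrix[i][j] is assumed to hold the number of instances whose
    actual class is i and whose predicted class is j
    (rows = actual, columns = predicted).
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix contains no instances")
    # Correct predictions lie on the main diagonal.
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


# Hypothetical counts for a two-class problem, for illustration only.
example = [[40, 10],
           [5, 45]]
example_accuracy = accuracy_from_confusion(example)
```

Comparing this value against the accuracy printed for each seed is a quick sanity check that the matrix was copied into the report correctly.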
Construct Bayesian Nets over this dataset. Use K2 as the algorithm to construct the topology of the Bayesian net.
Run at least 6 experiments thoughtfully
varying the values of "initAsNaiveBayes", "maxNrOfParents", and "randomOrder" parameters of the K2 algorithm.
For each experiment, include in your report:
The dataset for this project is the same
dataset that we used for Problem Sets 1 and 2.
Apply the same pre-processing to this dataset that we did in Problem Sets 1 and 2.
The only dataset attributes that you should define as "numeric" in your .arff file are age, size, t.tdm, t.rfs, t.os, t.dmfs, NPI, and AOL_os_10y.
All other attributes are nominal, and should be defined as such in your .arff file.
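As an illustration of the numeric/nominal distinction, here is a small Python sketch that produces ARFF attribute declarations. The numeric attribute names are the ones listed above; the helper name `arff_attribute`, the nominal attribute `example_attr`, and its values are hypothetical placeholders, since the actual nominal values depend on your pre-processed data.

```python
# Attributes that the assignment says must be declared numeric.
NUMERIC_ATTRIBUTES = ["age", "size", "t.tdm", "t.rfs", "t.os",
                      "t.dmfs", "NPI", "AOL_os_10y"]


def arff_attribute(name, nominal_values=None):
    """Return one ARFF @attribute line for the given attribute name."""
    if name in NUMERIC_ATTRIBUTES:
        return "@attribute " + name + " numeric"
    if not nominal_values:
        raise ValueError("nominal attribute needs its list of values")
    # Nominal attributes enumerate their allowed values in braces.
    return "@attribute " + name + " {" + ",".join(nominal_values) + "}"


numeric_line = arff_attribute("age")
# 'example_attr' and its values are made up for illustration.
nominal_line = arff_attribute("example_attr", ["yes", "no"])
```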
- Bayesian Models Materials
Study in detail the
Bayesian models materials posted on the course webpage.
- Bayesian Models Experiments
We will use Weka's Naive Bayes and Bayesian Net classifiers to construct models for this dataset. Assume that the classification target is "veridex_risk".
During model construction, use 10-fold cross-validation.
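To see why the random seed can change the reported accuracy, it helps to look at how a seed determines the folds. The sketch below is a simplified Python illustration, not Weka's procedure: it assumes a plain shuffle followed by a round-robin split, so it will not reproduce Weka's exact folds. The function name `k_fold_indices` is hypothetical.

```python
import random


def k_fold_indices(n_instances, k=10, seed=1):
    """Split instance indices into k (train, test) pairs.

    The seed controls the shuffle, so different seeds give
    different folds and therefore slightly different accuracies.
    """
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds, round-robin.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        splits.append((train, test))
    return splits


# Each instance appears in exactly one test fold.
splits = k_fold_indices(25, k=10, seed=1)
```

Each model is trained on nine folds and evaluated on the remaining one, so every instance is used for testing exactly once per run.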
- (20 points)
Compare the results obtained with Bayesian Nets and with Naive Bayes.
Slides, oral presentation, and class participation during class presentations.
- the obtained Bayesian net (right-click on the experiment in the left-hand side window, and select "Visualize graph"),
- the accuracy (= % of Correctly Classified Instances) and the confusion matrix,
- any interesting observations on the topology of the network.
Analyze the biological meaning of this topology.
- any interesting observations on the Conditional Probability Tables (CPTs)
of the network (click on the nodes in the graph visualization).
Analyze the biological meaning of these CPTs.
- report any additional interesting observations about the results of each experiment and/or across all of the experiments.
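One way to plan the K2 experiments is to list the candidate parameter combinations first and then pick the ones worth running. The Python sketch below enumerates combinations of the three parameters named in the assignment; the function name and the candidate values for "maxNrOfParents" (1, 2, 3) are assumptions chosen for illustration, not values prescribed by the assignment.

```python
from itertools import product


def k2_settings(init_as_naive_bayes=(True, False),
                max_parents=(1, 2, 3),
                random_order=(False, True)):
    """Enumerate every combination of the three K2 parameters."""
    return [
        {"initAsNaiveBayes": init,
         "maxNrOfParents": parents,
         "randomOrder": rand}
        for init, parents, rand in product(init_as_naive_bayes,
                                           max_parents,
                                           random_order)
    ]


settings = k2_settings()
for number, setting in enumerate(settings, start=1):
    print(number, setting)
```

The full grid here is larger than the required minimum of 6 experiments, so a subset can be chosen that changes one parameter at a time, which makes it easier to attribute differences in topology or accuracy to a specific parameter.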
REPORTS AND DUE DATE
We will discuss the results from the problem set during class, so you should prepare a few slides summarizing your findings and including any visualizations or graphs you want to share with the rest of the class. Be prepared to give an oral presentation.
Submit the following file with your slides for your oral report by email to
me before the deadline:
where [ext] is pdf, ppt, or pptx. Please use only lowercase letters in the
file name. For instance, the file with my slides for this problem set would be
- Written Report
Hand in a hardcopy of your written report at the beginning of class the
day the problem set is due.