Worcester Polytechnic Institute (WPI)

Computer Science Department
------------------------------------------

CS 4445 Data Mining and Knowledge Discovery in Databases - B Term 2012 
Homework and Project 3: Bayesian Models

Prof. Carolina Ruiz and Ken Loomis 

DUE DATES: Friday, Nov. 16, 11:00 am (electronic submission) and 1:00 pm (hardcopy submission) 
------------------------------------------


HOMEWORK AND PROJECT OBJECTIVES

The purpose of this project is multi-fold:

HOMEWORK AND PROJECT ASSIGNMENTS

Readings: Read Section 5.3 of your textbook in great detail.

This project consists of two parts:

  1. Part I. INDIVIDUAL HOMEWORK ASSIGNMENT

    See solutions by Ken Loomis.

    Consider the following dataset.

    @relation movie-preferences 
    
    @attribute genre {comedy, drama, action}
    @attribute critics-reviews {thumbs-up, neutral, thumbs-down}
    @attribute rating {R, PG-13}
    @attribute IMAX {true, false}
    @attribute likes {yes, no}
    
    @data
    ( 1) comedy, thumbs-up,   R,     false, no
    ( 2) comedy, thumbs-up,   R,     true,  no
    ( 3) comedy, neutral,     R,     false, no
    ( 4) comedy, thumbs-down, PG-13, false, yes
    ( 5) comedy, neutral,     PG-13, true,  yes
    ( 6) drama,  thumbs-up,   R,     false, yes
    ( 7) drama,  thumbs-down, PG-13, true,  yes
    ( 8) drama,  neutral,     R,     true,  yes
    ( 9) drama,  thumbs-up,   PG-13, false, yes
    (10) action, neutral,     R,     false, yes
    (11) action, thumbs-down, PG-13, false, yes
    (12) action, thumbs-down, PG-13, true,  no
    (13) action, neutral,     PG-13, false, yes
    (14) action, neutral,     R,     true,  no
    
    where the likes attribute is the classification target.

    1. (30 points) Construct the full naive Bayes model for this dataset. Show all the steps of the calculations. Draw the resulting graph and the Conditional Probability Table (CPT) associated with each node in the graph.

    2. (5 points) Use your model above to classify the following data instance. Explain your work.
      genre = action, critics-reviews = ?, rating  = R, IMAX = ?
      

    3. (10 points) Consider the following Bayesian net over the dataset above. Construct the conditional probability table for the critics-reviews node. Show your work.

      [Figure: Bayesian net]
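
As a hedged illustration of the kind of calculation questions 1 and 2 ask for, the following Python sketch estimates the class prior and the per-attribute conditional probabilities by plain relative-frequency counting over the 14 instances above, then scores the query from question 2. It assumes no Laplace smoothing and handles a missing attribute value ("?") by leaving that attribute out of the product; both are assumptions and not necessarily the conventions expected in the course. The names `train` and `classify` are hypothetical. Question 3 is not covered, since the network structure is given only in the figure.

```python
from collections import Counter, defaultdict
from fractions import Fraction

ATTRS = ["genre", "critics-reviews", "rating", "IMAX"]

# The 14 instances from the dataset above; the last field is the target "likes".
ROWS = [
    ("comedy", "thumbs-up",   "R",     "false", "no"),
    ("comedy", "thumbs-up",   "R",     "true",  "no"),
    ("comedy", "neutral",     "R",     "false", "no"),
    ("comedy", "thumbs-down", "PG-13", "false", "yes"),
    ("comedy", "neutral",     "PG-13", "true",  "yes"),
    ("drama",  "thumbs-up",   "R",     "false", "yes"),
    ("drama",  "thumbs-down", "PG-13", "true",  "yes"),
    ("drama",  "neutral",     "R",     "true",  "yes"),
    ("drama",  "thumbs-up",   "PG-13", "false", "yes"),
    ("action", "neutral",     "R",     "false", "yes"),
    ("action", "thumbs-down", "PG-13", "false", "yes"),
    ("action", "thumbs-down", "PG-13", "true",  "no"),
    ("action", "neutral",     "PG-13", "false", "yes"),
    ("action", "neutral",     "R",     "true",  "no"),
]


def train(rows):
    """Estimate P(likes) and P(attribute = value | likes) by counting (no smoothing)."""
    class_counts = Counter(row[-1] for row in rows)
    cond_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    for row in rows:
        label = row[-1]
        for attr, value in zip(ATTRS, row):  # zip stops before the target field
            cond_counts[(attr, label)][value] += 1
    priors = {c: Fraction(n, len(rows)) for c, n in class_counts.items()}
    cpts = {
        (attr, label): {v: Fraction(n, class_counts[label]) for v, n in counts.items()}
        for (attr, label), counts in cond_counts.items()
    }
    return priors, cpts


def classify(priors, cpts, instance):
    """Score each class; attributes whose value is None (missing) are skipped."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for attr, value in instance.items():
            if value is None:
                continue  # assumption: a missing value is marginalized out
            score *= cpts[(attr, label)].get(value, Fraction(0))
        scores[label] = score
    total = sum(scores.values())  # assumed non-zero for the query used here
    posteriors = {c: s / total for c, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors


priors, cpts = train(ROWS)

# Query from question 2: genre = action, critics-reviews = ?, rating = R, IMAX = ?
QUERY = {"genre": "action", "critics-reviews": None, "rating": "R", "IMAX": None}
label, posteriors = classify(priors, cpts, QUERY)
print(label, posteriors)
```

Under these assumptions the query is classified as likes = no, with normalized posteriors of 8/13 for no and 5/13 for yes. Exact fractions are used so that each CPT entry can be checked against a hand calculation.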


  2. Part II. GROUP PROJECT ASSIGNMENT

  3. Dataset: We will work with the same dataset used in Project 1. The following 2 files contain the dataset:

     Important: For all experiments, perform missing value replacement for the target attribute: replace its missing values with a new nominal value called "Missing". Alternatively, use the dataset that you may have saved for Project 1, as suggested at the beginning of the moderate challenge.

  4. Challenges: In each of the following challenges, provide a detailed description of the preprocessing techniques used, the motivation for using these techniques, and any hypotheses or intuition gained about the information represented in the dataset. Answer the question posed and also provide the information described in the PROJECT GUIDELINES.
  5. Grading sheet for this project.
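
The missing value replacement required for the target attribute can be sketched as follows. This is a minimal illustration in plain Python, not the procedure prescribed by the course: the function name `replace_missing_target` and the toy rows are hypothetical, and the rows are not the Project 1 data. It assumes that missing values are marked with "?", as in ARFF files.

```python
MISSING_MARKER = "?"  # assumption: ARFF-style marker for a missing value


def replace_missing_target(rows, target_index, new_value="Missing"):
    """Return a copy of rows with missing target values replaced by new_value."""
    cleaned = []
    for row in rows:
        row = list(row)
        if row[target_index] == MISSING_MARKER:
            row[target_index] = new_value
        cleaned.append(tuple(row))
    return cleaned


# Hypothetical toy rows (NOT the Project 1 data); the last column is the target.
toy_rows = [
    ("a", "x", "yes"),
    ("b", "?", "?"),
    ("c", "y", "no"),
]

cleaned_rows = replace_missing_target(toy_rows, target_index=2)
print(cleaned_rows)
```

Only the target column is touched; missing values in other attributes are left as they are. If the data is kept in an ARFF file, the new value "Missing" would also have to be added to the target attribute's declared list of nominal values.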