WPI Worcester Polytechnic Institute

Computer Science Department
------------------------------------------

CS539 Machine Learning 
Homework 4 - Spring 2017

PROF. CAROLINA RUIZ 

Due Date: Thursday, April 20, 2017 
------------------------------------------

HW Instructions


Section A: Bayesian Networks (50 points)

Dataset: For this part of the project, you will use the Adult Dataset (use the adult.data file), available at the UCI Machine Learning Repository.

  1. Naive Bayes Models:
    For this part, it would be useful to look at my Matlab Naive Bayes example: diabetes_no_attribute_names.dat and naive_bayes_example_diabetis.m.
    1. (10 points) Using Matlab functions, create a Naive Bayes model over the training dataset. Look at the conditional probability tables and select one that looks interesting. Include it in your report and explain why you think it is interesting.
    2. (5 points) Classify the data instances in the test dataset using this Naive Bayes model. Include in your report the accuracy, precision, and recall values obtained.
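
For a categorical attribute, a conditional probability table in a Naive Bayes model is a table of P(attribute value | class) estimated from counts. The sketch below illustrates that computation in Python on a tiny made-up sample. The function name, the `alpha` smoothing parameter, and the six rows of data are assumptions for illustration only; they are not the Matlab functions or the Adult data the assignment asks you to use.

```python
from collections import Counter, defaultdict


def conditional_probability_table(values, labels, alpha=1.0):
    """Estimate P(value | label) from paired observations.

    alpha is a Laplace smoothing constant (an assumption of this sketch);
    alpha=0 gives plain relative frequencies.
    """
    counts = defaultdict(Counter)
    for v, y in zip(values, labels):
        counts[y][v] += 1
    domain = sorted(set(values))
    table = {}
    for y, c in counts.items():
        total = sum(c.values()) + alpha * len(domain)
        # A Counter returns 0 for values never seen with this label.
        table[y] = {v: (c[v] + alpha) / total for v in domain}
    return table


# Hypothetical toy sample: one categorical attribute and the class label.
education = ["Bachelors", "HS-grad", "HS-grad", "Masters", "Bachelors", "HS-grad"]
income = [">50K", "<=50K", "<=50K", ">50K", "<=50K", "<=50K"]

cpt = conditional_probability_table(education, income, alpha=0.0)
for label, row in cpt.items():
    print(label, row)
```

Each row of the table sums to 1. A row in which one attribute value carries most of the probability for one class but not for the other is the kind of pattern worth discussing in the report.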

  2. Bayesian Network:
    1. (10 points) Investigate what functions exist in Matlab to construct (non-Naive) Bayesian Networks. Describe those functions in your report.
    2. (20 points) Using Matlab functions, create a (non-Naive) Bayesian network over the training dataset. For this, I suggest you modify function parameters until you obtain a "reasonable" graph of nodes and connections among them. Plot the graphical model obtained. Describe any interesting facts about this graphical model. Look at the conditional probability tables and select one that looks interesting. Include it in your report and explain why you think it is interesting.
    3. (5 points) Classify the data instances in the test dataset using this Bayesian Network. Include in your report the accuracy, precision, and recall values obtained.
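
The accuracy, precision, and recall requested in the classification items above can all be read off the counts of true/false positives and negatives: accuracy = (TP + TN) / N, precision = TP / (TP + FP), and recall = TP / (TP + FN). The following is a minimal Python sketch of that arithmetic. The function name, the choice of ">50K" as the positive class, and the eight example labels are assumptions for illustration, not results from the Adult data.

```python
def binary_metrics(y_true, y_pred, positive):
    """Return (accuracy, precision, recall) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Precision and recall are undefined when their denominators are zero.
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return accuracy, precision, recall


# Hypothetical true and predicted labels for eight test instances.
y_true = [">50K", ">50K", "<=50K", "<=50K", "<=50K", ">50K", "<=50K", "<=50K"]
y_pred = [">50K", "<=50K", "<=50K", ">50K", "<=50K", ">50K", "<=50K", "<=50K"]

accuracy, precision, recall = binary_metrics(y_true, y_pred, positive=">50K")
print(accuracy, precision, recall)
```

Precision and recall depend on which class is treated as positive, so the report should state that choice explicitly.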

  3. Homework Problems:
    These homework problems are for you to study this topic. You do NOT need to submit your solutions.

Section B: Observable Markov Models and Hidden Markov Models (60 points)

For this part, it would be useful to look at my Matlab examples: hmmgenerate_fair_loaded_coins_HMMs_tutorial_example.m and mmgenerate_pepsi_coke_HMMs_tutorial_example.m
  1. (5 points) Use Matlab to solve Exercise 1 of Chapter 15 (pp. 440-441).

  2. (10 points) Use Matlab to solve Exercise 2 of Chapter 15 (p. 441).

  3. Consider the Coke/Pepsi hidden Markov model (HMM) used in Prof. Ruiz's example of the Viterbi, Forward, and Backward algorithms. Using the Matlab implementations of the Viterbi, Forward, and Backward algorithms as appropriate, answer the following questions (in each answer, state which algorithms and Matlab commands you used and how you used them to solve the problem):

    1. (10 points) Consider the following sequence of observables:
      PPPPCCPPPCCCPCCCCCPPPCPCP
      What is the probability that this sequence was generated by our HMM? Explain.
    2. (10 points) What is the most likely sequence of hidden states that generated this sequence? Explain.
    3. (10 points) Assume that the sequence is numbered starting at 1 (i.e., the first element of the sequence, "P", is at position 1). What is the most likely hidden state that generated the "C" in position 10 of the sequence? Explain.
    4. (15 points) Use the HMM to generate a sequence of observables of length 2000. Then use this generated sequence to learn the transition probabilities and the emission probabilities of a new hidden Markov model with 3 hidden states (let's forget about the "Start" state to simplify things). Compare the transition and emission probabilities of this new hidden Markov model with those of our original HMM.
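
Questions 3.1 to 3.3 map onto three computations: the Forward algorithm gives the probability of the whole observation sequence, Viterbi gives the most likely joint sequence of hidden states, and combining Forward and Backward values gives the most likely state at a single position. The sketch below illustrates these in Python on a made-up two-state model. The number of states, the `init`, `trans`, and `emit` values, and all function names are assumptions for illustration; they are not the Coke/Pepsi model from the course example, and the sketch does not replace the Matlab commands the questions ask for.

```python
def forward(obs, init, trans, emit):
    """alpha[t][i] = P(o_1..o_t, state_t = i)."""
    n = len(init)
    alpha = [[init[i] * emit[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([emit[j][o] * sum(prev[i] * trans[i][j] for i in range(n))
                      for j in range(n)])
    return alpha


def backward(obs, trans, emit):
    """beta[t][i] = P(o_{t+1}..o_T | state_t = i)."""
    n = len(trans)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(trans[i][j] * emit[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def viterbi(obs, init, trans, emit):
    """Most likely state path, as a list of state indices."""
    n = len(init)
    delta = [init[i] * emit[i][obs[0]] for i in range(n)]
    back = []
    for o in obs[1:]:
        step, new_delta = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: delta[i] * trans[i][j])
            step.append(best)
            new_delta.append(delta[best] * trans[best][j] * emit[j][o])
        back.append(step)
        delta = new_delta
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for step in reversed(back):  # follow the back-pointers to position 1
        state = step[state]
        path.append(state)
    return path[::-1]


# Hypothetical two-state model (NOT the course's Coke/Pepsi parameters).
init = [0.5, 0.5]
trans = [[0.8, 0.2], [0.3, 0.7]]
emit = [{"C": 0.7, "P": 0.3}, {"C": 0.2, "P": 0.8}]

obs = "PPPPCCPPPCCCPCCCCCPPPCPCP"
alpha = forward(obs, init, trans, emit)
beta = backward(obs, trans, emit)

likelihood = sum(alpha[-1])                    # probability of the sequence
t = 10 - 1                                     # position 10, zero-based index
posterior = [alpha[t][i] * beta[t][i] / likelihood for i in range(len(init))]
path = viterbi(obs, init, trans, emit)         # most likely joint state path

print(likelihood, posterior, path)
```

The most likely state at position 10 according to the posterior need not coincide with the state at position 10 on the Viterbi path, since the two optimize different criteria. For much longer sequences, such as the 2000 observables in question 3.4, unscaled products like these can underflow, so scaled or log-domain versions are the usual choice.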

  4. Homework Problems:
    These homework problems are for you to study this topic. You do NOT need to submit your solutions.
    Chapter 15 Exercises 3, 4, 5, 8, and 9 of the textbook (pp. 440-442).

Section C: Combining Multiple Models

In your course project, you are experimenting with combining multiple models, also called meta-learning. Investigate this topic using the following resources:

Your investigation must cover Boosting, Bagging, and Stacking in particular; you are also encouraged to investigate any other techniques that interest you.
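
As a concrete picture of one of these techniques, Bagging trains several copies of a base learner on bootstrap samples (drawn with replacement) of the training set and combines their predictions by majority vote. The following minimal Python sketch shows that idea with a one-threshold "stump" as the base learner on made-up one-dimensional data. The base learner, the data, the number of models, and all function names are assumptions for illustration; Boosting and Stacking are not shown.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Pick the threshold t with fewest training errors for 'predict 1 if x > t'."""
    values = sorted(set(xs))
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    # Thresholds outside the data range allow "always 1" and "always 0" rules.
    candidates = [values[0] - 1.0] + midpoints + [values[-1] + 1.0]

    def errors(t):
        return sum((x > t) != bool(y) for x, y in zip(xs, ys))

    return min(candidates, key=errors)


def fit_bagging(xs, ys, n_models=25, seed=0):
    """Train one stump per bootstrap sample; return the list of thresholds."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagging(models, x):
    """Majority vote over the individual stump predictions."""
    votes = Counter(int(x > t) for t in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: label 1 for x >= 5, label 0 otherwise.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

models = fit_bagging(xs, ys)
print([predict_bagging(models, x) for x in (1, 9)])
```

Because each stump sees a different bootstrap sample, the learned thresholds vary from model to model; the vote averages out that variation, which is the effect to look for when running the experiments.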

  1. Once you have learned about these techniques, run experiments to see how they work. Follow these guidelines:

  2. Homework Problems:
    These homework problems are for you to study this topic. You do NOT need to submit your solutions.
    Chapter 17 Exercises 2, 6, and 9 of the textbook (pp. 511-513).

Section D: Reinforcement Learning

  1. Homework Problems:
    These homework problems are for you to study this topic. You do NOT need to submit your solutions.
    Chapter 18 Exercises 1, 2, 3, and 4 of the textbook (pp. 542-544).