WPI Worcester Polytechnic Institute

Computer Science Department
------------------------------------------

CS539 Machine Learning - Spring 2003 
Project 5 - Bayesian Learning

PROF. CAROLINA RUIZ 

Due Date: Monday, March 10, 2003 at 8 am. 
------------------------------------------


PROJECT DESCRIPTION

Use the NaiveBayes and NaiveBayesSimple classifiers of the Weka system to construct Naive Bayes classifiers for each of the following problems:

  1. Predicting the class attribute (CARAVAN: number of mobile home policies) in The Insurance Company Benchmark (COIL 2000) dataset.

  2. Predicting whether the income of a given person is >50K or <=50K using the census-income dataset from the US Census Bureau, which is available at the University of California, Irvine (UCI) Machine Learning Repository.
    The census-income dataset contains census information for 48,842 people. It has 14 attributes for each person (age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, and native-country) and a boolean class attribute that assigns the income of the person to one of two categories: >50K or <=50K.
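
    As a point of reference for what a Naive Bayes classifier computes on nominal attributes such as those listed above, here is a minimal sketch in Python. It is not the Weka NaiveBayes or NaiveBayesSimple code, and it makes several assumptions: the function names train_naive_bayes and predict are hypothetical, only two nominal attributes (education and occupation) are used, probabilities are estimated with add-one (Laplace) smoothing, and the six training rows are invented for illustration rather than taken from the census-income dataset.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class_label)][value] -> count
    value_counts = defaultdict(Counter)
    # Distinct values seen per attribute, needed for add-one smoothing.
    domains = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict(model, row):
    """Return the class maximizing log P(c) + sum_i log P(a_i | c)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Add-one smoothing keeps unseen values from zeroing the product.
            score += math.log((count + 1) / (n_c + len(domains[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (education, occupation) -> income class.
rows = [
    ("Bachelors", "Exec-managerial"),
    ("Bachelors", "Prof-specialty"),
    ("HS-grad", "Handlers-cleaners"),
    ("HS-grad", "Exec-managerial"),
    ("Masters", "Prof-specialty"),
    ("HS-grad", "Handlers-cleaners"),
]
labels = [">50K", ">50K", "<=50K", "<=50K", ">50K", "<=50K"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("Masters", "Exec-managerial")))
print(predict(model, ("HS-grad", "Handlers-cleaners")))
```

    Log probabilities are summed instead of multiplying raw probabilities so that the score does not underflow when many attributes are used. Numeric attributes such as age or hours-per-week would need separate handling (for example discretization or a per-class density estimate), which this sketch leaves out.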

PROJECT ASSIGNMENT

  1. Read Chapter 6 of the textbook about Bayesian Learning in great detail.

  2. Read the NaiveBayes and the NaiveBayesSimple code in the Weka system in great detail.

  3. The following are guidelines for the construction of your Naive Bayes Classifiers:

REPORT AND DUE DATE