CS539 MACHINE LEARNING. SPRING 99
PROJECT 3
Naive Bayes Classification

PROF. CAROLINA RUIZ
Department of Computer Science
Worcester Polytechnic Institute



PROJECT DESCRIPTION

Construct a learning system for text classification using naive Bayesian classification. This project is based on the
source code and dataset provided online as a companion to Chapter 6 of the textbook.
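
As a rough orientation before working with the provided source code, the sketch below shows one common way to build a naive Bayes text classifier: estimate class priors from document counts and word probabilities from word counts with add-one (Laplace) smoothing, then pick the class with the highest log-probability score. It is not the Rainbow implementation. The function names and the two toy documents are hypothetical, and the newsgroup names are used only as example labels.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes_text(examples):
    """examples: list of (list_of_words, label) pairs. Hypothetical helper."""
    vocabulary = {w for words, _ in examples for w in words}
    docs_per_class = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)  # label -> word -> count
    for words, label in examples:
        word_counts[label].update(words)

    log_prior, log_likelihood = {}, {}
    for label, n_docs in docs_per_class.items():
        log_prior[label] = math.log(n_docs / len(examples))
        # Add-one smoothing: (n_k + 1) / (n + |Vocabulary|)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        log_likelihood[label] = {
            w: math.log((word_counts[label][w] + 1) / denom)
            for w in vocabulary
        }
    return vocabulary, log_prior, log_likelihood

def classify_naive_bayes_text(model, words):
    """Return the label with the highest log posterior score."""
    vocabulary, log_prior, log_likelihood = model
    scores = {}
    for label in log_prior:
        score = log_prior[label]
        for w in words:
            if w in vocabulary:  # words never seen in training are ignored
                score += log_likelihood[label][w]
        scores[label] = score
    return max(scores, key=scores.get)

# Toy data (made up for illustration only).
toy_examples = [
    ("the puck hit the goal".split(), "rec.sport.hockey"),
    ("the rocket reached orbit".split(), "sci.space"),
]
toy_model = train_naive_bayes_text(toy_examples)
print(classify_naive_bayes_text(toy_model, "puck goal".split()))
print(classify_naive_bayes_text(toy_model, "rocket orbit".split()))
```

Summing logarithms instead of multiplying probabilities avoids numeric underflow on long articles; the ranking of the classes is unchanged.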

PROJECT ASSIGNMENT

This project consists of two parts:
  1. Classification of Newsgroup Articles. (100 points) Follow the steps of the training process described in Section 6.10 of the textbook and reproduce the experimental results reported there. You may find it useful to read the statement of Mitchell's Assignment 6. The Rainbow source code and newsgroups dataset are available online.

  2. EXTRA-CREDIT (25 points) Use the Rainbow code (or write your own code if you prefer) to construct a naive Bayes classifier for the Census-income data from Project 1. Compare the results that you obtain by using naive Bayesian classification over the training data with those that you obtained in Project 1. Discuss the advantages and disadvantages of using naive Bayes classification instead of decision trees for this learning task.
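
For the extra-credit part, if you choose to write your own code, a naive Bayes classifier over attribute-value records could be sketched as follows. This is only an illustration under stated assumptions: every attribute is treated as categorical (continuous attributes would first need to be discretized or modeled differently), add-one smoothing is used, and the function names, attribute values, and class labels in the toy rows are hypothetical stand-ins rather than the actual Project 1 data.

```python
import math
from collections import Counter, defaultdict

def train_categorical_nb(rows, labels):
    """rows: list of tuples of categorical values; labels: parallel list."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    values_seen = [set() for _ in rows[0]]  # distinct values per attribute
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen

def predict_categorical_nb(model, row):
    """Return the class with the highest smoothed log posterior score."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Add-one smoothing over the values observed for attribute i
            score += math.log((count + 1) / (n + len(values_seen[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy rows (made up): (education, workclass) -> income class
toy_rows = [
    ("Bachelors", "Private"),
    ("Bachelors", "Self-emp"),
    ("HS-grad", "Private"),
    ("HS-grad", "Private"),
]
toy_labels = [">50K", ">50K", "<=50K", "<=50K"]
census_model = train_categorical_nb(toy_rows, toy_labels)
print(predict_categorical_nb(census_model, ("Bachelors", "Private")))
print(predict_categorical_nb(census_model, ("HS-grad", "Private")))
```

Unlike a decision tree, this model uses every attribute for every prediction and assumes the attributes are conditionally independent given the class, which is one point worth addressing in the comparison.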

REPORT AND DUE DATE

Project 3 is due on Tuesday, March 23 at 5:30 pm. Your system should follow the
Departmental Documentation Standard.