CS539 MACHINE LEARNING. SPRING 99
PROJECT 3
Naive Bayes Classification
PROF. CAROLINA RUIZ
Department of Computer Science
Worcester Polytechnic Institute
PROJECT DESCRIPTION
Construct a learning system for text classification using naive Bayesian
classification. This project is based on the
source code and dataset provided online as a companion to Chapter 6 of
the textbook.
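As a point of reference, the kind of classifier this project asks for can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the Rainbow implementation: it assumes documents are already tokenized into word lists, estimates class priors from document counts, and uses add-one (Laplace) smoothing for the word probabilities. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Train on docs, a list of (label, list_of_tokens) pairs (assumed format)."""
    vocab = {w for _, tokens in docs for w in tokens}
    doc_counts = Counter(label for label, _ in docs)
    word_counts = defaultdict(Counter)  # label -> word frequency table
    for label, tokens in docs:
        word_counts[label].update(tokens)

    # log P(class): fraction of training documents carrying each label
    log_prior = {c: math.log(n / len(docs)) for c, n in doc_counts.items()}

    # log P(word | class) with add-one smoothing over the whole vocabulary
    log_likelihood = {}
    for c in doc_counts:
        denom = sum(word_counts[c].values()) + len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / denom) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def classify(tokens, log_prior, log_likelihood, vocab):
    """Return the label with the highest posterior log-score."""
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += log_likelihood[c][w]
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy data, standing in for two newsgroups.
toy_docs = [
    ("hockey", "the puck hit the goalie".split()),
    ("hockey", "goalie saves the game".split()),
    ("space", "the shuttle reached orbit".split()),
    ("space", "orbit launch delayed".split()),
]
model = train_naive_bayes(toy_docs)
print(classify("goalie puck".split(), *model))
print(classify("shuttle orbit".split(), *model))
```

Working with log-probabilities avoids numeric underflow when many small word probabilities are multiplied, which matters for full-length articles.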
PROJECT ASSIGNMENT
This project consists of two parts:
- Classification of Newsgroup Articles. (100 points)
For this, you just need to follow the steps of the training process
described in
Section 6.10 of the textbook. You should reproduce the experimental
results reported there. You may find it useful to read the statement of
Mitchell's Assignment 6.
The Rainbow source code and newsgroups dataset are available online.
- EXTRA-CREDIT (25 points)
Use the Rainbow code (or write your own code if you prefer) to construct a
naive Bayes classifier for the Census-income data from Project 1.
Compare the results that you obtain by using naive Bayesian classification
over the training data with those that you obtained in your Project 1.
Discuss the advantages and disadvantages
of using naive Bayes classification instead of decision
trees for this learning task.
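For the extra-credit part, if you write your own code, a naive Bayes classifier over attribute-value records might look roughly like the sketch below. It is an illustration under stated assumptions only: every attribute is treated as categorical (continuous attributes would first need to be discretized), probabilities use add-one smoothing, and the attribute names, values, class labels, and function names are hypothetical placeholders rather than the actual Project 1 data.

```python
import math
from collections import Counter, defaultdict


def train_categorical_nb(rows, labels):
    """rows: list of dicts (attribute -> value); labels: parallel class labels."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, attribute) -> value counts
    attr_values = defaultdict(set)       # attribute -> distinct values seen
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(label, attr)][value] += 1
            attr_values[attr].add(value)
    return class_counts, value_counts, attr_values


def predict_categorical_nb(row, model):
    """Return the class maximizing log P(class) + sum of log P(value | class)."""
    class_counts, value_counts, attr_values = model
    n = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / n)
        for attr, value in row.items():
            if attr not in attr_values:  # attribute unseen in training: skip
                continue
            k = len(attr_values[attr])
            seen = value_counts.get((label, attr), Counter())[value]
            score += math.log((seen + 1) / (count + k))  # add-one smoothing
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(rows, labels, model):
    """Fraction of rows whose predicted label matches the given label."""
    hits = sum(predict_categorical_nb(r, model) == y for r, y in zip(rows, labels))
    return hits / len(rows)


# Hypothetical records; attribute names and values are placeholders.
rows = [
    {"education": "Bachelors", "occupation": "Exec"},
    {"education": "Bachelors", "occupation": "Sales"},
    {"education": "HS-grad", "occupation": "Sales"},
    {"education": "HS-grad", "occupation": "Clerical"},
]
labels = [">50K", ">50K", "<=50K", "<=50K"]
nb_model = train_categorical_nb(rows, labels)
print(predict_categorical_nb(rows[0], nb_model))
print(accuracy(rows, labels, nb_model))
```

The `accuracy` helper measures performance over whatever records it is given; calling it on the training records gives the training-data figure to compare against your decision-tree results.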
REPORT AND DUE DATE
Project 3 is due on Tuesday, March 23 at 5:30 pm.
Your system should follow the
Departmental Documentation Standard.
- Program.
You should submit any modifications made to the Rainbow code, or your own
code if you didn't use Rainbow, by email to
ruiz@cs.wpi.edu.
- Written Report.
Please bring your report to my office (FL232) or to class by the due
date/time. Your report should discuss the following issues:
- adaptation of the code (if any) or a description of your own code,
- the experiments you ran with the system,
- experimental results,
- evaluation of the accuracy of the system over the training data,
- strengths and weaknesses of your system.
Your report should also include a short user manual explaining how to
install, run, and use your system (if different from the CMU package).
- Oral Report.
We will discuss the results from the individual projects during the class
on March 23.
Be ready to show your results (I strongly encourage you to prepare
transparencies) and to discuss your project solution in class.