CS539 MACHINE LEARNING. SPRING 99
Naive Bayes Classification
PROF. CAROLINA RUIZ
Department of Computer Science
Worcester Polytechnic Institute
Construct a learning system for text classification using naive Bayesian
classification. This project is based on the
source code and dataset provided online as a companion to Chapter 6 of
the textbook.
This project consists of two parts:
- Classification of Newsgroup Articles. (100 points)
For this, you just need to follow the steps of the training process
described in Section 6.10 of the textbook. You should reproduce the
experimental results reported there. You may find it useful to read the
statement of
Mitchell's Assignment 6.
The Rainbow source code and the newsgroups dataset are available online.
- EXTRA-CREDIT (25 points)
Use the Rainbow code (or write your own code if you prefer) to construct a
naive Bayes classifier for the Census-income data from Project 1.
Compare the results that you obtain by using naive Bayesian classification
over the training data with those that you obtained in Project 1.
Discuss the advantages and disadvantages
of using naive Bayes classification instead of decision
trees for this learning task.
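As a starting point for the newsgroup part, here is a minimal Python sketch of naive Bayes text classification with add-one smoothed word estimates. It is an illustration under stated assumptions, not the Rainbow implementation: the function names `train_naive_bayes` and `classify`, the whitespace tokenization, and the four toy documents are all made up for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Train on docs, a list of (list_of_words, label) pairs."""
    vocab = {w for words, _ in docs for w in words}
    doc_counts = Counter(label for _, label in docs)

    # word_counts[label][word] = occurrences of word in documents of that label
    word_counts = defaultdict(Counter)
    for words, label in docs:
        word_counts[label].update(words)

    # Class priors: fraction of training documents carrying each label.
    log_prior = {c: math.log(n / len(docs)) for c, n in doc_counts.items()}

    # Add-one smoothed word probabilities: (n_k + 1) / (n + |Vocabulary|).
    log_likelihood = {}
    for c in doc_counts:
        denom = sum(word_counts[c].values()) + len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / denom) for w in vocab
        }
    return {"vocab": vocab, "log_prior": log_prior,
            "log_likelihood": log_likelihood}


def classify(model, words):
    """Return the label with the highest log posterior score."""
    scores = {}
    for c, prior in model["log_prior"].items():
        likelihood = model["log_likelihood"][c]
        # Words never seen in training are skipped.
        scores[c] = prior + sum(
            likelihood[w] for w in words if w in model["vocab"]
        )
    return max(scores, key=scores.get)


# Toy data (hypothetical), only to show the calling convention.
toy_docs = [
    ("the puck hit the goalie".split(), "rec.sport.hockey"),
    ("goalie saves the shot".split(), "rec.sport.hockey"),
    ("the orbit of the shuttle".split(), "sci.space"),
    ("shuttle launch into orbit".split(), "sci.space"),
]
model = train_naive_bayes(toy_docs)
print(classify(model, "goalie shot".split()))
```

Scores are summed in log space so that long articles do not underflow. If you write your own code for the extra-credit part, one possible encoding (an assumption, not something the assignment prescribes) is to turn each census record into a list of tokens of the form attribute=value and reuse the same two functions.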
REPORT AND DUE DATE
Project 2 is due on Tuesday, March 23 at 5:30 pm.
Your system should follow the
Departmental Documentation Standard.
You should submit any modifications made to the Rainbow code, or your own
code if you didn't use Rainbow, by email to
- Written Report.
Please bring your report to my office (FL232) or to class by the due
date/time. Your report should discuss the following issues:
- adaptation of the code (if any) or a description of your own code,
- the experiments you ran with the system,
- experimental results,
- evaluation of the accuracy of the system over the training data,
- strengths and weaknesses of your system.
Your report should also include a short user manual explaining how to
install, run, and use your system (if different from the CMU package).
- Oral Report.
We will discuss the results from the individual projects during the class
on March 23.
Be ready to show your results (I strongly encourage you to prepare
transparencies) and to discuss your project solution in class.
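For the accuracy evaluation requested in the written report, a small helper like the following may be enough. It is only a sketch: the function name `evaluate` and the example labels are hypothetical, and it assumes you already have the true and predicted labels as two parallel lists.

```python
from collections import Counter


def evaluate(true_labels, predicted_labels):
    """Return (accuracy, confusion) for two parallel lists of labels.

    confusion[(true, predicted)] counts how often an example of class
    `true` was assigned to class `predicted`.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot evaluate an empty set of examples")

    confusion = Counter(zip(true_labels, predicted_labels))
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    return correct / len(true_labels), confusion


# Hypothetical labels, only to show the calling convention.
true_labels = ["sci.space", "sci.space", "rec.sport.hockey", "rec.sport.hockey"]
predicted = ["sci.space", "rec.sport.hockey", "rec.sport.hockey", "rec.sport.hockey"]
accuracy, confusion = evaluate(true_labels, predicted)
print(f"accuracy = {accuracy:.2f}")
for (t, p), n in sorted(confusion.items()):
    print(f"true={t:20s} predicted={p:20s} count={n}")
```

The confusion counts show which classes are mistaken for each other, which is usually more informative on a transparency than a single accuracy figure.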