WPI Worcester Polytechnic Institute

Computer Science Department
------------------------------------------

CS 525 KNOWLEDGE DISCOVERY AND DATA MINING  
SYLLABUS - Fall 1999

PROF. CAROLINA RUIZ 

WARNING: Small changes to this syllabus may be made during the course of the term. 
------------------------------------------

COURSE DESCRIPTION:

Due to advances in technology and the availability of increasingly cheap storage devices, data in different domains has been accumulating at an impressively high rate in recent years, leading to very large databases. This course presents current research in Knowledge Discovery in Databases (KDD) dealing with the integration of data and with the mining and interpretation of patterns in such databases. Topics include data warehousing and mediation techniques aimed at integrating distributed, heterogeneous data sources; data mining techniques such as rule-based learning, decision trees, association rule mining, and statistical analysis for the discovery of patterns in the integrated data; and the evaluation and interpretation of the mined patterns using visualization techniques. The work discussed originates in the fields of artificial intelligence, information retrieval, data visualization, and statistics. Industrial and scientific applications will be discussed.

Students will be expected to read assigned research papers and work on a semester-long implementation/research project that covers the different stages of the KDD process.


PREREQUISITE:

Background in artificial intelligence and databases at the undergraduate level, or permission of the instructor. Background in statistics would be helpful but is not assumed.


CLASS MEETING:

Thursdays 6-9 pm
FL311

Students are also encouraged to attend the AIRG Seminar Thursdays at 11 am.


PROFESSOR:

Prof. Carolina Ruiz
ruiz@cs.wpi.edu
Office: FL 232
Phone Number: (508) 831-5640
Office Hours: Mon 11:00 am-12:00 noon, Thu 10:00-10:50 am, or by appointment.

Other speakers may occasionally be invited to lecture to the class.


READINGS:

Several research papers and book chapters will be handed out during the semester. There is no required textbook for the course, but several books on the subject and related subjects are recommended below.

GRADES:

Exams   20%
Leading class discussion of assigned topic and class participation   15%
Project   25%
Critiques (9 critiques minus the worst one)   40% (5% each) 
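As an illustration of how the weights above combine, here is a minimal sketch in Python. It assumes every component is scored on a 0-100 scale. The function name final_grade and its parameters are hypothetical, and because the syllabus gives the exams a single 20% weight without saying how the two exams are combined, the sketch simply takes one combined exam score as input.

```python
def final_grade(exam, participation, project, critiques):
    """Hypothetical helper; all scores are assumed to be on a 0-100 scale.

    exam: one combined exam score (assumption: how the two exams are
          combined within the 20% weight is not specified).
    critiques: the 9 critique scores; the lowest one is dropped.
    """
    if len(critiques) != 9:
        raise ValueError("expected exactly 9 critique scores")
    # Drop the single worst critique; each of the remaining 8 counts 5%,
    # so the critiques contribute 8 * 5% = 40% in total.
    kept = sorted(critiques)[1:]
    critique_part = 0.05 * sum(kept)
    return 0.20 * exam + 0.15 * participation + 0.25 * project + critique_part


# Usage: the lowest critique (50) is dropped before weighting.
print(final_grade(80, 90, 70, [60, 70, 80, 90, 100, 100, 100, 100, 50]))
```

With these sample scores the dropped critique is the 50, the remaining eight critiques sum to 700 and contribute 35 points, and the other components add 16 + 13.5 + 17.5, for a total of about 82.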

Your final grade will reflect your own work and achievements during the course. Any type of cheating will be penalized with an F grade for the course and will be reported to the WPI Judicial Board in accordance with the Academic Honesty Policy.


EXAMS

There will be a total of 2 exams. Each exam will cover the material presented in class since the beginning of the semester. In particular, the final exam is cumulative. Both will be in-class exams. 

PROJECTS

List of Additional References/Resources for the Project

There will be a total of three interrelated projects.

Project 1 - Data Integration for Mining

This first project will concentrate on database issues of the knowledge discovery process. Students will select datasets from online database repositories, collect target data from those data sources, remove noise from the data, fill in missing values, and integrate the data into a format suitable for mining.

Project 2 - Data Mining

This project will consist of applying concrete algorithms to find useful and novel patterns in the integrated data from project 1.

Project 3 - Mining evaluation and visualization

This project will deal with interpreting mined patterns, evaluating them according to usefulness/interestingness criteria, and possibly using visualization tools to aid in understanding the patterns graphically. Students will be required to give oral in-class presentations describing their achievements in these projects.

PROJECT REPORT GUIDELINES

The report of your course project consists of two parts: a written report and an oral report.

CLASS PARTICIPATION

All students are expected to read the material assigned for each class in advance and to participate in class discussions. Also, students will take turns presenting papers and leading class discussions of assigned readings.

CLASS MAILING LIST

The mailing list for this class is: cs525m@cs.wpi.edu
If you haven't received the "welcome to CS525M" email message by the end of the first day of classes, you should subscribe to the mailing list by sending the following one-line email message to majordomo@cs.wpi.edu:
subscribe cs525m

CLASS WEB PAGES

The web pages for this class are located at http://www.cs.wpi.edu/~cs525/f99M/
Announcements will be posted on the web pages and/or the class mailing list, and so you are urged to check your email and the class web pages frequently. 

ADDITIONAL REFERENCES

(See also the list of assigned papers in the Class Schedule.)

Knowledge Discovery and Data Mining

Machine Learning

General AI

Databases

Statistics


OTHER ONLINE RESOURCES:

Data Sets

KDD

KDD Commercial Products / Prototypes

Data Warehousing and OLAP

Machine Learning

Statistics

General AI