COURSE DESCRIPTION:
This course presents current research in Knowledge Discovery in Databases
(KDD) dealing with data integration, mining, and interpretation of
patterns in large collections of data. Topics include data warehousing and
data preprocessing techniques; data mining techniques for classification,
regression, clustering, deviation detection, and association analysis;
and evaluation of patterns mined from data. Industrial and scientific
applications are discussed.
Students will be expected to read assigned textbook chapters and
research papers, and to work on implementation/research projects that
cover the different stages of the KDD process.
This course can be used to satisfy the graduate AI bin requirement.
CLASS MEETING:
Time: Mondays and Thursdays 1:00-2:20 pm
Room: HL202
Students are also encouraged to attend the Knowledge Discovery in
Databases and Data Mining Research Group (KDDRG) Seminar, held Fridays
at 1 pm in the Beckett Conference Room (FL246).
Prof. Carolina Ruiz

Office: FL 232
Phone Number: (508) 831-5640
Office Hours: Thursdays 2:30-3:30 pm, or by appointment.
Several other books on the subject and related subjects are
recommended below.
Some research papers will be handed out during the semester.
Background in artificial intelligence, databases, and statistics
at the undergraduate level, or permission of the instructor.
Proficiency in a high-level programming language (preferably Java)
is required.
5 Projects: 90%
Class Participation: 10%
Your final grade will reflect your own work and achievements during
the course. Any type of cheating will be penalized and reported to the
WPI Judicial Board in accordance with the Academic Honesty Policy.
All students are expected to read the material assigned for each class in
advance and to participate in class discussions. Also, students will
take turns presenting papers and leading class discussions of assigned
readings.
Class participation will be taken into account when deciding
students' final grades.
There will be a total of five projects related to the data mining stages
and/or techniques covered in the class. Datasets for those projects will
be selected from online database repositories or other sources.
About the Weka System:
For most of the projects, we will use the Weka system
(http://www.cs.waikato.ac.nz/ml/weka/).
Weka is an excellent data-mining environment.
It provides a large collection of Java-based mining algorithms,
data preprocessing filters, and experimentation capabilities.
Weka is open source software issued under the GNU General Public License.
To learn more about the Weka system, download it, or get its
documentation, see Weka's webpage
(http://www.cs.waikato.ac.nz/ml/weka/).
You should download and use the latest developer version of the system
(currently 3-7-5).
Students will be required to provide both a written report and an oral
(in-class) presentation describing their work on each of these projects.
More detailed descriptions of the assignments and projects will be
posted to the
course webpage at the appropriate times during the semester.
The mailing list for this class is:

This mailing list reaches the professor and all the students in the class.
The webpages for this class are located at
http://www.cs.wpi.edu/~cs548/s12/
Announcements will be posted on the class webpages and/or sent to
the class mailing list, so you are urged to check your email and
the class webpages frequently.
Small changes to this syllabus may be made during the course
of the semester.
Knowledge Discovery and Data Mining
See my list of additional
Machine Learning, AI, Data Mining, Statistics, Databases, Data Sets and other online resources.
OTHER ONLINE RESOURCES: