CS 525 KNOWLEDGE DISCOVERY AND DATA MINING
Small changes to this syllabus may be made during the course of the semester.
SYLLABUS - Spring 2004
Due to advances in technology and the availability of increasingly cheap
storage devices, data in different domains has been accumulating at an
impressively high rate in recent years, leading to very large databases.
This course presents current research in Knowledge Discovery in Databases
(KDD), covering data integration, data mining, and the interpretation of
patterns in such databases. Topics include data warehousing and mediation
techniques aimed at integrating distributed, heterogeneous data sources;
data mining techniques such as rule-based learning, decision trees, association
rule mining, and statistical analysis for discovery of patterns in the
integrated data; and evaluation and interpretation of the mined patterns
using visualization techniques. The work discussed originates in the fields
of databases, artificial intelligence, information retrieval, data visualization,
and statistics. Industrial and scientific applications will be given.
This course presents data mining from a database perspective.
For an in-depth study of the machine learning techniques
used in data mining, take CS539 Machine Learning, which is
scheduled to be offered during the 2004-2005 academic year.
Students will be expected to read assigned textbook chapters and research papers,
and work on implementation/research projects that cover the different
stages of the KDD process.
Background in databases and artificial intelligence
at the undergraduate level, or permission of the instructor. Background
in statistics would be helpful but is not assumed.
Proficiency in a high-level programming language (preferably Java).
Tuesdays and Thursdays 3:00 - 4:20 pm
Students are also encouraged to attend the Knowledge Discovery
in Databases and Data Mining Research Group (KDDRG) Seminar, Fridays at
2 pm in Beckett Conference Room (FL246).
Prof. Carolina Ruiz
Office: FL 232
Phone Number: (508) 831-5640
Office Hours: Tu 2-2:50 pm, Fr 3-4 pm, or by appointment.
Other speakers may occasionally be invited to lecture to the class.
Several other books on the subject and related subjects are
Several research papers will be handed out during the semester.
Projects: 72% (12% each project)
Participation in class discussions of assigned topics: 10% extra points
Your final grade will reflect your own work and achievements during
the course. Any type of cheating will be penalized with an F grade for
the course and will be reported to the WPI Judicial Board in accordance
with the Academic
There will be one midterm exam. This exam will cover the material presented
in class since the beginning of the semester.
There will be one assigned homework. The homework is intended as preparation
for the midterm exam. The homework will cover the material in chapters 1 through 5
of the textbook.
There will be a total of six interrelated projects.
Each of the projects deals with one of the data mining techniques
covered in the class.
Datasets for those projects will be selected from
online database repositories,
or other sources.
About the Weka System:
For most of the projects, we will use the Weka system.
Weka is an excellent machine-learning/data-mining environment.
It provides a large collection of Java-based mining algorithms,
data preprocessing filters, and experimentation capabilities.
Weka is open source software issued under the GNU General Public License.
For more information on the Weka system, to download the system and
to get its documentation, look at
You should download the latest available stable GUI version of the system.
Students will be required
to provide both a written report and an oral (in-class) presentation describing
their achievements in each of these projects.
All students are expected to read the material assigned for each class in
advance and to participate in class discussions. Also, students will
take turns presenting papers and leading class discussions of assigned topics.
CLASS MAILING LIST
There are two mailing lists for this class:
- messages sent to cs525d-all AT cs.wpi.edu go to the entire class (students and professor)
If you haven't received the "welcome to CS525D" email message
by the end of the first day of classes,
you should subscribe to the mailing list by sending the following one-line
email message to firstname.lastname@example.org:
- messages sent to cs525d-staff AT cs.wpi.edu go to the professor only.
(Please use this email address to reach the professor.)
CLASS WEB PAGES
The web pages for this class are located at
Announcements will be posted on the web pages and/or the class mailing
list, and so you are urged to check your email and the class web pages
(See also the list of selected papers in the Class
Knowledge Discovery and Data Mining
- "Advances in Knowledge Discovery and Data Mining". Eds.: Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy. The MIT Press, 1995.
- "Data Mining. Technologies, Techniques, Tools, and Trends". B. Thuraisingham. CRC, 1998.
- "Data Mining. A hands-on approach for business professionals". R. Groth. Prentice Hall, 1998.
- "Data Preparation for Data Mining". Dorian Pyle, 3/99.
- "Data Mining". P. Adriaans & D. Zantinge.
- "Data Mining Methods for Knowledge Discovery". Cios, Pedrycz, & Swiniarski, 1998.
- "Data Mining Techniques for Marketing, Sales and Customer Support". Berry & Linoff.
- "Decision Support using Data Mining". Anand and Buchner.
- "Feature Selection for Knowledge Discovery and Data Mining". Liu and Motoda.
- "Feature Extraction, Construction and Selection: A Data Mining Perspective". Eds: Motoda and Liu.
- "Knowledge Acquisition from Databases". Xindong Wu.
- "Predictive Data-Mining: A Practical Guide". Weiss & Indurkhya.
- "Machine Learning and Data Mining: Methods and Applications". Michalski, Bratko, and Kubat, 1998; John Wiley & Sons.
- "Mining Very Large Databases with Parallel Processing". Freitas & Lavington.
- "Rough Sets and Data Mining: Analysis of Imprecise Data". Eds: Lin and Cercone; Kluwer.
- "Seven Methods for Transforming Corporate Data into Business Intelligence". Vasant Dhar and Roger Stein; Prentice-Hall.
- "Artificial Intelligence: A Modern Approach". S. Russell, P. Norvig. Prentice Hall, 1995. ISBN 0-13-103805-2
- "Artificial Intelligence: Theory and Practice". T. Dean, J. Allen, Y. Aloimonos. The Benjamin/Cummings Publishing Company, Inc. 1995.
- "Readings in Artificial Intelligence". B. L. Webber, N. J. Nilsson, eds. Tioga Publishing Company, 1981.
- Patrick H. Winston.
- "The Elements of Artificial Intelligence Using Common Lisp". S. L. Tanimoto. Computer Science Press 1990.
- "Artificial Intelligence" Second edition. E. Rich and K. Knight. McGraw Hill 1991.
- "Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp". Morgan Kaufmann Publishers, 1992.
- "Essentials of Artificial Intelligence". Morgan Kaufmann Publishers,
- "Artificial Intelligence Structures and Strategies for Complex Problem Solving". G. F. Luger and W. A. Stubblefield.
- "Logical Foundations of Artificial Intelligence". M.R. Genesereth and N. Nilsson. Morgan Kaufmann, 1987.
- "Statistical Inference for Management and Economics". P. Billingsley, D. Croft, D. Huntsberger, C. Watson. Boston: Allyn and Bacon, Inc. 1986.
- "Probability and Statistics". 2nd edition. M. DeGroot. Addison Wesley, 1986.
- "Statistical Inference". G. Casella, R. Berger. Wadsworth and Brooks/Cole, 1990.
OTHER ONLINE RESOURCES:
KDD Commercial Products / Prototypes
Data Warehousing and OLAP