CS 525 KNOWLEDGE DISCOVERY AND DATA MINING
Small changes to this syllabus may be made during the course of the term.
SYLLABUS - Fall 1999
Due to advances in technology and the availability of increasingly cheap
storage devices, data in different domains has been accumulating at an
impressively high rate in recent years, leading to very large databases.
This course presents current research in Knowledge Discovery in Databases
(KDD), covering data integration, mining, and the interpretation of
patterns in such databases. Topics include data warehousing and mediation
techniques aimed at integrating distributed, heterogeneous data sources;
data mining techniques such as rule-based learning, decision trees, association
rule mining, and statistical analysis for discovery of patterns in the
integrated data; and evaluation and interpretation of the mined patterns
using visualization techniques. The work discussed originates in the fields
of artificial intelligence, information retrieval, data visualization,
and statistics. Industrial and scientific applications will be given.
Students will be expected to read assigned research papers and work
on a semester-long implementation/research project that covers the different
stages of the KDD process.
Background in artificial intelligence and databases
at the undergraduate level, or permission of the instructor. Background
in statistics would be helpful but is not assumed.
Thursdays 6-9 pm
Students are also encouraged to attend the AIRG Seminar Thursdays at
Prof. Carolina Ruiz
Office: FL 232
Phone Number: (508) 831-5640
Office Hours: Mo 11:00 am-12:00 noon, Th 10:00-10:50 am, or by appointment.
Other speakers may occasionally be invited to lecture to the class.
Several research papers and book chapters will be handed out during the
semester. There is no required textbook for the course, but several books
on the subject and related subjects are recommended
Leading class discussion of assigned topic and class participation
Critiques (9 critiques minus the worst one): 40% (5% each)
Your final grade will reflect your own work and achievements during
the course. Any type of cheating will be penalized with an F grade for
the course and will be reported to the WPI Judicial Board in accordance
with the Academic
There will be a total of 2 exams. Each exam will cover the material presented
in class since the beginning of the semester. In particular, the final
exam is cumulative.
Both will be in-class exams.
List of Additional References/Resources for the Project
There will be a total of three interrelated projects.
Project 1 - Data Integration for Mining
This first project will concentrate on database issues of the knowledge
datasets from online database repositories,
collect target data from those data sources, remove noise from the data,
fill in missing values, and integrate the data into a format suitable for
Project 2 - Data Mining
This project will consist of applying concrete algorithms to find useful and
novel patterns in the integrated data from project 1.
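As an illustration only, here is a minimal sketch of one kind of pattern named in the course description, association rules. It is not the algorithm required for the project. It computes the standard support and confidence measures over a small made-up set of transactions and lists the item pairs that meet a minimum support. The data, the function names, and the 0.5 threshold are all assumptions chosen for the example.

```python
from itertools import combinations

# Hypothetical toy data: each transaction is a set of items.
TRANSACTIONS = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never applies in this data
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / base


def frequent_pairs(transactions, min_support):
    """Brute-force search: every item pair whose support meets min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(pair, transactions)
        if s >= min_support:
            result[pair] = s
    return result


if __name__ == "__main__":
    print(frequent_pairs(TRANSACTIONS, 0.5))
    print(confidence({"bread"}, {"milk"}, TRANSACTIONS))
```

The brute-force pair enumeration is only practical for tiny datasets. Algorithms for large databases prune the search space, for example by extending only itemsets that are already frequent.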
Project 3 - Mining evaluation and visualization
This project will deal with interpreting mined patterns,
evaluating them according to usefulness/interestingness criteria, and
possibly using visualization
tools to aid in understanding the patterns graphically.
Students will be required
to give oral in-class presentations describing their achievements
in these projects.
PROJECT REPORT GUIDELINES
The report on your course project consists of two
parts: a written report and an oral report.
- Written Report.
Your report is due at the beginning of the class in which
you're presenting your project. Your report should discuss
the following issues:
- brief overview of the selected topic/problem,
- discussion of your literature review,
- description of your approach to solving the problem,
- the experiments you ran to validate your approach,
- experimental results,
- evaluation of the accuracy over the training data of the
system that you used/implemented,
- strengths and weaknesses of your system/approach.
Your report should also include a short user manual explaining
how to install, run, and use the system you implemented/used.
- Oral Report.
We will have a 15-minute oral presentation of each individual
project during the classes on December 9 and December 16. I suggest
that you prepare transparencies for your presentation or any other
visual aid that you find appropriate. If you need any particular
equipment for your presentation, please let me know in advance.
All students are expected to read the material assigned for each class in
advance and to participate in class discussions. Also, students will
take turns presenting papers and leading class discussions of assigned
CLASS MAILING LIST
The mailing list for this class is: firstname.lastname@example.org
If you haven't received the "welcome to CS525M" email message
by the end of the first day of classes,
you should subscribe to the mailing list by sending the following one-line
email message to email@example.com:
CLASS WEB PAGES
The web pages for this class are located at
Announcements will be posted on the web pages and/or the class mailing
list, and so you are urged to check your email and the class web pages
(See also the list of assigned papers in the Class
Knowledge Discovery and Data Mining
"Advances in Knowledge Discovery and Data Mining". Eds.: Fayyad,
Piatetsky-Shapiro, Smyth, and Uthurusamy. The MIT Press, 1995.
"Data Mining. Technologies, Techniques, Tools, and Trends".
B. Thuraisingham. CRC, 1998.
"Data Mining. A hands-on approach for business professionals".
R. Groth. Prentice Hall, 1998.
"Data Preparation for Data Mining". Dorian Pyle, 3/99.
- "Data Mining".
P. Adriaans & D. Zantinge
"Data Mining Methods for
Knowledge Discovery" Cios, Pedrycz, & Swiniarski, 1998.
"Data Mining Techniques for
Marketing, Sales and Customer Support". Berry & Linoff.
"Decision Support using
Data Mining". Anand and Buchner.
Selection for Knowledge Discovery and Data Mining". Liu
"Feature Extraction, Construction and Selection:
A Data Mining Perspective". Eds: Motoda and Liu.
"Knowledge Acquisition from Databases". Xindong Wu.
"Mining Very Large Databases with
Alex Freitas, Simon Lavington.
"Predictive Data-Mining: A
Practical Guide". Weiss & Indurkhya.
- "Machine Learning and Data Mining: Methods and Applications."
Michalski, Bratko, and Kubat, 1998; John Wiley & Sons.
"Mining Very Large Databases with Parallel
Processing". Freitas & Lavington.
- "Rough Sets and Data Mining: Analysis of Imprecise Data."
Eds: Lin and Cercone; Kluwer.
"Seven Methods for Transforming Corporate Data into
Business Intelligence". Vasant Dhar and Roger Stein; Prentice-Hall,
"Artificial Intelligence: A Modern Approach".
S. Russell, P. Norvig.
Prentice Hall, 1995. ISBN 0-13-103805-2
"Artificial Intelligence: Theory and Practice".
T. Dean, J. Allen, Y. Aloimonos.
The Benjamin/Cummings Publishing Company, Inc. 1995.
"Readings in Artificial Intelligence".
B. L. Webber, N. J. Nilsson, eds.
Tioga Publishing Company, 1981.
Patrick H. Winston.
"The Elements of Artificial Intelligence Using Common Lisp".
S. L. Tanimoto.
Computer Science Press 1990.
"Artificial Intelligence" Second edition.
E. Rich and K. Knight.
McGraw Hill 1991.
"Paradigms of Artificial Intelligence Programming: Case Studies
in Common Lisp".
Morgan Kaufmann Publishers, 1992.
"Essentials of Artificial Intelligence".
Morgan Kaufmann Publishers,
"Artificial Intelligence Structures
and Strategies for Complex Problem Solving".
G. F. Luger and W. A. Stubblefield.
"Logical Foundations of Artificial Intelligence".
M.R. Genesereth and N. Nilsson.
Morgan Kaufmann, 1987.
- "Statistical Inference for Management and Economics".
P. Billingsley, D. Croft, D. Huntsberger, C. Watson.
Boston: Allyn and Bacon, Inc. 1986.
- "Probability and Statistics". 2nd edition.
M. DeGroot. Addison Wesley, 1986.
- "Statistical Inference".
G. Casella, R. Berger.
Wadsworth and Brooks/Cole, 1990.
OTHER ONLINE RESOURCES:
KDD Commercial Products / Prototypes
Data Warehousing and OLAP