CS 444X Data Mining and Knowledge Discovery in Databases
Small changes to this syllabus may be made during the course of the term.
SYLLABUS - D Term 2003
- Course Description
- Recommended Background
- Class Meeting
- Teaching and Senior Assistants
- Class Pictures
- Schedule of Topics, Exams, and Projects
- Weekly Schedule of Office Hours
- BS/MS Graduate Credit
Project 1: Data Pre-processing, Mining, Pruning, and Evaluation of Decision Trees
Project 2: Data Pre-processing, Mining, Pruning, and Evaluation of Classification Rules
Project 3: Data Pre-processing, Mining, and Evaluation of Association Rules
Project 4: Numeric Predictions, Instance Based learning, and Clustering
- Class Participation
- Class Mailing List
- Class Web Pages
- Additional References
- Other Online Resources
COURSE DESCRIPTION:
This undergraduate course provides an introduction to Knowledge
Discovery in Databases (KDD) and Data Mining. KDD deals with data
integration techniques and with the discovery, interpretation and
visualization of patterns in large collections of data. Topics covered
in this course include data warehousing and mediation techniques; data
mining methods such as rule-based learning, decision trees, association
rules and sequence mining; and data visualization. The work discussed
originates in the fields of artificial intelligence, machine learning,
statistical data analysis, data visualization, databases, and
information retrieval. Several scientific and industrial applications
of KDD will be described. In particular, current applications to
bioinformatics, e-commerce, and web mining will be studied.
RECOMMENDED BACKGROUND:
CS4341 Introduction to Artificial Intelligence, MA2611 Applied
Statistics I, and CS3431 Database Systems I.
CLASS MEETING:
Mondays, Tuesdays, Thursdays, Fridays 1:00-1:50 pm
Please come to class on time and stay for the whole class period.
Students are also encouraged to attend the Knowledge Discovery
in Databases and Data Mining Research Group (KDDRG) Seminar, Fridays at
2 pm in Beckett Conference Room (FL246).
Prof. Carolina Ruiz
Office: FL 232
Phone Number: (508) 831-5640
Office Hours:
- Mondays 2:00 - 3:00 pm
- Thursdays 3:00 - 4:00 pm
- or by appointment
TEACHING AND SENIOR ASSISTANTS:
Office Hours: Fuller Labs A22
- Tuesdays 6:00 - 7:00 pm
- Wednesdays 5:00 - 6:00 pm
- Fridays 4:00 - 5:00 pm
Senthil K. Palanisamy
Office Hours: Fuller Labs A22
- Sundays 6:00 - 7:00 pm
- Mondays 5:00 - 6:00 pm
- Thursdays 6:00 - 7:00 pm
Ethan A. Croteau (Senior Assistant)
Office Hours: Fuller Labs A22
- Sundays 4:00 - 5:00 pm
- Tuesdays 4:00 - 5:00 pm
Messages sent to email@example.com reach both the instructor and the
teaching and senior assistants.
Several other books on the subject and related subjects are
Some research papers will be handed out during the term.
Exam 1
Exam 2
Project/Homework 1: 12.5%
Project/Homework 2: 12.5%
Project/Homework 3: 12.5%
Project/Homework 4: 12.5%
Class Participation and Pop Quizzes: Extra Points
Your final grade will reflect your own work and achievements
during the course. Any type of cheating will be penalized
with an NR grade for the course and will be reported to the WPI Judicial Board
in accordance with the
Academic Honesty Policy.
According to the
WPI Undergraduate Catalog, "Unless otherwise indicated,
WPI courses usually carry credit of 1/3 unit. This level of activity
suggests at least 17 hours of work per week, including class and laboratory
time." Hence, you are expected to spend at least 13 hours
per week on this course outside the classroom.
BS/MS GRADUATE CREDIT
This course may be taken for graduate credit by students in the BS/MS CS program.
Written permission from the professor is required. In order to receive graduate
credit, students who have signed up for this program need to work on projects/homework
alone (that is, in groups of 1 student :-)
There will be a total of 2 exams.
Each exam will cover the
material presented in class since the beginning of the term.
In particular, the final exam is cumulative.
Exams will be in-class, 50-minute, closed-book, individual exams.
Collaboration or other outside assistance on exams is not allowed.
Regarding makeup exams,
I follow Prof. Gennert's policy:
"Makeup and/or early examinations are not given except under the most
dire of circumstances, and then only with corroborating documentation.
Note well that neither oversleeping, forgetting to show up for an exam,
nor conflicting travel arrangements are considered dire circumstances."
There will be a total of 4 projects/homework.
Each of the projects deals with one of the data mining techniques
covered in the class.
Data Mining Tool
For most of the projects, we will use the Weka system.
Weka is an excellent machine-learning/data-mining environment.
It provides a large collection of Java-based mining algorithms,
data preprocessing filters, and experimentation capabilities.
Weka is open source software issued under the GNU General Public License.
For more information on the Weka system, to download the system and
to get its documentation, look at
You should download and use the 3.2.3 GUI version of the system.
Students are expected to organize themselves into groups of exactly 2
for each of the projects/homework, except for students taking this course
for BS/MS credit who are expected to work on the projects/homework alone.
Submissions and Late Policy
See each project statement for details.
More detailed descriptions of the projects/homework will be posted to the course webpage
at the appropriate times during the term.
Although you may find similar programs/systems available online or in the
references,
the design and all code you use and submit, the results, and the analysis of the results
in your projects/homework submissions MUST be your own original work.
CLASS PARTICIPATION
Students are expected to read the material assigned for each
class in advance and to participate in class discussions.
Class participation will be taken into account when deciding
students' final grades.
CLASS MAILING LIST
There are two mailing lists for this class: email@example.com and
firstname.lastname@example.org:
- messages sent to email@example.com go to the entire class (professor, TAs/SA,
and students), and
- messages sent to firstname.lastname@example.org go to the professor and the TAs/SA only.
CLASS WEB PAGES
The web pages for this class are located at
Announcements will be posted on the web pages and/or the class mailing
list, and so you are urged to check your email and the class web pages
frequently.
ADDITIONAL REFERENCES:
Knowledge Discovery and Data Mining
- "Data Mining: Concepts and Techniques".
J. Han and M. Kamber. Morgan Kaufmann Publishers, 2001.
- "Advances in Knowledge Discovery and Data Mining". Eds.: Fayyad,
Piatetsky-Shapiro, Smyth, and Uthurusamy. The MIT Press, 1995.
- "Data Mining. Technologies, Techniques, Tools, and Trends".
B. Thuraisingham. CRC, 1998.
- "Data Mining. A hands-on approach for business professionals".
R. Groth. Prentice Hall, 1998.
- "Data Preparation for Data Mining". Dorian Pyle, 3/99.
- "Data Mining".
P. Adriaans & D. Zantinge.
- "Data Mining Methods for Knowledge Discovery".
Cios, Pedrycz, & Swiniarski, 1998.
- "Data Mining Techniques for Marketing, Sales and Customer Support".
Berry & Linoff.
- "Decision Support using Data Mining". Anand and Buchner.
- "Feature Selection for Knowledge Discovery and Data Mining".
Liu and Motoda.
- "Feature Extraction, Construction and Selection:
A Data Mining Perspective". Eds: Motoda and Liu.
- "Knowledge Acquisition from Databases". Xindong Wu.
- "Predictive Data-Mining: A Practical Guide". Weiss & Indurkhya.
- "Machine Learning and Data Mining: Methods and Applications".
Michalski, Bratko, and Kubat, 1998; John Wiley & Sons.
- "Mining Very Large Databases with Parallel Processing".
Alex Freitas & Simon Lavington.
- "Rough Sets and Data Mining: Analysis of Imprecise Data".
Eds: Lin and Cercone; Kluwer.
- "Seven Methods for Transforming Corporate Data into
Business Intelligence". Vasant Dhar and Roger Stein; Prentice-Hall.
Artificial Intelligence
- "Artificial Intelligence: A Modern Approach".
S. Russell, P. Norvig.
Prentice Hall, 1995. ISBN 0-13-103805-2
- "Artificial Intelligence: Theory and Practice".
T. Dean, J. Allen, Y. Aloimonos.
The Benjamin/Cummings Publishing Company, Inc. 1995.
- "Readings in Artificial Intelligence".
B. L. Webber, N. J. Nilsson, eds.
Tioga Publishing Company, 1981.
- Patrick H. Winston.
- "The Elements of Artificial Intelligence Using Common Lisp".
S. L. Tanimoto.
Computer Science Press, 1990.
- "Artificial Intelligence". Second edition.
E. Rich and K. Knight.
McGraw Hill, 1991.
- "Paradigms of Artificial Intelligence Programming: Case Studies
in Common Lisp".
P. Norvig.
Morgan Kaufmann Publishers, 1992.
- "Essentials of Artificial Intelligence".
M. Ginsberg.
Morgan Kaufmann Publishers.
- "Artificial Intelligence: Structures
and Strategies for Complex Problem Solving".
G. F. Luger and W. A. Stubblefield.
- "Logical Foundations of Artificial Intelligence".
M.R. Genesereth and N. Nilsson.
Morgan Kaufmann, 1987.
Statistics
- "Statistical Inference for Management and Economics".
P. Billingsley, D. Croft, D. Huntsberger, C. Watson.
Boston: Allyn and Bacon, Inc. 1986.
- "Probability and Statistics". 2nd edition.
M. DeGroot. Addison Wesley, 1986.
- "Statistical Inference".
G. Casella, R. Berger.
Wadsworth and Brooks/Cole, 1990.
OTHER ONLINE RESOURCES:
KDD Commercial Products / Prototypes
Data Warehousing and OLAP