COURSE DESCRIPTION:
This course provides an introduction to Knowledge
Discovery in Databases (KDD) and Data Mining. KDD deals
with data integration techniques and with the discovery,
interpretation and visualization of patterns in large
collections of data. Topics covered in this course
include data warehousing and mediation techniques; data
mining methods such as rule-based learning, decision
trees, association rules and sequence mining; and data
visualization. The work discussed originates in the
fields of artificial intelligence, machine learning,
statistical data analysis, data visualization, databases,
and information retrieval. Several scientific and
industrial applications of KDD will be studied.
RECOMMENDED BACKGROUND:
- MA 2611 Applied Statistics;
- CS 2223 Algorithms; and
- CS 3431 Database Systems I or CS 3733 Software Engineering.
CLASS MEETING:
Mondays and Thursdays, 11:00 am - 12:50 pm
Room: SH202
Please come to class on time and stay for the whole class period.
COURSE OUTCOMES:
- Learn and use computational techniques for data transformation, integration, and cleaning.
Practice with and evaluation of this outcome:
Projects 1, 2, 3, 4, 5. Exams 1, 2
- Learn and use computational techniques for discovering patterns and trends in data collections.
Practice with and evaluation of this outcome:
Projects 1, 2, 3, 4, 5. Exams 1, 2
- Learn and use computational approaches for constructing and evaluating predictive models and descriptive models built upon patterns discovered from data.
Practice with and evaluation of this outcome:
Projects 1, 2, 3, 4, 5. Exams 1, 2
- Apply course material to discover patterns from data in a variety of application domains.
Practice with and evaluation of this outcome:
Projects 1, 2, 3, 4, 5.
- Analyze and experimentally evaluate algorithms and implementations of data mining techniques in multiple real-world application domains.
Practice with and evaluation of this outcome:
Projects 1, 2, 3, 4, 5.
PROFESSOR:
Prof. Carolina Ruiz
Office: FL 232
Phone Number: (508) 831-5640
Office Hours:
Mondays | 2:00 pm - 3:00 pm
Fridays | 12:00 noon - 1:00 pm
If the above times don't work for you, contact Prof. Ruiz to schedule a meeting at a different time.
TEACHING ASSISTANTS:
- Artem Gritsenko
Office Hours: Fuller Labs A22
Mondays | 5:00 pm - 6:00 pm
Wednesdays | 4:00 pm - 5:00 pm
- Chiying Wang
Office Hours: Fuller Labs A22
Tuesdays | 3:00 pm - 4:00 pm
Thursdays | 9:00 am - 10:00 am
If the above times don't work for you, contact the TAs by email
to schedule a meeting at a different time.
See Class Mailing Lists
for instructions on how to reach the professor and the TAs by email.
TEXTBOOK:
Several other books on the subject and related subjects are
recommended below.
Some research papers will be handed out during the term.
GRADES:
Exam 1 | 20%
Exam 2 | 20%
Project 1 | 10%
Project 2 | 12%
Project 3 | 12%
Project 4 | 12%
Project 5 | 12%
Class Participation | 2%
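As a rough illustration of how the weights above combine, the following sketch computes a weighted final score. It assumes each component is scored on a 0-100 scale and that the final score is a plain weighted sum. The syllabus does not state the exact formula or any letter-grade cutoffs, so the function name `final_score` and the sample scores are hypothetical.

```python
# Weights taken from the GRADES table (percent of the final grade).
WEIGHTS = {
    "Exam 1": 20,
    "Exam 2": 20,
    "Project 1": 10,
    "Project 2": 12,
    "Project 3": 12,
    "Project 4": 12,
    "Project 5": 12,
    "Class Participation": 2,
}


def final_score(scores):
    """Return the weighted final score on a 0-100 scale.

    Assumption: `scores` maps every component name in WEIGHTS to a
    score between 0 and 100, and the final score is a weighted sum.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {name: 90 for name in WEIGHTS}
example["Exam 2"] = 80

print(sum(WEIGHTS.values()))  # 100: the weights cover the whole grade
print(final_score(example))   # 88.0
```

In this example, scoring 10 points lower on a single exam lowers the final score by 2 points, since each exam carries 20% of the grade.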
Your final grade will reflect your own work and achievements
during the course. Any type of cheating
will be reported to the WPI Judicial Board and penalized
in accordance with the
Academic Honesty Policy.
According to the
WPI Undergraduate Catalog, "Unless otherwise indicated,
WPI courses usually carry credit of 1/3 unit. This level of activity
suggests at least 17 hours of work per week, including class and laboratory
time." Since class meets for just under 4 hours per week,
you are expected to spend at least 13 hours per week
working on this course outside the classroom.
BS/MS GRADUATE CREDIT
This course may be taken for graduate credit by students in the BS/MS CS program.
Written permission from the professor is required. In order to receive graduate
credit, students who have signed up for this program need to work on projects/homework
alone (that is, in "groups" of 1 student).
EXAMS
Format
There will be a total of 2 exams.
Each exam will cover the
material presented in class since the beginning of the term.
In particular, the final exam is cumulative.
Exams will be in-class, closed-book, individual exams.
Collaboration or other outside assistance on exams is not allowed.
Check the course schedule for exam dates.
Makeups
Regarding makeup exams,
I follow Prof. Gennert's policy:
"Makeup and/or early examinations are not given except under the most
dire of circumstances, and then only with corroborating documentation.
Note well that neither oversleeping, forgetting to show up for an exam,
nor conflicting travel arrangements are considered dire circumstances."
PROJECTS & HOMEWORK
There will be several projects assigned during the term.
Each of the projects deals with one or more of the data mining techniques
covered in the class.
Data Mining Tool
For most of the projects, we will use the
Weka system
(http://www.cs.waikato.ac.nz/ml/weka/).
Weka is an excellent machine-learning/data-mining environment.
It provides a large collection of Java-based mining algorithms,
data preprocessing filters, and experimentation capabilities.
Weka is open source software issued under the GNU General Public License.
For more information on the Weka system, to download the system and
to get its documentation, look at
Weka's webpage
(http://www.cs.waikato.ac.nz/ml/weka/).
You should download and use the latest developer version (currently weka-3-7-11)
of the system.
Teams
Students are expected to organize themselves in groups of exactly 2
for each of the projects, except for students taking this course
for BS/MS credit, who are expected to work on the projects alone.
Each project will contain both an individual assignment and a group
assignment.
Groups need not be the same for all projects.
Submissions and Late Policy
See each project statement for details.
Project Descriptions
More detailed descriptions of the projects will be posted to the course webpage
at the appropriate times during the term.
Although you may find similar programs/systems available online or in the
references,
the design and all code you use and submit, the results, and the analysis of the results
in your projects/homework submissions MUST be your own original work.
CLASS PARTICIPATION
Students are expected to read the material assigned for each
class in advance and to participate in class discussions.
Class participation will count toward students' final grades.
CLASS MAILING LISTS AND myWPI
There are two mailing lists for this class (replace XXXX with 4445 below):
CLASS WEB PAGES
The web pages for this class are located at
http://www.cs.wpi.edu/~cs4445/b14/
Announcements will be posted on the web pages and/or the class mailing
list, and so you are urged to check your email and the class web pages
frequently.
WARNING:
Small changes to this syllabus may be made during the course
of the term.
See my list of additional
Machine Learning, AI, Data Mining, Statistics, Databases, Data Sets and other online resources.
OTHER ONLINE RESOURCES:
Previous offerings of CS4445
Webpages of my previous offerings of this course have plenty of
useful resources: practice exams, exams, homework, solutions of
those exams/hw, etc.