WPI Worcester Polytechnic Institute

Computer Science Department
------------------------------------------

CS548 Knowledge Discovery and Data Mining 
Schedule of Classes - Spring 2015

PROF. CAROLINA RUIZ 

WARNING: Changes to this schedule may be made during the course of the semester. 
------------------------------------------

WEEK  DATE  DUE  TOPIC  READINGS (Tan, Steinbach, and Kumar's textbook unless noted)
1 Jan. 20   Introduction to KDD & Data Mining
Introduction to Weka
Data & Data Preparation
  • Concepts, instances, attributes
  • Data preprocessing
  • Attribute selection
Readings: Chp. 1, 2 & 3
      Jan. 27   Class cancelled because of snow  
    2 Feb. 3   Quiz
    Introduction to Python
    Data & Data Preparation (cont.)
  • Data integration
  • Data warehousing & OLAP
  • Dimensionality reduction
    Readings: Chp. 1, 2 & 3, Appendix B
    3 Feb. 10 Project 1 Quiz
    Mining process
  • Training and Testing
  • Cross validation
  • Performance evaluation
    Project 1 presentations and discussion
    Readings: Sect. 4.5
    4 Feb. 17   Quiz
    Classification
  • Decision trees
    Showcase
    Readings: Sect. 4.1-4.4.
    5 Feb. 24   Quiz. See Quiz 4 Solutions
    Numeric Predictions
  • linear regression
  • model trees
  • regression trees
    Showcase
    Readings: Appendix D.
    Witten, Frank, and Hall's textbook: Sect. 3.3, 4.6 (linear regression), and 6.6;
    and the corresponding slides and resources in Prof. Ruiz's Lecture Notes.
    6 Mar. 3 Project 2 Quiz
    Association Analysis
  • association rules
    Project 2 presentations and discussion
    Showcase
    Readings: Same as the readings assigned for Feb. 24.
      Mar. 10   Spring Break  
    7 Mar. 17   Quiz
    Association Analysis (cont.)
  • association rules
    Showcase
    Readings: Sect. 6.1-6.3, 6.7-6.9.
    8 Mar. 24   Quiz
    Cluster Analysis
  • partitioning methods
  • hierarchical methods
    Showcase
    Readings: Sect. 8.1-8.3.
    9 Mar. 31 Project 3 Quiz
    Cluster Analysis (cont.)
  • density-based methods
  • model-based methods
  • clustering evaluation
    Project 3 presentations and discussion
    Showcase
    Readings: Sect. 8.4-8.5.
    10 Apr. 7   Quiz
    Anomaly Detection
  • model-based methods
  • proximity-based methods
  • density-based methods
    Showcase
    Readings: Chp. 10
    11 Apr. 14 Project 4 Quiz
    Advanced topics
  • Visualization
  • Text mining
    Project 4 presentations and discussion
    Showcase
    Readings: Section 3.3.
    All posted visualization resources.
    COMAD 2010 tutorial slides on IR and text mining in bioinformatics: Part I and Part II. Focus on the CS aspects of the tutorial (not the biology aspects).
    Use the online textbook "Introduction to Information Retrieval" by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze as a reference for any information retrieval or text mining terminology in the COMAD tutorial that you do not understand.
    12 Apr. 21   Quiz
    Advanced topics (cont.)
  • Sequence mining
  • Multimedia data mining
    Showcase
    Readings: Sect. 7.4
    13 Apr. 28 Project 5 Quiz
    Advanced topics (cont.)
  • Web mining
  • Industrial applications of data mining
  • Scientific applications of data mining
    Project 5 presentations and discussion
    Showcase
    Readings: "Web Mining - Accomplishments & Future Directions" by Jaideep Srivastava, Prasanna Desikan, and Vipin Kumar.
    Optional: Jaideep Srivastava's Web Mining Tutorial.
    14 May 5   Project 5 presentations and discussion (cont.)
    Final remarks
    Showcase