WPI Worcester Polytechnic Institute

Computer Science Department



WARNING: Small changes to this schedule may be made during the course of the semester. 

Week  Date         Due    Chapter Topic 

 1    Jan 20, 22            1     Introduction
                                  The Knowledge Discovery in Databases 
                                  (KDD) Process. KDD Applications 

 2    Jan 27, 29            2     Data Integration: Data Warehouses, 
                                  Mediation. OLAP. Multidim. Analysis 
                                  Book Sections 2.1, 2.2, 2.3, 2.6

 3    Feb 03, 05            3     Pre-Processing, Feature Selection    

 4    Feb 10, 12            4     Primitives, Languages, and Systems
                                  Book Sections 4.1, 4.4, 4.5.

 5    Feb 17, 19            5     Concept Description 

 6    Feb 24, 26            6     Association Rules 

 7    Mar 02, 04    PJ1     9     Sequential Patterns. Similar Time Seq.

      Mar 09        HW            Review and discussion of HW solutions  

 8    Mar 16                      ** MIDTERM EXAM **

 9    Mar 23, 25    PJ2     7     Classification: Decision Trees 

10    Mar 30, Apr 1 PJ3           Rule Mining: Inductive Logic Programming

11    Apr 06, 08    PJ4     8     Clustering

12    Apr 13, 15    PJ5     9     Web Mining, XML

13    Apr 20, 22    PJ6    10     Applications and Trends

14    Apr 27, 29           10     Applications and Trends

Readings - Initial List

    The Knowledge Discovery in Databases (KDD) Process

  1. Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. "From Data Mining to Knowledge Discovery in Databases" AI Magazine, pp. 37-54. Fall 1996.

  2. Bhandari, I. et al. "Advanced Scout: Data mining and Knowledge Discovery in NBA Data" Data Mining and Knowledge Discovery Journal, Vol 1, pp 121-125. 1997.

    Data Warehouses, OLAP and Multidimensional Analysis

  3. J. Widom, "Research Problems in Data Warehousing" Fourth Int'l Conf. on Information and Knowledge Management (CIKM) 1995.

  4. S. Chaudhuri and U. Dayal "An overview of data warehousing and OLAP technology" ACM SIGMOD Record, 26(1):65-74, 1997.

  5. Gray, J. et al. "Data Cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals" Data Mining and Knowledge Discovery Journal, Vol 1, pp 29-53. 1997.

    Pre-Processing, Feature Selection

  6. Langley, P. "Selection of Relevant Features in Machine Learning" Proceedings of the AAAI Fall Symposium on Relevance. New Orleans, LA. AAAI Press. 1994.

  7. M.W. Berry, Z. Drmac, and E.R. Jessup. "Matrices, Vector Spaces, and Information Retrieval" SIAM Review. Vol. 41, No. 2, pp. 335-362. 1999.

  8. Barbara et al. "The New Jersey Data Reduction Report" Bulletin of the IEEE Computer Society Technical Committee on Data Engineering.

    Mining Association Rules

  9. R. Agrawal, T. Imielinski, and A. Swami "Mining Association rules between sets of items in large databases" Proc. of the ACM SIGMOD Int'l Conference on Management of Data, Washington D.C., May 1993, 207-216. PostScript and PDF Online.

  10. R. Agrawal, R. Srikant: "Fast Algorithms for Mining Association Rules" Proc. of the 20th Int'l Conference on Very Large Databases, Santiago, Chile, Sept. 1994. PostScript and PDF Online.

    Mining Sequential Patterns and Similar Time Sequences

  11. Srikant, R. and Agrawal, R. "Mining Sequential Patterns: Generalizations and Performance Improvements" Proc. of the Fifth Int'l Conference on Extending Database Technology (EDBT), Avignon, France, March 1996.

  12. Mannila, H., Toivonen, H., and Verkamo, A.I. "Discovery of frequent episodes in sequences" First International Conference on Knowledge Discovery and Data Mining (KDD'95) 210 - 215, Montreal, Canada, August 1995.

  13. R. Agrawal, C. Faloutsos and A. Swami. "Efficient Similarity Search in Sequence Databases". Foundations of Data Organization and Algorithms (FODO) Conference, Evanston, Illinois, Oct. 13-15, 1993. PostScript Online.

  14. C. Faloutsos, M. Ranganathan and Y. Manolopoulos. "Fast Subsequence Matching in Time-Series Databases". Proc. ACM SIGMOD, May 25-27, 1994, Minneapolis, MN. pp. 419-429. PostScript Online.

    Classification: Decision Trees

  15. J. R. Quinlan. "C4.5: Programs for Machine Learning". Morgan Kaufmann Publishers. 1993. Chapters 1 and 2.

  16. J.R. Quinlan. "Induction of Decision Trees". Machine Learning 1:81-106. 1986.

  17. R. Rastogi and K. Shim "PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning" Proc. of the 24th VLDB Conference, NY USA. 1998. PostScript Online.

    Rule-Based Mining: Inductive Logic Programming

  18. J.R. Quinlan. "Learning Logical Definitions from Relations". Machine Learning 5:239-266. 1990.

  19. I. Bratko and S. Muggleton. "Applications of Inductive Logic Programming". Communications of the ACM. Vol. 38, No. 11, pp 65-70. 1995. Available online from the WPI library (e-journal collection).

    Evaluation of Patterns and Visualization

  20. J.A. Wise, J.J. Thomas, K. Pennock, D. Lantrip, M. Pottier, A. Schur, V. Crow. "Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents". Proc. IEEE Information Visualization Symposium, pp 51-58. IEEE Computer Society Press. 1995.

    Clustering

  21. S. Guha, R. Rastogi and K. Shim. "CURE: An efficient algorithm for clustering large databases". In Proceedings of ACM-SIGMOD 1998 International Conference on Management of Data, Seattle, 1998. Available from: http://www.bell-labs.com/user/rastogi/

  22. P. S. Bradley, U. M. Fayyad and C. Reina. "Scaling Clustering Algorithms to Large Databases". Fourth International Conference on Knowledge Discovery & Data Mining KDD-98, pages 9-15. AAAI Press, Menlo Park, CA, 1998. Available from: http://www.research.microsoft.com/users/bradley/papers.html

    Web Mining, XML

  23. R. Cooley, B. Mobasher, and J. Srivastava. "Web Mining: Information and Pattern Discovery on the World Wide Web." Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'97), November 1997. Available from http://maya.cs.depaul.edu/~mobasher/pubs.html

  24. M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. "Learning to Extract Symbolic Knowledge from the World Wide Web." Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98). Available from http://www.cs.cmu.edu/~webkb/

  25. Reference for XML.

    Selected Applications

  26. Tom Fawcett, Foster Provost. "Adaptive Fraud Detection". Data Mining and Knowledge Discovery Journal. Volume 1, Issue 3, 1997. pp. 291-316. Available from http://www.wkap.nl/oasis.htm/145812

  27. Kenneth C. Cox, Stephen G. Eick, Graham J. Wills, Ronald J. Brachman. "Brief Application Description; Visual Data Mining: Recognizing Telephone Calling Fraud". Data Mining and Knowledge Discovery Journal. Volume 1, Issue 2, 1997. pp. 225-231. Available from http://www.wkap.nl/oasis.htm/140561