WPI Worcester Polytechnic Institute

Computer Science Department


------------------------------------------

CS 525M KNOWLEDGE DISCOVERY AND DATA MINING  
SCHEDULE OF CLASSES - Fall 2001

PROF. CAROLINA RUIZ 

WARNING: Small changes to this schedule may be made during the course of the semester. 
------------------------------------------
Week  Date         Due    Chapter Topic 

 1    Sep 04, 06            1     Introduction
                                  The Knowledge Discovery in Databases 
                                  (KDD) Process. KDD Applications 

 2    Sep 11, 13            2     Data Integration: Data Warehouses, 
                                  Mediation. OLAP. Multidim. Analysis 

 3    Sep 18, 20            3     Pre-Processing, Feature Selection    

 4    Sep 25, 27            4     Primitives, Languages, and Systems

 5    Oct 02, 04            5     Concept Description 

 6    Oct 09, 11    HW      6     Association Rules 

 7    Oct 16, 18                  Review and ** MIDTERM EXAM **

 8    Oct 23, 25    PJ1     9     Sequential Patterns. Similar Time Seq.

 9    Oct 30, Nov 1 PJ2     7     Classification: Decision Trees

10    Nov 06, 08    PJ3           Rule Mining: Inductive Logic Programming

11    Nov 13, 15    PJ4     8     Clustering

      Nov 20                      Make-up class

12    Nov 27, 29                  Web Mining, XML

13    Dec 04, 06    PJ5    10     Applications and Trends
                        
14    Dec 11, 13    PJ6    10     Applications and Trends


Readings

    The Knowledge Discovery in Databases (KDD) Process

  1. Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. "From Data Mining to Knowledge Discovery in Databases" AI Magazine, pp. 37-54. Fall 1996.

  2. Bhandari, I. et al. "Advanced Scout: Data Mining and Knowledge Discovery in NBA Data" Data Mining and Knowledge Discovery Journal, Vol 1, pp 121-125. 1997.

    Data Warehouses, OLAP and Multidimensional Analysis

  3. J. Widom, "Research Problems in Data Warehousing" Fourth Int'l Conf. on Information and Knowledge Management (CIKM) 1995.

  4. S. Chaudhuri and U. Dayal "An overview of data warehousing and OLAP technology" ACM SIGMOD Record, 26(1):65-74, 1997.

  5. Gray, J. et al. "Data Cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals" Data Mining and Knowledge Discovery Journal, Vol 1, pp 29-53. 1997.

    Pre-Processing, Feature Selection

  6. Langley, P. "Selection of Relevant Features in Machine Learning" Proceedings of the AAAI Fall Symposium on Relevance. New Orleans, LA. AAAI Press. 1994.

  7. M.W. Berry, Z. Drmac, and E.R. Jessup. "Matrices, Vector Spaces, and Information Retrieval" SIAM Review. Vol. 41, No. 2, pp. 335-362.

  8. Barbara et al. "The New Jersey Data Reduction Report" Bulletin of the IEEE Computer Society Technical Committee on Data Engineering.

    Mining Association Rules

  9. R. Agrawal, T. Imielinski, and A. Swami "Mining Association rules between sets of items in large databases" Proc. of the ACM SIGMOD Int'l Conference on Management of Data, Washington D.C., May 1993, 207-216. PostScript and PDF Online.

  10. R. Agrawal, R. Srikant: "Fast Algorithms for Mining Association Rules" Proc. of the 20th Int'l Conference on Very Large Databases, Santiago, Chile, Sept. 1994. PostScript and PDF Online.

    Mining Sequential Patterns and Similar Time Sequences

  11. Srikant, R. and Agrawal, R. "Mining Sequential Patterns: Generalizations and Performance Improvements" Proc. of the Fifth Int'l Conference on Extending Database Technology (EDBT), Avignon, France, March 1996.

  12. Mannila, H., Toivonen, H., and Verkamo, A.I. "Discovery of frequent episodes in sequences" First International Conference on Knowledge Discovery and Data Mining (KDD'95) 210 - 215, Montreal, Canada, August 1995.

  13. R. Agrawal, C. Faloutsos and A. Swami. "Efficient Similarity Search in Sequence Databases". Foundations of Data Organization and Algorithms (FODO) Conference, Evanston, Illinois, Oct. 13-15, 1993. PostScript Online.

  14. C. Faloutsos, M. Ranganathan and Y. Manolopoulos. "Fast Subsequence Matching in Time-Series Databases". Proc. ACM SIGMOD, May 25-27, 1994, Minneapolis, MN. pp. 419-429. PostScript Online.

    Classification: Decision Trees

  15. J. R. Quinlan. "C4.5: Programs for Machine Learning". Morgan Kaufmann Publishers. 1993. Chapters 1 and 2.

  16. J.R. Quinlan. "Induction of Decision Trees". Machine Learning 1:81-106. 1986.

  17. R. Rastogi and K. Shim "PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning" Proc. of the 24th VLDB Conference, NY USA. 1998. PostScript Online.

    Rule-Based Mining: Inductive Logic Programming

  18. J.R. Quinlan. "Learning Logical Definitions from Relations". Machine Learning 5:239-266. 1990.

  19. I. Bratko and S. Muggleton. "Applications of Inductive Logic Programming". Communications of the ACM. Vol. 38, No. 11, pp 65-70. 1995. Available online from the WPI library (e-journal collection)

    Evaluation of Patterns and Visualization

  20. J.A. Wise, J.J. Thomas, K. Pennock, D. Lantrip, M. Pottier, A. Schur, V. Crow. "Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents". Proc. IEEE Information Visualization Symposium, pp 51-58. IEEE Computer Society Press. 1995.

    Clustering

  21. S. Guha, R. Rastogi and K. Shim. "CURE: An efficient algorithm for clustering large databases". In Proceedings of ACM-SIGMOD 1998 International Conference on Management of Data, Seattle, 1998. Available from: http://www.bell-labs.com/user/rastogi/

  22. P. S. Bradley, U. M. Fayyad and C. Reina. "Scaling Clustering Algorithms to Large Databases". Fourth International Conference on Knowledge Discovery & Data Mining KDD-98, pages 9-15. AAAI Press, Menlo Park, CA, 1998. Available from: http://www.research.microsoft.com/users/bradley/papers.html

    Web Mining, XML

  23. R. Cooley, B. Mobasher, and J. Srivastava, "Web Mining: Information and Pattern Discovery on the World Wide Web." Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'97), November 1997. Available from http://maya.cs.depaul.edu/~mobasher/pubs.html

  24. M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. "Learning to Extract Symbolic Knowledge from the World Wide Web." Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98). Available from http://www.cs.cmu.edu/~webkb/

  25. Reference for XML.

    Selected Applications

  26. Tom Fawcett, Foster Provost. "Adaptive Fraud Detection". Data Mining and Knowledge Discovery Journal. Volume 1, Issue 3, 1997. pp. 291-316. Available from http://www.wkap.nl/oasis.htm/145812

  27. Kenneth C. Cox, Stephen G. Eick, Graham J. Wills, Ronald J. Brachman. "Brief Application Description; Visual Data Mining: Recognizing Telephone Calling Fraud". Data Mining and Knowledge Discovery Journal. Volume 1, Issue 2, 1997. pp. 225-231. Available from http://www.wkap.nl/oasis.htm/140561