WPI Worcester Polytechnic Institute

Computer Science Department



Week    Date    Due     Topic

 1      Sep 09          Introduction
                        The Knowledge Discovery in Databases (KDD) Process
                        KDD Applications

 2      Sep 16          Data Integration: Data Warehouses, Mediation

 3      Sep 23          OLAP and Multidimensional Analysis

 4      Sep 30          Pre-Processing, Feature Selection

 5      Oct 07          Mining Association Rules

 6      Oct 14  PJ1     Mining Sequential Patterns and Similar Time Sequences

 7      Oct 21          Review and ** EXAM I **

 8      Oct 28          Classification: Decision Trees

 9      Nov 04          Rule-Based Mining: Inductive Logic Programming

10      Nov 11  PJ2     Regression: Instance-Based Learning

11      Nov 18          Evaluation of Patterns and Visualization

12      Dec 02          Clustering

13      Dec 09  PJ3     Web Mining, XML
                        Project presentation I

14      Dec 16          Project presentation II

Assigned Readings

    The Knowledge Discovery in Databases (KDD) Process

  1. Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. "From Data Mining to Knowledge Discovery in Databases" AI Magazine, pp. 37-54. Fall 1996.

  2. Bhandari, I. et al. "Advanced Scout: Data mining and Knowledge Discovery in NBA Data" Data Mining and Knowledge Discovery Journal, Vol 1, pp 121-125. 1997.

    Data Warehouses, OLAP and Multidimensional Analysis

  3. J. Widom, "Research Problems in Data Warehousing" Fourth Int'l Conf. on Information and Knowledge Management (CIKM) 1995.

  4. S. Chaudhuri and U. Dayal "An overview of data warehousing and OLAP technology" ACM SIGMOD Record, 26(1):65-74, 1997.

  5. (Not included in the exam) Gray, J. et al. "Data Cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals" Data Mining and Knowledge Discovery Journal, Vol 1, pp 29-53. 1997.

    Pre-Processing, Feature Selection

  6. Langley, P. "Selection of Relevant Features in Machine Learning" Proceedings of the AAAI Fall Symposium on Relevance. New Orleans, LA. AAAI Press. 1994.

  7. M.W. Berry, Z. Drmac, and E.R. Jessup. "Matrices, Vector Spaces, and Information Retrieval" SIAM Review. Vol. 41, No. 2, pp. 335-362. 1999.

  8. (Not included in the exam) Barbara et al. "The New Jersey Data Reduction Report" Bulletin of the IEEE Computer Society Technical Committee on Data Engineering.

    Mining Association Rules

  9. R. Agrawal, T. Imielinski, and A. Swami "Mining Association Rules between Sets of Items in Large Databases" Proc. of the ACM SIGMOD Int'l Conference on Management of Data, Washington D.C., May 1993, pp. 207-216. PostScript and PDF Online.

  10. R. Agrawal, R. Srikant: "Fast Algorithms for Mining Association Rules" Proc. of the 20th Int'l Conference on Very Large Databases, Santiago, Chile, Sept. 1994. PostScript and PDF Online.

    Mining Sequential Patterns and Similar Time Sequences

  11. Srikant, R. and Agrawal, R. "Mining Sequential Patterns: Generalizations and Performance Improvements" Proc. of the Fifth Int'l Conference on Extending Database Technology (EDBT), Avignon, France, March 1996.

  12. Mannila, H., Toivonen, H., and Verkamo, A.I. "Discovery of frequent episodes in sequences" First International Conference on Knowledge Discovery and Data Mining (KDD'95), pp. 210-215, Montreal, Canada, August 1995.

  13. (Not included in the exam) R. Agrawal, C. Faloutsos and A. Swami. "Efficient Similarity Search in Sequence Databases". Foundations of Data Organization and Algorithms (FODO) Conference, Evanston, Illinois, Oct. 13-15, 1993. PostScript Online.

  14. (Not included in the exam) C. Faloutsos, M. Ranganathan and Y. Manolopoulos. "Fast Subsequence Matching in Time-Series Databases". Proc. ACM SIGMOD, May 25-27, 1994, Minneapolis, MN. pp. 419-429. PostScript Online.

    Classification: Decision Trees

  15. J. R. Quinlan. "C4.5: Programs for Machine Learning". Morgan Kaufmann Publishers. 1993. Chapters 1 and 2.

  16. J.R. Quinlan. "Induction of Decision Trees". Machine Learning 1:81-106. 1986.

  17. R. Rastogi and K. Shim "PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning" Proc. of the 24th VLDB Conference, NY USA. 1998. PostScript Online.

    Rule-Based Mining: Inductive Logic Programming

  18. J.R. Quinlan. "Learning Logical Definitions from Relations". Machine Learning 5:239-266. 1990.

  19. I. Bratko and S. Muggleton. "Applications of Inductive Logic Programming". Communications of the ACM. Vol. 38, No. 11, pp 65-70. 1995. Available online from the WPI library (e-journal collection)

    Regression: Instance-Based Learning

  20. Tom M. Mitchell "Machine Learning" McGraw-Hill 1997. Chapter 8.

    Evaluation of Patterns and Visualization

  21. J.A. Wise, J.J. Thomas, K. Pennock, D. Lantrip, M. Pottier, A. Schur, V. Crow. "Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents". Proc. IEEE Information Visualization Symposium, pp 51-58. IEEE Computer Society Press. 1995.

    Clustering

  22. S. Guha, R. Rastogi and K. Shim. "CURE: An efficient algorithm for clustering large databases". In Proceedings of ACM-SIGMOD 1998 International Conference on Management of Data, Seattle, 1998. Available from: http://www.bell-labs.com/user/rastogi/

  23. P. S. Bradley, U. M. Fayyad and C. Reina. "Scaling Clustering Algorithms to Large Databases". Fourth International Conference on Knowledge Discovery & Data Mining KDD-98, pages 9-15. AAAI Press, Menlo Park, CA, 1998. Available from: http://www.research.microsoft.com/users/bradley/papers.html

    Web Mining, XML

  24. R. Cooley, B. Mobasher, and J. Srivastava. "Web Mining: Information and Pattern Discovery on the World Wide Web." Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'97), November 1997. Available from http://maya.cs.depaul.edu/~mobasher/pubs.html

  25. M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. "Learning to Extract Symbolic Knowledge from the World Wide Web." Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98). Available from http://www.cs.cmu.edu/~webkb/

  26. Reference for XML.