DS 3001: Foundations of Data Science
Tuesday and Friday 2:00 - 3:50pm, via a Zoom meeting (online learning)
Instructor: Kyumin Lee
Office Hours: Tuesday 9:30am - 10:30am and Wednesday 4:00pm - 5:00pm, via a Zoom meeting
Email: kmlee (at) wpi.edu
Teaching Assistant: Jianjun Luo
TA's Office Hours: Monday, Thursday and Friday 10:00 - 11:00 am, via a Zoom meeting
TA's email: jluo (at) wpi.edu
Course Summary
This course provides an introduction to the core ideas in Data Science. It covers a broad range of methodologies for working with real-world data and making informed decisions based on it. Students will learn how to manage and analyze data at scale (e.g., big data). Specifically, students will study big data management and processing techniques, data analytics, statistical methods and models, data visualization, and related topics.
By the end of the semester, students will be able to:
- Define and explain the key concepts and models relevant to data science.
- Design, implement, and evaluate the core algorithms underlying an end-to-end data science workflow, including the experimental design, data collection, mining, analysis, and presentation of information derived from large datasets.
- Apply "best practices" in data science, including facility with modern tools.
Communication
All course announcements will be posted via the Canvas course mailing list.
Recommended Background
Statistics knowledge equivalent to MA 2611 and MA 2612, linear algebra equivalent to MA 2071, and programming ability equivalent to (CS 1004 or CS 1101 or CS 1102) and (CS 2102 or CS 2119).
Textbooks
Course readings will be drawn from the following textbooks:
- (DMCT) Data Mining: Concepts and Techniques, 3rd edition (2012). Jiawei Han and Micheline Kamber. Morgan Kaufmann.
- (IDM) Introduction to Data Mining (2006). Pang-Ning Tan, Michael Steinbach and Vipin Kumar. Addison-Wesley.
- (MMD) Mining of Massive Datasets (2020). Jure Leskovec, Anand Rajaraman and Jeffrey D. Ullman. Cambridge University Press.
- Doing Data Science (2013). Rachel Schutt, Cathy O'Neil. O'Reilly Media.
- Mining the Social Web, 3rd Edition (2019). Mikhail Klassen, Matthew A. Russell. O'Reilly Media.
- Data Science from Scratch, 2nd Edition (2019). Joel Grus. O'Reilly Media.
- (Tufte) The Visual Display of Quantitative Information (2001). Edward R. Tufte.
- (IIR) Introduction to Information Retrieval. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Cambridge University Press.
Grading (tentative)
- Assignments 40%
- Exams 30%
- Project 30%
Detailed grading information is given in the syllabus.