Advanced Topics in Database Systems
Large-Scale Data Management
Date/Time: Tuesday and Thursday, 4:00pm - 5:20pm.
Instructor: Mohamed Eltabakh, FL-235, firstname.lastname@example.org
Office hours: Tuesday and Thursday, 3:00pm-4:00pm. Students are also welcome to
arrange other meeting times by email.
Overview (Catalog Info)
Advances in technology, science, hardware, software, and communication
networks have enabled many emerging applications in business
enterprises, scientific and engineering disciplines, social networks,
and government endeavors, among others, to generate and collect data at
unprecedented scale and complexity that need to be managed and analyzed
efficiently. In fact, progress and innovation in these domains and
applications are no longer hindered by their ability to collect data,
but by their ability to manage, analyze, summarize, visualize, and
discover knowledge from the collected data in a timely manner and in a
scalable fashion. In this course, we focus on studying new technologies
and infrastructures developed for large-scale data management, including
the MapReduce infrastructure, the Pregel platform, and cloud-enabled computing.
We will also cover query optimizations, access methods, storage
layouts, and energy management techniques developed over these
infrastructures. As this is an advanced course, one or more research-oriented
projects will be proposed to allow students to explore new directions and
research ideas in large-scale data management. This course will be very
useful for students pursuing research (either MSc or PhD) in database
systems and data management.
Overview of Topics
The topics are divided into four categories: (i) motivating
applications, (ii) state-of-the-art infrastructures such as Hadoop
MapReduce, (iii) optimizations on these infrastructures, and (iv)
advanced techniques and algorithms on these infrastructures. Here is
the tentative overview:
(i) Motivation and Applications
- Introduction to Large-Scale Data Management
- Application I: Scientific Data Management
- Application II:
- Application III: Business Enterprises and Log Processing
(ii) State-of-the-Art Infrastructures
- MapReduce Framework
- Pregel Platform
- Cloud-Enabled Computing
- Industry-Developed Large-Scale Distributed Platforms
(iii) Optimizations (for each infrastructure mentioned above)
- Query Optimizations
- Access Methods
- Storage Layouts and
- Energy Management in
(iv) Advanced Techniques
- Integrating MapReduce Framework with Other Data Management Technologies
- Machine learning, data mining, and statistical algorithms on MapReduce
This course has several objectives, including:
1- Learning state-of-the-art techniques in large-scale
data management that apply to many modern applications.
2- Learning how to prepare and present technical
papers, which is an essential skill for students and researchers.
3- Learning how to review papers. Reviewing
technical and scientific papers is a skill that you need to develop.
Throughout this course, you will review several papers.
4- Working on a semester-long project that can
potentially lead to a publication.
The course is organized as a
series of seminars presented by the instructor and students. The
instructor will present several lectures covering the state-of-the-art
techniques in various topics. Each student is expected to present two
to three papers on a certain topic. For a given lecture, all
non-presenting students are expected to read the presented paper and to
submit a one-page review that highlights (1) the main idea of the
paper, (2) two or three strong points, and (3) two or three weak points of
the paper.
Most of the course material will be taken from conferences in database
systems such as SIGMOD, VLDB, and ICDE.
With respect to the project, students will also form teams of two or
three to work on a semester-long
research project. An ideal project will involve implementing some of
the techniques covered in class along with some modifications or extensions to
them, or performing a comparative study of alternative techniques.
However, the project is not limited to the covered material. A good
project could result in a publishable paper.
Students are expected to have a strong background in and knowledge of relational
database management systems. Prior courses in databases, e.g., CS542 or
equivalent courses, are recommended. Students are also expected to have
strong skills in programming languages such as C or Java.
Course Load & Grading Policy
|Projects (6 or 7) |Each project will be done in teams of two.
|Presentations (1 or 2) |Each presentation will be done in teams of two. If the number of teams is large, some teams may do one presentation + an extra project.
|Reviews are done individually. Whenever a team is presenting a paper, other students are expected to read the presented paper and submit a review on it.
|Includes discussions in class
In addition to this website, the course is also available at blackboard.wpi.edu.
Please use the discussion board available at blackboard.wpi.edu
for any course-related discussion and exchange of emails.