CS525: Advanced Topics in Database Systems
Large-Scale Data Management
Teams:
- Students will form teams of two to work on each project.
Platform:
A virtual machine containing the platform needed for the projects will be available for download. The virtual machine (around 7GB in size) consists of:
-- Ubuntu OS (Version 12.10)
-- Hadoop platform (Version 1.1.0)
-- Apache Pig (Version 0.10.0)
-- Mahout library (Version 0.7)
-- RHadoop
-- Other software, such as Java (Version 1.7), C (GCC Version 4.7), Perl (Version 5.14.2), and Python (Version 2.7.3)
VirtualBox:
The virtual machine is named "ubuntu-Hadoop-VBoxVersion.ova.zip" and can be downloaded (Here). This version requires the free "VirtualBox" software, available Here.
VMware:
The virtual machine is named "ubuntu-Hadoop-VMWareVersion.zip" and can be downloaded (Here). This version requires the commercial (not free) "VMware" software, available Here.
Note: All students have access to the VMware software installed on the Zoo lab machines, so you can work either on your own PC or laptop or in the Zoo lab.
List of Projects:
ID | Project Description | Release Date | Due Date | Link
1 | Java coding in Hadoop | 01/22/2013 | 01/31/2013 (11:59PM) | Project 1
2 | Pig and Hadoop Streaming | 02/05/2013 | 02/14/2013 (11:59PM) | Project 2
3 | Spatial Joins and Input Formats | 02/19/2013 | 02/28/2013 (11:59PM) | Project 3
4 | Clustering and Classification in Hadoop | 03/12/2013 | 03/28/2013 (11:59PM) | Project 4
5 | Elective Project | 03/30/2013 | 04/25/2013 (11:59PM) | Ideas