CS525. Advanced Topics in Database Systems
Large-Scale Data Management

Teams:

Platform:
A virtual machine that includes the platform needed for the projects will be available for download. The virtual machine (around 7GB in size) consists of:
        -- Ubuntu OS (Version 12.10)
        -- Hadoop platform (Version 1.1.0)
        -- Apache Pig  (Version 0.10.0)
        -- Mahout library (Version 0.7)
        -- RHadoop
        -- Other software, including Java (Version 1.7), C (Version 4.7), Perl (Version 5.14.2), Python (Version 2.7.3), etc.

VirtualBox:
       The virtual machine is named "ubuntu-Hadoop-VBoxVersion.ova.zip" and can be downloaded (Here). This version requires the free "VirtualBox" software, available Here.

VMware:
       The virtual machine is named "ubuntu-Hadoop-VMWareVersion.zip" and can be downloaded (Here). This version requires the "VMware" software (not free), available Here.

Note: All students have access to the VMware software on the Zoo lab machines, so you can work either on your own PC or laptop or in the Zoo lab.

  

List of Projects:

ID  Project Description                      Release Date  Due Date              Link
1   Java coding in Hadoop                    01/22/2013    01/31/2013 (11:59PM)  Project 1
2   Pig and Hadoop Streaming                 02/05/2013    02/14/2013 (11:59PM)  Project 2
3   Spatial Joins and Input Formats          02/19/2013    02/28/2013 (11:59PM)  Project 3
4   Clustering and Classification in Hadoop  03/12/2013    03/28/2013 (11:59PM)  Project 4
5   Elective Project                         03/30/2013    04/25/2013 (11:59PM)  Ideas