Project #2 - The Netflix Challenge
Netflix is an online movie rental company. Their website sports a feature that is fairly common nowadays, namely it
displays recommendations of movies you might like based on what you liked before.
Think about it a moment. How would you solve this problem? Your input is a large list of people, each one containing
their ratings for different movies they have seen. Aside from this, you know nothing about your clients. How do you even
begin to form a recommendation, let alone good ones?
It's a fascinating problem which is keeping many computer science labs around the world busy. It is called the
collaborative filtering problem.
Netflix decided that, while their approach did a decent job, it could be improved. If their recommendations were
more accurate, their customer would be renting more movies, and they would be making more money.
So in October 2006, they set up a competition with the following rules. Netflix made available the first half of the
content of a large database of ratings. The goal is to predict the ratings in the second half of the database (which was
kept secret.) The first person or group to demonstrate an algorithm that is 10% better than Netflix's own wins. The
prize? One million dollar.
For your next project, you are going to tackle a simplified version the Netflix Challenge problem.
Part 1
-
On paper, draw the class diagram for the following four classes, as well as all classes you need to define lists of those classes.
A Person has an ID number (but no name, the customers are
anonymous in the dataset,) and a list of RatingGiven objects. A RatingGiven contains a Movie being rated, the date, and
the rating given. A Movie has a name, an ID number, and a list of RatingReceived. A RatingReceived contains the Person
who gave the rating to that movie, the date and the rating given. Notice how the classes are mutually referential. The
data structure you will use is a cyclic collection of mutually referential classes, which will need to be constructed
via mutation. Dealing with this complexity is the most difficult aspect of this project.
- Write the Java code that implements the classes you have sketched in your class diagram.
- Define a class called Netflix which contains a list of movies and a list of people.
- Use your class definitions to write examples of the data. Use mutation to close the cycles.
Write a sufficient number of examples to be able to
write effective tests during Part 2. Write tests that are sufficiently simple that you can deduce the answer on your own,
before writing the rest of the program.
Part 2
This part is quite hard. Do not try to solve this alone. Work with your partner as much as you can!
Implement the following methods:
- Netflix.recommendASecondMovie(String movieName, int n)
When someone rents a movie, the website suggest a second movie to accompany the first. Look at all the people
who have rated the given movie. What other movies have they rated? Which ones did they rate the highest, on average?
Write a function that returns a list of MovieRecommendation
objects. A recommendation object contains the name of the recommended movie, and
the average of the ratings used to compute this recommendation. Your function should return the n
highest-scoring recommendations, sorted so that that highest rated recommendations come first. Your function may return fewer than n result if there are fewer than n other people in your example who have rated the given movie.
- Netflix.recommendAPair(int n)
On the homepage, before the customer begins to browse, the website
presents pairs of movies that go well together. To support this, you must be able to identify the most popular pairs
of movies. The popularity of a pair of movies is the average of the ratings of both movies, as given by all the people
who have rating these two movies. Do not count people who have rated one movie but not both. Write a function
that returns a sorted list of at most n PairRecommendation objects containing the most popular pairs over all, and the average
rating.
Once you are done
Once you're done, name your file Project-2.java, lists the name of all your
homework partners at the top of the file, then submit it to Turnin. Submit your class diagram in class Tuesday.
If you are coding in Eclipse, make a zip file containing all of your project's file and submit it. To make the zip, in Eclipse, select File, choose Export. Expand the General folder and choose Archive File. Eclipse will ask you for the file name of the zip file.
Turning is
at http://turnin.cs.wpi.edu:8087/servlets/turnin.ss
Remember, the deadline is Monday, November 9, 2009 at 11:59 PM.
The project deadline has been pushed by 24 hours, to accommodate the groups that
have had to migrate to Eclipse. The new deadline is Tuesday, November 10, 2009, at 11:59.

[CS2102]