WPI (Worcester Polytechnic Institute)

Computer Science Department

Improvements to Collaborative Filtering Algorithms

Anuja Gokhale

Advisor: Professor Mark Claypool

M.S. Thesis
Computer Science Department, WPI
May 1999


The explosive growth of mailing lists, Web sites and Usenet news has caused information overload. It is no longer feasible to search through every available source of information to find the items that are of interest to an individual user.

Collaborative filtering systems recommend items based on the opinions of people with similar tastes. Collaborative filtering overcomes some difficulties faced by traditional information filtering because it does not require the computer to understand the content of the items. It can also recommend items whose content is unlike anything the user has rated in the past, as long as like-minded users have rated those items. Unfortunately, collaborative filtering is ineffective when too few users have rated an item, and for users who have a short rating history or little correlation with other users.
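The abstract does not give a prediction formula. As an illustration only, the sketch below uses a common user-based scheme: Pearson correlation between users over their co-rated items, then a correlation-weighted average of other users' deviations from their own mean ratings (the GroupLens-style formula). The function names `pearson` and `predict` and the nested-dictionary rating layout are assumptions, not the thesis's code. The sketch also shows the weaknesses described above: a user with fewer than two co-rated items gets zero weight, and an item no correlated user has rated gets no prediction at all.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated.

    Returns 0.0 when there are fewer than two co-rated items or when
    either user's co-rated ratings have zero variance.
    """
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((ratings_b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(user, item, ratings):
    """Predict `user`'s rating for `item` (hypothetical helper).

    ratings: {user_name: {item_id: rating}}. The prediction is the user's
    mean rating plus the correlation-weighted average of every other
    user's deviation from their own mean on this item.
    Returns None when no correlated user has rated the item.
    """
    mine = ratings[user]
    if not mine:
        return None  # no history: nothing to correlate against
    user_mean = sum(mine.values()) / len(mine)
    num = 0.0
    den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        w = pearson(mine, theirs)
        other_mean = sum(theirs.values()) / len(theirs)
        num += w * (theirs[item] - other_mean)
        den += abs(w)
    if den == 0:
        return None  # too few (correlated) raters for this item
    return user_mean + num / den
```

For example, with `ratings = {"alice": {"a": 5, "b": 3, "c": 4}, "bob": {"a": 4, "b": 2, "c": 5, "d": 4}, "carol": {"a": 1, "b": 5, "d": 2}}`, calling `predict("alice", "d", ratings)` combines Bob's positive correlation with Carol's negative one, while `predict("alice", "zzz", ratings)` returns `None` because nobody has rated that item.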

Content-based systems filter or recommend items by analyzing their content. They perform well when users know, and can specify, the topics that interest them. Recommendations for a user are based solely on a profile built by analyzing the content of the items that user has rated in the past. Content-based filters suffer from over-specialization: when the system can recommend only items that score highly against a user's profile, the user sees only items similar to those she has already seen. It is also often difficult for content-based filters to understand the meaning of text, or even to determine the actual content of complex items.
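A minimal sketch of a keyword-profile filter, assuming a bag-of-words representation, ratings on a hypothetical scale from -1 (disliked) to 1 (liked), and cosine similarity for scoring; none of these choices is taken from the thesis. The profile is built only from the items the user has rated, so an item that shares no terms with the profile scores 0 regardless of its quality, which is the over-specialization problem described above.

```python
from collections import Counter
from math import sqrt


def term_vector(text):
    """Bag-of-words term frequencies (lower-cased, whitespace-split)."""
    return Counter(text.lower().split())


def build_profile(rated_items):
    """Build a user profile from (text, rating) pairs.

    Ratings are on a hypothetical -1..1 scale: terms from liked items
    get positive weight, terms from disliked items negative weight.
    """
    profile = Counter()
    for text, rating in rated_items:
        for term, count in term_vector(text).items():
            profile[term] += rating * count
    return profile


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def score(profile, text):
    """Score a new item's text against the user's profile."""
    return cosine(profile, term_vector(text))
```

For a profile built from a liked article "election results senate vote" and a disliked article "football match score", the text "senate passes vote" scores positive, "football score tonight" scores negative, and "weather forecast" scores exactly 0 because it shares no terms with anything the user has rated.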

We combine the strengths of content-based filtering with those of collaborative filtering to provide more accurate recommendations. We use thresholds to improve the accuracy of traditional collaborative filtering algorithms, and we design and implement a way to apply content-based filtering to an online newspaper. We compare our improved algorithms with current algorithms in both off-line and online experiments, and show that they produce more effective filters that can help manage the massive amount of information confronting us today.
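The abstract mentions thresholds and a combination of the two filter types without giving details. The sketch below shows one hypothetical form of each: a minimum-correlation threshold that decides which neighbours contribute to a collaborative prediction, and a blend that shifts weight from the content-based prediction to the collaborative one as more users rate an item. The parameter names `min_corr` and `full_trust_at`, their default values, and the linear weighting schedule are assumptions for illustration, not the method evaluated in the thesis.

```python
def thresholded_prediction(user_mean, neighbours, min_corr=0.5):
    """Collaborative prediction using only strongly correlated neighbours.

    neighbours: list of (correlation, rating_minus_neighbour_mean) pairs
    for users who rated the item. Neighbours with |correlation| below the
    hypothetical threshold `min_corr` are ignored.
    Returns None when no neighbour passes the threshold.
    """
    kept = [(w, d) for w, d in neighbours if abs(w) >= min_corr]
    den = sum(abs(w) for w, _ in kept)
    if den == 0:
        return None
    return user_mean + sum(w * d for w, d in kept) / den


def combined_prediction(content_pred, collab_pred, num_raters, full_trust_at=10):
    """Blend content-based and collaborative predictions.

    The weight on the collaborative prediction grows linearly with the
    number of users who rated the item (hypothetical schedule) and is
    capped at 1. With no collaborative prediction, fall back to content.
    """
    if collab_pred is None:
        return content_pred
    w = min(num_raters / full_trust_at, 1.0)
    return w * collab_pred + (1 - w) * content_pred
```

For instance, `thresholded_prediction(4.0, [(0.9, 1.0), (0.1, -2.0)])` drops the weakly correlated second neighbour and returns 5.0, and `combined_prediction(3.0, 5.0, 5)` weights the two predictions equally and returns 4.0. The fallback to the content-based prediction covers exactly the case where collaborative filtering has too few raters to say anything.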


Related Publications

Mark Claypool, Anuja Gokhale, Tim Miranda, Pavel Murnikov, Dmitry Netes and Matthew Sartin, Combining Content-Based and Collaborative Filters in an Online Newspaper, ACM SIGIR Workshop on Recommender Systems, Berkeley, CA, August 19, 1999.

Anuja Gokhale and Mark Claypool, Thresholds for More Accurate Collaborative Filtering, IASTED International Conference on Artificial Intelligence and Soft Computing, Honolulu, Hawaii, August 9-12, 1999.