Movie Recommendation System
Jon C. Hammer, 04/30/2015
Introduction
• Goal:
• Build a complete movie recommendation system.
• Requirements:
• User can view personalized recommendations
• User can input new ratings
• Interactive
ALS (Alternating Least Squares)
• Collaborative filtering technique
• Alternative to user-based / item-based recommendation
• Uses matrix factorization
• Factor the ratings matrix R (u x m) into U (u x d) and M (d x m), so that R ≈ UM
• u = Number of users
• m = Number of movies
• d = Number of intrinsic dimensions (our choice)
• Alternate between optimizing U (with M held fixed) and M (with U held fixed)
• Gradient Descent
• Least Squares method
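The alternating step can be illustrated with a small NumPy sketch. It assumes a plain L2 (ridge) penalty λ on both factors, which is simpler than the weighted-λ regularization used by Zhou et al. With M fixed, each user's row u_i of U solves (M_I M_I^T + λI) u_i = M_I r_i, where M_I holds the columns of M for the movies I that user rated; the movie columns of M are then solved the same way with U fixed. The function `als` and its parameters are hypothetical, and this is not Mahout's implementation.

```python
import numpy as np


def als(R, mask, d=2, lam=0.1, iters=20, seed=0):
    """Factor R (u x m) into U (u x d) and M (d x m) by alternating least squares.

    R    -- ratings matrix; entries where mask is False are ignored
    mask -- boolean matrix, True where a rating was observed
    lam  -- L2 regularization strength (an assumed hyperparameter)
    """
    rng = np.random.default_rng(seed)
    n_users, n_movies = R.shape
    U = rng.normal(scale=0.1, size=(n_users, d))
    M = rng.normal(scale=0.1, size=(d, n_movies))
    reg = lam * np.eye(d)
    for _ in range(iters):
        # Hold M fixed: each user row is an independent regularized
        # least-squares problem over the movies that user rated.
        for i in range(n_users):
            Mi = M[:, mask[i]]                      # d x k
            U[i] = np.linalg.solve(Mi @ Mi.T + reg, Mi @ R[i, mask[i]])
        # Hold U fixed: each movie column is solved the same way
        # over the users who rated that movie.
        for j in range(n_movies):
            Uj = U[mask[:, j]]                      # k x d
            M[:, j] = np.linalg.solve(Uj.T @ Uj + reg, Uj.T @ R[mask[:, j], j])
    return U, M


# Toy ratings matrix (hypothetical data); 0 marks a missing rating.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
U, M = als(R, R > 0)
predicted = U @ M   # dense u x m matrix; unrated cells now hold predictions
```

Because every sub-problem is an exact least-squares solve, no learning rate is needed, and each user (or movie) can be solved independently, which is what makes the method easy to parallelize.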
Architecture
• Key components:
• Recommender
• Database
• Web server
• Client application
(Architecture diagram: Client Application, Web Server, Recommender, Database)
Implementation
• Platforms
• Servers hosted on AWS
• t2.micro and m3.xlarge EC2 instances
• Ubuntu 14.04 LTS
• Additional Software:
• Hadoop
• Mahout
• MySQL
• Android application
• Languages
• Python
• Bash
• Java
Recommender
• Dataset
• MovieLens
• 20 million ratings
• 27,000 movies
• 138,000 users
• Mahout
• Given ratings matrix, produces N recommendations per user
• Uses ALS algorithm
• Recommendations are used by web server
• Recomputed when new ratings are provided by user
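Producing N recommendations for a user from the factors amounts to scoring every movie and keeping the best ones the user has not rated yet. The sketch below assumes the U (users x d) and M (d x movies) shapes from the ALS slide; `top_n` and its arguments are hypothetical names and not part of Mahout's API.

```python
import numpy as np


def top_n(U, M, rated, user, n=10):
    """Return up to n movie indices with the highest predicted rating for `user`,
    skipping movies the user has already rated.

    U     -- users x d factor matrix
    M     -- d x movies factor matrix
    rated -- boolean users x movies matrix, True where a rating exists
    """
    scores = U[user] @ M                            # predicted rating for every movie
    order = np.argsort(-scores, kind="stable")      # best score first
    return [int(j) for j in order if not rated[user, j]][:n]


# Tiny example with d = 1 so the scores are easy to read: [5, 1, 3, 4].
U = np.array([[1.0]])
M = np.array([[5.0, 1.0, 3.0, 4.0]])
rated = np.array([[True, False, False, False]])     # movie 0 already rated
print(top_n(U, M, rated, user=0, n=2))              # -> [3, 2]
```

When a user submits new ratings, the rated matrix changes and the factors are recomputed, so the stored top-N lists are refreshed on the next run.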
Database
• User Table
• Login & customer information
• Movie Table
• Movie name, year, IMDB link, and poster
• Most information already provided in the dataset
• Posters scraped from IMDB in advance for use by the client application
• MySQL implementation
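The two tables might be laid out roughly as follows. This sketch uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs without a database server; the table names, column names, and the `search_movies` helper are hypothetical. With MySQLdb the same queries would use `%s` placeholders instead of `?`.

```python
import sqlite3

# Hypothetical schema; names are illustrative, not the project's actual tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    user_id       INTEGER PRIMARY KEY,
    username      TEXT UNIQUE NOT NULL,
    password_hash TEXT NOT NULL,
    email         TEXT
);
CREATE TABLE IF NOT EXISTS movies (
    movie_id INTEGER PRIMARY KEY,   -- dataset movie id
    title    TEXT NOT NULL,
    year     INTEGER,
    imdb_url TEXT,
    poster   TEXT                   -- location of the pre-scraped poster
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def search_movies(conn, text, limit=20):
    """Title search; SQLite's LIKE is case-insensitive for ASCII text."""
    cur = conn.execute(
        "SELECT movie_id, title, year FROM movies "
        "WHERE title LIKE ? ORDER BY title LIMIT ?",
        (f"%{text}%", limit),
    )
    return cur.fetchall()


conn = open_db()
conn.execute("INSERT INTO movies (movie_id, title, year) VALUES (?, ?, ?)",
             (1, "Toy Story", 1995))   # sample row
print(search_movies(conn, "toy"))      # -> [(1, 'Toy Story', 1995)]
```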
Web Server
• Interface between clients and the recommendation engine / database
• Written in Python
• Twisted, Klein, MySQLdb modules
• Communication via HTTP GET and POST requests, with data exchanged as JSON
• Returns most recent recommendations
• Interactive database queries
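The request flow can be sketched with Python's standard library in place of Twisted/Klein, so it runs without extra packages. The endpoints `/recommendations` and `/ratings`, the JSON field names, and the in-memory dictionaries standing in for MySQL and the Mahout output are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory stand-ins for the database and the recommender output.
RECOMMENDATIONS = {1: [1, 2, 3]}   # user id -> most recent recommended movie ids
NEW_RATINGS = []                   # (user, movie, rating) waiting for the next recompute


class Handler(BaseHTTPRequestHandler):
    def _send_json(self, obj, status=200):
        body = json.dumps(obj).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # GET /recommendations?user=<id> returns the latest stored list.
        url = urlparse(self.path)
        if url.path == "/recommendations":
            user = int(parse_qs(url.query).get("user", ["0"])[0])
            self._send_json({"user": user, "movies": RECOMMENDATIONS.get(user, [])})
        else:
            self._send_json({"error": "not found"}, 404)

    def do_POST(self):
        # POST /ratings with a JSON body {"user": .., "movie": .., "rating": ..}.
        if self.path == "/ratings":
            length = int(self.headers.get("Content-Length", 0))
            data = json.loads(self.rfile.read(length))
            NEW_RATINGS.append((data["user"], data["movie"], data["rating"]))
            self._send_json({"status": "ok"})
        else:
            self._send_json({"error": "not found"}, 404)

    def log_message(self, fmt, *args):
        pass  # silence the default per-request logging


def run(port=8080):
    """Serve until interrupted."""
    HTTPServer(("localhost", port), Handler).serve_forever()
```

In a Klein-based server each branch would instead be its own route function, and a posted rating would be written to MySQL so it is included the next time recommendations are recomputed.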
Client Application
• Features
• Login system
• Ability to create new accounts on the fly
• View personalized recommendations
• Search database for movies
• Enter new ratings
• Written in Java for Android
Lessons Learned
• Working with AWS
• Configuring & launching instances
• Creating images
• VPC & Security
• Hadoop / Mahout
• Installation & configuration
• HDFS
• General
• Making / responding to web requests in both Java & Python
• Website scraping
References
• Zhou, Yunhong, et al. "Large-scale parallel collaborative filtering for the Netflix Prize." Algorithmic Aspects in Information and Management. Springer Berlin Heidelberg, 2008. 337-348.
• Mahout. https://mahout.apache.org/
• Hadoop. https://hadoop.apache.org/
• MovieLens dataset. http://grouplens.org/datasets/movielens/