Lessons from the Netflix Prize
Robert Bell, AT&T Labs-Research
In collaboration with Chris Volinsky, AT&T Labs-Research, and Yehuda Koren, Yahoo! Research


Page 1

Lessons from the Netflix Prize

Robert Bell AT&T Labs-Research

In collaboration with

Chris Volinsky, AT&T Labs-Research & Yehuda Koren, Yahoo! Research

Page 2

“We’re quite curious, really. To the tune of one million dollars.” – Netflix Prize rules

• Goal: improve on Netflix’s existing movie recommendation technology

• Contest began October 2, 2006

• Prize
  – Based on reduction in root mean squared error (RMSE) on test data (a small RMSE sketch follows this list)
  – $1,000,000 grand prize for a 10% drop in RMSE (19% drop in MSE)
  – Or, $50,000 progress prize for the best result each year
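For reference, RMSE over a set of held-out ratings is the square root of the mean squared prediction error. A minimal sketch (the numbers are made up; this is not contest code):

```python
import math

def rmse(predicted, actual):
    """Root mean squared error over paired predicted and true ratings."""
    assert len(predicted) == len(actual)
    sse = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(sse / len(predicted))

# Toy example; the grand prize required a 10% reduction relative to Netflix's own RMSE.
print(rmse([3.8, 2.9, 4.4], [4, 3, 5]))
```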

Page 3

Data Details

• Training data
  – 100 million ratings (1 to 5 stars)
  – 6 years (2000-2005)
  – 480,000 users
  – 17,770 “movies”

• Test data
  – Last few ratings of each user
  – Split as shown on next slide

Page 4

Test Data Split into Three Pieces

• Probe
  – Ratings released
  – Allows participants to assess methods directly

• Daily submissions allowed for combined Quiz/Test data
  – Identity of Quiz cases withheld
  – RMSE released for Quiz
  – Test RMSE withheld
  – Prizes based on Test RMSE

Page 5

Higher Mean Rating in Probe Data

[Bar chart: percentage of ratings at each star value (1 to 5) in the Training data (mean = 3.60) versus the Probe data (mean = 3.67).]

Page 6

Something Happened in Early 2004

[Chart with the year 2004 marked on the time axis.]

Page 7

Data about the Movies

Most Loved Movies (avg rating, count)
  – The Shawshank Redemption: 4.593, 137812
  – Lord of the Rings: The Return of the King: 4.545, 133597
  – The Green Mile: 4.306, 180883
  – Lord of the Rings: The Two Towers: 4.460, 150676
  – Finding Nemo: 4.415, 139050
  – Raiders of the Lost Ark: 4.504, 117456

Most Rated Movies
  – Miss Congeniality
  – Independence Day
  – The Patriot
  – The Day After Tomorrow
  – Pretty Woman
  – Pirates of the Caribbean

Highest Variance
  – The Royal Tenenbaums
  – Lost in Translation
  – Pearl Harbor
  – Miss Congeniality
  – Napoleon Dynamite
  – Fahrenheit 9/11

Page 8

Most Active Users

User ID # Ratings Mean Rating

305344 17,651 1.90

387418 17,432 1.81

2439493 16,560 1.22

1664010 15,811 4.26

2118461 14,829 4.08

1461435 9,820 1.37

1639792 9,764 1.33

1314869 9,739 2.95

Page 9

Major Challenges

1. Size of data
  – Places premium on efficient algorithms
  – Stretched memory limits of standard PCs

2. 99% of data are missing
  – Eliminates many standard prediction methods
  – Certainly not missing at random

3. Training and test data differ systematically
  – Test ratings are later
  – Test cases are spread uniformly across users

Page 10

Major Challenges (cont.)

4. Countless factors may affect ratings
  – Genre; movie vs. TV series vs. other
  – Style of action, dialogue, plot, music, etc.
  – Director, actors
  – Rater’s mood

5. Large imbalance in training data
  – Number of ratings per user or movie varies by several orders of magnitude
  – Information to estimate individual parameters varies widely

Page 11

Ratings per Movie in Training Data

Avg #ratings/movie: 5627

Page 12

Ratings per User in Training Data

Avg #ratings/user: 208

Page 13

The Fundamental Challenge

• How can we estimate as much signal as possible where there are sufficient data, without overfitting where data are scarce?

Page 14

Recommender Systems

• Personalized recommendations of items (e.g., movies) to users

• Increasingly common
  – To deal with explosive number of choices on the internet
  – Netflix
  – Amazon
  – Many others

Page 15

Content Based Systems

• A pre-specified list of attributes

• Score each item on all attributes

• User interest obtained for the same attributes
  – Direct solicitation, or
  – Estimated based on user ratings, purchases, or other behavior

Page 16

Pandora

• Music recommendation system

• Songs rated on 400+ attributes
  – Music genome project
  – Roots, instrumentation, lyrics, vocals

• Two types of user feedback
  – Seed songs
  – Thumbs up/down for recommended songs

Page 17

Collaborative Filtering (CF)

• Avoids need for:
  – Determining “proper” content
  – Collecting information about items or users

• Infers user-item relationships from purchases or ratings

• Used by Amazon and Netflix

• Two main CF tools
  – Nearest neighbors
  – Latent factor models

Page 18

Nearest Neighbor Methods

• Most common CF tool at the beginning of the contest

• Predict the rating for a specific user-item pair based on ratings of
  – Similar items by the same user
  – Or vice versa (similar users, same item)

• Similarity weights: Pearson correlation or cosine similarity

\hat{r}_{ui} = \frac{\sum_{j \in N(i;u)} s_{ij}\, r_{uj}}{\sum_{j \in N(i;u)} s_{ij}}
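A minimal sketch of this neighborhood prediction in Python, assuming an item-item similarity matrix has already been computed (Pearson or cosine); the data structures and the cutoff k are illustrative, not the contest implementation:

```python
def predict_knn(user_ratings, target_item, sim, k=20):
    """Weighted average of the user's ratings on the k items most similar to the target:
    r_hat_ui = sum_j s_ij * r_uj / sum_j s_ij over j in N(i; u).

    user_ratings: {item_id: rating} for one user
    sim: {item_i: {item_j: similarity}}
    """
    neighbors = [(sim.get(target_item, {}).get(j, 0.0), r) for j, r in user_ratings.items()]
    neighbors = sorted(neighbors, reverse=True)[:k]   # keep the k most similar rated items
    num = sum(s * r for s, r in neighbors if s > 0)
    den = sum(s for s, _ in neighbors if s > 0)
    return num / den if den > 0 else None             # caller falls back to a baseline

# Toy usage
sim = {"A": {"B": 0.9, "C": 0.2}}
print(predict_knn({"B": 5, "C": 3}, "A", sim, k=2))   # ~4.64
```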

Page 19

Merits of Nearest Neighbors

• Few modeling assumptions

• Few tuning parameters to learn

• Easy to explain to users

– Dear Amazon.com Customer, We've noticed that customers who have purchased or rated How Does the Show Go On: An Introduction to the Theater by Thomas Schumacher have also purchased Princess Protection Program #1: A Royal Makeover (Disney Early Readers).

Page 20

Latent Factor Models

• Models with latent classes of items and users
  – Individual items and users are assigned to either a single class or a mixture of classes

• Neural networks
  – Restricted Boltzmann machines

• Singular Value Decomposition (SVD)
  – AKA matrix factorization
  – Items and users described by unobserved factors
  – Main method used by leaders of the competition

Page 21

SVD

• Dimension reduction technique for matrices

• Each item summarized by a d-dimensional vector q_i

• Similarly, each user summarized by p_u

• Choose d much smaller than the number of items or users
  – e.g., d = 50 << 18,000 or 480,000

• Predicted rating for item i by user u
  – Inner product of q_i and p_u
  – \hat{r}_{ui} = q_i' p_u, or \hat{r}_{ui} = a_u + b_i + q_i' p_u
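As a toy illustration of the prediction rule (all factor and bias values below are invented):

```python
import numpy as np

q_i = np.array([0.2, -1.1, 0.5])   # item factors (hypothetical, d = 3)
p_u = np.array([0.7, 0.3, -0.4])   # user factors (hypothetical)
a_u, b_i = 3.6, 0.2                # user and item baselines (hypothetical)

r_hat = a_u + b_i + q_i @ p_u      # r_hat_ui = a_u + b_i + q_i' p_u
print(round(float(r_hat), 2))      # 3.41
```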

Page 22

[Figure: movies placed in a two-dimensional factor space. Horizontal axis: geared towards females vs. geared towards males; vertical axis: serious vs. escapist. Titles shown: The Princess Diaries, Sense and Sensibility, The Color Purple, Amadeus, The Lion King, Ocean’s 11, Braveheart, Lethal Weapon, Independence Day, Dumb and Dumber.]

Page 23

[The same two-dimensional factor space, now with users Gus and Dave placed among the movies.]

Page 24

Regularization for SVD

• Want to minimize SSE for Test data

• One idea: minimize SSE for Training data
  – Want large d to capture all the signals
  – But Test RMSE begins to rise for d > 2

• Regularization is needed
  – Allow a rich model where there are sufficient data
  – Shrink aggressively where data are scarce

• Minimize

\sum_{(u,i) \in \text{Training}} \left( r_{ui} - q_i' p_u \right)^2 + \lambda \left( \sum_u \lVert p_u \rVert^2 + \sum_i \lVert q_i \rVert^2 \right)
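A minimal sketch of evaluating this objective over training triples; the array shapes and the value of lambda are illustrative:

```python
import numpy as np

def regularized_sse(ratings, P, Q, lam):
    """ratings: list of (u, i, r); P: user factors (n_users x d); Q: item factors (n_items x d)."""
    sse = sum((r - Q[i] @ P[u]) ** 2 for u, i, r in ratings)
    penalty = lam * (np.sum(P ** 2) + np.sum(Q ** 2))   # lambda * (sum ||p_u||^2 + sum ||q_i||^2)
    return sse + penalty

rng = np.random.default_rng(0)
P, Q = rng.normal(size=(4, 2)), rng.normal(size=(3, 2))
print(regularized_sse([(0, 1, 4.0), (2, 2, 3.0)], P, Q, lam=0.05))
```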

Page 25

[The factor-space figure again, with user Gus among the movies; this frame repeats on the following slides (originally Pages 26-28) during the regularization discussion.]


Page 29

Estimation for SVD

• Fit by gradient descent (sketched below)
  – Loop over observed ratings
  – Update each relevant parameter
  – Small step in each parameter, proportional to the gradient
  – Repeat until convergence

• Alternatively, fit by a sequence of ridge regressions
  – Fix item factors
  – Loop over users, estimating user factors
  – Do the same to estimate item factors
  – Repeat until convergence
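A minimal stochastic gradient descent sketch for the regularized objective on Page 24; the learning rate, lambda, dimension, and epoch count are illustrative, and baseline/bias terms are omitted for brevity:

```python
import numpy as np

def train_svd_sgd(ratings, n_users, n_items, d=50, lam=0.05, lr=0.01, n_epochs=20):
    """ratings: list of (u, i, r) triples. Returns user factors P and item factors Q."""
    rng = np.random.default_rng(0)
    P = 0.1 * rng.normal(size=(n_users, d))
    Q = 0.1 * rng.normal(size=(n_items, d))
    for _ in range(n_epochs):                 # repeat until convergence (fixed epochs here)
        for u, i, r in ratings:               # loop over observed ratings
            err = r - Q[i] @ P[u]             # prediction error for this rating
            pu = P[u].copy()
            # small step in each parameter, proportional to the gradient
            P[u] += lr * (err * Q[i] - lam * pu)
            Q[i] += lr * (err * pu - lam * Q[i])
    return P, Q

# Toy usage: three users, two items
P, Q = train_svd_sgd([(0, 0, 5), (1, 0, 3), (2, 1, 1)], n_users=3, n_items=2, d=4)
print(float(Q[0] @ P[0]))
```

The alternating ridge-regression approach in the second bullet instead solves for all user factors with the item factors held fixed (each a small d-dimensional least-squares problem), then swaps roles, and iterates.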

Page 30

Improvements to Collaborative Filtering

• Fine tune existing methods

• Incorporate alternative “effects”

• Incorporate a variety of modeling methods

• Careful regularization to avoid overfitting

Page 31

Localized SVD

• SVD uses all of a user’s ratings to train the user’s factors

• But what if the user is multiple people?
  – Different factor values may apply to movies rated by Mom vs. Dad vs. the Kids

• This approach computes user factors p_u specific to the movie being predicted
  – Given all the {q_i}, p_u is the solution of a ridge regression
  – Weighted ridge regressions put higher weights on movies similar to the target movie (see the sketch below)
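A minimal sketch of that weighted ridge-regression step, assuming the item factors are already trained; the weights and lambda are illustrative:

```python
import numpy as np

def localized_user_factor(Q_rated, r, w, lam=0.1):
    """Solve for one user's factor vector p_u from the movies that user rated.

    Q_rated: (m x d) factors of the m rated movies
    r:       (m,) the user's ratings (ideally with baselines removed)
    w:       (m,) weights, larger for movies similar to the target movie
    Minimizes sum_j w_j * (r_j - q_j' p)^2 + lam * ||p||^2.
    """
    d = Q_rated.shape[1]
    A = Q_rated.T @ (w[:, None] * Q_rated) + lam * np.eye(d)
    b = Q_rated.T @ (w * r)
    return np.linalg.solve(A, b)

rng = np.random.default_rng(1)
Q_rated = rng.normal(size=(5, 3))
print(localized_user_factor(Q_rated, r=np.array([4.0, 3.0, 5.0, 2.0, 4.0]),
                            w=np.array([1.0, 0.2, 0.8, 0.1, 0.5])))
```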

Page 32

Improvement from Localized SVD

Page 33

Lesson #1: Data >> Models

• Very limited feature set
  – User, movie, date
  – Places focus on models/algorithms

• Major steps forward associated with incorporating new data features
  – What movies a user rated
  – Temporal effects

Page 34

You are What You Rate

• What you rate (and don’t) provides information about your preferences

• Paterek’s NSVD explicitly characterizes users by which movies they like

• Incorporate what a user rated into the user factor (formula below)

• Substantially reduces RMSE

\hat{r}_{ui} = a_u + b_i + q_i' \left( p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \right)
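A minimal sketch of this prediction with the implicit-feedback term; the factor arrays and the set N(u) are invented for illustration:

```python
import numpy as np

def predict_with_implicit(a_u, b_i, q_i, p_u, Y, rated_items):
    """r_hat_ui = a_u + b_i + q_i' (p_u + |N(u)|^(-1/2) * sum_{j in N(u)} y_j).

    Y: (n_items x d) implicit-feedback item factors; rated_items: the indices in N(u)."""
    n = len(rated_items)
    implicit = Y[rated_items].sum(axis=0) / np.sqrt(n) if n else np.zeros_like(p_u)
    return a_u + b_i + q_i @ (p_u + implicit)

rng = np.random.default_rng(2)
d, n_items = 4, 6
print(predict_with_implicit(3.6, -0.1, rng.normal(size=d), rng.normal(size=d),
                            rng.normal(size=(n_items, d)), rated_items=[0, 2, 5]))
```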

Page 35

Temporal Effects

• User behavior may change over time
  – Ratings go up or down
  – Interests change
  – For example, with the addition of a new rater

• Allow user biases and/or factors to change over time
  – Model a_u(t) and p_u(t) as linear, unrestricted, or a sum of both types

\hat{r}_{ui}(t) = a_u(t) + b_i(t) + q_i' \left( p_u(t) + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \right)
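For the “linear” option, one way a user bias might drift is with the number of days from that user’s mean rating date; a minimal sketch with invented parameter values:

```python
def user_bias_at_time(a_u, alpha_u, t, t_mean_u):
    """Linear drift in the user bias: a_u(t) = a_u + alpha_u * (t - t_mean_u),
    with t in days and t_mean_u the mean date of the user's ratings."""
    return a_u + alpha_u * (t - t_mean_u)

# A user whose ratings drift upward over the last 400 days of the training period
print(user_bias_at_time(a_u=3.5, alpha_u=0.001, t=1400, t_mean_u=1000))   # 3.9
```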

Page 36

[The factor-space figure with user Gus again, shown as repeated animation frames during the temporal-effects discussion (originally Pages 36-39, with the final frame adding a “+” marker next to Gus).]


Page 40

#2: The Power of Regularized SVD Fit by Gradient Descent

• Allowed anyone to approach the early leaders
  – Powerful predictor
  – Efficient
  – Easy to program

• Flexibility to incorporate additional features
  – Implicit feedback
  – Temporal effects
  – Neighborhood effects

• Accurate regularization is essential

Page 41

Factor models: RMSE vs. #parameters

[Chart: RMSE (0.875 to 0.905) vs. millions of parameters (10 to 100,000, log scale) for a sequence of factor models: Basic SVD; … + What was Rated; … + Linear Time Factors; … + Per-Day User Biases; … + Per-Day User Factors. Points along each curve are labeled with the number of factors, from 50 to 1500.]

Page 42

#3: The Wisdom of Crowds (of Models)

• All models are wrong; some are useful – G. Box

• Used linear blends of many prediction sets (a blending sketch follows this list)
  – 107 in Year 1
  – Over 800 at the end

• Difficult, or impossible, to build the grand unified model

• Mega blends are not needed in practice
  – A handful of simple models achieves 80 percent of the improvement of the full blend
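A minimal sketch of fitting a linear blend on probe-set predictions by least squares; the synthetic data and function names are illustrative, and the teams’ actual blending was considerably more elaborate:

```python
import numpy as np

def fit_linear_blend(probe_preds, probe_actuals):
    """probe_preds: (n_cases x n_models) predictions on the probe set;
    probe_actuals: (n_cases,) true ratings. Returns intercept plus one weight per model."""
    X = np.column_stack([np.ones(len(probe_actuals)), probe_preds])
    w, *_ = np.linalg.lstsq(X, probe_actuals, rcond=None)
    return w

def blend(preds, w):
    return w[0] + preds @ w[1:]

rng = np.random.default_rng(3)
truth = rng.uniform(1, 5, size=200)
models = np.column_stack([truth + rng.normal(0, s, 200) for s in (0.9, 1.0, 1.1)])
w = fit_linear_blend(models, truth)
print(w, np.sqrt(np.mean((blend(models, w) - truth) ** 2)))   # learned weights and blend RMSE on toy data
```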

Page 43

#4: Find Good Teammates

• Yehuda Koren
  – The engine of progress for the Netflix Prize
  – Implicit feedback
  – Temporal effects
  – Nearest neighbor modeling

• Big Chaos: Michael Jahrer, Andreas Toscher (Year 2)
  – Optimization of tuning parameters
  – Blending methods

• Pragmatic Theory: Martin Chabbert, Martin Piotte (Year 3)
  – Some movies age better than others
  – Link functions

Page 44

The Final Leaderboard

Page 45

Test Set Results

• The Ensemble: 0.856714

Page 46

Test Set Results

• The Ensemble: 0.856714

• BellKor’s Pragmatic Theory: 0.856704

Page 47

Test Set Results

• The Ensemble: 0.856714

• BellKor’s Pragmatic Theory: 0.856704

• Both scores round to 0.8567

Page 48

Test Set Results

• The Ensemble: 0.856714

• BellKor’s Pragmatic Theory: 0.856704

• Both scores round to 0.8567

• Tie breaker is submission date/time

Page 49

Final Test Set Leaderboard

Page 50

Who Got the Money?

• AT&T donated its full share to organizations supporting science education
  – Young Science Achievers Program
  – New Jersey Institute of Technology pre-college and educational opportunity programs
  – North Jersey Regional Science Fair
  – Neighborhoods Focused on African American Youth

Page 51

#5: Is This the Way to Do Science?

• Big success for Netflix
  – Lots of cheap labor, good publicity
  – Already incorporated a 6 percent improvement
  – Potential for much more using other data they have

• Big advances to the science of recommender systems
  – Regularized SVD
  – Identification of new features
  – Understanding nearest neighbors
  – Contributions to the literature

Page 52

Why Did this Work so Well?

• Industrial strength data

• Very good design

• Accessibility to anyone with a PC

• Free flow of ideas
  – Leaderboard
  – Forum
  – Workshop and papers

• Money?

Page 53

But There are Limitations

• Need a conceptually simple task

• Winner-take-all has drawbacks

• Intellectual property and liability issues

• How many prizes can overlap?

Page 54

Thank You!

[email protected]

• www.netflixprize.com
  – …/leaderboard
  – …/community

• Click BellKor’s Pragmatic Chaos or The Ensemble on Leaderboard for details