a cognitive psychologist's approach to data mining

A Cognitive Psychologist's Approach to Data Mining: How I Beat Netflix Cinematch (Maggie Xiong, April 22, 2014)

Uploaded by maggiexyz on 01-Jul-2015

Page 1

A Cognitive Psychologist's Approach to Data Mining

How I Beat Netflix Cinematch

Maggie Xiong, April 22, 2014

Page 2

General Outline

Parallel Frameworks: Cognitive psychology & data mining

Case Study: The Netflix Prize Project

Page 3

Abstraction and Generalization

Categorization
- Prototype
- Exemplar
- Decision boundary
- Theory-based categories

Semantic space / LSA

Connectionism

Page 4

Abstraction

Linguistic ideas (Bransford & Franks, 1971)
“The ants in the kitchen ate the sweet jelly which was on the table.”
“The ants in the kitchen ate the sweet jelly.”
“The ants in the kitchen ate the jelly.”
“The ants were in the kitchen.”
Participants were more confident in “recognizing” fuller sentences, those combining more of the ideas, even when they had never actually heard them.

Prototype (Posner & Keele, 1968)
Participants studied instances generated by distorting prototypes. In a later test, they showed the same accuracy and response time for the never-seen prototypes as for the memorized instances.

Page 5

Categorization

Category structure (Collins & Quillian, 1969)
- Economy of organization
- Participants take longer to verify statements that span category levels (e.g., “A canary can fly” takes longer than “A canary can sing”).
- Typicality

Exemplar (Jacoby & Brooks, 1984)

Page 6

Decision Boundary / Theory-based Categories

Decision boundary (Ashby & Gott, 1988)

Theory-based categories (Murphy & Medin, 1985)
Categories organized around theories about the world. Examples: clean vs. unclean foods; apples and prime numbers.

Page 7

Semantic Space / Latent Semantic Analysis

Shepard, 1987: the probability of generalization decays exponentially with distance in psychological space.

Osgood, 1957: factor analysis; evaluation, potency, activity

Dumais et al., 1988: SVD, cosine similarity

Landauer & Dumais, 1997
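The semantic-space ideas on this slide fit in a few lines of code. This is a minimal sketch, not the procedure of any cited paper: it assumes a tiny, invented term-by-document count matrix, builds a 2-dimensional latent space with a truncated SVD (the core step of LSA), compares terms by cosine similarity, and converts distance into a generalization probability with Shepard's exponential gradient. The decay rate `k`, the terms, and all function names are hypothetical.

```python
import numpy as np

# Invented term-by-document counts (rows: terms, columns: documents).
TERMS = ["ant", "jelly", "kitchen", "svd", "matrix"]
COUNTS = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)


def lsa_term_vectors(matrix, k):
    """Project each term (row) into a k-dimensional latent space via truncated SVD."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]  # scale each latent dimension by its singular value


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def shepard_generalization(distance, k=1.0):
    """Exponential generalization gradient exp(-k * distance); k is a free parameter."""
    return float(np.exp(-k * distance))


vectors = dict(zip(TERMS, lsa_term_vectors(COUNTS, k=2)))
print(cosine(vectors["ant"], vectors["jelly"]))   # close to 1: used in the same documents
print(cosine(vectors["ant"], vectors["matrix"]))  # close to 0: never share a document
# Distance still separates "ant" from "kitchen" even though their cosine is high.
print(shepard_generalization(np.linalg.norm(vectors["ant"] - vectors["kitchen"])))
```

Cosine ignores vector length, while the Shepard term depends on distance, so the two can disagree about how "similar" two terms are.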

Page 9

Connectionism

Selfridge, 1958: Pandemonium

Rumelhart, McClelland, & PDP Research Group, 1986: Parallel Distributed Processing (2-volume set)

Page 10

Connectionism

Rumelhart & Todd, 1993

Page 11

Common Ground

- Prototype: K-means
- Exemplar: K-nearest neighbor
- Theory-based categories: collaborative filtering, decision tree
- Decision boundary: support vector machine
- Semantic space / LSA
- Connectionism: artificial neural net
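To make the first two pairings concrete, here is a small sketch on invented 2-D data. The prototype view classifies an item by its distance to each category's mean; that mean is what k-means would estimate as a cluster center, although k-means itself is unsupervised, so nearest-centroid classification with known labels stands in for it here. The exemplar view keeps every stored item and lets the k nearest ones vote (k-nearest neighbor). Data and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Invented labeled training items: ((feature1, feature2), category).
TRAIN = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
         ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B")]


def prototypes(items):
    """Prototype view: summarize each category by the mean of its members."""
    groups = defaultdict(list)
    for point, label in items:
        groups[label].append(point)
    return {label: tuple(sum(coord) / len(points) for coord in zip(*points))
            for label, points in groups.items()}


def prototype_classify(x, items):
    """Assign x to the category whose prototype (centroid) is nearest."""
    protos = prototypes(items)
    return min(protos, key=lambda label: math.dist(x, protos[label]))


def exemplar_classify(x, items, k=3):
    """Exemplar view: majority vote among the k stored items nearest to x."""
    nearest = sorted(items, key=lambda item: math.dist(x, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


print(prototype_classify((2.5, 2.5), TRAIN), exemplar_classify((5.0, 5.0), TRAIN))  # A B
```

The prototype classifier discards the individual items after averaging; the exemplar classifier keeps all of them, which is exactly the contrast between the two cognitive theories.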

Page 12

How Cognitive Psychologists Analyze Data

Task completion rate:

             1 Cup        3 Cups
Morning      12, 13, 10   10, 8, 10
Evening      14, 15, 12   23, 18, 15

Main effect of coffee: avg(10,8,10,23,18,15) - avg(12,13,10,14,15,12)

Main effect of time of day: avg(14,15,12,23,18,15) - avg(12,13,10,10,8,10)

Interaction: [avg(23,18,15) - avg(14,15,12)] - [avg(10,8,10) - avg(12,13,10)]
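The slide's arithmetic can be reproduced directly from the table. A minimal sketch with illustrative names: each main effect pools all observations at one level of a factor and subtracts the pooled mean at the other level, and the interaction is the effect of coffee in the evening minus the effect of coffee in the morning.

```python
from statistics import mean

# Task completion rates from the slide, keyed by (time of day, cups of coffee).
cells = {
    ("morning", 1): [12, 13, 10],
    ("morning", 3): [10, 8, 10],
    ("evening", 1): [14, 15, 12],
    ("evening", 3): [23, 18, 15],
}


def pooled(pred):
    """Average over all observations in cells whose key satisfies pred."""
    return mean(x for key, xs in cells.items() if pred(key) for x in xs)


coffee_effect = pooled(lambda k: k[1] == 3) - pooled(lambda k: k[1] == 1)
time_effect = pooled(lambda k: k[0] == "evening") - pooled(lambda k: k[0] == "morning")
# Interaction: does the coffee effect depend on the time of day?
interaction = ((mean(cells[("evening", 3)]) - mean(cells[("evening", 1)]))
               - (mean(cells[("morning", 3)]) - mean(cells[("morning", 1)])))
print(round(coffee_effect, 2), round(time_effect, 2), round(interaction, 2))  # 1.33 5.67 7.33
```

The large interaction relative to the coffee main effect says that coffee helps in the evening but not in the morning, which is the pattern the next slide graphs.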

Page 13

Graph It

[Figure: Main effects and interaction. Task completion rate vs. cups of coffee, with separate lines for morning and evening.]

Page 14

The Netflix Prize Problem, 2006/10/02

Training set: 17,770 movies, 500K users, 100M ratings
- Ratings: user_id, movie_id, rating, date_of_rating
- Movies: movie_id, title, year

Probe set (1.4M ratings)

Qualifying set (2.8M ratings): user_id, movie_id, date_of_rating

RMSE = sqrt( sum((X - X.pred)^2) / N )

Cinematch: 0.9514

Prize target: 0.8563 (a 10% improvement over Cinematch) => $1M
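The scoring metric is easy to state in code. A minimal sketch of RMSE exactly as defined on the slide, plus the arithmetic behind the prize threshold (10% below Cinematch's 0.9514); the function name and sample values are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: sqrt(sum((X - X.pred)^2) / N)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((x - p) ** 2 for x, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


print(rmse([4, 1, 3], [3.5, 2, 3]))  # small invented example

target = 0.9514 * 0.9  # 10% improvement over Cinematch
print(round(target, 4))  # 0.8563
```

Because errors are squared before averaging, RMSE penalizes a few large misses more than many small ones.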

Page 15

Standard Deviation and RMSE
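The figure for this slide did not survive extraction. One relationship its title points to, offered here as an interpretation: the population standard deviation of a set of ratings equals the RMSE of predicting their mean for every rating, so the overall standard deviation is the natural starting point for the RMSE ladder (approximately, when the mean comes from training data and the errors are measured on held-out data). A quick check on invented ratings:

```python
import math
from statistics import mean, pstdev


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


ratings = [5, 3, 4, 1, 4, 2, 5, 3]  # toy ratings, invented for illustration
constant_guess = [mean(ratings)] * len(ratings)

# Predicting the mean for every rating: RMSE equals the population standard deviation.
print(math.isclose(rmse(ratings, constant_guess), pstdev(ratings)))  # True
```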

Page 16

The Netflix Problem, Interpreted

Overall average movie rating: 3.620*

Main effect of movie:
- Miss Congeniality: avg(u1, u2, u3, ...)
- Mission Impossible: avg(u1, u2, u3, ...)

Main effect of user:
- Alex: avg(m1, m2, m3, ...)
- Brian: avg(m1, m2, m3, ...)

Interaction:
- Alex: Miss Congeniality, Mission Impossible, ...
- Brian: Miss Congeniality, Mission Impossible, ...

Page 17

RMSE, Appreciated

Overall standard deviation: 1.0822*
“Trivial approach” (main effect of movie): 1.0540
Main effects of movie and user: 0.9889*

R.pred = M.avg + U.dev

Cinematch: 0.9514...

...

Prize: 0.8563
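A minimal sketch of the "main effects of movie and user" predictor, R.pred = M.avg + U.dev, on invented ratings. It assumes U.dev is the user's average of (R - M.avg) over the movies they rated, which matches the worked example for Alex on the next slide; the 0.9889 figure comes from the real data, not from this toy. Names are illustrative.

```python
from collections import defaultdict
from statistics import mean

# Toy (user, movie, rating) triples, invented for illustration.
ratings = [
    ("Alex", "Mission Impossible", 4), ("Alex", "Coyote Ugly", 1),
    ("Brian", "Mission Impossible", 5), ("Brian", "Coyote Ugly", 4),
    ("Brian", "Miss Congeniality", 5), ("Cara", "Miss Congeniality", 4),
]


def fit_baseline(triples):
    """Return (M.avg per movie, U.dev per user)."""
    by_movie = defaultdict(list)
    for _, movie, r in triples:
        by_movie[movie].append(r)
    m_avg = {m: mean(rs) for m, rs in by_movie.items()}  # main effect of movie
    devs = defaultdict(list)
    for user, movie, r in triples:
        devs[user].append(r - m_avg[movie])
    u_dev = {u: mean(ds) for u, ds in devs.items()}  # main effect of user
    return m_avg, u_dev


def predict(m_avg, u_dev, user, movie):
    """R.pred = M.avg + U.dev; an unseen user gets no adjustment."""
    return m_avg[movie] + u_dev.get(user, 0.0)


m_avg, u_dev = fit_baseline(ratings)
print(predict(m_avg, u_dev, "Alex", "Miss Congeniality"))  # 3.5
```

Alex rates below each movie's average, so his prediction is pulled down from the movie's 4.5 average.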

Page 18

The Arithmetic Approach

R = M.avg + U.dev + interaction
interaction = R - (M.avg + U.dev)
R.pred = M.avg + U.dev + w.avg(interaction * sim(M.p, M))

Alex                 R   M.avg   dev    interaction
Mission Impossible   4   4.3     -0.3   4 - [4.3 + (-1.4)] = 1.1
Coyote Ugly          1   3.5     -2.5   1 - [3.5 + (-1.4)] = -1.1
Miss Congeniality    ?   4.5

Alex U.dev = ((4 - 4.3) + (1 - 3.5)) / 2 = -1.4
sim(Miss Congeniality, Coyote Ugly) = 0.8
sim(Miss Congeniality, Mission Impossible) = 0.2
? = 4.5 + (-1.4) + (-1.1 * 0.8 + 1.1 * 0.2) / (|0.8| + |0.2|) = 2.44
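The worked example for Alex can be checked in code. A minimal sketch that follows the slide's formula: the prediction is the movie average plus the user deviation plus a similarity-weighted average of the user's interaction terms on other movies, normalized by the sum of absolute similarities. The helper name is hypothetical; the numbers are the slide's.

```python
def predict_with_neighbors(target_avg, user_dev, neighbors):
    """R.pred = M.avg + U.dev + w.avg(interaction * sim).

    neighbors: (interaction, similarity) pairs for movies the user has rated."""
    norm = sum(abs(sim) for _, sim in neighbors)
    if norm == 0:
        return target_avg + user_dev  # no usable neighbors: fall back to main effects
    adjustment = sum(inter * sim for inter, sim in neighbors) / norm
    return target_avg + user_dev + adjustment


# Alex's numbers from the slide: movie -> (R, M.avg).
rated = {"Mission Impossible": (4, 4.3), "Coyote Ugly": (1, 3.5)}
u_dev = sum(r - avg for r, avg in rated.values()) / len(rated)          # -1.4
interactions = {m: r - (avg + u_dev) for m, (r, avg) in rated.items()}  # 1.1, -1.1
sims = {"Mission Impossible": 0.2, "Coyote Ugly": 0.8}  # sim(Miss Congeniality, m)

pred = predict_with_neighbors(4.5, u_dev, [(interactions[m], sims[m]) for m in rated])
print(round(pred, 2))  # 2.44
```

Coyote Ugly, which Alex disliked more than expected, is the more similar movie, so it drags the prediction for Miss Congeniality below the main-effects estimate of 3.1.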

Page 19

Similarity Measures

Romesburg, 1984: shape difference vs. size displacement

- Euclidean distance
- Cosine similarity
- Correlation coefficient
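The shape-versus-size distinction shows up clearly when the three measures are applied to profiles that differ only in scale or only in level. A minimal sketch on invented vectors: Euclidean distance reacts to both scaling and shifting, cosine similarity ignores proportional scaling but not an additive shift, and the Pearson correlation (the cosine of mean-centered vectors) ignores both. Function names are illustrative.

```python
import math


def euclidean(a, b):
    """Straight-line distance: sensitive to both size and displacement."""
    return math.dist(a, b)


def cosine(a, b):
    """Angle-based similarity: ignores proportional scaling, not additive shifts."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def pearson(a, b):
    """Correlation as the cosine of mean-centered vectors: ignores scaling and shifts."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])


base = [1, 2, 3]         # an invented profile over three items
scaled = [2, 4, 6]       # same shape, twice the size
shifted = [11, 12, 13]   # same shape, displaced upward by 10

for name, other in [("scaled", scaled), ("shifted", shifted)]:
    print(name, round(euclidean(base, other), 3),
          round(cosine(base, other), 3), round(pearson(base, other), 3))
```

For ratings, this means correlation treats a user who rates every movie one point higher than another user as identical in taste, while Euclidean distance does not.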

Page 20

Movie Similarity

Similarity measures
- Co-occurrence count: how often a person rented both movies.
- Correlation: computed from the ratings of people who rented both movies.
- Correlation weighted by probability (significance)
- Mean Euclidean distance of movie x user interactions, where interaction = R - (M.avg + U.dev)

Weighted by similarities in movie release times, rental frequencies, and mean ratings
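One reading of the co-occurrence and interaction-distance measures, sketched under stated assumptions: for two movies, count the users who rated both, then compare those users' interaction terms, R - (M.avg + U.dev), as a root-mean-square difference (a per-user-averaged version of "mean Euclidean distance"). The exponential mapping from distance to similarity is a hypothetical choice that echoes Shepard's gradient; it is not stated on the slide. The interaction values and function names are invented.

```python
import math

# interactions[movie][user] = R - (M.avg + U.dev); toy values invented for illustration.
interactions = {
    "Miss Congeniality": {"u1": 0.5, "u2": -0.3, "u3": 0.8},
    "Coyote Ugly": {"u1": 0.4, "u2": -0.5, "u4": 1.0},
    "Mission Impossible": {"u2": 0.9, "u3": -0.7},
}


def co_occurrence(m1, m2):
    """Number of users who rated both movies."""
    return len(interactions[m1].keys() & interactions[m2].keys())


def interaction_distance(m1, m2):
    """Root-mean-square difference of the two movies' interactions over shared users."""
    shared = interactions[m1].keys() & interactions[m2].keys()
    if not shared:
        return math.inf
    total = sum((interactions[m1][u] - interactions[m2][u]) ** 2 for u in shared)
    return math.sqrt(total / len(shared))


def similarity(m1, m2, k=1.0):
    """Hypothetical distance-to-similarity mapping: exp(-k * distance)."""
    return math.exp(-k * interaction_distance(m1, m2))


print(co_occurrence("Miss Congeniality", "Coyote Ugly"))
print(similarity("Miss Congeniality", "Coyote Ugly"),
      similarity("Miss Congeniality", "Mission Impossible"))
```

Movies with no shared raters get an infinite distance and therefore zero similarity; in practice a minimum co-occurrence count would also be needed before trusting a distance computed from only one or two users.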

Page 21

User Clusters

Differentiate movie mean ratings and similarities by user cluster:
R.pred = M.cluster_avg + U.cluster_dev + w.avg(interaction * sim_cluster(M, M.p))

By experience (number of movies rated): [2,80], [81,180], [181,240], [241,400], [401,3000]

By gender: inferred from preference for different movie clusters

By cluster analysis: PCA, K-means
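A minimal sketch of the "by experience" clustering and the per-cluster movie averages (M.cluster_avg). It assumes the experience buckets are contiguous ranges, [2,80], [81,180], [181,240], [241,400], [401,3000], and that each user's rating count is known; the function names and toy data are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

# Upper bounds of the assumed contiguous buckets (number of movies rated).
BUCKET_UPPER = [80, 180, 240, 400, 3000]


def experience_bucket(n_rated):
    """Index of the experience bucket for a user who has rated n_rated movies."""
    if not 2 <= n_rated <= BUCKET_UPPER[-1]:
        raise ValueError(f"{n_rated} is outside the bucket ranges")
    return bisect_left(BUCKET_UPPER, n_rated)


def cluster_movie_averages(triples, n_rated):
    """M.cluster_avg: mean rating of each movie within each experience bucket.

    triples: (user, movie, rating); n_rated: user -> total number of movies rated."""
    sums = defaultdict(lambda: [0.0, 0])
    for user, movie, rating in triples:
        key = (experience_bucket(n_rated[user]), movie)
        sums[key][0] += rating
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


triples = [("u1", "Coyote Ugly", 2), ("u2", "Coyote Ugly", 4), ("u3", "Coyote Ugly", 5)]
n_rated = {"u1": 50, "u2": 100, "u3": 150}
print(cluster_movie_averages(triples, n_rated))
```

`bisect_left` returns the first bucket whose upper bound is at least `n_rated`, so each boundary value (80, 180, ...) falls in the lower bucket, matching inclusive upper limits.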

Page 22

Blend It

Generate different sets of predictions using different movie similarity and user cluster strategies

Use linear regression to combine the sets of predictions into one final prediction

Weak learners are good too, as long as they provide unique information.
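A minimal sketch of the blending step using ordinary least squares with an intercept (NumPy's `lstsq`). It assumes the blend is fit on a held-out set where true ratings are known, such as the probe set; the slide does not say which set was used. The toy targets below are constructed as an exact linear combination of two prediction sets so the recovered weights can be checked; real ratings would leave residual error.

```python
import numpy as np


def fit_blend(predictions, targets):
    """Least-squares weights (intercept first) combining several prediction sets.

    predictions: shape (n_ratings, n_models); targets: shape (n_ratings,)."""
    X = np.column_stack([np.ones(len(targets)), predictions])
    weights, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return weights


def apply_blend(weights, predictions):
    """Combine prediction sets with previously fitted weights."""
    X = np.column_stack([np.ones(predictions.shape[0]), predictions])
    return X @ weights


# Two invented prediction sets for six held-out ratings.
p1 = np.array([3.2, 4.1, 2.5, 4.8, 1.9, 3.6])
p2 = np.array([3.0, 3.8, 2.9, 4.4, 2.2, 3.1])
preds = np.column_stack([p1, p2])
targets = 0.1 + 0.7 * p1 + 0.25 * p2  # constructed so the true weights are known

weights = fit_blend(preds, targets)
print(weights)
```

Regression rewards any prediction set that explains variance the others miss, which is why a weak learner with unique information can still earn a useful weight.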

Page 23

RMSE, 2008/04/01

Overall standard deviation: 1.0822*
“Trivial approach” (main effect of movie): 1.0540
Main effects of movie and user: 0.9889*

R.pred = M.avg + U.dev

Cinematch: 0.9514...

Naga FX: 0.9063...

Prize: 0.8563

Page 24

Looking Back

Cognitive theories and data mining methods
- Prototype: K-means
- Exemplar: K-nearest neighbor
- Theory-based categories: collaborative filtering, decision tree
- Decision boundary: support vector machine
- Semantic space / LSA
- Connectionism: artificial neural net

Abstraction and generalization: it’s all about similarity.

Tversky, 1977
Murphy & Medin, 1985