Ted Dunning, Chief Application Architect, MapR at MLconf SF

DESCRIPTION

Abstract: Near Real-time Updates for Cooccurrence-based Recommenders

Most recommendation algorithms are inherently batch oriented and require all relevant history to be processed. In some contexts, such as music, this does not cause significant problems because waiting a day or three before recommendations are available for new items doesn't significantly change their impact. In other contexts, the value of items drops precipitously with time, so that recommending day-old items has little value to users. In this talk, I will describe how a large-scale multi-modal cooccurrence recommender can be extended to include near real-time updates. In addition, I will show how these real-time updates are compatible with delivery of recommendations via search engines.

TRANSCRIPT

Page 1: Ted Dunning, Chief Application Architect, MapR at MLconf SF

Page 2: Who I am

Ted Dunning, Chief Applications Architect, MapR Technologies

Email [email protected] [email protected]

Twitter @Ted_Dunning

Apache Mahout https://mahout.apache.org/

Twitter @ApacheMahout

Page 3: Agenda

• Background – recommending with puppies and ponies

• Speed tricks

• Accuracy tricks

• Moving to real-time

Page 4: Puppies and Ponies

Page 5: Cooccurrence Analysis

Page 6: How often do items co-occur?

$A^\top A$, where $A$ is the user × item history matrix.

Page 7: Which co-occurrences are interesting?

$A^\top A \;\rightarrow\; \mathrm{sparsify}\left(A^\top A\right)$

Each row of indicators becomes a field in a search engine document.

Page 8: Recommendations

Alice got an apple and a puppy.

Page 9: Recommendations

Alice got an apple and a puppy.
Charles got a bicycle.

Page 10: Recommendations

Alice got an apple and a puppy.
Charles got a bicycle.
Bob got an apple.

Page 11: Recommendations

Alice got an apple and a puppy.
Charles got a bicycle.
Bob got an apple. What else would Bob like?

Page 12: Recommendations

Alice got an apple and a puppy.
Charles got a bicycle.
Bob got an apple. What else would Bob like? A puppy!

Page 13:

By the way, like me, Bob also wants a pony…

Page 14: Recommendations

What if everybody (Alice, Bob, Charles, and new user Amelia) gets a pony? What else would you recommend for new user Amelia?

Page 15: Recommendations

If everybody gets a pony, it's not a very good indicator of what else to predict…

Page 16: Problems with Raw Co-occurrence

• Very popular items co-occur with everything (which is why it's not very helpful to know that everybody wants a pony…)
  – Examples: welcome documents; elevator music
• Very widespread occurrence is not interesting as a source of indicators for recommendation
  – Unless you want to offer an item that is constantly desired, such as razor blades (or ponies)
• What we want is anomalous co-occurrence
  – This is the source of interesting indicators of preference on which to base recommendation

Page 17: Overview: Get Useful Indicators from Behaviors

1. Use log files to build a history matrix of users × items
   – Remember: this history of interactions will be sparse compared to all potential combinations
2. Transform it into a co-occurrence matrix of items × items
3. Look for useful indicators by identifying anomalous co-occurrences to make an indicator matrix (see the sketch below)
   – The Log Likelihood Ratio (LLR) helps judge which co-occurrences can confidently be used as indicators of preference
   – ItemSimilarityJob in Apache Mahout uses LLR
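For concreteness, here is a minimal in-memory Python sketch of these three steps. All names and the threshold are illustrative assumptions (this is not Mahout's API), and it uses the root_llr scorer sketched after Page 19; Mahout's ItemSimilarityJob does the same thing as a distributed job.

    from collections import defaultdict
    from itertools import combinations

    def build_indicators(histories, llr_threshold=5.0):
        """Steps 1-3: user histories -> cooccurrence -> indicator matrix.
        histories: dict mapping user -> set of items (rows of the history matrix).
        llr_threshold is an illustrative value, not a recommendation."""
        item_count = defaultdict(int)   # users who interacted with each item
        pair_count = defaultdict(int)   # cooccurrence counts (items x items)
        n_users = len(histories)

        for items in histories.values():
            for item in items:
                item_count[item] += 1
            for j1, j2 in combinations(sorted(items), 2):   # step 2: self-join
                pair_count[(j1, j2)] += 1

        indicators = defaultdict(set)   # step 3: keep anomalous cooccurrence only
        for (j1, j2), k11 in pair_count.items():
            k12 = item_count[j1] - k11          # users with j1 but not j2
            k21 = item_count[j2] - k11          # users with j2 but not j1
            k22 = n_users - k11 - k12 - k21     # users with neither
            if root_llr(k11, k12, k21, k22) > llr_threshold:
                indicators[j1].add(j2)
                indicators[j2].add(j1)
        return indicators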

Page 18: Which one is the anomalous co-occurrence?

          A        not A
  B       13       1,000
  not B   1,000    100,000

          A        not A
  B       1        0
  not B   0        10,000

          A        not A
  B       10       0
  not B   0        100,000

          A        not A
  B       1        0
  not B   0        2

Page 19: Which one is the anomalous co-occurrence?

          A        not A
  B       13       1,000      root-LLR = 0.90
  not B   1,000    100,000

          A        not A
  B       1        0          root-LLR = 4.52
  not B   0        10,000

          A        not A
  B       10       0          root-LLR = 14.3
  not B   0        100,000

          A        not A
  B       1        0          root-LLR = 1.95
  not B   0        2

Dunning, Ted. "Accurate Methods for the Statistics of Surprise and Coincidence." Computational Linguistics 19(1), 1993.
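The scores under each table are signed square roots of the G² (LLR) statistic from the paper above; the third table, with 10 co-occurrences and no noise, is the anomalous one. A small Python version of the scorer, mirroring the entropy decomposition Mahout uses (a sketch, not Mahout's actual code):

    import math

    def xlogx(x):
        return 0.0 if x == 0 else x * math.log(x)

    def entropy(*counts):
        # Unnormalized entropy N*H(counts), in nats.
        return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)

    def root_llr(k11, k12, k21, k22):
        """Signed sqrt of G^2 for a 2x2 contingency table (Dunning 1993).
        k11 = both, k12/k21 = one without the other, k22 = neither."""
        row = entropy(k11 + k12, k21 + k22)
        col = entropy(k11 + k21, k12 + k22)
        mat = entropy(k11, k12, k21, k22)
        g2 = max(2.0 * (row + col - mat), 0.0)   # clamp rounding noise
        return math.sqrt(g2)

    root_llr(13, 1000, 1000, 100000)   # ~0.90
    root_llr(10, 0, 0, 100000)         # ~14.3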

Page 20: Collection of Documents: Insert Meta-Data

Document for "puppy" in the search engine:

  id: t4
  title: puppy
  desc: The sweetest little puppy ever.
  keywords: puppy, dog, pet
  indicators: (t1)

Item meta-data is ingested easily via NFS.
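As a concrete (and hypothetical) sketch of "indicators become a search field": below, the document is indexed into a local Elasticsearch instance via its REST API. The index name, URL, and the choice of Elasticsearch are assumptions; any full-text engine with field-scoped queries (Solr, for instance) works the same way.

    import json
    import urllib.request

    doc = {
        "id": "t4",
        "title": "puppy",
        "desc": "The sweetest little puppy ever.",
        "keywords": ["puppy", "dog", "pet"],
        # items whose cooccurrence with t4 passed the LLR test
        "indicators": ["t1"],
    }

    req = urllib.request.Request(
        "http://localhost:9200/items/_doc/t4",   # hypothetical local engine
        data=json.dumps(doc).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    urllib.request.urlopen(req)

Recommendation then reduces to an ordinary search: query the indicators field with the user's recent history as the query terms, e.g. {"query": {"match": {"indicators": "t1 t3 t7"}}} (item ids here are illustrative).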

Page 21: Cooccurrence Mechanics

• Cooccurrence is just a self-join:

    for each user i:
        for each history item j1 in A_i*:
            for each history item j2 in A_i*:
                count pair (j1, j2)

$\left[ A^\top A \right]_{j_1 j_2} = \sum_i a_{ij_1} a_{ij_2}$
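The same loop in executable form (an in-memory sketch; at scale this runs as a distributed job):

    from collections import Counter

    def cooccurrence_counts(A):
        """A: dict mapping user i -> list of history items (the row A_i*).
        Returns a Counter over pairs (j1, j2), i.e. the entries of A'A."""
        counts = Counter()
        for items in A.values():
            for j1 in items:
                for j2 in items:
                    counts[(j1, j2)] += 1
        return counts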

Page 22: Cross-occurrence Mechanics

• Cross-occurrence is just a self-join of adjoined matrices:

    for each user i:
        for each history item j1 in A_i*:
            for each history item j2 in B_i*:
                count pair (j1, j2)

$\left[ A^\top B \right]_{j_1 j_2} = \sum_i a_{ij_1} b_{ij_2}$

$\begin{bmatrix} A \mid B \end{bmatrix}^\top \begin{bmatrix} A \mid B \end{bmatrix} = \begin{bmatrix} A^\top A & A^\top B \\ B^\top A & B^\top B \end{bmatrix}$
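And the cross-occurrence variant, where A and B hold two different kinds of behavior for the same users (the behavior types named here are illustrative):

    from collections import Counter

    def cross_occurrence_counts(A, B):
        """A: e.g. user -> purchased items; B: e.g. user -> viewed items.
        Returns a Counter over pairs (j1, j2), i.e. the entries of A'B."""
        counts = Counter()
        for i in A.keys() & B.keys():   # users present in both behaviors
            for j1 in A[i]:
                for j2 in B[i]:
                    counts[(j1, j2)] += 1
        return counts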

Page 23: A word about scaling

Page 24: A few pragmatic tricks

• Downsample all user histories to a maximum length (interaction cut); see the sketch below
  – Can be random or most-recent (no apparent effect on accuracy)
  – Prolific users are often pathological anyway
  – A common limit is 300 items (no apparent effect on accuracy)
• Downsample all items to limit the maximum number of users per item (frequency limit)
  – Can be random or earliest (no apparent effect)
  – Ubiquitous items are uninformative
  – A common limit is 500 users (no apparent effect)

Schelter et al. "Scalable Similarity-based Neighborhood Methods with MapReduce." Proceedings of the Sixth ACM Conference on Recommender Systems, 2012.
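A sketch of the interaction cut (the frequency limit is analogous, applied per item rather than per user); k_max = 300 echoes the common limit above:

    import random

    def interaction_cut(history, k_max=300, mode="random"):
        """Downsample one user's history (a list, assumed chronological)
        to at most k_max items, randomly or keeping the most recent."""
        if len(history) <= k_max:
            return history
        if mode == "random":
            return random.sample(history, k_max)
        return history[-k_max:]   # most-recent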

Page 25: But note!

• The number of pairs for a user history with $k_i$ distinct items is $\approx k_i^2/2$
• The average size of a user history increases with increasing dataset size
  – The average may grow more slowly than $N$ (or not!)
  – The full cooccurrence cost grows strictly faster than $N$, i.e. it just doesn't scale:

    $t \propto \sum_i k_i^2 \in \omega(N)$

• Downsampling interactions places a bound on the per-user cost
  – Cooccurrence with the interaction cut is scalable (illustrated below):

    $t \propto \sum_i \min\left(k_{\max}^2, k_i^2\right) < N k_{\max}^2 \in O(N)$
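A tiny synthetic illustration of why the cut matters (the history lengths below are made up; one pathological user dominates the uncut cost):

    k = [10, 50, 300, 2000, 15000]      # per-user history lengths (synthetic)
    k_max = 300

    full_cost = sum(ki ** 2 for ki in k)                 # 229,092,600 pairs
    cut_cost = sum(min(ki, k_max) ** 2 for ki in k)      #     272,600 pairs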

Page 26:

[Figure: Benefit of down-sampling. Pairs (×10⁹, 0-6) vs. user limit (0-1000), shown without down-sampling and with track limits of 1000, 500, and 200. Computed on 48,373,586 pair-wise triples from the million song dataset.]

Page 27: Batch Scaling in Time Implies Scaling in Space

• Note:
  – With frequency-limit sampling, the maximum cooccurrence count is small (< 1000)
  – With the interaction cut, the total number of non-zero pairs is relatively small
  – The entire cooccurrence matrix can be stored in memory in ~10-15 GB (a back-of-the-envelope check follows below)
• Specifically:
  – With the interaction cut, cooccurrence scales in size: $\left\| A^\top A \right\|_0 \in O(N)$
  – Without the interaction cut, it does not: $\left\| A^\top A \right\|_0 \in \omega(N)$
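A back-of-the-envelope check on the ~10-15 GB figure, taking the order of 10⁹ non-zero pairs suggested by the plot on Page 26 and an assumed entry size (both numbers are assumptions, not from the slide):

    nnz = 1_000_000_000        # order of non-zero pairs after down-sampling
    bytes_per_entry = 12       # e.g. two 4-byte item ids + a 4-byte count
    print(nnz * bytes_per_entry / 2**30)   # ~11 GiB, consistent with 10-15 GB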

Page 28: Impact of Interaction Cut Downsampling

• The interaction cut allows batch cooccurrence analysis to be O(N) in time and space
• This is intriguing
  – Amortized cost is low
  – Could this be extended to an on-line form?
• Incremental matrix factorization is hard
  – Could cooccurrence be a key alternative?
• Scaling matters most at scale
  – Cooccurrence is very accurate at large scale
  – Factorization shows benefits at smaller scales

Page 29: Online update

Page 30: Requirements for Online Algorithms

• Each unit of input must require O(1) work

– Theoretical bound

• The constants have to be small enough on average

– Pragmatic constraint

• Total accumulated data must be small (enough)

– Pragmatic constraint

Page 31:

[Diagram: Real-time recommendations using the MapR data platform. Log files and item meta-data flow into the MapR cluster via NFS; batch co-occurrence analysis runs on the cluster and feeds the search technology. The web tier queries with a new user's history, so recommendations happen in real-time; the batch co-occurrence step is the part we want to make real-time.]

Page 32: Space Bound Implies Time Bound

• Because user histories are pruned, only a limited number of value updates need be made with each new observation
• This bound is just twice the interaction cut $k_{\max}$
  – Which is a constant
• Bounding the number of updates trivially bounds the time (see the sketch below)
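A minimal in-memory sketch of the bounded update (dict/Counter based; a real deployment would back this with a shared store). It assumes item j is new to user i's retained history; at most 2·k_max − 1 cooccurrence cells change per observation:

    from collections import Counter

    def online_update(user_history, cooccur, i, j, k_max=300):
        """Apply one observation: user i interacted with item j.
        user_history: dict user -> list of items; cooccur: Counter of pairs."""
        h = user_history.setdefault(i, [])
        if len(h) >= k_max:        # user at the interaction cut: prune
            return
        for j2 in h:               # one row and one column of A'A change
            cooccur[(j, j2)] += 1
            cooccur[(j2, j)] += 1
        cooccur[(j, j)] += 1       # diagonal correction
        h.append(j)                # 2*len(h)+1 <= 2*k_max - 1 updates total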

Page 33: Implications for Online Update

Adding one observation (user $i$, item $j$) changes the cooccurrence matrix by:

$\left(A + e_{ij}\right)^\top \left(A + e_{ij}\right) - A^\top A = e_{ij}^\top A + A^\top e_{ij} + \delta_{jj}$

$= \begin{bmatrix} 0 \\ A_{i*} \\ 0 \end{bmatrix} + \begin{bmatrix} 0 & A_{i*}^\top & 0 \end{bmatrix} + \delta_{jj}$

Page 34:

With the interaction cut at $k_{\max}$, the update touches a bounded number of non-zero entries:

$\left\| \begin{bmatrix} 0 \\ A_{i*} \\ 0 \end{bmatrix} + \begin{bmatrix} 0 & A_{i*}^\top & 0 \end{bmatrix} + \delta_{jj} \right\|_0 \le 2 k_{\max} - 1 \in O(1)$

Page 35: But Wait, It Gets Better

• The new observation may be pruned (see the sketch after this list)
  – For users at the interaction cut, we can ignore updates
  – For items at the frequency cut, we can ignore updates
  – Ignoring updates only affects indicators, not the recommendation query
  – At million song dataset size, half of all updates are pruned
• On average $k_i$ is much less than the interaction cut
  – For the million song dataset, the average appears to grow with the log of the frequency limit, with little dependency on interaction cut values > 200
• The LLR cutoff avoids almost all updates to counting
• The average grows slowly with the frequency cut
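Pruning in code form, extending the earlier sketch with the frequency cut (the thresholds are the "common limits" from Page 24):

    def maybe_update(user_history, item_freq, cooccur, i, j,
                     k_max=300, f_max=500):
        """One observation; the counting side is skipped at either cut."""
        h = user_history.setdefault(i, [])
        if len(h) < k_max and item_freq.get(j, 0) < f_max:
            for j2 in h:
                cooccur[(j, j2)] += 1
                cooccur[(j2, j)] += 1
            cooccur[(j, j)] += 1
            item_freq[j] = item_freq.get(j, 0) + 1
            h.append(j)
        # Pruned or not, the user's recent history used for the
        # recommendation query is still updated (kept elsewhere);
        # pruning only affects the indicator counts.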

Page 36:

[Figure: Average history length $k_{ave}$ (0-35) vs. interaction cut $k_{\max}$ (0-1000), for frequency cuts of 1000, 500, and 200.]

Page 37:

[Figure: Average history length $k_{ave}$ (0-35) vs. frequency cut (0-1000).]

Page 38: Recap

• Cooccurrence-based recommendations are simple

– Deploying with a search engine is even better

• Interaction cut and frequency cut are key to batch scalability

• A similar effect occurs in the online form of updates

– Only dozens of updates per transaction needed

– Data structure required is relatively small

– Very, very few updates cause search engine updates

• Fully online recommendation is very feasible, almost easy

Page 39: More Details Available

Available for free at http://www.mapr.com/practical-machine-learning

Page 40: Who I am

Ted Dunning, Chief Applications Architect, MapR Technologies

Email [email protected] [email protected]

Twitter @Ted_Dunning

Apache Mahout https://mahout.apache.org/

Twitter @ApacheMahout

Apache Drill http://incubator.apache.org/drill/

Twitter @ApacheDrill

Page 41: Q & A

Engage with us!

@mapr · maprtech · mapr-technologies
[email protected]