evan estola – data scientist, meetup.com at mlconf atl

24
Beyond Collaborative Filtering: ML & Recommendations at Meetup Evan Estola Machine Learning Engineer Meetup.com [email protected] @estola

Upload: sessionsevents

Post on 18-Nov-2014

1.054 views

Category:

Technology


2 download

DESCRIPTION

Beyond Collaborative Filtering: using Machine Learning to power recommendations at Meetup Collaborative filtering and other common recommendation algorithms are a powerful technique for some scenarios. I will cover how to design a recommendation system from the ground up using an ensemble classifier and supervised learning to avoid some of the pitfalls of collaborative filtering. From sampling to deployment, we’ve had to invent our approach with few non-academic and non-toy examples to follow. At Meetup we’re all about sharing information and empowering communities, so I’ll present the details of our model as well as some of the new features we are still developing.

TRANSCRIPT

Page 1: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Beyond Collaborative Filtering: ML &

Recommendations at MeetupEvan Estola

Machine Learning EngineerMeetup.com

[email protected]@estola

Page 2: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Meetup what are you

Page 3: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Why Meetup data is cool

● Real people meeting up● Every meetup could change someone's life● No ads, just do the best thing● Oh and >125 million rsvps by 18 million

members● 3 million rsvps in the last 30 days

○ 1/second

Page 4: Evan Estola – Data Scientist, Meetup.com at MLconf ATL
Page 5: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Tools at Meetup

● Hive - SQL on Hadoop● Spark - Distributed Scala on Hadoop cluster● Scala - Recommendations service● R - Data analysis, Model building● Python - Scripting, Data organizing● Java - Backend of our web stack

Page 6: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Collaborative Filtering

● Classic recommendations approach● Users who like this also like this

Page 7: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Weaknesses of CF

● Sparsity● Cold Start● Coverage● Diversity

Page 8: Evan Estola – Data Scientist, Meetup.com at MLconf ATL
Page 9: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Why Recs at Meetup are hard

● Incomplete Data (topics)● Cold start● Asking user for data is hard● Going to meetups is scary● Sparsity

○ Location○ Low rsvp/person○ Membership: 0.001%○ Compare to Netflix Prize Dataset: 1%

Page 10: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Supervised Learning/Classification

● “Inferring a function from labeled training data”

● Joined Meetup group/Didn’t join Meetup group

Page 11: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Preprocessing

● Schenectady● Fake RSVP boosts (+100 guests!)● Outliers● Bucketing● Etc etc

Page 12: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Problem definition and assumptions

● Assumption: if you’re not in a given group, you don’t want to be○ Negative samples: groups you’re not in○ Also a good classifier...

● Membership << expected error rate○ Solution: sample to 50/50 join/no-join

Page 13: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Ranking

● Model output label no longer explicitly true○ Luckily, we’d rather rank all of the results anyway

● Use a classifier that gives you a useful output○ Fancy black box○ Logistic Regression

■ Easier to explain

Page 14: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Meetup what are you

Page 15: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Ensemble Learning

“... use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms”

Page 16: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Ensemble Learning

● Topic match (original algorithm)● Collaborative Filtering on Topics● Social algorithm● Other simple features (Popularity, Gender…)

● Add output of algorithms as features into Logistic Regression model

Page 17: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Logistic Regression Output

● TopicScore 4.14● ExtendedTS 0.47● RelatedTS 0.66● FbFriends 2.02● 2ndFbFriends 0.09● AgeUnmatch -2.40● GenUnmatch -3.37● Distance -0.04● StateMatch 0.54● CountyMatch 0.41● ZipScore 0.06● RsvpScore 0.02

Page 18: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Facebook Likes

● Lots of information, but how to use?

● Map to topics, let training the model take care of the rest!

Page 19: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Mapping FB Likes to Meetup Topics

● Text based?○ Go(game) vs Go(lang)?○ Burton?

● Data approach!○ Grab most popular topics across all members with

the same like

Page 20: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Normalization

● Top topics for Burton-Likers○ Meeting New People, Coffee, bla bla○ Most popular still dominates

● Solution: Normalize based on expected topic occurrence in sample

Page 21: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Normalization

● For members with a given Like, compare percent with each topic to expected among total population

● Total population○ 20% “Meeting New People”○ 2% “Snowboarding

● Burton:○ 20% “Meeting New People”○ 9% “Snowboarding”

Page 22: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Results

● Generate top topics for all likes○ Path from member to like to topic to group

● Add Facebook Like based topic match feature to model

● Positive weight○ Very good sign!

● Deploy/Split test○ TBD

Page 23: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Summary

● Supervised Learning for Ranking as Recommendations is cool

● Simple, interpretable models are cool● Feature engineering is cool

Page 24: Evan Estola – Data Scientist, Meetup.com at MLconf ATL

Thanks!

Smart people come work with me.http://www.meetup.com/jobs/