Scott Triglia, MLconf 2013
DESCRIPTION
Scott Triglia, Search and Data Mining Engineer at Yelp
TRANSCRIPT
Starting Recommendations
Scott Triglia
Goals
● Show our thought process
● Expose some useful questions
● Practical solutions for new teams
Disclaimer
This is a case study, not a prescription
Yelp 101
Our Topic
Our Goal: Interesting businesses relevant to you right now
Before
Decision Time
Brainstorming what matters with your ace team of devs
Context matters
Don’t be shy
Interesting reasons are half the point!
Know your Team
The organizational context matters too:
● We have very little infrastructure to support large-scale ML
● Must scale to all Yelp data and users on day 1
● Our team is small and will stay that way for a while
● This is a first version of a (hopefully!) long-lived product
So what do we build?
Guiding Principles
1) We know we need to solve a retrieval problem
Guiding Principles
2) Build for what you have, plan for expansion
Guiding Principles
3) We need to build a good product, not beat a benchmark
The Big Picture
[Diagram: API Request, Experts (x3), Final Results, Elastic Search]
General flow:
1. Gather sufficient contextual information
2. Consult each expert for their top candidates
3. Wisely combine suggestions from each expert
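The three steps above could be sketched roughly as follows. This is a minimal illustration, not the implementation from the talk: the names `recommend`, `gather_context`, `experts`, and `combine` are all hypothetical.

```python
from typing import Callable, Dict, List

Context = Dict[str, object]
Candidate = Dict[str, object]
# An expert is modeled here as a function from context to candidates.
Expert = Callable[[Context], List[Candidate]]


def recommend(
    request: Dict[str, object],
    gather_context: Callable[[Dict[str, object]], Context],
    experts: List[Expert],
    combine: Callable[[List[List[Candidate]]], List[Candidate]],
) -> List[Candidate]:
    """Run the three-step flow: context, expert candidates, combination."""
    context = gather_context(request)                      # step 1
    suggestions = [expert(context) for expert in experts]  # step 2
    return combine(suggestions)                            # step 3
```

Because each step is passed in as a plain function, any one of them can be swapped out without touching the others.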
Building the request
● From client: location, user_id
● Derived context
● Neighborhood preferences
● User preferences
● Time preferences
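One way to picture the request is as a small record holding the client fields plus the derived context. The sketch below assumes each preference is a mapping from a label to a weight; that shape, and the names `RecommendationRequest`, `build_request`, and `lookups`, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class RecommendationRequest:
    # Supplied by the client.
    location: Tuple[float, float]  # (latitude, longitude)
    user_id: int
    # Derived server-side; the dict-of-weights shape is an assumption.
    neighborhood_prefs: Dict[str, float] = field(default_factory=dict)
    user_prefs: Dict[str, float] = field(default_factory=dict)
    time_prefs: Dict[str, float] = field(default_factory=dict)


def build_request(
    location: Tuple[float, float],
    user_id: int,
    lookups: Dict[str, Callable],
) -> RecommendationRequest:
    """Attach derived context to the raw client fields.

    `lookups` is a hypothetical mapping from context name to a function
    that computes that piece of context.
    """
    return RecommendationRequest(
        location=location,
        user_id=user_id,
        neighborhood_prefs=lookups["neighborhood"](location),
        user_prefs=lookups["user"](user_id),
        time_prefs=lookups["time"](),
    )
```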
Expert Opinions
Each expert handles a single reason and knows its own requirements.
For example, a LikedByFriends expert would only return candidate businesses which one of the user’s friends had rated highly.
Expert Opinions
Liked By Friends Expert:
General Requirements:
● Open Now
● Sufficiently Nearby
Expert Requirements:
● At least one friend gave it 5 stars
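A Liked By Friends expert along these lines might look like the sketch below. The `Business` fields, the function names, and the 2 km default for "sufficiently nearby" are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Business:
    name: str
    is_open: bool
    distance_km: float
    # Maps a friend's name to the star rating they gave (hypothetical shape).
    friend_ratings: Dict[str, int] = field(default_factory=dict)


def meets_general_requirements(b: Business, max_km: float) -> bool:
    """General requirements shared by every expert: open now and nearby."""
    return b.is_open and b.distance_km <= max_km


def liked_by_friends(businesses: List[Business], max_km: float = 2.0) -> List[Business]:
    """Keep businesses that pass the general checks and have a 5-star friend rating."""
    return [
        b
        for b in businesses
        if meets_general_requirements(b, max_km)
        and any(stars == 5 for stars in b.friend_ratings.values())
    ]
```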
Expert Opinions
Why an expert-based system?
● Think in terms of small, isolated components
● Implementation agnostic
● Adding and removing experts is trivial
Efficient Search
What do we need from our datastore?
● Fast geographic filtering
● Simple but efficient sorting
● All of this happening in 100ms
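As a rough illustration of these requirements, the function below builds a search request body that filters by distance and sorts by rating. It uses the present-day Elasticsearch query DSL, which may differ from what was available at the time of the talk. The field names `location` and `rating`, the function name, and the default radius are assumptions.

```python
from typing import Any, Dict


def build_candidate_query(
    lat: float, lon: float, radius_km: float = 2.0, size: int = 20
) -> Dict[str, Any]:
    """Build a search body: geographic filter, simple sort, bounded time."""
    return {
        "size": size,
        # Asks Elasticsearch to bound search time; this is best effort.
        "timeout": "100ms",
        "query": {
            "bool": {
                # Filters skip relevance scoring, which keeps them cheap.
                "filter": [
                    {
                        "geo_distance": {
                            "distance": f"{radius_km}km",
                            # `location` is assumed to be a geo_point field.
                            "location": {"lat": lat, "lon": lon},
                        }
                    }
                ]
            }
        },
        # `rating` is a hypothetical numeric field on each business.
        "sort": [{"rating": {"order": "desc"}}],
    }
```

The resulting dict would be passed as the request body to a search client; that call is left out here to keep the sketch free of external dependencies.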
Final Decisions
How to combine expert results? We need to factor in:
● Must balance preferences (distance, rating, category)
● Should prefer better reasons when possible
● Sufficiently high-quality candidates make this very safe
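One simple way to combine expert results is a weighted score per candidate, scaled by how compelling the reason is. The sketch below is not the method from the talk: the reason priorities, the 0.6/0.4 weights, and all names are hypothetical, and category preference is left out for brevity.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical priorities: higher means a more compelling reason.
REASON_PRIORITY = {"liked_by_friends": 2.0, "popular_nearby": 1.0}


def score(candidate: Dict[str, Any], reason: str) -> float:
    """Blend rating, distance, and reason quality into one number."""
    rating_term = candidate["rating"] / 5.0
    distance_term = 1.0 / (1.0 + candidate["distance_km"])
    blended = 0.6 * rating_term + 0.4 * distance_term
    return REASON_PRIORITY.get(reason, 0.5) * blended


def combine(
    suggestions: Dict[str, List[Dict[str, Any]]], limit: int = 10
) -> List[Tuple[str, str]]:
    """Return (business_id, reason) pairs, keeping each business's best reason."""
    best: Dict[str, Tuple[float, str]] = {}
    for reason, candidates in suggestions.items():
        for c in candidates:
            s = score(c, reason)
            if c["id"] not in best or s > best[c["id"]][0]:
                best[c["id"]] = (s, reason)
    ranked = sorted(best.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(business_id, reason) for business_id, (_, reason) in ranked[:limit]]
```

A business suggested by several experts appears once in the output, tagged with the reason that scored highest.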
Get to the point already!
Final Version
Extension
Now that we’re iterating, what are our future plans?
● Richer context (user, location, etc.)
● Infrastructure support for faster ML prototyping
● Better personalized ranking
● Training data!
Summary
So what are the takeaways for building a first recommender system?
● Solve your problem, not someone else’s
● Being cutting edge may not be the top priority
● Build for the tools you have, plan for what will come
● Good software engineering enables quality ML