Related Searches at LinkedIn

Mitul Tiwari
Joint work with Azarias Reda, Yubin Park, Christian Posse, and Sam Shah
LinkedIn

Tuesday, August 21, 2012

Who am I

Outline

• About LinkedIn

• Related Searches

‣ Design

‣ Implementation

‣ Evaluation

LinkedIn by the numbers

• 175M+ members

• 2+ new user registrations per second

• 4.2 Billion people searches in 2011

• 9.3 Billion page views in Q2 2012

• 100+ million monthly active users in Q2 2012

Broad Range of Products

• Profile

• Search

• Hiring Solutions

• People You May Know

• Skills

• News

Related Searches at LinkedIn

• Millions of searches every day

• Goal: Build a related searches system at LinkedIn

• To help users explore and refine their queries

Related Searches

• Design

• Implementation

• Evaluation

Design

• Signals

‣ Collaborative Filtering

‣ Query-Result Click graph

‣ Overlapping terms

• Length-bias

• Ensemble approach for unified recommendation

• Practical considerations

Design: Collaborative Filtering

• Searches correlated by time

‣ Searches done in the same session by the same user

‣ Collaborative filtering: implicit feedback

‣ TF-IDF scoring to take care of popular queries (e.g. 'Obama')

[Figure: queries Q1, Q2, Q3, Q4 along a time axis]
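
The session-based signal above can be sketched as follows. This is a minimal illustration, not the system's actual scoring: it assumes sessions are already extracted as lists of query strings, uses the number of sessions in which two queries co-occur as the TF part, and uses log(N / sessions containing the candidate) as the IDF part. The function name `cf_related` and the sample sessions are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def cf_related(sessions):
    """Score query pairs that co-occur in a session, damping popular queries.

    sessions: list of lists of query strings (one list per user session).
    Returns {query: [(related_query, score), ...]} sorted by descending score.
    """
    n_sessions = len(sessions)
    session_freq = Counter()          # number of sessions containing each query
    cooccur = defaultdict(Counter)    # cooccur[q][r] = sessions containing both

    for session in sessions:
        unique = set(session)
        session_freq.update(unique)
        for q, r in permutations(unique, 2):
            cooccur[q][r] += 1

    related = {}
    for q, counts in cooccur.items():
        scored = []
        for r, tf in counts.items():
            # Assumed IDF: a query seen in many sessions gets a low weight.
            idf = math.log(n_sessions / session_freq[r])
            scored.append((r, tf * idf))
        related[q] = sorted(scored, key=lambda pair: (-pair[1], pair[0]))
    return related


# Hypothetical sessions: "obama" is popular and appears almost everywhere.
sessions = [
    ["java developer", "java engineer", "obama"],
    ["java developer", "java engineer"],
    ["hadoop", "obama"],
    ["recruiter", "obama"],
]
related = cf_related(sessions)
```

With these sample sessions, "java engineer" ranks above "obama" as a related search for "java developer", because the popular query is down-weighted by its low IDF.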

Design: Query-Result Clicks

• Searches correlated by result clicks

[Figure: graph linking queries Q1 … Qn to clicked results R1 … Rm]
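
One way to relate queries through result clicks is sketched below. The slides do not say how query pairs are scored, so the cosine similarity over click counts used here is an assumption, as are the function name `qrq_related` and the `(query, result_id)` log format.

```python
import math
from collections import Counter, defaultdict


def qrq_related(clicks):
    """Relate queries through shared clicked results.

    clicks: iterable of (query, result_id) pairs from a click log.
    Returns {(q1, q2): similarity} for q1 < q2 with at least one shared result.
    """
    by_query = defaultdict(Counter)   # query -> click counts per result
    by_result = defaultdict(set)      # result -> queries that led to a click
    for query, result in clicks:
        by_query[query][result] += 1
        by_result[result].add(query)

    norms = {q: math.sqrt(sum(c * c for c in counts.values()))
             for q, counts in by_query.items()}

    scores = {}
    for queries in by_result.values():
        ordered = sorted(queries)
        for i, q1 in enumerate(ordered):
            for q2 in ordered[i + 1:]:
                if (q1, q2) in scores:
                    continue
                # Assumed scoring: cosine similarity of the click-count vectors.
                dot = sum(c * by_query[q2][r] for r, c in by_query[q1].items())
                scores[(q1, q2)] = dot / (norms[q1] * norms[q2])
    return scores


# Hypothetical click log.
clicks = [
    ("software engineer", "p1"),
    ("software developer", "p1"),
    ("software developer", "p2"),
    ("nurse", "p3"),
]
scores = qrq_related(clicks)
```

Only queries that share at least one clicked result are paired, so "nurse" gets no related query from this sample log.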

Design: Overlapping Terms

• Searches with overlapping terms

‣ TF-IDF scoring to give importance to terms

[Figure: example queries Q1 and Q2, "Software Developer" and "Software Engineer"]
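
A small sketch of term-overlap scoring follows. It assumes whitespace tokenization and scores a pair of queries by the summed IDF of the terms they share, so that sharing a rare term counts for more than sharing a common one. The exact weighting used in the system is not given in the slides, and `term_overlap_scores` is a hypothetical name. The pairwise loop is for illustration only and would not scale to a real query log.

```python
import math
from collections import Counter


def term_overlap_scores(queries):
    """Score query pairs by the summed IDF of the terms they share.

    queries: list of distinct query strings.
    Returns {(q1, q2): score} for q1 < q2 that share at least one term.
    """
    tokens = {q: set(q.lower().split()) for q in queries}
    doc_freq = Counter(t for terms in tokens.values() for t in terms)
    n = len(tokens)
    idf = {t: math.log(n / df) for t, df in doc_freq.items()}

    scores = {}
    ordered = sorted(tokens)
    for i, q1 in enumerate(ordered):
        for q2 in ordered[i + 1:]:
            shared = tokens[q1] & tokens[q2]
            if shared:
                scores[(q1, q2)] = sum(idf[t] for t in shared)
    return scores


# Hypothetical query set: "software" is common, "developer" is rarer.
queries = [
    "software developer",
    "software engineer",
    "software architect",
    "hadoop developer",
]
overlap_scores = term_overlap_scores(queries)
```

In this sample, the pair sharing "developer" scores higher than the pair sharing only the more common term "software".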

Design: Length Bias

• Insight: users tend to click on suggestions that are one term longer than their query

• Corresponds to refining the initial query

• Statistical biasing model to score a longer query higher
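
The slides do not describe the statistical biasing model itself. As a rough stand-in, the sketch below multiplies the score of any suggestion that is exactly one term longer than the query by a fixed factor. The function name `apply_length_bias` and the `boost` value are hypothetical, not figures from the talk.

```python
def apply_length_bias(query, candidates, boost=1.5):
    """Re-rank candidates, favouring those one term longer than the query.

    candidates: list of (suggestion, score) pairs.
    boost: hypothetical multiplier, not a value from the talk.
    """
    base_len = len(query.split())
    rescored = []
    for suggestion, score in candidates:
        extra_terms = len(suggestion.split()) - base_len
        weight = boost if extra_terms == 1 else 1.0
        rescored.append((suggestion, score * weight))
    return sorted(rescored, key=lambda pair: -pair[1])


ranked = apply_length_bias(
    "software engineer",
    [("software developer", 1.0), ("senior software engineer", 0.8)],
)
```

Here the refinement "senior software engineer" moves ahead of "software developer" even though its raw score was lower.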

Design: Ensemble Approach

• Need to generate unified recommendation dataset

• Analysis to figure out engagement of each signal

• Attempted ML approach

‣ Minimal overlap across different signals

Design: Ensemble Approach

• Step-wise unionization

• Importance based on individual signal performance

‣ First, collaborative filtering

‣ Second, queries correlated by query-result clicks

‣ Third, queries with overlapping terms
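
One reading of the step-wise union is sketched below: take suggestions from the highest-priority signal first, then fill the remaining slots from the next signal, skipping duplicates. The slides do not spell out the merge rule, so this is an interpretation. The function name `stepwise_union` and the `limit` parameter are hypothetical.

```python
def stepwise_union(signal_lists, limit=10):
    """Merge per-signal suggestion lists in priority order.

    signal_lists: lists of suggestions, highest-priority signal first, each
                  already ranked by its own score.
    limit: hypothetical cap on the number of suggestions returned.
    """
    merged, seen = [], set()
    for suggestions in signal_lists:
        for s in suggestions:
            if s not in seen:
                seen.add(s)
                merged.append(s)
                if len(merged) == limit:
                    return merged
    return merged


# Hypothetical per-signal outputs for one query.
cf = ["java engineer", "java architect"]
qrq = ["java programmer", "java engineer"]
overlap = ["java developer jobs"]
result = stepwise_union([cf, qrq, overlap], limit=4)
```

Lower-priority signals only contribute suggestions that the higher-priority ones did not already supply, which fits the observation that the signals overlap very little.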

Design: Practical Considerations

• System designed for public consumption

‣ Strong profanity filters

‣ Need to deal with misspellings

‣ Languages

‣ Remove spammy search queries

Related Searches

• Design

• Implementation

• Evaluation

Implementation Challenge

• Scale

‣ 175M+ members

‣ Billions of searches

‣ Terabytes of data to process

Implementation

• Kafka: publish-subscribe messaging system

• Hadoop: MapReduce data processing system

• Azkaban: Hadoop workflow management tool

• Voldemort: Key-value store

[Figure: Metaphor architecture, with components Search Backend, Kafka, HDFS, Hadoop, Voldemort, Related Searches Backend, and Front End]

Implementation: Workflow

[Figure: workflow diagram with the following stages]

• Query Pre-processing

• Pair Formation

• CF Scoring

• Graph Formation

• QRQ Scoring

• Token Identification

• Partial Scoring

• Step-wise Union

• Length Bias Filter

• Step-wise Union

Related Searches

• Design

• Implementation

• Evaluation

Evaluation

• Performance of each signal and combination

• How does the system scale?

Evaluation Cont’d

• Offline evaluation

‣ Precision-Recall

• Online evaluation

‣ A/B testing to measure engagement

‣ Performance evaluation

Offline Evaluation

• Correct set: set of searches performed by a user in the following K minutes, here K=10
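
The offline setup above can be sketched as follows, assuming each user's search log is a time-sorted list of `(timestamp, query)` pairs. Excluding repeats of the original query from the correct set is an assumption of this sketch. The names `correct_set` and `precision_recall` and the sample log are hypothetical.

```python
from datetime import datetime, timedelta


def correct_set(user_searches, index, k_minutes=10):
    """Queries the same user issued within k_minutes after search `index`.

    user_searches: list of (timestamp, query) for one user, sorted by time.
    """
    start, query = user_searches[index]
    end = start + timedelta(minutes=k_minutes)
    # Assumption: a repeat of the original query is not a useful target.
    return {q for t, q in user_searches[index + 1:] if t <= end and q != query}


def precision_recall(recommended, correct):
    """Precision and recall of a recommendation list against the correct set."""
    hits = len(set(recommended) & correct)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall


# Hypothetical search log for one user.
t0 = datetime(2012, 8, 21, 9, 0)
log = [
    (t0, "java developer"),
    (t0 + timedelta(minutes=2), "java engineer"),
    (t0 + timedelta(minutes=7), "senior java developer"),
    (t0 + timedelta(minutes=30), "hadoop"),
]
correct = correct_set(log, 0)
precision, recall = precision_recall(["java engineer", "java architect"], correct)
```

The search issued 30 minutes later falls outside the K=10 window, so it is not part of the correct set for the first query.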

Online Evaluation

• Used A/B testing

• Metrics

‣ Coverage: queries with recommendations

‣ Impressions: # of recommendations shown

‣ Clicks: Clicks on recommendations

‣ Click-through rate (CTR): Clicks per impression
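
The metrics above can be computed from per-query logs along these lines. The sketch assumes one record per query in a test bucket and treats coverage as the fraction of queries that received at least one recommendation. The record keys `shown` and `clicked` and the function name `online_metrics` are hypothetical.

```python
def online_metrics(events):
    """Aggregate A/B-test metrics for one bucket.

    events: list of dicts with hypothetical keys
            'shown' (number of recommendations displayed for a query) and
            'clicked' (number of those recommendations clicked).
    """
    queries = len(events)
    covered = sum(1 for e in events if e["shown"] > 0)
    impressions = sum(e["shown"] for e in events)
    clicks = sum(e["clicked"] for e in events)
    return {
        "coverage": covered / queries if queries else 0.0,
        "impressions": impressions,
        "clicks": clicks,
        "ctr": clicks / impressions if impressions else 0.0,
    }


# Hypothetical bucket of four queries.
bucket = [
    {"shown": 5, "clicked": 1},
    {"shown": 0, "clicked": 0},
    {"shown": 5, "clicked": 0},
    {"shown": 10, "clicked": 1},
]
metrics = online_metrics(bucket)
```

Comparing these aggregates between the control and treatment buckets gives the engagement difference that the A/B test is meant to measure.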

Evaluation: System Runtime

Details

• Metaphor: a System for Related Search Recommendations, Azarias Reda, Yubin Park, Mitul Tiwari, Christian Posse, and Sam Shah. In Proceedings of the CIKM, 2012.
