Search Quality at LinkedIn

Abhimanyu Lad and Satya Kanduri, Senior Software Engineers, Recruiting Solutions

Upload: daniel-tunkelang

Post on 08-Sep-2014

Category:

Technology


DESCRIPTION

Presented to the Bay Area Search Meetup on February 26, 2014: http://www.meetup.com/Bay-Area-Search/events/136150622/

At LinkedIn, we face a number of challenges in delivering high-quality search results to 277M+ members. Our results are highly personalized, requiring us to build machine-learned relevance models that combine document, query, and user features. Our emphasis on entities (names, companies, job titles, etc.) also affects how we process and understand queries. In this talk, we discuss these challenges in detail and describe some of the solutions we are building to address them.

Speakers:

Satya Kanduri has worked on LinkedIn search relevance since 2011. Most recently, he led the development of LinkedIn's machine-learned ranking platform. He previously worked at Microsoft, improving relevance for Bing Product Search. He has an MS in Computer Science from the University of Nebraska - Lincoln and a BE in Computer Science from the Osmania University College of Engineering.

Abhimanyu Lad has worked at LinkedIn as a software engineer and data scientist since 2011. He has worked on a variety of relevance and query understanding problems, including query intent prediction, query suggestion, and spelling correction. He has a PhD in Computer Science from CMU, where he worked on developing machine learning techniques for diversifying search results.

TRANSCRIPT

Page 1: Search Quality at LinkedIn

Search Quality at LinkedIn

Recruiting Solutions

Abhimanyu Lad, Senior Software Engineer
Satya Kanduri, Senior Software Engineer

Page 2: Search Quality at LinkedIn

tag: skill OR title
related skills: search, ranking, …

tag: company
id: 1337
industry: internet

verticals: people, jobs

intent: exploratory

Page 3: Search Quality at LinkedIn

SEARCH USE CASES

How do people use LinkedIn’s search?

Page 4: Search Quality at LinkedIn

PEOPLE SEARCH

Search for people by name

Page 5: Search Quality at LinkedIn

PEOPLE SEARCH

Search for people by other attributes

Page 6: Search Quality at LinkedIn

EXPLORATORY PEOPLE SEARCH

Page 7: Search Quality at LinkedIn

JOB SEARCH

Page 8: Search Quality at LinkedIn

COMPANY SEARCH

Page 9: Search Quality at LinkedIn

AND MUCH MORE…

Page 10: Search Quality at LinkedIn

OUR GOAL

Universal Search
– Single search box

High Recall
– Spelling correction, synonym expansion, …

High Precision
– Entity-oriented search: match things, not strings

Page 11: Search Quality at LinkedIn

QUERY UNDERSTANDING PIPELINE

Page 12: Search Quality at LinkedIn

QUERY UNDERSTANDING PIPELINE

Raw query → Spellcheck → Query Tagging → Vertical Intent Prediction → Query Expansion → Structured query + Annotations

Page 14: Search Quality at LinkedIn

SPELLING CORRECTION

Fix obvious typos

Help users spell names

Page 15: Search Quality at LinkedIn

SPELLING OUT THE DETAILS

Sources: PEOPLE NAMES, COMPANIES, TITLES, PAST QUERIES

N-grams: marissa => ma ar ri is ss sa
Metaphone: mark/marc => MRK
Co-occurrence counts: marissa:mayer = 1000

Query: marisa meyer yahoo
Candidates: marissa, marisa, meyer, mayer, yahoo
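As a rough illustration of the n-gram signal on this slide, the Python sketch below splits words into character bigrams (as in marissa => ma ar ri is ss sa) and uses bigram overlap to propose correction candidates from a vocabulary. The function names, the Dice-style overlap score, and the 0.5 threshold are assumptions made for illustration; the slides do not say how candidates are actually scored.

```python
from collections import Counter


def char_ngrams(word, n=2):
    """Split a word into overlapping character n-grams."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_overlap(a, b, n=2):
    """Dice coefficient over the two words' n-gram multisets (assumed scoring)."""
    grams_a, grams_b = Counter(char_ngrams(a, n)), Counter(char_ngrams(b, n))
    shared = sum((grams_a & grams_b).values())
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2 * shared / total if total else 0.0


def candidates(token, vocabulary, threshold=0.5):
    """Return vocabulary words whose n-gram overlap with token passes the threshold."""
    scored = [(ngram_overlap(token, word), word) for word in vocabulary]
    return [word for score, word in sorted(scored, reverse=True) if score >= threshold]


if __name__ == "__main__":
    print(char_ngrams("marissa"))
    print(candidates("marisa", ["marissa", "mayer", "yahoo"]))
```

In a fuller system the candidates from n-gram and phonetic lookups would presumably be rescored with the co-occurrence counts, so that a pair such as marissa:mayer outranks individually plausible but jointly rare combinations.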

Page 16: Search Quality at LinkedIn

SPELLING OUT THE DETAILS

PROBLEM: The corpus and the query logs both contain many spelling errors

Certain spelling errors are quite frequent, while genuine words (especially names) might be infrequent

Page 17: Search Quality at LinkedIn

SPELLING OUT THE DETAILS

PROBLEM: The corpus and the query logs both contain many spelling errors

SOLUTION: Use query chains to infer correct spelling

[product manger] → [product manager] → CLICK

[marissa mayer] → CLICK
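One way to read the query-chain idea is sketched below in Python: when a query that received no click is immediately followed in the same session by a different query that did receive a click, the pair is counted as evidence that the second query corrects the first. The session format, the `mine_corrections` name, and the `min_support` cutoff are hypothetical; the slides only state that query chains are used.

```python
from collections import Counter, defaultdict


def mine_corrections(sessions, min_support=2):
    """Infer likely spelling corrections from query chains.

    sessions: list of sessions, each a time-ordered list of
              (query, clicked) tuples. This log format is an assumption.
    Returns {misspelled_query: most frequent clicked reformulation}.
    """
    rewrites = defaultdict(Counter)
    for session in sessions:
        for (first, first_clicked), (second, second_clicked) in zip(session, session[1:]):
            # Unclicked query followed by a different, clicked query.
            if not first_clicked and second_clicked and first != second:
                rewrites[first][second] += 1

    corrections = {}
    for query, counts in rewrites.items():
        best, support = counts.most_common(1)[0]
        if support >= min_support:
            corrections[query] = best
    return corrections
```

A production version would likely also require the two queries to be close in edit distance or n-gram overlap, so that topic changes are not mistaken for corrections.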

Page 19: Search Quality at LinkedIn

QUERY TAGGING: IDENTIFYING ENTITIES IN THE QUERY

(RECOGNIZED TAGS: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL)

TITLE CO GEO

TITLE-237
software engineer, software developer, programmer, …

CO-1441
Google Inc.
Industry: Internet

GEO-7583
Country: US
Lat: 42.3482 N
Long: 75.1890 W

Page 20: Search Quality at LinkedIn

QUERY TAGGING: IDENTIFYING ENTITIES IN THE QUERY

TITLE CO GEO

MORE PRECISE MATCHING WITH DOCUMENTS

Page 21: Search Quality at LinkedIn

ENTITY-BASED FILTERING

BEFORE

Page 22: Search Quality at LinkedIn

ENTITY-BASED FILTERING

BEFORE

AFTER

Page 23: Search Quality at LinkedIn

ENTITY-BASED FILTERING

BEFORE

Page 24: Search Quality at LinkedIn

ENTITY-BASED FILTERING

BEFORE

AFTER

Page 25: Search Quality at LinkedIn

ENTITY-BASED SUGGESTIONS

Page 27: Search Quality at LinkedIn

QUERY TAGGING: SEQUENTIAL MODEL

TRAINING

EMISSION PROBABILITIES (learned from user profiles)

TRANSITION PROBABILITIES (learned from query logs)

Page 28: Search Quality at LinkedIn

QUERY TAGGING: SEQUENTIAL MODEL

INFERENCE

Given a query, find the most likely sequence of tags
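Finding the most likely tag sequence under a model with emission and transition probabilities is commonly done with the Viterbi algorithm. The Python sketch below assumes a plain first-order hidden Markov model; the slides say only "sequential model", so the model family, the `viterbi` signature, and the probability floor for unseen events are all assumptions.

```python
import math


def viterbi(tokens, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely tag sequence for tokens under a first-order HMM.

    start_p[tag], trans_p[prev_tag][tag], and emit_p[tag][token] are
    probabilities; missing entries fall back to `floor` (assumed smoothing).
    """
    if not tokens:
        return []

    def logp(p):
        return math.log(max(p, floor))

    # Each column maps tag -> (best log-probability ending in tag, previous tag).
    columns = [{
        tag: (logp(start_p.get(tag, 0)) + logp(emit_p[tag].get(tokens[0], 0)), None)
        for tag in tags
    }]
    for token in tokens[1:]:
        previous = columns[-1]
        column = {}
        for tag in tags:
            score, back = max(
                (previous[prev][0] + logp(trans_p[prev].get(tag, 0)), prev)
                for prev in tags
            )
            column[tag] = (score + logp(emit_p[tag].get(token, 0)), back)
        columns.append(column)

    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]
```

Following the training slide, `emit_p` would be estimated from user profiles (how often a token appears in a title, company, or location field) and `trans_p` from query logs (how often one tag follows another).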

Page 30: Search Quality at LinkedIn

VERTICAL INTENT PREDICTION

JOBS

PEOPLE

COMPANIES

(Probability distribution over verticals)

Page 31: Search Quality at LinkedIn

VERTICAL INTENT PREDICTION: SIGNALS

1. Past query counts in each vertical + Query tags

2. Personalization: User’s search history

Example intents: [Company], [Employees], [Jobs], [Name Search]

Example tags: (TAG:COMPANY), (TAG:NAME)
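The two signals can be combined in many ways. The Python sketch below shows one simple possibility: past per-vertical counts for a query are smoothed into a probability distribution, which is then reweighted by a per-user multiplier derived from search history. The `vertical_distribution` name, add-alpha smoothing, and multiplicative reweighting are assumptions, not the method described in the talk.

```python
def vertical_distribution(query_counts, user_prior=None, alpha=1.0):
    """Turn past per-vertical query counts into a probability distribution.

    query_counts: {vertical: how often this query (or tag pattern) led to
                   activity in that vertical}.
    user_prior:   optional {vertical: multiplier} from the user's own search
                  history (hypothetical personalization signal).
    alpha:        add-alpha smoothing so unseen verticals keep some mass.
    """
    verticals = sorted(query_counts)
    total = sum(query_counts.values()) + alpha * len(verticals)
    probs = {v: (query_counts[v] + alpha) / total for v in verticals}

    if user_prior:
        weighted = {v: probs[v] * user_prior.get(v, 1.0) for v in verticals}
        norm = sum(weighted.values())
        probs = {v: w / norm for v, w in weighted.items()}
    return probs
```

For example, a query seen mostly in people search still shifts toward the jobs vertical for a user whose history is dominated by job searches, if that user's multiplier for jobs is greater than 1.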

Page 33: Search Quality at LinkedIn

QUERY EXPANSION

GOAL: Improve recall through synonym expansion

Page 34: Search Quality at LinkedIn

QUERY EXPANSION : NAME SYNONYMS

Page 35: Search Quality at LinkedIn

QUERY EXPANSION : JOB TITLE SYNONYMS

Page 36: Search Quality at LinkedIn

QUERY EXPANSION: SIGNALS

Trained using query chains:

[jon] → [jonathan] → CLICK
[programmer] → [developer] → CLICK
[software engineer] → [software developer] → CLICK

Symmetric but not transitive!

[francis] ⇔ [frank]
[franklin] ⇔ [frank]
[francis] ≠ [franklin]

Context based!

[software engineer] => [software developer]
[civil engineer] ≠ [civil developer]
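The two properties on this slide suggest a particular data structure: synonyms stored as direct pairs rather than merged equivalence classes (so symmetry holds but transitivity does not), and keyed on whole phrases rather than single tokens (so context is respected). The Python sketch below is one hypothetical way to do that; the `SynonymTable` class and its methods are illustrative names, not LinkedIn's implementation.

```python
from collections import defaultdict


class SynonymTable:
    """Symmetric, non-transitive synonym lookup keyed on whole phrases."""

    def __init__(self):
        self._pairs = defaultdict(set)

    def add(self, a, b):
        """Record a and b as synonyms of each other (symmetric)."""
        self._pairs[a].add(b)
        self._pairs[b].add(a)

    def expand(self, phrase):
        """Return the phrase plus its direct synonyms only (no transitive closure)."""
        return {phrase} | self._pairs.get(phrase, set())


if __name__ == "__main__":
    table = SynonymTable()
    table.add("francis", "frank")
    table.add("franklin", "frank")
    table.add("software engineer", "software developer")
    print(table.expand("francis"))         # does not include "franklin"
    print(table.expand("civil engineer"))  # no "civil developer"
```

Because "engineer" alone is never registered as a synonym of "developer", [civil engineer] is left unexpanded while [software engineer] is expanded.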

Page 38: Search Quality at LinkedIn

QUERY UNDERSTANDING: SUMMARY

High degree of structure in queries as well as in the corpus (user profiles, job postings, companies, …)

Query understanding lets us balance recall and precision by supporting entity-oriented search

Query tagging and query log analysis play a big role in query understanding

Page 39: Search Quality at LinkedIn

RANKING

Page 40: Search Quality at LinkedIn

WHAT’S IN A NAME QUERY?

Page 41: Search Quality at LinkedIn

kevin scott

BUT NAMES CAN BE AMBIGUOUS

Page 42: Search Quality at LinkedIn

SEARCHING FOR A COMPANY’S EMPLOYEES

Page 43: Search Quality at LinkedIn

SEARCHING FOR PEOPLE WITH A SKILL

Page 44: Search Quality at LinkedIn

RANKING IS COMPLICATED

Seemingly similar queries require dissimilar scoring functions

Personalization matters
– Multiple dimensions to personalize on
– Dimensions vary with query class

Page 45: Search Quality at LinkedIn

Model

Page 46: Search Quality at LinkedIn

TRAINING

Documents for training

Features

Human evaluation

Labels

Machine learning model

Page 48: Search Quality at LinkedIn

ASSESSING RELEVANCE

Page 49: Search Quality at LinkedIn

RELEVANCE DEPENDS ON WHO’S SEARCHING

What if the searcher is a job seeker?

Or a recruiter?

Or…

Page 50: Search Quality at LinkedIn

THE QUERY IS NOT ENOUGH

Page 51: Search Quality at LinkedIn

WE NEED USER FEATURES

Non-personalized relevance model:
score = f(Document | Query)

Personalized relevance model:
score = f(Document | Query, User)

Page 52: Search Quality at LinkedIn

COLLECTING RELEVANCE JUDGMENTS WON’T SCALE

Page 53: Search Quality at LinkedIn

TRAINING

Documents for training

Features

Human evaluation

Search logs

Labels

Machine learning model

Page 54: Search Quality at LinkedIn

CLICKS AS TRAINING DATA

Approach: Clicked = Relevant, Not-Clicked = Not Relevant

Page 57: Search Quality at LinkedIn

CLICKS AS TRAINING DATA

Approach: Clicked = Relevant, Not-Clicked = Not Relevant

Unfairly penalized? Good results not seen are marked Not Relevant.

User eye scan direction

Page 59: Search Quality at LinkedIn

CLICKS AS TRAINING DATA

Approach: Clicked = Relevant, Skipped = Not Relevant

• Only penalize results that the user has seen but ignored
• Risks inverting model by overweighting low-ranked results
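A minimal Python sketch of the "skipped" heuristic follows. It assumes the user scans results top to bottom and has seen everything down to the lowest clicked position; results below that point receive no label at all. The `label_results` name and the input format are hypothetical.

```python
def label_results(ranked_ids, clicked_ids):
    """Label results as 1 (clicked) or 0 (seen but skipped).

    Assumes a top-to-bottom scan: every result at or above the lowest
    clicked position counts as seen. Results below it are left unlabeled,
    since the user may never have looked at them.
    """
    clicked = set(clicked_ids)
    clicked_positions = [i for i, doc in enumerate(ranked_ids) if doc in clicked]
    if not clicked_positions:
        return []  # no click, so no evidence about what was seen
    last_seen = max(clicked_positions)
    return [(doc, 1 if doc in clicked else 0) for doc in ranked_ids[:last_seen + 1]]
```

Note how a click at position 3 produces two negatives from positions 1 and 2 and nothing from lower positions. This is the source of the inversion risk on the slide: the top-ranked results become the main supply of negative examples.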

Page 62: Search Quality at LinkedIn

FAIR PAIRS

[Radlinski and Joachims, AAAI’06]

Flipped

• Fair Pairs:
  • Randomize, Clicked = R, Skipped = NR
  • Great at dealing with position bias
  • Does not invert models
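The following Python sketch gives one reading of the FairPairs idea from the cited paper: adjacent results are grouped into pairs, each pair is flipped with probability 0.5 before display, and a click on the lower result of a pair with no click on the upper one is taken as a preference for the lower result. The function names and the exact click rule are a simplified interpretation and should be checked against the paper before use.

```python
import random


def fair_pairs_present(ranking, rng=random):
    """Randomly flip adjacent pairs before showing results.

    Returns (shown_ranking, pairs), where each pair is (top, bottom)
    as actually displayed. The pairing starts at index 0 or 1 at random.
    """
    offset = rng.randint(0, 1)
    shown = list(ranking)
    pairs = []
    for i in range(offset, len(shown) - 1, 2):
        if rng.random() < 0.5:
            shown[i], shown[i + 1] = shown[i + 1], shown[i]
        pairs.append((shown[i], shown[i + 1]))
    return shown, pairs


def fair_pairs_labels(pairs, clicked_ids):
    """Return (relevant, not_relevant) pairs: bottom clicked, top skipped."""
    clicked = set(clicked_ids)
    return [(bottom, top) for top, bottom in pairs
            if bottom in clicked and top not in clicked]
```

Because each result in a pair is equally likely to be shown on top, position bias affects both directions of the preference equally, which is why the approach does not systematically penalize the current top results.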

Page 63: Search Quality at LinkedIn

EASY NEGATIVES

Figure labels: results page 1, results page 99

• Assumption: A decent current model would push out bad results to the very end.

• Easy Negatives: Some of the results at the end are picked up as negative examples

Page 64: Search Quality at LinkedIn

EASY NEGATIVES

• Use strategies that sample across the feature space
• Searches with fewer results preferred
• Always sample from a given page, say page 10

Figure labels: 2 pages, 90+ pages
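A minimal Python sketch of the "always sample from a given page" strategy is shown below. The page size, the number of samples, and the choice to return nothing when a search has fewer pages are assumptions; the slides give only page 10 as an example.

```python
import random


def sample_easy_negatives(ranked_ids, page=10, page_size=10, k=3, rng=random):
    """Sample up to k negative examples from a fixed results page.

    Relies on the slide's assumption that a decent current model pushes
    bad results far down the ranking. If the search has no results on the
    chosen page, nothing is returned rather than risk sampling good results.
    """
    start = (page - 1) * page_size
    pool = ranked_ids[start:start + page_size]
    if not pool:
        return []
    return rng.sample(pool, min(k, len(pool)))
```

Sampling from a fixed, moderately deep page rather than the very last page keeps the negatives from being dominated by searches with huge result sets, which is one way to spread samples across the feature space.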

Page 65: Search Quality at LinkedIn

PUTTING IT ALL TOGETHER

Human evaluation is not practical for personalized searches

Learn from user behavior
– Multiple heuristics depending on the need
– Different pros and cons

Page 66: Search Quality at LinkedIn

EFFICIENCY VS EXPRESSIVENESS

Build a tree with logistic regression leaves. By restricting decision nodes to (Query, User) segments, only one regression model needs to be evaluated for each document.

Decision nodes in the diagram: X2? (branches X2=0, X2=1) and X4? (branches X4=0, X4=1)
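The Python sketch below illustrates why restricting decision nodes to (Query, User) features is efficient: the tree is walked once per search to pick a leaf, and then only that leaf's logistic regression is evaluated for each candidate document. The decision tests borrow the Name Query? and Skill Query? nodes from the simplified example later in the deck; the feature names, weights, and the `LEAVES` table are invented for illustration.

```python
import math

# Hypothetical leaf models: (feature weights, bias) per (Query, User) segment.
LEAVES = {
    "name": ({"name_match": 4.0, "network_distance_close": 1.0}, -2.0),
    "skill": ({"skill_match": 3.0, "same_industry": 1.0}, -2.0),
    "default": ({"text_match": 2.0}, -1.0),
}


def route(segment):
    """Walk the decision nodes once per search, using only query/user features."""
    if segment.get("is_name_query"):
        return "name"
    if segment.get("is_skill_query"):
        return "skill"
    return "default"


def logistic_score(doc_features, weights, bias):
    """Evaluate a single logistic regression leaf for one document."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in doc_features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank(segment, docs):
    """Score every document with the one leaf model chosen for this search."""
    weights, bias = LEAVES[route(segment)]
    scored = [(doc_id, logistic_score(features, weights, bias))
              for doc_id, features in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

If decision nodes could also test document features, different documents in the same result set might land in different leaves, and the routing cost would be paid per document instead of per search.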

Page 67: Search Quality at LinkedIn

SCORING

New document → Features → Machine learning model → score (repeated for each document)

Scores → Ordered list

Page 68: Search Quality at LinkedIn

A SIMPLIFIED EXAMPLE

Name Query? Yes / No

Skill Query? Yes / No

Page 69: Search Quality at LinkedIn

TEST, TEST, TEST

Interleaving [Radlinski et al., CIKM 2008]

Model 1: a b c d g h
Model 2: b e a f g h
Interleaved: a b c e d f
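The interleaved list on this slide (a b c e d f) is what team-draft interleaving produces from the two model rankings when Model 1 picks first in every round. The Python sketch below assumes that variant; the slide does not name it, and the function names and the click-credit rule are illustrative.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng=random):
    """Interleave two rankings, recording which model contributed each result.

    The model with fewer picks chooses next; ties are broken by a coin flip.
    Each model picks its highest-ranked result not already in the list.
    """
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": [], "B": []}
    interleaved = []
    while len(interleaved) < length:
        a_turn = len(picks["A"]) < len(picks["B"]) or (
            len(picks["A"]) == len(picks["B"]) and rng.random() < 0.5
        )
        team = "A" if a_turn else "B"
        doc = next((d for d in rankings[team] if d not in interleaved), None)
        if doc is None:
            # This model has nothing new left; let the other model pick.
            team = "B" if team == "A" else "A"
            doc = next((d for d in rankings[team] if d not in interleaved), None)
            if doc is None:
                break
        interleaved.append(doc)
        picks[team].append(doc)
    return interleaved, picks


def click_credit(picks, clicked_ids):
    """Count clicks on the results each model contributed."""
    clicked = set(clicked_ids)
    return {team: sum(doc in clicked for doc in docs) for team, docs in picks.items()}
```

Aggregated over many searches, the model whose contributed results attract more clicks is judged the better ranker, without needing separate user buckets for each model.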

Page 70: Search Quality at LinkedIn

SUMMARY

Query understanding leverages the rich structure of LinkedIn’s content and information needs.

Query tagging and rewriting allow us to deliver both precision and recall.

For ranking, personalization is both the biggest challenge and the core of our solution.

Segmenting relevance models by query type helps us efficiently address the diversity of search needs.

Page 71: Search Quality at LinkedIn

Abhimanyu Lad: [email protected], https://linkedin.com/in/abhilad
Satya Kanduri: [email protected], https://linkedin.com/in/skanduri