Search Quality at LinkedIn
DESCRIPTION
Presented to the Bay Area Search Meetup on February 26, 2014: http://www.meetup.com/Bay-Area-Search/events/136150622/ At LinkedIn, we face a number of challenges in delivering high-quality search results to 277M+ members. Our results are highly personalized, which requires us to build machine-learned relevance models that combine document, query, and user features. And our emphasis on entities (names, companies, job titles, etc.) affects how we process and understand queries. In this talk, we'll discuss these challenges in detail, and we'll describe some of the solutions we are building to address them. Speakers: Satya Kanduri has worked on LinkedIn search relevance since 2011. Most recently he led the development of LinkedIn's machine-learned ranking platform. He previously worked at Microsoft, improving relevance for Bing Product Search. He has an MS in Computer Science from the University of Nebraska - Lincoln, and a BE in Computer Science from the Osmania University College of Engineering. Abhimanyu Lad has worked at LinkedIn as a software engineer and data scientist since 2011. He has worked on a variety of relevance and query understanding problems, including query intent prediction, query suggestion, and spelling correction. He has a PhD in Computer Science from CMU, where he worked on developing machine learning techniques for diversifying search results.
TRANSCRIPT
Recruiting Solutions
Abhimanyu Lad, Senior Software Engineer
Satya Kanduri, Senior Software Engineer
Search Quality at LinkedIn
(Example of an annotated query)
tag: skill OR title; related skills: search, ranking, …
tag: company; id: 1337; industry: internet
verticals: people, jobs
intent: exploratory
SEARCH USE CASES
How do people use LinkedIn’s search?
PEOPLE SEARCH
Search for people by name
PEOPLE SEARCH
Search for people by other attributes
EXPLORATORY PEOPLE SEARCH
JOB SEARCH
COMPANY SEARCH
AND MUCH MORE…
OUR GOAL
Universal Search – single search box
High Recall – spelling correction, synonym expansion, …
High Precision – entity-oriented search: match things, not strings
QUERY UNDERSTANDING PIPELINE
Raw query → Spellcheck → Query Tagging → Vertical Intent Prediction → Query Expansion → Structured query + Annotations
SPELLING CORRECTION
Fix obvious typos
Help users spell names
SPELLING OUT THE DETAILS
Candidate sources: people names, companies, titles, past queries
N-grams: marissa => ma ar ri is ss sa
Metaphone: mark/marc => MRK
Co-occurrence counts: marissa:mayer = 1000
Example: [marisa meyer yahoo] => [marissa mayer yahoo]
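As a rough illustration of how these signals can combine, here is a minimal sketch (hypothetical dictionary, weights, and counts; not LinkedIn's implementation) that generates candidates by character-bigram overlap and breaks ties with co-occurrence counts from query logs. A phonetic key such as Metaphone would add a second similarity signal; it is omitted here for brevity.

```python
from itertools import product

def bigrams(word):
    # "marissa" -> {"ma", "ar", "ri", "is", "ss", "sa"}
    return {word[i:i + 2] for i in range(len(word) - 1)}

def similarity(a, b):
    # Jaccard overlap of character bigrams.
    ba, bb = bigrams(a), bigrams(b)
    return len(ba & bb) / (len(ba | bb) or 1)

def candidates(token, dictionary, top_k=2):
    # Closest dictionary entries; a correctly spelled token is its own best match.
    return sorted(dictionary, key=lambda w: similarity(token, w), reverse=True)[:top_k]

def correct(query, dictionary, cooc):
    # Score every combination of per-token candidates; co-occurrence counts
    # from query logs (e.g. marissa:mayer = 1000) favor plausible joint corrections.
    tokens = query.split()
    best, best_score = query, float("-inf")
    for combo in product(*(candidates(t, dictionary) for t in tokens)):
        spell = sum(similarity(t, c) for t, c in zip(tokens, combo))
        joint = sum(cooc.get((a, b), 0) for a in combo for b in combo if a != b)
        score = spell + 0.001 * joint
        if score > best_score:
            best, best_score = " ".join(combo), score
    return best

dictionary = ["marissa", "marisa", "mayer", "meyer", "yahoo", "mark", "marc"]
cooc = {("marissa", "mayer"): 1000, ("mayer", "marissa"): 1000}
print(correct("marisa meyer yahoo", dictionary, cooc))  # -> marissa mayer yahoo
```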
SPELLING OUT THE DETAILS
PROBLEM: The corpus as well as the query logs contain many spelling errors.
Certain spelling errors are quite frequent, while genuine words (especially names) may be infrequent.
SPELLING OUT THE DETAILS
PROBLEM: The corpus as well as the query logs contain many spelling errors.
SOLUTION: Use query chains to infer the correct spelling.
[product manger] → [product manager] → CLICK
[marissa mayer] → CLICK
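A minimal sketch of mining such chains, assuming a hypothetical session-log format (lists of QUERY/CLICK events): a reformulation that is close in spelling to its predecessor and ends in a click is evidence that the second spelling is the correct one.

```python
from collections import Counter
from difflib import SequenceMatcher

def correction_pairs(sessions, min_similarity=0.8):
    # Count (misspelling, correction) pairs from QUERY -> QUERY -> CLICK chains.
    pairs = Counter()
    for events in sessions:  # each event: ("QUERY", text) or ("CLICK", result_id)
        for e1, e2, e3 in zip(events, events[1:], events[2:]):
            if e1[0] == "QUERY" and e2[0] == "QUERY" and e3[0] == "CLICK":
                if SequenceMatcher(None, e1[1], e2[1]).ratio() >= min_similarity:
                    pairs[(e1[1], e2[1])] += 1
    return pairs

sessions = [
    [("QUERY", "product manger"), ("QUERY", "product manager"), ("CLICK", "r1")],
    [("QUERY", "marissa mayer"), ("CLICK", "r2")],  # clicked directly: likely correct
]
print(correction_pairs(sessions))
# Counter({('product manger', 'product manager'): 1})
```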
QUERY UNDERSTANDING PIPELINE
Spellcheck
Query Tagging
Vertical Intent Prediction
Query Expansion
Raw query
Structured query+
Annotations
19
QUERY TAGGING: IDENTIFYING ENTITIES IN THE QUERY
(Example query tagged as TITLE / CO / GEO)
TITLE-237: software engineer, software developer, programmer, …
CO-1441: Google Inc. (Industry: Internet)
GEO-7583: Country: US, Lat: 42.3482 N, Long: 75.1890 W
(Recognized tags: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL)
Tagged entities enable more precise matching with documents.
ENTITY-BASED FILTERING
(Screenshots: search results before and after entity-based filtering)
ENTITY-BASED SUGGESTIONS
(Screenshots: entity-based query suggestions)
QUERY TAGGING: SEQUENTIAL MODEL
TRAINING
Emission probabilities (learned from user profiles)
Transition probabilities (learned from query logs)
INFERENCE
Given a query, find the most likely sequence of tags
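The slide describes an HMM-style sequential tagger. As a sketch of the inference step (Viterbi decoding over toy emission and transition probabilities; all numbers below are illustrative, not LinkedIn's model):

```python
import math

TAGS = ["NAME", "TITLE", "COMPANY", "GEO"]

def viterbi(tokens, start, trans, emit, floor=1e-9):
    # best[i][t] = log-prob of the best tag sequence ending in tag t at position i.
    best = [{t: math.log(start[t] * emit[t].get(tokens[0], floor)) for t in TAGS}]
    back = []
    for tok in tokens[1:]:
        scores, ptrs = {}, {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + math.log(trans[p][t]))
            scores[t] = (best[-1][prev] + math.log(trans[prev][t])
                         + math.log(emit[t].get(tok, floor)))
            ptrs[t] = prev
        best.append(scores)
        back.append(ptrs)
    # Walk the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for ptrs in reversed(back):
        tag = ptrs[tag]
        path.append(tag)
    return list(reversed(path))

# Toy parameters: emissions as if read off profile fields, transitions off logs.
start = {t: 0.25 for t in TAGS}
trans = {p: {t: 0.25 for t in TAGS} for p in TAGS}
trans["TITLE"].update({"TITLE": 0.6, "COMPANY": 0.3, "NAME": 0.05, "GEO": 0.05})
emit = {
    "NAME":    {"kevin": 0.5, "scott": 0.5},
    "TITLE":   {"software": 0.4, "engineer": 0.5},
    "COMPANY": {"google": 0.9},
    "GEO":     {"york": 0.5},
}
print(viterbi("software engineer google".split(), start, trans, emit))
# -> ['TITLE', 'TITLE', 'COMPANY']
```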
VERTICAL INTENT PREDICTION
(Output: a probability distribution over verticals such as PEOPLE, JOBS, COMPANIES)
VERTICAL INTENT PREDICTION: SIGNALS
1. Past query counts in each vertical + query tags (e.g., a TAG:COMPANY query like [Company] may target employees or jobs; a TAG:NAME query signals name search)
2. Personalization: the user's search history
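A minimal sketch of blending these two signals (the smoothing constant, blend weight, and counts are assumptions, not LinkedIn's values): global per-vertical counts for the query are mixed with the individual user's history.

```python
from collections import Counter

VERTICALS = ["PEOPLE", "JOBS", "COMPANIES"]

def vertical_intent(global_counts, user_counts, alpha=0.8, smoothing=1.0):
    """Return P(vertical | query, user) as a smoothed blend of global and
    per-user click counts."""
    def normalize(counts):
        total = sum(counts.get(v, 0) + smoothing for v in VERTICALS)
        return {v: (counts.get(v, 0) + smoothing) / total for v in VERTICALS}
    g, u = normalize(global_counts), normalize(user_counts)
    return {v: alpha * g[v] + (1 - alpha) * u[v] for v in VERTICALS}

# Globally this query mostly leads to people/company pages, but this user
# searches for jobs often, so the JOBS vertical gets a personalized boost.
global_counts = Counter({"PEOPLE": 500, "COMPANIES": 400, "JOBS": 100})
user_counts = Counter({"JOBS": 30, "PEOPLE": 5})
print(vertical_intent(global_counts, user_counts))
```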
QUERY EXPANSION
GOAL: Improve recall through synonym expansion
QUERY EXPANSION: NAME SYNONYMS
QUERY EXPANSION: JOB TITLE SYNONYMS
QUERY EXPANSION: SIGNALS
Trained using query chains:
[jon] → [jonathan] → CLICK
[programmer] → [developer] → CLICK
[software engineer] → [software developer] → CLICK
Symmetric but not transitive!
[francis] ⇔ [frank], [franklin] ⇔ [frank], but [francis] ≠ [franklin]
Context based!
[software engineer] => [software developer], but [civil engineer] ≠ [civil developer]
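One way to honor both properties is to key the synonym table on (term, context) rather than on the term alone. A minimal sketch with a hypothetical hand-filled table:

```python
# Hypothetical synonym table: symmetric entries are stored explicitly
# (so transitivity never sneaks in), and keys carry the neighboring term.
SYNONYMS = {
    ("engineer", "software"): ["developer"],
    ("developer", "software"): ["engineer"],   # symmetric
    ("frank", None): ["francis", "franklin"],
    ("francis", None): ["frank"],              # but never francis -> franklin
}

def expand(tokens):
    expanded = []
    for i, tok in enumerate(tokens):
        context = tokens[i - 1] if i > 0 else None
        variants = [tok] + SYNONYMS.get((tok, context),
                                        SYNONYMS.get((tok, None), []))
        expanded.append(variants)  # one OR-group per token position
    return expanded

print(expand(["software", "engineer"]))  # [['software'], ['engineer', 'developer']]
print(expand(["civil", "engineer"]))     # [['civil'], ['engineer']] - no expansion
```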
QUERY UNDERSTANDING: SUMMARY
High degree of structure in queries as well as the corpus (user profiles, job postings, companies, …)
Query understanding allows us to optimally balance recall and precision by supporting entity-oriented search
Query tagging and query log analysis play a big role in query understanding
RANKING
WHAT’S IN A NAME QUERY?
Example: [kevin scott] – but names can be ambiguous: many distinct members share the same name
SEARCHING FOR A COMPANY’S EMPLOYEES
SEARCHING FOR PEOPLE WITH A SKILL
RANKING IS COMPLICATED
Seemingly similar queries require dissimilar scoring functions
Personalization matters – multiple dimensions to personalize on; dimensions vary with query class
TRAINING
Documents for training → Features
Human evaluation → Labels
Features + Labels → Machine learning model
ASSESSING RELEVANCE
RELEVANCE DEPENDS ON WHO’S SEARCHING
What if the searcher is a job seeker?
Or a recruiter?
Or…
THE QUERY IS NOT ENOUGH
WE NEED USER FEATURES
Non-personalized relevance model: score = f(Document | Query)
Personalized relevance model: score = f(Document | Query, User)
COLLECTING RELEVANCE JUDGMENTS WON’T SCALE
TRAINING
Documents for training → Features
Search logs (instead of human evaluation) → Labels
Features + Labels → Machine learning model
CLICKS AS TRAINING DATA
Approach: Clicked = Relevant, Not-Clicked = Not Relevant
Problem: good results the user never saw are unfairly penalized as Not Relevant, because users scan results top-down and may never see lower-ranked ones.
CLICKS AS TRAINING DATA
Approach: Clicked = Relevant, Skipped = Not Relevant
• Only penalize results that the user has seen but ignored
• Risks inverting the model by overweighting low-ranked results
FAIR PAIRS [Radlinski and Joachims, AAAI’06]
• Randomize the order within adjacent pairs of results; within a pair, Clicked = Relevant, Skipped = Not Relevant
• Great at dealing with position bias
• Does not invert models
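A sketch in the spirit of Fair Pairs (the log format and helper names are assumptions, not the paper's or LinkedIn's code): flip each adjacent pair with probability 0.5 at serve time; a click on the lower member of a pair while its partner is skipped is a preference untainted by position bias.

```python
import random

def fair_pairs_randomize(results, rng=random):
    # Group the ranking into adjacent pairs and flip each pair with p = 0.5.
    randomized = []
    for i in range(0, len(results) - 1, 2):
        pair = [results[i], results[i + 1]]
        if rng.random() < 0.5:
            pair.reverse()
        randomized.extend(pair)
    if len(results) % 2:
        randomized.append(results[-1])  # odd leftover stays in place
    return randomized

def preferences(randomized, clicked):
    # Within each displayed pair, clicking the lower slot while skipping the
    # upper one is a position-bias-free vote for the clicked result.
    prefs = []
    for i in range(0, len(randomized) - 1, 2):
        top, bottom = randomized[i], randomized[i + 1]
        if bottom in clicked and top not in clicked:
            prefs.append((bottom, top))  # bottom preferred over top
    return prefs

shown = fair_pairs_randomize(["a", "b", "c", "d"])
print(preferences(shown, clicked={shown[1]}))  # -> [(shown[1], shown[0])]
```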
EASY NEGATIVES
• Assumption: a decent current model pushes bad results to the very end (page 99, not page 1)
• Easy negatives: some of the results at the end are picked up as negative examples
• Use strategies that sample across the feature space: prefer searches with fewer results, or always sample from a given page, say page 10 (a search with 2 pages of results behaves very differently from one with 90+ pages)
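A minimal sketch of the fixed-page variant (page size, page number, and sample size are illustrative assumptions):

```python
import random

PAGE_SIZE = 10

def easy_negatives(ranked_results, sample_page=10, k=5, rng=random):
    """Sample k results from a fixed deep page as negative training examples.
    Searches too short to reach that page yield nothing, which biases the
    sample toward queries with large result sets - hence the preference for
    shorter result lists upstream."""
    start = (sample_page - 1) * PAGE_SIZE
    tail = ranked_results[start:start + PAGE_SIZE]
    return rng.sample(tail, min(k, len(tail)))

ranked = [f"doc{i}" for i in range(990)]  # ~99 pages of results
print(easy_negatives(ranked))             # 5 docs from page 10
```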
PUTTING IT ALL TOGETHER
Human evaluation is not practical for personalized searches
Learn from user behavior – multiple heuristics depending on the need, each with different pros and cons
EFFICIENCY VS EXPRESSIVENESS
Build a tree with logistic regression leaves. By restricting decision nodes to (Query, User) segments, only one regression model needs to be evaluated for each document.
(Figure: a decision tree branching on segment features X2 and X4, with a regression model at each leaf)
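A minimal sketch of this segmented scorer (feature names, weights, and segments are all hypothetical): the segment, and therefore the single logistic regression model to run, is chosen once per (query, user), then applied to every document.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, doc_features):
    # One logistic regression evaluation per document.
    return sigmoid(sum(weights.get(f, 0.0) * v for f, v in doc_features.items()))

# One regression model per leaf of the segment tree (illustrative weights).
LEAF_MODELS = {
    "name_query":  {"name_match": 2.0, "connection_degree": 1.0},
    "skill_query": {"skill_match": 1.5, "title_match": 1.0},
    "default":     {"text_match": 1.0},
}

def pick_segment(query_features, user_features):
    # Decision nodes test only (query, user) features, never document features,
    # so this runs once per search rather than once per document.
    if query_features.get("is_name_query"):
        return "name_query"
    if query_features.get("is_skill_query") and user_features.get("is_recruiter"):
        return "skill_query"
    return "default"

def rank(query_features, user_features, docs):
    weights = LEAF_MODELS[pick_segment(query_features, user_features)]
    return sorted(docs, key=lambda d: score(weights, d), reverse=True)

docs = [{"skill_match": 1.0}, {"skill_match": 0.2, "title_match": 1.0}]
print(rank({"is_skill_query": True}, {"is_recruiter": True}, docs))
```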
SCORING
New document → Features → Machine learning model → score
(Repeated for each document; the scores produce the final ordered list)
A SIMPLIFIED EXAMPLE
(Decision tree: Name Query? Yes / No; if No, Skill Query? Yes / No)
TEST, TEST, TEST
Interleaving [Radlinski et al., CIKM 2008]
Model 1: a, b, c, d, g, h
Model 2: b, e, a, f, g, h
Interleaved: a, b, c, e, d, f
Clicks on the interleaved list reveal which model's results users prefer.
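A sketch of team-draft interleaving in the spirit of that paper (a simplified rendering, not the exact algorithm or LinkedIn's implementation): each round, a coin flip decides which model drafts first, and each model contributes its highest-ranked result not already shown.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length, rng=random):
    interleaved, credit = [], {}
    while len(interleaved) < length:
        teams = [("Model 1", ranking_a), ("Model 2", ranking_b)]
        if rng.random() < 0.5:
            teams.reverse()  # coin flip: who drafts first this round
        progressed = False
        for team, ranking in teams:
            pick = next((d for d in ranking if d not in credit), None)
            if pick is not None and len(interleaved) < length:
                interleaved.append(pick)
                credit[pick] = team  # remember which model contributed it
                progressed = True
        if not progressed:
            break  # both rankings exhausted
    return interleaved, credit

mixed, credit = team_draft_interleave(list("abcdgh"), list("beafgh"), length=6)
print(mixed)  # e.g. ['a', 'b', 'c', 'e', 'd', 'f']
# A click on result r counts as a vote for credit[r]; the model with more
# click-credit across many sessions wins the interleaving test.
```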
SUMMARY
Query understanding leverages the rich structure of LinkedIn’s content and information needs.
Query tagging and rewriting allow us to deliver both precision and recall.
For ranking, personalization is both the biggest challenge and the core of our solution.
Segmenting relevance models by query type helps us efficiently address the diversity of search needs.
Abhimanyu Lad – [email protected] – https://linkedin.com/in/abhilad
Satya Kanduri – [email protected] – https://linkedin.com/in/skanduri