Search Quality at LinkedIn

Post on 08-Sep-2014


Category: Technology


DESCRIPTION

Presented to the Bay Area Search Meetup on February 26, 2014: http://www.meetup.com/Bay-Area-Search/events/136150622/

At LinkedIn, we face a number of challenges in delivering high quality search results to 277M+ members. Our results are highly personalized, requiring us to build machine-learned relevance models that combine document, query, and user features. And our emphasis on entities (names, companies, job titles, etc.) affects how we process and understand queries. In this talk, we'll talk about these challenges in detail, and we'll describe some of the solutions we are building to address them.

Speakers:

Satya Kanduri has worked on LinkedIn search relevance since 2011. Most recently he led the development of LinkedIn's machine-learned ranking platform. He previously worked at Microsoft, improving relevance for Bing Product Search. He has an MS in Computer Science from the University of Nebraska - Lincoln, and a BE in Computer Science from the Osmania University College of Engineering.

Abhimanyu Lad has worked at LinkedIn as a software engineer and data scientist since 2011. He has worked on a variety of relevance and query understanding problems, including query intent prediction, query suggestion, and spelling correction. He has a PhD in Computer Science from CMU, where he worked on developing machine learning techniques for diversifying search results.

TRANSCRIPT


Abhimanyu Lad, Senior Software Engineer
Satya Kanduri, Senior Software Engineer

1

Search Quality at LinkedIn

2

(Example query annotated by the query understanding pipeline)

tag: skill OR title (related skills: search, ranking, …)
tag: company (id: 1337, industry: internet)
verticals: people, jobs
intent: exploratory

3

SEARCH USE CASES

How do people use LinkedIn’s search?

4

PEOPLE SEARCH

Search for people by name

5

PEOPLE SEARCH

Search for people by other attributes

6

EXPLORATORY PEOPLE SEARCH

7

JOB SEARCH

8

COMPANY SEARCH

9

AND MUCH MORE…

10

OUR GOAL

Universal Search – Single search box

High Recall – Spelling correction, synonym expansion, …

High Precision – Entity-oriented search: match things, not strings

11

QUERY UNDERSTANDING PIPELINE

12

QUERY UNDERSTANDING PIPELINE

Raw query → Spellcheck → Query Tagging → Vertical Intent Prediction → Query Expansion → Structured query + Annotations

13

QUERY UNDERSTANDING PIPELINE


14

SPELLING CORRECTION

Fix obvious typos

Help users spell names

15

SPELLING OUT THE DETAILS

Sources: PEOPLE NAMES, COMPANIES, TITLES, PAST QUERIES

N-grams: marissa => ma ar ri is ss sa
Metaphone: mark/marc => MRK
Co-occurrence counts: marissa:mayer = 1000

Example: [marisa meyer yahoo] → candidate corrections marissa / marisa, meyer / mayer, yahoo (a code sketch follows below)

16
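
To make the candidate-generation idea above concrete, here is a minimal Python sketch: character n-grams give fuzzy matching against a dictionary, and co-occurrence counts pick among candidates using the neighboring query terms. The dictionary, counts, weights, and function names are invented for illustration; this is not LinkedIn's implementation.

import math

DICTIONARY = ["marissa", "marisa", "mayer", "meyer", "yahoo", "mark", "marc"]
COOCCURRENCE = {("marissa", "mayer"): 1000}  # mined from profiles/queries (toy)

def bigrams(word):
    """Character bigrams: 'marissa' -> {'ma','ar','ri','is','ss','sa'}."""
    return {word[i:i + 2] for i in range(len(word) - 1)}

def similarity(a, b):
    """Jaccard overlap of character bigrams: a cheap fuzzy-match signal."""
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y) if x | y else 0.0

def cooccurrence(a, b):
    return COOCCURRENCE.get(tuple(sorted((a, b))), 0)

def candidates(term, k=3):
    """Dictionary entries that look most like the (possibly misspelled) term."""
    return sorted(DICTIONARY, key=lambda w: similarity(term, w), reverse=True)[:k]

def correct(query):
    """For each term, pick the candidate that both looks right and co-occurs
    with plausible corrections of the neighboring terms."""
    terms = query.split()
    out = []
    for i, term in enumerate(terms):
        neighbor_cands = {c for j, t in enumerate(terms) if j != i for c in candidates(t)}
        best = max(
            candidates(term),
            key=lambda c: similarity(term, c)
            + 0.3 * max((math.log10(1 + cooccurrence(c, n)) for n in neighbor_cands), default=0),
        )
        out.append(best)
    return " ".join(out)

print(correct("marisa meyer yahoo"))  # -> "marissa mayer yahoo" with these toy counts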

SPELLING OUT THE DETAILS

PROBLEM: Both the corpus and the query logs contain many spelling errors.

Certain spelling errors are quite frequent, while genuine words (especially names) may be infrequent.

17

SPELLING OUT THE DETAILS

PROBLEM: Both the corpus and the query logs contain many spelling errors.

SOLUTION: Use query chains to infer the correct spelling (sketched below).

[product manger] → [product manager] → CLICK

[marissa mayer] → CLICK

18
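
A rough Python sketch of the query-chain idea, assuming a toy log of (session, query, clicked) events; the field names and the similarity threshold are hypothetical. When a user rewrites a query into a very similar one and then clicks, the rewrite is counted as a vote for the corrected spelling.

from collections import Counter
from difflib import SequenceMatcher

# (session_id, query, clicked) events, in time order (toy data).
LOG = [
    ("s1", "product manger", False),
    ("s1", "product manager", True),
    ("s2", "product manger", False),
    ("s2", "product manager", True),
    ("s3", "marisa meyer", False),
    ("s3", "marissa mayer", True),
]

def similar(a, b, threshold=0.8):
    """Only treat near-identical reformulations as spelling fixes."""
    return a != b and SequenceMatcher(None, a, b).ratio() >= threshold

def mine_corrections(log):
    """Count (misspelling -> reformulation-with-click) pairs within a session."""
    votes = Counter()
    by_session = {}
    for session, query, clicked in log:
        prev = by_session.get(session)
        if prev and clicked and similar(prev, query):
            votes[(prev, query)] += 1
        by_session[session] = query
    return votes

corrections = mine_corrections(LOG)
print(corrections.most_common(1))  # [(("product manger", "product manager"), 2)]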

QUERY UNDERSTANDING PIPELINE


19

QUERY TAGGING: IDENTIFYING ENTITIES IN THE QUERY

Query segments tagged as: TITLE | CO | GEO

TITLE-237: software engineer, software developer, programmer, …
CO-1441: Google Inc. (Industry: Internet)
GEO-7583: Country: US, Lat: 42.3482 N, Long: 75.1890 W

(Recognized tags: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL)

20

QUERY TAGGING: IDENTIFYING ENTITIES IN THE QUERY

TITLE | CO | GEO

MORE PRECISE MATCHING WITH DOCUMENTS

21

ENTITY-BASED FILTERING

(Before/after screenshots: search results filtered using the tagged entities)

22–25

ENTITY-BASED SUGGESTIONS

(Screenshots: entity-based query suggestions)

26–27

QUERY TAGGING: SEQUENTIAL MODEL

EMISSION PROBABILITIES (learned from user profiles)

TRANSITION PROBABILITIES (learned from query logs)

TRAINING

28

QUERY TAGGING: SEQUENTIAL MODEL

INFERENCE

Given a query, find the most likely sequence of tags

29
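
The slides describe the tagger only at this level, but a standard sequential model of this kind can be decoded with the Viterbi algorithm. The sketch below uses tiny, made-up emission and transition tables; LinkedIn's actual tag set, features, and training are richer.

import math

TAGS = ["TITLE", "CO", "GEO"]

EMISSION = {  # P(token | tag), toy values
    "TITLE": {"software": 0.3, "engineer": 0.4, "google": 0.01, "york": 0.01, "new": 0.02},
    "CO": {"software": 0.02, "engineer": 0.01, "google": 0.5, "york": 0.01, "new": 0.02},
    "GEO": {"software": 0.01, "engineer": 0.01, "google": 0.02, "york": 0.4, "new": 0.3},
}

TRANSITION = {  # P(next tag | tag), toy values
    "START": {"TITLE": 0.5, "CO": 0.3, "GEO": 0.2},
    "TITLE": {"TITLE": 0.5, "CO": 0.3, "GEO": 0.2},
    "CO": {"TITLE": 0.2, "CO": 0.3, "GEO": 0.5},
    "GEO": {"TITLE": 0.2, "CO": 0.2, "GEO": 0.6},
}

def viterbi(tokens):
    """Return the most likely tag sequence for the query tokens."""
    smallest = 1e-9  # smoothing for unseen tokens
    # best[tag] = (log-probability of the best path ending in tag, that path)
    best = {t: (math.log(TRANSITION["START"][t]) + math.log(EMISSION[t].get(tokens[0], smallest)), [t])
            for t in TAGS}
    for token in tokens[1:]:
        new_best = {}
        for tag in TAGS:
            emit = math.log(EMISSION[tag].get(token, smallest))
            score, path = max(
                ((best[prev][0] + math.log(TRANSITION[prev][tag]) + emit, best[prev][1] + [tag])
                 for prev in TAGS),
                key=lambda sp: sp[0],
            )
            new_best[tag] = (score, path)
        best = new_best
    return max(best.values(), key=lambda sp: sp[0])[1]

print(viterbi("software engineer google new york".split()))
# -> ['TITLE', 'TITLE', 'CO', 'GEO', 'GEO'] with these toy tables

Emissions can be estimated from how often a token appears in each profile field, and transitions from tag sequences observed in logged queries, which is roughly what the slide attributes to profiles and query logs.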

QUERY UNDERSTANDING PIPELINE


30

VERTICAL INTENT PREDICTION

JOBS

PEOPLE

COMPANIES

(Probability distribution over verticals)

31

VERTICAL INTENT PREDICTION: SIGNALS

1. Past query counts in each vertical + query tags
2. Personalization: the user's search history

(Example: a query tagged TAG:COMPANY may target the company itself, its employees, or its jobs; a query tagged TAG:NAME indicates a name search)

32
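
A small Python sketch of how these signals could be blended into a probability distribution over verticals. The smoothing, tag priors, and weights are invented for illustration, not taken from the talk.

VERTICALS = ["PEOPLE", "JOBS", "COMPANIES"]

def normalize(scores):
    total = sum(scores.values())
    return {v: s / total for v, s in scores.items()} if total else scores

def predict_vertical(query_counts, query_tags, user_history):
    """Blend global query behaviour, tag-based priors, and personalization."""
    # 1. How often this exact query led to engagement in each vertical.
    scores = {v: 1.0 + query_counts.get(v, 0) for v in VERTICALS}  # add-one smoothing

    # 2. Tag-based priors, e.g. a NAME tag strongly suggests people search.
    tag_priors = {"NAME": {"PEOPLE": 5.0}, "COMPANY": {"COMPANIES": 2.0, "JOBS": 2.0}}
    for tag in query_tags:
        for vertical, boost in tag_priors.get(tag, {}).items():
            scores[vertical] *= boost

    # 3. Personalization: lean toward verticals this user searches often.
    for vertical, weight in user_history.items():
        scores[vertical] *= 1.0 + weight

    return normalize(scores)

# A query tagged as a company name, issued by a heavy job searcher.
print(predict_vertical(
    query_counts={"PEOPLE": 120, "JOBS": 40, "COMPANIES": 60},
    query_tags=["COMPANY"],
    user_history={"JOBS": 0.8},  # fraction of this user's recent searches
))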

QUERY UNDERSTANDING PIPELINE


33

QUERY EXPANSION

GOAL: Improve recall through synonym expansion

34

QUERY EXPANSION: NAME SYNONYMS

35

QUERY EXPANSION: JOB TITLE SYNONYMS

36

QUERY EXPANSION: SIGNALS

Trained using query chains:
[jon] → [jonathan] → CLICK
[programmer] → [developer] → CLICK
[software engineer] → [software developer] → CLICK

Symmetric but not transitive!
[francis] ⇔ [frank], [franklin] ⇔ [frank], but [francis] ≠ [franklin]

Context based!
[software engineer] => [software developer], but [civil engineer] ≠ [civil developer]

37
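
A Python sketch of mining such context-sensitive synonyms from query chains. The log format, the one-word-swap rule, and the vote threshold are assumptions made for illustration.

from collections import Counter

# (original query, reformulated query that got a click) – toy query chains.
CHAINS = [
    ("software engineer", "software developer"),
    ("software engineer", "software developer"),
    ("civil engineer", "civil engineer jobs"),
    ("jon smith", "jonathan smith"),
]

def mine_synonyms(chains, min_votes=2):
    """Collect (context, word) -> (context, replacement) rewrites seen often enough."""
    votes = Counter()
    for before, after in chains:
        b, a = before.split(), after.split()
        if len(b) == len(a):
            diffs = [(x, y) for x, y in zip(b, a) if x != y]
            if len(diffs) == 1:                       # exactly one word was swapped
                context = tuple(x for x, y in zip(b, a) if x == y)
                votes[(context, diffs[0][0], diffs[0][1])] += 1
    # Keep frequent rewrites in both directions (symmetric), but never chain two
    # rewrites together (not transitive).
    synonyms = set()
    for (context, old, new), count in votes.items():
        if count >= min_votes:
            synonyms.add((context, old, new))
            synonyms.add((context, new, old))
    return synonyms

print(mine_synonyms(CHAINS))
# -> both directions of the ('software',) engineer <-> developer rewrite only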

QUERY UNDERSTANDING PIPELINE


38

QUERY UNDERSTANDING: SUMMARY

High degree of structure in queries as well as in the corpus (user profiles, job postings, companies, …)

Query understanding allows us to optimally balance recall and precision by supporting entity-oriented search

Query tagging and query log analysis play a big role in query understanding

39

RANKING

WHAT’S IN A NAME QUERY?

kevin scott

BUT NAMES CAN BE AMBIGUOUS

SEARCHING FOR A COMPANY’S EMPLOYEES

SEARCHING FOR PEOPLE WITH A SKILL

RANKING IS COMPLICATED

Seemingly similar queries require dissimilar scoring functions

Personalization matters
– Multiple dimensions to personalize on
– Dimensions vary with query class

TRAINING

Documents for training → Features
Human evaluation → Labels
Features + Labels → Machine learning model

ASSESSING RELEVANCE

RELEVANCE DEPENDS ON WHO’S SEARCHING

What if the searcher is a job seeker?

Or a recruiter?

Or…

THE QUERY IS NOT ENOUGH

WE NEED USER FEATURES

Non-personalized relevance model: score = f(Document | Query)

Personalized relevance model: score = f(Document | Query, User)
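
A toy Python sketch of the contrast between the two formulas: the personalized scorer is the same model plus (user, document) features such as shared industry or network distance. Feature names and weights are placeholders, not LinkedIn's actual model.

def query_document_features(query, doc):
    terms = set(query.lower().split())
    return {
        "title_match": len(terms & set(doc["title"].lower().split())) / max(len(terms), 1),
        "name_match": 1.0 if query.lower() == doc["name"].lower() else 0.0,
    }

def user_document_features(user, doc):
    return {
        "same_industry": 1.0 if user["industry"] == doc["industry"] else 0.0,
        "connection": 1.0 / doc["network_distance"],  # 1st degree -> 1.0, 2nd -> 0.5, ...
    }

def score(query, doc, user=None, weights=None):
    """score = f(Document | Query) or, with a user, f(Document | Query, User)."""
    weights = weights or {"title_match": 1.0, "name_match": 2.0,
                          "same_industry": 0.5, "connection": 1.5}
    features = query_document_features(query, doc)
    if user is not None:
        features.update(user_document_features(user, doc))
    return sum(weights[name] * value for name, value in features.items())

doc = {"name": "Kevin Scott", "title": "software engineer",
       "industry": "Internet", "network_distance": 2}
user = {"industry": "Internet"}
print(score("kevin scott", doc))        # non-personalized
print(score("kevin scott", doc, user))  # personalized: adds user-document features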

COLLECTING RELEVANCE JUDGMENTS WON’T SCALE

TRAINING

Documents for training → Features
Search logs (in place of human evaluation) → Labels
Features + Labels → Machine learning model

CLICKS AS TRAINING DATA
Approach: Clicked = Relevant, Not-Clicked = Not Relevant

Unfairly penalized? Good results not seen are marked Not Relevant.

(Diagram: user eye scan direction over the result list)

CLICKS AS TRAINING DATA
Approach: Clicked = Relevant, Skipped = Not Relevant

• Only penalize results that the user has seen but ignored
• Risk: can invert the model by over-weighting low-ranked results
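
A minimal Python sketch of this labeling heuristic, assuming an impression is a ranked list of document ids plus the set of clicked ids (a hypothetical format): results above the lowest click are treated as seen, everything below is left unlabeled.

def label_impression(results, clicked):
    """results: ranked list of doc ids; clicked: set of doc ids that were clicked."""
    if not clicked:
        return {}  # no click, no labels: we cannot tell what the user saw
    lowest_click = max(i for i, doc in enumerate(results) if doc in clicked)
    labels = {}
    for position, doc in enumerate(results):
        if doc in clicked:
            labels[doc] = "RELEVANT"
        elif position < lowest_click:
            labels[doc] = "NOT_RELEVANT"   # seen (above a click) but skipped
        # results below the lowest click are not labeled at all
    return labels

print(label_impression(["d1", "d2", "d3", "d4", "d5"], clicked={"d3"}))
# {'d1': 'NOT_RELEVANT', 'd2': 'NOT_RELEVANT', 'd3': 'RELEVANT'}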

FAIR PAIRS
[Radlinski and Joachims, AAAI'06]

• Fair Pairs: randomize the order within adjacent pairs of results (some pairs are shown flipped), then label Clicked = Relevant, Skipped = Not Relevant within each pair
• Great at dealing with position bias
• Does not invert models
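
A simplified Python sketch of the Fair Pairs idea as summarized on the slides: adjacent pairs are randomly flipped before presentation, and a click on the lower-shown member of a pair yields a preference over the upper one. This compresses the published algorithm considerably; see Radlinski and Joachims (AAAI'06) for the full version.

import random

def present_with_fair_pairs(ranking, rng=random):
    """Return (presented ranking, flip decisions), flipping each adjacent pair at random."""
    presented, flips = [], []
    for i in range(0, len(ranking) - 1, 2):
        pair = list(ranking[i:i + 2])
        flipped = rng.random() < 0.5
        if flipped:
            pair.reverse()
        presented.extend(pair)
        flips.append(flipped)
    if len(ranking) % 2:
        presented.append(ranking[-1])
    return presented, flips

def preferences(presented, flips, clicked):
    """Within each pair, a click on the item shown second beats the item shown first."""
    prefs = []
    for pair_index in range(len(flips)):
        top, bottom = presented[2 * pair_index], presented[2 * pair_index + 1]
        if bottom in clicked and top not in clicked:
            prefs.append((bottom, top))  # bottom preferred despite its lower position
    return prefs

random.seed(0)
presented, flips = present_with_fair_pairs(["d1", "d2", "d3", "d4"])
print(presented, preferences(presented, flips, clicked={"d2"}))
# ['d1', 'd2', 'd3', 'd4'] [('d2', 'd1')]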

EASY NEGATIVES

(Diagram: results on page 1 vs. page 99)

• Assumption: A decent current model would push out bad results to the very end.
• Easy Negatives: Some of the results at the end are picked up as negative examples.

EASY NEGATIVES

• Use strategies that sample across the feature space
• Prefer searches with fewer results
• Always sample from a given page, say page 10

(Example: a search with 2 pages of results vs. one with 90+ pages)
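
A Python sketch of easy-negative sampling under these guidelines, with a hypothetical page size, fixed page number, and cap; the preference for searches with fewer results is expressed by processing them first.

import random

PAGE_SIZE = 10  # hypothetical results per page

def easy_negatives(searches, page=10, per_search=2, max_total=1000, rng=random):
    """Sample (query, doc, label) negatives from a fixed deep page of each search."""
    negatives = []
    # Prefer searches with fewer results: for them, page 10 sits relatively much
    # deeper in the ranking than it does for a search with 90+ pages.
    for search in sorted(searches, key=lambda s: len(s["results"])):
        if len(negatives) >= max_total:
            break
        start = (page - 1) * PAGE_SIZE
        page_results = search["results"][start:start + PAGE_SIZE]
        if page_results:
            picked = rng.sample(page_results, min(per_search, len(page_results)))
            negatives.extend((search["query"], doc, "NOT_RELEVANT") for doc in picked)
    return negatives

random.seed(0)
toy_searches = [
    {"query": "machine learning",  "results": [f"doc{i}" for i in range(1000)]},
    {"query": "accountant dublin", "results": [f"doc{i}" for i in range(120)]},
]
print(easy_negatives(toy_searches))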

PUTTING IT ALL TOGETHER

Human evaluation is not practical for personalized searches

Learn from user behavior
– Multiple heuristics depending on the need
– Different pros and cons

66

EFFICIENCY VS EXPRESSIVENESS

Build a tree with logistic regression leaves. By restricting decision nodes to (Query, User) segments, only one regression model needs to be evaluated for each document.

(Decision tree diagram: nodes test segment features such as X2 = 0/1 and X4 = 0/1, with a regression model at each leaf)
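
A Python sketch of that structure: the decision nodes consult only query/user segment features, so the leaf model is selected once per request and every document in the result set is scored by a single logistic regression. Segments, features, and weights here are invented.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One weight vector per (query, user) segment: the "leaves" of the tree.
LEAF_MODELS = {
    "name_query":  {"name_match": 4.0, "connection": 1.0, "bias": -2.0},
    "skill_query": {"skill_match": 3.0, "title_match": 2.0, "bias": -1.5},
    "default":     {"title_match": 2.0, "skill_match": 1.0, "bias": -1.0},
}

def choose_leaf(query_features):
    """Decision nodes test only query/user segment features (the X2, X4 of the slide)."""
    if query_features.get("is_name_query"):
        return LEAF_MODELS["name_query"]
    if query_features.get("is_skill_query"):
        return LEAF_MODELS["skill_query"]
    return LEAF_MODELS["default"]

def score_results(query_features, documents):
    """Pick the leaf once, then apply the same logistic regression to every document."""
    model = choose_leaf(query_features)
    return [
        (doc["id"], sigmoid(sum(model.get(f, 0.0) * v for f, v in doc["features"].items())
                            + model["bias"]))
        for doc in documents
    ]

docs = [
    {"id": "d1", "features": {"name_match": 1.0, "connection": 0.5}},
    {"id": "d2", "features": {"name_match": 0.0, "connection": 1.0}},
]
print(score_results({"is_name_query": True}, docs))

Because the branch depends only on the query and the user, the per-document cost stays that of a single linear model.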

SCORING

New document → Features → Machine learning model → score → Ordered list

68

A SIMPLIFIED EXAMPLE

(Decision tree: Name Query? Yes / No; if No, Skill Query? Yes / No)

69

TEST, TEST, TEST

Interleaving [Radlinski et al., CIKM 2008]

Model 1: a, b, c, d, g, h
Model 2: b, e, a, f, g, h
Interleaved: a, b, c, e, d, f
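
A Python sketch of team-draft interleaving, one of the schemes studied in the cited paper: the two rankers alternately draft their best not-yet-shown result, and clicks are credited to whichever model contributed the clicked position. The coin-flip order and credit rule here are simplified.

import random

def team_draft_interleave(ranking_a, ranking_b, rng=random):
    """Return the interleaved list and which model contributed each position."""
    interleaved, teams, used = [], [], set()
    while len(used) < len(set(ranking_a) | set(ranking_b)):
        # Flip a coin for who drafts first this round (breaks ties fairly).
        order = [("A", ranking_a), ("B", ranking_b)]
        if rng.random() < 0.5:
            order.reverse()
        for team, ranking in order:
            pick = next((doc for doc in ranking if doc not in used), None)
            if pick is not None:
                interleaved.append(pick)
                teams.append(team)
                used.add(pick)
    return interleaved, teams

def credit(teams, interleaved, clicked):
    """Count clicks per team; the team with more clicks wins this impression."""
    wins = {"A": 0, "B": 0}
    for doc, team in zip(interleaved, teams):
        if doc in clicked:
            wins[team] += 1
    return wins

random.seed(1)
model_1 = ["a", "b", "c", "d", "g", "h"]
model_2 = ["b", "e", "a", "f", "g", "h"]
interleaved, teams = team_draft_interleave(model_1, model_2)
print(interleaved)
print(credit(teams, interleaved, clicked={"e"}))  # a click on 'e' credits Model 2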

SUMMARY

Query understanding leverages the rich structure of LinkedIn’s content and information needs.

Query tagging and rewriting allow us to deliver both precision and recall.

For ranking, personalization is both the biggest challenge and the core of our solution.

Segmenting relevance models by query type helps us efficiently address the diversity of search needs.

71

Abhimanyu Lad – alad@linkedin.com – https://linkedin.com/in/abhilad
Satya Kanduri – skanduri@linkedin.com – https://linkedin.com/in/skanduri
