search ranking across heterogeneous information sources

Search Ranking Across Heterogeneous Information Sources. Viet Ha-Thuc and Dhruv Arya, Search Quality, LinkedIn. Heterogeneous Information Access at SIGIR 2016.

Upload: viet-ha-thuc

Post on 25-Jan-2017


TRANSCRIPT

Page 1: Search Ranking Across Heterogeneous Information Sources

Search Ranking Across Heterogeneous Information Sources

Viet Ha-Thuc and Dhruv Arya
Search Quality - LinkedIn

Heterogeneous Information Access at SIGIR 2016

Page 2: Search Ranking Across Heterogeneous Information Sources

• 200+ countries and territories

• 2+ new members per second

Page 3: Search Ranking Across Heterogeneous Information Sources

● Dual Roles of Search
  ○ Enable talent to discover opportunities
  ○ Help companies search for the right talent

Page 4: Search Ranking Across Heterogeneous Information Sources

FLAGSHIP SEARCH

RECRUITER SEARCH

SALES NAVIGATOR

Page 5: Search Ranking Across Heterogeneous Information Sources

Unique Nature of LinkedIn Search

▪ Heterogeneous sources
  – Different entity types: people, jobs, companies, slideshares
  – Many use cases: hiring, sales, connecting, job seeking, content discovery
  – Requires different features, training data and objectives
▪ Scale
  – 400+ MM members, 6+ MM jobs, 18+ MM slideshows
▪ Federation across the sources

Page 6: Search Ranking Across Heterogeneous Information Sources

Overview

[Diagram: search pipeline overview. Labels shown: Query; Federated Search; Spell Correction; Query Tagging (Name, Title, Skill); Intent Prediction; People, Companies, Jobs; Page Construction]

Page 8: Search Ranking Across Heterogeneous Information Sources

Agenda

▪ Introduction
▪ Vertical Ranking
  – Job Search [KDD’16]
  – People Search by Skills [BigData’15, SIGIR’16]
▪ Federation [CIKM’15]
▪ Lessons

Page 9: Search Ranking Across Heterogeneous Information Sources

Challenges of Job Search

▪ “Hidden” structures
▪ Query represents only a small fraction of the information need
  – “San Francisco”, “software engineer”, “java”
▪ Job attractiveness varies along many aspects
  – “Hot” titles: “data scientist”
  – Top companies: Google, Facebook, etc.
  – Trending skills: machine learning, big data, etc.
  – Location

Page 10: Search Ranking Across Heterogeneous Information Sources

Entity-Aware Matching

Page 11: Search Ranking Across Heterogeneous Information Sources

Expertise Homophily

▪ “Classic” homophily in social networks
  – People tend to interact with similar people
▪ Expertise homophily in job search
  – Searchers tend to apply for jobs that match their own expertise
  – Apply rate of job results with overlapping skills is 2x higher
▪ Expertise
  – Jobs: skills extracted from the job description
  – Searcher: explicit and implicit skills
  – Similarity measure: Jaccard similarity
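The Jaccard similarity named above can be sketched as follows. This is a minimal illustration, not the production feature: the function name and the example skill sets are hypothetical, and returning 0.0 for two empty sets is a convention chosen here.

```python
def jaccard_similarity(job_skills, searcher_skills):
    """Jaccard similarity |A & B| / |A | B| between two skill collections."""
    a, b = set(job_skills), set(searcher_skills)
    if not a and not b:
        return 0.0  # convention chosen here: two empty skill sets share nothing
    return len(a & b) / len(a | b)


# Hypothetical skill sets, for illustration only.
job = {"java", "machine learning", "hadoop"}
searcher = {"java", "machine learning", "python", "sql"}

# 2 shared skills out of 5 distinct skills overall.
score = jaccard_similarity(job, searcher)
```

A higher score means more overlap between the skills a job asks for and the skills the searcher has, which is the signal the slide associates with a higher apply rate.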

Page 12: Search Ranking Across Heterogeneous Information Sources

Entity-faceted CTRs

▪ Job attractiveness
  – Historical CTRs for individual jobs
  – Challenge: job lifetime is short -> unreliable estimation
▪ Entity-faceted historical CTRs
  – CTRs of jobs with standardized title “data scientist”
  – CTRs of jobs from company IBM
  – CTRs of jobs requiring trending skills: machine learning, big data, etc.
▪ Advantages
  – Alleviate data sparseness by grouping jobs by facets
  – Resolve the cold-start problem
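A rough sketch of the grouping idea: instead of one CTR per job, clicks and impressions are pooled over every job that shares a facet value. The log format, field names and the company "Acme" are hypothetical; the slides do not describe the actual data pipeline or any smoothing.

```python
from collections import defaultdict


def faceted_ctrs(impressions, facet):
    """Pool clicks and impressions over all jobs sharing the same facet value."""
    clicks = defaultdict(int)
    views = defaultdict(int)
    for row in impressions:
        key = row[facet]
        views[key] += 1
        clicks[key] += row["clicked"]
    return {key: clicks[key] / views[key] for key in views}


# Hypothetical impression log: one row per job shown in a result page.
log = [
    {"job_id": 1, "title": "data scientist", "company": "IBM", "clicked": 1},
    {"job_id": 2, "title": "data scientist", "company": "Acme", "clicked": 0},
    {"job_id": 3, "title": "data scientist", "company": "IBM", "clicked": 1},
    {"job_id": 4, "title": "accountant", "company": "Acme", "clicked": 0},
]

title_ctr = faceted_ctrs(log, "title")
company_ctr = faceted_ctrs(log, "company")
```

A newly posted job with no history of its own can still look up `title_ctr` or `company_ctr` by its facets, which is how the grouping addresses cold start.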

Page 13: Search Ranking Across Heterogeneous Information Sources

Other features

Page 14: Search Ranking Across Heterogeneous Information Sources

Labeling Strategy

▪ Job applies, views and skips are considered
  – Applied: highest, label = 4
  – Click: good, label = 1
  – Skipped: label = 0
  – Uncertain: removed
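The mapping above could be applied to logged impressions roughly as follows. The action names and the session format are hypothetical. The slide does not say how an impression is classified as skipped or uncertain, so the sketch takes that classification as given.

```python
# Label values as stated on the slide; "uncertain" impressions are dropped.
LABELS = {"applied": 4, "clicked": 1, "skipped": 0}


def label_impressions(impressions):
    """Turn (job_id, action) pairs into (job_id, label) training examples."""
    examples = []
    for job_id, action in impressions:
        if action == "uncertain":
            continue  # removed from the training data
        examples.append((job_id, LABELS[action]))
    return examples


# Hypothetical search session, for illustration only.
session = [
    ("j1", "skipped"),
    ("j2", "clicked"),
    ("j3", "applied"),
    ("j4", "uncertain"),
]
training_examples = label_impressions(session)
```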

Page 15: Search Ranking Across Heterogeneous Information Sources

Learning to Rank

▪ Listwise
  – Treats relevance as relative within each query
  – Allows optimizing the quality metric directly
▪ Objective function
  – Normalized Discounted Cumulative Gain (NDCG@K)
  – Graded relevance labels
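For reference, NDCG@K over graded labels can be computed as below. The slide does not say which gain and discount are used; this sketch assumes the common exponential gain (2^label - 1) with a log2 position discount, and the example ranking is hypothetical.

```python
import math


def dcg_at_k(labels, k):
    """Discounted cumulative gain of graded labels listed in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, so position 1 -> log2(2)
        for rank, rel in enumerate(labels[:k])
    )


def ndcg_at_k(labels, k):
    """DCG normalized by the DCG of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0


# Hypothetical ranking: a clicked job (1), then an applied job (4), then a skipped job (0).
score = ndcg_at_k([1, 4, 0], k=3)
```

Because the applied job sits below the clicked one, the score is below 1.0; swapping the two would give the ideal ordering.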

Page 16: Search Ranking Across Heterogeneous Information Sources

Experiment Results

▪ Baseline
  – All of the existing features except the entity-aware ones
  – Machine learned
  – Optimized for the same objective function

               CTR       Apply Rate
Improvement    +11.3%    +5.3%

Page 18: Search Ranking Across Heterogeneous Information Sources

Introduction

▪ Skills
  – Represent professional expertise
  – 35K+ standardized skills
  – Members get endorsed on skills
▪ Skill queries
  – Contain skills and no personal name

Page 19: Search Ranking Across Heterogeneous Information Sources

Introduction

▪ Unique challenges of LinkedIn expertise search
  – Scale: 400M members x 35K standardized skills
  – Sparsity of skills in profiles
  – Personalization

Page 20: Search Ranking Across Heterogeneous Information Sources

Reputation

Information a decision maker uses to make a judgment on an entity with a record (*)

(*) “Building Web Reputation Systems”, Glass and Farmer, 2010

Page 21: Search Ranking Across Heterogeneous Information Sources

Skill Reputation Scores [Ha-Thuc et al. BigData’15]

▪Decision Maker: searcher

▪Record: Professional career

▪Skill reputation: member expertise on a skill

▪Judgment: Hire?

Page 22: Search Ranking Across Heterogeneous Information Sources

Estimating Skill Reputation

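[Figure: signals (Endorse, profile, browsemap) feed a supervised learning algorithm that estimates P(expert | member, skill)]

Members x skills matrix of estimated scores (rows: members; columns: skills; ? = unknown):

  ?     .85   .45
  ?     ?     .35
  ?     .42   ?
  ?     ?     .05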

Page 23: Search Ranking Across Heterogeneous Information Sources

Estimating Skill Reputation

Matrix Factorization

Original members x skills matrix (rows: members; columns: skills; ? = unknown), built from Endorse, profile and browsemap signals:

  ?     .85   .45
  ?     ?     .35
  ?     .42   ?
  ?     ?     .05

Member factors (each row is a representation of a member in latent space):

  0.5   1
  0.7   0
  0     0.6
  0.1   0

Skill factors (each column represents a skill in latent space):

  0.2   0.3   0.5
  0.5   0.7   0.2

Page 24: Search Ranking Across Heterogeneous Information Sources

Estimating Skill Reputation

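Original members x skills matrix (rows: members; columns: skills; ? = unknown), built from Endorse, profile and browsemap signals:

  ?     .85   .45
  ?     ?     .35
  ?     .42   ?
  .02   ?     ?

Member factors (members x latent dimensions):

  0.5   1
  0.7   0
  0     0.6
  0.1   0

Skill factors (latent dimensions x skills):

  0.2   0.3   0.5
  0.5   0.7   0.2

Product of the two factor matrices, which fills in the unknown cells in the original matrix:

  .6    .85   .45
  .14   .21   .35
  .3    .42   .12
  .02   .03   .05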
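The fill-in step can be checked by multiplying the two factor matrices from the slide. The sketch below covers only that reconstruction step; how the factors are learned from the known cells is not described in the transcript and is not shown. The helper name `matmul` is ours.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Factor values copied from the slide.
member_factors = [[0.5, 1.0], [0.7, 0.0], [0.0, 0.6], [0.1, 0.0]]  # members x latent
skill_factors = [[0.2, 0.3, 0.5], [0.5, 0.7, 0.2]]                 # latent x skills

# Every cell, known or unknown, is the dot product of a member row and a skill column.
filled = [[round(v, 2) for v in row] for row in matmul(member_factors, skill_factors)]
```

For example, the first member's score on the first skill is 0.5 * 0.2 + 1 * 0.5 = 0.6, which was an unknown cell in the original matrix.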

Page 25: Search Ranking Across Heterogeneous Information Sources

Features

▪ Reputation feature
▪ Social connection
▪ Homophily
  – Geo
  – Industry
▪ Textual features

Page 26: Search Ranking Across Heterogeneous Information Sources

Experiments

▪ Query tagging
▪ Target segment: skill and no-name
▪ Baseline
  – No skill reputation feature
  – Hand-tuned

            CTR@10    # Messages per Search
Flagship    +11%      +20%
Premium     +18%      +37%

Page 28: Search Ranking Across Heterogeneous Information Sources

Personalized Federated Search

Page 29: Search Ranking Across Heterogeneous Information Sources

Personalized Federated Search - Motivation

▪ Why do we need this?

Page 30: Search Ranking Across Heterogeneous Information Sources

Personalized Federated Search - Overall

Page 31: Search Ranking Across Heterogeneous Information Sources

Personalized Federated Model [Arya, Ha-Thuc et al. CIKM’15]

▪ Relevance scores from base rankers
▪ Query intent: P(vertical | query)
▪ Searcher intent
  – Mine searcher profiles and past behavior to infer intent
    ▪ Title recruiter -> recruiting intent
    ▪ Search for jobs -> job seeking intent
  – Machine-learned models predict member intents
    ▪ Job seeking
    ▪ Recruiting
    ▪ Content consuming

Page 32: Search Ranking Across Heterogeneous Information Sources

Calibrate Signals across Verticals

▪ Verticals are associated with different intents

[Diagram pairing each vertical with an intent, in the order listed]

  People Result    Recruiting Intent
  Job Result       Job Seeking Intent
  Group Result     Content Consuming Intent
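One way to picture the three signal types for a single result is sketched below. It assumes the vertical-to-intent pairing suggested by the diagram (people with recruiting, job with job seeking, group with content consuming). All names and numbers are hypothetical, and the transcript does not describe how the federated model combines or calibrates these signals, so the sketch stops at assembling them.

```python
# Vertical-to-intent pairing as read from the diagram (an assumption).
VERTICAL_INTENT = {
    "people": "recruiting",
    "job": "job_seeking",
    "group": "content_consuming",
}


def federation_features(vertical, base_score, query_intent, searcher_intent):
    """Assemble the three signal types for one result from one vertical."""
    return {
        "base_score": base_score,                       # from the vertical's base ranker
        "query_intent": query_intent.get(vertical, 0.0),  # P(vertical | query)
        "searcher_intent": searcher_intent.get(VERTICAL_INTENT[vertical], 0.0),
    }


# Hypothetical model outputs, for illustration only.
query_intent = {"people": 0.2, "job": 0.7, "group": 0.1}
searcher_intent = {"recruiting": 0.1, "job_seeking": 0.8, "content_consuming": 0.3}

features = federation_features("job", 0.65, query_intent, searcher_intent)
```

Looking up the searcher intent through the vertical is what makes the signal comparable across verticals: a job result is scored against job seeking intent, a people result against recruiting intent.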

Page 35: Search Ranking Across Heterogeneous Information Sources

Take-Aways

▪ Text match is still important but not enough
▪ Advanced features based on semi-structured data
  – People search: skill reputation scores
  – Job search: expertise homophily
▪ Personalized learning-to-rank is crucial

Page 36: Search Ranking Across Heterogeneous Information Sources

References

▪ “Personalized Expertise Search at LinkedIn”, Ha-Thuc, Venkataraman, Rodriguez, Sinha, Sundaram and Guo, BigData, 2015
▪ “Personalized Federated Search at LinkedIn”, Arya, Ha-Thuc and Sinha, CIKM, 2015
▪ “Learning to Rank Personalized Search Results in Professional Networks”, Ha-Thuc and Sinha, SIGIR, 2016
▪ “How to Get Them a Dream Job?”, Li, Arya, Ha-Thuc and Sinha, KDD, 2016
