a language modeling framework for expert finding

Intelligent Database Systems Lab

N.Y.U.S.T. I.M.

A language modeling framework for expert finding

Presenter : Lin, Shu-Han

Authors : Krisztian Balog, Leif Azzopardi, Maarten de Rijke

Information Processing and Management (IPM) 45 (2009) 1–19



Outline

Motivation

Objective

Methodology

Experiments

Conclusion

Comments


Motivation

Expert finding: finding experts given a topic.

Yellow Pages:

Profiles: employees self-assess their skills.

Keywords: e.g., marketing

Problems:

Information: outdated

Keywords: restricted


Objectives

Within the organization…

Mine published intranet documents.

Search all kinds of expertise.

‘Who are the experts on topic “Internet marketing and internet advertising” in my organization?’



Methodology – Overview

To capture the association between a candidate expert and an area of expertise…

“What is the probability of a candidate ca being an expert given the query topic q?”

Model 1: candidate-based (query-independent) approach

Idea: build a profile of each candidate expert, then rank the candidates based on the query.

Model 2: document-based (query-dependent) approach

Idea: find the query-relevant documents, then associate them with experts.

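Bayes’ Theorem:

$$p(ca \mid q) = \frac{p(q \mid ca)\, p(ca)}{p(q)} \propto p(q \mid ca)$$

where p(q) is constant for a given query and p(ca) is assumed uniform.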


Methodology – Model 1

Build a textual representation (model) of a person’s knowledge from the documents associated with that person.

Then estimate the probability of the query given the candidate’s model.


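e.g.,

$$p(\text{“Internet marketing and internet advertising”} \mid \theta_{ca}) = p(\text{“internet”} \mid \theta_{ca})^{2} \cdot p(\text{“marketing”} \mid \theta_{ca}) \cdot p(\text{“and”} \mid \theta_{ca}) \cdot p(\text{“advertising”} \mid \theta_{ca})$$

The candidate model θca is smoothed, and the documents that contribute to it are weighted by their association with the candidate.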
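The scoring step of Model 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes linear interpolation with a collection model (the weight `lam` is a hypothetical parameter), maximum-likelihood term estimates per document, and association weights p(d | ca) supplied by the caller. The toy documents and the candidates `john` and `mary` are invented for the example.

```python
import math
from collections import Counter

def model1_log_score(query_terms, docs, assoc, collection, lam=0.5):
    """Return log p(q | theta_ca) for one candidate.

    docs:       dict doc_id -> list of tokens
    assoc:      dict doc_id -> p(d | ca) for this candidate (assumed given)
    collection: Counter of term counts over the whole collection
    lam:        smoothing weight (hypothetical default)
    """
    total = sum(collection.values())
    log_score = 0.0
    # Each distinct query term contributes once per occurrence in the query.
    for term, n_tq in Counter(query_terms).items():
        # p(t | ca): per-document term probabilities, weighted by p(d | ca).
        p_t_ca = sum(
            weight * docs[d].count(term) / len(docs[d])
            for d, weight in assoc.items()
            if docs[d]
        )
        p_t = collection[term] / total  # collection (background) model
        p_smoothed = (1 - lam) * p_t_ca + lam * p_t
        if p_smoothed == 0.0:
            return float("-inf")  # term unseen anywhere in the collection
        log_score += n_tq * math.log(p_smoothed)
    return log_score

# Toy data, invented for illustration only.
docs = {"d1": ["internet", "marketing"], "d2": ["database", "systems"]}
collection = Counter(t for tokens in docs.values() for t in tokens)
query = ["internet", "marketing", "internet"]

john = model1_log_score(query, docs, {"d1": 1.0}, collection)
mary = model1_log_score(query, docs, {"d2": 1.0}, collection)
```

Here the candidate associated with the marketing document receives the higher score; working in log space avoids underflow for longer queries.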


Methodology – Model 1B

Estimate p(t | d, ca):

Candidate identifier

Window size (w)


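e.g., p(“Internet” | “Mail.No.43”, “John”)

Original text: … John ([email protected]) is a major in marketing. …

With candidate mentions replaced by the identifier: … <731842> (<731842>) is a major in marketing. …

Note: the closer a term is to the candidate identifier, the stronger the evidence.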

(weighted)
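A window-based estimate of p(t | d, ca) might look like the sketch below. It assumes the document is already tokenized and that every mention of the candidate has been replaced by a single identifier token. Combining several window sizes with weights is one plausible reading of the "(weighted)" note; the function names and the weights in `window_weights` are hypothetical. Terms that fall inside two overlapping windows are counted twice in this simplified version.

```python
from collections import Counter

def window_term_probs(tokens, candidate_id, w):
    """Estimate p(t | d, ca) from terms within w positions of candidate_id."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != candidate_id:
            continue
        lo, hi = max(0, i - w), min(len(tokens), i + w + 1)
        for j in range(lo, hi):
            if tokens[j] != candidate_id:  # skip the identifier itself
                counts[tokens[j]] += 1
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}

def mixed_window_probs(tokens, candidate_id, window_weights):
    """Mix estimates from several window sizes (weights assumed to sum to 1)."""
    mixed = Counter()
    for w, weight in window_weights.items():
        for t, p in window_term_probs(tokens, candidate_id, w).items():
            mixed[t] += weight * p
    return dict(mixed)

tokens = ["<731842>", "is", "a", "major", "in", "marketing"]
near = window_term_probs(tokens, "<731842>", 2)
mixed = mixed_window_probs(tokens, "<731842>", {2: 0.5, 5: 0.5})
```

With these weights, terms next to the identifier ("is", "a") end up with more probability mass than a distant term such as "marketing", which only the larger window reaches.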



Methodology – Model 2


(Smoothed)
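The document-based idea (score each document against the query, then pass that score to the associated candidates) can be sketched as below. This is an illustration under assumptions rather than the authors' code: the document model is smoothed by linear interpolation with a collection model (hypothetical weight `lam`), and the association weights p(d | ca) are supplied by the caller. The toy data is invented.

```python
from collections import Counter

def document_query_prob(query_terms, doc_tokens, collection, lam=0.5):
    """p(q | theta_d): product of smoothed term probabilities."""
    total = sum(collection.values())
    prob = 1.0
    for t in query_terms:  # repeated query terms are multiplied in repeatedly
        p_t_d = doc_tokens.count(t) / len(doc_tokens) if doc_tokens else 0.0
        prob *= (1 - lam) * p_t_d + lam * collection[t] / total
    return prob

def model2_score(query_terms, docs, assoc, collection, lam=0.5):
    """Sum over documents of p(q | theta_d) * p(d | ca)."""
    return sum(
        document_query_prob(query_terms, docs[d], collection, lam) * weight
        for d, weight in assoc.items()
    )

# Toy data, invented for illustration only.
docs = {"d1": ["internet", "marketing"], "d2": ["database", "systems"]}
collection = Counter(t for tokens in docs.values() for t in tokens)
query = ["internet", "marketing"]

john2 = model2_score(query, docs, {"d1": 1.0}, collection)
mary2 = model2_score(query, docs, {"d2": 1.0}, collection)
```

Because the per-document scores are exactly what a standard retrieval run produces, this variant needs nothing beyond a regular document index plus the document-candidate associations.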


Methodology – Model 2B

Model 2

Model 2B



Methodology – document-candidate associations

Boolean model

TF-IDF


(document importance) (senior member of organization)
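The two association methods can be contrasted with a short sketch. It assumes candidate mentions have been replaced by identifier tokens, so a candidate can be treated like any other term; the function names, the toy documents, and the particular TF-IDF variant (relative term frequency times log inverse document frequency) are assumptions made for illustration.

```python
import math

def boolean_association(doc_tokens, candidate_id):
    """1 if the candidate is mentioned in the document, else 0."""
    return 1.0 if candidate_id in doc_tokens else 0.0

def tfidf_association(doc_tokens, candidate_id, docs):
    """Frequency-based strength of the document-candidate association."""
    tf = doc_tokens.count(candidate_id) / len(doc_tokens) if doc_tokens else 0.0
    df = sum(1 for tokens in docs.values() if candidate_id in tokens)
    idf = math.log(len(docs) / df) if df else 0.0
    return tf * idf

# Toy data, invented for illustration only.
docs = {
    "d1": ["<1>", "x", "<1>", "y"],  # candidate <1> mentioned twice
    "d2": ["<1>", "a", "b", "c"],    # candidate <1> mentioned once
    "d3": ["z", "w"],                # no mention
}
bool_scores = {d: boolean_association(t, "<1>") for d, t in docs.items()}
tfidf_scores = {d: tfidf_association(t, "<1>", docs) for d, t in docs.items()}
```

The Boolean model treats d1 and d2 identically, while the frequency-based score ranks d1 above d2 because the candidate is mentioned more often there.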


Experiments

Evaluation measures: MAP (mean average precision)

MRR (mean reciprocal rank):

Query    Results                 Correct response    Rank    Reciprocal rank
cat      catten, cati, cats      cats                3       1/3
torus    torii, tori, toruses    tori                2       1/2
virus    viruses, virii, viri    viruses             1       1

MRR = (1/3 + 1/2 + 1)/3 = 11/18
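The MRR arithmetic for the three example queries can be checked with a few lines of Python. The function names are illustrative; exact fractions are used so the result can be compared directly with 11/18.

```python
from fractions import Fraction

def reciprocal_rank(results, correct):
    """1/rank of the first correct result, or 0 if it is absent."""
    for rank, item in enumerate(results, start=1):
        if item == correct:
            return Fraction(1, rank)
    return Fraction(0)

def mean_reciprocal_rank(runs):
    """Average the reciprocal ranks over (results, correct) pairs."""
    return sum(reciprocal_rank(r, c) for r, c in runs) / len(runs)

runs = [
    (["catten", "cati", "cats"], "cats"),
    (["torii", "tori", "toruses"], "tori"),
    (["viruses", "virii", "viri"], "viruses"),
]
mrr = mean_reciprocal_rank(runs)
print(mrr)  # 11/18
```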



Model 1 vs. Model 2

Window-based models




Association methods

Parameter sensitivity




Conclusions

Model 1: build a profile of each candidate expert, then rank the candidates based on the query.

Model 2: find the query-relevant documents, then associate them with experts.

Model 2 is preferred over Model 1:

Effectiveness: in terms of average precision and reciprocal rank

Implementation: requires only a regular document index

Window-based extensions improve effectiveness, especially on top of Model 1.

Frequency-based (TF-IDF) document-candidate associations are helpful.



Comments

Advantage: Integrate ideas

Drawback …

Application …
