Intelligent Database Systems Lab
N.Y.U.S.T. I.M.
A language modeling framework for expert finding
Presenter : Lin, Shu-Han
Authors : Krisztian Balog, Leif Azzopardi, Maarten de Rijke
Information Processing and Management (IPM) 45 (2009) 1–19
Outline
Motivation, Objectives, Methodology, Experiments, Conclusions, Comments
Motivation
Expert finding: finding experts on a given topic. Traditional approach, “Yellow Pages”:
Profiles: employees self-assess their skills.
Keywords, e.g., marketing
Problems: the information is often outdated.
The keywords are restricted.
Objectives
Within the organization: mine the documents published on the intranet.
Support searching for all kinds of expertise.
‘Who are the experts on topic “Internet marketing and internet advertising” in my organization?’
Methodology – Overview
To capture the association between a candidate expert and an area of expertise, the task is framed probabilistically:
“What is the probability of a candidate ca being an expert given the query topic q?”, i.e., p(ca | q).
Model 1: candidate-based (query-independent) approach
Idea: build a profile of each candidate expert, then rank the candidates based on the query.
Model 2: document-based (query-dependent) approach
Idea: find the documents relevant to the query, then associate them with experts.
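Bayes’ Theorem: $p(ca \mid q) = \dfrac{p(q \mid ca)\, p(ca)}{p(q)} \propto p(q \mid ca)$
where p(q) is constant for a given query and p(ca) is assumed to be uniform, so candidates can be ranked by p(q | ca).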
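As a small illustration of why the prior and the evidence can be dropped, the Python sketch below normalizes some made-up query likelihoods p(q | ca) under a uniform prior p(ca). The candidate names and numbers are hypothetical; exact fractions are used so the posterior can be checked by hand.

```python
from fractions import Fraction

def posterior(likelihoods):
    """Return p(ca | q) for each candidate from p(q | ca), assuming a uniform prior p(ca).

    p(q) = sum over candidates of p(q | ca) p(ca) is the same for every candidate,
    so it only rescales the scores and cannot change their order.
    """
    prior = Fraction(1, len(likelihoods))                  # uniform p(ca)
    joint = {ca: lik * prior for ca, lik in likelihoods.items()}
    evidence = sum(joint.values())                         # p(q), constant for this query
    return {ca: j / evidence for ca, j in joint.items()}

# Hypothetical query likelihoods p(q | ca) for three candidates.
lik = {"alice": Fraction(3, 1000), "bob": Fraction(1, 1000), "carol": Fraction(6, 1000)}
post = posterior(lik)
print(sorted(post, key=post.get, reverse=True))  # ['carol', 'alice', 'bob']
print(post["carol"])                              # 3/5
```

The posterior ranking matches the ranking of the raw likelihoods, which is why the models only need to estimate p(q | ca).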
Methodology – Model 1
Build a textual representation (model) θca of a person’s knowledge from the documents associated with that person.
Then estimate the probability of the query given the candidate’s model, p(q | θca).
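$p(q \mid \theta_{ca}) = \prod_{t \in q} p(t \mid \theta_{ca})^{n(t,q)}$, where n(t, q) is the number of times term t occurs in q.
e.g., $p(\text{“Internet marketing and internet advertising”} \mid \theta_{ca}) = p(\text{“internet”} \mid \theta_{ca})^2 \cdot p(\text{“marketing”} \mid \theta_{ca}) \cdot p(\text{“and”} \mid \theta_{ca}) \cdot p(\text{“advertising”} \mid \theta_{ca})$
(Smoothed) $p(t \mid \theta_{ca}) = (1 - \lambda)\, p(t \mid ca) + \lambda\, p(t)$, where p(t) is the collection (background) model and λ the smoothing parameter.
(Weighted) $p(t \mid ca) = \sum_{d} p(t \mid d)\, p(d \mid ca)$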
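A minimal Python sketch of Model 1 scoring for one candidate, assuming maximum-likelihood document models p(t | d), a fixed smoothing weight λ = 0.5, and a background model p(t) that covers every query term. The function name model1_log_score, the toy documents, and the candidates John and Mary are hypothetical. Scores are kept in log space to avoid underflow on long queries.

```python
import math
from collections import Counter

def model1_log_score(query, docs, assoc, collection, lam=0.5):
    """log p(q | theta_ca) for one candidate under Model 1 (sketch).

    query:      list of query terms
    docs:       {doc_id: list of terms}
    assoc:      {doc_id: p(d | ca)} for this candidate
    collection: {term: p(t)}, assumed to cover every query term
    lam:        smoothing weight lambda (fixed here for illustration)
    """
    # Weighted: p(t | ca) = sum_d p(t | d) p(d | ca), with p(t | d) = n(t, d) / |d|.
    p_t_ca = Counter()
    for d, p_d in assoc.items():
        counts, length = Counter(docs[d]), len(docs[d])
        for t, n in counts.items():
            p_t_ca[t] += n / length * p_d
    score = 0.0
    # A term occurring n(t, q) times in the query contributes its probability to that power.
    for t, n_tq in Counter(query).items():
        p = (1 - lam) * p_t_ca[t] + lam * collection.get(t, 0.0)  # Smoothed
        score += n_tq * math.log(p)
    return score

docs = {
    "d1": "internet marketing and internet advertising".split(),
    "d2": "database systems and query processing".split(),
}
all_terms = [t for terms in docs.values() for t in terms]
collection = {t: n / len(all_terms) for t, n in Counter(all_terms).items()}
query = "Internet marketing and internet advertising".lower().split()
john = model1_log_score(query, docs, {"d1": 1.0}, collection)  # hypothetical: John wrote d1
mary = model1_log_score(query, docs, {"d2": 1.0}, collection)  # hypothetical: Mary wrote d2
print(john > mary)  # True
```

Lowercasing the query makes “Internet” and “internet” the same term, which is what produces the squared factor in the example above.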
Methodology – Model 1B
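Model 1 assumes p(t | d, ca) = p(t | d); Model 1B instead estimates p(t | d, ca) from the text around occurrences of a candidate identifier (e.g., the candidate’s name or e-mail address) in document d.
Window size (w): how many terms around each occurrence of the identifier are taken into account.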
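e.g., p(“internet” | “Mail.No.43”, “John”): in document Mail.No.43, only the text around John’s identifiers is used, such as “… John ([e-mail address]) is a major in marketing. …” or “… <731842> (< 731842 >) is a major in marketing. …”
Terms closer to the candidate identifier are stronger evidence of the candidate’s expertise.
(Weighted) $p(t \mid d, ca) = \sum_{w} p(t \mid d, ca, w)\, p(w)$, combining the estimates obtained with different window sizes w.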
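The window-based estimate can be sketched in Python as below. This is one simple reading of the idea, not necessarily the paper’s exact estimator: a window of w terms on each side of every identifier occurrence, overlapping windows merged, identifier tokens excluded, and hand-picked weights p(w). The function names and the example sentence are hypothetical.

```python
from collections import Counter

def window_term_probs(doc, identifiers, w):
    """p(t | d, ca, w): term distribution over the terms within w positions of an identifier."""
    in_window = set()
    for i, t in enumerate(doc):
        if t in identifiers:
            in_window.update(range(max(0, i - w), min(len(doc), i + w + 1)))
    # Overlapping windows are merged; identifier tokens are not counted as evidence.
    counts = Counter(doc[j] for j in in_window if doc[j] not in identifiers)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()} if total else {}

def combined_term_probs(doc, identifiers, window_weights):
    """p(t | d, ca) = sum_w p(t | d, ca, w) p(w); window_weights maps w -> p(w)."""
    combined = Counter()
    for w, p_w in window_weights.items():
        for t, p in window_term_probs(doc, identifiers, w).items():
            combined[t] += p * p_w
    return dict(combined)

doc = "john is a major in marketing . the weather is nice".split()
p = combined_term_probs(doc, {"john"}, {2: 0.5, 5: 0.5})
print(p["is"] > p["marketing"], "weather" in p)  # True False
```

With windows of 2 and 5 terms weighted equally, the terms right next to “john” receive more probability mass than “marketing”, which only falls inside the larger window, and terms outside both windows receive none.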
Methodology – Model 2
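Score the documents against the query, then aggregate the scores per candidate through the document-candidate associations, assuming p(q | d, ca) = p(q | d):
$p(q \mid ca) = \sum_{d} p(q \mid \theta_d)\, p(d \mid ca)$, with $p(q \mid \theta_d) = \prod_{t \in q} p(t \mid \theta_d)^{n(t,q)}$
(Smoothed) $p(t \mid \theta_d) = (1 - \lambda)\, p(t \mid d) + \lambda\, p(t)$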
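A minimal Python sketch of Model 2, assuming smoothed maximum-likelihood document models with a fixed λ = 0.5, a background model p(t) that covers all query terms, and given associations p(d | ca). The function names, toy documents, and candidates are hypothetical.

```python
import math
from collections import Counter

def doc_query_likelihood(query, doc, collection, lam=0.5):
    """p(q | theta_d) = prod_t p(t | theta_d)^n(t, q) with a smoothed document model."""
    counts, length = Counter(doc), len(doc)
    log_p = 0.0
    for t, n_tq in Counter(query).items():
        p = (1 - lam) * counts[t] / length + lam * collection[t]  # Smoothed
        log_p += n_tq * math.log(p)
    return math.exp(log_p)  # may underflow for very long queries

def model2_score(query, docs, assoc, collection, lam=0.5):
    """p(q | ca) = sum_d p(q | theta_d) p(d | ca): score documents, then aggregate per candidate."""
    return sum(doc_query_likelihood(query, docs[d], collection, lam) * p_d
               for d, p_d in assoc.items())

docs = {
    "d1": "internet marketing and internet advertising".split(),
    "d2": "database systems and query processing".split(),
}
all_terms = [t for terms in docs.values() for t in terms]
collection = {t: n / len(all_terms) for t, n in Counter(all_terms).items()}
query = "Internet marketing and internet advertising".lower().split()
john = model2_score(query, docs, {"d1": 1.0}, collection)  # hypothetical: John wrote d1
mary = model2_score(query, docs, {"d2": 1.0}, collection)  # hypothetical: Mary wrote d2
print(john > mary)  # True
```

Only a standard document ranking is needed here; the candidates enter the computation through p(d | ca) after the documents have been scored.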
Methodology – Model 2B
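Model 2: each document is scored as a whole, $p(q \mid ca) = \sum_{d} p(q \mid \theta_d)\, p(d \mid ca)$
Model 2B: each document is scored only on the text around the candidate, $p(q \mid ca) = \sum_{d} p(q \mid \theta_{d,ca})\, p(d \mid ca)$, with p(t | d, ca) estimated from a window around the candidate identifier as in Model 1B.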
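Model 2B can be sketched in Python by scoring, for each document, a pseudo-document made of the terms within w positions of the candidate’s identifier instead of the full text. The windowing choices (merged overlapping windows, identifier tokens dropped, a single window size), the fixed λ = 0.5, and all names and data are assumptions for illustration.

```python
import math
from collections import Counter

def window_text(doc, identifiers, w):
    """Terms within w positions of any identifier occurrence (identifier tokens dropped)."""
    keep = set()
    for i, t in enumerate(doc):
        if t in identifiers:
            keep.update(range(max(0, i - w), min(len(doc), i + w + 1)))
    return [doc[j] for j in sorted(keep) if doc[j] not in identifiers]

def model2b_score(query, docs, assoc, identifiers, w, collection, lam=0.5):
    """p(q | ca) = sum_d p(q | theta_{d,ca}) p(d | ca), each d reduced to the text around ca."""
    total = 0.0
    for d, p_d in assoc.items():
        text = window_text(docs[d], identifiers, w)
        counts, length = Counter(text), len(text)
        log_p = 0.0
        for t, n_tq in Counter(query).items():
            p_ml = counts[t] / length if length else 0.0
            log_p += n_tq * math.log((1 - lam) * p_ml + lam * collection[t])  # Smoothed
        total += math.exp(log_p) * p_d
    return total

doc = ("john studies internet marketing . the report was also sent to "
       "mary who studies databases").split()
docs = {"d1": doc}
collection = {t: n / len(doc) for t, n in Counter(doc).items()}
query = ["internet", "marketing"]
john = model2b_score(query, docs, {"d1": 1.0}, {"john"}, 3, collection)
mary = model2b_score(query, docs, {"d1": 1.0}, {"mary"}, 3, collection)
print(john > mary)  # True
```

Both candidates are associated with the same single document, so scoring the whole document would give them identical scores; restricting it to the window around each candidate separates John, who is mentioned next to “internet marketing”, from Mary.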
Methodology – document-candidate associations
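Boolean model: a(d, ca) = 1 if candidate ca is mentioned in document d, and 0 otherwise.
TF-IDF: the association grows with how often ca is mentioned in d (the importance of the document for the candidate) and is discounted for candidates mentioned in many documents (e.g., senior members of the organization).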
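The two association schemes can be sketched in Python as below. Two assumptions are made: the TF-IDF weight is taken to be n(ca, d) · log(|D| / df(ca)), one common form, and the strengths are normalized over the candidates in each document to give p(ca | d), which Bayes’ theorem can turn into p(d | ca). Normalizing over documents for a fixed candidate instead would cancel the IDF factor, since it is constant per candidate. The function name and the toy counts are hypothetical.

```python
import math
from collections import Counter

def candidate_given_doc(mentions, method="boolean"):
    """p(ca | d) from association strengths a(d, ca) (sketch).

    mentions: {doc_id: Counter({candidate: n(ca, d)})}
    """
    n_docs = len(mentions)
    # df(ca): number of documents that mention the candidate at least once.
    df = Counter(ca for cands in mentions.values() for ca, n in cands.items() if n > 0)
    result = {}
    for d, cands in mentions.items():
        if method == "boolean":
            a = {ca: 1.0 for ca, n in cands.items() if n > 0}
        elif method == "tfidf":
            # Assumed TF-IDF form: n(ca, d) * log(|D| / df(ca)).
            a = {ca: n * math.log(n_docs / df[ca]) for ca, n in cands.items() if n > 0}
        else:
            raise ValueError(f"unknown method: {method}")
        total = sum(a.values())
        # Normalise over the candidates mentioned in d; documents with no usable weight get {}.
        result[d] = {ca: v / total for ca, v in a.items()} if total > 0 else {}
    return result

mentions = {
    "d1": Counter({"john": 3, "boss": 1}),
    "d2": Counter({"boss": 1}),
    "d3": Counter({"boss": 1, "mary": 2}),
}
boolean = candidate_given_doc(mentions, "boolean")
tfidf = candidate_given_doc(mentions, "tfidf")
print(boolean["d1"])  # {'john': 0.5, 'boss': 0.5}
print(tfidf["d1"])    # {'john': 1.0, 'boss': 0.0}
```

A candidate mentioned in every document (here “boss”) gets an IDF of log(1) = 0 and therefore no weight under TF-IDF, while the Boolean model spreads each document evenly over everyone it mentions.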
Experiments
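Evaluation measures: MAP (mean average precision): the mean, over all queries, of the average precision of the ranked list of candidates.
MRR (mean reciprocal rank): the mean, over all queries, of 1/rank of the first correct answer. For example:
Query | Results | Correct response | Rank | Reciprocal rank
cat | catten, cati, cats | cats | 3 | 1/3
torus | torii, tori, toruses | tori | 2 | 1/2
virus | viruses, virii, viri | viruses | 1 | 1
MRR = (1/3 + 1/2 + 1)/3 = 11/18 ≈ 0.61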
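Both measures can be computed as in the Python sketch below, which uses exact fractions so the MRR of the example comes out as 11/18. The function names are our own. With a single correct answer per query, as in this example, average precision reduces to the reciprocal rank, so MAP and MRR coincide here.

```python
from fractions import Fraction

def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            return Fraction(1, rank)
    return Fraction(0)

def average_precision(ranking, relevant):
    """Mean of precision@k over the ranks k at which relevant items appear."""
    hits, total = 0, Fraction(0)
    for k, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            total += Fraction(hits, k)
    return total / len(relevant) if relevant else Fraction(0)

def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

runs = [
    (["catten", "cati", "cats"], {"cats"}),
    (["torii", "tori", "toruses"], {"tori"}),
    (["viruses", "virii", "viri"], {"viruses"}),
]
print(mean(reciprocal_rank(r, rel) for r, rel in runs))    # 11/18
print(mean(average_precision(r, rel) for r, rel in runs))  # 11/18
```

When a query has several relevant candidates, the two measures differ: MRR only looks at the first hit, while average precision rewards retrieving all of them early.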
Experiments
Model 1 vs. Model 2
Window-based models
Experiments
Association methods
Parameter sensitivity
Conclusions
Model 1: build a profile of each candidate expert, then rank the candidates based on the query.
Model 2: find the documents relevant to the query, then associate them with experts.
Model 2 is preferred over Model 1:
Effectiveness: better in terms of mean average precision and mean reciprocal rank
Implementation: requires only a regular document index
Window-based extensions improve effectiveness, especially on top of Model 1.
Frequency-based (TF-IDF) document-candidate associations are helpful.
Comments
Advantage: integrates ideas
Drawback: …
Application: …