Combinational Ranking
-
8/8/2019 Com Bi National Ranking
1/16
Spring 2010 1
IT2: Web Information Retrieval (Web IR)
Handout #12: Combinational Ranking
Ali Mohammad Zareh Bidoki
ECE Department, Yazd [email protected]
-
Ranking Algorithm Problems
Rich-get-richer (connectivity based)
Low precision (at most 0.30)
Each ranking algorithm operates well in some situations
-
Combinational Ranking
Content + connectivity + ???
How can we combine these features?
R = f(query, content, connectivity)
-
Relevance Propagation Model (by Shakery)
A hyper score h is computed for each document.
W_I and W_O are weighting functions for in-link and out-link pages, respectively.
S(p) is the similarity between query q and page p (self-relevance):

$$h(p) = \alpha\, S(p) + \beta \sum_{p_i \to p} h(p_i)\, W_I(p_i, p) + \lambda \sum_{p \to p_j} h(p_j)\, W_O(p, p_j), \qquad \alpha + \beta + \lambda = 1$$
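The general propagation formula can be tried out on a toy graph. The following is only a minimal sketch: the function name `propagate`, the starting point h = S, the fixed iteration count, and the placeholder weighting functions are assumptions made for illustration, not details given in the handout.

```python
def propagate(S, links, W_I, W_O, alpha, beta, lam, iters=50):
    """Iterate h(p) = alpha*S(p) + beta*(in-link sum) + lam*(out-link sum)."""
    assert abs(alpha + beta + lam - 1.0) < 1e-9, "coefficients must sum to 1"
    h = dict(S)  # assumption: start the iteration from the self-relevance scores
    for _ in range(iters):
        new_h = {}
        for p in S:
            # Relevance flowing in from pages p_i that link to p.
            in_sum = sum(h[pi] * W_I(pi, p) for pi, pj in links if pj == p)
            # Relevance of the pages p_j that p links to.
            out_sum = sum(h[pj] * W_O(p, pj) for pi, pj in links if pi == p)
            new_h[p] = alpha * S[p] + beta * in_sum + lam * out_sum
        h = new_h
    return h


# Toy graph a -> b -> c -> a with hypothetical self-relevance scores.
S = {"a": 0.5, "b": 0.3, "c": 0.2}
links = [("a", "b"), ("b", "c"), ("c", "a")]
uniform = lambda src, dst: 1.0  # placeholder weighting function (assumption)
h = propagate(S, links, uniform, uniform, alpha=0.6, beta=0.2, lam=0.2)
print(sorted(h.items(), key=lambda kv: kv[1], reverse=True))
```

The three iterative models on the following slides correspond to particular choices of the coefficients and of the weighting functions.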
-
Three Iterative Models
Weighted In-Link
Weighted Out-Link
Uniform Out-Link
-
Weighted In-Link
This model of user behavior is quite similar to the random surfer model, except that it is not query-independent. The probability that the random surfer visits a page is its hyper-relevance score.

$$h(p) = \alpha\, S(p) + (1 - \alpha) \sum_{p_i \to p} h(p_i)\, W_I(p_i, p), \qquad W_I(p_i, p) \propto S(p)$$
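A possible sketch of the Weighted In-Link iteration follows. The handout only says that the weight is proportional to S(p); normalizing it over the out-links of p_i (so that it behaves like a transition probability of the surfer) is an assumption of this sketch, as are the function name and the toy data.

```python
def weighted_in_link(S, out_links, alpha=0.5, iters=100):
    """Iterate h(p) = alpha*S(p) + (1-alpha) * sum over in-links of h(p_i)*W_I(p_i, p)."""
    h = dict(S)  # assumption: start from the self-relevance scores
    for _ in range(iters):
        new_h = {p: alpha * S[p] for p in S}
        for pi, targets in out_links.items():
            # Assumed normalization: W_I(p_i, p) = S(p) / sum of S over p_i's out-links.
            total = sum(S[t] for t in targets)
            if total == 0:
                continue  # nothing relevant to move to from this page
            for p in targets:
                new_h[p] += (1 - alpha) * h[pi] * S[p] / total
        h = new_h
    return h


# Hypothetical self-relevance scores and link structure.
S = {"a": 0.5, "b": 0.3, "c": 0.2}
out_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
h = weighted_in_link(S, out_links)
print(h)
```

With this normalization, and when S sums to 1 and every page has at least one out-link, the scores keep summing to 1, which matches the reading of h as a visit probability.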
-
Weighted Out-Link
In this model, we assume that, given a page, the user reads the content of the page with probability alpha and traverses the outgoing edges with probability (1-alpha). The pages that are linked from a page do not have the same impact on its weight.

$$h(p) = \alpha\, S(p) + (1 - \alpha) \sum_{p \to p_j} h(p_j)\, W_O(p, p_j), \qquad W_O(p, p_j) \propto S(p_j)$$
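As a rough illustration of the Weighted Out-Link model, the sketch below lets a page collect relevance from the pages it links to. Normalizing W_O over the out-links of p is an assumption (the handout only states proportionality to S(p_j)), and the page names and scores are made up.

```python
def weighted_out_link(S, out_links, alpha=0.5, iters=100):
    """Iterate h(p) = alpha*S(p) + (1-alpha) * sum over out-links of h(p_j)*W_O(p, p_j)."""
    h = dict(S)  # assumption: start from the self-relevance scores
    for _ in range(iters):
        new_h = {}
        for p in S:
            targets = out_links.get(p, [])
            # Assumed normalization: W_O(p, p_j) = S(p_j) / sum of S over p's out-links.
            total = sum(S[t] for t in targets)
            pulled = sum(h[t] * S[t] / total for t in targets) if total > 0 else 0.0
            new_h[p] = alpha * S[p] + (1 - alpha) * pulled
        h = new_h
    return h


# A hub page with little relevant content of its own that links to two relevant pages.
S = {"hub": 0.1, "x": 0.6, "y": 0.3}
out_links = {"hub": ["x", "y"]}
h = weighted_out_link(S, out_links)
print(h)
```

In this toy case the hub ends up with a score above alpha*S(hub) because the more relevant target x contributes more than y.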
-
Uniform Out-Link
In this special case, they assume that at each page, the user reads the content of the page, and with probability (1-alpha) he reads all the pages that are linked from the page.

$$h(p) = S(p) + (1 - \alpha) \sum_{p \to p_j} h(p_j)$$
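The Uniform Out-Link update is simple enough to sketch in a few lines. The function name, the fixed number of iterations, and the toy data are assumptions for illustration. Since the out-link sum is not normalized here, the scores can keep growing on graphs with cycles, so the example uses a small acyclic graph.

```python
def uniform_out_link(S, out_links, alpha=0.8, iters=20):
    """Iterate h(p) = S(p) + (1-alpha) * sum of h(p_j) over pages p_j linked from p."""
    h = dict(S)  # assumption: start from the self-relevance scores
    for _ in range(iters):
        # The comprehension reads the previous h and only then rebinds the name.
        h = {p: S[p] + (1 - alpha) * sum(h[t] for t in out_links.get(p, []))
             for p in S}
    return h


# Hypothetical scores: a root page linking to two content pages.
S = {"root": 0.2, "a": 0.5, "b": 0.3}
out_links = {"root": ["a", "b"]}
h = uniform_out_link(S, out_links)
print(h)
```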
-
Algorithm Implementation
The algorithm is run on a working set
Working set construction:
They first find the top 100000 pages that have the highest content similarity to the query
From these 100000 pages, a small number (about 200) of the most similar pages are selected to be the core set of pages.
They then expand the core set to the working set by adding the pages that are among the 100000 pages and that point to pages in the core set or are pointed to by pages in the core set
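The working set construction described above can be sketched as follows. The function name, the input format (a dict of content-similarity scores and a list of directed links), and the tiny sizes used in the example are assumptions; the handout only gives the sizes 100000 and about 200.

```python
def build_working_set(similarity, links, top_n=100_000, core_size=200):
    """Return (core, working) sets of page ids for one query."""
    ranked = sorted(similarity, key=similarity.get, reverse=True)
    candidates = set(ranked[:top_n])   # pages most similar to the query
    core = set(ranked[:core_size])     # the most similar pages among them
    working = set(core)
    for src, dst in links:
        # A candidate page that points to a core page joins the working set.
        if src in candidates and dst in core:
            working.add(src)
        # A candidate page that a core page points to joins as well.
        if src in core and dst in candidates:
            working.add(dst)
    return core, working


# Toy data with small sizes so the effect is visible.
similarity = {"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.4, "f": 0.3, "e": 0.1}
links = [("c", "a"), ("b", "d"), ("e", "a"), ("f", "c")]
core, working = build_working_set(similarity, links, top_n=5, core_size=2)
print(sorted(core), sorted(working))
```

In the toy data, page e links to a core page but is outside the candidate pool, and page f is a candidate with no direct link to or from the core set, so neither is added.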
-
Algorithm Properties
It is:
Online??
Recursive
Query dependent
Experiments on TREC show that Weighted In-Link outperforms the others
-
Frequency Propagation (by Song)
Instead of propagating scores, the frequencies of query terms are propagated
We can use it online
It is based on site structure
-
Propagation Formula
ft(p) is the frequency of term t in page p before propagation
f't(p) is the frequency of term t in page p after propagation
-
Overall Framework for propagation
SS is the best
ST and HT-WI are similar
-
8/8/2019 Com Bi National Ranking
14/16
Spring 201014
Combinational Ranking Algorithms
Based on learning (Learning to Rank)
-
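Combination Framework
Training set (labels are relevance judgments or click orders):
q1: {(x11, 4), (x12, 3), ..., (x1m, 0)}
q2: {(x21, 3), (x22, 2), ..., (x2m, 1)}
...
qn: {(xn1, 4), (xn2, 3), ..., (xnm, 2)}
Learning system: takes the training set and produces the ranking model g(x, w)
Test set: (x1, ?), (x2, ?), ...
Ranking system: applies g(x, w) to the test set and outputs (x1, g(x1, w)), (x2, g(x2, w)), (x3, g(x3, w)), ...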
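To make the framework concrete, here is a small sketch in which the learning system fits a linear model g(x, w) = w . x to the labels and the ranking system sorts test documents by g. The linear form, the squared-error training by stochastic gradient descent, and all feature values are assumptions chosen for illustration; the handout does not fix the model or the loss.

```python
def g(x, w):
    """Ranking model: here assumed to be a linear score w . x."""
    return sum(xi * wi for xi, wi in zip(x, w))


def learn(training_set, dim, lr=0.05, epochs=500):
    """Learning system: fit w so that g(x, w) approximates each label."""
    w = [0.0] * dim
    for _ in range(epochs):
        for docs in training_set.values():
            for x, label in docs:
                err = g(x, w) - label
                # Gradient step on the squared error for this single document.
                w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def rank(docs, w):
    """Ranking system: order unlabeled documents by their model score."""
    return sorted(docs, key=lambda x: g(x, w), reverse=True)


# Hypothetical features, e.g. (content score, connectivity score), with graded labels.
training_set = {
    "q1": [((1.0, 0.8), 4), ((0.6, 0.5), 3), ((0.1, 0.0), 0)],
    "q2": [((0.9, 0.6), 3), ((0.5, 0.4), 2), ((0.2, 0.3), 1)],
}
w = learn(training_set, dim=2)
test_set = [(0.2, 0.1), (0.9, 0.9), (0.5, 0.5)]
ranked = rank(test_set, w)
print(w, ranked)
```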
-
Three learning categories
Pointwise
Pairwise
Listwise
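The three categories differ mainly in what the loss function looks at: single documents, document pairs, or the whole result list. The sketch below shows one common loss per category (squared error, a RankNet-style logistic pair loss, and a ListNet-style top-one cross entropy). These particular losses are illustrative choices, not ones named in the handout.

```python
import math


def pointwise_loss(scores, labels):
    """Pointwise: each document is compared with its own label (squared error)."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)


def pairwise_loss(scores, labels):
    """Pairwise: logistic loss on every pair where doc i is labeled above doc j."""
    n = len(scores)
    losses = [math.log(1 + math.exp(-(scores[i] - scores[j])))
              for i in range(n) for j in range(n) if labels[i] > labels[j]]
    return sum(losses) / len(losses) if losses else 0.0


def softmax(values):
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(scores, labels):
    """Listwise: cross entropy between top-one distributions of labels and scores."""
    target = softmax(labels)
    predicted = softmax(scores)
    return -sum(t * math.log(q) for t, q in zip(target, predicted))


labels = [4, 3, 0]
good = [2.0, 1.0, -1.0]   # scores that agree with the label order
bad = [-1.0, 1.0, 2.0]    # scores that reverse it
for loss in (pointwise_loss, pairwise_loss, listwise_loss):
    print(loss.__name__, loss(good, labels), loss(bad, labels))
```

All three losses are lower for the score list that agrees with the label order, but only the pointwise loss cares about the absolute score values.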