8/8/2019 Combinational Ranking

Spring 2010

IT2: Web Information Retrieval (Web IR)
Handout #12:

Combinational Ranking

Ali Mohammad Zareh Bidoki
ECE Department, Yazd [email protected]


Ranking Algorithm Problems

Rich-get-richer (connectivity-based)

Low precision (at most 0.30)

Each ranking algorithm operates well in some situations


Combinational Ranking

Content + connectivity + ???

How can we combine these features?

R = f(query, content, connectivity)


Relevance Propagation Model (by Shakery)

A hyper score (h) is computed for each document.

W_I and W_O are weighting functions for in-link and out-link pages, respectively.

S(p) is the similarity between query q and page p (self-relevance):

$$
h(p) = \alpha\, S(p) + \beta \sum_{p_i \to p} h(p_i)\, W_I(p_i, p) + \gamma \sum_{p \to p_j} h(p_j)\, W_O(p, p_j), \qquad \alpha + \beta + \gamma = 1
$$
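The propagation formula can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `propagate`, the fixed iteration count, the choice to start from h(p) = S(p), and passing the weighting functions as callables are all assumptions made for the example.

```python
from collections import defaultdict


def propagate(S, links, W_I, W_O, alpha, beta, gamma, iterations=20):
    """Iterate h(p) = alpha*S(p) + beta*sum_in + gamma*sum_out.

    S     : dict page -> self-relevance S(p) for the current query
    links : iterable of (source, target) hyperlinks
    W_I   : callable (p_i, p) -> weight of the in-link p_i -> p
    W_O   : callable (p, p_j) -> weight of the out-link p -> p_j
    """
    in_nbrs, out_nbrs = defaultdict(list), defaultdict(list)
    for src, dst in links:
        if src in S and dst in S:  # ignore links that leave the scored set
            out_nbrs[src].append(dst)
            in_nbrs[dst].append(src)

    h = dict(S)  # assumption: initialise hyper scores with self-relevance
    for _ in range(iterations):
        new_h = {}
        for p in S:
            from_in = sum(h[pi] * W_I(pi, p) for pi in in_nbrs[p])
            from_out = sum(h[pj] * W_O(p, pj) for pj in out_nbrs[p])
            new_h[p] = alpha * S[p] + beta * from_in + gamma * from_out
        h = new_h
    return h


# Toy example: three pages in a cycle, every link weighted 1.
S = {"a": 0.5, "b": 0.3, "c": 0.2}
links = [("a", "b"), ("b", "c"), ("c", "a")]
unit = lambda x, y: 1.0
h = propagate(S, links, unit, unit, alpha=0.6, beta=0.2, gamma=0.2)
```

Setting beta and gamma to zero reduces the hyper score to plain content similarity, which is a quick sanity check for any implementation.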


Three Iterative Models

Weighted In-Link

Weighted Out-Link

Uniform Out-Link


Weighted In-Link

This model of user behavior is quite similar to the random surfer model, except that it is not query-independent. The probability that the random surfer visits a page is its hyper-relevance score.

$$
h(p) = \alpha\, S(p) + (1-\alpha) \sum_{p_i \to p} h(p_i)\, W_I(p_i, p), \qquad W_I(p_i, p) \propto S(p)
$$
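A possible Python sketch of the Weighted In-Link model is given below. The slide only states that W_I(p_i, p) is proportional to S(p); normalising the weights over the out-links of p_i (so that the surfer leaving p_i picks a target in proportion to its relevance) is an assumption of this example, as are the function name and the default parameter values.

```python
def weighted_in_link(S, out_links, alpha=0.8, iterations=50):
    """h(p) = alpha*S(p) + (1-alpha) * sum over p_i -> p of h(p_i) * W_I(p_i, p).

    S         : dict page -> self-relevance S(p)
    out_links : dict page -> list of pages it links to
    """
    # W_I(p_i, p) proportional to S(p); assumed normalised per source page p_i.
    W = {}
    for pi, targets in out_links.items():
        total = sum(S[p] for p in targets)
        for p in targets:
            W[(pi, p)] = S[p] / total if total > 0 else 1.0 / len(targets)

    h = dict(S)  # assumption: start from the self-relevance scores
    for _ in range(iterations):
        new_h = {p: alpha * S[p] for p in S}
        for (pi, p), w in W.items():
            new_h[p] += (1 - alpha) * h[pi] * w
        h = new_h
    return h


S = {"a": 0.6, "b": 0.3, "c": 0.1}
out_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
h = weighted_in_link(S, out_links)
```

With this normalisation, and when S sums to 1 and every page has at least one out-link, the hyper scores keep summing to 1, which matches the reading of h(p) as a visit probability.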


Weighted Out-Link

In this model, we assume that, given a page, a user reads the content of the page with probability alpha and traverses the outgoing edges with probability (1-alpha). The pages that are linked from a page do not have the same impact on its weight.

$$
h(p) = \alpha\, S(p) + (1-\alpha) \sum_{p \to p_j} h(p_j)\, W_O(p, p_j), \qquad W_O(p, p_j) \propto S(p_j)
$$


Uniform Out-Link

In this special case, they assume that at each page the user reads the content of the page, and with probability (1-alpha) he reads all the pages that are linked from the page.

$$
h(p) = S(p) + (1-\alpha) \sum_{p \to p_j} h(p_j)
$$
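Read side by side, the three models can be viewed as restrictions of the general form $h(p) = \alpha S(p) + \beta \sum_{p_i \to p} h(p_i) W_I(p_i, p) + \gamma \sum_{p \to p_j} h(p_j) W_O(p, p_j)$. The summary below is one reading of the formulas on these slides, not a statement taken from them:

$$
\begin{aligned}
\text{Weighted In-Link:}\quad & \beta = 1-\alpha,\quad \gamma = 0,\quad W_I(p_i, p) \propto S(p) \\
\text{Weighted Out-Link:}\quad & \beta = 0,\quad \gamma = 1-\alpha,\quad W_O(p, p_j) \propto S(p_j) \\
\text{Uniform Out-Link:}\quad & \beta = 0,\quad \gamma = 1-\alpha,\quad W_O(p, p_j) = 1
\end{aligned}
$$

In the Uniform Out-Link case the coefficient of S(p) is 1 rather than alpha, so the coefficients do not sum to 1 and the resulting scores should not be read as probabilities.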


Algorithm Implementation

The algorithm is run on a working set.

Working set construction:

They first find the top 100,000 pages which have the highest content similarity to the query.

From these 100,000 pages, a small number (about 200) of the most similar pages are selected to be the core set of pages.

They then expand the core set to the working set by adding the pages that are among the 100,000 pages and which point to the pages in the core set or are pointed to by the pages in the core set.
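The construction steps above can be sketched as follows. The function name `build_working_set` and the representation of the link graph as (source, target) pairs are assumptions for illustration; the default sizes are the figures quoted on the slide.

```python
def build_working_set(similarity, links, pool_size=100_000, core_size=200):
    """Return (core, working) sets of pages for one query.

    similarity : dict page -> content similarity to the query
    links      : iterable of (source, target) hyperlinks
    """
    ranked = sorted(similarity, key=similarity.get, reverse=True)
    pool = set(ranked[:pool_size])  # top pages by content similarity
    core = set(ranked[:core_size])  # most similar pages form the core set

    working = set(core)
    for src, dst in links:
        if src in pool and dst in pool:
            if dst in core:   # src points to a core page
                working.add(src)
            if src in core:   # dst is pointed to by a core page
                working.add(dst)
    return core, working


# Toy example with a pool of 5 pages and a core of 2.
similarity = {"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.4, "f": 0.3, "e": 0.1}
links = [("c", "a"), ("b", "d"), ("e", "a"), ("f", "c")]
core, working = build_working_set(similarity, links, pool_size=5, core_size=2)
```

In the toy example, page "e" links to a core page but falls outside the pool, and page "f" is in the pool but has no link to or from the core, so neither joins the working set.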


Algorithm Properties

It is:

Online??

Recursive

Query dependent

On TREC data, Weighted In-Link is shown to outperform the other models.


Frequency Propagation (by Song)

Instead of propagating scores, the frequencies of query terms are propagated.

It can be used online.

It is based on the site structure.


Propagation Formula

f_t(p) is the frequency of term t in page p.

f'_t(p) is the frequency of term t in page p after propagation.
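The propagation formula itself is not preserved in this handout, so the sketch below uses an assumed scheme purely for illustration: each page keeps its own term frequencies and adds a weighted average of the frequencies of its child pages in the site hierarchy. The function name, the `weight` parameter and the averaging rule are all hypothetical and should not be taken as the formula from the slide.

```python
def propagate_term_frequency(freq, children, weight=0.5):
    """Hypothetical term-frequency propagation along the site structure.

    freq     : dict page -> {term: frequency before propagation}
    children : dict page -> list of child pages in the site hierarchy
    Returns a dict page -> {term: frequency after propagation}.
    """
    propagated = {}
    for page, terms in freq.items():
        new_terms = dict(terms)  # start from the page's own frequencies
        kids = [c for c in children.get(page, []) if c in freq]
        for child in kids:
            for term, f in freq[child].items():
                # assumed rule: add weight * average child frequency
                new_terms[term] = new_terms.get(term, 0.0) + weight * f / len(kids)
        propagated[page] = new_terms
    return propagated


freq = {"/": {"ir": 1}, "/a": {"ir": 4}, "/b": {"rank": 2}}
children = {"/": ["/a", "/b"]}
after = propagate_term_frequency(freq, children)
```

Because only term counts are propagated, the propagated frequencies can be fed to any content-based ranking function afterwards, which is what distinguishes this approach from propagating relevance scores.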


Overall Framework for propagation

SS is the best.

ST and HT-WI are similar.


Combinational Ranking Algorithms

Based on learning (Learning to Rank)


Combination Framework

Training Set (labels are relevance judgments or click orders):

q1: {(x11,4), (x12,3), ..., (x1m,0)}

q2: {(x21,3), (x22,2), ..., (x2m,1)}

...

qn: {(xn1,4), (xn2,3), ..., (xnm,2)}

Learning System: learns the Ranking Model g(x,w) from the Training Set.

Test Set: (x1,?), (x2,?), ...

Ranking System: applies g(x,w) to the Test Set and outputs (x1,g(x1,w)), (x2,g(x2,w)), (x3,g(x3,w)).
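The framework above can be illustrated with a very small learning system. In this sketch g(x,w) is assumed to be a linear function trained by stochastic gradient descent on squared error; the slide does not specify the form of g, and the feature vectors, learning rate and function names are made up for the example.

```python
def g(x, w):
    """Ranking model: here assumed to be a linear score w . x."""
    return sum(xk * wk for xk, wk in zip(x, w))


def train_linear_ranker(training_set, n_features, lr=0.05, epochs=500):
    """Learning system: fit w so that g(x, w) approximates the labels.

    training_set : dict query -> list of (feature_vector, label)
    """
    w = [0.0] * n_features
    examples = [ex for docs in training_set.values() for ex in docs]
    for _ in range(epochs):
        for x, y in examples:
            err = g(x, w) - y
            for k in range(n_features):
                w[k] -= lr * err * x[k]  # gradient step on squared error
    return w


def rank(test_docs, w):
    """Ranking system: score unlabeled documents and sort by score."""
    scored = [(x, g(x, w)) for x in test_docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Features are (content score, connectivity score); labels are relevance grades.
training_set = {
    "q1": [([1.0, 0.0], 4), ([0.5, 1.0], 3), ([0.0, 0.0], 0)],
    "q2": [([0.5, 0.0], 2), ([0.0, 1.0], 1)],
}
w = train_linear_ranker(training_set, n_features=2)
ranking = rank([[0.2, 0.1], [0.9, 0.3], [0.1, 0.8]], w)
```

The learned weights play the role of w in g(x,w): they decide how much content and connectivity each contribute to the final score, which is one way to answer the question of how to combine these features.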


Three Learning Categories

Pointwise

Pairwise

Listwise
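The three categories differ mainly in what the loss function looks at: single documents, pairs of documents, or the whole result list of a query. The sketch below gives one representative loss per category; the specific choices (squared error, a logistic pairwise loss in the style of RankNet, and a softmax cross entropy in the style of ListNet) are illustrative assumptions, not losses named in the handout.

```python
import math


def pointwise_loss(scores, labels):
    """Pointwise: squared error between each score and its own label."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels))


def pairwise_loss(scores, labels):
    """Pairwise: logistic loss over pairs (i, j) where i is labelled above j."""
    loss = 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                loss += math.log(1.0 + math.exp(-(scores[i] - scores[j])))
    return loss


def softmax(values):
    """Numerically stable softmax over a list of numbers."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(scores, labels):
    """Listwise: cross entropy between softmax(labels) and softmax(scores)."""
    target = softmax(labels)
    predicted = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


labels = [4, 3, 0]
good_scores = [2.0, 1.0, -1.0]  # same order as the labels
bad_scores = [-1.0, 1.0, 2.0]   # reversed order
```

Note that the pairwise and listwise losses depend only on how documents of the same query compare with each other, so a model can score well on them without reproducing the absolute label values.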
