Combinational Ranking

Upload: essi1134

Post on 10-Apr-2018


  • 8/8/2019 Com Bi National Ranking

    1/16

    Spring 2010 1

    IT2: Web Information Retrieval (Web IR), Handout #12:

    Combinational Ranking

    Ali Mohammad Zareh Bidoki, ECE Department, Yazd [email protected]


    Ranking Algorithm Problems

    Rich-get-richer (connectivity-based)

    Low precision (at most 0.30)

    Each ranking algorithm operates well in some situations


    Combinational Ranking

    Content + connectivity + ???

    How can we combine these features?

    R = f(query, content, connectivity)


    Relevance Propagation Model (by Shakery)

    A hyper score (h) is computed for each document.

    W_I and W_O are weighting functions for in-link and out-link pages, respectively.

    S(p) is the similarity between query q and page p (self-relevance):

    $$h(p) = \alpha\, S(p) + \beta \sum_{p_i \to p} h(p_i)\, W_I(p_i, p) + \gamma \sum_{p \to p_j} h(p_j)\, W_O(p, p_j), \qquad \alpha + \beta + \gamma = 1$$
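A minimal sketch of how this propagation could be iterated, assuming a small graph stored as a dict of out-links, coefficients that sum to 1, and caller-supplied weighting functions. The names `propagate`, `w_in`, and `w_out`, the fixed iteration count, and the toy scores are illustrative assumptions, not part of the handout.

```python
def propagate(S, out_links, w_in, w_out, alpha, beta, gamma, iters=20):
    """Iterate h(p) = alpha*S(p) + beta*(in-link sum) + gamma*(out-link sum).

    S: dict page -> self-relevance score for the query.
    out_links: dict page -> list of pages it links to.
    w_in(pi, p), w_out(p, pj): weighting functions (hypothetical callables).
    """
    # Derive the in-link lists from the out-link lists.
    in_links = {p: [] for p in S}
    for p, targets in out_links.items():
        for t in targets:
            in_links[t].append(p)

    h = dict(S)  # start from the self-relevance scores
    for _ in range(iters):
        new_h = {}
        for p in S:
            from_in = sum(h[pi] * w_in(pi, p) for pi in in_links[p])
            from_out = sum(h[pj] * w_out(p, pj) for pj in out_links.get(p, []))
            new_h[p] = alpha * S[p] + beta * from_in + gamma * from_out
        h = new_h
    return h


# Toy example: three pages in a cycle, uniform weights (an assumption).
S = {"a": 0.9, "b": 0.1, "c": 0.5}
out_links = {"a": ["b"], "b": ["c"], "c": ["a"]}
uniform = lambda x, y: 1.0
h = propagate(S, out_links, uniform, uniform, alpha=0.6, beta=0.2, gamma=0.2)
```

Setting `beta` or `gamma` to zero switches off the corresponding direction of propagation, which is how the more specific models can be read as special cases of this form.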


    Three Iterative Models

    Weighted In-Link

    Weighted Out-Link

    Uniform Out-Link


    Weighted In-Link

    This model of user behavior is quite similar to the random surfer model, except that it is not query-independent. The probability that the random surfer visits a page is its hyper-relevance score.

    $$h(p) = \alpha\, S(p) + (1 - \alpha) \sum_{p_i \to p} h(p_i)\, W_I(p_i, p), \qquad W_I(p_i, p) \propto S(p)$$
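As a sketch of the weighted in-link update h(p) = alpha S(p) + (1 - alpha) sum over in-links of h(p_i) W_I(p_i, p), the code below takes W_I(p_i, p) proportional to S(p) and normalises it over the out-links of p_i. That normalisation, the value alpha = 0.8, the iteration count, and the function name `weighted_in_link` are assumptions made for illustration.

```python
def weighted_in_link(S, out_links, alpha=0.8, iters=30):
    """Weighted in-link propagation on a small graph.

    S: dict page -> self-relevance score.
    out_links: dict page -> list of pages it links to.
    """
    # Normalising constant per source page: total relevance of its targets.
    totals = {pi: sum(S[t] for t in targets) for pi, targets in out_links.items()}

    h = dict(S)
    for _ in range(iters):
        new_h = {p: alpha * S[p] for p in S}
        for pi, targets in out_links.items():
            if totals[pi] == 0:
                continue  # no relevant targets, nothing to pass on
            for p in targets:
                # W_I(pi, p) = S(p) / sum of S over pi's out-links (assumed)
                new_h[p] += (1 - alpha) * h[pi] * S[p] / totals[pi]
        h = new_h
    return h


S = {"hub": 0.2, "good": 0.8, "weak": 0.2}
out_links = {"hub": ["good", "weak"]}
h = weighted_in_link(S, out_links)
```

In this toy graph the hub passes most of its score to the target that is itself more similar to the query, which is the query-dependent twist on the random surfer.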


    Weighted Out-Link

    In this model, we assume that, given a page, the user reads the content of the page with probability alpha and traverses the outgoing edges with probability (1-alpha). The pages that are linked from a page do not have the same impact on its weight.

    $$h(p) = \alpha\, S(p) + (1 - \alpha) \sum_{p \to p_j} h(p_j)\, W_O(p, p_j), \qquad W_O(p, p_j) \propto S(p_j)$$
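A small worked example may help here. It applies one update of the weighted out-link rule, h(p) = alpha S(p) + (1 - alpha) sum over out-links of h(p_j) W_O(p, p_j), to a page p that links to two pages p_1 and p_2. The scores, alpha = 0.8, the starting point h = S, and the normalisation of W_O over the out-links of p are all assumptions chosen for illustration.

$$
\begin{aligned}
S(p) &= 0.5, \quad S(p_1) = 0.6, \quad S(p_2) = 0.2, \quad \alpha = 0.8 \\
W_O(p, p_1) &= \frac{0.6}{0.6 + 0.2} = 0.75, \qquad W_O(p, p_2) = \frac{0.2}{0.6 + 0.2} = 0.25 \\
h(p) &= 0.8 \cdot 0.5 + 0.2 \cdot (0.6 \cdot 0.75 + 0.2 \cdot 0.25) = 0.4 + 0.2 \cdot 0.5 = 0.5
\end{aligned}
$$

The more relevant target p_1 contributes 0.45 of the 0.5 propagated mass, which is the sense in which linked pages do not have the same impact.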


    Uniform Out-Link

    In this special case, they assume that at each page, the user reads the content of the page, and with probability (1-alpha) he reads all the pages that are linked from the page.

    $$h(p) = S(p) + (1 - \alpha) \sum_{p \to p_j} h(p_j)$$
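A minimal sketch of the uniform out-link update, h(p) = S(p) + (1 - alpha) times the sum of h over the pages p links to. The function name, alpha = 0.5, and the fixed number of rounds are assumptions. Because the sum is not normalised, scores can keep growing on graphs with cycles, so this sketch simply stops after a set number of rounds rather than testing for convergence.

```python
def uniform_out_link(S, out_links, alpha=0.5, iters=10):
    """Uniform out-link propagation: every linked page counts equally.

    S: dict page -> self-relevance score.
    out_links: dict page -> list of pages it links to.
    """
    h = dict(S)
    for _ in range(iters):
        # The comprehension reads the previous h before it is rebound.
        h = {
            p: S[p] + (1 - alpha) * sum(h[pj] for pj in out_links.get(p, []))
            for p in S
        }
    return h


# Toy chain a -> b -> c, where only c matches the query well.
S = {"a": 0.1, "b": 0.2, "c": 0.7}
out_links = {"a": ["b"], "b": ["c"]}
h = uniform_out_link(S, out_links)
```

On this chain the relevance of c flows back to b and, more weakly, to a, so a page that leads toward relevant content is boosted even if its own text matches poorly.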


    Algorithm Implementation

    The algorithm is run on a working set.

    Working set construction:

    They first find the top 100,000 pages that have the highest content similarity to the query.

    From these 100,000 pages, a small number (about 200) of the most similar pages are selected to be the core set of pages.

    They then expand the core set to the working set by adding the pages that are among the 100,000 pages and that point to the pages in the core set or are pointed to by the pages in the core set.
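The three construction steps above can be sketched as follows, assuming the similarity scores and the link graph are already available as in-memory dicts. The function name `build_working_set` and its parameters are hypothetical; the defaults mirror the sizes quoted in the slide.

```python
def build_working_set(similarity, out_links, top_n=100_000, core_size=200):
    """Return (core, working) sets of page ids.

    similarity: dict page -> content similarity to the query.
    out_links: dict page -> iterable of pages it links to.
    """
    ranked = sorted(similarity, key=similarity.get, reverse=True)
    candidates = set(ranked[:top_n])   # step 1: top pages by content similarity
    core = set(ranked[:core_size])     # step 2: the most similar pages

    # Step 3: add candidates that link to, or are linked from, the core set.
    working = set(core)
    for p in candidates:
        targets = out_links.get(p, ())
        if p in core:
            working.update(t for t in targets if t in candidates)
        elif any(t in core for t in targets):
            working.add(p)
    return core, working


similarity = {"a": 0.9, "b": 0.8, "c": 0.5, "f": 0.45, "d": 0.4, "e": 0.1}
out_links = {"a": ["c"], "d": ["b"], "c": ["e"], "e": ["a"]}
core, working = build_working_set(similarity, out_links, top_n=5, core_size=2)
```

In the toy run, page `e` links to the core but is left out because it is not among the top candidates, and page `f` is a candidate but has no link to or from the core.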


    Algorithm Properties

    It is:

    Online??

    Recursive

    Query independent

    On TREC, Weighted In-Link is shown to outperform the others.


    Frequency Propagation (by Song)

    Instead of propagating scores, the frequencies of query terms are propagated.

    We can use it online.

    It is applied based on site structure.


    Propagation Formula

    f_t(p) is the frequency of term t in page p.

    f'_t(p) is the frequency of term t in page p after propagation.
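The formula itself does not appear in this transcript, so the sketch below should be read only as an illustration of the idea. It assumes a rule of the form f'_t(p) = (1 + alpha) f_t(p) + (1 - alpha) times the average of f_t over the child pages of p in the site tree. That form, the value of alpha, and the names `propagate_frequency` and `children` are assumptions, not taken from the handout.

```python
def propagate_frequency(freq, children, alpha=0.5):
    """One propagation step for a single term t over a site tree.

    freq: dict page -> frequency of term t in that page.
    children: dict page -> list of child pages in the site structure.
    Returns a dict page -> frequency after propagation (assumed rule).
    """
    new_freq = {}
    for p, f in freq.items():
        kids = children.get(p, [])
        if kids:
            avg_child = sum(freq[c] for c in kids) / len(kids)
            new_freq[p] = (1 + alpha) * f + (1 - alpha) * avg_child
        else:
            # Leaf pages have no children to draw from.
            new_freq[p] = (1 + alpha) * f
    return new_freq


freq = {"home": 2, "docs": 4, "blog": 0}
children = {"home": ["docs", "blog"]}
after = propagate_frequency(freq, children)
```

The propagated frequencies would then be fed to an ordinary content-based scoring function in place of the raw term counts.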


    Overall Framework for propagation

    SS is the best.

    ST and HT-WI are similar.


    Combinational Ranking Algorithms

    Based on learning (Learning to Rank)


    Combination Framework

    Training Set (labels are relevance judgments or click orders):

    q1: {(x11,4), (x12,3), ..., (x1m,0)}

    q2: {(x21,3), (x22,2), ..., (x2m,1)}

    ...

    qn: {(xn1,4), (xn2,3), ..., (xnm,2)}

    Training Set -> Learning System -> Ranking Model g(x,w)

    Test Set: (x1,?), (x2,?), ...

    Test Set + Ranking Model g(x,w) -> Ranking System -> (x1,g(x1,w)), (x2,g(x2,w)), (x3,g(x3,w))
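To make the flow from training set to ranked test set concrete, here is a minimal sketch that assumes g(x, w) is a linear function of two features (say, a content score and a connectivity score) and that w is fitted by stochastic gradient descent on squared error. The feature values, labels, learning rate, and the function names are invented for illustration.

```python
def g(x, w):
    """Ranking model: a linear score of the feature vector x (assumed form)."""
    return sum(xi * wi for xi, wi in zip(x, w))


def train(training_set, n_features, lr=0.01, epochs=500):
    """Learning system: fit w so that g(x, w) approximates the labels."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for docs in training_set.values():
            for x, label in docs:
                err = g(x, w) - label
                w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


# Training set: per query, (feature vector, relevance label) pairs.
training_set = {
    "q1": [((0.9, 0.8), 4), ((0.6, 0.5), 3), ((0.1, 0.0), 0)],
    "q2": [((0.7, 0.9), 3), ((0.5, 0.4), 2), ((0.2, 0.3), 1)],
}
w = train(training_set, n_features=2)

# Ranking system: score unlabeled documents and sort by g(x, w).
test_set = [(0.2, 0.1), (0.8, 0.9), (0.5, 0.5)]
ranked = sorted(test_set, key=lambda x: g(x, w), reverse=True)
```

The learned weights play the role of the combination function: they decide how much content and connectivity each contribute to the final rank.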


    Three learning categories

    Pointwise

    Pairwise

    Listwise
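The three categories differ mainly in what the loss function looks at: single documents, pairs of documents, or the whole result list. The sketch below shows one commonly used loss per category for a single query. The specific choices (squared error, a logistic pairwise loss, and a softmax cross-entropy over the list) are illustrative picks, not something the handout specifies.

```python
import math


def pointwise_loss(scores, labels):
    """Each document is judged on its own: squared error against its label."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels))


def pairwise_loss(scores, labels):
    """Each pair with different labels should be scored in the right order."""
    loss = 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                loss += math.log(1 + math.exp(-(scores[i] - scores[j])))
    return loss


def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(scores, labels):
    """The whole list is compared: cross-entropy between two distributions."""
    p_true = softmax(labels)
    p_pred = softmax(scores)
    return -sum(t * math.log(q) for t, q in zip(p_true, p_pred))


labels = [4, 3, 0]
good = [2.0, 1.0, -1.0]   # scores that agree with the label order
bad = [-1.0, 1.0, 2.0]    # scores that reverse it
```

All three losses are lower for `good` than for `bad`, but only the pointwise one cares about the absolute score values; the other two depend only on differences between scores.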