Upload: johnathan-jason

Post on 29-Mar-2015

Page 1: Topic-Sensitive PageRank Reference: Taher H. Haveliwala, Topic-Sensitive PageRank: A context-sensitive ranking algorithm for websearch, IEEE Trans. On

Topic-Sensitive PageRank

Reference: Taher H. Haveliwala, “Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search”, IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 784-796.

The original PageRank:
• Is based purely on the hyperlinks between web pages.
• Does not consider page contents.
• Uses a single PageRank vector for all web pages.

Topic-Sensitive PageRank:
• A PageRank vector is created for each topic.
• Each page therefore has several PageRank values, one for each topic.

Creating a PageRank vector for each topic

• How should topics be selected?
– Using a small set of topics is important for low computation cost and quick response time.
– Open Directory: http://www.dmoz.org

• The 16 top-level categories of the Open Directory are used as topics.

Original PageRank

• $\text{Rank} = M \cdot \text{Rank}$
– Rank is a vector with one element for each web page.
– M is an $n \times n$ matrix.

• If there is a link from page j to page i, then $M_{i,j} = 1/N_j$, where $N_j$ is the number of out-links of page j; otherwise $M_{i,j} = 0$.
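A minimal pure-Python sketch of this formulation, assuming out-links are given as a dictionary from page index to a list of distinct target pages (the function names and the three-page graph are hypothetical). It builds the column-stochastic matrix M and repeatedly applies Rank = M·Rank. Without a random-jump term this plain iteration is only guaranteed to converge on well-behaved graphs (strongly connected, aperiodic, no pages without out-links), which is why a modified version is used in practice.

```python
def build_link_matrix(out_links, n):
    """Return M with M[i][j] = 1/N_j if page j links to page i, else 0.

    out_links maps a page index j to a list of distinct pages it links to.
    """
    M = [[0.0] * n for _ in range(n)]
    for j, targets in out_links.items():
        for i in targets:
            M[i][j] = 1.0 / len(targets)  # N_j = len(targets)
    return M


def iterate_rank(M, tol=1e-10, max_iter=1000):
    """Apply Rank <- M * Rank from a uniform start until the vector stops changing."""
    n = len(M)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(M[i][j] * rank[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
M = build_link_matrix({0: [1, 2], 1: [2], 2: [0]}, 3)
rank = iterate_rank(M)  # approximately [0.4, 0.2, 0.4]
```

For this small graph the fixed point can be checked by hand: Rank_0 = Rank_2, Rank_1 = Rank_0 / 2, and the entries sum to 1.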

Another Version

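• Let n be the total number of web pages, and let

$p = [1/n]_{n \times 1}$ be the uniform vector.

• Let d be an $n \times 1$ vector with $d_i = 1$ if page i has no out-links, and $d_i = 0$ otherwise.

• $D = p\,d^T$ and $E = p\,[1]_{1 \times n}$.

• $M' = (1-\alpha)(M + D) + \alpha E$, where $0 < \alpha < 1$ is the probability of a random jump.
• $\text{Rank} = M' \cdot \text{Rank} = (1-\alpha)(M + D)\,\text{Rank} + \alpha p$ (because the entries of Rank sum to 1, $E \cdot \text{Rank} = p$).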
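The modified matrix can be written out entry by entry. The sketch below (hypothetical helper names; a dense O(n²) representation that is only sensible for tiny examples) builds M' = (1 − α)(M + D) + αE with a uniform p, then runs the power iteration Rank = M'·Rank. The value α = 0.15 is an illustrative assumption, not a value given on the slides.

```python
def modified_matrix(out_links, n, alpha=0.15):
    """Build M' = (1 - alpha)(M + D) + alpha * E as a dense n x n list of lists."""
    p = [1.0 / n] * n                                         # p = [1/n]_{n x 1}
    d = [0.0 if out_links.get(j) else 1.0 for j in range(n)]  # d_j = 1 if page j has no out-links
    Mp = [[0.0] * n for _ in range(n)]
    for j in range(n):
        targets = out_links.get(j, [])
        for i in range(n):
            m_ij = 1.0 / len(targets) if i in targets else 0.0  # original link matrix M
            D_ij = p[i] * d[j]                                 # D = p d^T
            E_ij = p[i]                                        # E = p [1]_{1 x n}
            Mp[i][j] = (1 - alpha) * (m_ij + D_ij) + alpha * E_ij
    return Mp


def power_iteration(Mp, tol=1e-12, max_iter=1000):
    """Repeat Rank <- M' * Rank from a uniform start until the L1 change is below tol."""
    n = len(Mp)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(Mp[i][j] * rank[j] for j in range(n)) for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank


# Hypothetical two-page web: page 0 links to page 1; page 1 has no out-links.
Mp = modified_matrix({0: [1], 1: []}, 2)
rank = power_iteration(Mp)
```

Every column of M' sums to 1, so the entries of Rank keep summing to 1. For this two-page example the fixed point works out by hand to Rank_0 = 0.5 / 1.425 ≈ 0.351.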

Topic-Sensitive PageRank

• Let $T_j$ be the set of URLs in the ODP category $c_j$.

• Set $p = v_j$, where $v_{j,i} = 1/|T_j|$ if page i is in $T_j$, and $v_{j,i} = 0$ otherwise.

• The PageRank vector for topic $c_j$ is $PR(\alpha, v_j)$.

• In other words, the PageRank for topic $c_j$ is computed exactly as the original PageRank, except that random jumps land only on the pages in $T_j$.
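A sketch of the per-topic computation, assuming the sparse form Rank = (1 − α)(M + D)·Rank + α·v_j, i.e. the original PageRank with p replaced by v_j (so pages without out-links also redistribute their rank along v_j, following D = p·d^T). The topic names, page sets, function names and α = 0.15 are hypothetical.

```python
def teleport_vector(topic_pages, n):
    """v_j: 1/|T_j| for every page i in T_j, 0 for all other pages."""
    return [1.0 / len(topic_pages) if i in topic_pages else 0.0 for i in range(n)]


def pagerank(out_links, n, p, alpha=0.15, tol=1e-12, max_iter=2000):
    """PR(alpha, p): iterate Rank <- (1 - alpha)(M + D) Rank + alpha * p."""
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        spread = [0.0] * n
        dangling = 0.0
        for j in range(n):
            targets = out_links.get(j, [])
            if targets:
                for i in targets:
                    spread[i] += rank[j] / len(targets)  # M * Rank
            else:
                dangling += rank[j]                      # mass that D sends along p
        new = [(1 - alpha) * (spread[i] + dangling * p[i]) + alpha * p[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank


def topic_sensitive_ranks(out_links, n, topics, alpha=0.15):
    """One PageRank vector per topic c_j, computed as PR(alpha, v_j)."""
    return {name: pagerank(out_links, n, teleport_vector(pages, n), alpha)
            for name, pages in topics.items()}


# Hypothetical web: 0 <-> 1 and 2 -> 0; each topic is a set T_j of page indices.
ranks = topic_sensitive_ranks({0: [1], 1: [0], 2: [0]}, 3, {"t0": {0}, "t2": {2}})
```

Page 2 has no in-links, so its rank comes only from random jumps: it receives α = 0.15 under the topic with T_j = {2}, and exactly 0 under the topic with T_j = {0}.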

The Retrieval Score

• Let $r_{j,d}$ be the PageRank of document d given by the rank vector $PR(\alpha, v_j)$.

• $s_{q,d} = \sum_j P(c_j \mid q)\, r_{j,d}$.

• $P(c_j \mid q)$ is the probability that topic $c_j$ is related to query q.
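The query-time score is a weighted sum, made concrete below. The sketch assumes the topic probabilities P(c_j | q) and the per-topic ranks r_{j,d} have already been computed (how P(c_j | q) is estimated is not covered here); the function name, topic names, document ids and numbers are all made up.

```python
def retrieval_scores(topic_probs, topic_ranks):
    """s_{q,d} = sum over topics j of P(c_j | q) * r_{j,d}.

    topic_probs: {topic: P(c_j | q)}
    topic_ranks: {topic: {document: r_{j,d}}}
    """
    docs = set()
    for ranks in topic_ranks.values():
        docs.update(ranks)
    return {
        d: sum(topic_probs.get(t, 0.0) * ranks.get(d, 0.0)
               for t, ranks in topic_ranks.items())
        for d in docs
    }


# Hypothetical numbers for a query that is mostly about sports.
scores = retrieval_scores(
    {"sports": 0.75, "arts": 0.25},
    {"sports": {"d1": 0.2, "d2": 0.1}, "arts": {"d1": 0.05, "d2": 0.3}},
)
ranking = sorted(scores, key=scores.get, reverse=True)  # ["d1", "d2"]
```

Here d1 scores 0.75·0.2 + 0.25·0.05 = 0.1625 and d2 scores 0.75·0.1 + 0.25·0.3 = 0.15, so d1 is ranked first even though d2 has the highest single topic rank.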

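Similarity Measures for Induced Rankings

• Let $\tau_1$ and $\tau_2$ be two rankings of documents, and let A and B be the sets of the top k URLs in $\tau_1$ and $\tau_2$, respectively.
• $OSim(\tau_1, \tau_2)$ indicates the degree of overlap between the top k URLs of the two rankings: $OSim(\tau_1, \tau_2) = |A \cap B| / k$.
• $KSim(\tau_1, \tau_2) = \frac{|\{(u, v) : \tau_1, \tau_2 \text{ agree on the order of } (u, v),\ u \ne v\}|}{|U|\,(|U| - 1)}$, where $U = A \cup B$ and each ranking is extended by placing the URLs of U that it does not contain after all of its own URLs.

Let $\tau$ be the true ranking given by the user. To compare $\tau_1$ and $\tau_2$, we can compare $OSim(\tau, \tau_1)$ with $OSim(\tau, \tau_2)$, or $KSim(\tau, \tau_1)$ with $KSim(\tau, \tau_2)$.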
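A sketch of both measures on Python lists of URLs (function names hypothetical). Two choices here are assumptions of the sketch rather than statements from the slides: URLs appended to the bottom of an extended ranking are treated as tied with each other, and a pair that is tied in one ranking but ordered in the other counts as a disagreement. The guard for fewer than two URLs is also an added convention.

```python
from itertools import permutations


def osim(tau1, tau2, k):
    """|A intersect B| / k, where A and B are the top-k URLs of tau1 and tau2."""
    return len(set(tau1[:k]) & set(tau2[:k])) / k


def ksim(tau1, tau2, k):
    """Fraction of ordered pairs (u, v), u != v, of U = A union B on which both rankings agree."""
    top1, top2 = tau1[:k], tau2[:k]
    union = set(top1) | set(top2)
    if len(union) < 2:
        return 1.0  # assumed convention: nothing to disagree about

    def extended_positions(top):
        # URLs of U missing from this list are placed after all of its own URLs
        # and tied with each other (an assumption of this sketch).
        pos = {url: r for r, url in enumerate(top)}
        return {url: pos.get(url, len(top)) for url in union}

    p1, p2 = extended_positions(top1), extended_positions(top2)

    def sign(x):
        return (x > 0) - (x < 0)

    agree = sum(1 for u, v in permutations(union, 2)
                if sign(p1[u] - p1[v]) == sign(p2[u] - p2[v]))
    return agree / (len(union) * (len(union) - 1))
```

For example, ["a", "b"] and ["a", "c"] with k = 2 overlap in one of two URLs (OSim = 0.5) and agree on the order of two of the three unordered pairs (KSim = 2/3).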

Weighted PageRank

• Assigns larger rank values to more important (more popular) pages.
• Each out-linked page receives a share of rank proportional to its popularity.

• $W^{in}_{(v,u)}$ is the weight of link (v, u), calculated from the number of in-links of page u and the number of in-links of all reference pages of page v:

$W^{in}_{(v,u)} = \frac{I_u}{\sum_{p \in R(v)} I_p}$,

$I_u$: the number of in-links of page u.

R(v): the reference page list of page v, i.e., the set of all pages that v points to.

Weighted PageRank

• $W^{out}_{(v,u)} = \frac{O_u}{\sum_{p \in R(v)} O_p}$,

$O_u$: the number of out-links of page u.

Let B(u) be the set of pages that point to u.

$PR(u) = (1 - d) + d \sum_{v \in B(u)} PR(v)\, W^{in}_{(v,u)}\, W^{out}_{(v,u)}$, where d is the damping factor.

Reference: Wenpu Xing and Ali Ghorbani, Weighted PageRank Algorithm, Proceedings of the 2nd Annual Conference on Communication Networks and Services Research (CNSR’04), 2004.
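A direct, unoptimized transcription of the Weighted PageRank update (function and variable names hypothetical). Starting from PR = 1 for every page and running a fixed number of synchronous iterations are assumptions of this sketch, not details taken from the slides. Here d is the damping factor, unrelated to the vector d used for the original PageRank.

```python
def weighted_pagerank(out_links, n, d=0.85, iterations=100):
    """Iterate PR(u) = (1 - d) + d * sum_{v in B(u)} PR(v) * W_in(v, u) * W_out(v, u)."""
    in_links = {u: [] for u in range(n)}                  # B(u)
    for v, targets in out_links.items():
        for u in targets:
            in_links[u].append(v)
    in_count = [len(in_links[u]) for u in range(n)]       # I_u
    out_count = [len(out_links.get(u, [])) for u in range(n)]  # O_u

    def w_in(v, u):
        total = sum(in_count[p] for p in out_links[v])    # sum over p in R(v)
        return in_count[u] / total if total else 0.0

    def w_out(v, u):
        total = sum(out_count[p] for p in out_links[v])
        return out_count[u] / total if total else 0.0

    pr = [1.0] * n
    for _ in range(iterations):
        # The comprehension reads the previous pr for every u before pr is replaced.
        pr = [(1 - d) + d * sum(pr[v] * w_in(v, u) * w_out(v, u) for v in in_links[u])
              for u in range(n)]
    return pr


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
pr = weighted_pagerank({0: [1, 2], 1: [2], 2: [0]}, 3)
```

Page 2 has two in-links while page 1 has one, so page 0 passes a larger share of its rank to page 2 (W_in = 2/3 versus 1/3), and page 2 also receives rank from page 1.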

Choices of Search Engines

• Many search engines compete for users.
– Their results are not necessarily the same.
– Different users prefer different search engines.
– Search results may, in the future, be biased towards paid advertisements.

GOOGLE: City University

1. City University London - the University for business and the ... ... The University for business and the professions. Contact Us | About City University | Maps & Directions, AZ Index | Site Map | Help. Prospective Students. ... Description: Official site with information about courses, research, schools, and departments. Includes details...

2. City University Home. A University which believes in forward-thinking business and leadership skills, exposing students to the latest technology in all courses of study. Description: A private, nonprofit institution founded to serve working adults wanting to pursue educational opportunit...

3. Welcome to Dublin City University... here... Dublin City University, Dublin 9, Ireland. Tel. +353 (0) 1 700 5000, Fax. +353 (0) 1 836 0830. Page updated: 02/09/03 legal. ... Description: Information on facilities, services, degree courses, research, the campus, student life, the library...

4. The City University of New York. Description: The University's Main Website.

5. City University of Hong Kong. AD Working Group lends staff, students an ear. Several issues still evoked strong emotions as staff members and students voiced their ... Description: Formerly the City Polytechnic of Hong Kong. Includes information on university, links to learning...

6. Welcome to Oklahoma City University... At Oklahoma City University, our students come first. Our ... Oklahoma City University offers a quality, values-centered education. United ... Description: Admissions, academic programs and alumni relations, sports, services, news, calendar of events and...

YAHOO

1. City University London - the University for business and the ... ... The University for business and the professions. Contact Us | About City University | Maps & Directions, AZ Index | Site Map | Help. Prospective Students. ...

2. City University A University which believes in forward-thinking business and leadership skills, exposing students to the latest technology in all courses of study. www.cityu.edu/ - 3k - Cached - More pages from this site

3. Dublin City University ... here... Dublin City University, Dublin 9, Ireland. Tel. +353 (0) 1 700 5000, Fax. +353 (0) 1 836 0830. Page updated: 02/09/03 legal. ... www.dcu.ie/ - 6k - Cached - More pages from this site

4. City University of New York The University's Main Website. www.cuny.edu/ - More pages from this site

5. City University of Hong Kong AD Working Group lends staff, students an ear. Several issues still evoked strong emotions as staff members and students voiced their ... www.cityu.edu.hk/ - 26k - Cached - More pages from this site

6. Oklahoma City University ... At Oklahoma City University, our students come first. Our ... Oklahoma City University offers a quality, values-centered education. United ... www.okcu.edu/ - 20k - Cached - More pages from this site

AllTheWeb

1. Apartment Listings in University City (sponsored). St. Louis, Missouri area apartment listings at Apartments.com. Free nationwide apartment search with visual rental listings online. http://www.apartments.com

2. City University - Washington (sponsored). Contact information and resources such as yellow page information, phone number, address, maps and directions as provided by QwestDex. http://service.bfast.com

3. City University Apartments - Rent.com (sponsored). Rent.com has millions of free apartment listings nationwide. Get $100 when you sign a lease near your school - it's easy. http://www.rent.com

4. City University London - the University for business and the professions... Contact Us | About City University | Maps & Directions A-Z Index | Site Map | Help ... Description: Official site with information about courses, research, schools, and departments. Includes details of news and events. More hits from: http://www.city.ac.uk/ - 13 KB

5. Flash Upgrade. Description: A University which believes in forward-thinking business and leadership skills, exposing students to the latest technology in all courses of study. http://www.cityu.edu/ - 27 KB

6. Welcome to Dublin City University... KnowledgeWorks, wins DCU Mallin-invent award Full text you can go anywhere in the world from here... Dublin City University, Dublin 9, Ireland. Tel. +353 (0) 1 700 5000, Fax. +353 (0) 1 836 0830. Page updated: 02/09/03 legal search ... Description: Information on facilities, services, degree courses, research, the campus, student life, the library and DCU news.

MSN Search

1. City University of New York Details the CUNY system's campuses and admissions requirements. Eye photos of notable graduates such as actor Judd Hirsch. www.cuny.edu

2. City University Offers prospectus, admissions, academic calendar, courses and degrees, and news. www.cityuniversity.net

3. City University Private, nonprofit institution located in Belleview, Washington, offers higher education opportunities to working professionals and lifelong learners. www.cityu.edu

4. Valley City University Admissions Site covers financial aid and tuition and offers a virtual campus tour. Site also provides access to an online application. www.vcsu.nodak.edu/admissions

5. Oklahoma City University The Oklahoma City University site contains information about admissions, academic departments and student life. The site also provides library resources, faculty information and news. frodo.okcu.edu

6. Dublin City University, Dublin - TripAdvisor Browse reviews of the Dublin City University in Dublin, County Dublin, Ireland. Check consumer reviews, articles and opinions about the Dublin City University. www.tripadvisor.com/Hotel_Review-g186605-d209093-Reviews-Dublin_City_U...

AOL Search

1. City University London - the University for business and the ... - ... The University for business and the professions. Contact Us | About City University | Maps & Directions, AZ Index | Site Map | Help. Prospective Students. ... http://www.city.ac.uk/

2. City University Home - A University which believes in forward-thinking business and leadership skills, exposing students to the latest technology in all courses of study. http://www.cityu.edu/

3. Welcome to Dublin City University - ... here... Dublin City University, Dublin 9, Ireland. Tel. +353 (0) 1 700 5000, Fax. +353 (0) 1 836 0830. Page updated: 02/09/03 legal. ... http://www.dcu.ie/

4. http://www.cuny.edu/ - The University's Main Website. http://www.cuny.edu/

5. City University of Hong Kong - AD Working Group lends staff, students an ear. Several issues still evoked strong emotions as staff members and students voiced their ... http://www.cityu.edu.hk/

6. Welcome to Oklahoma City University - ... At Oklahoma City University, our students come first. Our ... Oklahoma City University offers a quality, values-centered education. United ... http://www.okcu.edu/

Ask Jeeves

1. City University of Hong Kong Formerly the City Polytechnic of Hong Kong. Includes information on university, links to learning resources, research and departmental information... From: www.cityu.edu.hk/

2. City University Text Version. The University for business and the professions ... Contact Us | About City University | Maps & Directions A-Z Index | Site Map | Help... From: www.city.ac.uk/

3. New Jersey City University Official site includes graduate and undergraduate information, news, events, online learning, calendars, and athletics. From: www.njcu.edu/

4. Elizabeth City State University Admission Application Introduction 2004-2005 APPLICATION FOR ADMISSION. Welcome to the Elizabeth City State University Online Application. New Freshmen... From: www.ncmentor.org/applications/unc/apply/elizabeth_city_state_univ...

5. City University A University which believes in forward-thinking business and leadership skills, exposing students to the latest technology in all courses of study. From: www.cityu.edu/

6. The City College of New York City College of New York (CUNY) A senior college of The City University of New York. Located in Manhattan. 138th Street at Convent Avenue. A Harlem masterpiece in the neo-Gothic... From: www.ccny.cuny.edu/

HotBot

1. City University of New York. Details the CUNY system's campuses and admissions requirements. Eye photos of notable graduates such as actor Judd Hirsch. www.cuny.edu/ - August 10, 2003 - 25 KB

2. City University. Private, nonprofit institution located in Belleview, Washington, offers higher education opportunities to working professionals and lifelong learners. www.cityu.edu/ - August 23, 2003 - 27 KB

3. New Jersey City University. Explore the course requirements for this liberal arts institution. Link to admissions and financial aid information. ... New Jersey City University. 2039 Kennedy Boulevard Jersey City, New Jersey 07305-1597 ... www.njcu.edu/ - November 1, 2003 - 20 KB

4. City University London - the University for business and the...... Contact Us | About City University | Maps & Directions. A-Z Index | Site Map | Help ... www.city.ac.uk/ - September 28, 2003 - 14 KB

5. Oklahoma City University. Take a tour of the campus, and explore a roster of academic programs. ... At Oklahoma City University, our students come first. ... www.okcu.edu/ - October 26, 2003 - 20 KB

6. University of Missouri, Kansas City. University based in Kansas City, Missouri, presents and overview of its academic programs, and offers campus news. ... UNIVERSITY OF MISSOURI-KANSAS CITY. Bulletin. Winter 2004 Fee Update, more... ... www.umkc.edu/ - October 23, 2003 - 16 KB

Lycos

1. City University London - the University for business and the...… Contact Us | About City University | Maps & Directions A-Z Index | Site Map | Help … More results from: www.city.ac.uk - September 16, 2003 - 14 KB

2. Flash Upgrade. A University which believes in forward-thinking business and leadership skills, exposing students to the latest technology in all courses of study. www.cityu.edu - November 1, 2003 - 27 KB

3. The City University of New York. The University's Main Website. www.cuny.edu - December 31, 1969 - 121 B

4. Welcome to Dublin City University… KnowledgeWorks, wins DCU Mallin-invent award Full text you can go anywhere in the world from here... Dublin City University, Dublin 9, Ireland. Tel. +353 (0) 1 700 5000, Fax. +353 (0) 1 836... More results from: www.dcu.ie - October 29, 2003 - 5 KB

5. City University of Hong Kong. Formerly the City Polytechnic of Hong Kong. Includes information on university, links to learning resources, research and departmental information and student information. More results from: www.cityu.edu.hk - October 29, 2003 - 26 KB

6. HCU HomePage -English-… contact us For any suggestion and requests to this web site, [email protected] Feel free to link this web site. Hiroshima City University More results from: www.hiroshima-cu.ac.jp - May 11, 2003 - 26 KB

Teoma

1. City University of Hong Kong Formerly the City Polytechnic of Hong Kong. Includes information on university, links to learning resources, research and departmental information... www.cityu.edu.hk/ [Related Pages] [More Results from www.cityu.edu.hk]

2. City University Text Version. The University for business and the professions ... Contact Us | About City University | Maps & Directions A-Z Index | Site Map | Help... www.city.ac.uk/ [More Results from www.city.ac.uk]

3. New Jersey City University Official site includes graduate and undergraduate information, news, events, online learning, calendars, and athletics. www.njcu.edu/ [Related Pages] [More Results from www.njcu.edu]

4. Elizabeth City State University Admission Application Introduction 2004-2005 APPLICATION FOR ADMISSION. Welcome to the Elizabeth City State University Online Application. New Freshmen... www.ncmentor.org/applications/unc/apply/el...

5. City University A University which believes in forward-thinking business and leadership skills, exposing students to the latest technology in all courses of study. www.cityu.edu/

6. The City College of New York City College of New York (CUNY) A senior college of The City University of New York. Located in Manhattan. 138th Street at Convent Avenue. A Harlem masterpiece in the neo-Gothic... www.ccny.cuny.edu/ [More Results from www.ccny.cuny.edu]

MetaSearch Engine

• Metasearch engines are designed to increase coverage of the web by forwarding users’ queries to multiple search engines.
– Users’ requests are sent to multiple search engines such as AllTheWeb, Google, and MSN.

• The results from the individual search engines are then combined into a single result set that is presented to the user.
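The slides do not say how the individual result lists are combined. One of the simplest possible strategies, sketched here purely as an illustration, is round-robin interleaving with duplicate removal: take the first result from each engine, then the second from each, and so on, skipping URLs already taken. The function name and the two result orderings are hypothetical; the URLs are reused from the example pages above.

```python
def interleave_results(result_lists):
    """Merge ranked result lists round-robin, keeping only the first occurrence of each URL."""
    merged, seen = [], set()
    depth = max((len(r) for r in result_lists), default=0)
    for rank in range(depth):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged


# Hypothetical top results from two engines.
engine_a = ["www.city.ac.uk", "www.cityu.edu", "www.dcu.ie"]
engine_b = ["www.cuny.edu", "www.city.ac.uk", "www.cityu.edu"]
merged = interleave_results([engine_a, engine_b])
```

This favors each engine's top results equally and ignores any scores the engines report; rank- or score-based fusion would be needed to weigh engines differently.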

Longest common subsequence

• Definition 1: Given a sequence $X = x_1 x_2 \ldots x_m$, another sequence $Z = z_1 z_2 \ldots z_k$ is a subsequence of X if there exists a strictly increasing sequence $i_1 < i_2 < \cdots < i_k$ of indices of X such that $x_{i_j} = z_j$ for all $j = 1, 2, \ldots, k$.

• Example 1: If X = abcdefg, then Z = abdg is a subsequence of X (take the indices 1, 2, 4, 7).
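Checking the definition does not require trying every index sequence: a greedy left-to-right scan that matches each symbol of Z at the earliest remaining position of X suffices. The sketch below (hypothetical function names; sequences represented as Python strings) returns either a yes/no answer or the 1-based indices i_1 < i_2 < ... < i_k.

```python
def is_subsequence(z, x):
    """True if z can be obtained from x by deleting zero or more symbols."""
    remaining = iter(x)
    # 'symbol in remaining' advances the iterator past the match, so the
    # matched positions are strictly increasing.
    return all(symbol in remaining for symbol in z)


def subsequence_indices(z, x):
    """1-based indices i_1 < ... < i_k with x_{i_j} = z_j, or None if z is not a subsequence."""
    indices, pos = [], 0
    for symbol in z:
        pos = x.find(symbol, pos)   # earliest match at or after pos
        if pos == -1:
            return None
        indices.append(pos + 1)
        pos += 1
    return indices
```

For Example 1, subsequence_indices("abdg", "abcdefg") gives [1, 2, 4, 7]; a common subsequence (Definition 2 below) is simply one for which is_subsequence holds against both sequences.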

• Definition 2: Given two sequences X and Y, a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y.

• Example 2: X = abcdefg and Y = aaadgfd. Z = adf is a common subsequence of X and Y (it uses positions 1, 4, 6 of X and positions 1, 4, 6 of Y).

• Definition 3: A longest common subsequence (LCS) of X and Y is a common subsequence of X and Y with the greatest length. (The length of a sequence is the number of letters in the sequence.)

• A longest common subsequence may not be unique.

• Example: for X = abcd and Y = acbd, both acd and abd are LCSs.

Longest common subsequence problem

• Input: Two sequences $X = x_1 x_2 \ldots x_m$ and $Y = y_1 y_2 \ldots y_n$.

• Output: A longest common subsequence of X and Y.

• A brute-force approach

Suppose that $m \le n$. Try all subsequences of X (there are $2^m$ of them), test whether each one is also a subsequence of Y, and select the longest. Each test takes O(n) time, so this approach takes $O(n \cdot 2^m)$ time, which is exponential in m.
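A direct transcription of the brute-force idea, useful as a reference for checking faster methods on small inputs (function names hypothetical; sequences as strings). Enumerating index sets with itertools.combinations from largest to smallest means the first common subsequence found is a longest one.

```python
from itertools import combinations


def is_subsequence(z, y):
    """Greedy check that z is a subsequence of y."""
    remaining = iter(y)
    return all(symbol in remaining for symbol in z)


def lcs_brute_force(x, y):
    """Try subsequences of the shorter sequence from longest to shortest; exponential time."""
    if len(x) > len(y):
        x, y = y, x                                 # ensure m <= n
    for k in range(len(x), -1, -1):                 # longest candidates first
        for idx in combinations(range(len(x)), k):  # index sets i_1 < ... < i_k
            z = "".join(x[i] for i in idx)
            if is_subsequence(z, y):
                return z
    return ""
```

The loop always terminates by k = 0 at the latest, because the empty sequence is a subsequence of every sequence.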

Characterizing a longest common subsequence

• Theorem (Optimal substructure of an LCS): Let $X = x_1 x_2 \ldots x_m$ and $Y = y_1 y_2 \ldots y_n$ be two sequences, and let $Z = z_1 z_2 \ldots z_k$ be any LCS of X and Y.
1. If $x_m = y_n$, then $z_k = x_m = y_n$ and $Z[1..k-1]$ is an LCS of $X[1..m-1]$ and $Y[1..n-1]$.
2. If $x_m \ne y_n$, then $z_k \ne x_m$ implies that Z is an LCS of $X[1..m-1]$ and Y.
3. If $x_m \ne y_n$, then $z_k \ne y_n$ implies that Z is an LCS of X and $Y[1..n-1]$.

The recursive equation

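• Let c[i,j] be the length of an LCS of X[1..i] and Y[1..j].

• c[i,j] can be computed as follows:

$$c[i,j] = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0,\\ c[i-1,j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j,\\ \max\{c[i,j-1],\ c[i-1,j]\} & \text{if } i, j > 0 \text{ and } x_i \ne y_j. \end{cases}$$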
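The recurrence can be evaluated exactly as written by a memoized recursive function; caching ensures each c[i, j] is computed once. This is a sketch with hypothetical names: recursion depth grows to about m + n, so it suits short sequences only.

```python
from functools import lru_cache


def lcs_length(x, y):
    """Length of an LCS of x and y, evaluating the recurrence for c[i, j] top-down."""

    @lru_cache(maxsize=None)
    def c(i, j):
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:        # x_i == y_j (Python strings are 0-indexed)
            return c(i - 1, j - 1) + 1
        return max(c(i, j - 1), c(i - 1, j))

    return c(len(x), len(y))
```

For example, lcs_length("abcd", "acbd") is 3, matching the two LCSs acd and abd.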

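Computing the length of an LCS
• There are only about nm values c[i,j], so we can compute all of them, in an order (for example, row by row with i and j increasing) that guarantees c[i-1,j-1], c[i-1,j] and c[i,j-1] are known before c[i,j] is computed.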

The algorithm to compute an LCS

for i = 1 to m do
    c[i,0] = 0;
for j = 0 to n do
    c[0,j] = 0;
for i = 1 to m do
    for j = 1 to n do
    {
        if x[i] == y[j] then
            c[i,j] = c[i-1,j-1] + 1;
            b[i,j] = 1;        // diagonal: x[i] is part of the LCS
        else if c[i-1,j] >= c[i,j-1] then
            c[i,j] = c[i-1,j];
            b[i,j] = 2;        // up
        else
            c[i,j] = c[i,j-1];
            b[i,j] = 3;        // left
    }
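A runnable Python version of the table-filling pseudocode (the function name lcs_table is hypothetical). Python strings are 0-indexed, so x_i is x[i - 1]; the tables keep the 1-based layout, with row 0 and column 0 holding the base cases.

```python
def lcs_table(x, y):
    """Fill c (LCS lengths) and b (direction codes) bottom-up, row by row."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay 0
    b = [[0] * (n + 1) for _ in range(m + 1)]  # 1 = diagonal, 2 = up, 3 = left
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = 1
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = 2
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = 3
    return c, b


c, b = lcs_table("BDCABA", "ABCBDAB")
```

The LCS length is c[m][n]; for Example 3 (X = BDCABA, Y = ABCBDAB) that entry is 4. Filling both tables takes O(mn) time and space.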

Example 3: X=BDCABA and Y=ABCBDAB.

Constructing an LCS (back-tracking)

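• We can find an LCS using the b[i,j]'s. We start at b[m,n] and trace back until we reach a cell in row 0 or column 0.
• The algorithm to construct an LCS:

1. i = m;
2. j = n;
3. If i == 0 or j == 0, then stop;
4. If b[i,j] == 1, then { print x_i; i = i-1; j = j-1; }
5. Else if b[i,j] == 2, then i = i-1;
6. Else (b[i,j] == 3), j = j-1;
7. Go to Step 3.

The symbols are printed in reverse order, so reversing the printed sequence gives the LCS.

• Time complexity: O(nm) to fill in the tables, plus O(m + n) for the back-tracking.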
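The back-tracking procedure can be sketched in Python as follows (hypothetical function name lcs; the table-filling step is repeated so the block runs on its own). Collecting symbols in a list and reversing it once at the end avoids producing the LCS backwards.

```python
def lcs(x, y):
    """Return one LCS of x and y: fill c and b bottom-up, then trace back from b[m][n]."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j], b[i][j] = c[i - 1][j - 1] + 1, 1  # diagonal
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j], b[i][j] = c[i - 1][j], 2          # up
            else:
                c[i][j], b[i][j] = c[i][j - 1], 3          # left

    out = []
    i, j = m, n
    while i > 0 and j > 0:            # stop on reaching row 0 or column 0
        if b[i][j] == 1:
            out.append(x[i - 1])      # x_i belongs to the LCS
            i, j = i - 1, j - 1
        elif b[i][j] == 2:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))     # symbols were collected from last to first
```

Because ties go "up" (the >= test), the LCS returned is one particular choice among possibly several; for X = abcd and Y = acbd either abd or acd would be a valid answer.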

Shortest common supersequence

• Definition: Let X and Y be two sequences. A sequence Z is a common supersequence of X and Y if both X and Y are subsequences of Z.

• Shortest common supersequence problem:
Input: Two sequences X and Y.
Output: A shortest common supersequence of X and Y.

• Example: X = abc and Y = abb. Both abbc and abcb are shortest common supersequences of X and Y.

Recursive Equation:

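• Let c[i,j] be the length of a shortest common supersequence of X[1..i] and Y[1..j].

• c[i,j] can be computed as follows:

$$c[i,j] = \begin{cases} j & \text{if } i = 0,\\ i & \text{if } j = 0,\\ c[i-1,j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j,\\ \min\{c[i,j-1] + 1,\ c[i-1,j] + 1\} & \text{if } i, j > 0 \text{ and } x_i \ne y_j. \end{cases}$$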
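A sketch that fills the table from this recurrence and then traces back to build one shortest common supersequence (the function name scs is hypothetical). When x_i = y_j the shared symbol is emitted once; otherwise the symbol from whichever neighbor gave the minimum is emitted, and any leftover prefix of X or Y is copied at the end. A useful sanity check is that the SCS length equals m + n minus the LCS length.

```python
def scs(x, y):
    """Return one shortest common supersequence of x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        c[i][0] = i                  # only X[1..i] remains
    for j in range(n + 1):
        c[0][j] = j                  # only Y[1..j] remains
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = min(c[i][j - 1], c[i - 1][j]) + 1

    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])     # shared symbol, emitted once
            i, j = i - 1, j - 1
        elif c[i - 1][j] <= c[i][j - 1]:
            out.append(x[i - 1])     # c[i][j] came from c[i-1][j] + 1
            i -= 1
        else:
            out.append(y[j - 1])     # c[i][j] came from c[i][j-1] + 1
            j -= 1
    while i > 0:
        out.append(x[i - 1])
        i -= 1
    while j > 0:
        out.append(y[j - 1])
        j -= 1
    return "".join(reversed(out))
```

For the example above, scs("abc", "abb") returns one of the two answers listed (abbc or abcb); for Example 3 the length is 6 + 7 − 4 = 9.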