Optimizing Web Search Using Social Annotations

Shenghua Bao, Xiaoyuan Wu, Guirong Xue, Yong Yu (Shanghai JiaoTong University); Ben Fei, Zhong Su (IBM China Research Lab). WWW 2007.



TRANSCRIPT

Page 1: Optimizing Web Search Using Social Annotations

Optimizing Web Search Using Social Annotations

Shenghua Bao, Xiaoyuan Wu, Guirong Xue, Yong Yu
Shanghai JiaoTong University

Ben Fei, Zhong Su
IBM China Research Lab

WWW 2007

Page 2: Optimizing Web Search Using Social Annotations

Introduction (1/3)

Two general aspects of improving web search
– Ordering the web pages according to the query-document similarity
  Ex: anchor text generation, search log mining, etc.
– Ordering the web pages according to their qualities (static ranking)
  Ex: PageRank, HITS, etc.

Page 3: Optimizing Web Search Using Social Annotations

Introduction (2/3)

Social annotation service (= social bookmarking)
– Developed for web users to organize and share their favorite web pages online by social annotations
– Emergent useful information that has been explored for folksonomy, visualization, semantic web, etc.
– Delicious

Page 4: Optimizing Web Search Using Social Annotations

Introduction (3/3)

Utilizing social annotations for better web search from the two aspects:
– Similarity ranking
  The annotations provided by web users are usually good summaries (new metadata) of the corresponding web pages
  The annotation data may be sparse and incomplete
  SocialSimRank (SSR) algorithm
– Static ranking
  The number of annotations assigned to a page indicates its popularity and implies its quality in some sense
  Different annotations may have different weights in indicating the popularity of web pages
  SocialPageRank (SPR) algorithm

Page 5: Optimizing Web Search Using Social Annotations

Search with Social Annotation

Web page annotators provide cleaner data for users' browsing

Similar or closely related annotations are usually given to the same web pages

Page 6: Optimizing Web Search Using Social Annotations

Similarity Ranking between the Query and Social Annotations

Term-Matching based similarity ranking
– suffers from the synonymy problem

q = {q_1, q_2, …, q_n}, A(p) = {a_1, a_2, …, a_m}

$$\mathrm{sim}_{TM}(q, p) = \frac{|q \cap A(p)|}{|A(p)|}$$

Social Similarity Ranking (SSR)

Observation 1: Similar (semantically-related) annotations are usually assigned to similar (semantically-related) web pages by users with common interests. In the social annotation environment, the similarity among annotations in various forms can further be identified by the common web pages they annotated.
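The term-matching score above is simple enough to sketch directly. The following is a minimal illustration, assuming the query and a page's annotations are both given as collections of strings; the function name `sim_tm` and the choice to return 0 for a page without annotations are assumptions made here, not part of the slides.

```python
def sim_tm(query_terms, page_annotations):
    """Term-matching similarity: |q ∩ A(p)| / |A(p)|."""
    q = set(query_terms)
    a = set(page_annotations)
    if not a:
        return 0.0  # assumption: a page without annotations scores 0
    return len(q & a) / len(a)


annotations_c = {"gnome", "linux", "ubuntu"}

# An exact term overlap is rewarded ...
score_match = sim_tm(["linux"], annotations_c)

# ... but a related term that never appears literally gets nothing,
# which is the synonymy problem named on the slide.
score_related = sim_tm(["debian"], annotations_c)
```

The second call shows why pure term matching is limited: a query term that is only semantically related to the annotations contributes nothing to the score.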

Page 7: Optimizing Web Search Using Social Annotations

Illustration of SocialSimRank

A(a) = {ubuntu}, A(b) = {linux, ubuntu}, A(c) = {gnome, linux, ubuntu}

P(ubuntu) = {a, b, c}, P(linux) = {b, c}, P(gnome) = {c}

M_AP(ubuntu, a) = 1, M_AP(linux, b) = 1, M_AP(gnome, c) = 2
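The transcript does not reproduce the SSR update rule itself, so the sketch below should be read only as a plain SimRank-style iteration on the annotation-page graph of this example, in the spirit of Observation 1. It ignores the counts in M_AP, and the decay factor 0.7, the iteration count, and all function names are assumptions chosen for illustration.

```python
from itertools import product

# Annotation sets from the slide: page -> annotations.
A = {"a": {"ubuntu"}, "b": {"linux", "ubuntu"}, "c": {"gnome", "linux", "ubuntu"}}

# Inverse mapping from the slide: annotation -> pages.
P = {"ubuntu": {"a", "b", "c"}, "linux": {"b", "c"}, "gnome": {"c"}}


def _update(sim_other, neighbours, decay):
    """One SimRank step: average similarity of the neighbours on the other side."""
    updated = {}
    for x, y in product(neighbours, repeat=2):
        if x == y:
            updated[(x, y)] = 1.0
            continue
        total = sum(sim_other[(u, v)] for u in neighbours[x] for v in neighbours[y])
        updated[(x, y)] = decay * total / (len(neighbours[x]) * len(neighbours[y]))
    return updated


def simrank_bipartite(A, P, decay=0.7, iterations=10):
    """Return (annotation similarities, page similarities) as pair-keyed dicts."""
    sim_tags = {(x, y): float(x == y) for x, y in product(P, repeat=2)}
    sim_pages = {(x, y): float(x == y) for x, y in product(A, repeat=2)}
    for _ in range(iterations):
        new_tags = _update(sim_pages, P, decay)   # tags are similar if their pages are
        new_pages = _update(sim_tags, A, decay)   # pages are similar if their tags are
        sim_tags, sim_pages = new_tags, new_pages
    return sim_tags, sim_pages


tag_sim, page_sim = simrank_bipartite(A, P)
```

Even on three pages, "linux" and "ubuntu" end up with a non-zero similarity although they are different strings, because they annotate overlapping pages.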

Page 8: Optimizing Web Search Using Social Annotations

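$$\mathrm{sim}_{SSR}(q, p) = \sum_{i=1}^{n} \sum_{j=1}^{m} S_A\bigl(q_i, A_j(p)\bigr)$$

where A_j(p) = a_j is the j-th annotation of page p and S_A is the SocialSimRank similarity between annotations.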
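As a rough sketch of how this double sum could be evaluated, the snippet below adds up an annotation-to-annotation similarity over every (query term, annotation) pair. The similarity values in `toy_sim` are made up for illustration, and treating identical terms as similarity 1 and unknown pairs as 0 are assumptions of this sketch.

```python
def sim_ssr(query_terms, page_annotations, annotation_sim):
    """Sum S_A(q_i, a_j) over all query terms q_i and page annotations a_j."""
    total = 0.0
    for q_i in query_terms:
        for a_j in page_annotations:
            if q_i == a_j:
                total += 1.0  # assumption: a term is fully similar to itself
            else:
                # assumption: pairs missing from the table have similarity 0
                total += annotation_sim.get(frozenset((q_i, a_j)), 0.0)
    return total


# Hypothetical similarity values, not taken from the slides.
toy_sim = {
    frozenset(("linux", "ubuntu")): 0.4,
    frozenset(("gnome", "linux")): 0.3,
}

score = sim_ssr(["linux"], ["gnome", "linux", "ubuntu"], toy_sim)
score_no_overlap = sim_ssr(["linux"], ["ubuntu"], toy_sim)
```

Unlike term matching, the second call still yields a positive score even though the query term does not literally occur among the page's annotations.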

Page 9: Optimizing Web Search Using Social Annotations

Page Quality Estimation Using Social Annotations

Observation 2: High quality web pages are usually popularly annotated. Popular web pages, up-to-date web users and hot social annotations usually have the following relations: 1) popular web pages are bookmarked by many up-to-date users and annotated by hot annotations; 2) up-to-date users like to bookmark popular pages and use hot annotations; 3) hot annotations are used to annotate popular web pages and used by up-to-date users.

Notations
M_PU: N_P × N_U association matrix between pages and users
M_UA: N_U × N_A association matrix between users and annotations
M_AP: N_A × N_P association matrix between annotations and pages
P_0: vector of randomly initialized SocialPageRank scores

Page 10: Optimizing Web Search Using Social Annotations

SocialPageRank Algorithm
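The algorithm body on this slide did not survive in the transcript. The sketch below is one plausible reading of the mutual reinforcement in Observation 2: page scores are propagated to users, then to annotations, then back to pages, and the same path is followed in reverse before the next round. The per-round normalization, the uniform start (the slides use a random P_0), the fixed iteration count, and the toy matrices are all assumptions made for illustration.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def social_page_rank(m_pu, m_ua, m_ap, iterations=50):
    """Iterate page scores through users and annotations and back again."""
    n_pages = len(m_pu)
    p = [1.0 / n_pages] * n_pages            # assumption: uniform instead of random start
    for _ in range(iterations):
        u = matvec(transpose(m_pu), p)       # pages -> users
        a = matvec(transpose(m_ua), u)       # users -> annotations
        p_mid = matvec(transpose(m_ap), a)   # annotations -> pages
        a_back = matvec(m_ap, p_mid)         # pages -> annotations
        u_back = matvec(m_ua, a_back)        # annotations -> users
        p = matvec(m_pu, u_back)             # users -> pages
        norm = sum(p)
        p = [x / norm for x in p]            # assumption: rescale so scores sum to 1
    return p


# Toy data (hypothetical): 3 pages, 2 users, 2 annotations.
M_PU = [[1, 1], [1, 0], [0, 1]]    # page 0 is bookmarked by both users
M_UA = [[2, 1], [1, 1]]            # how often each user used each annotation
M_AP = [[2, 1, 0], [1, 0, 1]]      # how often each annotation was put on each page

scores = social_page_rank(M_PU, M_UA, M_AP)
```

In this toy setup the page bookmarked by both users receives the highest score, which matches the intuition that popularly annotated pages rank higher.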

Page 11: Optimizing Web Search Using Social Annotations

Illustration of Quality Transition in the SPR Algorithm

Page 12: Optimizing Web Search Using Social Annotations

Dynamic Ranking with Social Information

Dynamic ranking method
– RankSVM

Features

Page 13: Optimizing Web Search Using Social Annotations

Experiment Data (1/2)

Delicious data
– 1,736,268 web pages and 269,566 annotations were crawled from Delicious during May 2006.
– Split compound annotations into standard words with the help of WordNet
  ex: java.programming → java, programming
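A minimal sketch of this splitting step is shown below. It does not call WordNet: the small `KNOWN_WORDS` set is a hypothetical stand-in for a dictionary lookup, and handling tags that are glued together without a delimiter (such as "webdesign") is an assumption about what "compound" covers, not something the slides state.

```python
import re

# Hypothetical stand-in for a WordNet lookup: a tiny set of known words.
KNOWN_WORDS = {"java", "programming", "web", "design", "open", "source"}


def segment(piece, vocabulary):
    """Split a glued tag into known words; fall back to the piece itself."""
    if piece in vocabulary:
        return [piece]
    for cut in range(len(piece) - 1, 0, -1):   # try the longest prefix first
        head, tail = piece[:cut], piece[cut:]
        if head in vocabulary:
            rest = segment(tail, vocabulary)
            if all(word in vocabulary for word in rest):
                return [head] + rest
    return [piece]


def split_annotation(tag, vocabulary=KNOWN_WORDS):
    """Split on punctuation first, then try to segment each remaining piece."""
    words = []
    for piece in re.split(r"[^a-z0-9]+", tag.lower()):
        if piece:
            words.extend(segment(piece, vocabulary))
    return words


example = split_annotation("java.programming")
```

Tags that cannot be decomposed into known words, such as "ubuntu" with this vocabulary, are kept unchanged.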

Page 14: Optimizing Web Search Using Social Annotations

Experiment Data (2/2)

Test set for dynamic ranking with social annotation
– Manual query set (MQ)
  50 queries and their corresponding ground truths in Delicious data, manually created by CS students
  Pooling: judge the top 100 documents returned by Lucene
– Automatic query set (AQ) from the Open Directory Project (ODP)
  Merging Delicious data with ODP and discarding ODP categories that contain no Delicious URLs
  Randomly sample 3000 ODP categories and extract the category paths as the query set and the corresponding web pages as the ground truth
  ex: extract path TOP/Computer/Software/Graphics as "Computer Software Graphics"
– 5-fold cross validation for each query set
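The path-to-query conversion in the AQ example can be sketched in a few lines. The function name is hypothetical, and replacing underscores with spaces is an assumption about how multi-word category names should be handled; the slides only show the single example reproduced here.

```python
def odp_path_to_query(path):
    """Turn an ODP category path into a keyword query string."""
    parts = [part for part in path.split("/") if part]
    if parts and parts[0].upper() == "TOP":
        parts = parts[1:]                      # drop the root node
    # assumption: underscores in category names stand for spaces
    return " ".join(part.replace("_", " ") for part in parts)


query = odp_path_to_query("TOP/Computer/Software/Graphics")
```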

Page 15: Optimizing Web Search Using Social Annotations

Evaluation of Annotation Similarities

Table. Explored similar annotations based on SocialSimRank

Page 16: Optimizing Web Search Using Social Annotations

PageRank vs. Average Count

Page 17: Optimizing Web Search Using Social Annotations

SPR vs. PageRank

SPR is normalized into a scale of 0-10 so that SPR and PageRank have the same number of pages in each grade from 0 to 10

Pages with the same PageRank value vary widely in their numbers of annotations and users

SPR successfully characterizes the web pages' popularity degrees among web annotators
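One way to obtain the grade alignment described above is rank-based matching: count how many pages hold each PageRank grade, then hand out the same number of grades to pages sorted by SPR score. The sketch below assumes both inputs are dictionaries over the same set of pages; the function name and the toy values are hypothetical.

```python
from collections import Counter


def match_grades(spr_scores, pagerank_grades):
    """Assign 0-10 grades to SPR scores with the same grade counts as PageRank."""
    counts = Counter(pagerank_grades.values())
    ranked = sorted(spr_scores, key=spr_scores.get)   # lowest SPR score first
    grades = {}
    position = 0
    for grade in range(0, 11):
        size = counts.get(grade, 0)
        for page in ranked[position:position + size]:
            grades[page] = grade
        position += size
    return grades


# Toy values (hypothetical).
pagerank_grades = {"p1": 0, "p2": 3, "p3": 3, "p4": 7}
spr_scores = {"p1": 0.4, "p2": 0.1, "p3": 0.3, "p4": 0.2}

spr_grades = match_grades(spr_scores, pagerank_grades)
```

After this step the two grade distributions are identical, so any difference between the rankings comes from which pages receive each grade rather than from the scale.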

Page 18: Optimizing Web Search Using Social Annotations

Results of Dynamic Ranking (1/2)

Table. Comparison of MAP between similarity features

Method              MQ50               AQ3000
Baseline (BM25)     0.4115             0.1091
Baseline+TM         0.4341             0.1128
Baseline+SSR        0.4697             0.1147
Baseline+PR         0.4141             0.1166
Baseline+SPR        0.4278             0.1225
Baseline+SSR,SPR    0.4724 (+14.80%)   0.1364 (+25.02%)
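The percentages in the last row appear to be relative MAP gains over the BM25 baseline. The short check below recomputes them from the table values under that assumption; the function name is chosen here for illustration.

```python
def relative_gain(score, baseline):
    """Relative improvement over the baseline, in percent."""
    return 100.0 * (score - baseline) / baseline


gain_mq = relative_gain(0.4724, 0.4115)   # Baseline+SSR,SPR vs. BM25 on MQ50
gain_aq = relative_gain(0.1364, 0.1091)   # Baseline+SSR,SPR vs. BM25 on AQ3000
```

Rounded to two decimals, both values agree with the figures reported in the table.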

Page 19: Optimizing Web Search Using Social Annotations

Results of Dynamic Ranking (2/2)

Figure. NDCG at K for comparison of baseline, baseline+TM, baseline+SSR, baseline+PR, and baseline+SPR on query set AQ
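For readers unfamiliar with the metric in the figure, a minimal NDCG at K sketch follows. It uses linear gains and a log2 rank discount, which is one common variant; the slides do not say which variant was used, and computing the ideal ordering from the judged list alone is a further simplifying assumption.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top-k results."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains, k):
    """DCG normalized by the DCG of the best possible ordering of the same gains."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


perfect = ndcg_at_k([1, 1, 0], 3)    # relevant results already on top
swapped = ndcg_at_k([0, 1], 2)       # the relevant result is ranked second
```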

Page 20: Optimizing Web Search Using Social Annotations

Discussions

There are still several problems to further address
– Annotation Coverage
  The user-submitted queries may not match any social annotation
  Many web pages may have no annotations: 1) newly emerging web pages; 2) key-page-associated web pages, since users tend to annotate key pages only; 3) uninteresting web pages.
– Annotation Ambiguity
  SSR may find terms similar to the query terms but fail to disambiguate terms that have more than one meaning
– Annotation Spamming
  As social annotation becomes more and more popular, the amount of spam could drastically increase in the near future

Page 21: Optimizing Web Search Using Social Annotations

Conclusion

The problem of integrating social annotations into web search is studied.

We observed that social annotations could benefit web search in both similarity ranking and static ranking.

The experimental results showed that SSR can successfully find the latent semantic relations among annotations and SPR can provide the static ranking from the web annotators' perspective.

In the future, we will optimize the proposed algorithms and explore more sophisticated social features.