IR, IE and QA over Social Media

Upload: may-lee

Post on 14-Jan-2016


TRANSCRIPT

Page 1

IR, IE and QA over Social Media

Social media (blogs, community QA, news aggregators)
  Complementary to “traditional” news sources (Rathergate)
  Grows faster than “traditional” web content, and the gap is widening
    Traditional/published: 4 GB/day; social media: 10 GB/day [from Andrew Tomkins/Yahoo!, “Future of Web Search”, May 2007]

Research challenges
  Low(er) quality
  Content more dynamic
  User interactions crucial: ratings, comments, and link structure are needed both to retrieve documents and to evaluate extracted information

Page 2

Finding High Quality Content for IE/QA

Goal: find high-quality content (accurate and well-presented)

Setting: Community QA (Yahoo! Answers)
  Classifying social media (e.g., cQA) is substantially different from document classification

Sources of information
  Content analysis
  Usage data (page views, etc.)
  Community ratings, link analysis

General framework for quality estimation in social media
  Graph-based model of contributor relationships, combined with content and usage analysis
  Can identify high-quality items with accuracy close to human agreement

E. Agichtein, C. Castillo, D. Donato, A. Gionis, G. Mishne, Finding High Quality Content in Social Media, in Proc. of WSDM 2008
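As a rough illustration of combining the three information sources listed above, the sketch below builds one feature record for a single answer. Everything in it is an assumption made for illustration: the asker-to-answerer edge list, the field names (`page_views`, `thumbs_up`, `thumbs_down`), the particular content features, and the use of plain PageRank as a stand-in for the graph-based contributor model. The slide does not specify the actual features or the classifier that consumes them.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def quality_features(answer, author_rank):
    """Combine content, usage, rating, and graph signals for one answer."""
    words = answer["text"].split()
    return {
        # Content analysis (hypothetical surface features)
        "num_words": len(words),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        # Usage data
        "page_views": answer["page_views"],
        # Community ratings
        "net_votes": answer["thumbs_up"] - answer["thumbs_down"],
        # Graph-based contributor score
        "author_rank": author_rank[answer["author"]],
    }


# Hypothetical graph: an edge (asker, answerer) means the answerer replied to the asker.
edges = [("alice", "bob"), ("carol", "bob"), ("bob", "dave"), ("alice", "dave")]
ranks = pagerank(edges)

answer = {
    "author": "bob",
    "text": "Restart the router, then renew the DHCP lease.",
    "page_views": 120,
    "thumbs_up": 7,
    "thumbs_down": 1,
}
features = quality_features(answer, ranks)
print(features)
```

A record like `features` could then be passed to any standard classifier trained on human quality labels; which classifier was used is not stated in the slide.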

Page 3

Finding Relevant Content for IE/QA

Goal: given a query, rank social content (cQA) by expected relevance and quality

Approach: learn ranking functions specifically for social media retrieval

Features
  Textual content: relevance, stylistics, language models
  User interactions: link structure, discussion threads
  User ratings: incorporate user-provided content ratings

Method: Gradient boosting (GBrank)
  Developed a new objective function for learning a ranking function from (noisy) preference data
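To make the idea of learning a ranking function from preference pairs concrete, here is a minimal sketch in the spirit of GBrank: pairs that the current model orders wrongly (or by less than a margin) are turned into regression targets, and a weak regressor is fitted to them in each round. This is not the new objective function mentioned above, and it simplifies the update to a plain additive residual step. The regression stump, the margin `tau`, the `shrinkage` value, and the two-element feature vectors `[relevance, net_votes]` are all assumptions made for illustration.

```python
def fit_stump(X, y):
    """Least-squares regression stump: one feature, one threshold, two leaf means."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [yi for x, yi in zip(X, y) if x[j] <= t]
            right = [yi for x, yi in zip(X, y) if x[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((yi - lm) ** 2 for yi in left) + sum((yi - rm) ** 2 for yi in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    if best is None:  # no valid split exists: fall back to a constant prediction
        mean = sum(y) / len(y)
        return lambda x: mean
    _, j, t, lm, rm = best
    return lambda x: lm if x[j] <= t else rm


def train_pairwise_boosting(pairs, rounds=20, tau=1.0, shrinkage=0.5):
    """pairs: list of (better, worse) feature vectors. Returns a scoring function."""
    stumps = []

    def score(x):
        return sum(shrinkage * stump(x) for stump in stumps)

    for _ in range(rounds):
        X, y = [], []
        for better, worse in pairs:
            s_better, s_worse = score(better), score(worse)
            if s_better < s_worse + tau:  # violated, or satisfied by less than tau
                # Residuals that push the preferred item up and the other one down.
                X += [better, worse]
                y += [s_worse + tau - s_better, s_better - tau - s_worse]
        if not X:  # every preference holds with margin tau
            break
        stumps.append(fit_stump(X, y))
    return score


# Hypothetical preference data: each pair is (preferred answer, less preferred answer),
# with features [textual relevance, net user votes].
pairs = [
    ([0.9, 5], [0.2, 9]),
    ([0.8, 3], [0.3, 1]),
    ([0.7, 2], [0.1, 8]),
]
score = train_pairwise_boosting(pairs)

candidates = {"a1": [0.85, 1], "a2": [0.15, 10]}
ranking = sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
print(ranking)
```

In this toy data the preferred answers always have higher textual relevance but not always more votes, so the learned scorer ends up relying on the relevance feature rather than on raw vote counts.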

Results
  Outperforms the Yahoo! default ranking and naïve ranking by user votes
  Can be made robust to ratings spam [same authors, to appear in AIRWeb 2008]

J. Bian, Y. Liu, E. Agichtein and H. Zha. Finding the Right Facts in the Crowd: Factoid Question Answering over Social Media, to appear in Proc. of WWW 2008