twinner : understanding news queries with geo-content using twitter

TWinner: Understanding News Queries with Geo-content using Twitter. Satyen Abrol, Latifur Khan, University of Texas at Dallas, Department of Computer Science. GIR ’10. 29 April, 2011, Sengyu Rim.

Page 1: TWinner : Understanding News Queries with Geo-content using Twitter

TWinner: Understanding News Queries with Geo-content using Twitter

Satyen Abrol, Latifur Khan
University of Texas at Dallas, Department of Computer Science

GIR ’10

29 April, 2011
Sengyu Rim

Page 2: TWinner : Understanding News Queries with Geo-content using Twitter

Outline

Introduction
Related Work
Twitter as News-wire
Determining News Intent
Assigning Weights to Tweets
Experiments and Results
Conclusion

Page 3: TWinner : Understanding News Queries with Geo-content using Twitter

Introduction

Motivations
– Users find news through search engines
– The results of common search engines often differ from what the user expected
    – Non-critical information
    – Unorganized content
– Search engines need to understand the intent of the user query

Page 4: TWinner : Understanding News Queries with Geo-content using Twitter

Introduction

Motivation
– E.g., what event in Korea attracted the most attention in 2002?
– A naive user searches the news with the keyword “korea” on 2002-06-18
– Candidate results for the query:
    – Map: Korea
    – Wiki: Korea
    – News: Korea vs. Italy, 2:1
    – Food: Kimchi

Page 5: TWinner : Understanding News Queries with Geo-content using Twitter

Introduction

Analyze the content of a popular social networking site, Twitter, to determine the intent of the user query
– Twitter provides popular news topics
– Twitter provides keywords that may enhance the user query

TWinner makes two novel contributions to the field of geographic information retrieval
– Identifying the intent of the user query
– Adding additional keywords to the query

Page 6: TWinner : Understanding News Queries with Geo-content using Twitter

Introduction

Figure: the architecture of the news intent system TWinner

Page 8: TWinner : Understanding News Queries with Geo-content using Twitter

Related Work

To identify and disambiguate the locations of users
– Natural Language Processing
– Data Mining

To establish the relationship between the location of the news and the news content
– A model using NLP techniques

Page 10: TWinner : Understanding News Queries with Geo-content using Twitter

Twitter as News-wire

Twitter
– Free social networking
– Micro-blogging service
– Medium for news updates

Page 12: TWinner : Understanding News Queries with Geo-content using Twitter

Determining News Intent

Identification of Location
– Geo-tags the query to a location with a certain confidence

Frequency-Population Ratio (FPR)
– In the absence of a news-making event, FPR remains constant irrespective of the location
– Used to assign a news intent confidence to the query
– FPR = (α + β) * N_t
    α: the population density factor
    β: the location type constant
    N_t: the number of tweets per minute at that instant
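A minimal Python sketch of the FPR formula on this slide may help make the idea concrete. The numeric values of α and β are hypothetical (the slides do not give any), and comparing an event-time FPR against a baseline FPR is an illustrative assumption about how a spike could be noticed, not the authors' stated procedure.

```python
def frequency_population_ratio(alpha: float, beta: float, tweets_per_minute: float) -> float:
    """FPR = (alpha + beta) * N_t, as written on the slide."""
    return (alpha + beta) * tweets_per_minute


# Hypothetical factors for a single location (not taken from the slides).
ALPHA = 0.5   # population density factor
BETA = 0.25   # location type constant

# FPR on an ordinary day versus during a burst of tweets about the location.
baseline = frequency_population_ratio(ALPHA, BETA, tweets_per_minute=8)
during_event = frequency_population_ratio(ALPHA, BETA, tweets_per_minute=40)

# Assumption: a ratio well above 1 hints at a news-making event.
spike_ratio = during_event / baseline
print(baseline, during_event, spike_ratio)
```

Because α and β are fixed per location, any change in FPR here comes only from the tweet rate N_t, which is why a stable FPR is read as "no news-making event".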

Page 13: TWinner : Understanding News Queries with Geo-content using Twitter

Determining News Intent

Experiments on determining the effect of geo-type and population density

Page 14: TWinner : Understanding News Queries with Geo-content using Twitter

Determining News Intent

The drawback of FPR
– Fails to take into account the geographical relatedness of features

Modified FPR
– FPR = Σ_i δ_i (α_i + β_i) * N_t
    δ_i: the factor by which each geo-location i is related to the primary search query
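The modified formula can be sketched in the same way. The `RelatedLocation` type and all numbers below are hypothetical and chosen only for illustration. The sketch also assumes that N_t is a single tweet rate shared by all terms of the sum, which is how the formula reads on the slide.

```python
from typing import NamedTuple, Sequence


class RelatedLocation(NamedTuple):
    """One geo-location related to the primary search query (hypothetical type)."""
    delta: float  # relatedness factor to the primary query
    alpha: float  # population density factor
    beta: float   # location type constant


def modified_fpr(locations: Sequence[RelatedLocation], tweets_per_minute: float) -> float:
    """Modified FPR = sum_i delta_i * (alpha_i + beta_i) * N_t."""
    weighted_factors = sum(loc.delta * (loc.alpha + loc.beta) for loc in locations)
    return weighted_factors * tweets_per_minute


# The queried location itself (delta = 1.0) plus one related nearby location.
locations = [
    RelatedLocation(delta=1.0, alpha=0.5, beta=0.25),
    RelatedLocation(delta=0.5, alpha=0.25, beta=0.25),
]
print(modified_fpr(locations, tweets_per_minute=8))
```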

Page 16: TWinner : Understanding News Queries with Geo-content using Twitter

Assigning Weights to Tweets

Detecting Spam Messages
– Spam messages carry little or no relevant information
– Nature of spam messages
– A formula tags, to a certain level of confidence, whether a message is spam or not
    N_p: the number of followers
    N_q: the number of people the user is following
    μ: an arbitrary constant
    N_r: the ratio of the number of tweets containing a reply to the total number of tweets

Page 17: TWinner : Understanding News Queries with Geo-content using Twitter

Assigning Weights to Tweets

On the basis of user location
– An experiment was conducted to understand the relation between Twitter messages and the location of the user

Page 18: TWinner : Understanding News Queries with Geo-content using Twitter

Assigning Weights to Tweets

Using Hyperlinks Mentioned in Tweets
– 30-50% of general Twitter messages contain a hyperlink to an external website
– For news-related Twitter messages, this percentage increases to 70-80%
– This pointer is also used to assign weights to tweets
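The slide does not say how the presence of a hyperlink changes a tweet's weight, so the following Python sketch is only one possible reading. The additive `link_bonus`, the function names, and the simple URL pattern are all assumptions made for illustration.

```python
import re

# Simplistic pattern for http/https links; real tweets may need a stricter parser.
URL_PATTERN = re.compile(r"https?://\S+")


def has_hyperlink(tweet_text: str) -> bool:
    """Return True if the tweet text contains at least one http(s) link."""
    return URL_PATTERN.search(tweet_text) is not None


def link_adjusted_weight(tweet_text: str, base_weight: float = 1.0,
                         link_bonus: float = 0.5) -> float:
    """Add a hypothetical bonus to the weight of tweets that carry a hyperlink."""
    if has_hyperlink(tweet_text):
        return base_weight + link_bonus
    return base_weight


print(link_adjusted_weight("Breaking: details at http://example.com/story"))
print(link_adjusted_weight("just had lunch"))
```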

Page 19: TWinner : Understanding News Queries with Geo-content using Twitter

Assigning Weights to Tweets

Semantic Similarity
– Summarize the Twitter messages into a couple of keywords
– The naïve approach picks k keywords, ignoring their semantic similarity
– The definition of the semantic similarity
    M: the total number of articles searched in the New York Times corpus
    f(x): the number of articles for term x
    f(y): the number of articles for term y
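The similarity formula itself is not present in this transcript. The symbols M, f(x) and f(y) match the terms of the Normalized Google Distance, so the Python sketch below assumes that form. That is an assumption rather than a confirmed reading of the slide. It also needs a joint count f(x, y), the number of articles containing both terms, which the slide does not list.

```python
import math


def normalized_distance(fx: int, fy: int, fxy: int, m: int) -> float:
    """Normalized Google Distance computed from article counts.

    fx, fy: number of articles containing term x / term y
    fxy:    number of articles containing both terms (assumed input)
    m:      total number of articles searched
    Lower values mean the two terms are more closely related.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # terms that never co-occur are treated as unrelated
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(m) - min(log_fx, log_fy))


# Hypothetical counts: two terms that always appear together, then a weaker pair.
print(normalized_distance(fx=100, fy=100, fxy=100, m=10_000))
print(normalized_distance(fx=100, fy=100, fxy=10, m=10_000))
```

A distance like this would still have to be mapped to a similarity score before being used as S_ij. The slide gives no such mapping, so none is assumed here.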

Page 20: TWinner : Understanding News Queries with Geo-content using Twitter

Assigning Weights to Tweets

Reassigns the weight of all keywords on the basis of the following formula
– W_i* = W_i + Σ_j S_ij * W_j
    W_i*: the new weight of keyword i
    W_i: the weight without semantic similarity
    S_ij: the semantic similarity derived from the semantic similarity formula
    W_j: the initial weight of the other words being considered
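The reweighting formula can be sketched in a few lines of Python. The dictionary layout (keyword to weight, unordered keyword pair to similarity) and the sample values are assumptions for illustration. Pairs missing from the similarity table are treated as having similarity 0.

```python
from typing import Dict, FrozenSet


def reweight(weights: Dict[str, float],
             similarity: Dict[FrozenSet[str], float]) -> Dict[str, float]:
    """Apply W_i* = W_i + sum over j != i of S_ij * W_j, using the initial weights W_j."""
    new_weights = {}
    for word_i, weight_i in weights.items():
        boost = sum(
            similarity.get(frozenset((word_i, word_j)), 0.0) * weight_j
            for word_j, weight_j in weights.items()
            if word_j != word_i
        )
        new_weights[word_i] = weight_i + boost
    return new_weights


# Hypothetical initial weights and one pairwise similarity.
weights = {"a": 1.0, "b": 2.0}
similarity = {frozenset(("a", "b")): 0.5}
print(reweight(weights, similarity))
```

Each keyword gains weight in proportion to how similar it is to the other keywords, so a cluster of related terms reinforces its members.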

Identifies k keywords that are semantically dissimilar but together contribute the maximum weight
– S_pq < S_threshold: the similarity between any two words p and q belonging to the set of k is less than a threshold
– W_1 + W_2 + W_3 + … + W_k is maximum over all groups satisfying the condition above
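The selection step can be illustrated with a brute-force search over all groups of k keywords. This is a sketch for small keyword sets only. The slides do not say how the search is actually performed, and the keywords, weights, similarities and threshold below are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Tuple


def select_keywords(weights: Dict[str, float],
                    similarity: Dict[FrozenSet[str], float],
                    k: int,
                    threshold: float) -> Optional[Tuple[str, ...]]:
    """Return the k-keyword group with maximum total weight whose pairwise
    similarities are all below the threshold, or None if no group qualifies."""
    best_group, best_total = None, float("-inf")
    for group in combinations(sorted(weights), k):
        dissimilar = all(
            similarity.get(frozenset(pair), 0.0) < threshold
            for pair in combinations(group, 2)
        )
        if not dissimilar:
            continue
        total = sum(weights[word] for word in group)
        if total > best_total:
            best_group, best_total = group, total
    return best_group


weights = {"earthquake": 3.0, "quake": 2.8, "relief": 2.0, "aid": 1.5}
similarity = {
    frozenset(("earthquake", "quake")): 0.9,
    frozenset(("relief", "aid")): 0.7,
}
print(select_keywords(weights, similarity, k=2, threshold=0.5))
```

In this example the two heaviest keywords are near-synonyms, so the search skips that pair and returns the heaviest pair that is still dissimilar.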

Page 22: TWinner : Understanding News Queries with Geo-content using Twitter

Experiments and Results

Experiments to test the validity of the hypothesis
– First: a naïve user is looking for the latest happenings related to the Fort Hood incident on 12th November 2009
– Second: a naïve user is looking for the latest happenings related to ‘Russia’ on 5th December 2009
– Third: a naïve user is looking for the latest happenings related to ‘Haiti’ on 18th January 2010

Page 23: TWinner : Understanding News Queries with Geo-content using Twitter

Experiments and Results

Results

Page 24: TWinner : Understanding News Queries with Geo-content using Twitter

Experiments and Results

Results show the contrast between the search results produced by the original query and those produced after adding the keywords obtained by TWinner

Page 26: TWinner : Understanding News Queries with Geo-content using Twitter

Conclusion

We present a system to predict a user’s news intent
– Takes the location mentioned and the time of the query into consideration
– Makes use of the social networking site Twitter to understand the relationship between geo-information and the news intent of the query

Future work
– Understanding the content of the social media message
– Sentiment conveyed by the messages
– Enhancing the accuracy of the system

Page 27: TWinner : Understanding News Queries with Geo-content using Twitter

Thank you!