TWinner: Understanding News Queries with Geo-content using Twitter
Satyen Abrol, Latifur Khan
University of Texas at Dallas, Department of Computer Science
GIR ’10
29 April, 2011
Sengyu Rim
Outline
Introduction
Related Work
Twitter as News-wire
Determining News Intent
Assigning Weights to Tweets
Experiments and Results
Conclusion
2/26
Introduction
Motivations
– Users find news through search engines
– The results of common search engines often differ from what the user expected: non-critical information, unorganized content
– Search engines need to understand the intent of the user query
3/26
Introduction
Motivation
E.g., what event in Korea attracted the most attention in 2002?
A naive user searches the news with the keyword “korea” on 2002-06-18
Map: Korea
Wiki: Korea
News: Korea vs. Italy, 2:1
Food: Kimchi
4/26
Introduction
Analyze the content of a popular social networking site, Twitter, to learn the intent of the user query
– Twitter provides popular news topics
– Twitter provides keywords that may enhance the user query
TWinner makes two novel contributions to the field of geographic information retrieval
– Identifying the intent of the user query
– Adding additional keywords to the query
5/26
Introduction
The architecture of the news intent system TWinner
6/26
Related Work
To identify and disambiguate the locations of users
– Natural Language Processing
– Data Mining
To establish the relationship between the location of the news and news content
– A model using NLP techniques
8/26
Twitter as News-wire
Twitter
– Free social networking
– Micro-blogging service
– Medium for news updates
10/26
Determining News Intent
Identification of Location
– Geo-tags the query to a location with a certain confidence
Frequency-Population Ratio (FPR)
– FPR always remains constant in the absence of a news-making event, irrespective of the location
– Used to assign a news intent confidence to the query
– FPR = (α + β) * Nt
α: the population density factor
β: the location type constant
Nt: the number of tweets per minute at that instant
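The FPR formula above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the `relative_deviation` helper, and the numeric values are assumptions chosen only to show how a jump in tweets per minute moves FPR away from its usual level.

```python
def fpr(alpha: float, beta: float, tweets_per_minute: float) -> float:
    """Frequency-Population Ratio as written on the slide: (alpha + beta) * Nt."""
    return (alpha + beta) * tweets_per_minute


def relative_deviation(current: float, baseline: float) -> float:
    """Hypothetical helper: relative change of the current FPR from a baseline FPR.

    The slide only states that FPR stays constant without a news-making event;
    using the relative change as a signal is an assumption made for this sketch.
    """
    if baseline <= 0:
        raise ValueError("baseline FPR must be positive")
    return (current - baseline) / baseline


# Illustrative values only (not taken from the paper).
baseline = fpr(alpha=0.5, beta=0.25, tweets_per_minute=8.0)   # quiet period
current = fpr(alpha=0.5, beta=0.25, tweets_per_minute=20.0)   # burst of tweets
deviation = relative_deviation(current, baseline)
```

A large positive `deviation` would suggest that the location is currently more newsworthy than usual; how that maps to a confidence value is left open here.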
12/26
Determining News Intent
Experiments on determining the effect of geo-type and population density
13/26
Determining News Intent
The drawback of FPR
– Fails to take into account the geographical relatedness of features
Modified FPR
– FPR = Σ δi (αi + βi) * Nt
δi: the factor by which geo-location i is related to the primary search query
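The modified FPR can be sketched the same way. The slide does not say whether Nt is inside or outside the sum; this sketch assumes a single Nt shared by all related locations, and the tuple layout and example numbers are illustrative assumptions.

```python
from typing import Iterable, Tuple


def modified_fpr(
    locations: Iterable[Tuple[float, float, float]],
    tweets_per_minute: float,
) -> float:
    """Modified FPR: sum over i of delta_i * (alpha_i + beta_i), times Nt.

    Each entry of `locations` is (delta_i, alpha_i, beta_i) for one geo-location
    related to the query. A single shared Nt is an assumption of this sketch.
    """
    weighted = sum(delta * (alpha + beta) for delta, alpha, beta in locations)
    return weighted * tweets_per_minute


# Illustrative values only: the queried location (delta = 1.0) and one
# related location that counts half as much (delta = 0.5).
related = [(1.0, 0.5, 0.25), (0.5, 0.25, 0.25)]
value = modified_fpr(related, tweets_per_minute=4.0)
```

With a single location and δ = 1, this reduces to the original FPR formula.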
14/26
Assigning Weights to Tweets
Detecting Spam Messages
– Spam messages carry little or no relevant information
– Nature of spam messages
– The formula that tags, to a certain level of confidence, whether the message is spam or not
Np: the number of followers
Nq: the number of people the user is following
μ: an arbitrary constant
Nr: the ratio of the number of tweets containing a reply to the total number of tweets
16/26
Assigning Weights to Tweets
On the basis of user location
– An experiment conducted to understand the relation between Twitter messages and the location of the user
17/26
Assigning Weights to Tweets
Using Hyperlinks Mentioned in Tweets
– 30-50% of general Twitter messages contain a hyperlink to an external website
– For news-related Twitter messages, this percentage increases to 70-80%
– We also make use of this pointer to assign weights to tweets
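One way to use the hyperlink signal is sketched below. The slides do not give the actual weighting rule, so the regular expression, the function names, and the `base` and `bonus` values are all hypothetical.

```python
import re

# Simple pattern for http/https links; real tweets may need a stricter parser.
URL_PATTERN = re.compile(r"https?://\S+")


def has_hyperlink(tweet: str) -> bool:
    """Return True if the tweet text contains an http(s) link."""
    return URL_PATTERN.search(tweet) is not None


def link_weight(tweet: str, base: float = 1.0, bonus: float = 0.5) -> float:
    """Hypothetical rule: add a fixed bonus to the weight of tweets with a link."""
    return base + bonus if has_hyperlink(tweet) else base


with_link = link_weight("Quake update http://example.com/report")
without_link = link_weight("just had lunch")
```

The fixed bonus only illustrates the idea that a link makes a tweet more likely to be news; any real scheme would need to be tuned on data.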
18/26
Assigning Weights to Tweets
Semantic Similarity
– Summarize the Twitter messages into a couple of keywords
– The naïve approach picks k keywords, ignoring the semantic similarity
– The definition of the semantic similarity
M: the total number of articles searched in the New York Times Corpus
f(x): the number of articles for term x
f(y): the number of articles for term y
19/26
Assigning Weights to Tweets
Reassigns the weight of all keywords on the basis of the following formula
– Wi* = Wi + Σ Sij * Wj
Wi*: the new weight of keyword i
Wi: the weight without semantic similarity
Sij: the semantic similarity derived from the semantic similarity formula
Wj: the initial weight of the other words being considered
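The reweighting formula can be sketched as below. The sketch assumes the sum runs over all other keywords j ≠ i, that similarity is symmetric, and that pairs missing from the table have similarity 0; the keywords and numbers are made up for illustration.

```python
from typing import Dict, Tuple


def reweight(
    weights: Dict[str, float],
    similarity: Dict[Tuple[str, str], float],
) -> Dict[str, float]:
    """Apply Wi* = Wi + sum over j != i of Sij * Wj, using the initial weights Wj."""

    def sim(a: str, b: str) -> float:
        # Symmetric lookup; unknown pairs count as similarity 0 (an assumption).
        return similarity.get((a, b), similarity.get((b, a), 0.0))

    return {
        i: w_i + sum(sim(i, j) * w_j for j, w_j in weights.items() if j != i)
        for i, w_i in weights.items()
    }


# Illustrative keywords and values only.
initial = {"haiti": 4.0, "earthquake": 2.0, "relief": 1.0}
sims = {("haiti", "earthquake"): 0.5, ("earthquake", "relief"): 0.25}
updated = reweight(initial, sims)
```

Keywords that are similar to many heavily weighted keywords gain the most, which is the effect the formula is meant to capture.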
Identifies k keywords that are semantically dissimilar but together contribute maximum weight
– Spq < Sthreshold: the similarity between any two words p and q belonging to the set of k is less than a threshold
– W1 + W2 + W3 + … + Wk is maximum over all groups satisfying the above condition
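The selection step can be sketched as an exhaustive search over all groups of k keywords. The slides do not say how the search is performed, so brute force is an assumption here, as are the function name and the example values; it is only practical for small keyword sets.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple


def select_keywords(
    weights: Dict[str, float],
    similarity: Dict[Tuple[str, str], float],
    k: int,
    threshold: float,
) -> Optional[Tuple[str, ...]]:
    """Return the k keywords with maximum total weight whose pairwise
    similarities are all below `threshold`, or None if no such group exists."""

    def sim(a: str, b: str) -> float:
        # Symmetric lookup; unknown pairs count as similarity 0 (an assumption).
        return similarity.get((a, b), similarity.get((b, a), 0.0))

    best: Optional[Tuple[str, ...]] = None
    best_total = float("-inf")
    for group in combinations(sorted(weights), k):
        # Keep only groups whose members are pairwise dissimilar.
        if all(sim(p, q) < threshold for p, q in combinations(group, 2)):
            total = sum(weights[word] for word in group)
            if total > best_total:
                best, best_total = group, total
    return best


# Illustrative keywords and values only.
kw_weights = {"haiti": 5.0, "earthquake": 4.25, "relief": 1.5}
kw_sims = {("haiti", "earthquake"): 0.5, ("earthquake", "relief"): 0.25}
chosen = select_keywords(kw_weights, kw_sims, k=2, threshold=0.4)
```

In this example the two heaviest keywords are too similar to appear together, so the search falls back to the heaviest pair that satisfies the threshold.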
20/26
Experiments and Results
Experiments to test the validity of the hypothesis
– First: a naïve user is looking for the latest happenings in the context of the Fort Hood incident on 12th November 2009
– Second: a naïve user is looking for the latest happenings in the context of ‘Russia’ on 5th December 2009
– Third: a naïve user is looking for the latest happenings in the context of ‘Haiti’ on 18th January 2010
22/26
Experiments and Results
Results
23/26
Experiments and Results
Result: shows the contrast between the search results produced by the original query and those produced after adding the keywords obtained by TWinner
24/26
Conclusion
We present a system to predict a user’s news intent
– Takes the location mentioned and the time of the query into consideration
– Makes use of the social networking site Twitter to understand the relationship between geo-information and the news intent of the query
Future work
– Understanding the content of the social media message
– Sentiment conveyed by the messages
– Enhancing the accuracy of the system
26/26
Thank you!