Location-Based Topic Evolution
Haiqin Yang, Shouyuan Chen, Michael R. Lyu, Irwin King
The Chinese University of Hong Kong
1
Outline
Motivation
Location-Based Topic Evolution Model
Experiments
Conclusion
2
New Mobile Technologies
Location information is attainable: IP, GPS, 3G, Wi-Fi, NFC
3
Geo-information
Twitter: typhoon trajectory estimation, earthquake location [Sakaki et al., WWW’10]
Flickr: geo-tagged photos [Crandall et al., WWW’09]
Geofolk [Sizov, WSDM’10]
4
New Applications: Timeliness
Identify users’ interests in a region
5
New Applications: Commercial Value
Determine appropriate marketing strategies
6
Solution: Topic Learning
Topics: distributions over words
Location-associated documents: geo-information attached to messages, posts, and tags
Geo-information helps to learn the topics more accurately
7
Current Problems
Do not consider the appearance and disappearance of topics
Do not model topic evolution
Have to determine the number of topics
Existing models: Location-aware Topic Model [Wang et al., GIR’07], Geofolk [Sizov, WSDM’10], Geographical topic discovery [Yin et al., WWW’11]
8
Our Contributions
Propose a location-based topic evolution (LBTE) model that
models changes in users’ topics of interest within a region
allows topics to appear and disappear
automatically determines the number of topics
supports efficient inference
9
Problem Setup
Vocabulary:
Data:
Objective: model the topics of the data when both the number of topics and the model parameters are unknown.
10
Assumptions
Documents are generated from unknown topics
Each topic comes from a hidden function and is determined by the function’s value
Functions are drawn from a probability measure
11
Evolution with Regions
The domain of each function includes regions
The values of the functions represent topics
12
Evolution with Regions and Time
The beginning (end) of a function’s domain corresponds to the appearance (disappearance) of a topic
13
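The region-and-time picture above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors’ code: each hidden function’s domain is assumed to be a set of regions plus a half-open time interval [start, end), and the names HiddenFunction and active_topics are hypothetical. A topic can generate documents at a region and time stamp only while its function’s domain covers them, so it appears at start and disappears at end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HiddenFunction:
    """Hypothetical representation of one hidden function carrying a topic."""
    topic_id: int
    regions: frozenset  # regions covered by the function's domain
    start: int          # first time stamp in the domain: the topic appears
    end: int            # first time stamp after the domain: the topic has disappeared

    def covers(self, region, t):
        return region in self.regions and self.start <= t < self.end


def active_topics(functions, region, t):
    """Topics available to documents observed at `region` and time stamp `t`."""
    return sorted(f.topic_id for f in functions if f.covers(region, t))


functions = [
    HiddenFunction(0, frozenset({"A", "B"}), start=0, end=3),
    HiddenFunction(1, frozenset({"B"}), start=1, end=5),
    HiddenFunction(2, frozenset({"A"}), start=4, end=6),
]

for t in range(6):
    print(t, active_topics(functions, "A", t), active_topics(functions, "B", t))
```

In this toy setting region A has no live topic at time stamp 3: topic 0 has disappeared and topic 2 has not yet appeared.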
Generative Process
The generative process of the observations:
1. Generate functions from a Dirichlet process (DP)
2. Characterize topics
3. Generate documents w
where G_l ~ DP is a Dirichlet process at location l, and H is a parameterized probability distribution.
[Equations not recoverable from the transcript]
14
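One way to make the three steps concrete is the finite-truncation sketch below. It is an assumption, not the slides’ exact construction: it borrows the idea behind spatial normalized gamma processes, giving every global atom (function) a gamma-distributed mass, a topic drawn from a stand-in for H (a symmetric Dirichlet over the vocabulary), and a random domain of locations. The measure G_l at location l renormalizes the masses of the atoms whose domain contains l. The names generate_corpus, p_cover, and doc_len are hypothetical.

```python
import random


def generate_corpus(locations, vocab, num_atoms=20, alpha=2.0,
                    docs_per_location=10, doc_len=5, p_cover=0.5, seed=0):
    """Finite-truncation sketch of location-dependent topic generation.

    Returns (corpus, atoms): corpus holds (location, atom index, words) triples,
    atoms holds (mass, topic word distribution, domain) triples.
    """
    rng = random.Random(seed)

    # 1. Generate functions (atoms): a mass, a topic, and a domain of locations.
    atoms = []
    for _ in range(num_atoms):
        mass = rng.gammavariate(alpha / num_atoms, 1.0)    # unnormalized weight
        raw = [rng.gammavariate(1.0, 1.0) for _ in vocab]  # symmetric Dirichlet draw,
        total = sum(raw)                                   # stand-in for the base measure H
        topic = [x / total for x in raw]
        domain = {l for l in locations if rng.random() < p_cover}  # hypothetical coverage prior
        atoms.append((mass, topic, domain))

    corpus = []
    for l in locations:
        # 2. G_l: renormalize the masses of the atoms whose domain contains l.
        active = [k for k, (_, _, dom) in enumerate(atoms) if l in dom]
        weights = [atoms[k][0] for k in active]
        if not active or sum(weights) == 0.0:
            continue  # no topic is alive at this location
        for _ in range(docs_per_location):
            k = rng.choices(active, weights=weights)[0]                 # topic ~ G_l
            words = rng.choices(vocab, weights=atoms[k][1], k=doc_len)  # 3. words ~ topic
            corpus.append((l, k, words))
    return corpus, atoms


corpus, atoms = generate_corpus(["A", "B", "C"], ["w1", "w2", "w3", "w4"])
for location, k, words in corpus[:3]:
    print(location, k, words)
```

Normalizing independent Gamma(alpha/K, 1) masses yields Dirichlet weights, which approach a Dirichlet process as the truncation level K grows; restricting the normalization to atoms whose domain covers l is what lets topics be local.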
Inference: Gibbs Sampler
1. Sample auxiliary variables: determine whether the domain of each function contains the region (Bernoulli)
2. Sample assignments: compute the probability of assigning a document to each existing function and to a new function
3. Draw topic parameters
15
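Steps 2 and 3 can be sketched on a toy model, assuming (hypothetically) one-dimensional Gaussian observations with known variance sigma2 and a conjugate Normal prior N(mu0, tau2) standing in for H. The sketch follows the standard non-collapsed Gibbs update for Dirichlet process mixtures (as in Neal’s Algorithm 2): an existing function k gets weight n_k f(x | theta_k), and a new function gets alpha times the prior predictive density. In LBTE the candidate functions would presumably be limited to those whose domain contains the document’s region (step 1); that filtering is omitted here, and all function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def assignment_probs(x, counts, means, alpha, sigma2=1.0, mu0=0.0, tau2=10.0):
    """Step 2: probabilities of assigning x to each existing function or to a new one.

    counts[k] is the number of OTHER documents assigned to function k and
    means[k] is its current topic parameter; the last entry is a new function.
    """
    scores = [n * normal_pdf(x, m, sigma2) for n, m in zip(counts, means)]
    # New function: alpha times the integral of N(x | theta, sigma2) N(theta | mu0, tau2).
    scores.append(alpha * normal_pdf(x, mu0, tau2 + sigma2))
    total = sum(scores)
    return [s / total for s in scores]


def posterior_params(obs, sigma2=1.0, mu0=0.0, tau2=10.0):
    """Posterior mean and variance of a topic parameter given its documents."""
    post_var = 1.0 / (1.0 / tau2 + len(obs) / sigma2)
    post_mean = post_var * (mu0 / tau2 + sum(obs) / sigma2)
    return post_mean, post_var


def draw_topic_param(obs, rng, **prior):
    """Step 3: draw a topic parameter from its conditional posterior."""
    mean, var = posterior_params(obs, **prior)
    return rng.gauss(mean, math.sqrt(var))


rng = random.Random(0)
probs = assignment_probs(4.8, counts=[3, 2], means=[-5.0, 5.1], alpha=1.0)
k = rng.choices(range(len(probs)), weights=probs)[0]
print([round(p, 3) for p in probs], "-> chosen index", k)
print("new parameter for function 1:", draw_topic_param([5.0, 5.2], rng))
```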
Experiments
Datasets: synthetic data, Flickr data
Comparison methods: Dirichlet Process Mixture (DPM), Location-Based Topic Evolution (LBTE)
16
Synthetic Data
Topics Generation Topics Initialization-Two topics
Center: Parameter:
Topics Evolution Die off rate 40% New topic follows Poisson distribution with parameter
0.8. Location-associated Documents Generation
10 documents for each topic Location of each documents follows the uniform
distribution at the center of the topic with radius, 5 Values of topics follow
17
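The synthetic setup can be sketched as a small simulator. Several details are assumptions rather than facts from the slides: the 40% die-off is applied to each live topic independently at every time stamp, the Poisson(0.8) births happen once per time stamp, new topic centers are drawn uniformly from a hypothetical 100 x 100 square, and the topic values are omitted because their distribution is not given. The names simulate, poisson, and sample_in_disk are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_in_disk(center, radius, rng):
    """Uniform point in a disk; the sqrt keeps the density uniform over the area."""
    r = radius * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    return (center[0] + r * math.cos(phi), center[1] + r * math.sin(phi))


def simulate(num_steps, die_rate=0.4, birth_rate=0.8, docs_per_topic=10,
             radius=5.0, area=100.0, seed=0):
    """Return, per time stamp, a list of (topic_id, center, location) documents."""
    rng = random.Random(seed)

    def new_center():
        # Hypothetical: topic centers uniform in an area x area square.
        return (rng.uniform(0.0, area), rng.uniform(0.0, area))

    alive = {0: new_center(), 1: new_center()}  # two initial topics
    next_id = 2
    history = []
    for t in range(num_steps):
        if t > 0:
            # Each live topic dies off with probability die_rate ...
            alive = {k: c for k, c in alive.items() if rng.random() >= die_rate}
            # ... and a Poisson(birth_rate) number of new topics appears.
            for _ in range(poisson(birth_rate, rng)):
                alive[next_id] = new_center()
                next_id += 1
        docs = [(k, c, sample_in_disk(c, radius, rng))
                for k, c in alive.items() for _ in range(docs_per_topic)]
        history.append(docs)
    return history


for t, docs in enumerate(simulate(5)):
    print(t, sorted({k for k, _, _ in docs}), len(docs))
```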
Results of Synthetic Data
LBTE outperforms DPM at all time stamps
18
LBTE recovers the true topics and achieves zero variation of information
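Variation of information (VI) compares two partitions of the same documents, here the true topic labels and the inferred assignments: VI(A, B) = H(A) + H(B) - 2 I(A; B). It is zero exactly when the two partitions coincide up to relabeling, which is why zero VI means the true topics were recovered. A minimal Python sketch (natural logarithms; variation_of_information is a hypothetical name):

```python
import math
from collections import Counter


def variation_of_information(labels_a, labels_b):
    """VI(A, B) = H(A) + H(B) - 2 I(A; B), in nats, for two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    mutual_info = sum(
        c / n * math.log((c / n) / ((count_a[a] / n) * (count_b[b] / n)))
        for (a, b), c in joint.items()
    )
    return entropy(count_a) + entropy(count_b) - 2.0 * mutual_info


truth = [0, 0, 0, 1, 1, 2]
print(variation_of_information(truth, [5, 5, 5, 7, 7, 9]))  # same partition, relabeled
print(variation_of_information(truth, [0, 0, 1, 1, 2, 2]))  # different partition
```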
Flickr Data
Geo-tagged photos crawled from 2009/01/01 to 2010/01/01
Only photos within USA territory
19
An example:
{ "date": "2009-07-07 19:34:04", "lat": "36.058961", "lon": "-112.083442", "id": "5919764020", "tags": [ "grandcanyon", "nationalpark", "sunset", "limestone", "scenic"] }
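Each crawled record is JSON with string-valued coordinates and a timestamp. A minimal Python sketch for loading such records; GeoTaggedPhoto, parse_photo, and in_crawl_period are hypothetical names, and treating the end date 2010/01/01 as exclusive is an assumption. The USA-territory restriction needs real boundary data and is not attempted here.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class GeoTaggedPhoto:
    photo_id: str
    taken: datetime
    lat: float
    lon: float
    tags: List[str]


def parse_photo(record: str) -> GeoTaggedPhoto:
    """Parse one crawled record; lat/lon arrive as strings and are converted to floats."""
    d = json.loads(record)
    return GeoTaggedPhoto(
        photo_id=d["id"],
        taken=datetime.strptime(d["date"], "%Y-%m-%d %H:%M:%S"),
        lat=float(d["lat"]),
        lon=float(d["lon"]),
        tags=list(d["tags"]),
    )


def in_crawl_period(photo, start=datetime(2009, 1, 1), end=datetime(2010, 1, 1)):
    """Assumed half-open crawl period [start, end)."""
    return start <= photo.taken < end


example = ('{ "date": "2009-07-07 19:34:04", "lat": "36.058961", "lon": "-112.083442", '
           '"id": "5919764020", "tags": [ "grandcanyon", "nationalpark", "sunset", '
           '"limestone", "scenic"] }')
photo = parse_photo(example)
print(photo.photo_id, photo.lat, photo.lon, in_crawl_period(photo), photo.tags)
```

The tags list then serves as the document’s words, and (lat, lon, taken) as its location and time stamp.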
Results on National Parks
Topics learned by DPM are scattered across regions
20
Results on National Parks
LBTE utilizes location information and discovers topics based on regions
21
Results on National Parks
Yellowstone
Grand Canyon
Big Bend
Joshua Tree
22
Conclusion
Advantages of the location-based topic evolution model:
Automatically models the total number of topics
Automatically models the appearance and disappearance of topics
Succinct sampling via Gibbs sampling
23
Thank you!
24
Sample Auxiliary Variables
Sample Assignment