a topic model for traffic speed data analysis
DESCRIPTION
http://link.springer.com/chapter/10.1007%2F978-3-319-07467-2_8TRANSCRIPT
Real-Time Traffic Speed Data | NYC Open Datahttps://data.cityofnewyork.us/Transportation/Real-Time-Traffic-Speed-Data/xsat-x5sa
Speed measurements at hundreds of sensors
(Regrettably, the data seems no longer maintained.)
Problem
• Traffic speed data show a periodicity at
one day period.
• However, there is a wide variety not only
between periods but also within periods.
• How can we analyze it?
Solution
• We take intuition from topic models
in text mining.
Topic models for documents
• We can assume that each document contains
multiple topics.
• That is, each document is modeled
– not as a single word probability distribution,
– but as a mixture of word probability distributions.
Latent Dirichlet Allocation (LDA)
• LDA [Blei et al. 03]
topic <-> word probability distribution
document <-> mixing proportions of topics
• LDA models each document as follows:
v3v3
v1v1
v3v3
v2v2
v2v2
v1 v2 v3 v4
t3φ31
φ32
φ33
φ34
v1 v2 v3 v4
t2φ21
φ22
φ23 φ24
v1 v2 v3 v4
t1φ11
φ12
φ13
φ14
θj1 θj2
θj3
An important difference
• Words are discrete entities.
– Therefore, LDA uses multinomial distributions for
modeling per-topic word distributions.
• Speeds (in mph) are continuous entities.
– We can’t use multinomial distributions.
Gamma distribution
Comparing LDA with Patchy
• LDA <-> Patchy
– Word <-> Speed observation (in mph)
– Topic (multinomial) <-> Patch (Gamma)
– Document <-> Roll (from 0 AM to 12 PM)
Full joint distribution of Patchy
• We estimate parameters by a variational
Bayesian inference.
Variational Bayesian inference
• The posterior parameters are estimated
by maximizing ELBO.
– ELBO = the lower bound of the evidence
Context dependency
Observations of the same mph
are assigned to different patches.
Observations of the same mph
are assigned to different patches.
Context dependency
• Context = mixing proportions of patches
– Which patch is dominant?
• Context-dependency
–Observations of the same speed can be
assigned to different patches depending on
their contexts.
Context dependencyOn May 27, this purple patch is
dominant.
On May 28, this yellow patch is
dominant.
Evaluation
• Binary classification
–Weekdays / Weekends (Sat, Sun)
• Data
– Training: May 27 ~ June 16 (three weeks)
– Test: July 23 ~ August 5 (two weeks)
Comparison
• Nearest neighbor
–Measure similarity by Euclidean distance
–Require timestamps
• Patchy
–Measure similarity by predictive probability
–Require no timestamps
Classification results
Nearest neighbor
Summary
• We proposed a topic model for traffic data analysis.
• Patchy can assign the observations of the same
traffic speed to different groups in a context-
dependent manner.
• Patchy achieved a classification accuracy comparable
with NN with no timestamps.
Future work
• Model timestamps