Differentially Private Multi-Dimensional Time Series Release for Traffic Monitoring
Liyue Fan, Li Xiong, Vaidy Sunderam
Department of Math & Computer Science
Emory University
DBSec’13
Outline
• Traffic Monitoring
• User Privacy
• Challenges
• Proposed Solutions
• Temporal Estimation
• Spatial Estimation
• Empirical Evaluation
9/4/2013 DBSec'13: Privacy Preserving Traffic Monitoring 2
Monitoring Traffic
• Congestion / trending places / everyday life
• How many cars are there? Where are they?
Monital Metropol, Brazil
Google Traffic View
Real-time user location
Traffic Monitoring
• Real-time GPS data → traffic histogram
• At any timestamp: an aggregate 2D histogram
User Privacy
• User privacy should be protected when releasing their data!
• Real-time location data is sensitive
• pleaserobme.com
• GPS traces are identifying
• “We study fifteen months of human mobility data for one and a half
million individuals and find that human mobility traces are highly unique.
… in a dataset where the location of an individual is specified hourly, and
with a spatial resolution equal to that given by the carrier's antennas, four
spatio-temporal points are enough to uniquely identify 95% of the
individuals.”
De Montjoye, Yves-Alexandre, Cesar A. Hidalgo, Michel Verleysen, and Vincent D. Blondel.
"Unique in the Crowd: The Privacy Bounds of Human Mobility." Scientific Reports 3 (2013)
Differentially Private Data Sharing
Differential Privacy (in a nutshell)
• Rigorous definition
• Makes no assumption about the attacker’s prior knowledge
• Upon seeing the published data, an attacker should gain little knowledge about any specific individual.
• α-Differential Privacy [BLR08]
• Smaller values of the privacy budget α (α < 1) indicate a stronger privacy guarantee
Static α-Differential Privacy
• Laplace perturbation: $A(D) = f(D) + \mathrm{Lap}(\Delta f / \alpha)^d$
• Global sensitivity: $\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1$
• Example: dataset D, query f (a four-cell histogram, $\Delta f = 1$)
  f(D): c1 = 2, c2 = 1, c3 = 3, c4 = 4
  A(D): c̃1 = 1, c̃2 = 0, c̃3 = 5, c̃4 = 3, where $\tilde{c}_i = c_i + \mathrm{Lap}(1/\alpha)$
• Strong privacy → high perturbation noise
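A minimal sketch of the Laplace perturbation on the four-cell example, using only the Python standard library. The function names `laplace_noise` and `laplace_mechanism` are illustrative, not from the talk; the noise is drawn as the difference of two exponential samples, which follows a Laplace distribution with the same scale.

```python
import random


def laplace_noise(scale, rng):
    """Sample Lap(scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(counts, alpha, sensitivity=1.0, rng=None):
    """Add independent Lap(sensitivity / alpha) noise to every histogram cell."""
    rng = rng or random.Random()
    scale = sensitivity / alpha
    return [c + laplace_noise(scale, rng) for c in counts]


true_counts = [2, 1, 3, 4]  # f(D) for cells c1..c4 from the slide
released = laplace_mechanism(true_counts, alpha=1.0, rng=random.Random(0))
print(released)  # real-valued noisy counts; individual values may be negative
```

Halving α doubles the noise scale, which is the "strong privacy → high perturbation noise" trade-off.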
Composability of Differential Privacy
• Sequential Composition [McSherry10]
• Let each $A_k$ provide $\alpha_k$-differential privacy. A sequence of $A_k(D)$ over dataset $D$ provides $\sum_k \alpha_k$-differential privacy.
• Timestamp $k = 0, \ldots, T-1$
• $f_k(D)$: 2D cell histogram at time $k$
• $A_k(D)$: released 2D histogram that satisfies $\frac{\alpha}{T}$-DP
• $A_0(D), \ldots, A_{T-1}(D)$ satisfies $\alpha$-DP
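As a short worked step, assuming the budget is split uniformly so that every timestamp receives $\alpha_k = \alpha / T$, sequential composition gives the overall guarantee:

```latex
\sum_{k=0}^{T-1} \alpha_k \;=\; \sum_{k=0}^{T-1} \frac{\alpha}{T} \;=\; T \cdot \frac{\alpha}{T} \;=\; \alpha
```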
Baseline Solution: LPA
• Laplace Perturbation Algorithm
• For each timestamp k:
• Release $A_k(D) = f_k(D) + \mathrm{Lap}(T / \alpha)^d$
• High perturbation noise for long time-series, i.e. when T is large
• Low utility output since data is sparse
• Fact: location data is VERY sparse.
• Example: true counts c1 = 2, c2 = 1, c3 = 3, c4 = 4; perturbed counts c̃1 = 1, c̃2 = 0, c̃3 = 5, c̃4 = 3
• Relative error: c1: 50%, c2: 100%
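A rough sketch of the LPA baseline, assuming a sensitivity of 1 per timestamp as in the static example. `lpa_release` is a hypothetical helper name, and the input series is synthetic; the point is only to show how the per-cell noise scale T/α grows with the series length.

```python
import random


def laplace_noise(scale, rng):
    """Sample Lap(scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def lpa_release(histograms, alpha, rng=None):
    """Release each timestamp's histogram with Lap(T / alpha) noise per cell.

    Each timestamp spends alpha / T of the budget, so the whole series
    satisfies alpha-DP by sequential composition.
    """
    rng = rng or random.Random()
    T = len(histograms)
    scale = T / alpha
    return [[c + laplace_noise(scale, rng) for c in hist] for hist in histograms]


# Synthetic input: the slide's four-cell histogram repeated for T = 100 timestamps.
histograms = [[2, 1, 3, 4] for _ in range(100)]
released = lpa_release(histograms, alpha=1.0, rng=random.Random(0))

# With T = 100 and alpha = 1 the expected absolute noise per cell is 100,
# far larger than the true counts of 1 to 4.
abs_noise = [abs(r - c) for hist, rel in zip(histograms, released) for c, r in zip(hist, rel)]
print(sum(abs_noise) / len(abs_noise))
```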
Two Proposed Solutions
• Temporal Estimation for each cell: utilize a time series model and posterior estimation to reduce perturbation error.
• Spatial Estimation within each partition: group similar cells together to overcome data sparsity.
[Figure: a 2×2 grid of cells c1 to c4, and a 4×4 example cell histogram with rows (1 1 0 0), (1 2 1 0), (2 3 4 4), (3 3 6 10)]
Framework
[Diagram labels: Raw Series, Modeling/Partitioning, Laplace Perturbation, Estimation, Differentially Private Series]
• Domain knowledge: a known Sparse or Dense label for each cell.
• Doesn’t incur extra differential privacy cost.
Temporal Estimation
• For each cell, its count series $\{x_k\}$, $k = 0, \ldots, T-1$
• e.g. {3, 3, 4, 5, 4, 3, 2, …}
• Process model: $x_{k+1} = x_k + \omega$, $\omega \sim \mathcal{N}(0, Q)$
• $Q$: small value for Sparse cells; large value for Dense cells.
• Measurement model: $z_k = x_k + \nu$, $\nu \sim \mathrm{Lap}(T / \alpha)$
• Goal: given $z_k$ and the above models, estimate $x_k$.
Temporal Estimation (cont.)
• Estimation algorithm based on the Kalman filter
• Gaussian approximation: $\nu \sim \mathcal{N}(0, R)$, $R \propto T^2 / \alpha^2$
• Model-based prediction → posterior estimate/output
• Linearly combine prediction and measurement
• O(1) computation per timestamp
Fan and Xiong CIKM’12, TKDE’13
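A minimal scalar Kalman filter for the random-walk process model and noisy measurement model, written as a sketch rather than the authors' implementation. The function name, the synthetic constant series, the value Q = 1.0, and the choice R = 2(T/α)² (the variance of Lap(T/α), one way to satisfy R ∝ T²/α²) are all assumptions made for illustration.

```python
import random


def kalman_filter_series(measurements, Q, R, x0, P0):
    """Scalar Kalman filter for x_{k+1} = x_k + w, z_k = x_k + v."""
    x_post, P_post = x0, P0
    estimates = []
    for z in measurements:
        # Prediction: the random-walk model carries the last estimate forward.
        x_prior = x_post
        P_prior = P_post + Q
        # Correction: linearly combine prediction and measurement via the gain K.
        K = P_prior / (P_prior + R)
        x_post = x_prior + K * (z - x_prior)
        P_post = (1.0 - K) * P_prior
        estimates.append(x_post)
    return estimates


T, alpha = 100, 1.0
R = 2.0 * (T / alpha) ** 2  # assumption: variance of Lap(T / alpha)

rng = random.Random(0)
truth = [50.0] * T  # synthetic cell count series
# Lap(T / alpha) noise as the difference of two exponentials with mean T / alpha.
noisy = [x + rng.expovariate(alpha / T) - rng.expovariate(alpha / T) for x in truth]
smooth = kalman_filter_series(noisy, Q=1.0, R=R, x0=noisy[0], P0=R)
```

Each timestamp needs only a constant number of arithmetic operations, which matches the O(1) cost per timestamp noted on the slide.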
Temporal Estimation Example
• For cell c, at time k:
• Suppose $x_k = 4$
• Prediction $\hat{x}_k^-$, e.g. 2
• Measurement/Laplace-perturbed value $z_k$, e.g. 8
• Posterior estimate $\hat{x}_k$, e.g. 3
• The impact of perturbation noise is reduced by taking the process model and prediction into account!
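The numbers in this example can be checked against the standard Kalman correction step. The gain symbol $K_k$ is an assumption here (it does not appear on the slide); its value is simply what the example's numbers imply.

```latex
\hat{x}_k = \hat{x}_k^- + K_k \,(z_k - \hat{x}_k^-)
\quad\Longrightarrow\quad
3 = 2 + K_k\,(8 - 2)
\quad\Longrightarrow\quad
K_k = \tfrac{1}{6}

|z_k - x_k| = |8 - 4| = 4
\qquad\text{vs.}\qquad
|\hat{x}_k - x_k| = |3 - 4| = 1
```

The posterior estimate is off by 1, whereas the raw perturbed value is off by 4.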
Spatial Estimation
• Goal: group cells to overcome data sparsity.
• First partition the space until each partition contains Sparse or Dense cells only
• Top-down algorithm based on QuadTree
• Data independence and efficiency
• For each timestamp k:
• $f'_k(D)$: partition counts, with $\Delta f'_k = 1$
• $A'_k(D) = f'_k(D) + \mathrm{Lap}(T / \alpha)^{d'}$
• Release $\hat{f}_k(D)$ estimated from $A'_k(D)$
• Each cell is visited O(1) times at each timestamp.
Example cell labels (S = Sparse, D = Dense):
S S S S
S S S S
S S S S
S S D D
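A small sketch of the top-down QuadTree partitioning, assuming a square grid whose side is a power of two. `quadtree_partition` and the `(row, col, side)` encoding of a partition are illustrative choices, not taken from the talk. Because the split depends only on the Sparse/Dense labels, it never touches the private counts.

```python
def quadtree_partition(labels, r0=0, c0=0, size=None):
    """Split a square label grid until every block holds a single label.

    Returns a list of (row, col, side) blocks that together cover the grid.
    """
    if size is None:
        size = len(labels)
    seen = {labels[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)}
    if len(seen) == 1 or size == 1:
        return [(r0, c0, size)]
    half = size // 2
    blocks = []
    for dr in (0, half):
        for dc in (0, half):
            blocks += quadtree_partition(labels, r0 + dr, c0 + dc, half)
    return blocks


labels = ["SSSS",
          "SSSS",
          "SSSS",
          "SSDD"]
partitions = quadtree_partition(labels)
print(partitions)
```

On the 4×4 label grid from the slide this yields three 2×2 Sparse blocks, and the mixed bottom-right quadrant is split into four single cells.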
Spatial Estimation Example
• At time k:
Original cell histogram $f_k(D)$:
1 1 0 0
1 2 1 0
2 3 4 4
3 3 6 10
Partition histogram $f'_k(D)$:
top-left 2×2: 5, top-right 2×2: 1, bottom-left 2×2: 11, bottom-right single cells: 4 4 / 6 10
Laplace perturbed $A'_k(D)$:
top-left 2×2: 6, top-right 2×2: 0, bottom-left 2×2: 12, bottom-right single cells: 5 3 / 6 11
Estimated cell histogram $\hat{f}_k(D)$:
1 1 0 0
1 1 0 0
3 3 5 3
3 3 6 11
• Perturbation noise is evenly distributed to every cell within the partition.
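The example can be traced with a short sketch. The partition list, the `(row, col, side)` encoding, and the helper names are assumptions for illustration; the perturbed values are copied from the slide rather than sampled, so the result is reproducible.

```python
import random

# Partitions of the 4x4 grid as (row, col, side): three 2x2 Sparse blocks
# plus the four single cells of the mixed bottom-right quadrant.
PARTITIONS = [(0, 0, 2), (0, 2, 2), (2, 0, 2), (2, 2, 1), (2, 3, 1), (3, 2, 1), (3, 3, 1)]


def partition_counts(cells, partitions):
    """Sum the cell counts inside each partition."""
    return [sum(cells[r][c] for r in range(r0, r0 + s) for c in range(c0, c0 + s))
            for (r0, c0, s) in partitions]


def perturb(counts, T, alpha, rng):
    """Add Lap(T / alpha) noise to each partition count."""
    scale = T / alpha
    return [v + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale) for v in counts]


def spread_evenly(noisy_counts, partitions, n):
    """Give every cell an equal share of its partition's noisy count."""
    est = [[0.0] * n for _ in range(n)]
    for value, (r0, c0, s) in zip(noisy_counts, partitions):
        share = value / (s * s)
        for r in range(r0, r0 + s):
            for c in range(c0, c0 + s):
                est[r][c] = share
    return est


cells = [[1, 1, 0, 0],
         [1, 2, 1, 0],
         [2, 3, 4, 4],
         [3, 3, 6, 10]]
f_prime = partition_counts(cells, PARTITIONS)           # [5, 1, 11, 4, 4, 6, 10]
slide_noisy = [6, 0, 12, 5, 3, 6, 11]                   # perturbed values from the slide
estimate = spread_evenly(slide_noisy, PARTITIONS, n=4)  # top-left cells get 6 / 4 = 1.5
sampled = perturb(f_prime, T=100, alpha=1.0, rng=random.Random(0))
```

An even split of the top-left partition gives 1.5 per cell, while the slide shows 1, presumably because the released counts are displayed as integers.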
Evaluation: Data
• Generated moving objects on a road network
• City of Oldenburg, Germany
• 500K objects at the beginning
• 25K new objects at every timestamp
• total time: 100 timestamps
• Two-dimensional 1024 by 1024 grid over the city map
• Each cell represents 400 m²
• Record object locations at cell resolution
• 95% cells are labeled Sparse!
http://iapg.jade-hs.de/personen/brinkhoff/generator/
Temporal Estimation
[Plot: cell count vs. time for one cell, comparing the original series (orig), the Laplace-perturbed series (Laplace), and the Kalman-filtered series (Kalman); the count axis spans −500 to 400 and the time axis runs from 1 to 51.]
Spatial Partitions
[Figures: Oldenburg road network; partitions by QuadTree]
Evaluation: Utility vs. Privacy
• Utility of each cell: Average Relative Error of released series
• For each 𝛼 value, median utility among each class is plotted
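A sketch of how such a metric could be computed. The exact formula is not given on the slide, so the definition below is an assumption: the per-timestamp error |released − true| is divided by the true count, with a floor of 1 in the denominator to avoid division by zero in empty cells. The function names are illustrative.

```python
from statistics import median


def average_relative_error(true_series, released_series, floor=1.0):
    """Mean over timestamps of |released - true| / max(true, floor)."""
    errors = [abs(r - x) / max(x, floor) for x, r in zip(true_series, released_series)]
    return sum(errors) / len(errors)


def median_utility(cells_true, cells_released):
    """Median of the per-cell average relative errors within one class of cells."""
    return median(average_relative_error(t, r) for t, r in zip(cells_true, cells_released))


# Counts 2 -> 1 and 1 -> 0 give relative errors of 50% and 100%.
print(average_relative_error([2, 1], [1, 0]))  # 0.75
```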
DFT: Rastogi and Nath, SIGMOD’10
Evaluation: Range Queries
• How many objects are in an area of m by m cells at every timestamp?
• For each m, 100 areas are randomly selected and evaluated.
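A minimal sketch of answering such a range query on a released histogram and comparing it with the true answer. The 4×4 histograms are the ones from the Spatial Estimation Example; `range_count` and `random_areas` are hypothetical helper names, and the floor of 1 in the relative error is an assumption.

```python
import random


def range_count(hist, r0, c0, m):
    """Number of objects in the m-by-m area whose top-left cell is (r0, c0)."""
    return sum(hist[r][c] for r in range(r0, r0 + m) for c in range(c0, c0 + m))


def random_areas(n, m, how_many, rng):
    """Top-left corners of randomly placed m-by-m areas inside an n-by-n grid."""
    return [(rng.randrange(n - m + 1), rng.randrange(n - m + 1)) for _ in range(how_many)]


original = [[1, 1, 0, 0], [1, 2, 1, 0], [2, 3, 4, 4], [3, 3, 6, 10]]
estimated = [[1, 1, 0, 0], [1, 1, 0, 0], [3, 3, 5, 3], [3, 3, 6, 11]]

m = 2
areas = random_areas(n=4, m=m, how_many=100, rng=random.Random(0))
errors = []
for r0, c0 in areas:
    truth = range_count(original, r0, c0, m)
    answer = range_count(estimated, r0, c0, m)
    errors.append(abs(answer - truth) / max(truth, 1))
print(sum(errors) / len(errors))
```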
Evaluation: Runtime
• Overall runtime is plotted in milliseconds.
Conclusion
• Differentially private release is difficult when the time series is long and the data is sparse!
• Domain knowledge can be used for temporal modeling as well as spatial partitioning.
• Output utility is improved under the same privacy guarantee.
• We observe no extra time cost from our solutions.
• Ongoing work:
• Utilize rich information in spatio-temporal data.
• Model learning and parameter learning.
• Contact: [email protected]
• AIMS Group: www.mathcs.emory.edu/aims
Q&A