Differentially Private Multi-Dimensional Time Series Release for Traffic Monitoring. Liyue Fan, Li Xiong, Vaidy Sunderam. Department of Math & Computer Science, Emory University. DBSec’13

Page 1: Differentially Private Multi- Dimensional Time Series

Differentially Private Multi-Dimensional Time Series Release for Traffic Monitoring

Liyue Fan, Li Xiong, Vaidy Sunderam

Department of Math & Computer Science

Emory University

DBSec’13

Page 2: Differentially Private Multi- Dimensional Time Series

Outline

• Traffic Monitoring

• User Privacy

• Challenges

• Proposed Solutions

• Temporal Estimation

• Spatial Estimation

• Empirical Evaluation

9/4/2013 DBSec'13: Privacy Preserving Traffic Monitoring 2

Page 3: Differentially Private Multi- Dimensional Time Series

Monitoring Traffic

• Congestions/Trending places/Everyday life

• How many cars are there? Where are they?


Monital Metropol, Brazil

Google Traffic View

Page 4: Differentially Private Multi- Dimensional Time Series

Traffic Monitoring

• Real-time GPS data → traffic histogram

• At any timestamp: real-time user locations are aggregated into a 2D histogram

Page 5: Differentially Private Multi- Dimensional Time Series

User Privacy

• User privacy should be protected when releasing their data!

• Real-time location data is sensitive

• pleaserobme.com

• GPS traces are identifying

• “We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. … in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier's antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals.”

De Montjoye, Yves-Alexandre, Cesar A. Hidalgo, Michel Verleysen, and Vincent D. Blondel. "Unique in the Crowd: The Privacy Bounds of Human Mobility." Scientific Reports 3 (2013)

Page 6: Differentially Private Multi- Dimensional Time Series

Differentially Private Data Sharing


Page 7: Differentially Private Multi- Dimensional Time Series

Differential Privacy (in a nutshell)

• Rigorous definition

• Makes no assumption about the attacker's prior knowledge

• Upon seeing the published data, an attacker should gain little knowledge about any specific individual.

• α-Differential Privacy [BLR08]

• α is the privacy budget: smaller α values (α < 1) indicate a stronger privacy guarantee
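The slide mentions a rigorous definition without stating it. For reference, the standard formulation of α-differential privacy is written out below, under the usual assumption (not spelled out on the slide) that D and D' are neighboring datasets differing in one individual's data and S is any set of possible outputs of the randomized algorithm A.

```latex
\Pr[A(D) \in S] \;\le\; e^{\alpha} \cdot \Pr[A(D') \in S]
\quad \text{for all neighboring } D, D' \text{ and all output sets } S .
```

With α close to 0 the two output distributions are nearly indistinguishable, which is why smaller α means stronger privacy.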

Page 8: Differentially Private Multi- Dimensional Time Series

Static α-Differential Privacy

• Laplace perturbation: $A(D) = f(D) + \mathrm{Lap}(\Delta f / \alpha)^d$

• Global sensitivity: $\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1$

Example (dataset D, query f), with $\Delta f = 1$:

f(D): $c_1$: 2, $c_2$: 1, $c_3$: 3, $c_4$: 4

Laplace perturbation: $\tilde{c}_i = c_i + \mathrm{Lap}(1/\alpha)$

A(D): $\tilde{c}_1$: 1, $\tilde{c}_2$: 0, $\tilde{c}_3$: 5, $\tilde{c}_4$: 3

strong privacy → high perturbation noise
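The Laplace perturbation on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names `laplace_noise` and `laplace_mechanism` are hypothetical, and the Laplace sample is drawn as the difference of two exponential samples so that only the standard library is needed.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Lap(scale), i.e. Laplace with mean 0 and scale b = scale.

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(counts, alpha, sensitivity=1.0, rng=None):
    """Add independent Lap(sensitivity / alpha) noise to every cell count."""
    rng = rng or random.Random()
    scale = sensitivity / alpha
    return [c + laplace_noise(scale, rng) for c in counts]


# The four-cell histogram from the slide, with global sensitivity 1.
true_counts = [2, 1, 3, 4]
released = laplace_mechanism(true_counts, alpha=1.0, rng=random.Random(0))
print(released)
```

Halving `alpha` doubles the noise scale, which is the "strong privacy → high perturbation noise" trade-off noted on the slide.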

Page 9: Differentially Private Multi- Dimensional Time Series

Composability of Differential Privacy

• Sequential Composition [McSherry10]

• Let $A_k$ each provide $\alpha_k$-differential privacy. A sequence of $A_k(D)$ over dataset $D$ provides $\left(\sum_k \alpha_k\right)$-differential privacy.

• Timestamp $k = 0, \dots, T-1$

• $f_k(D)$: 2D cell histogram at time $k$

• $A_k(D)$: released 2D histogram that satisfies $\frac{\alpha}{T}$-DP

• $A_0(D), \dots, A_{T-1}(D)$ satisfies $\alpha$-DP
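To make the budget split concrete, here is the arithmetic for one illustrative setting. The values α = 1 and T = 100 are assumptions chosen for the example (T = 100 matches the number of timestamps used later in the evaluation), and the per-timestamp sensitivity of 1 is the value given for the cell histogram.

```latex
\alpha = 1,\; T = 100:\qquad
\alpha_k = \frac{\alpha}{T} = 0.01, \qquad
\sum_{k=0}^{T-1} \alpha_k = T \cdot \frac{\alpha}{T} = 1 = \alpha, \qquad
b = \frac{\Delta f_k}{\alpha_k} = \frac{1}{0.01} = 100 .
```

Each released count therefore carries Laplace noise of scale 100, far larger than the counts of most cells.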

Page 10: Differentially Private Multi- Dimensional Time Series

Baseline Solution: LPA

• Laplace Perturbation Algorithm

• For each timestamp k: release $A_k(D) = f_k(D) + \mathrm{Lap}(T/\alpha)^d$

• High perturbation noise for long time series, i.e. when T is large

• Low utility output since data is sparse

• Fact: location data is VERY sparse.

Example:

$f_k(D)$: $c_1$: 2, $c_2$: 1, $c_3$: 3, $c_4$: 4

$A_k(D)$: $\tilde{c}_1$: 1, $\tilde{c}_2$: 0, $\tilde{c}_3$: 5, $\tilde{c}_4$: 3

Relative error: $c_1$: 50%, $c_2$: 100%
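A minimal sketch of the LPA baseline follows, assuming a per-timestamp sensitivity of 1 as in the earlier example. The names `lpa_release` and `relative_error` are hypothetical, and the relative error is only defined here for positive true counts.

```python
import random


def lpa_release(series, alpha, rng):
    """Release every histogram in `series` with Lap(T / alpha) noise per cell.

    `series` is a list of T histograms, each a flat list of cell counts.
    Each timestamp spends alpha / T of the total budget (sensitivity 1 assumed).
    """
    T = len(series)
    scale = T / alpha

    def lap():
        # Difference of two exponentials with mean `scale` is Lap(scale).
        return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

    return [[count + lap() for count in hist] for hist in series]


def relative_error(true_count, released_count):
    """Relative error of one released count; requires a positive true count."""
    if true_count <= 0:
        raise ValueError("relative error is undefined for non-positive counts")
    return abs(released_count - true_count) / true_count


# The slide's example: c1 goes from 2 to 1, c2 goes from 1 to 0.
print(relative_error(2, 1), relative_error(1, 0))

# A toy series of T = 3 timestamps over four cells.
toy_series = [[2, 1, 3, 4], [2, 2, 3, 5], [1, 2, 4, 5]]
noisy_series = lpa_release(toy_series, alpha=1.0, rng=random.Random(0))
```

Because the noise scale grows linearly in T, small counts in sparse cells are quickly drowned out, which is the weakness the two proposed solutions target.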

Page 11: Differentially Private Multi- Dimensional Time Series

Two Proposed Solutions

• Temporal Estimation for each cell: utilize a time series model and posterior estimation to reduce perturbation error.

• Spatial Estimation within each partition: group similar cells together to overcome data sparsity.

[Figure: 2×2 grid of cells c1 to c4, and a 4×4 cell histogram]

1 1 0 0
1 2 1 0
2 3 4 4
3 3 6 10

Page 12: Differentially Private Multi- Dimensional Time Series

Framework

[Diagram components: Raw Series, Modeling/Partitioning, Laplace Perturbation, Estimation, Differentially Private Series]

• Domain knowledge: a known Sparse or Dense label for each cell.

• Doesn’t incur extra differential privacy cost

Page 13: Differentially Private Multi- Dimensional Time Series

Temporal Estimation

• For each cell, its count series $\{x_k\}$, $k = 0, \dots, T-1$

• e.g. {3,3,4,5,4,3,2,…}

• Process Model: $x_{k+1} = x_k + \omega$, $\omega \sim \mathcal{N}(0, Q)$

• $Q$: small value for Sparse cells; large value for Dense cells.

• Measurement Model: $z_k = x_k + \nu$, $\nu \sim \mathrm{Lap}(T/\alpha)$

• Goal: given $z_k$ and the above models, estimate $x_k$.

Page 14: Differentially Private Multi- Dimensional Time Series

Temporal Estimation(cont.)

• Estimation algorithm based on the Kalman filter

• Gaussian approximation: $\nu \sim \mathcal{N}(0, R)$, $R \propto T^2/\alpha^2$

• Model-based Prediction → Posterior Estimate/Output

• Linearly combine prediction and measurement

• O(1) computation per timestamp

Fan and Xiong CIKM’12, TKDE’13
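The prediction and correction steps can be sketched as a scalar Kalman filter over one cell's perturbed series. This is a textbook one-dimensional filter written under the slide's models, not the authors' implementation: the name `kalman_filter`, the initial state `x0`, and the initial variance `p0` are hypothetical, and choosing `r` as the Laplace variance 2(T/α)² is only one option consistent with R ∝ T²/α².

```python
def kalman_filter(measurements, q, r, x0=0.0, p0=1.0):
    """Filter one cell's noisy count series z_0, ..., z_{T-1}.

    q: process noise variance Q (small for Sparse cells, large for Dense cells)
    r: measurement noise variance R of the Gaussian approximation
    x0, p0: assumed initial estimate and its variance (hypothetical defaults)
    """
    x_post, p_post = x0, p0
    estimates = []
    for z in measurements:
        # Prediction from the process model x_{k+1} = x_k + w, w ~ N(0, Q).
        x_prior = x_post
        p_prior = p_post + q
        # Correction: linearly combine the prediction and the measurement.
        gain = p_prior / (p_prior + r)
        x_post = x_prior + gain * (z - x_prior)
        p_post = (1.0 - gain) * p_prior
        estimates.append(x_post)
    return estimates


# Illustrative call: T = 7 timestamps, alpha = 1, so Lap(7) has variance 2 * 7**2.
noisy = [9.0, -4.0, 12.0, 1.0, 8.0, -2.0, 5.0]  # made-up perturbed counts
smoothed = kalman_filter(noisy, q=0.5, r=2 * 7.0**2, x0=3.0, p0=1.0)
print(smoothed)
```

Each timestamp needs only a handful of arithmetic operations and the previous state, which matches the O(1) computation per timestamp noted on the slide.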

Page 15: Differentially Private Multi- Dimensional Time Series

Temporal Estimation Example

• For cell c, at time k:

• Suppose $x_k = 4$

• Prediction $\hat{x}_k^-$, e.g. 2

• Measurement/Laplace perturbed value $z_k$, e.g. 8

• Posterior estimation $\hat{x}_k$, e.g. 3

• Impact of perturbation noise is reduced by taking into account the process model and prediction!
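Reading this example through the standard Kalman update, where the posterior is the prediction plus a gain times the innovation, shows how the numbers fit together. The gain value is not stated on the slide; it is only what the example's numbers imply under that update rule.

```latex
\hat{x}_k = \hat{x}_k^- + K_k \,(z_k - \hat{x}_k^-)
\;\Longrightarrow\; 3 = 2 + K_k\,(8 - 2)
\;\Longrightarrow\; K_k = \tfrac{1}{6},
\qquad
|z_k - x_k| = |8 - 4| = 4, \quad |\hat{x}_k - x_k| = |3 - 4| = 1 .
```

The absolute error drops from 4 for the raw perturbed value to 1 for the posterior estimate.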

Page 16: Differentially Private Multi- Dimensional Time Series

Spatial Estimation

• Goal: group cells to overcome data sparsity.

• First partition the space until each partition contains Sparse or Dense cells only

• Top-down algorithm based on QuadTree

• Data independence and efficiency

• For each timestamp k:

• $f'_k(D)$: partition counts

• $A'_k(D) = f'_k(D) + \mathrm{Lap}(T/\alpha)^{d'}$

• Release $\hat{f}_k(D)$ estimated from $A'_k(D)$

• Each cell is visited O(1) times at each timestamp.

Cell labels (S = Sparse, D = Dense), with $\Delta f'_k = 1$:

S S S S
S S S S
S S S S
S S D D
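A top-down QuadTree partitioning over the Sparse/Dense labels might look like the sketch below. It assumes a square grid whose side is a power of two and represents each partition as a hypothetical `(row, col, size)` tuple; the function name `quadtree_partition` is illustrative. Because it reads only the labels and never the counts, the partitioning is independent of the data.

```python
def quadtree_partition(labels, row=0, col=0, size=None):
    """Split a square label grid until every block holds a single label.

    Returns a list of (row, col, size) blocks, where (row, col) is the
    top-left cell of the block and size is its side length in cells.
    """
    if size is None:
        size = len(labels)
    seen = {
        labels[r][c]
        for r in range(row, row + size)
        for c in range(col, col + size)
    }
    if len(seen) == 1 or size == 1:
        return [(row, col, size)]
    half = size // 2
    blocks = []
    for dr in (0, half):
        for dc in (0, half):
            blocks.extend(quadtree_partition(labels, row + dr, col + dc, half))
    return blocks


# The 4x4 label grid from the slide.
labels = [list("SSSS"), list("SSSS"), list("SSSS"), list("SSDD")]
partitions = quadtree_partition(labels)
print(len(partitions), partitions)
```

On the slide's grid this yields three 2×2 Sparse blocks, while the mixed bottom-right quadrant is split down to four single cells.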

Page 17: Differentially Private Multi- Dimensional Time Series

Spatial Estimation Example

• At time k

Original Cell Histogram $f_k(D)$:

1 1 0 0
1 2 1 0
2 3 4 4
3 3 6 10

Partition Histogram $f'_k(D)$: top-left: 5, top-right: 1, bottom-left: 11, bottom-right cells: 4 4 / 6 10

Laplace Perturbed $A'_k(D)$: top-left: 6, top-right: 0, bottom-left: 12, bottom-right cells: 5 3 / 6 11

Estimated Cell Histogram $\hat{f}_k(D)$:

1 1 0 0
1 1 0 0
3 3 5 3
3 3 6 11

Perturbation noise is evenly distributed to every cell within the partition.
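The example can be reproduced with a short sketch that sums cells into partitions and then spreads each perturbed partition count evenly over its cells. The partition layout below is read off the slide's numbers (three 2×2 blocks plus four single cells), and the function names are hypothetical. An even split of the top-left count 6 over four cells gives 1.5, whereas the slide shows 1, so the slide presumably displays integer values; that rounding step is not reproduced here.

```python
def partition_counts(hist, partitions):
    """Sum the cell counts inside each (row, col, size) block."""
    return [
        sum(hist[r][c] for r in range(r0, r0 + s) for c in range(c0, c0 + s))
        for (r0, c0, s) in partitions
    ]


def estimate_cells(noisy_counts, partitions, n):
    """Spread each perturbed partition count evenly over the block's cells."""
    est = [[0.0] * n for _ in range(n)]
    for value, (r0, c0, s) in zip(noisy_counts, partitions):
        share = value / (s * s)
        for r in range(r0, r0 + s):
            for c in range(c0, c0 + s):
                est[r][c] = share
    return est


hist = [[1, 1, 0, 0], [1, 2, 1, 0], [2, 3, 4, 4], [3, 3, 6, 10]]
# Assumed layout: three 2x2 blocks and four single cells in the bottom-right.
partitions = [(0, 0, 2), (0, 2, 2), (2, 0, 2), (2, 2, 1), (2, 3, 1), (3, 2, 1), (3, 3, 1)]

counts = partition_counts(hist, partitions)
noisy = [6, 0, 12, 5, 3, 6, 11]  # perturbed partition counts taken from the slide
estimate = estimate_cells(noisy, partitions, n=4)
for row in estimate:
    print(row)
```

Since each partition receives a single noise draw, a block of four cells carries a quarter of that noise per cell, which is how grouping sparse cells reduces the per-cell error.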

Page 18: Differentially Private Multi- Dimensional Time Series

Evaluation: Data

• Generated moving objects on a road network

• City of Oldenburg, Germany

• 500K objects at the beginning

• 25K new objects at every timestamp

• total time: 100 timestamps

• Two-dimensional 1024 by 1024 grid over the city map

• Each cell represents 400 m²

• Record object locations at cell resolution

• 95% of cells are labeled Sparse!

http://iapg.jade-hs.de/personen/brinkhoff/generator/

Page 19: Differentially Private Multi- Dimensional Time Series

Temporal Estimation

[Figure: cell count (y-axis, −500 to 400) over time (x-axis, timestamps 1 to 51) for a single cell, comparing three series: orig, Laplace, Kalman]

Page 20: Differentially Private Multi- Dimensional Time Series

Spatial Partitions

[Figure panels: Oldenburg Road Network; Partitions by QuadTree]

Page 21: Differentially Private Multi- Dimensional Time Series

Evaluation: Utility vs. Privacy

• Utility of each cell: Average Relative Error of released series

• For each 𝛼 value, median utility among each class is plotted


DFT: Rastogi and Nath, SIGMOD’10
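The utility metric can be sketched as follows. The slide does not give the exact formula, so this assumes one common form of average relative error, in which the denominator is floored at a small constant so that zero counts do not divide by zero. The `floor` parameter and both function names are assumptions made for illustration.

```python
from statistics import median


def average_relative_error(true_series, released_series, floor=1.0):
    """Mean over time of |released - true| / max(true, floor) for one cell."""
    if not true_series or len(true_series) != len(released_series):
        raise ValueError("series must be non-empty and of equal length")
    total = sum(
        abs(r - x) / max(x, floor) for x, r in zip(true_series, released_series)
    )
    return total / len(true_series)


def median_error_by_class(errors, labels):
    """Median per-cell error within each class label (e.g. 'S' and 'D')."""
    grouped = {}
    for err, label in zip(errors, labels):
        grouped.setdefault(label, []).append(err)
    return {label: median(values) for label, values in grouped.items()}


# Two toy cells with made-up released values.
cell_errors = [
    average_relative_error([2, 1], [1, 0]),
    average_relative_error([10, 12], [11, 12]),
]
print(median_error_by_class(cell_errors, ["S", "D"]))
```

Taking the median within each class keeps a few badly perturbed cells from dominating the plotted value.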

Page 22: Differentially Private Multi- Dimensional Time Series

Evaluation: Range Queries

• How many objects are in the area of m by m cells at every timestamp?

• For each m, 100 areas are randomly selected and evaluated.
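A range query of this kind reduces to summing an m by m block of a released histogram. The sketch below is illustrative only: `range_query` and `sample_areas` are hypothetical names, and uniform sampling of the block's top-left corner is an assumption about how the random areas might be drawn.

```python
import random


def range_query(hist, top, left, m):
    """Count objects in the m-by-m block whose top-left cell is (top, left)."""
    return sum(hist[r][c] for r in range(top, top + m) for c in range(left, left + m))


def sample_areas(n, m, count, rng):
    """Pick `count` random top-left corners so each m-by-m block fits in an n-by-n grid."""
    return [(rng.randrange(n - m + 1), rng.randrange(n - m + 1)) for _ in range(count)]


# The 4x4 histogram used in the earlier spatial example.
hist = [[1, 1, 0, 0], [1, 2, 1, 0], [2, 3, 4, 4], [3, 3, 6, 10]]
print(range_query(hist, 2, 2, 2))

areas = sample_areas(n=4, m=2, count=100, rng=random.Random(0))
answers = [range_query(hist, top, left, 2) for top, left in areas]
```

Running the same query on the true and the released histograms at every timestamp gives the per-area error that such an evaluation would aggregate.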

Page 23: Differentially Private Multi- Dimensional Time Series

Evaluation: Runtime

• Overall runtime is plotted in milliseconds.

Page 24: Differentially Private Multi- Dimensional Time Series

Conclusion

• Private release is difficult when the time series is long and the data is sparse!

• Domain knowledge can be used for temporal modeling as well as spatial partitioning.

• Output utility is improved with the same privacy guarantee.

• We observe no extra time cost from our solutions.

• Ongoing work:

• Utilize rich information in spatio-temporal data.

• Model learning and parameter learning.

• Contact: [email protected]

• AIMS Group: www.mathcs.emory.edu/aims


Page 25: Differentially Private Multi- Dimensional Time Series

Q&A
