De-anonymization of Mobility Trajectories: Dissecting the Gaps between Theory and Practice Huandong Wang 1 , Chen Gao 1 , Yong Li 1 , Gang Wang 2 , Depeng Jin 1 , Jingbo Sun 3 1 Tsinghua University, China 2 Virginia Tech 3 China Telecom Beijing Research Institute


Post on 27-Jul-2020



Increasing Concern on Privacy/Security

- Anonymized user trajectories are increasingly collected by ISPs
  - High research and business value
- Growing privacy concern
  - ISPs are motivated to monetize or share user trajectory data
- De-anonymization attack
  - How likely is it that users can be de-anonymized in a shared ISP trajectory dataset?


De-anonymization Attack: Theory and Practice

- Appalling theoretical privacy bound
  - 4 location points uniquely re-identify 95% of users [Scientific Report 2013]
- Practical challenge: lack of large real-world ground-truth datasets
  - Small datasets
    - 1717 users in [WWW 2016]
  - Synthesized datasets
    - Parts of the same dataset [TON 2011]

Is this true in practice?


Our Approach: Collect Three Real-world Ground-truth Datasets

Dataset                       Total # Users   Total # Records
ISP                               2,161,500       134,033,750
Weibo App-level                      56,683           239,289
Weibo Check-in (Historical)          10,750           141,131
Weibo Check-in (One-week)               506               873
Dianping App-level                   45,790           107,543

Ground-truth: traces from the same set of users

[Figure: attack from Weibo and Dianping traces on ISP traces]

- ISP dataset (victim dataset)
  - Shanghai, 4/19-4/26, 2016
  - 2 million users
  - Access logs to cellular towers → location traces
- Weibo dataset: one of the largest social networks in China (external information)
- Dianping dataset: "Chinese Yelp" (external information)


How to Obtain the Ground-Truth?

- ISP traces
  - Weibo → check-ins → GPS in URL parameter; Weibo ID in HTTP request
  - Dianping → GPS in URL parameter; Dianping ID in HTTP request
- Ethical approval obtained from Weibo and Dianping
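As an illustration of how an app-level record could be recovered from a logged HTTP request, here is a minimal sketch. The parameter names `uid`, `lat`, and `lon` and the example URL are hypothetical; the slides do not give the actual request formats used by Weibo or Dianping.

```python
from urllib.parse import urlparse, parse_qs


def extract_app_record(url, id_param="uid", lat_param="lat", lon_param="lon"):
    """Return (user_id, lat, lon) parsed from a request URL, or None.

    The parameter names are hypothetical placeholders; a real app uses
    its own names, which would have to be identified per service.
    """
    query = parse_qs(urlparse(url).query)
    try:
        user_id = query[id_param][0]
        lat = float(query[lat_param][0])
        lon = float(query[lon_param][0])
    except (KeyError, ValueError):
        # A field is missing or the coordinates are not numeric.
        return None
    return user_id, lat, lon
```

A request that carries both an app ID and GPS coordinates yields one ground-truth link between an ISP subscriber and an external account.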


De-anonymization Attack: Threat Model

- Anonymized trajectory data published by the ISP
  - Anonymization: replace the user identity with a pseudonym
- Adversary
  - Match the anonymized traces (e.g., ISP traces) against external traces (e.g., Weibo/Dianping traces)
  - Social network has PII → real-world identifier

[Figure: external trajectories vs. anonymized trajectories → similarity score function → candidate trajectories → performance function → attack performance (Top-1, Top-n)]
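The matching pipeline in this threat model can be sketched as follows. This is a minimal illustration, not the authors' code: `rank_candidates`, `top_n_hit`, and the toy `overlap` similarity are hypothetical names, and a trace is assumed to be a list of `(time_bin, location)` records.

```python
def rank_candidates(external_trace, anonymized_traces, similarity):
    """Sort pseudonyms from most to least similar to the external trace.

    anonymized_traces maps pseudonym -> trace; similarity is any function
    that scores a pair of traces (higher means more similar).
    """
    scores = {pid: similarity(external_trace, trace)
              for pid, trace in anonymized_traces.items()}
    return sorted(scores, key=scores.get, reverse=True)


def top_n_hit(ranking, true_pseudonym, n):
    """True if the correct pseudonym is among the first n candidates."""
    return true_pseudonym in ranking[:n]


def overlap(trace_a, trace_b):
    """Toy similarity: number of shared (time_bin, location) records."""
    return len(set(trace_a) & set(trace_b))
```

Any of the similarity functions discussed later in the deck could be passed in place of `overlap`; only the scoring step changes, while ranking and Top-n evaluation stay the same.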


De-anonymization: Theoretical Bound Based on Uniqueness

- Number of points sufficient to uniquely identify a trajectory
- $T_p$: p randomly sampled points
- $A(T_p)$: find all trajectories containing the p points of $T_p$
- Uniqueness: $|A(T_p)| = 1$?

[Figure: uniqueness of ISP trajectories; annotation: 75% unique]

5 points are sufficient to uniquely identify 75% of trajectories! High potential risk of trajectories being de-anonymized!
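The uniqueness test defined on this slide can be sketched in a few lines. The sketch assumes each trajectory is a set of `(time_bin, location)` records and that "containing" means set containment; the function names and the fixed random seed are illustrative choices, not taken from the slides.

```python
import random


def is_unique(target_id, trajectories, p, rng):
    """Sample p points T_p from one trajectory and test |A(T_p)| == 1."""
    points = sorted(trajectories[target_id])  # sorted for reproducible sampling
    sample = set(rng.sample(points, min(p, len(points))))
    # A(T_p): every trajectory that contains all sampled points.
    matches = [uid for uid, traj in trajectories.items() if sample <= traj]
    return len(matches) == 1


def uniqueness(trajectories, p, seed=0):
    """Fraction of trajectories uniquely identified by p sampled points."""
    rng = random.Random(seed)
    hits = sum(is_unique(uid, trajectories, p, rng) for uid in trajectories)
    return hits / len(trajectories)
```

Sweeping `p` and plotting `uniqueness(trajectories, p)` gives a curve of the kind the slide summarizes.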


De-anonymization Attack: Actual Performance

Implement 7 state-of-the-art algorithms

- "Encountering" event
  - POIS [WWW 2016]
  - ME [AIHC 2016]
- Individual user's mobility patterns
  - HMM [IEEE SP 2011]
  - WYCI [WOSN 2014]
  - HIST [TIFS 2016]
- Tolerating temporal/spatial mismatches
  - NFLX [IEEE SP 2008]
  - MSQ [TON 2013]

[Figure: actual performance based on Weibo's app-level trajectories; y-axis: hit-precision]

Maximum hit-precision is only 25%! Far from the privacy bound!


Reasons Behind Underperformance: Algorithms with Best Performance

Existing algorithms tolerating spatio-temporal mismatches have the best performance

- MSQ [TON 2013]
  - Similarity function: square root of distance between trajectories
  - Tolerates spatial mismatches
- NFLX [IEEE SP 2008]
  - Similarity function: minimum time gap between users' visits to the same location
  - Tolerates temporal mismatches
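To make the two similarity ideas concrete, here is a rough sketch based only on the one-line descriptions on this slide. It is not a faithful reimplementation of the published MSQ or NFLX algorithms: the averaging over shared time bins, the exponential decay, and the constant `tau` are assumptions made for illustration.

```python
import math


def msq_style_similarity(ext, anon):
    """Negative mean square root of distance over shared time bins.

    ext, anon: dict time_bin -> (x, y) in planar coordinates (e.g. km).
    The square root dampens large distances, so a few spatially
    mismatched records do not dominate the score.
    """
    shared = ext.keys() & anon.keys()
    if not shared:
        return float("-inf")
    total = sum(math.sqrt(math.dist(ext[t], anon[t])) for t in shared)
    return -total / len(shared)


def nflx_style_similarity(ext, anon, tau=1.0):
    """Score visits to the same location by their minimum time gap.

    ext, anon: lists of (time_in_hours, location_id).
    tau is a hypothetical decay constant: a gap of tau hours scores 1/e.
    """
    score = 0.0
    for t_ext, loc in ext:
        gaps = [abs(t_ext - t) for t, other in anon if other == loc]
        if gaps:
            score += math.exp(-min(gaps) / tau)
    return score
```

The first function still rewards a candidate whose records are near, but not at, the external locations; the second still rewards a candidate who visited the same place at a different time.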


Reasons Behind Underperformance: Large Spatio-Temporal Mismatches

[Figure: spatial and temporal mismatch distributions for app-level (Weibo) and app-level (Dianping) trajectories; marked thresholds: 2 km (annotated >40% for Weibo, >30% for Dianping) and 1 hour (annotated <30% and ≈70%)]

- Spatial mismatches of over 40% of records are ≥ 2 km
- Temporal mismatches of over 30% of records are ≥ 1 hour

Significant time and location mismatches between different datasets!
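Shares like the ones quoted on this slide could be computed along the following lines. The sketch assumes the app-level and ISP records have already been paired; how the authors pair them is not stated in the slides. The function names and the default thresholds (2 km, 1 hour) are illustrative.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, using a mean Earth radius of 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def mismatch_shares(pairs, km_threshold=2.0, hour_threshold=1.0):
    """Return (spatial_share, temporal_share) of mismatched record pairs.

    pairs: list of ((t_hours, lat, lon), (t_hours, lat, lon)), one app
    record and one ISP record per pair (the pairing is assumed given).
    """
    spatial = sum(haversine_km(a[1], a[2], b[1], b[2]) >= km_threshold for a, b in pairs)
    temporal = sum(abs(a[0] - b[0]) >= hour_threshold for a, b in pairs)
    return spatial / len(pairs), temporal / len(pairs)
```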


Potential Reasons behind the Mismatches

- GPS errors
  - GPS-unreachable locations (indoor, underground)
  - Lazy GPS updating mechanisms [UbiComp 2007]
- Deployment of base stations
  - Lower density → larger mismatches
- User behavior
  - 39.9% remote (fake) check-ins [ICWSM 2016]
  - Earn virtual rewards, compete with their friends


Reasons Behind Underperformance: Data Sparsity

The vast majority of users have sparse location records!

[Figure: cumulative distribution function (CDF)]

- Data sparsity => rare "encountering" events! => inaccurate mobility modelling!
- Sparser location records → worse performance


Can we bridge this gap?


Our De-anonymization Method

- 1) Modelling spatio-temporal mismatches: Gaussian Mixture Model (GMM)

  $$P(S(t) \mid L) = \sum_{p=-H_l}^{H_u} \pi(p) \cdot \mathcal{N}\big(S(t) \mid L(t-p), \sigma^2(p)\big)$$

  - Parameters chosen by empirical values or estimated by the EM algorithm
- 2) Modelling users' mobility pattern: Markov model
  - Solving the data sparsity issue: rare "encountering" events
  - Missing locations are estimated by the Markov model
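The mixture likelihood above can be evaluated directly, as in the sketch below. Several points are assumptions not stated on the slide: $S(t)$ is read as the external record in time bin $t$ and $L$ as the ISP trajectory, locations are planar 2-D coordinates, each Gaussian is isotropic, and time bins with no ISP record are skipped. The parameter values in the usage note are made up.

```python
import math


def gaussian_2d(point, mean, var):
    """Isotropic 2-D Gaussian density with variance `var` per axis."""
    d2 = (point[0] - mean[0]) ** 2 + (point[1] - mean[1]) ** 2
    return math.exp(-d2 / (2 * var)) / (2 * math.pi * var)


def gmm_likelihood(s_t, t, isp, weights, variances, h_l, h_u):
    """P(S(t) | L) = sum_{p=-h_l}^{h_u} pi(p) * N(S(t) | L(t-p), sigma^2(p)).

    isp: dict time_bin -> (x, y), the trajectory L.
    weights, variances: dict delay p -> pi(p) and sigma^2(p).
    """
    total = 0.0
    for p in range(-h_l, h_u + 1):
        loc = isp.get(t - p)
        if loc is None:
            # Assumption: a missing ISP record contributes nothing.
            continue
        total += weights[p] * gaussian_2d(s_t, loc, variances[p])
    return total
```

For example, with `weights = {-1: 0.25, 0: 0.5, 1: 0.25}` and unit variances, an external record that coincides with the ISP record of the same time bin scores `0.5 / (2 * math.pi)`; the same record one bin later still scores `0.25 / (2 * math.pi)` instead of zero, which is how the model tolerates temporal mismatch.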


Our De-anonymization Method

- 3) Use location context
  - Solve the data sparsity issue
  - Use aggregated user behavior at locations to infer individual user behavior (location transition probability)
- 4) Use time context
  - "Whether the user is active" is helpful
  - Modelling the user's inactive periods (previously ignored feature)

[Figure: the same user in different datasets, drawn as two rows of time-bins labelled with locations $L_1$, $L_2$, $L_3$; both rows share the same inactive time-bins]
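The time-context idea, that two traces of the same user tend to be inactive in the same time-bins, can be illustrated with a simple agreement score. This is only a toy measure under the assumption that a trace maps time bins to locations; the slides do not specify how the authors model inactive periods, and `activity_agreement` is a hypothetical name.

```python
def activity_agreement(trace_a, trace_b, num_bins):
    """Fraction of time bins in which both traces are active or both inactive.

    trace_a, trace_b: dict time_bin -> location; a bin absent from the
    dict is treated as an inactive period.
    """
    same = sum((t in trace_a) == (t in trace_b) for t in range(num_bins))
    return same / num_bins
```

Note that the score ignores the locations themselves, so it adds information even for time-bins where neither dataset has a record.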


Performance Evaluation

- 7 state-of-the-art algorithms
- Our proposed algorithms: GM-B, GM
- Transferred parameters: GM-B (Trans.)

[Figure: two panels, Weibo's app-level trajectories (annotation: 17%) and Dianping's app-level trajectories (annotation: 15%)]

Our proposed algorithms outperform baselines by over 17%


Summary

- Large-scale ground-truth datasets
  - ISP trajectories with over 2 million users
  - 2 different social networks, 2 different types of external information
- Demonstrate the gaps between theory and practice
  - High theoretical bound
  - Low actual performance
- Bridge the gaps between theory and practice
  - Considering spatio-temporal mismatches, data sparsity, location/time context
  - Improve the performance → confirm our observations


Thank you!

For Data Sample and Code, [email protected]@tsinghua.edu.cn


References

[Scientific Report 2013] Y.-A. De Montjoye, C. A. Hidalgo, M. Verleysen, and V. D. Blondel, "Unique in the crowd: The privacy bounds of human mobility," Scientific Reports, vol. 3, p. 1376, 2013.

[WWW 2016] C. Riederer, Y. Kim, A. Chaintreau, N. Korula, and S. Lattanzi, "Linking users across domains with location data: Theory and validation," in Proc. WWW, 2016.

[AIHC 2016] A. Cecaj, M. Mamei, and F. Zambonelli, "Re-identification and information fusion between anonymized CDR and social network data," Journal of Ambient Intelligence and Humanized Computing, vol. 7, no. 1, pp. 83–96, 2016.

[WOSN 2014] L. Rossi and M. Musolesi, "It's the way you check-in: identifying users in location-based social networks," in Proc. ACM WOSN, 2014.

[TIFS 2016] F. M. Naini, J. Unnikrishnan, P. Thiran, and M. Vetterli, "Where you are is who you are: User identification by matching statistics," IEEE Transactions on Information Forensics and Security (TIFS), vol. 11, no. 2, pp. 358–372, 2016.

[IEEE SP 2008] A. Narayanan and V. Shmatikov, "Robust de-anonymization of large sparse datasets," in Proc. IEEE SP, 2008.

[IEEE SP 2011] R. Shokri, G. Theodorakopoulos, J.-Y. Le Boudec, and J.-P. Hubaux, "Quantifying location privacy," in Proc. IEEE SP, 2011.

[TON 2013] C. Y. Ma, D. K. Yau, N. K. Yip, and N. S. Rao, "Privacy vulnerability of published anonymous mobility traces," IEEE/ACM Transactions on Networking (TON), vol. 21, no. 3, pp. 720–733, 2013.

[UbiComp 2007] N. Banerjee, A. Rahmati, M. Corner, S. Rollins, and L. Zhong, "Users and batteries: interactions and adaptive energy management in mobile systems," in Proc. ACM UbiComp, 2007.

[ICWSM 2016] G. Wang, S. Y. Schoenebeck, H. Zheng, and B. Y. Zhao, "'Will check in for badges': Understanding bias and misbehavior on location-based social networks," in Proc. ICWSM, 2016.


Metric of the Ranking

- Hit-precision:
  - If the correct trajectory ranks 1st among the candidate trajectories, $h(x) = 1$.
  - If the correct trajectory ranks 3rd among the candidate trajectories, $h(x) = (k-2)/k$.
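The two cases on this slide are consistent with a linear form $h(x) = (k - (x - 1))/k$ for a correct match at rank $x \le k$. The sketch below uses that form; the value 0 for ranks beyond $k$ and the averaging over users are assumptions, since the slide text does not state them.

```python
def hit_precision(rank, k):
    """h(x) = (k - (x - 1)) / k for rank x <= k; assumed 0 beyond the top k."""
    if rank < 1:
        raise ValueError("rank must be at least 1")
    return (k - (rank - 1)) / k if rank <= k else 0.0


def mean_hit_precision(ranks, k):
    """Average hit-precision over the ranks of all users' correct matches."""
    return sum(hit_precision(r, k) for r in ranks) / len(ranks)
```

With `k = 5`, a correct match at rank 1 scores 1, and one at rank 3 scores (5 - 2) / 5, matching the two cases above.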


Performance Evaluation: Parameter Study

Impact of Maximum Tolerant Delay

- Larger tolerant delay => better performance
  - 0 -> 1: significant improvement
  - 12 -> 24: little improvement

Impact of Parameters in GMM

- Comparable performance
  - Empirical vs. estimated
  - Robust to parameter settings


Our De-anonymization Method

- Use location context:
  - Solve the sparsity issue (inaccurate mobility modelling)
  - Use the aggregate user behavior at locations!

[Figure: marginal distribution and transition matrix]
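One way to obtain the aggregate marginal distribution and transition matrix named on this slide is to count visits and consecutive moves over all users. This is a minimal sketch under the assumption that each trajectory is an ordered list of location IDs; `aggregate_context` and `estimate_missing` are hypothetical names, and picking the single most likely next location is a simplification of the Markov-model estimation.

```python
from collections import Counter, defaultdict


def aggregate_context(trajectories):
    """Return (marginal, transition) estimated from all users' sequences.

    marginal: location -> share of all visits.
    transition: source -> {destination -> probability}.
    """
    visits = Counter()
    moves = defaultdict(Counter)
    for seq in trajectories:
        visits.update(seq)
        for src, dst in zip(seq, seq[1:]):
            moves[src][dst] += 1
    total = sum(visits.values())
    marginal = {loc: n / total for loc, n in visits.items()}
    transition = {src: {dst: n / sum(c.values()) for dst, n in c.items()}
                  for src, c in moves.items()}
    return marginal, transition


def estimate_missing(prev_loc, marginal, transition):
    """Most likely next location; falls back to the marginal distribution."""
    dist = transition.get(prev_loc) or marginal
    return max(dist, key=dist.get)
```

Because the counts pool every user, a user with only a handful of records still gets a usable transition estimate, which is the point of using location context against sparsity.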


Our De-anonymization Method

- Use time context
  - Whether there is a record in each time-bin is also important information (previously ignored feature).

[Figure: the same user in different datasets, drawn as two rows of time-bins labelled with locations $L_1$, $L_2$, $L_3$; both rows share the same inactive time-bins]