TRANSCRIPT
De-anonymization of Mobility Trajectories: Dissecting the Gaps between Theory and Practice
Huandong Wang (1), Chen Gao (1), Yong Li (1), Gang Wang (2), Depeng Jin (1), Jingbo Sun (3)
(1) Tsinghua University, China; (2) Virginia Tech; (3) China Telecom Beijing Research Institute
Increasing Concern on Privacy/Security
- Anonymized user trajectories are increasingly collected by ISPs
  - High research and business value
- Growing privacy concern
  - ISPs are motivated to monetize or share user trajectory data
- De-anonymization attack
  - How likely is it that users can be de-anonymized in a shared ISP trajectory dataset?
De-anonymization Attack: Theory and Practice
- Appalling theoretical privacy bound
  - 4 location points uniquely re-identify 95% of users [ScientificReport2013]
- Practical challenge: lack of large real-world ground-truth datasets
  - Small datasets
    - 1717 users in [WWW2016]
  - Synthesized datasets
    - Parts of the same dataset [TON2013]
Is this true in practice?
Our Approach: Collect Three Real-world Ground-truth Datasets

Dataset                       Total # Users   Total # Records
ISP                           2,161,500       134,033,750
Weibo App-level               56,683          239,289
Weibo Check-in (Historical)   10,750          141,131
Weibo Check-in (One-week)     506             873
Dianping App-level            45,790          107,543
Ground-truth: traces from the same set of users
Figure: Weibo and Dianping traces used to attack the ISP traces
- ISP dataset (victim dataset)
  - Shanghai, 4/19-4/26, 2016
  - 2 million users
  - Access logs to cellular towers -> location traces
- Weibo dataset: one of the largest social networks in China (external information)
- Dianping dataset: “Chinese Yelp” (external information)
How to Obtain the Ground-Truth?
- ISP traces
  - Weibo -> check-ins -> GPS in URL parameter; Weibo ID in HTTP request
  - Dianping -> GPS in URL parameter; Dianping ID in HTTP request
- Ethical approval obtained from Weibo and Dianping
- Anonymized trajectory data published by ISP
  - Anonymization: replace the user identity with a pseudonym
- Adversary
  - Matches the anonymized traces (e.g., ISP traces) against external traces (e.g., Weibo/Dianping traces)
  - The social network has PII -> real-world identifier
De-anonymization Attack: Threat Model
Figure: external trajectories vs. anonymized trajectories -> similarity score function -> candidate trajectories -> performance function -> attack performance (Top 1 ... Top n)
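The threat-model pipeline can be sketched as a scoring step followed by a ranking step. The snippet below is only an illustration: the trajectory encoding as a set of (time bin, location ID) pairs and the `cooccurrence_score` function are assumptions made here, not one of the algorithms evaluated in the talk.

```python
from typing import Callable, Dict, List, Set, Tuple

# Assumed encoding: a trajectory is a set of (time_bin, location_id) records.
Record = Tuple[int, str]
Trajectory = Set[Record]


def cooccurrence_score(external: Trajectory, anonymized: Trajectory) -> float:
    """Placeholder similarity: number of records the two trajectories share."""
    return float(len(external & anonymized))


def top_n_candidates(
    external: Trajectory,
    anonymized: Dict[str, Trajectory],
    n: int,
    score: Callable[[Trajectory, Trajectory], float] = cooccurrence_score,
) -> List[str]:
    """Rank pseudonyms by similarity to the external trajectory, best first.

    Ties are broken by pseudonym so the ranking is deterministic.
    """
    ranked = sorted(
        anonymized,
        key=lambda pseudonym: (-score(external, anonymized[pseudonym]), pseudonym),
    )
    return ranked[:n]
```

Any of the similarity functions discussed later could be passed in as `score`; the ranking step stays the same.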
De-anonymization: Theoretical Bound Based on Uniqueness
- Metric: number of points sufficient to uniquely identify a trajectory
  - $T_p$: p points randomly sampled from a trajectory
  - $A(T_p)$: the set of all trajectories containing the p points of $T_p$
  - Uniqueness: is $|A(T_p)| = 1$?
- Figure: uniqueness of ISP trajectories (annotation: 75% unique)
- 5 points are sufficient to uniquely identify 75% of trajectories!
- High potential risk that trajectories can be de-anonymized!
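The uniqueness test can be written down directly from its definition: sample p points from a trajectory, collect every trajectory that contains all of them, and check whether exactly one remains. This is a minimal sketch; the point encoding and the function names are assumptions, and the brute-force scan over all users would need an inverted index to scale to millions of trajectories.

```python
import random
from typing import Dict, FrozenSet, Optional, Tuple

# Assumed encoding: a point is a (time_bin, location_id) tuple.
Point = Tuple[int, str]


def is_unique(
    user: str,
    trajectories: Dict[str, FrozenSet[Point]],
    p: int,
    rng: random.Random,
) -> Optional[bool]:
    """Sample T_p from the user's trajectory and test |A(T_p)| == 1.

    Returns None when the trajectory has fewer than p points.
    """
    points = trajectories[user]
    if len(points) < p:
        return None
    t_p = set(rng.sample(sorted(points), p))
    # A(T_p): all trajectories that contain every sampled point.
    matches = [u for u, traj in trajectories.items() if t_p <= traj]
    return len(matches) == 1


def uniqueness(
    trajectories: Dict[str, FrozenSet[Point]], p: int, seed: int = 0
) -> float:
    """Fraction of trajectories with at least p points that are unique."""
    rng = random.Random(seed)
    results = [is_unique(u, trajectories, p, rng) for u in trajectories]
    valid = [r for r in results if r is not None]
    return sum(valid) / len(valid) if valid else 0.0
```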
De-anonymization Attack: Actual Performance
Implement 7 state-of-the-art algorithms
- “Encountering” events
  - POIS [WWW2016]
  - ME [AIHC2016]
- Individual user's mobility patterns
  - HMM [IEEESP2011]
  - WYCI [WOSN2014]
  - HIST [TIFS2016]
- Tolerating temporal/spatial mismatches
  - NFLX [IEEESP2008]
  - MSQ [TON2013]
Figure: actual performance based on Weibo's app-level trajectories (metric: hit-precision)
Maximum hit-precision is only 25%! Far from the privacy bound!
Reasons Behind Underperformance: Algorithms with Best Performance
Existing algorithms that tolerate spatio-temporal mismatches have the best performance
- MSQ [TON2013]
  - Similarity function: square root of distance between trajectories
  - Tolerates spatial mismatches
- NFLX [IEEESP2008]
  - Similarity function: minimum time gap between users' visits to the same location
  - Tolerates temporal mismatches
Reasons Behind Underperformance: Large Spatio-Temporal Mismatches
- Spatial mismatches of over 40% of records are ≥ 2 km
- Temporal mismatches of over 30% of records are ≥ 1 hour
Figure: spatial and temporal mismatch distributions for app-level (Weibo) and app-level (Dianping) trajectories. Recoverable annotations: 2 km, >40% (Weibo); 2 km, >30% (Dianping); 1 hour, <30% and ≈ 70%.
Significant time and location mismatches between different datasets!
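The slides do not say how the mismatches were measured. One plausible way, assuming both traces belong to the same user and each record carries a timestamp and coordinates, is to pair every external record with the ISP record closest in time and report the time gap and great-circle distance. The record layout and function names below are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Assumed layout: (unix_time_seconds, latitude_deg, longitude_deg).
Record = Tuple[float, float, float]

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mismatches(external: List[Record], isp: List[Record]) -> List[Tuple[float, float]]:
    """Pair each external record with the ISP record nearest in time.

    Returns (time_gap_hours, distance_km) per external record.
    Assumes isp is non-empty.
    """
    out = []
    for t, lat, lon in external:
        nearest = min(isp, key=lambda r: abs(r[0] - t))
        gap_hours = abs(nearest[0] - t) / 3600.0
        out.append((gap_hours, haversine_km(lat, lon, nearest[1], nearest[2])))
    return out


def fraction_at_least(values: Sequence[float], threshold: float) -> float:
    """Share of values that are >= threshold, e.g. distances >= 2 km."""
    return sum(v >= threshold for v in values) / len(values)
```

With such pairs in hand, `fraction_at_least(distances, 2.0)` and `fraction_at_least(gaps, 1.0)` would give figures comparable in form to the percentages quoted on the slide.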
Potential Reasons behind the Mismatches
- GPS errors
  - GPS-unreachable locations (indoor, underground)
  - Lazy GPS updating mechanisms [UbiComp2007]
- Deployment of base stations
  - Lower density -> larger mismatches
- User behavior
  - 39.9% remote (fake) check-ins [ICWSM2016]
  - Earn virtual rewards, compete with their friends
Reasons Behind Underperformance: Data Sparsity
- The vast majority of users have sparse location records!
- Figure: cumulative distribution function (CDF)
- Data sparsity => rare “encountering” events! => inaccurate mobility modelling!
- Sparser location records -> worse performance
Can we bridge this gap?
Our De-anonymization Method
- 1) Modelling spatio-temporal mismatches: Gaussian Mixture Model (GMM)
    $P(S(t) \mid L) = \sum_{p=-H_l}^{H_u} \pi(p) \cdot \mathcal{N}(S(t) \mid L(t-p), \sigma^2(p))$
  - Parameters chosen by empirical values or estimated by the EM algorithm
- 2) Modelling users' mobility pattern: Markov model
  - Solves the data sparsity issue: rare “encountering” events
  - Missing locations are estimated by the Markov model
- 3) Use location context
  - Solves the data sparsity issue
  - Uses aggregated user behavior at locations to infer individual user behavior (location transition probability)
- 4) Use time context
  - “Whether the user is active” is helpful
  - Models the user's inactive periods (a previously ignored feature)
Figure: time-binned location sequences ($L_1$, $L_2$, $L_3$) of the same user in different datasets, with the same inactive time-bins
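The mixture in step 1 sums, over time shifts p from -H_l to H_u, a weight pi(p) times a Gaussian centred on the ISP location at time t - p. A minimal sketch of evaluating that sum for a single external record follows. Several details are assumptions made here and not taken from the slides: locations are planar (x, y) coordinates in km, the Gaussian is isotropic in 2-D, the ISP trajectory is a dict from time bin to location, and time bins with no ISP record are simply skipped (the talk instead estimates missing locations with a Markov model).

```python
import math
from typing import Dict, Optional, Tuple

Location = Tuple[float, float]  # assumed: planar coordinates in km


def gaussian_2d(s: Location, mean: Location, sigma: float) -> float:
    """Isotropic 2-D normal density N(s | mean, sigma^2 I)."""
    d2 = (s[0] - mean[0]) ** 2 + (s[1] - mean[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)


def mixture_likelihood(
    s_t: Location,
    t: int,
    isp_traj: Dict[int, Location],
    pi: Dict[int, float],
    sigma: Dict[int, float],
    h_l: int,
    h_u: int,
) -> float:
    """P(S(t) | L) = sum over p in [-h_l, h_u] of pi(p) * N(S(t) | L(t - p), sigma(p)^2)."""
    total = 0.0
    for p in range(-h_l, h_u + 1):
        loc: Optional[Location] = isp_traj.get(t - p)
        if loc is None:
            continue  # assumption: skip time bins with no ISP record
        total += pi[p] * gaussian_2d(s_t, loc, sigma[p])
    return total
```

A full trajectory score would combine these per-record likelihoods, for example by summing their logarithms; how the talk's method combines them is not stated on the slide.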
Performance Evaluation
- 7 state-of-the-art algorithms
- Our proposed algorithms: GM-B, GM
- Transferred parameters: GM-B (Trans.)
Figure: Dianping's app-level trajectories (annotation: 15%)
Figure: Weibo's app-level trajectories (annotation: 17%)
Our proposed algorithms outperform baselines by over 17%
Summary
- Large-scale ground-truth datasets
  - ISP trajectories with over 2 million users
  - 2 different social networks, 2 different types of external information
- Demonstrate the gaps between theory and practice
  - High theoretical bound
  - Low actual performance
- Bridge the gaps between theory and practice
  - Considering spatio-temporal mismatches, data sparsity, location/time context
  - Improve the performance -> confirm our observations
References
[ScientificReport2013] Y.-A. De Montjoye, C. A. Hidalgo, M. Verleysen, and V. D. Blondel, “Unique in the crowd: The privacy bounds of human mobility,” Scientific Reports, vol. 3, p. 1376, 2013.
[WWW2016] C. Riederer, Y. Kim, A. Chaintreau, N. Korula, and S. Lattanzi, “Linking users across domains with location data: Theory and validation,” in Proc. WWW, 2016.
[AIHC2016] A. Cecaj, M. Mamei, and F. Zambonelli, “Re-identification and information fusion between anonymized CDR and social network data,” Journal of Ambient Intelligence and Humanized Computing, vol. 7, no. 1, pp. 83–96, 2016.
[WOSN2014] L. Rossi and M. Musolesi, “It's the way you check-in: Identifying users in location-based social networks,” in Proc. ACM WOSN, 2014.
[TIFS2016] F. M. Naini, J. Unnikrishnan, P. Thiran, and M. Vetterli, “Where you are is who you are: User identification by matching statistics,” IEEE Transactions on Information Forensics and Security (TIFS), vol. 11, no. 2, pp. 358–372, 2016.
[IEEESP2008] A. Narayanan and V. Shmatikov, “Robust de-anonymization of large sparse datasets,” in Proc. IEEE SP, 2008.
[IEEESP2011] R. Shokri, G. Theodorakopoulos, J.-Y. Le Boudec, and J.-P. Hubaux, “Quantifying location privacy,” in Proc. IEEE SP, 2011.
[TON2013] C. Y. Ma, D. K. Yau, N. K. Yip, and N. S. Rao, “Privacy vulnerability of published anonymous mobility traces,” IEEE/ACM Transactions on Networking (TON), vol. 21, no. 3, pp. 720–733, 2013.
[UbiComp2007] N. Banerjee, A. Rahmati, M. Corner, S. Rollins, and L. Zhong, “Users and batteries: Interactions and adaptive energy management in mobile systems,” in Proc. ACM UbiComp, 2007.
[ICWSM2016] G. Wang, S. Y. Schoenebeck, H. Zheng, and B. Y. Zhao, “‘Will check-in for badges’: Understanding bias and misbehavior on location-based social networks,” in Proc. ICWSM, 2016.
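Metric of the Ranking
- Hit-precision: h(x), where x is the rank of the correct trajectory among the top-k candidate trajectories
  - If the correct trajectory ranks 1st among the candidate trajectories, h(x) = 1.
  - If the correct trajectory ranks 3rd among the candidate trajectories, h(x) = (k − 2)/k.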
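Both examples on the slide are consistent with h(x) = (k − (x − 1))/k for a correct match at rank x. The sketch below implements that reading; scoring 0 when the correct trajectory falls outside the top k (or is absent) is an assumption, since the slide's formula did not survive extraction.

```python
from typing import Iterable, Optional


def hit_precision(rank: Optional[int], k: int) -> float:
    """h(x) = (k - (x - 1)) / k for 1 <= x <= k.

    Assumption: ranks beyond k, or a missing match (None), score 0.
    """
    if rank is None or rank < 1 or rank > k:
        return 0.0
    return (k - (rank - 1)) / k


def mean_hit_precision(ranks: Iterable[Optional[int]], k: int) -> float:
    """Average hit-precision over all attacked users."""
    ranks = list(ranks)
    return sum(hit_precision(r, k) for r in ranks) / len(ranks)
```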
Performance Evaluation: Parameter Study
Impact of maximum tolerant delay
- Larger tolerant delay => better performance
  - 0 -> 1: significant improvement
  - 12 -> 24: little improvement
Impact of parameters in GMM
- Comparable performance
  - Empirical vs. estimated
  - Robust to parameter settings
Our De-anonymization Method
- Use location context
  - Solves the sparsity issue (inaccurate mobility modelling)
  - Use the aggregate user behavior at locations!
Figure: marginal distribution and transition matrix
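Aggregating behavior across users can be illustrated by pooling every user's location sequence into one visit distribution and one transition table. This is a rough sketch under assumptions made here (location IDs as strings, consecutive records treated as transitions, no smoothing); how the talk's method blends these aggregate statistics into each user's Markov model is not specified on the slide.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Sequence, Tuple

Marginal = Dict[str, float]
Transition = Dict[str, Dict[str, float]]


def fit_location_context(sequences: Iterable[Sequence[str]]) -> Tuple[Marginal, Transition]:
    """Estimate the marginal visit distribution and transition matrix
    from the location sequences of all users pooled together."""
    visits: Counter = Counter()
    pair_counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        visits.update(seq)
        # Consecutive records in a sequence count as one transition.
        for src, dst in zip(seq, seq[1:]):
            pair_counts[src][dst] += 1
    total = sum(visits.values())
    if total == 0:
        return {}, {}
    marginal = {loc: count / total for loc, count in visits.items()}
    transition = {
        src: {dst: c / sum(dsts.values()) for dst, c in dsts.items()}
        for src, dsts in pair_counts.items()
    }
    return marginal, transition
```

A sparse user with only a handful of records could then fall back on `transition` for locations where their own history gives no estimate.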
Our De-anonymization Method
- Use time context
  - Whether there is a record in each time bin is also important information (a previously ignored feature).
Figure: time-binned location sequences ($L_1$, $L_2$, $L_3$) of the same user in different datasets, with the same inactive time-bins
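The intuition that the same user tends to be inactive in the same time bins across datasets can be illustrated with a simple overlap measure. The Jaccard-style score below is a stand-in chosen for illustration only; it is not the probabilistic treatment of inactive periods used in the talk's method, and the function names are hypothetical.

```python
from typing import Iterable, Set


def inactive_bins(active_bins: Iterable[int], num_bins: int) -> Set[int]:
    """Time bins in [0, num_bins) that contain no record."""
    return set(range(num_bins)) - set(active_bins)


def inactive_overlap(a_active: Iterable[int], b_active: Iterable[int], num_bins: int) -> float:
    """Jaccard overlap of the inactive time bins of two traces.

    Returns 1.0 when neither trace has any inactive bin.
    """
    ia = inactive_bins(a_active, num_bins)
    ib = inactive_bins(b_active, num_bins)
    union = ia | ib
    return len(ia & ib) / len(union) if union else 1.0
```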