de-anonymizing social networks
![Page 1: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/1.jpg)
De-anonymizing Social Networks
Presenter: Lijie Zhang
Advisor: Weining Zhang
![Page 2: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/2.jpg)
Outlines
Motivation Attack Model De-anonymization Algorithm Experiments Conclusions
![Page 3: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/3.jpg)
Motivation
Social network (SN) owners publish graph data for sharing:
  Academic and government data mining: phone call networks
  Advertising
  Third-party applications: 550,000 Facebook applications
Private information on SNs:
  Node attributes: node degree in a sexual network
  Edge presence: a single call, a romantic relationship
![Page 4: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/4.jpg)
Motivation
SN owner publishes an anonymized graph: nodes have no identifying attributes.
Propose a model to identify nodes in the anonymized graph:
  Re-identification: learn the entity to which a node belongs.
  Entity: an account, a real person, a group, or an organization.
![Page 5: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/5.jpg)
Outlines
Motivation Attack Model De-anonymization Algorithm Experiments Conclusions
![Page 6: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/6.jpg)
Model – Social Network
Social network S:
  A directed graph G = (V, E)
  A set of node attributes X: name, telephone number
  A set of edge attributes Y: type of relationship
  Attribute values are treated as coming from a discrete domain
![Page 7: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/7.jpg)
Model – Data Release
A sanitized subset of the nodes and edges in S.
Computation:
  Vsan: subset of V
  Xsan: subset of X, including sensitive attributes
  Ysan: subset of Y, including sensitive attributes
  Published attributes by themselves are insufficient for re-identification
  Compute the induced subgraph on Vsan
  Remove some edges and add fake edges

$$S_{san} = \left(V_{san},\ E_{san},\ \{X(v) \mid v \in V_{san},\ X \in X_{san}\},\ \{Y(e) \mid e \in E_{san},\ Y \in Y_{san}\}\right)$$
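The release steps above (induced subgraph, edge removal, fake edges) can be sketched in a few lines of Python. This is only an illustration: the function name `sanitize` and the knobs `remove_frac`, `n_fake`, and `seed` are assumptions made for the example and do not come from the slides, which do not say how many edges are removed or added.

```python
import random

def sanitize(edges, v_san, remove_frac, n_fake, seed=0):
    """Return a perturbed edge set on the subgraph induced by v_san.

    edges:       set of directed (u, v) pairs of the full graph
    v_san:       nodes chosen for release
    remove_frac: fraction of induced edges to drop (hypothetical knob)
    n_fake:      number of fake edges to add (hypothetical knob)
    """
    rng = random.Random(seed)
    v_san = set(v_san)
    # Induced subgraph: keep edges whose endpoints are both released.
    induced_set = {(u, v) for (u, v) in edges if u in v_san and v in v_san}
    induced = sorted(induced_set)
    # Remove a random fraction of the induced edges.
    n_remove = int(remove_frac * len(induced))
    kept = set(rng.sample(induced, len(induced) - n_remove))
    # Add fake edges chosen among node pairs that were not connected.
    candidates = sorted(
        (u, v) for u in v_san for v in v_san
        if u != v and (u, v) not in induced_set
    )
    fake = set(rng.sample(candidates, min(n_fake, len(candidates))))
    return kept | fake
```

Fixing `seed` makes a run repeatable; a real release would also filter the node and edge attributes down to Xsan and Ysan, which this sketch leaves out.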
![Page 8: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/8.jpg)
Model – Attacker
Purpose: extract sensitive information about specific individuals from anonymized SN graphs
Attacker’s knowledge:
  Aggregate auxiliary information
  Individual auxiliary information
![Page 9: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/9.jpg)
Aggregate auxiliary information
Large-scale information from other data sources and social networks whose membership overlaps with the target network Ssan.
  Gaux = (Vaux, Eaux)
  AuxX and AuxY: probability distributions of each node attribute in Vaux and each edge attribute in Eaux, respectively (prior knowledge).
![Page 10: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/10.jpg)
Individual auxiliary information
Identifiable details about a small number of individuals from the target network Ssan and possibly relationships between them
![Page 11: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/11.jpg)
Model – Breaching Privacy
Extract sensitive information about specific individuals from Ssan
Re-identify nodes from the target SN Ssan:
  Re-identification: find a mapping μ between a node in Vaux and a node in Vsan
  μG: ground-truth mapping
  Re-identification succeeds on a node v if

$$\mu(v) = \mu_G(v)$$
![Page 12: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/12.jpg)
Model – Breaching Privacy
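Re-identification algorithm:
  Input: Ssan and Saux
  Output: a probabilistic mapping, where $\tilde{\mu}(v_{aux}, v_{san})$ is the probability that vaux maps to vsan

$$\tilde{\mu} : V_{san} \times (V_{aux} \cup \{\perp\}) \to [0, 1]$$

Mapping adversary:

$$Adv[X, v_{aux}, x] = \frac{\sum_{v \in V_{san},\ X[v] = x} \tilde{\mu}(v_{aux}, v)}{\sum_{v \in V_{san},\ X[v] \neq \perp} \tilde{\mu}(v_{aux}, v)}$$

$$Adv[Y, u_{aux}, v_{aux}, y] = \frac{\sum_{u, v \in V_{san},\ Y[u, v] = y} \tilde{\mu}(u_{aux}, u)\, \tilde{\mu}(v_{aux}, v)}{\sum_{u, v \in V_{san},\ Y[u, v] \neq \perp} \tilde{\mu}(u_{aux}, u)\, \tilde{\mu}(v_{aux}, v)}$$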
![Page 13: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/13.jpg)
Model – Breaching Privacy
Privacy breach: the privacy of vsan is breached w.r.t. adversary Adv and privacy parameter δ if

$$Adv[X, v_{aux}, x] - Aux[X, v_{aux}, x] > \delta$$

or

$$Adv[Y, u_{aux}, v_{aux}, y] - Aux[Y, u_{aux}, v_{aux}, y] > \delta$$
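The node-attribute case of the mapping adversary, together with the breach test, can be sketched as follows. The sketch assumes the probabilistic mapping is stored as a dictionary keyed by `(v_aux, v_san)`, that an undefined attribute value is represented by `None`, and that a breach means the adversary's posterior exceeds the prior Aux by more than the privacy parameter; the function names are illustrative, not from the slides.

```python
def adv_node_attribute(mu, attr, v_aux, x):
    """Adv[X, v_aux, x]: posterior that attribute X of v_aux equals x.

    mu:   dict (v_aux, v_san) -> probability that v_aux maps to v_san
    attr: dict v_san -> value of X, or None when X[v] is undefined
    """
    num = 0.0
    den = 0.0
    for v_san, value in attr.items():
        if value is None:
            continue  # nodes without a value for X are excluded
        p = mu.get((v_aux, v_san), 0.0)
        den += p
        if value == x:
            num += p
    return num / den if den > 0 else 0.0

def is_breached(adv_value, aux_value, delta):
    """Breach test, assuming the condition is Adv - Aux > delta."""
    return adv_value - aux_value > delta

# Example: node "a" maps to s1 with probability 0.6 and to s2 with 0.3.
mu = {("a", "s1"): 0.6, ("a", "s2"): 0.3, ("a", "s3"): 0.1}
city = {"s1": "NY", "s2": "LA", "s3": None}
posterior = adv_node_attribute(mu, city, "a", "NY")
print(is_breached(posterior, aux_value=0.5, delta=0.1))
```

The edge-attribute case follows the same pattern with a double sum over node pairs and the product of the two mapping probabilities.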
![Page 14: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/14.jpg)
Model – Measuring Success of an Attack
Let $V_{mapped} = \{v \in V_{aux} : \mu_G(v) \neq \perp\}$. The success rate of a de-anonymization algorithm outputting a probabilistic mapping $\tilde{\mu}$, w.r.t. a centrality measure $\nu$, is the probability that μ sampled from $\tilde{\mu}$ maps a node v to $\mu_G(v)$ if v is selected according to $\nu$:

$$\frac{\sum_{v \in V_{mapped}} PR[\mu(v) = \mu_G(v)]\ \nu(v)}{\sum_{v \in V_{mapped}} \nu(v)}$$
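For a single sampled mapping μ, the probability term is either 0 or 1, so the success rate reduces to a centrality-weighted fraction of correctly mapped nodes. A minimal sketch, assuming dictionaries for the mapping, the ground truth (with `None` standing for a node that has no counterpart), and the centrality weights; the function name is hypothetical:

```python
def success_rate(mapping, ground_truth, centrality):
    """Centrality-weighted fraction of correctly re-identified nodes.

    mapping:      dict v_aux -> v_san (one sampled mapping)
    ground_truth: dict v_aux -> v_san, or None if v_aux has no counterpart
    centrality:   dict v_aux -> non-negative weight, e.g. node degree
    """
    # Only nodes that have a true counterpart count toward the rate.
    v_mapped = [v for v, target in ground_truth.items() if target is not None]
    total = sum(centrality[v] for v in v_mapped)
    if total == 0:
        return 0.0
    hits = sum(
        centrality[v] for v in v_mapped if mapping.get(v) == ground_truth[v]
    )
    return hits / total
```

With uniform weights this is the plain fraction of correct mappings; weighting by degree rewards getting the well-connected nodes right.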
![Page 15: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/15.jpg)
Outlines
Motivation Attack Model De-anonymization Algorithm Experiments Conclusions
![Page 16: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/16.jpg)
De-anonymization Algorithm
Seed identification: apply individual auxiliary information
Propagation: apply aggregate auxiliary information
![Page 17: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/17.jpg)
Algorithm - Seed Identification
Input:
  The target graph
  A clique of k nodes that are present in both the auxiliary and the target graphs
  The degree values of the k nodes
  $\binom{k}{2}$ pairs of common-neighbor counts
  Error parameter ε
Output: a k-clique with matching (within a factor of 1 ± ε) node degrees and common-neighbor counts.
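A brute-force version of this search can be written directly from the inputs listed above. The sketch below is not the search procedure from the talk, which is not described on the slide; it simply tries every ordered k-tuple of target nodes, so it is only practical for tiny graphs. It assumes an undirected target graph stored as a dictionary of neighbor sets, and the function names are illustrative.

```python
from itertools import permutations

def within(actual, expected, eps):
    """True if actual lies within a factor of 1 ± eps of expected."""
    return (1 - eps) * expected <= actual <= (1 + eps) * expected

def find_seed_clique(adj, degrees, common, eps):
    """Search the target graph for a unique k-clique matching the seeds.

    adj:     dict node -> set of neighbors (undirected target graph)
    degrees: list of k expected degrees, in seed order
    common:  dict (i, j), i < j, -> expected common-neighbor count;
             must contain every pair so that the clique test is complete
    Returns the matching tuple of target nodes, or None if there is no
    match or the match is not unique.
    """
    k = len(degrees)
    matches = []
    for cand in permutations(adj, k):
        if not all(within(len(adj[cand[i]]), degrees[i], eps) for i in range(k)):
            continue
        ok = True
        for (i, j), expected in common.items():
            u, v = cand[i], cand[j]
            # Clique members must be adjacent and share the expected
            # number of neighbors (up to the error parameter).
            if v not in adj[u] or not within(len(adj[u] & adj[v]), expected, eps):
                ok = False
                break
        if ok:
            matches.append(cand)
    return matches[0] if len(matches) == 1 else None
```

Returning `None` on multiple matches reflects the requirement that the seed clique be identified uniquely.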
![Page 18: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/18.jpg)
Algorithm - Propagation
Inputs: G1, G2, and a partial seed mapping μS
Output: a mapping μ
Iteratively find new mappings using the topological structure of the network and the feedback from previously constructed mappings.
![Page 19: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/19.jpg)
Algorithm - Propagation

    function propagationStep(lgraph, rgraph, mapping)
      for lnode in lgraph.nodes:
        scores[lnode] = matchScores(lgraph, rgraph, mapping, lnode)
        if eccentricity(scores[lnode]) < theta: continue
        rnode = (pick node from rgraph.nodes where
                 scores[lnode][node] = max(scores[lnode]))

        scores[rnode] = matchScores(rgraph, lgraph, invert(mapping), rnode)
        if eccentricity(scores[rnode]) < theta: continue
        reverse_match = (pick node from lgraph.nodes where
                         scores[rnode][node] = max(scores[rnode]))
        if reverse_match != lnode: continue

        mapping[lnode] = rnode
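The pseudocode above can be turned into a small runnable Python sketch. Several parts are assumptions: the slide does not define `matchScores`, so `match_scores` below is a simplified stand-in that works on undirected graphs and counts already-mapped neighbors, normalized by the square root of the candidate's degree. The sketch also skips nodes that are already mapped, and treats eccentricity as 0 when fewer than two candidates remain.

```python
import statistics

def eccentricity(values):
    """(max - second max) / standard deviation; 0 when undefined."""
    if len(values) < 2:
        return 0.0
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return 0.0
    top, second = sorted(values, reverse=True)[:2]
    return (top - second) / sigma

def match_scores(lgraph, rgraph, mapping, lnode):
    """Score unmapped right nodes by mapped neighbors shared with lnode.

    Simplified stand-in for matchScores (an assumption, see lead-in).
    Graphs are dicts: node -> set of neighbors.
    """
    mapped_right = set(mapping.values())
    scores = {r: 0.0 for r in rgraph if r not in mapped_right}
    for lnbr in lgraph[lnode]:
        if lnbr not in mapping:
            continue
        for rnode in rgraph[mapping[lnbr]]:
            if rnode in scores:
                scores[rnode] += 1.0 / len(rgraph[rnode]) ** 0.5
    return scores

def best_match(lgraph, rgraph, mapping, node, theta):
    """Best-scoring candidate for node, or None if it does not stand out."""
    scores = match_scores(lgraph, rgraph, mapping, node)
    if not scores or eccentricity(list(scores.values())) < theta:
        return None
    return max(scores, key=scores.get)

def propagation_step(lgraph, rgraph, mapping, theta):
    """One pass over the left graph; extends mapping in place."""
    for lnode in lgraph:
        if lnode in mapping:
            continue
        rnode = best_match(lgraph, rgraph, mapping, lnode, theta)
        if rnode is None:
            continue
        # Reverse check: with the roles swapped, rnode must pick lnode.
        inverse = {r: l for l, r in mapping.items()}
        if best_match(rgraph, lgraph, inverse, rnode, theta) != lnode:
            continue
        mapping[lnode] = rnode
    return mapping
```

In practice the step would be repeated until the mapping stops growing. The reverse check is what keeps a node from being claimed by a candidate that itself prefers a different partner.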
![Page 20: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/20.jpg)
Algorithm - Propagation
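Eccentricity: measures how much the top element of a set X “stands out” from the rest:

$$ecc(X) = \frac{\max(X) - \max_2(X)}{\sigma(X)}$$

where max(X) and max_2(X) are the highest and second-highest values in X, and σ is the standard deviation.
The match is rejected if the eccentricity of the set of mapping scores is below a threshold θ.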
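As a worked example, take the hypothetical score set X = {5, 2, 1, 0} and assume σ is the population standard deviation (the slide does not say which variant is used). The mean is 2, so:

$$\sigma(X) = \sqrt{\frac{(5-2)^2 + (2-2)^2 + (1-2)^2 + (0-2)^2}{4}} = \sqrt{3.5} \approx 1.87$$

$$ecc(X) = \frac{5 - 2}{1.87} \approx 1.60$$

The best candidate would therefore be accepted for any threshold θ up to about 1.6. If the top two scores were tied, the numerator would be 0 and the match would be rejected for any positive θ.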
![Page 21: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/21.jpg)
Algorithm - Propagation
Complexity: O((|E1| + |E2|) d1 d2)
  d1: a bound on the degree of the nodes in V1
  d2: a bound on the degree of the nodes in V2
![Page 22: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/22.jpg)
Outlines
Motivation Attack Model De-anonymization Algorithm Experiments Conclusions
![Page 23: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/23.jpg)
Experiments – Data Sets
Twitter, Flickr, LiveJournal:
![Page 24: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/24.jpg)
Experiments – Seed Identification
Evaluate the feasibility of seed identification by measuring how much auxiliary information is needed to identify a unique node in the target graph.
LiveJournal graph: auxiliary and target
Construct 4-cliques, and treat a 4-clique in the target graph as a match as long as each degree and common-neighbor count matches within a factor of 1 ± ε
![Page 25: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/25.jpg)
Experiments – Seed Identification
![Page 26: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/26.jpg)
Experiments – Propagation
Evaluate the robustness against perturbation and seed selection
Pairs of subgraphs (V1, V2) of a real-world SN, each with over 100,000 nodes
One serves as the auxiliary SN, the other as the target SN
Perturbation strategy: the two subgraphs overlap in 25% of their nodes and 50% of their edges
![Page 27: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/27.jpg)
Evaluate the robustness against perturbation and seed selection
![Page 28: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/28.jpg)
Experiments – Propagation
Mapping between two real-world social networks: Flickr and Twitter
Finding the ground truth μG:
  Exact matches in either the username or the name field
  27,000 mappings
  Human inspection puts the ground-truth error under 5%
![Page 29: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/29.jpg)
Mapping between two real-world social networks
Seeds: 150 pairs of nodes selected from μG
Results:
  30.8% of the mappings were re-identified correctly, 12.1% were identified incorrectly, and 57% were not identified.
  41% of the incorrectly identified mappings (5% overall) were mapped to nodes which are at a distance 1 from the true mapping.
  55% of the incorrectly identified mappings (6.7% overall) were mapped to nodes where the same geographic location was reported.
  The above two categories overlap; of all the incorrect mappings, only 27% (or 3.3% overall) fall into neither category and are completely erroneous.
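The "overall" figures follow from multiplying each share of incorrect mappings by the 12.1% incorrect rate:

$$0.41 \times 12.1\% \approx 5.0\%, \qquad 0.55 \times 12.1\% \approx 6.7\%, \qquad 0.27 \times 12.1\% \approx 3.3\%$$

Since 27% of the incorrect mappings fall into neither category, 73% fall into at least one, and inclusion-exclusion gives the size of the overlap:

$$41\% + 55\% - (100\% - 27\%) = 23\%$$

So roughly 23% of the incorrect mappings (about 2.8% overall) are both at distance 1 from the true node and report the same geographic location.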
![Page 30: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/30.jpg)
Conclusions
Anonymity is not sufficient for privacy when dealing with social networks.
The work demonstrates that successful re-identification is feasible based solely on the network topology, even assuming that the target graph is completely anonymized.
![Page 31: De-anonymizing Social Networks](https://reader035.vdocuments.us/reader035/viewer/2022062521/5681674a550346895ddbfa4e/html5/thumbnails/31.jpg)
Reference
[1] Arvind Narayanan and Vitaly Shmatikov, “De-anonymizing Social Networks”, IEEE Security & Privacy '09.