Global Privacy Guarantee in Serial Data Publishing

Raymond Chi-Wing Wong (1), Ada Wai-Chee Fu (2), Jia Liu (2), Ke Wang (3), Yabo Xu (4)

(1) The Hong Kong University of Science and Technology
(2) The Chinese University of Hong Kong
(3) Simon Fraser University
(4) Sun Yat-sen University

Prepared by Raymond Chi-Wing Wong
Presented by Raymond Chi-Wing Wong

Post on 20-Dec-2015



2

Outline

1. Sequential Releases
2. Related Work (Local Guarantee)
3. Our Proposed Privacy Model
4. Conclusion

3

1. Sequential Releases

Time = 1: Release the data set to public.

Medical Data (Hospital)

Name     PID  Disease
Raymond  p1   Flu
Peter    p2   HIV
Mary     p3   Fever
Alice    p4   HIV
Bob      p5   Flu
John     p6   Fever

Published Data (Public)

Name     PID  Disease
Raymond  p1   Flu
Peter    p2   HIV
Mary     p3   Fever
Alice    p4   HIV
Bob      p5   Flu
John     p6   Fever

This table satisfies some privacy requirements (e.g., m-invariance).

4

1. Sequential Releases

Time = 1: The Medical Data (Hospital) is released to the public as Published Data (Public).

Insertions, deletions and updates are then applied to the Medical Data.

Time = 2: Release the data set to public. This table satisfies some privacy requirements (e.g., m-invariance).

5

1. Sequential Releases

Time = 1 and Time = 2: The Medical Data (Hospital) is released to the public as Published Data (Public).

Insertions, deletions and updates are then applied to the Medical Data.

Time = 3: Release the data set to public. This table satisfies some privacy requirements (e.g., m-invariance).

6

1. Sequential Releases

Time = 1, 2, 3: At each time, the Medical Data (Hospital) is released to the public as Published Data (Public).

Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t.

7

1. Sequential Releases

Medical Data

Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    flu
Alice    F    65015    fever

Chlamydia is a sexually transmitted disease (STD).

Privacy Requirement: Peter would not want anyone to deduce with high confidence from these published data (one or more published datasets) that he has ever contracted chlamydia in the past.

8

1. Sequential Releases

Privacy Requirement: Peter would not want anyone to deduce with high confidence from these published data that he has ever contracted chlamydia (a sexually transmitted disease, STD) in the past.

Privacy Requirement (Global Guarantee): The probability that Peter is linked to chlamydia in one or more published datasets is at most a given threshold (e.g., 1/2).

9

1. Sequential Releases

This global guarantee requirement seems to be quite “obvious” and “natural”

No existing work considers this global guarantee requirement.

Instead, they consider another requirement called local guarantee.


10

1. Sequential Releases

Privacy Requirement (Local Guarantee): The probability that Peter is linked to chlamydia in each published dataset is at most a given threshold (e.g., 1/2).

The probability that Peter is linked to chlamydia in the dataset at time = 1 is at most a given threshold (e.g., 1/2).

The probability that Peter is linked to chlamydia in the dataset at time = 2 is at most a given threshold (e.g., 1/2).

The probability that Peter is linked to chlamydia in the dataset at time = 3 is at most a given threshold (e.g., 1/2).

11

2. Related Work: Local Guarantee

m-invariance: Xiao et al., “m-invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets”, SIGMOD, 2007

l-scarcity: Bu et al., “Privacy Preserving Serial Data Publishing by Role Composition”, VLDB, 2008

12

Contribution

We are the first to propose the global guarantee requirement

We prove that global guarantee is a stronger requirement than local guarantee

13

How can we calculate the probability?

According to the published datasets, we derive a formula based on the possible world analysis.

We skip the details.



15

Property

Theorem: Global guarantee is a stronger privacy requirement than local guarantee.

If the published tables satisfy global guarantee, then they satisfy local guarantee.
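One way to see why the implication holds is the following sketch. The notation is an assumption introduced here, not taken from the slides: E_j is the event that the individual is linked to the sensitive value in the table published at time j, t is the current time, and r is the threshold. Being linked at time j is a special case of being linked in one or more releases, so each local probability is bounded by the global one.

```latex
\[
\Pr[E_j] \;\le\; \Pr\Bigl[\,\bigcup_{i=1}^{t} E_i\Bigr] \;\le\; r
\qquad \text{for every } j \in \{1, \dots, t\}.
\]
```

The second inequality is the global guarantee itself; the first is monotonicity of probability.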

16

Our Algorithm

How can we generate tables such that they satisfy global guarantee?

Idea: Large group size

17

5. Conclusion

We are the first to propose global guarantee

Global guarantee is a stronger privacy requirement than local guarantee.

18

Q&A

19

In the following, I will elaborate on two concepts: Local Guarantee (e.g., m-invariance) and Global Guarantee.

20

Time = 1: Release the data set to public.

Medical Data (Hospital)

Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    flu
Alice    F    65015    fever

Published Data (Public)

Sex  Zipcode  Disease
M    65001    flu
M    65002    chlamydia
F    65014    flu
F    65015    fever

Voter Registration List (Public)

Name     Sex  Zipcode
Raymond  M    65001
Peter    M    65002
Mary     F    65014
Alice    F    65015
Emily    F    65010


22

Generalization

Time = 1: Release the data set to public, after generalization.

Published Data (Public)

Sex  Zipcode  Disease
M    6500*    flu
M    6500*    chlamydia
F    6501*    flu
F    6501*    fever

Each individual is linked to “chlamydia” with probability at most 1/2 in THIS PUBLISHED TABLE.

2-diversity only focuses on ONE-TIME publishing.

2-invariance focuses on MULTIPLE-TIME publishing. It also makes use of the idea of 2-diversity.

Idea: Each individual is linked to “chlamydia” with probability at most 1/2 for each of the MULTIPLE PUBLISHED TABLES.
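The generalization step and the 1/2 bound for a single table can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the helper names `generalize` and `max_link_probability` and the choice of masking the last zipcode digit are assumptions made for this example.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Microdata from the slide: (name, sex, zipcode, disease).
medical_data = [
    ("Raymond", "M", "65001", "flu"),
    ("Peter", "M", "65002", "chlamydia"),
    ("Mary", "F", "65014", "flu"),
    ("Alice", "F", "65015", "fever"),
]

def generalize(record, keep_digits=4):
    """Drop the name and mask the trailing zipcode digits with '*'."""
    _name, sex, zipcode, disease = record
    masked = zipcode[:keep_digits] + "*" * (len(zipcode) - keep_digits)
    return (sex, masked, disease)

def max_link_probability(published):
    """Largest share of any single disease inside one (sex, zipcode) group."""
    groups = defaultdict(list)
    for sex, zipcode, disease in published:
        groups[(sex, zipcode)].append(disease)
    return max(
        Fraction(count, len(diseases))
        for diseases in groups.values()
        for count in Counter(diseases).values()
    )

published = [generalize(r) for r in medical_data]
worst = max_link_probability(published)
print(published)
print(worst)  # the table is 2-diverse in this sense if worst <= 1/2
```

The check looks at one published table only, which is exactly the one-time view that 2-diversity takes.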

23

2-invariance

Signatures of the individuals in the Published Data at Time = 1:

Name     Signature
Raymond  {flu, chlamydia}
Peter    {flu, chlamydia}
Mary     {flu, fever}
Alice    {flu, fever}


27

2-invariance

Time = 2: Release the data set to public.

Medical Data (Hospital)

Name     Sex  Zipcode  Disease
Raymond  M    65001    chlamydia
Peter    M    65002    flu
Mary     F    65014    fever
Emily    F    65010    flu

Published Data (Public)

Sex  Zipcode  Disease
M    6500*    chlamydia
M    6500*    flu
F    6501*    fever
F    6501*    flu

28

2-invariance

Signatures of the individuals in the Published Data at Time = 2:

Name     Signature
Raymond  {flu, chlamydia}
Peter    {flu, chlamydia}
Mary     {flu, fever}
Emily    {flu, fever}

This table satisfies 2-invariance. This is because each individual is linked to the SAME signature.

Idea of 2-invariance: Each individual is linked to the SAME signature in each published table.
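The signature condition can be checked mechanically. The sketch below is only an illustration under assumptions: groups are formed by sex plus the first four zipcode digits (matching the generalized tables on the slides), and the function names are hypothetical. It covers the "same signature" idea only, not the full m-invariance definition from Xiao et al.

```python
from collections import defaultdict

def signatures(medical_data, keep_digits=4):
    """Map each name to the set of diseases published for its group."""
    groups = defaultdict(set)
    key_of = {}
    for name, sex, zipcode, disease in medical_data:
        key = (sex, zipcode[:keep_digits])
        groups[key].add(disease)
        key_of[name] = key
    return {name: frozenset(groups[key]) for name, key in key_of.items()}

def same_signature_across_releases(releases):
    """True if everyone who appears in several releases keeps one signature."""
    seen = {}
    for release in releases:
        for name, sig in signatures(release).items():
            if seen.setdefault(name, sig) != sig:
                return False
    return True

time1 = [
    ("Raymond", "M", "65001", "flu"),
    ("Peter", "M", "65002", "chlamydia"),
    ("Mary", "F", "65014", "flu"),
    ("Alice", "F", "65015", "fever"),
]
time2 = [
    ("Raymond", "M", "65001", "chlamydia"),
    ("Peter", "M", "65002", "flu"),
    ("Mary", "F", "65014", "fever"),
    ("Emily", "F", "65010", "flu"),
]

print(same_signature_across_releases([time1, time2]))
```

Alice and Emily each appear in only one release, so only Raymond, Peter and Mary are actually compared across the two tables.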


30

2-invariance

The public sees the Voter Registration List and the Published Data at Time = 1 and Time = 2.

2-invariance provides the local guarantee: the probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.

Why? Possible World Analysis.

31

Possible World Analysis: the rows of the M, 6500* group in the published data.

Published Data (Time = 1)

Sex  Zipcode  Disease
M    6500*    flu
M    6500*    chlamydia

Published Data (Time = 2)

Sex  Zipcode  Disease
M    6500*    chlamydia
M    6500*    flu

Voter Registration List

Name     Sex  Zipcode
Raymond  M    65001
Peter    M    65002
Mary     F    65014
Alice    F    65015
Emily    F    65010

32

Possible World Analysis

Two assignments are possible for the M, 6500* group:

Sex  Zipcode  Disease
M    65001    flu
M    65002    chlamydia

Sex  Zipcode  Disease
M    65001    chlamydia
M    65002    flu

This is the possible world analysis based on the published table at time = 1 only.

33

Possible World Analysis

The same two assignments are possible for the M, 6500* group at time = 2:

Sex  Zipcode  Disease
M    65001    flu
M    65002    chlamydia

Sex  Zipcode  Disease
M    65001    chlamydia
M    65002    flu

This is the possible world analysis based on the published table at time = 2 only.

34

Possible World Analysis

Combining the two releases gives four possible worlds. Each entry lists the diseases of zipcode 65001 (Raymond) and zipcode 65002 (Peter).

World    Time = 1 (65001, 65002)  Time = 2 (65001, 65002)
World 1  flu, chlamydia           flu, chlamydia
World 2  flu, chlamydia           chlamydia, flu
World 3  chlamydia, flu           flu, chlamydia
World 4  chlamydia, flu           chlamydia, flu

35

Possible World Analysis

In the published data at time = 1, is the second individual (i.e., Peter) linked to chlamydia?

World 1: Yes
World 2: Yes
World 3: No
World 4: No

Prob(the second individual (i.e., Peter) is linked to chlamydia) = 2/4 = 1/2

36

Possible World Analysis

In the published data at time = 2, is the second individual (i.e., Peter) linked to chlamydia?

World 1: Yes
World 2: No
World 3: Yes
World 4: No

Prob(the second individual (i.e., Peter) is linked to chlamydia) = 2/4 = 1/2
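The per-release counting above can be reproduced by brute-force enumeration. This is a small sketch under stated assumptions: all possible worlds are taken to be equally likely, the adversary is assumed to know that Raymond and Peter form the M, 6500* group, and the names `worlds_for_release` and `local_probability` are hypothetical. The order in which worlds are generated need not match the World 1 to World 4 numbering on the slides.

```python
from fractions import Fraction
from itertools import permutations, product

group = ("Raymond", "Peter")          # members of the M, 6500* group
released = [
    ("flu", "chlamydia"),             # diseases published at time = 1
    ("chlamydia", "flu"),             # diseases published at time = 2
]

def worlds_for_release(people, diseases):
    """Every distinct assignment of the published diseases to the group."""
    return [dict(zip(people, p)) for p in set(permutations(diseases))]

# A possible world picks one assignment per release: 2 x 2 = 4 worlds.
per_release = [worlds_for_release(group, d) for d in released]
worlds = list(product(*per_release))

def local_probability(t, person="Peter", disease="chlamydia"):
    """Share of worlds in which `person` has `disease` in release index t."""
    hits = sum(1 for w in worlds if w[t][person] == disease)
    return Fraction(hits, len(worlds))

print(len(worlds))
print(local_probability(0), local_probability(1))
```

Each release is examined on its own here, which is why both values stay at the 2-invariance bound.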


41

2-invariance

2-invariance provides the local guarantee: the probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.

Global Guarantee: The probability that an individual is linked to chlamydia in one or more published datasets is at most 1/2.

Possible World Analysis: Is the second individual (i.e., Peter) linked to chlamydia in one or more published datasets?

World 1: Yes
World 2: Yes
World 3: Yes
World 4: No

Prob(the second individual (i.e., Peter) is linked to chlamydia in one or more published datasets) = 3/4

This value is larger than 1/2.
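The 3/4 figure can be checked by counting the worlds in which Peter is linked to chlamydia at least once. The sketch assumes equally likely worlds and a hypothetical helper name, `global_probability`; it illustrates the example only, not the general formula from the paper.

```python
from fractions import Fraction
from itertools import permutations, product

def global_probability(people, releases, person, disease):
    """Share of possible worlds linking `person` to `disease` in at least one release."""
    # One list of candidate assignments per release.
    per_release = [
        [dict(zip(people, p)) for p in permutations(diseases)]
        for diseases in releases
    ]
    # A possible world chooses one assignment for every release.
    worlds = list(product(*per_release))
    hits = sum(1 for w in worlds if any(a[person] == disease for a in w))
    return Fraction(hits, len(worlds))

p = global_probability(
    people=("Raymond", "Peter"),
    releases=[("flu", "chlamydia"), ("chlamydia", "flu")],
    person="Peter",
    disease="chlamydia",
)
print(p)
```

Only the world in which Peter has flu in both releases escapes the link, so three of the four worlds count.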

42

We have illustrated with an example how to derive the probability that an individual is linked to chlamydia (for both local guarantee and global guarantee).

In fact, the general formula is much more complicated.

43

Theorem: Global guarantee is a stronger privacy requirement than local guarantee.

If the published tables satisfy global guarantee, then they satisfy local guarantee.

44

How can we generate tables such that they satisfy global guarantee?

Idea: Large group size

45

Time = 1: Release the data set to public.

Medical Data (Hospital)

Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    flu
Alice    F    65015    fever

Published Data (Public)

Sex  Zipcode  Disease
M/F  650**    flu
M/F  650**    chlamydia
M/F  650**    flu
M/F  650**    fever

Time = 2: Release the data set to public.

Medical Data (Hospital)

Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    fever
Emily    F    65010    flu

Published Data (Public)

Sex  Zipcode  Disease
M/F  650**    flu
M/F  650**    chlamydia
M/F  650**    fever
M/F  650**    flu

Global Guarantee

Prob(the second individual (i.e., Peter) is linked to chlamydia in one or more published datasets) = 7/16

This value is smaller than 1/2.
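The effect of group size can be sketched with a closed form. The assumptions here are simplifications for this example, not the paper's general formula: each group contains exactly one chlamydia record, every member of a group of size g is equally likely to own it (probability 1/g), and the releases are treated as independent. The function name `global_link_probability` is hypothetical.

```python
from fractions import Fraction

def global_link_probability(group_sizes):
    """1 - prod(1 - 1/g): chance of being linked in at least one release."""
    never_linked = Fraction(1)
    for g in group_sizes:
        never_linked *= 1 - Fraction(1, g)
    return 1 - never_linked

print(global_link_probability([2, 2]))  # groups of size 2 in both releases → 3/4
print(global_link_probability([4, 4]))  # groups of size 4 in both releases → 7/16
```

Under these assumptions, doubling the group size from 2 to 4 is what brings the two-release probability below the 1/2 threshold.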


48

2-invariance (Local Guarantee)

Time = 1: Release the data set to public.

Medical Data (Hospital)

Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    flu
Alice    F    65015    fever

Published Data (Public)

Sex  Zipcode  Disease
M    6500*    flu
M    6500*    chlamydia
F    6501*    flu
F    6501*    fever

Time = 2: Release the data set to public.

Medical Data (Hospital)

Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    fever
Emily    F    65010    flu

Published Data (Public)

Sex  Zipcode  Disease
M    6500*    flu
M    6500*    chlamydia
F    6501*    fever
F    6501*    flu

Prob(the second individual (i.e., Peter) is linked to chlamydia in one or more published datasets) = 3/4

This value is larger than 1/2.