collusion-resistant anonymous data collection method

Collusion-Resistant Anonymous Data Collection Method Mafruz Zaman Ashrafi See-Kiong Ng Institute for Infocomm Research Singapore


Post on 03-Jan-2016



Page 1: Collusion-Resistant Anonymous  Data Collection Method

Collusion-Resistant Anonymous Data Collection Method

Mafruz Zaman Ashrafi, See-Kiong Ng

Institute for Infocomm Research, Singapore

Page 2: Collusion-Resistant Anonymous  Data Collection Method

Introduction

Quality data is a prerequisite for good data mining results.

Collecting good-quality data requires effort and money.

The Internet is a convenient and low-cost platform for large-scale data collection.

Page 3: Collusion-Resistant Anonymous  Data Collection Method

Some Motivating Examples

Page 4: Collusion-Resistant Anonymous  Data Collection Method

Corporate Survey

A large organization wishes to poll its employees for sensitive information.

e.g., how satisfied they are with their bosses’ management skills.

- Individuals need to rate their bosses.

- However, they are afraid of the price to pay for honesty.

Page 5: Collusion-Resistant Anonymous  Data Collection Method

Health Information

A drug company wishes to find out adverse effects of a drug.

e.g., the relationship between the effects of a drug and those of other drugs.

- Patients need to disclose all the drugs they are taking.

- However, disclosing drug info may reveal health condition.

Page 6: Collusion-Resistant Anonymous  Data Collection Method

Traffic Monitoring

Individual drivers wish to avoid roads with problematic conditions.

e.g., finding congested road intersections and other bottlenecks.

- Individuals need to disclose their GPS info.

- However, disclosing GPS info may reveal their current position.

Page 7: Collusion-Resistant Anonymous  Data Collection Method

Introduction (cont’d)

However, collecting data online has its challenges.

Privacy is the number-one concern for online respondents.

Respondents are reluctant to provide truthful information if their privacy is not protected.

Page 8: Collusion-Resistant Anonymous  Data Collection Method

Technical Challenges

Page 9: Collusion-Resistant Anonymous  Data Collection Method

Objective: Online Data Collection

Two Actors: Data Collector and Respondents

- The data collector wants to obtain the responses from a set of respondents.

- The respondents submit honest responses only if the data collector is unable to link a particular response to its respondent.

Page 10: Collusion-Resistant Anonymous  Data Collection Method

Challenges

1. How does the data collector guarantee that it is unable to associate a particular response with the corresponding respondent?

2. How can a collusion attack be mitigated?

3. How can an honest respondent withdraw his response without revealing it to the data collector if he detects a threat to his anonymity?

4. How can we reduce the computational and communication overhead?

Page 11: Collusion-Resistant Anonymous  Data Collection Method

Related Works

1. Randomized Response

- Respondents’ responses are associated with the result of the toss of a coin.

- Only the respondent knows whether the answer reflects the toss of the coin or his true experience.

Pros:

- A well-known technique.

- Easy to use.

Cons:

- Adds noise to the response set, which could distort the accuracy of the data mining results.
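The coin-toss idea can be sketched as follows. This is only an illustration of one common variant (a forced-response design), picked as an assumption because the slide does not say which variant is meant; the function names, the fair coin and the 30% population are all hypothetical.

```python
import random

def randomized_answer(truth: bool, rng: random.Random) -> bool:
    """Heads: report the truth. Tails: report a second coin toss instead."""
    if rng.random() < 0.5:          # first toss comes up heads
        return truth
    return rng.random() < 0.5       # tails: the answer is pure noise

def estimate_true_rate(answers: list[bool]) -> float:
    """Invert P(yes) = 0.5 * pi + 0.25 to estimate the true 'yes' rate pi."""
    observed = sum(answers) / len(answers)
    return 2.0 * (observed - 0.25)

rng = random.Random(0)                       # fixed seed for repeatability
truths = [i < 3000 for i in range(10000)]    # assumed population: 30% true 'yes'
answers = [randomized_answer(t, rng) for t in truths]
estimate = estimate_true_rate(answers)       # close to 0.3, but not exact
```

The estimate recovers the population rate only approximately, which is the accuracy cost listed under Cons.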

Page 12: Collusion-Resistant Anonymous  Data Collection Method

Related Works (cont’d)

2. Cryptographic Techniques

- Respondents employ two sets of keys to encrypt their responses before sending them to the data collector.

- Each respondent in turn strips off a layer of encryption and shuffles the decrypted results.

- All respondents verify the intermediate results before the data collector obtains the actual response set.

Pros:

- A deterministic technique.

- The data mining results are accurate.

Cons:

- Vulnerable to collusion attacks.

- Higher communication overhead.

Page 13: Collusion-Resistant Anonymous  Data Collection Method

Building Blocks of Our Approach

1. ElGamal Crypto

- is an asymmetric public-key encryption scheme.

- is a probabilistic encryption.

- achieves semantic security.

- is malleable.

2. Substitution Cipher

- Replaces a character with another character.

- Example: [figure not preserved]
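The two building blocks can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration: the small prime `P`, the base `G`, the digit permutation in `TABLE`, and the helper names are hypothetical, and the parameters are far too small for real use.

```python
import secrets

P = 2**61 - 1   # a Mersenne prime; toy size only
G = 3           # assumed base; not checked to be a generator

def keygen():
    x = secrets.randbelow(P - 2) + 1             # private key in [1, P-2]
    return x, pow(G, x, P)                       # (private, public)

def encrypt(pub, m):
    r = secrets.randbelow(P - 2) + 1             # fresh randomness per call
    return pow(G, r, P), (m * pow(pub, r, P)) % P

def decrypt(priv, c):
    c1, c2 = c
    return (c2 * pow(c1, P - 1 - priv, P)) % P   # c2 / c1^priv mod P

priv, pub = keygen()
a, b = encrypt(pub, 1234), encrypt(pub, 1234)
# Probabilistic: a and b almost surely differ, yet both decrypt to 1234.
# Malleable: scaling c2 scales the plaintext without knowing the key.
doubled = (a[0], (a[1] * 2) % P)

# Substitution cipher on digits: a fixed permutation of 0-9 (hypothetical key).
TABLE = str.maketrans("0123456789", "3790512846")
INVERSE = str.maketrans("3790512846", "0123456789")
scrambled = "1234".translate(TABLE)              # "7905"
```

Decrypting `doubled` yields 2468, which shows the malleability the slide lists: a ciphertext can be transformed meaningfully without the private key.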

Page 14: Collusion-Resistant Anonymous  Data Collection Method

The Hybrid Model

[Figure: An Onion. The original response is wrapped in an Onion Layer consisting of ElGamal Encryption, Substitution Cipher and ElGamal Encryption.]

- Employs both ElGamal and Substitution Cipher.

- Builds an Onion for a response.

- Removing the encryption layers (De-Onion) yields the original response.

Page 15: Collusion-Resistant Anonymous  Data Collection Method

The Hybrid Model (cont’d)

An example

[Figure: the Onion and De-Onion paths, each labelled with the values 1234567890, 9809364789, 7893456720 and 2901560011; the Onion path starts from the original response and the De-Onion path ends at it.]
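One way an onion layer of the form "ElGamal, substitution, ElGamal" could be built and removed is sketched below. The slides do not say how the substitution is applied to a ciphertext or which keys each step uses, so the choices here are assumptions: the substitution permutes the decimal digits of the inner ciphertext, the outer encryption uses a second, larger toy prime so the substituted value always fits, and all names (`onion`, `de_onion`, `TABLE`) are hypothetical.

```python
import secrets

P1 = 2**61 - 1    # inner ElGamal modulus (Mersenne prime, toy size)
P2 = 2**89 - 1    # outer modulus, large enough to hold any 19-digit number
G = 3             # assumed base for both groups
WIDTH = 19        # decimal digits needed for values below P1
TABLE = str.maketrans("0123456789", "3790512846")     # hypothetical digit key
INVERSE = str.maketrans("3790512846", "0123456789")

def keygen(p):
    x = secrets.randbelow(p - 2) + 1
    return x, pow(G, x, p)                             # (private, public)

def enc(p, pub, m):
    r = secrets.randbelow(p - 2) + 1
    return pow(G, r, p), (m * pow(pub, r, p)) % p

def dec(p, priv, c1, c2):
    return (c2 * pow(c1, p - 1 - priv, p)) % p         # c2 / c1^priv mod p

def onion(pub1, pub2, m):
    c1, c2 = enc(P1, pub1, m)                          # ElGamal encryption
    s = int(str(c2).zfill(WIDTH).translate(TABLE))     # substitution cipher
    d1, d2 = enc(P2, pub2, s)                          # ElGamal encryption
    return c1, d1, d2

def de_onion(priv1, priv2, layer):
    c1, d1, d2 = layer
    s = dec(P2, priv2, d1, d2)                         # ElGamal decryption
    c2 = int(str(s).zfill(WIDTH).translate(INVERSE))   # substitution de-cipher
    return dec(P1, priv1, c1, c2)                      # ElGamal decryption

priv1, pub1 = keygen(P1)
priv2, pub2 = keygen(P2)
recovered = de_onion(priv1, priv2, onion(pub1, pub2, 1234567890))
```

The de-onion steps mirror the onion steps in reverse order, so `recovered` equals the original response.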

Page 16: Collusion-Resistant Anonymous  Data Collection Method

The Protocol

Page 17: Collusion-Resistant Anonymous  Data Collection Method

The Protocol

The Protocol has five phases:

1. Data Preparation

2. Data Submission

3. Anonymization

4. Verification

5. Decryption

Page 18: Collusion-Resistant Anonymous  Data Collection Method

Phase I: Data Preparation

Suppose there are 3 respondents (Alice, Bob and Carol).

Bob’s Data Preparation Process (figure)

- Bob’s Original Response: 1234

- Values labelled in the figure: 8902 (DM’s Pri. key), 2453 (Bob’s Sec. key), 8091 (Alice’s Sec. key), 7609 (Carol’s Sec. key)

- Other intermediate values shown: 6652, 1039, 5436, 7065, 9081, 2309, 2098, 3905

- Bob’s Encrypted Response: dBob = 8893

Page 19: Collusion-Resistant Anonymous  Data Collection Method

Phase I: Data Preparation (cont’d)

Bob also computes a partial intermediate verification code WBob.

[Figure: a grid whose columns and rows are both labelled Bob, Alice and Carol.]

WBob = 6652 4240 7056 b_b

Page 20: Collusion-Resistant Anonymous  Data Collection Method

Phase II: Data Submission

- Each participant submits an encrypted response d and its partial verification code W to the data miner.

The Data Miner

- Computes the verification code ΩC = WBob WAlice WCarol

- Encrypts ΩC using its secondary key and sends the resulting encrypted value to each participant.

- Shuffles the response set into {d1, d2, d3}.

- Sends {d1, d2, d3} to Carol.

Page 21: Collusion-Resistant Anonymous  Data Collection Method

Phase III: Anonymization

- Carol “de-onions” one layer from each of the responses {d1, d2, d3}, e.g.:

8893 → ElGamal Decryption → 3905 → Substitution De-Cipher → 7056 → ElGamal Decryption → 5607 (d’x)

[Figure label: Intermediate verification]

Page 22: Collusion-Resistant Anonymous  Data Collection Method

Phase III: Anonymization (cont’d)

- … and computes the intermediate verification VCarol.

[Figure: a grid with columns Alice, Bob, Carol and rows Carol, Alice, Bob; entries elided.]

VCarol = 7809 2291 6790 VC

- Shuffles the results into the set {d’y, d’z, d’x}.

- Sends {d’y, d’z, d’x} to the Data Miner.

Page 23: Collusion-Resistant Anonymous  Data Collection Method

Phase III: Anonymization (cont’d)

- The Data Miner sends the randomized set {d’y, d’z, d’x} to the next participant (e.g., Alice).

- Like Carol, Alice “de-onions” one layer from each element of {d’y, d’z, d’x}.

- Computes the intermediate verification.

- Shuffles the results into the set {d’p, d’q, d’r}.

- Sends {d’p, d’q, d’r} to the Data Miner.

Page 24: Collusion-Resistant Anonymous  Data Collection Method

Phase III: Anonymization (cont’d)

- The data miner sends {d’p, d’q, d’r} to the last participant (i.e., Bob), who “de-onions” another layer from this set.

- Bob computes the intermediate verification, shuffles the results into the set S = {d’m, d’n, d’o} and sends S to the data miner.
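The peel-and-shuffle loop of Phase III can be illustrated with a simplified stand-in. The sketch below does not reproduce the slides' onion (it has no substitution cipher, no data miner relay and no verification codes); it swaps in a standard partial-decryption mix over joint-key ElGamal, where each participant strips its own layer, re-randomizes, and shuffles. The prime, the base, the responses and all helper names are assumptions.

```python
import secrets

P = 2**61 - 1     # toy prime modulus
G = 3             # assumed base

def keygen():
    x = secrets.randbelow(P - 2) + 1
    return x, pow(G, x, P)                            # (private, public)

def product(values):
    out = 1
    for v in values:
        out = (out * v) % P
    return out

def encrypt(joint_pub, m):
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (m * pow(joint_pub, r, P)) % P

def peel_and_shuffle(priv, remaining_pub, batch):
    """Remove one participant's layer, re-randomize, and shuffle the batch."""
    out = []
    for c1, c2 in batch:
        c2 = (c2 * pow(c1, P - 1 - priv, P)) % P      # strip this layer
        s = secrets.randbelow(P - 2) + 1              # re-randomize so items
        c1 = (c1 * pow(G, s, P)) % P                  # cannot be matched by c1
        c2 = (c2 * pow(remaining_pub, s, P)) % P
        out.append((c1, c2))
    secrets.SystemRandom().shuffle(out)
    return out

names = ["Carol", "Alice", "Bob"]                     # peeling order on the slides
keys = {n: keygen() for n in names}
responses = [1234, 5678, 9012]                        # made-up responses
joint = product(pub for _, pub in keys.values())
batch = [encrypt(joint, m) for m in responses]

for i, name in enumerate(names):
    remaining = product(keys[n][1] for n in names[i + 1:])   # 1 when none remain
    batch = peel_and_shuffle(keys[name][0], remaining, batch)

revealed = sorted(c2 for _, c2 in batch)              # all layers removed
```

After the last participant, the set of plaintexts is recovered but their order no longer reveals who submitted which response.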

Page 25: Collusion-Resistant Anonymous  Data Collection Method

Phase IV: Verification

- The data miner computes the final secondary encryption value ‘RR’ from S.

- It sends ‘RR’ along with its secondary secret key to all participants.

- Bob, Alice and Carol decrypt the intermediate verification code they received in Phase II.

- They also compute ΩV and check that ΩV = ΩC.

- If the check passes, each of them sends their secondary secret key to the data miner.

Page 26: Collusion-Resistant Anonymous  Data Collection Method

Phase V: Decryption

- The data miner uses the respondents’ secondary keys to strip off the remaining encryption layers from S.

- It then uses its own primary key to strip off the final layer and reveal the original responses {….,1234,…..}.

Page 27: Collusion-Resistant Anonymous  Data Collection Method

Results and Analysis

Page 28: Collusion-Resistant Anonymous  Data Collection Method

Performance Analysis

- Communication Overhead

• Brickell et al. KDD 2006

Page 29: Collusion-Resistant Anonymous  Data Collection Method

Complexity

- Computation

- Respondent’s: O(N)

- Data Miner’s: O(N^2)

- Communication

- Participant’s: O(N)
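One plausible reading of where the computation bounds come from is sketched below. It is a counting exercise under stated assumptions, not a derivation from the paper: each respondent is assumed to remove one layer from each of the N responses in Phase III, and the data miner is assumed to remove N secondary layers plus one primary layer from each response in Phase V. The function name is hypothetical.

```python
def operation_counts(n: int) -> dict[str, int]:
    """Count layer removals for n respondents under the stated assumptions."""
    per_respondent = n        # Phase III: one layer off each of n responses
    data_miner = n * n + n    # Phase V: (n secondary + 1 primary) layers per response
    return {"respondent": per_respondent, "data_miner": data_miner}

counts = {n: operation_counts(n) for n in (3, 10, 100)}
```

Under these assumptions the respondent's count grows linearly and the data miner's quadratically, matching the O(N) and O(N^2) entries.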

Page 30: Collusion-Resistant Anonymous  Data Collection Method

Conclusion

The privacy of individuals is an important issue in online data collection.

Ignoring respondents’ privacy will result in inaccuracy in the data.

Privacy-preserving online data collection must be (i) deterministic and (ii) efficient.

Page 31: Collusion-Resistant Anonymous  Data Collection Method

Conclusion

Deterministic: We employ crypto techniques

Collusion Resistance: We incorporate an onion/de-onion technique (using ElGamal + Substitution) to create a protective layer against collusion

Efficiency: Verification is done on single values instead of entire datasets

Page 32: Collusion-Resistant Anonymous  Data Collection Method

Thank you

Q&A

Page 33: Collusion-Resistant Anonymous  Data Collection Method

The Protocol (cont’d)

Suppose there are 3 respondents (Alice, Bob and Carol).

1. Data Preparation (Bob’s)

[Figure: Bob’s data preparation chain, as far as the labels allow it to be read]

1234 (Bob’s Original Response)
→ DM’s Pri. key → 8902
→ Bob’s Sec. key → 2453
→ Alice’s Sec. key → 8091
→ Carol’s Sec. key → 7609
→ Bob’s Pri. key → 6652 → Substitution Cipher → 1039 → Bob’s Pri. key → 9081
→ Alice’s Pri. key → 4240 → Substitution Cipher → 2094 → Alice’s Pri. key → 5607
→ Carol’s Pri. key → 7056 → Substitution Cipher → 3905 → Carol’s Pri. key → 8893 (Bob’s Encrypted Response, dBob)

- Bob generates a random number θ and computes b_a = g^θ and b_b = g^(θ+7609)

- Bob also generates WBob = 6652 4240 7056 b_b

Page 35: Collusion-Resistant Anonymous  Data Collection Method

Related Works (cont’d)

3. Mix Networks

- Respondents send their responses to an intermediate hop.

- Each hop strips off a layer of encryption, which reveals the next hop’s address, and forwards the result to it.

- The process continues until the response reaches the data collector.

Pros:

- Requires less communication overhead.

Cons:

- A probabilistic approach that works well only if all participants are honest.

- Intermediate hops can collaborate to breach an honest respondent’s anonymity.

Page 36: Collusion-Resistant Anonymous  Data Collection Method

The Hybrid Model (cont’d)

An example

[Figure: Onion and De-Onion of an original response.]

Onion: Original response → ElGamal Encryption → Substitution Cipher → ElGamal Encryption

De-Onion: ElGamal Decryption → Substitution De-cipher → ElGamal Decryption → Original response

Values shown: 1234567890, 9809364789, 7893456720, 2901560011