distributing data for secure data services

Page 1: DISTRIBUTING DATA FOR SECURE DATA SERVICES

DISTRIBUTING DATA FOR SECURE DATA SERVICES

Vignesh Ganapathy, Dilys Thomas, Tomas Feder, Hector Garcia Molina, Rajeev Motwani

March 25, 2011
Stanford, TRDDC, TRUST

Road Map

Motivation for Secure Databases
Distributing Data
  Encryption, Distribution
  Privacy Constraints
  Schema Decomposition
Query Partitioning
  Cost Estimation
  Where and Select clause processing
  Query Decomposition
Experiments
Related Work

Motivation 1: Data Privacy in Enterprises

Health: Personal medical details; Disease history; Clinical research data
Banking: Bank statement; Loan details; Transaction history
Finance: Portfolio information; Credit history; Transaction records; Investment details
Insurance: Claims records; Accident history; Policy details
Outsourcing: Customer data for testing; Remote DB administration; BPO & KPO
Retail Business: Inventory records; Individual credit card details; Audits
Manufacturing: Process details; Blueprints; Production data
Govt. Agencies: Census records; Economic surveys; Hospital records

Motivation 2: Government Regulations

Country: Privacy Legislation
Australia: Privacy Amendment Act of 2000
European Union: Personal Data Protection Directive 1998
Hong Kong: Personal Data (Privacy) Ordinance of 1995
United Kingdom: Data Protection Act of 1998
United States: Security Breach Information Act (S.B. 1386) of 2002; Gramm-Leach-Bliley Act of 1999; Health Insurance Portability and Accountability Act of 1996

Motivation 3: Personal Information

Emails
Searches on Google/Yahoo
Profiles on Social Networking sites
Passwords / Credit Card / Personal information at multiple E-commerce sites / Organizations
Documents on the Computer / Network

Data Privacy

Value disclosure: What is the value of attribute salary of person X?
  Perturbation
  - Privacy Preserving OLAP
Identity disclosure: Is an individual present in the database table?
  Randomization, K-Anonymity, etc.
  - Data for Outsourcing / Research
Linkage disclosure: Linking columns from multiple sites

Losses due to Lack of Privacy: ID-Theft

3% of households in the US affected by ID-Theft

US $5-50B losses/year

UK £1.7B losses/year

AUD $1-4B losses/year

Road Map

Motivation for Secure Databases
Distributing Data
  Encryption, Distribution
  Privacy Constraints
  Schema Decomposition
Query Partitioning
  Cost Estimation
  Where and Select clause processing
  Query Decomposition
Experiments
Related Work

Two Can Keep a Secret: A Distributed Architecture for Secure Database Services

Aggarwal, Bawa, Ganesan, Garcia-Molina, Kenthapadi, Motwani, Srivastava, Thomas, Xu
CIDR 2005

How to distribute data across multiple sites for (1) redundancy and (2) privacy, so that a single compromised site does not lead to data loss.

Cloud Data Services

Data outsourcing growing in popularity
  Cheap, reliable data storage and management
    1 TB for $399 (< $0.5 per GB)
    $5000 - Oracle 10g / SQL Server
    $68k/year DBAdmin
Privacy concerns looming ever larger
  High-profile thefts (often insiders)
    UCLA lost 900k records
    Berkeley lost laptop with sensitive information
    Acxiom, JP Morgan, Choicepoint
    www.privacyrights.org

Present solutions

Application level: Salesforce.com
  On-Demand Customer Relationship Management
  $65/User/Month; $995 / 5 Users / 1 Year
Amazon Elastic Compute Cloud
  1 instance = 1.7 GHz x86 processor, 1.75 GB RAM, 160 GB local disk, 250 Mb/s network bandwidth
  Elastic, Completely controlled, Reliable, Secure
  $0.10 per instance hour
  $0.20 per GB of data in/out of Amazon
  $0.15 per GB-Month of Amazon S3 storage used
Google Apps for your domain
  Small businesses, Enterprise, School, Family or Group

Encryption Based Solution

[Diagram labels: Encrypt, Client, DSP, Client-side Processor, Query Q, Q’, “Relevant Data”, Answer]

Problem: Q’ tends toward “SELECT *”

The Power of Two

[Diagram labels: Client, DSP1, DSP2]

The Power of Two

[Diagram labels: DSP1, DSP2, Client-side Processor, Query Q, Q1, Q2]

Key: Ensure Cost(Q1) + Cost(Q2) ≤ Cost(Q)

Privacy Constraints

SB1386 Privacy
  {Name, SSN}
  {Name, LicenceNo}
  {Name, CaliforniaID}
  {Name, AccountNumber}
  {Name, CreditCardNo, SecurityCode}
are all to be kept private.

A set is private if at least one of its elements is “hidden”. An element in encrypted form counts as hidden.

Techniques for Satisfying Privacy Constraints

Vertical Fragmentation
  Partition attributes across R1 and R2
  E.g., to obey constraint {Name, SSN}, R1 gets Name and R2 gets SSN
  Use tuple IDs for reassembly: R = R1 JOIN R2
Encoding
  One-time Pad
    For each value v, construct a random bit sequence r
    R1 stores v XOR r, R2 stores r
  Deterministic Encryption
    R1 stores EK(v), R2 stores K
    Can detect equality and push selections with equality predicates
  Random addition
    R1 stores v + r, R2 stores r
    Can push aggregate SUM
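The two share-based encodings above can be illustrated with a minimal Python sketch. The function names, the sample salaries, and the bound on the random offset are assumptions made for illustration; the slides do not prescribe an implementation. The sketch shows that neither share alone reveals the value under the one-time pad, and that SUM can be computed per server under random addition and combined at the client.

```python
import secrets


def one_time_pad_split(value: bytes):
    """Split a value into (share for R1, share for R2) = (v XOR r, r)."""
    pad = secrets.token_bytes(len(value))          # random bit sequence r
    masked = bytes(v ^ r for v, r in zip(value, pad))
    return masked, pad


def one_time_pad_join(masked: bytes, pad: bytes) -> bytes:
    """Recover v at the client by XOR-ing the two shares."""
    return bytes(m ^ r for m, r in zip(masked, pad))


def random_addition_split(value: int, bound: int = 2**32):
    """Split an integer into (v + r, r); the bound on r is an assumption."""
    r = secrets.randbelow(bound)
    return value + r, r


# One-time pad: each server sees only one share.
masked, pad = one_time_pad_split(b"94305")
recovered = one_time_pad_join(masked, pad)

# Random addition: each server sums its own column, the client subtracts.
salaries = [52000, 61000, 75000]                   # made-up sample values
shares = [random_addition_split(s) for s in salaries]
sum_at_r1 = sum(v_plus_r for v_plus_r, _ in shares)   # computed at server 1
sum_at_r2 = sum(r for _, r in shares)                 # computed at server 2
total = sum_at_r1 - sum_at_r2                         # combined at the client
```

Only the aggregates cross the network in the SUM case, which is why the slide notes that random addition lets SUM be pushed to the servers.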

Example Schema & Privacy Constraints

An Employee relation: {Name, DoB, Position, Salary, Gender, Email, Telephone, ZipCode}

Privacy Constraints
  {Telephone}, {Email}
  {Name, Salary}, {Name, Position}, {Name, DoB}
  {DoB, Gender, ZipCode}
  {Position, Salary}, {Salary, DoB}

Will use just Vertical Fragmentation and Encoding.

An Employee relation: {Name, DoB, Position, Salary, Gender, Email, Telephone, ZipCode}

Privacy Constraints
  {Telephone}, {Email}
  {Name, Salary}, {Name, Position}, {Name, DoB}
  {DoB, Gender, ZipCode}
  {Position, Salary}, {Salary, DoB}

Decomposed schema
  R1: {TID, Name, Email, Telephone, Gender, Salary}
  R2: {TID, Name, Email, Telephone, DoB, Position, ZipCode}
  Encrypted Attributes E: {Telephone, Email, Name}
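A short Python sketch can check that this decomposition satisfies the listed constraints. It assumes the rule stated earlier in the deck: a constraint is violated only when every attribute of the set is visible in the clear at a single site, and encrypted attributes count as hidden. The function names are hypothetical.

```python
R1 = {"TID", "Name", "Email", "Telephone", "Gender", "Salary"}
R2 = {"TID", "Name", "Email", "Telephone", "DoB", "Position", "ZipCode"}
ENCRYPTED = {"Telephone", "Email", "Name"}

CONSTRAINTS = [
    {"Telephone"}, {"Email"},
    {"Name", "Salary"}, {"Name", "Position"}, {"Name", "DoB"},
    {"DoB", "Gender", "ZipCode"},
    {"Position", "Salary"}, {"Salary", "DoB"},
]


def visible(fragment, encrypted):
    """Attributes a single site can read in the clear."""
    return fragment - encrypted


def violated(constraints, fragments, encrypted):
    """Return the constraints fully exposed at some single site."""
    bad = []
    for constraint in constraints:
        if any(constraint <= visible(f, encrypted) for f in fragments):
            bad.append(constraint)
    return bad


violations = violated(CONSTRAINTS, [R1, R2], ENCRYPTED)
# Without any encryption, the same fragmentation exposes several constraints.
violations_without_encryption = violated(CONSTRAINTS, [R1, R2], set())
```

Running the check with an empty encrypted set shows why Name, Email, and Telephone have to be encoded: fragmentation alone separates {DoB, Gender, ZipCode}, {Position, Salary}, and {Salary, DoB}, but not the constraints involving those three attributes.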

Partitioning, Execution

Partitioning Problem
  Partition to minimize communication cost for a given workload
  Even a simplified version is hard to approximate
  Hill Climbing algorithm, starting from a weighted set cover solution
Query Reformulation and Execution
  Consider only centralized plans
  Algorithm to partition SELECT and WHERE clause predicates between the two partitions

Hill Climbing Approach for Partitioning
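The general shape of a hill-climbing search over placements can be sketched in Python as follows. This is not the authors' algorithm: the workload, the cost function, the move set, and the all-encoded starting point are assumptions chosen to keep the example small (the deck starts from a weighted set cover solution instead). Each attribute is placed at R1, at R2, or encoded (ENC), and a move is accepted only if it keeps every privacy constraint satisfied and strictly lowers the assumed cost.

```python
import itertools

ATTRS = ["Name", "DoB", "Position", "Salary", "Gender", "Email", "Telephone", "ZipCode"]
CONSTRAINTS = [
    {"Telephone"}, {"Email"},
    {"Name", "Salary"}, {"Name", "Position"}, {"Name", "DoB"},
    {"DoB", "Gender", "ZipCode"},
    {"Position", "Salary"}, {"Salary", "DoB"},
]
# Hypothetical workload: (attributes a query touches, query weight).
WORKLOAD = [
    ({"Name", "Salary"}, 5),
    ({"Position", "ZipCode"}, 3),
    ({"DoB", "Gender"}, 2),
]
PLACEMENTS = ("R1", "R2", "ENC")  # ENC = stored in encoded form


def feasible(assign):
    """No single site may hold a whole constraint in the clear."""
    for site in ("R1", "R2"):
        clear = {a for a, p in assign.items() if p == site}
        if any(c <= clear for c in CONSTRAINTS):
            return False
    return True


def cost(assign):
    """Assumed cost: pay per encoded attribute used, plus once if a query spans both sites."""
    total = 0
    for attrs, weight in WORKLOAD:
        placements = [assign[a] for a in attrs]
        total += weight * placements.count("ENC")
        if len(set(placements) - {"ENC"}) == 2:
            total += weight
    return total


def hill_climb(start):
    """Apply single-attribute moves while they stay feasible and reduce cost."""
    current = dict(start)
    improved = True
    while improved:
        improved = False
        for attr, place in itertools.product(ATTRS, PLACEMENTS):
            if current[attr] == place:
                continue
            candidate = {**current, attr: place}
            if feasible(candidate) and cost(candidate) < cost(current):
                current = candidate
                improved = True
    return current


start = {a: "ENC" for a in ATTRS}   # trivially feasible starting point
best = hill_climb(start)
```

Because every accepted move strictly lowers an integer cost, the loop terminates, but only at a local optimum; a different starting point or move set can end at a different decomposition.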

Road Map

Motivation for Secure Databases
Distributing Data
  Encryption, Distribution
  Privacy Constraints
  Schema Decomposition
Query Partitioning
  Cost Estimation
  Where and Select clause processing
  Query Decomposition
Experiments
Related Work

Predicates for cost computation

State Definitions for Bottom Up Evaluation

0: condition clause cannot be pushed to either server
1: condition clause can be pushed to Server 1
2: condition clause can be pushed to Server 2
3: condition clause can be pushed to both servers
4: condition clause can be pushed to either server
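One way to read these states is sketched below in Python. The combination rules for AND and OR are assumptions, not the deck's evaluation tables: state 4 is taken to mean the whole clause can run at either server alone, and state 3 to mean a conjunction splits into one part per server. The fragments are the R1 and R2 attribute sets from the query partitioning example in this deck, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

R1 = {"TID", "Name", "Email", "Telephone", "Gender", "Salary"}
R2 = {"TID", "Email", "Telephone", "DoB", "Position", "ZipCode"}


@dataclass
class Pred:
    attr: str            # a leaf condition on a single attribute


@dataclass
class Node:
    op: str              # "AND" or "OR"
    left: "Tree"
    right: "Tree"


Tree = Union[Pred, Node]

# Servers that can evaluate a clause in full, per state (assumed reading).
SERVERS_FOR_STATE = {0: set(), 1: {1}, 2: {2}, 3: set(), 4: {1, 2}}
STATE_FOR_SERVERS = {
    frozenset(): 0, frozenset({1}): 1, frozenset({2}): 2, frozenset({1, 2}): 4,
}


def state(tree: Tree) -> int:
    """Evaluate the pushdown state of a condition tree bottom up."""
    if isinstance(tree, Pred):
        servers = set()
        if tree.attr in R1:
            servers.add(1)
        if tree.attr in R2:
            servers.add(2)
        return STATE_FOR_SERVERS[frozenset(servers)]
    a, b = state(tree.left), state(tree.right)
    common = SERVERS_FOR_STATE[a] & SERVERS_FOR_STATE[b]
    if common:
        # Some server can evaluate both children, hence the whole clause.
        return STATE_FOR_SERVERS[frozenset(common)]
    if tree.op == "AND" and 0 not in (a, b):
        # Each conjunct goes to its own server.
        return 3
    return 0


name_and_position = Node("AND", Pred("Name"), Pred("Position"))
zip_or_salary = Node("OR", Pred("ZipCode"), Pred("Salary"))
whole_where = Node("AND", name_and_position, zip_or_salary)
```

Under these assumed rules, the conjunction on Name and Position evaluates to state 3, while the disjunction on ZipCode and Salary evaluates to state 0 and has to be applied at the client.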

OR State Evaluation

AND State Evaluation

Query Partitioning

Original Query:
  SELECT Name, DoB, Salary
  FROM R
  WHERE (Name = 'Tom' AND Position = 'Staff') AND (ZipCode = '94305' OR Salary > 60000)

R1: {TID, Name, Email, Telephone, Gender, Salary}
R2: {TID, Email, Telephone, DoB, Position, ZipCode}

Query 1:
  SELECT TID, Name, Salary
  FROM R1
  WHERE Name = 'Tom'

Query 2:
  SELECT TID, DoB, ZipCode
  FROM R2
  WHERE Position = 'Staff'
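The way a client might combine Query 1 and Query 2 can be sketched with Python's built-in sqlite3 module. Two in-memory databases stand in for the two servers; the sample tuples are made up, columns not used by the query are omitted, and Name is kept in plain text for readability even though the deck stores it encrypted. The client joins the two partial results on TID, applies the remaining OR condition that neither server can evaluate alone, and projects the requested columns.

```python
import sqlite3

# Two independent in-memory databases stand in for the two servers.
server1 = sqlite3.connect(":memory:")
server2 = sqlite3.connect(":memory:")
server1.execute("CREATE TABLE R1 (TID INTEGER, Name TEXT, Salary INTEGER)")
server2.execute("CREATE TABLE R2 (TID INTEGER, DoB TEXT, Position TEXT, ZipCode TEXT)")

# Made-up sample tuples for illustration only.
server1.executemany(
    "INSERT INTO R1 VALUES (?, ?, ?)",
    [(1, "Tom", 70000), (2, "Tom", 50000), (3, "Ann", 90000), (4, "Tom", 40000)],
)
server2.executemany(
    "INSERT INTO R2 VALUES (?, ?, ?, ?)",
    [
        (1, "1980-01-01", "Staff", "10001"),
        (2, "1975-05-05", "Staff", "94305"),
        (3, "1990-09-09", "Staff", "94305"),
        (4, "1985-03-03", "Staff", "10001"),
    ],
)

# Each server evaluates only the predicate it can see.
q1 = server1.execute("SELECT TID, Name, Salary FROM R1 WHERE Name = 'Tom'").fetchall()
q2 = server2.execute("SELECT TID, DoB, ZipCode FROM R2 WHERE Position = 'Staff'").fetchall()

# Client side: join on TID, then apply the OR condition and project.
by_tid = {tid: (dob, zipcode) for tid, dob, zipcode in q2}
result = []
for tid, name, salary in q1:
    if tid not in by_tid:
        continue
    dob, zipcode = by_tid[tid]
    if zipcode == "94305" or salary > 60000:
        result.append((name, dob, salary))
result.sort()
```

Query 1 returns Salary and Query 2 returns ZipCode even though the original query does not project ZipCode, because the client needs both values to evaluate the OR condition after the join.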

Distributed Query Plan

Road Map

Motivation for Secure Databases
Distributing Data
  Encryption, Distribution
  Privacy Constraints
  Schema Decomposition
Query Partitioning
  Cost Estimation
  Where and Select clause processing
  Query Decomposition
Experiments
Related Work

Number of Iterations

Performance Gain Experiment

Iterations Vs Privacy Constraints

Papers

[CIDR05] Two Can Keep A Secret.
[SIGMOD05] Privacy Preserving OLAP.
[ICDT05] Anonymizing Tables.
[PODS06] Clustering For Anonymity.
[KDD07] Probabilistic Anonymity.

Thank You!

Acknowledgements: Collaborators

Stanford Privacy Group
TRDDC Privacy Group
PORTIA, TRUST, Google