Two Can Keep a Secret: A Distributed Architecture for Secure Database Services
Gagan Aggarwal, Mayank Bawa, Prasanna Ganesan, Hector Garcia-Molina, Krishnaram Kenthapadi, Rajeev Motwani, Utkarsh Srivastava, Dilys Thomas, Ying Xu
Stanford University

Post on 13-Jan-2016


Page 1

Two Can Keep a Secret: A Distributed Architecture for Secure Database Services

Gagan Aggarwal, Mayank Bawa, Prasanna Ganesan, Hector Garcia-Molina, Krishnaram Kenthapadi, Rajeev Motwani, Utkarsh Srivastava, Dilys Thomas, Ying Xu

Stanford University

Page 2

Motivation

Data outsourcing is growing in popularity
– Cheap, reliable data storage and management

Privacy concerns are looming ever larger
– High-profile thefts (often by insiders)
– Government legislation, e.g., California SB 1386

The cure: Secure Database Service [KC04]
– Outsource to a Database Service Provider (DSP)
– …but the DSP cannot “see” the data

Page 3

The Crypto Approach [HILM02, HIM04, AKSX04]

[Figure: the client encrypts its data and stores it at the DSP. A client-side processor turns query Q into Q’ for the DSP, receives the “relevant data”, and produces the answer.]

Problem: Q’ ≈ “SELECT *”

Page 4

The Power of Two

[Figure: Client, DSP1, DSP2]

Page 5

The Power of Two

[Figure: a client-side processor splits query Q into Q1 for DSP1 and Q2 for DSP2]

Key: Ensure Cost(Q1) + Cost(Q2) ≈ Cost(Q)

Page 6

Agenda

Defining Privacy Requirements
Tools for Database Decomposition
Example
Finding a “good” decomposition
Query reformulation and execution
Open Questions

Page 7

Privacy according to SB 1386

“…first name or first initial and last name in combination with any one or more of the following data elements, when either the name or the data elements are not encrypted:

(1) Social Security Number.

(2) Driver’s license number or California Identification Card number.

(3) Account number, credit or debit card number, in combination with any required security code…”

Page 8

In the language of set theory…

{Name, SSN},
{Name, LicenceNo},
{Name, CaliforniaID},
{Name, AccountNumber},
{Name, CreditCardNo, SecurityCode}
are all to be kept private.

A set is private if at least one of its elements is “hidden”.
– An element in encrypted form is OK

Page 9

Defining a Private Decomposition

Given a set of privacy constraints
– Each constraint is a set of attributes

Adversary knows either R1 or R2
– Insider: views all data and queries at one site

Ensure for each constraint:
– At least one attribute is “opaque” to the adversary
– i.e., neither R1 nor R2 exposes all attributes
– We won’t define “opaque”

[Figure: relation R split into R1 and R2]
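The condition on this slide can be checked mechanically. The following is a minimal sketch, assuming each site is described only by the set of attributes it stores in the clear; an attribute kept only in encoded form is treated as opaque and left out of both sets. The function name and the two example splits are hypothetical, chosen for illustration.

```python
from typing import FrozenSet, Iterable, List, Set

Constraint = FrozenSet[str]


def exposed_constraints(
    constraints: Iterable[Constraint],
    r1_clear: Set[str],
    r2_clear: Set[str],
) -> List[Constraint]:
    """Return every constraint that a single site exposes in full.

    r1_clear and r2_clear hold the attributes each site stores in the clear.
    Encoded attributes are assumed opaque, so they appear in neither set.
    """
    violated = []
    for constraint in constraints:
        # A constraint is broken if one site alone sees all of its attributes.
        if constraint <= r1_clear or constraint <= r2_clear:
            violated.append(constraint)
    return violated


# The SB 1386 constraints as written on the set-theory slide.
SB1386 = [
    frozenset({"Name", "SSN"}),
    frozenset({"Name", "LicenceNo"}),
    frozenset({"Name", "CaliforniaID"}),
    frozenset({"Name", "AccountNumber"}),
    frozenset({"Name", "CreditCardNo", "SecurityCode"}),
]

IDENTIFIERS = {
    "SSN", "LicenceNo", "CaliforniaID",
    "AccountNumber", "CreditCardNo", "SecurityCode",
}

# Hypothetical split 1: Name at one site, every identifier at the other.
split_ok = exposed_constraints(SB1386, {"Name"}, IDENTIFIERS)

# Hypothetical split 2: Name and SSN end up together at the first site.
split_bad = exposed_constraints(SB1386, {"Name", "SSN"}, IDENTIFIERS - {"SSN"})

print(split_ok)
print(split_bad)
```

The check deliberately says nothing about what “opaque” means for an encoded attribute, mirroring the slide's decision to leave that term undefined.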

Page 10

Relation Decomposition

Charter
– Break up “universal” relation R into R1 and R2
– Lossless, privacy-preserving decomposition
– Note: Restriction to “relational” algebra

Tools of the Trade
– Fragmentation
– Encoding
– Semantic Attribute Decomposition
– Noise

“Bury them and nature will take care of the rest”

Page 11

Tools of the Trade: Fragmentation

Horizontal Fragmentation
– R = R1 ∪ R2
– Not too exciting (yet?)

Vertical Fragmentation
– Partition attributes across R1 and R2
– E.g., to obey constraint {Name, SSN}, R1 ← Name, R2 ← SSN
– Use tuple IDs for reassembly: R = R1 JOIN R2
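Vertical fragmentation with tuple IDs can be sketched in a few lines. This is an illustration under simple assumptions: rows are Python dictionaries, tuple IDs are sequential integers, and both helper names are hypothetical. A real deployment would likely want IDs that reveal nothing about insertion order.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def vertical_fragment(
    rows: Sequence[Row], attrs1: Sequence[str], attrs2: Sequence[str]
) -> Tuple[List[Row], List[Row]]:
    """Split each row into two fragments that share a tuple ID."""
    r1, r2 = [], []
    for tid, row in enumerate(rows):
        r1.append({"ID": tid, **{a: row[a] for a in attrs1}})
        r2.append({"ID": tid, **{a: row[a] for a in attrs2}})
    return r1, r2


def join_on_id(r1: Sequence[Row], r2: Sequence[Row]) -> List[Row]:
    """Reassemble R = R1 JOIN R2 on the shared tuple ID."""
    by_id = {row["ID"]: row for row in r2}
    joined = []
    for left in r1:
        right = by_id.get(left["ID"])
        if right is None:
            continue  # no partner fragment, so the tuple is not part of R
        merged = {**left, **right}
        del merged["ID"]  # the ID is bookkeeping, not part of R
        joined.append(merged)
    return joined


# Made-up sample data obeying the constraint {Name, SSN}.
employees = [
    {"Name": "Alice", "SSN": "000-00-0001"},
    {"Name": "Bob", "SSN": "000-00-0002"},
]
r1, r2 = vertical_fragment(employees, ["Name"], ["SSN"])
print(r1)
print(r2)
print(join_on_id(r1, r2) == employees)
```

Rows without a partner are skipped during the join, which is also why noise in the form of dangling tuples does not change the reassembled relation.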

Page 12

Tools of the Trade: Encoding

Encode attribute across both R1 and R2
– Need both parts to reconstruct
– Why? Sensitive attributes, e.g., Email
– Different options with privacy vs. query cost (computation, communication) trade-offs

E.g., One-time Pad
– For each value v, construct random bit sequence r
– R1 ← v XOR r, R2 ← r
– Reconstruction: (v XOR r) XOR r = v
– Perfect privacy. Expensive?
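The one-time pad encoding is small enough to show directly. The sketch below assumes values are byte strings and draws the pad from Python's `secrets` module; the function names are hypothetical. Note that each share still reveals the length of the value unless values are padded to a fixed width first.

```python
import secrets
from typing import Tuple


def otp_split(value: bytes) -> Tuple[bytes, bytes]:
    """Return (v XOR r, r) for a fresh random pad r of the same length as v."""
    pad = secrets.token_bytes(len(value))
    masked = bytes(v ^ p for v, p in zip(value, pad))
    return masked, pad


def otp_combine(masked: bytes, pad: bytes) -> bytes:
    """Reconstruct v as (v XOR r) XOR r."""
    return bytes(m ^ p for m, p in zip(masked, pad))


email = "alice@example.com".encode("utf-8")
share_r1, share_r2 = otp_split(email)  # share_r1 goes to R1, share_r2 to R2
recovered = otp_combine(share_r1, share_r2)
print(recovered.decode("utf-8"))
```

Because neither share can be used in a predicate at its site, any selection on the encoded attribute has to run at the client after reconstruction, which is the cost the slide alludes to.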

Page 13

Tools of the Trade: Encoding (2)

Deterministic Encryption
– R1 ← E_K(v), R2 ← K
– Leaks information, e.g., can detect equality
– Can push selections with equality predicate

Random addition
– R1 ← v + r, R2 ← r
– Can push aggregate SUM
– Problem: Information leak (what is “opaque”?)
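Both encodings on this slide trade some privacy for the ability to push work to the sites. The sketch below illustrates each under stated assumptions. For the deterministic case it uses HMAC-SHA256 as a stand-in for E_K: this is a keyed deterministic token, not a decryptable cipher, so it shows only the equality-matching behaviour. For random addition it uses plain integer addition with a hypothetical bound on r. All names and sample values are made up.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

RANDOM_BOUND = 10**9  # hypothetical range for the random offset r


def det_token(key: bytes, value: str) -> str:
    """Deterministic keyed token: equal values always map to equal tokens."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def additive_split(value: int) -> Tuple[int, int]:
    """Return (v + r, r) for a random offset r."""
    r = secrets.randbelow(RANDOM_BOUND)
    return value + r, r


# Equality selection pushed to the site holding the tokens.
key = secrets.token_bytes(32)
positions = ["Engineer", "Manager", "Engineer"]
r1_tokens = [det_token(key, p) for p in positions]
wanted = det_token(key, "Engineer")  # the client tokenizes the constant
matches = [i for i, token in enumerate(r1_tokens) if token == wanted]

# SUM pushed to both sites: each site adds up its own column.
salaries = [52000, 61000, 75000]
shares = [additive_split(s) for s in salaries]
sum_at_r1 = sum(masked for masked, _ in shares)
sum_at_r2 = sum(r for _, r in shares)
total = sum_at_r1 - sum_at_r2  # the client subtracts the two partial sums

print(matches)
print(total)
```

The equality leak is visible in `r1_tokens`: the first and third entries are identical, so the site can tell that those two rows share a value without knowing what it is.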

Page 14

More Tools of the Trade

Semantic Attribute Decomposition
– Extract “public” data from private attributes
– E.g., area code of PhoneNo, domain name of Email
– Useful for filtering selections, aggregates

Adding Noise
– Add “dangling tuples” to R1 and R2
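Semantic attribute decomposition amounts to splitting a value into a part that can stay in the clear and a part that must be protected. The sketch below is an illustration with made-up data; it assumes email addresses contain an "@" and phone numbers are ten-digit numbers whose first three digits are the area code. Both helper names are hypothetical.

```python
from collections import Counter
from typing import Tuple


def split_email(email: str) -> Tuple[str, str]:
    """Return (private local part, public domain name)."""
    local, _, domain = email.rpartition("@")
    return local, domain


def split_phone(phone: str) -> Tuple[str, str]:
    """Return (public area code, private remainder) for a ten-digit number."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    return digits[:3], digits[3:]


emails = ["alice@stanford.edu", "bob@example.com", "carol@stanford.edu"]

# An aggregate over the public part needs no access to the private part.
domain_counts = Counter(split_email(e)[1] for e in emails)
area_code, rest = split_phone("(650) 555-0100")

print(domain_counts)
print(area_code, rest)
```

Only the private part would then need encoding across R1 and R2, while filters and aggregates on the public part can run at a single site.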

Page 15

Example

An Employee relation: {Name, DoB, Position, Salary, Gender, Email, Telephone, ZipCode}

Privacy Constraints
– {Telephone}, {Email}
– {Name, Salary}, {Name, Position}, {Name, DoB}
– {DoB, Gender, ZipCode}
– {Position, Salary}, {Salary, DoB}

Will use just Vertical Fragmentation and Encoding.

Page 16

Example (2)

Constraints: {Telephone}, {Email}, {Name, Salary}, {Name, Position}, {Name, DoB}, {DoB, Gender, ZipCode}, {Position, Salary}, {Salary, DoB}

Attributes: Name, DoB, Position, Salary, Gender, Email, Telephone, ZipCode

[Figure: assignment of the attributes to R1 and R2, each carrying an ID column. The recoverable labels show Telephone, Email, and Salary alongside R1 and R2; the rest of the layout is lost.]

Page 17

Finding a “Good” Decomposition

Find a decomposition that
– Obeys all privacy constraints
– Minimizes execution cost for a given workload

Complicated optimization problem
– After 3 layers of simplification, NP-hard to approximate!

Multiple heuristics based on min-cuts and set cover
– E.g., (1) Find lots of “efficient” decompositions (2) Modify to obey constraints and pick the best
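For a relation as small as the Employee example, the search space can simply be enumerated. The sketch below is not the min-cut or set-cover heuristic mentioned on the slide; it is an exhaustive search under a toy cost model, assumed here for illustration, that counts encoded attributes. Each attribute is placed at R1, at R2, or encoded across both (marked "ENC").

```python
from itertools import product
from typing import Dict, Optional

ATTRS = ["Name", "DoB", "Position", "Salary", "Gender", "Email", "Telephone", "ZipCode"]

# Privacy constraints from the Employee example.
CONSTRAINTS = [
    frozenset(c)
    for c in (
        {"Telephone"}, {"Email"},
        {"Name", "Salary"}, {"Name", "Position"}, {"Name", "DoB"},
        {"DoB", "Gender", "ZipCode"},
        {"Position", "Salary"}, {"Salary", "DoB"},
    )
]

Placement = Dict[str, str]


def is_private(assign: Placement) -> bool:
    """True if no single site holds every attribute of some constraint in the clear."""
    for constraint in CONSTRAINTS:
        for site in ("R1", "R2"):
            if all(assign[a] == site for a in constraint):
                return False
    return True


def cost(assign: Placement) -> int:
    """Toy cost (an assumption): one unit per encoded attribute."""
    return sum(1 for place in assign.values() if place == "ENC")


def find_best() -> Optional[Placement]:
    """Try all 3^8 placements and keep a cheapest one that is private."""
    best = None
    for placement in product(("R1", "R2", "ENC"), repeat=len(ATTRS)):
        assign = dict(zip(ATTRS, placement))
        if is_private(assign) and (best is None or cost(assign) < cost(best)):
            best = assign
    return best


best = find_best()
print(best)
print(cost(best))
```

Under this toy cost several placements tie for the minimum, so the one printed depends on enumeration order. A workload-aware cost, as the slide calls for, would break such ties by how often attributes are queried together.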

Page 18

Query Reformulation and Execution

Overview
– Take original query Q on R
– Rewrite to get Q1(R1) and Q2(R2)
– Combine results

Key idea
– Take query plan for Q
– Replace R by R1 JOIN R2
– Push down selects, projects, aggregates
– Partition plan into two

Space of plans
– Different ways to do joins: symmetric, semi-joins
– Note: Q2 can depend on result of Q1
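A semi-join plan in which Q2 depends on the result of Q1 can be sketched as follows. The example assumes a hypothetical query, SELECT Position FROM R WHERE Gender = 'F', over a decomposition that keeps Gender in R1 and Position in R2. The data, the function names, and the in-memory lists standing in for the two sites are all made up for illustration.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]

# Made-up fragments sharing tuple IDs.
R1: List[Row] = [
    {"ID": 0, "Name": "Alice", "Gender": "F"},
    {"ID": 1, "Name": "Bob", "Gender": "M"},
    {"ID": 2, "Name": "Carol", "Gender": "F"},
]
R2: List[Row] = [
    {"ID": 0, "Position": "Engineer"},
    {"ID": 1, "Position": "Analyst"},
    {"ID": 2, "Position": "Manager"},
]


def q1(r1: Sequence[Row], gender: str) -> List[object]:
    """Q1 at site 1: the selection is pushed down and only tuple IDs come back."""
    return [row["ID"] for row in r1 if row["Gender"] == gender]


def q2(r2: Sequence[Row], ids: Sequence[object]) -> List[object]:
    """Q2 at site 2: a semi-join on the IDs produced by Q1, then a projection."""
    wanted = set(ids)
    return [row["Position"] for row in r2 if row["ID"] in wanted]


def run_query(gender: str) -> List[object]:
    """Client-side processor: run Q1, feed its result into Q2, return the answer."""
    ids = q1(R1, gender)
    return q2(R2, ids)


print(run_query("F"))
```

A symmetric plan would instead send independent queries to both sites and join the two results on ID at the client, trading more transferred data for one fewer round trip.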

Page 19

Open Questions

Does the idea work? Compare efficiency to an encryption-based scheme
– Need evaluation methodology

Generalize decomposition
– Deal with multiple relations, functional dependencies, normal forms
– Allow attribute replication

Expand space of decompositions
– We only considered simple encoding and vertical fragmentation

Page 20

Open Questions (2)

Details of the 2-database architecture
– How much functionality ends up on client side?
– How to handle other DB functions? Access control? Constraint checking?

How about two logical DBs with disjoint administration?