1
Two Can Keep a Secret: A Distributed Architecture for Secure Database Services
Gagan Aggarwal, Mayank Bawa, Prasanna Ganesan, Hector Garcia-Molina, Krishnaram Kenthapadi,
Rajeev Motwani, Utkarsh Srivastava, Dilys Thomas, Ying Xu
Stanford University
2
Motivation
Data outsourcing growing in popularity
– Cheap, reliable data storage and management
Privacy concerns looming ever larger
– High-profile thefts (often insiders)
– Govt. legislation, e.g., California SB 1386
The Cure: Secure Database Service [KC04]
– Outsource to Database Service Provider (DSP)
– …but DSP cannot “see” data
3
The Crypto Approach [HILM02, HIM04, AKSX04]
[Figure: the Client encrypts its data and stores it at the DSP; a client-side processor rewrites query Q into Q', receives the “relevant data” from the DSP, and computes the answer.]
Problem: Q' ≈ “SELECT *”
4
The Power of Two
[Figure: a Client connected to two providers, DSP1 and DSP2.]
5
The Power of Two
[Figure: a client-side processor splits query Q into Q1, sent to DSP1, and Q2, sent to DSP2.]
Key: Ensure Cost(Q1) + Cost(Q2) ≈ Cost(Q)
6
Agenda
Defining Privacy Requirements
Tools for Database Decomposition
Example
Finding a “good” decomposition
Query reformulation and execution
Open Questions
7
Privacy according to SB 1386
“…first name or first initial and last name in combination with any one or more of the following data elements, when either the name or the data elements are not encrypted:
(1) Social Security Number.
(2) Driver’s license number or California Identification Card number.
(3) Account number, credit or debit card number, in combination with any required security code…”
8
In the language of set theory…
{Name, SSN},
{Name, LicenceNo},
{Name, CaliforniaID},
{Name, AccountNumber},
{Name, CreditCardNo, SecurityCode}
are all to be kept private.
A set is private if at least one of its elements is “hidden”.
– Element in encrypted form ok
9
Defining a Private Decomposition
Given a set of privacy constraints
– Each constraint is a set of attributes
Adversary knows either R1 or R2
– Insider: views all data, queries at site
Ensure for each constraint:
– At least one attribute is “opaque” to adversary
– i.e., neither R1 nor R2 exposes all attributes of the constraint
– We won’t define “opaque”
[Figure: relation R decomposed into R1 and R2.]
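A minimal Python sketch of the privacy test, assuming each relation is modeled only as the set of attribute names it holds in the clear; encoded attributes are left out of both sets, following the slide’s “element in encrypted form ok”. The names `violated` and `is_private_decomposition` are hypothetical, and the example reuses the SB 1386 constraints from the set-theory slide.

```python
from typing import FrozenSet, Iterable, List, Set

Constraint = FrozenSet[str]


def violated(constraints: Iterable[Constraint], visible: Set[str]) -> List[Constraint]:
    """Return the constraints whose attributes are ALL visible in one relation."""
    return [c for c in constraints if c <= visible]


def is_private_decomposition(constraints: Iterable[Constraint],
                             r1: Set[str], r2: Set[str]) -> bool:
    """True if neither R1 nor R2 alone exposes every attribute of any constraint.

    r1 and r2 hold the attribute names each site stores in the clear;
    encoded (encrypted or split) attributes are assumed opaque and omitted.
    """
    constraints = list(constraints)  # may be iterated twice
    return not violated(constraints, r1) and not violated(constraints, r2)


# The SB 1386 constraints from the set-theory slide.
SB1386 = [frozenset(c) for c in (
    {"Name", "SSN"},
    {"Name", "LicenceNo"},
    {"Name", "CaliforniaID"},
    {"Name", "AccountNumber"},
    {"Name", "CreditCardNo", "SecurityCode"},
)]

# Name on one site, all the identifying numbers on the other: private.
print(is_private_decomposition(
    SB1386,
    {"ID", "Name"},
    {"ID", "SSN", "LicenceNo", "CaliforniaID", "AccountNumber",
     "CreditCardNo", "SecurityCode"}))  # True

# Name stored next to SSN violates {Name, SSN}.
print(is_private_decomposition(SB1386, {"ID", "Name", "SSN"}, {"ID", "LicenceNo"}))  # False
```

Because the adversary sees only one relation, each site is checked on its own; co-locating attributes across sites is never an issue unless the same site holds them all.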
10
Relation Decomposition
Charter
– Break up “universal” relation R into R1 and R2
– Lossless, privacy-preserving decomposition
– Note: Restriction to “relational” algebra
Tools of the Trade
– Fragmentation
– Encoding
– Semantic Attribute Decomposition
– Noise
“Bury them and nature will take care of the rest”
11
Tools of the Trade: Fragmentation
Horizontal Fragmentation
– R = R1 ∪ R2
– Not too exciting (yet?)
Vertical Fragmentation
– Partition attributes across R1 and R2
– E.g., to obey constraint {Name, SSN}, R1 ← Name, R2 ← SSN
– Use tuple IDs for reassembly: R = R1 JOIN R2
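A toy sketch of vertical fragmentation with tuple IDs, assuming rows are Python dicts and using a simple hash join for reassembly. The helpers `fragment` and `reassemble` and the sample values are illustrative, not taken from the paper.

```python
from typing import Any, Dict, List, Set, Tuple

Row = Dict[str, Any]


def fragment(rows: List[Row], r1_attrs: Set[str]) -> Tuple[List[Row], List[Row]]:
    """Vertically split rows: r1_attrs go to R1, the rest to R2; both keep a tuple ID."""
    r1: List[Row] = []
    r2: List[Row] = []
    for tid, row in enumerate(rows):
        r1.append({"ID": tid, **{a: v for a, v in row.items() if a in r1_attrs}})
        r2.append({"ID": tid, **{a: v for a, v in row.items() if a not in r1_attrs}})
    return r1, r2


def reassemble(r1: List[Row], r2: List[Row]) -> List[Row]:
    """R = R1 JOIN R2 on the tuple ID (a simple hash join), dropping the ID column."""
    by_id = {row["ID"]: row for row in r2}
    out: List[Row] = []
    for left in r1:
        right = by_id.get(left["ID"])
        if right is None:  # no partner in R2, so the tuple drops out of the join
            continue
        merged = {**left, **right}
        del merged["ID"]
        out.append(merged)
    return out


# Hypothetical data obeying the constraint {Name, SSN}.
R = [{"Name": "Alice", "SSN": "111-11-1111"},
     {"Name": "Bob", "SSN": "222-22-2222"}]
R1, R2 = fragment(R, {"Name"})
print(R1)  # [{'ID': 0, 'Name': 'Alice'}, {'ID': 1, 'Name': 'Bob'}]
print(R2)  # [{'ID': 0, 'SSN': '111-11-1111'}, {'ID': 1, 'SSN': '222-22-2222'}]
print(reassemble(R1, R2) == R)  # True
```

The tuple ID is the only attribute both sites share, which is what makes the decomposition lossless without putting Name and SSN together.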
12
Tools of the Trade: Encoding
Encode attribute across both R1 and R2
– Need both parts to reconstruct
– Why? Sensitive attributes, e.g., Email
– Different options with privacy vs. query cost (computation, communication) trade-offs
E.g., One-time Pad
– For each value v, construct random bit seq. r
– R1 ← v XOR r, R2 ← r
– Reconstruction: (v XOR r) XOR r = v
– Perfect privacy, Expensive?
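The one-time-pad encoding can be sketched with Python’s standard `secrets` module, treating each value as a byte string; `otp_encode` and `otp_decode` are hypothetical helper names.

```python
import secrets
from typing import Tuple


def otp_encode(v: bytes) -> Tuple[bytes, bytes]:
    """Split v into two shares: R1 stores v XOR r, R2 stores the pad r."""
    r = secrets.token_bytes(len(v))  # fresh random pad, same length as v, never reused
    share1 = bytes(a ^ b for a, b in zip(v, r))
    return share1, r


def otp_decode(share1: bytes, r: bytes) -> bytes:
    """Reconstruct v from both shares: (v XOR r) XOR r = v."""
    return bytes(a ^ b for a, b in zip(share1, r))


email = b"alice@example.com"   # hypothetical Email value
s1, s2 = otp_encode(email)     # s1 -> R1, s2 -> R2
print(otp_decode(s1, s2) == email)  # True
```

Each share on its own is a uniformly random byte string (only the length of v leaks), which is where the perfect privacy comes from. The cost is that any predicate on the attribute needs both shares, so the data must be brought together before it can be filtered.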
13
Tools of the Trade: Encoding (2)
Deterministic Encryption
– R1 ← E_K(v), R2 ← K
– Leaks information. E.g., can detect equality
– Can push selections with equality predicate
Random addition
– R1 ← v+r, R2 ← r
– Can push aggregate SUM
– Problem: Information leak (what is “opaque”?)
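A sketch of the two pushdowns on this slide, with hypothetical data. Random addition lets each site compute its partial SUM locally. For deterministic encryption the sketch swaps in HMAC-SHA256 as a deterministic keyed function; it cannot be decrypted, so it illustrates only the equality-matching property (and the equality leak), not a full encryption scheme. The names `add_encode` and `det_tag` are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import List, Tuple


# --- Random addition: R1 stores v + r, R2 stores r ---------------------------
def add_encode(values: List[int], bound: int = 10**6) -> Tuple[List[int], List[int]]:
    """Split each integer v into (v + r, r) with r drawn uniformly from [0, bound)."""
    rs = [secrets.randbelow(bound) for _ in values]
    return [v + r for v, r in zip(values, rs)], rs


salaries = [50_000, 72_000, 61_500]      # hypothetical Salary column
r1_col, r2_col = add_encode(salaries)    # r1_col -> R1, r2_col -> R2
# Each site computes SUM over its own column; the client subtracts.
total = sum(r1_col) - sum(r2_col)
print(total)  # 183500


# --- Deterministic keyed tag: equality selections pushed to R1 --------------
# Stand-in for E_K(v): HMAC-SHA256 is deterministic but cannot be decrypted,
# so this only illustrates that equal plaintexts give equal stored values.
key = secrets.token_bytes(32)            # K, held at R2 in this sketch


def det_tag(v: str) -> str:
    """Deterministic keyed tag of v under K."""
    return hmac.new(key, v.encode(), hashlib.sha256).hexdigest()


positions = ["Engineer", "Manager", "Engineer"]
r1_tags = [det_tag(p) for p in positions]   # stored at R1
# WHERE Position = 'Engineer': compute the tag of the constant once; R1 matches it.
probe = det_tag("Engineer")
print([i for i, t in enumerate(r1_tags) if t == probe])  # [0, 2]
```

Drawing r from a bounded range means v + r still narrows down v: with v ≥ 0, R1 learns that v lies in (v + r - bound, v + r], one concrete form of the leak the slide asks about. Likewise, R1 can see that rows 0 and 2 carry the same tag without ever holding the key.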
14
More Tools of the Trade
Semantic Attribute Decomposition
– Extract “public” data from private attrs.
– E.g., Area code of PhoneNo, Domain name of Email
– Useful for filtering selections, aggregates
Adding Noise
– Add “dangling tuples” to R1 and R2
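A sketch of semantic attribute decomposition and noise, assuming US-style 10-digit phone numbers and plain `local@domain` email addresses; the split functions, sample rows, and noise IDs are all hypothetical. Once the public part (area code, domain) is extracted, a selection such as “employees in area code 650” can be filtered at a single site.

```python
from typing import Any, Dict, List, Tuple


def split_phone(phone: str) -> Tuple[str, str]:
    """Assumes a US 10-digit number; returns (public area code, private remainder)."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    return digits[:3], digits[3:]


def split_email(email: str) -> Tuple[str, str]:
    """Returns (public domain, private local part), splitting at the last '@'."""
    local, _, domain = email.rpartition("@")
    return domain, local


print(split_phone("(650) 555-0100"))          # ('650', '5550100')
print(split_email("alice@cs.stanford.edu"))   # ('cs.stanford.edu', 'alice')

# Adding noise: dangling tuples carry IDs that never occur in the other
# fragment, so R1 JOIN R2 on ID discards them when reassembling.
r1: List[Dict[str, Any]] = [{"ID": 0, "Name": "Alice"}, {"ID": 1, "Name": "Bob"}]
r2: List[Dict[str, Any]] = [{"ID": 0, "AreaCode": "650"}, {"ID": 1, "AreaCode": "415"}]
r1 += [{"ID": 1000, "Name": "Carol"}, {"ID": 1001, "Name": "Dave"}]  # hypothetical noise rows

joined = [{**a, **b} for a in r1 for b in r2 if a["ID"] == b["ID"]]
print(len(r1), len(joined))  # 4 2
```

In this sketch the noise rows simply use unused IDs; a real scheme would also need fake IDs and values that are indistinguishable from genuine ones, otherwise an insider could filter them out.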
15
Example
An Employee relation: {Name, DoB, Position, Salary, Gender, Email, Telephone, ZipCode}
Privacy Constraints
– {Telephone}, {Email}
– {Name, Salary}, {Name, Position}, {Name, DoB}
– {DoB, Gender, ZipCode}
– {Position, Salary}, {Salary, DoB}
Will use just Vertical Fragmentation and Encoding.
16
Example (2)
[Figure: the constraints {Telephone}, {Email}, {Name, Salary}, {Name, Position}, {Name, DoB}, {DoB, Gender, ZipCode}, {Position, Salary}, {Salary, DoB} beside a decomposition of the attributes Name, DoB, Position, Salary, Gender, Email, Telephone, and ZipCode into R1 and R2; Telephone, Email, and Salary appear in both relations (encoded), and each relation carries a tuple ID.]
17
Finding a “Good” Decomposition
Find a decomposition that
– Obeys all privacy constraints
– Minimizes execution cost for a given workload
Complicated optimization problem
– After 3 layers of simplification, NP-hard to approximate!
Multiple heuristics based on min-cuts and set cover
– E.g., (1) Find lots of “efficient” decompositions
(2) Modify to obey constraints and pick the best
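For a schema as small as the Employee example, the search can be done by brute force, which also shows why heuristics are needed for anything larger (3^n placements for n attributes). This sketch is not the min-cut/set-cover heuristic from the slide: it assumes each attribute is placed in R1, in R2, or encoded across both (“E”), and uses a hypothetical cost, the number of encoded attributes, in place of a real workload cost.

```python
from itertools import product
from typing import Dict, Optional

ATTRS = ["Name", "DoB", "Position", "Salary", "Gender", "Email", "Telephone", "ZipCode"]
CONSTRAINTS = [frozenset(c) for c in (
    {"Telephone"}, {"Email"},
    {"Name", "Salary"}, {"Name", "Position"}, {"Name", "DoB"},
    {"DoB", "Gender", "ZipCode"},
    {"Position", "Salary"}, {"Salary", "DoB"},
)]

Placement = Dict[str, str]  # attribute -> "R1", "R2", or "E" (encoded across both)


def is_private(placement: Placement) -> bool:
    """No constraint may be fully visible (in the clear) at either site."""
    for site in ("R1", "R2"):
        visible = {a for a, p in placement.items() if p == site}
        if any(c <= visible for c in CONSTRAINTS):
            return False
    return True


def cost(placement: Placement) -> int:
    """Hypothetical cost: the number of encoded attributes."""
    return sum(p == "E" for p in placement.values())


def best_decomposition() -> Optional[Placement]:
    """Exhaustive search over all 3**len(ATTRS) placements (6561 here)."""
    best: Optional[Placement] = None
    for choice in product(("R1", "R2", "E"), repeat=len(ATTRS)):
        placement = dict(zip(ATTRS, choice))
        if is_private(placement) and (best is None or cost(placement) < cost(best)):
            best = placement
    return best


best = best_decomposition()
print(cost(best))                                    # 3
print(sorted(a for a, p in best.items() if p == "E"))
```

Under these assumptions the minimum is 3 encoded attributes: {Telephone} and {Email} force two encodings, and {Name, Salary}, {Name, Position}, {Position, Salary} form a triangle that cannot be split between two relations, so at least one more attribute must be encoded. Encoding Salary, for example, leaves constraints that R1 and R2 can split.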
18
Query Reformulation and Execution
Overview
– Take original query Q on R
– Rewrite to get Q1(R1) and Q2(R2)
– Combine results
Key idea
– Take query plan for Q
– Replace R by R1 JOIN R2
– Push down selects, projects, aggregates
– Partition plan into two
Space of plans
– Different ways to do joins: symmetric, semi-joins
– Note: Q2 can depend on result of Q1
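A sketch of one semi-join plan for the hypothetical query SELECT SUM(Salary) FROM Employee WHERE Position = 'Engineer', assuming a decomposition with Name in R1, Position in R2, and Salary encoded by random addition (R1 holds Salary + r, R2 holds r). The fragments, the fixed r values, and the function names are made up so the output is reproducible.

```python
from typing import Any, Dict, List, Set, Tuple

Row = Dict[str, Any]

# Hypothetical fragments sharing the tuple ID.
# R1: Name in the clear, Salary encoded as Salary + r.
R1: List[Row] = [
    {"ID": 1, "Name": "Alice", "SalaryPlusR": 50_000 + 17},
    {"ID": 2, "Name": "Bob", "SalaryPlusR": 72_000 + 901},
    {"ID": 3, "Name": "Carol", "SalaryPlusR": 61_500 + 4_242},
]
# R2: Position in the clear, plus the random share r of Salary.
R2: List[Row] = [
    {"ID": 1, "Position": "Engineer", "R": 17},
    {"ID": 2, "Position": "Manager", "R": 901},
    {"ID": 3, "Position": "Engineer", "R": 4_242},
]


def q1_at_r2(r2: List[Row], position: str) -> Tuple[Set[Any], int]:
    """Q1: selection pushed to R2; returns matching IDs and SUM(r) over them."""
    ids = {row["ID"] for row in r2 if row["Position"] == position}
    sum_r = sum(row["R"] for row in r2 if row["ID"] in ids)
    return ids, sum_r


def q2_at_r1(r1: List[Row], ids: Set[Any]) -> int:
    """Q2: semi-join on Q1's IDs, with SUM pushed down to R1."""
    return sum(row["SalaryPlusR"] for row in r1 if row["ID"] in ids)


def sum_salary_where_position(r1: List[Row], r2: List[Row], position: str) -> int:
    """SELECT SUM(Salary) FROM Employee WHERE Position = :position."""
    ids, sum_r = q1_at_r2(r2, position)
    sum_masked = q2_at_r1(r1, ids)   # depends on the result of Q1
    return sum_masked - sum_r        # client-side combination removes the r's


print(sum_salary_where_position(R1, R2, "Engineer"))  # 111500
```

Shipping the ID list from Q1 to R1 is what makes Q2 depend on Q1, and it also reveals to R1 which tuples matched. A plan in which both sites run their queries independently and the client joins the results avoids that disclosure at a higher communication cost.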
19
Open Questions
Does the idea work? Compare efficiency to an encryption-based scheme
– Need evaluation methodology
Generalize decomposition
– Deal with multiple relations, functional dependencies, normal forms
– Allow attribute replication
Expand space of decompositions
– We only considered simple encoding and vert. fragmentation
20
Open Questions (2)
Details of the 2-database architecture
– How much functionality ends up on client side?
– How to handle other DB functions? Access Control? Constraint checking?
How about two logical DBs with disjoint administration?