private join and compute · 2019-06-20 · 1 each pa!y has its own data city and...

1
1 Each pay has its own data City and businesses/point-of- sale data Row 2 1 4 Row 1 TOTAL: 2 rows Private Join and Compute uses a deterministic commutative cipher, such as Pohlig-Hellman, to encrypt the IDs. This allows double-encrypted values to be compared but hides the underlying values as long as at least one of the 2 encryption keys remains hidden. Intersection set Private Join and Compute uses a technique called homomorphic encryption that allows combining encrypted values without decrypting them first. When the combined value is decrypted, it contains the sum of the contributed inputs. $5 $10 $15 $5 $10 $15 Homomorphic Encryption 2 Encrypting each pay’s data First, both parties encrypt their data with private keys so that it’s not accessible or decipherable to anyone else. 3 Exchanging encrypted data Then, each party sends its encrypted data to the other party. 4 Double encrypting Both parties encrypt with their own private keys, resulting in double- encrypted data. The double-encrypted IDs can be compared but can't be decrypted by either party individually. 5 Finding intersections The point-of-sale provider can send the double-encrypted train rider data back to the city in shued order. The city can now compare the IDs to find the overlapping set of users but won't know anything about the value of their purchases at local businesses. 6 Joining encrypted values By counting intersecting rows, the city can now calculate the value of the purchases made by train riders. 7 Aggregating With homomorphic encryption, encrypted purchase values can be summed, but the actual amount spent by train riders can't be viewed. 8 Summarizing, decrypting, and viewing The city can pass its encrypted purchase data to the point-of-sale provider, where the data can be decrypted and the total amount spent can be viewed. Encrypted Train Riders (IDs) Encrypted Purchases (IDs, $ spent) Train Riders (IDs) Train ID key 1 4 8 Business ID key Business $ key 1 4 20 $5 $10 $5 Purchases (IDs, $ spent) 1 4 20 $5 $10 $5 Encrypted Purchases (IDs, $ spent) Train Riders (IDs) 1 4 8 1 4 20 $5 $10 $5 Consumer Purchases (IDs, $ spent) $15 2 Encrypted Purchases ($ spent) Total Train Riders Who Made a Purchase TWO TRAIN RIDERS MADE PURCHASES 1 4 20 $5 $10 $5 Encrypted Purchases ($ spent) Double-Encrypted Purchase (IDs) 1 4 $5 $10 Encrypted Purchases ($ spent) Double-Encrypted Purchase (IDs) Double-Encrypted Train Riders (IDs) Encrypted Purchases ($ spent) Double-Encrypted Purchase (IDs) 1 4 20 $5 $10 $5 26 1 13 4 FIND THE INTERSECTION BETWEEN BOTH SIDES Businesses / Point of Sale City Encrypted Train Riders (IDs) 1 4 8 1 4 8 1 4 20 $5 $10 $5 SHUFFLE Double-Encrypted Train Riders (IDs) 1 4 8 THE CITY CAN NOW MAKE A DECISION BASED ON THE TOTAL RIDERS AND AMOUNT SPENT 2 $15 Total Amount ($ spent) Total Train Riders Who Made a Purchase 2 Encrypted Purchases ($ spent) Total Train Riders Who Made a Purchase BUSINESS $ KEY USED TO DECRYPT $15 Private Join and Compute Problem statement Challenge A city is evaluating whether to offer a weekend train service to an up-and- coming neighborhood and will base its decision on how much train riders spend at local businesses in that neighborhood. The city knows who rides the train Local businesses in the neighborhood use a point-of-sale provider that knows the value of consumer transactions The city and point-of-sale provider do not want to share personally identifiable information with each other How can the city work together with local merchants to answer this question without revealing any personally identifiable information about train riders, where they travel, or what they buy? Figures for illustration only Number of riders and dollars spent are smaller than they would be in a real-world example. Color key

Upload: others

Post on 14-Mar-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Private Join and Compute · 2019-06-20 · 1 Each pa!y has its own data City and businesses/point-of-sale data Row 2 1 4 Row 1 TOTAL: 2 rows Private Join and Compute uses a deterministic

1 Each party has its own dataCity and businesses/point-of-sale data

Row 2

14

Row 1

TOTAL: 2 rows

Private Join and Compute uses a deterministic commutative cipher, such as Pohlig-Hellman, to encrypt the IDs. This allows double-encrypted values to be compared but hides the underlying values as long as at least one of the 2 encryption keys remains hidden.

Intersection set

Private Join and Compute uses a technique called homomorphic encryption that allows combining encrypted values without decrypting them first.

When the combined value is decrypted, it contains the sum of the contributed inputs.

$5$10

$15

$5

$10

$15

Homomorphic Encryption

2 Encrypting each party’s data First, both parties encrypt their data with private keys so that it’s not accessible or decipherable to anyone else.

3 Exchanging encrypted dataThen, each party sends its encrypted data to the other party.

4 Double encryptingBoth parties encrypt with their own private keys, resulting in double-encrypted data.

The double-encrypted IDs can be compared but can't be decrypted by either party individually.

5 Finding intersectionsThe point-of-sale provider can send the double-encrypted train rider data back to the city in shuffled order.

The city can now compare the IDs to find the overlapping set of users but won't know anything about the value of their purchases at local businesses.

6 Joining encrypted valuesBy counting intersecting rows, the city can now calculate the value of the purchases made by train riders.

7 AggregatingWith homomorphic encryption, encrypted purchase values can be summed, but the actual amount spent by train riders can't be viewed.

8 Summarizing, decrypting, and viewingThe city can pass its encrypted purchase data to the point-of-sale provider, where the data can be decrypted and the total amount spent can be viewed.

Encrypted Train Riders (IDs)

Encrypted Purchases (IDs, $ spent)

Train Riders (IDs)

Train ID key

148…

Business ID key

Business $ key

14

20…

$5$10$5…

Purchases (IDs, $ spent)

14

20…

$5$10$5…

Encrypted Purchases (IDs, $ spent)

Train Riders (IDs)

148…

14

20…

$5$10$5…

Consumer Purchases (IDs, $ spent)

$152

Encrypted Purchases ($ spent)

Total Train Riders Who Made a Purchase

TWO TRAIN RIDERS MADE PURCHASES

14

20…

$5$10$5…

Encrypted Purchases ($ spent)

Double-Encrypted Purchase (IDs)

14

$5$10

Encrypted Purchases ($ spent)

Double-Encrypted Purchase (IDs)

Double-Encrypted Train Riders (IDs)

Encrypted Purchases ($ spent)

Double-Encrypted Purchase (IDs)

14

20…

$5$10$5…

261

134…

FIND THE INTERSECTION BETWEEN BOTH SIDES

Businesses / Point of SaleCity

Encrypted Train Riders (IDs)

148…

148…

14

20…

$5$10$5…

SHUFFLE

Double-Encrypted Train Riders (IDs)

148…

THE CITY CAN NOW MAKE A DECISION BASED ON THE TOTAL RIDERS AND AMOUNT SPENT

2 $15

Total Amount($ spent)

Total Train Riders Who Made a Purchase

2

Encrypted Purchases ($ spent)

Total Train Riders Who Made a Purchase

BUSINESS $ KEY USED TO DECRYPT$15

Private Join and ComputeProblem statement

Challenge

A city is evaluating whether to offer a weekend train service to an up-and-coming neighborhood and will base its decision on how much train riders spend at local businesses in that neighborhood.

• The city knows who rides the train• Local businesses in the neighborhood use a point-of-sale provider that knows the value of consumer transactions• The city and point-of-sale provider do not want to share personally identifiable information with each other• How can the city work together with local merchants to answer this question without revealing any personally identifiable information about train riders, where they travel, or what they buy?

Figures for illustration only

Number of riders and dollars spent are smaller than they would be in a real-world example.

Color key