1
Riposte: an Anonymous Messaging System that 'Hides the Metadata'
Charles River Crypto Day, 20 February 2015
Henry Corrigan-Gibbs
Joint work with Dan Boneh and David Mazières
To appear at IEEE Security and Privacy 2015
2
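With PKE, we can hide the data…
[Figure: a message encrypted under the public key pk of a key pair (pk, sk) shows up on the wire as ciphertext, e.g. 0VUIC9zZW5zaXRpdmU]
…but does that hide enough?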
3
…
Time From To Size
10:12 Alice Bob 2543 B
10:27 Carol Alice 567 B
10:32 Alice Bob 450 B
10:35 Bob Alice 9382 B
4
Time From To Size
10:12 Alice [email protected] 2543 B
10:27 Carol Alice 567 B
10:32 Alice Bob 450 B
10:35 Bob Alice 9382 B
[cf. Ed Felten’s testimony before the House Judiciary Committee, 2 Oct 2013]
Hiding the data is necessary, but not sufficient
5
Focus of this talk
Goal: post “anonymously” to a public bulletin board
Building block for many problems related to “hiding the metadata”:
• E-voting
• Anonymous surveys
• Private messaging, etc.
6
First Attempt: Tor
[Dingledine, Mathewson, Syverson 2004]
“Onion” encryption
7
Passive network adversary can correlate flows!
Is this attack realistic?
[Murdoch and Danezis 2005]
[Bauer et al. 2007]
8
A well-placed adversary need control only a few links
[Murdoch and Zieliński 2007]
9
Tor is practical at Internet scale
… but its security properties are unclear
We design an anonymous messaging system that:
1) satisfies clear security goals,
2) handles millions of users in an “anonymous Twitter” system.
10
Outline
• Motivation
• Definitions and a “Straw man” scheme
• Technical challenges
• Evaluation
• Conclusions
12
Reframe the Problem
Posting anonymously to a bulletin board
≅
Writing “privately” to a database
13
Goal
The “Anonymity Set”
14
Goal
15
Goal
DB contents after the clients write: 0, 0, “Protest will be held tomo…”, “See my cat photos at w…”, 0
DB does not learn who wrote which message
16
(k,t)-Write-Anonymous DB Scheme
k = (# of servers), t = (# malicious servers)
[Gen query to write m into row l of DB]
[Apply query to state of server i]
[Combine server states to reveal plaintext DB]
[Diagram: applying query q_i moves server i from state s to state s′]
17
Goals
1. Correctness
2. Write-anonymity
3. Disruption resistance
18
Goal 1: Correctness
Informal: Output DB should be result of applying queries to DB state.
If queries are:
then result of Reveal() is:
19
Goal 2: Write-Anonymity
Informal: coalition of t malicious servers and any number of malicious clients should not learn who wrote what to the DB.
I will present the [simplified] two-server definition with one malicious server.
20
Challenger / Adversary
Let n = number of clients total
For …:
Choose perm π on elements of H
Queries in H updated according to permutation π
Adv should not be able to distinguish between real π and random π*
Intuition: The scheme hides “who wrote what” (which query corresponds to which message)
26
Goal 3: Disruption Resistance
Intuition: each query should change at most one DB row—prevent disruption
Informal: An adversary cannot generate N “valid” queries that affect > N rows
[We defer the definition of a “valid” query for now…]
27
Privacy-Preserving DB Schemes
ORAM [GO’96] / Group ORAM [GOMT’11]
– CPU(s) writing to RAM
Private Info Retrieval (PIR) [CGKS’97]
– Client reading from DB shared across servers
Private Info Storage [OS’97]
– Client writing to DB shared across servers
This work: Many clients (incl. malicious ones) writing to DB shared across servers
Ideally: for all k, tolerate compromise of k-1 servers
28
(2,1)-Private “Straw man” Scheme [Chaum ’88]

Start: servers SX and SY each hold an all-zero DB with five rows.
SX = (0, 0, 0, 0, 0)
SY = (0, 0, 0, 0, 0)

First client: write msg mA into DB row 3.
The client forms the vector (0, 0, mA, 0, 0), samples random values (r1, r2, r3, r4, r5), and splits the vector into two shares:
(0, 0, mA, 0, 0) - (r1, r2, r3, r4, r5) = (-r1, -r2, mA - r3, -r4, -r5)
One share goes to SX and the other to SY. Each server adds its share to its DB:
SX = (r1, r2, r3, r4, r5)
SY = (-r1, -r2, -r3 + mA, -r4, -r5)

Second client: write msg mB into DB row 5, using fresh random values (s1, s2, s3, s4, s5).
(0, 0, 0, 0, mB) - (s1, s2, s3, s4, s5) = (-s1, -s2, -s3, -s4, mB - s5)
After both writes:
SX = (r1 + s1, r2 + s2, r3 + s3, r4 + s4, r5 + s5)
SY = (-r1 - s1, -r2 - s2, -r3 - s3 + mA, -r4 - s4, -r5 - s5 + mB)

At the end of the day, servers combine DBs to reveal plaintext:
SX + SY = (0, 0, mA, 0, mB)
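The walkthrough above can be sketched in a few lines of Python. This is a minimal illustration, not the talk's implementation: the prime modulus `P` and the function names `gen_write`, `apply_write`, and `reveal` are assumptions made for the example.

```python
import secrets

P = 2**61 - 1  # illustrative prime modulus; the field choice is an assumption


def gen_write(row, msg, num_rows):
    """Split the vector msg * e_row into two additive shares mod P."""
    share_x = [secrets.randbelow(P) for _ in range(num_rows)]
    share_y = [(-r) % P for r in share_x]
    share_y[row] = (msg - share_x[row]) % P
    return share_x, share_y


def apply_write(state, share):
    """Server-side update: add the received share into the local DB state."""
    return [(s + q) % P for s, q in zip(state, share)]


def reveal(state_x, state_y):
    """Combine both server states to recover the plaintext DB."""
    return [(a + b) % P for a, b in zip(state_x, state_y)]


def demo():
    num_rows = 5
    sx, sy = [0] * num_rows, [0] * num_rows
    # Rows 3 and 5 of the slides are indices 2 and 4 here (zero-based).
    for row, msg in [(2, 1111), (4, 2222)]:
        qx, qy = gen_write(row, msg, num_rows)
        sx, sy = apply_write(sx, qx), apply_write(sy, qy)
    return reveal(sx, sy)
```

Each server on its own sees only uniformly random-looking values; the messages appear only once the two states are added.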
46
First-Attempt Scheme: Properties
Correctness
– By construction
Write-Anonymity
– Given output vector, servers can simulate their view of the protocol run
Practical Efficiency
– Almost no “heavy” computation involved
47
Extensions
Use k > 2 servers: secure against k-1 evil servers
Use a large-characteristic field, e.g., email-length rows
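A hedged sketch of the k-server extension: the client picks k-1 shares at random and sets the last one so that all k sum to the intended write. The modulus `P` and the names `share_write` and `combine` are hypothetical, chosen only for this example.

```python
import secrets

P = 2**61 - 1  # illustrative prime modulus (an assumption, not from the talk)


def share_write(row, msg, num_rows, k):
    """Split msg * e_row into k additive shares; any k-1 of them look random."""
    shares = [[secrets.randbelow(P) for _ in range(num_rows)] for _ in range(k - 1)]
    last = []
    for i in range(num_rows):
        target = msg % P if i == row else 0
        last.append((target - sum(s[i] for s in shares)) % P)
    shares.append(last)
    return shares


def combine(shares):
    """Add the k vectors component-wise to recover the plaintext write."""
    return [sum(col) % P for col in zip(*shares)]
```

With k = 2 this reduces to the two-server scheme; for larger k, the write stays hidden as long as at least one server keeps its share private.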
48
Outline
• Motivation
• Definitions and a “Straw man” scheme
• Technical challenges
• Evaluation
• Conclusions
49
Limitations of the “Straw man”
1. O(L) communication cost
2. Collisions
3. Malicious clients
51
Challenge 1: Bandwidth Efficiency
In “straw man” design, client sends DB-sized vector to each server
Idea: run PIR protocol in reverse to write into DB while sending fewer bits
PIR-in-reverse used in Ostrovsky-Shoup ’97 in single-client context
We extend their results to a many-client context (with malicious clients)
52
(k,t)-Distributed Point Functions
• We use a generalization of “DPFs” defined by Gilboa and Ishai (2014)
• Many one-round-trip PIR protocols construct DPFs implicitly
Goal:
53
(k,t)-Distributed Point Functions
Correctness: Sum of the Eval() outputs will be zero everywhere, except at position l
(k,t)-Privacy: [In a minute]
54
DPF Correctness
x1 + x2 + … + xk = (0, 0, …, 0, m, 0, …, 0)
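The correctness condition can be written out as a formula. The notation is an assumption for illustration: $x_i = \mathrm{Eval}(\mathit{key}_i)$ is the length-$L$ output vector for the $i$-th key and $e_l$ is the $l$-th unit vector.

```latex
\[
  \sum_{i=1}^{k} x_i \;=\; \sum_{i=1}^{k} \mathrm{Eval}(\mathit{key}_i) \;=\; m \cdot e_l ,
  \qquad\text{i.e.}\qquad
  \Bigl(\sum_{i=1}^{k} x_i\Bigr)[l'] =
  \begin{cases}
    m & \text{if } l' = l,\\
    0 & \text{otherwise.}
  \end{cases}
\]
```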
55
(k,t)-Distributed Point Functions
(k,t)-Privacy: Can simulate the distribution of any subset S of at most t DPF keys
[ Intuition: t keys leak nothing about m or l ]
56
DPFs Reduce Bandwidth Cost
In the “straw man” scheme, writing mA into row 3 of a five-row DB costs one full-length vector per server:
to SX: (r1, r2, r3, r4, r5)
to SY: (-r1, -r2, mA - r3, -r4, -r5)
58
Alice sends bits
59
Challenge 1: Bandwidth Efficiency
We show: a (k, t)-private DPF yields a k-server write-anonymous DB scheme tolerating up to t malicious servers
I will present a (2,1)-DPF with O(L^{1/2})-length keys based on PIR of Chor and Gilboa (’97)
60
(2,1)-DPF Construction
Idea:
– Represent Eval() output as a matrix
– Keys can be the length of one side (one row)
– Output will sum to m at l = (i, j)

Key = (b, k, v), where each has length L^{1/2}
The arithmetic is over a field of characteristic 2, so x + x = 0.
The bits b and seeds k are sampled at random.

Example with five rows, target row i = 2:
Key A: b = (0, 1, 1, 0, 1), k = (k1, k2, k3, k4, k5), v
Key B: b = (0, 0, 1, 0, 1), k = (k1, k2*, k3, k4, k5), v
The two keys agree except in row 2, where the bit is flipped and the seed differs (k2 vs. k2*).

G() is a PRG mapping keys k to L^{1/2} bits
Eval() expands each seed into a row and adds v to every row whose bit is 1:

Row   Eval(Key A)   Eval(Key B)
1     G(k1)         G(k1)
2     G(k2) + v     G(k2*)
3     G(k3) + v     G(k3) + v
4     G(k4)         G(k4)
5     G(k5) + v     G(k5) + v

Outputs are equal everywhere except at row 2
Outputs sum to zero everywhere except at row 2, where the sum is G(k2) + v + G(k2*)

Construct v as: v = G(k2) + G(k2*) + m·e_j

Eval(Key A) + Eval(Key B) =
00000…00000
0000000m000
00000…00000
00000…00000
00000…00000
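A minimal sketch of the construction above, under stated assumptions: each cell is one byte, "+" is XOR (characteristic 2), and SHAKE-256 stands in for the PRG G(). The names `gen`, `eval_key`, and `SEED_BYTES` are hypothetical, not taken from the talk.

```python
import hashlib
import secrets

SEED_BYTES = 16  # illustrative seed length


def prg(seed, out_len):
    """Stand-in for G(): stretch a short seed to out_len bytes with SHAKE-256."""
    return hashlib.shake_256(seed).digest(out_len)


def xor(a, b):
    """Component-wise addition in characteristic 2."""
    return bytes(x ^ y for x, y in zip(a, b))


def gen(i, j, m, rows, cols):
    """Build two keys whose Eval outputs XOR to byte m at cell (i, j), zero elsewhere."""
    bits_a = [secrets.randbelow(2) for _ in range(rows)]
    seeds_a = [secrets.token_bytes(SEED_BYTES) for _ in range(rows)]
    bits_b, seeds_b = list(bits_a), list(seeds_a)
    bits_b[i] ^= 1                                # flip the bit in the target row
    seeds_b[i] = secrets.token_bytes(SEED_BYTES)  # fresh seed k_i* in the target row
    unit = bytes(m if c == j else 0 for c in range(cols))  # m * e_j
    v = xor(xor(prg(seeds_a[i], cols), prg(seeds_b[i], cols)), unit)
    return (bits_a, seeds_a, v), (bits_b, seeds_b, v)


def eval_key(key, cols):
    """Expand a key into the full rows-by-cols matrix of bytes."""
    bits, seeds, v = key
    out = []
    for bit, seed in zip(bits, seeds):
        row = prg(seed, cols)
        out.append(xor(row, v) if bit else row)
    return out
```

Each key holds one bit and one seed per row plus a single row-length vector v, so its size grows with the side of the matrix rather than with the full table.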
70
Challenge 1: Bandwidth Efficiency
• Brings comm cost down to O(L^{1/2})
  – Just requires a PRG: fast!
• Recursive application of the same trick
  – Key size down to polylog(L) [GI’14]
71
New DPF Construction
Given a seed-homomorphic PRG [NPR’99] [BLMR’13] [BP’14] [BV’15]
G(s1) + G(s2) = G(s1 + s2)
we build a (k, k-1)-private DPF
Privacy holds even if all but one of the servers are adversarial
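To make the seed-homomorphism property concrete, here is a toy sketch in the style of exponentiation-based constructions: G(s) = (g_1^s, …, g_n^s). The modulus, the bases, and the function names are illustrative assumptions, and the parameters are not secure. Note that the "+" on outputs in the slide corresponds to component-wise multiplication in this instantiation.

```python
P = 2**127 - 1          # toy modulus (a Mersenne prime); not a secure parameter choice
BASES = [3, 5, 7, 11]   # fixed public elements; hypothetical choices for the example


def G(seed):
    """Toy seed-homomorphic PRG: G(s) = (g_1^s, ..., g_n^s) mod P."""
    return [pow(g, seed, P) for g in BASES]


def combine(out1, out2):
    """Group operation on outputs: component-wise multiplication mod P."""
    return [(a * b) % P for a, b in zip(out1, out2)]
```

For any two seeds, `combine(G(s1), G(s2))` equals `G(s1 + s2)`, which is the property the slide states.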
72
Limitations of the “Straw man”
1. O(L) communication cost
2. Collisions
3. Malicious clients
73
Challenge 2: Collisions
• Clients pick write location l at random
• Two honest clients may write into the same location l

DB before any write:        (0, 0, 0, 0, 0)
After the first write:      (0, 0, mA, 0, 0)
After a colliding write:    (0, 0, mA + mB, 0, 0)

Instead of getting mA, mB, get the sum mA + mB
77
Challenge 2: Collisions
Straightforward solution:
Make DB table large enough to avoid collisions
Better solution:
Use coding techniques to recover from up to d-way collisions
Key idea: even after a collision, learn the sum of colliding writes
78
Challenge 2: Collisions
Idea: To handle 2-collisions, can code message m as: (m, m^2)
[Let …]
After a 2-collision, DBs recover the values:
c1 = m1 + m2
c2 = m1^2 + m2^2
Given c1 and c2, can recover m1 and m2
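One way to carry out the recovery step, sketched under assumptions: messages live in a prime field F_p with p ≡ 3 (mod 4), so square roots have a closed form. Since 2·c2 - c1^2 = (m1 - m2)^2, taking a square root gives m1 - m2 up to sign, and together with c1 this yields both messages. The modulus and function names are hypothetical.

```python
P = 2**61 - 1        # prime with P % 4 == 3; an illustrative choice
INV2 = (P + 1) // 2  # multiplicative inverse of 2 mod P


def encode(m):
    """Code message m as the pair (m, m^2)."""
    return (m % P, (m * m) % P)


def collide(enc_a, enc_b):
    """What a DB cell holds after two encoded writes land in the same row."""
    return ((enc_a[0] + enc_b[0]) % P, (enc_a[1] + enc_b[1]) % P)


def recover(c1, c2):
    """Solve c1 = m1 + m2, c2 = m1^2 + m2^2 for the unordered pair {m1, m2}."""
    disc = (2 * c2 - c1 * c1) % P          # equals (m1 - m2)^2
    root = pow(disc, (P + 1) // 4, P)      # square root, valid because P % 4 == 3
    if (root * root) % P != disc:
        raise ValueError("cell does not look like a 2-way collision")
    return sorted([((c1 + root) * INV2) % P, ((c1 - root) * INV2) % P])
```

The order of the two recovered messages is not determined, which is fine here because the goal is only to learn what was written, not by whom.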
79
Challenge 2: Collisions
Using coding technique, can tolerate d-collisions for any d
For 1% loss rate, 1k users:
Naive method: 100k cells
Coding method: 6.9k cells
Reduces table size by 93%
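A quick sanity check of the figures above, assuming a simple model that is not stated on the slide: each of n users writes to a uniformly random cell, and under the naive method a write is lost whenever another write shares its cell. The function names are hypothetical. The sketch covers only the naive table size and the percentage reduction; it does not derive the 6.9k figure.

```python
def naive_loss_rate(users, cells):
    """Chance that a given write shares its cell with at least one other write."""
    return 1 - (1 - 1 / cells) ** (users - 1)


def reduction(naive_cells, coded_cells):
    """Fractional reduction in table size when moving from naive to coded."""
    return 1 - coded_cells / naive_cells


if __name__ == "__main__":
    print(f"naive loss rate: {naive_loss_rate(1_000, 100_000):.4f}")
    print(f"table size reduction: {reduction(100_000, 6_900):.1%}")
```

Under this model, 1k users and 100k cells give a loss rate just under 1%, and going from 100k to 6.9k cells is a reduction of about 93%.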
80
Limitations of the “Straw man”
1. O(L) communication cost
2. Collisions
3. Malicious clients
81
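Challenge 3: Malicious Clients
Server state after an honest write:
SX = (r1, r2, r3, r4, r5)
SY = (-r1, -r2, -r3 + mA, -r4, -r5)
A malicious client can instead send arbitrary vectors (a1, a2, a3, a4, a5) and (b1, b2, b3, b4, b5) to the two servers.
One malicious client can corrupt the entire DB!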
82
Challenge 3: Malicious Clients
Goal: Prevent evil client from destroying DB
• One way to solve this is with NIZKs
  – Expensive public-key crypto [Golle Juels ’04]
• More efficient solution:
  – Add a third non-colluding “audit” server to get honest majority
  – Fast, info-theoretic MPC techniques [GMW’87], [CCD’88], [FNW’96]
83
Parties: DB Server X, DB Server Y, Auditor

1. Each DB server holds one DPF key from the client:
   Server X: b = (0, 1, 1, 0, 1), k = (k1, k2, k3, k4, k5), v
   Server Y: b = (0, 0, 1, 0, 1), k = (k1, k2*, k3, k4, k5), v
2. Each server sets v aside and pairs each bit with its seed:
   Server X: (0 | k1, 1 | k2, 1 | k3, 0 | k4, 1 | k5)
   Server Y: (0 | k1, 0 | k2*, 1 | k3, 0 | k4, 1 | k5)
   Call the resulting vectors (a1, a2, a3, …) and (b1, b2, b3, …).
3. The servers rotate both vectors by a common offset:
   (a2, a3, a1) and (b2, b3, b1)
4. The servers apply common hash functions h1, h2, h3 element-wise:
   (h2(a2), h3(a3), h1(a1)) and (h2(b2), h3(b3), h1(b1))
5. Each server sends its hashed vector to the Auditor.
6. The Auditor checks: are the two vectors equal almost everywhere?
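The audit check can be sketched as follows. This is an illustration under assumptions, not the talk's protocol as specified: keyed BLAKE2b stands in for the hash functions h1, h2, h3, "equal almost everywhere" is read as "differ in exactly one position", and the names `blind` and `auditor_accepts` are hypothetical.

```python
import hashlib
import secrets


def rotate(vec, offset):
    """Cyclically shift a list left by offset positions."""
    offset %= len(vec)
    return vec[offset:] + vec[:offset]


def blind(vec, offset, hash_keys):
    """Server side: hash entry i under key i, then rotate by the shared offset."""
    hashed = [hashlib.blake2b(item, key=k).digest() for item, k in zip(vec, hash_keys)]
    return rotate(hashed, offset)


def auditor_accepts(blinded_x, blinded_y):
    """Auditor side: accept only if the two vectors differ in exactly one position."""
    if len(blinded_x) != len(blinded_y):
        return False
    return sum(x != y for x, y in zip(blinded_x, blinded_y)) == 1


def demo():
    keys = [secrets.token_bytes(16) for _ in range(5)]  # shared by X and Y, hidden from the auditor
    offset = secrets.randbelow(5)                       # shared by X and Y, hidden from the auditor
    x = [b"0|k1", b"1|k2", b"1|k3", b"0|k4", b"1|k5"]
    y = [b"0|k1", b"0|k2*", b"1|k3", b"0|k4", b"1|k5"]
    return auditor_accepts(blind(x, offset, keys), blind(y, offset, keys))
```

Because the auditor sees only hashed and rotated values, it can count mismatches without learning the seeds or which row the two keys disagree on.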
98
Outline
• Motivation
• Definitions and a “Straw man” scheme
• Technical challenges
• Evaluation
• Conclusions
99
Implementation
• Implemented the full protocol in Go
  – 2 DB servers + 1 audit server
• Ran perf evaluation on a network testbed simulating real-world net conditions
100
Bottom-Line Result
• For a DB with 65,000 Tweet-length rows, can process 30 writes/second
• Can process 1,000,000 writes in 8 hours on a single server
Main bottleneck is PRG expansion
101
Throughput (anonymous Twitter)
At large table sizes, PRG cost dominates
102
Outline
• Motivation
• Definitions and a “Straw man” scheme
• Technical challenges
• Evaluation
• Conclusions
103
Open Problems
1. Reduce Θ(L) computation cost at server
   – Using multiple rounds per write?
2. Key-homomorphic DPFs
   – Another way to reduce cost at server
3. (k, k-1)-private DPFs without PKC
   – Possible without seed-hom PRGs?
104
Conclusion
In many contexts, “hiding the metadata” is as important as hiding the data
Combination of crypto tools with systems design yields 1,000,000-user anonymity sets
“Multi-user writable PIRs” have applications to private messaging
– Still barriers to practicality (+ open problems)
105
Questions?