side-channels tom ristenpart cs 6431. pick target(s) choose launch parameters for malicious vms each...
TRANSCRIPT
Side-channels
Tom RistenpartCS 6431
Pick target(s) Choose launch parametersfor malicious VMs
Each VM checks for co-residence
Frequently achieve advantageous placement
“Cloud cartography”
Recall from last time:
Hardware
Hypervisor
OS1
P1 P2
OS2
P1 P2
This shouldn’t matter if VMM provides good isolation!
Today
• RSA remote side-channel attack• Flush+Reload side-channel attack
RSA reminder
ZN = { i | gcd(i,N) = 1 }*
Claim: Suppose e,d Zφ(N) satisfying ed mod φ(N) = 1 then for any x ZN we have that (xe)d mod N = x
*
*
(xe)d mod N = x(ed mod φ(N)) mod N = x1 mod N = x mod N
First equality isby Euler’s Theorem
φ(N) = |ZN|
RSA reminder
ZN = { i | gcd(i,N) = 1 }*
Claim: Suppose e,d Zφ(N) satisfying ed mod φ(N) = 1 then for any x ZN we have that (xe)d mod N = x
*
*
Z15 = { 1,2,4,7,8,11,13,14 }*
e = 3 , d = 3 gives ed mod 8 = 1
Zφ(15) = { 1,3,5,7 }*
x 1 2 4 7 8 11 13 14
x3 mod 15 1 8 4 13 2 11 7 14
y3 mod 15 1 2 4 7 8 11 13 14
φ(N) = |ZN|
X fpk(X)
easy given N,e
hard given N,eeasy given N,d
The RSA trapdoor permutation
pk = (N,e) sk = (N,d) with ed mod φ(N) = 1
fN,e(x) = xe mod N gN,d(y) = yd mod N
pk = (N,e) sk = (N,d) with ed mod φ(N) = 1
fN,e(x) = xe mod N gN,d(y) = yd mod N
But how do we find suitable N,e,d ?
If p,q distinct primes and N = pq then φ(N) = (p-1)(q-1)
Why?
φ(N) = |{1,…,N-1}| - |{ip : 1 ≤ i ≤ q-1}| - |{iq : 1 ≤ i ≤ p-1}| = N-1 - (q-1) - (p-1) = pq – p – q + 1 = (p-1)(q-1)
The RSA trapdoor permutation
pk = (N,e) sk = (N,d) with ed mod φ(N) = 1
fN,e(x) = xe mod N gN,d(y) = yd mod N
But how do we find suitable N,e,d ?
If p,q distinct primes and N = pq then φ(N) = (p-1)(q-1)
Given φ(N), choose e Zφ(N) and calculate d = e-1 mod φ(N)
The RSA trapdoor permutation
*
We don’t know if inverse is true, whether inverting RSA implies ability to factor
Learning p,q from N is the factoring problem
Textbook exponentiation
Exp(h,x,N)X’ = hFor i = 2 to x do
X’ = X’*hReturn X’
SqrAndMulExp(h,x,N)bk,…,b0 = xf = 1For i = k down to 0 do
f = f2 mod NIf bi = 1 then
f = f*h mod NReturn f
How do we compute hx for any h ZN?
Requires time O(|G|) in worst case.
Requires time O(k) multiplies and squares in worst case.
*
SqrAndMulExp(h,x,N)bk,…,b0 = xf = 1For i = k down to 0 do
f = f2 mod NIf bi = 1 then
f = f*h mod NReturn f
f3 = 1 h
f2 = h2 f1 = (h2)2 h
b3 = 1
b2 = 0
b1 = 1
f0 = (h4 h)2 h b1 = 1 = h8 h2 h
h11 = h8+2+1 = h8 h2 h
SqrAndMulExp(h,x,N)bk,…,b0 = xf = 1For i = k down to 0 do
f = f2 mod NIf bi = 1 then
f = f*h mod NReturn f
What side-channels might arise?• Timing• CPU state (caches, branch predictors,…)• Power
“Remote” attack setting
Hardware
Hypervisor
OS1
RSA-Decrypt
OS2(Attacker)
Co-located placement on cloud instance
RSA-Decrypt takes adversarially supplied ciphertext c ZN and computes cd mod N
Attacker can time operation
Where would this come up in practice?
*
(Part of) TLS handshake forRSA transportClient Server
PMS <- D(sk,C)
ClientHello, MaxVer, Nc, Ciphers/CompMethods
ServerHello, Ver, Ns, SessionID, Cipher/CompMethod
CERT = (pk of bank, signature over it)Check CERTusing CA publicverification key
Pick random Nc
Pick random Ns
Pick random PMSC <- E(pk,PMS)
C
SqrAndMulExp(h,x,N)bk,…,b0 = xf = 1For i = k down to 0 do
f = f2 mod NIf bi = 1 then
f = f*h mod NReturn f
C1
t1
C2
t2
…
Prior work before BB: Kocher 96 timing attackx = bk bk-1 … bk-i+1 uk-i uk-i-1 … u0
Guess some bits at front Remaining exponent bits are unknown
Predict how long it should take to decrypt (random) value C• If guess is right, predictions will correlate with observed timings• If guess is wrong, no correlations• As more bk … bk-i+1 known, predictions get strongerKocher shows can do with q = O(k)
Kocher attack in practice?
• BB point out:– OpenSSL widely used implementation of TLS
• Kocher attack doesn’t work against it:– OpenSSL uses CRT, sliding windows, two different
modular multiplication algorithms– CRT: Chinese Remainder Theorem– Sliding window: exponentation by handling
exponent in chunks, not one bit at a time
Chinese Remainder Theorem
For n = n1n2…nk with gcd(ni,nj) = 1 System of congruences x = x1 mod n1 = … = xk mod nk
has exactly one solution and we can find it efficiently
For RSA with N = pq d1 = d mod (p-1) d2 = d mod (q-1)q-1 = q-1 mod pCompute m1 = hd1 mod p m2 = hd2 mod qCompute m = m1 + (q-1 *(m1 – m2) mod p) * q
Towards a timing attackDecryption requires m2 = hd2 mod q• Square-and-multiply (lots of multiplies mod q)• Sliding window (fewer multiplies mod q)Multiply x*y mod q uses Montgomery reduction• Let R = 2z for some z• Compute (xR*yR)*R-1 = zR mod q– Fast to compute R-1
– At end of computation: zR > q then subtract q
Schindler’s observation: Pr[subtract q] ≈ (h mod q ) / 2R
R-1 * hlo mod N
tlo
R-1 *hhi mod Nthi
Guess first few bits of q = bk … b1
Let hlo = bk bk-1 … bk-i+1 0 0 0 … 0Let hhi = bk bk-1 … bk-i+1 1 0 0 … 0
If bk-i = 0 then: hlo < q < hhi
If bk-i = 1 then: hlo < hhi < q
q
Dec
rypti
on ti
me
hlo
hhi if bk-1 = 0
hhi if bk-1 = 1
RSA-Decrypt
Many more details
• Secondary timing effect– Karatsuba versus standard multiplication– Opposite timing difference for hhi - hlo – Use absolute values: one effect always dominates
• Sliding window (few multiplications)– Amplify by querying a bunch of values:
hhi , hhi + 1 , hhi + 2, …– Called a neighborhood
Timing differences measurable
The attack worked…
• ~1.5 million queries for 1024 bit RSA• 2 hours on that era’s computers• Blinding countermeasure:
y = re * c mod N m = yd / r mod N
SqrAndMulExp(h,x,N)bk,…,b0 = xf = 1For i = k down to 0 do
f = f2 mod NIf bi = 1 then
f = f*h mod NReturn f
Hardware
Hypervisor
OS1
RSA-Decrypt
OS2(Attacker)
Cache-based side channel attacks• A long literature on cache side channel attacks– Percivel 2005: RSA side channels – Tromer et al. 2005: AES side channels
• Today: particularly simple one by Yarom and Faulkner useful in PaaS clouds
Attacker VM
Victim VM
Main memory
L1 instr cache
Prime+Probe protocol
Attacker VM
(each row represents cache set)
…
Schedulingorder on CPU core
Victim VM
Attacker VM
…
• Timings correlated to (distinct) cache usage patterns of S, M operations
• Can spy frequently (every ~16 μs) by exploiting scheduling
Interrupt
Interrupt
Slow
Fast
Runs (S)operation
Runs (M)operation
Prime+Probe limitations
• Works only for L1 caches– Some recent work extending to LLC in certain
settings– Multi-core settings difficult (Zhang et al. 2012)
• Lots of noise from various sources
Towards Flush+ReloadDeduplication-based memory page sharing (Linux, KVM, VMWare)• Duplicate memory pages
detected, physical pages coalesced
• Virtual address spaces different, but mapped to same physical addresses
Main memory
Libgcrypt square( ) instructions
Inclusive cache architecture
Towards Flush+ReloadDeduplication-based memory page sharing (Linux, KVM, VMWare)• Duplicate memory pages
detected, physical pages coalesced
• Virtual address spaces different, but mapped to same physical addresses
Main memory
Libgcrypt square( ) instructions
Inclusive cache architecture
Flush+Reload protocol
• Flush from LLC memory line of interest• Wait• Time reloading memory line
Not-accessed by victim
Accessed by victim
Attacking Square and Multiply
SqrAndMulExp(h,x,N)bk,…,b0 = xf = 1For i = k down to 0 do
f = f2 mod NIf bi = 1 then
f = f*h mod NReturn f
Flush+Reload type attacks damaging
• Immediately used in a large number of follow-up papers to break various things– Cryptographic targets– Non-cryptographic targets
• Requires memory deduplication– Turned off in Amazon EC2
• But not in Linux by default– PaaS services vulnerable [Zhang et al. 2014]
32
General victim execution tracing
#include "stdio.h”int b;int inc(int number) {
return number + 1;}int main() {
int a = 9;if (a % 2 == 1)
a = inc(a);b = a;return 0;
}
33
Control-Flow Graph#include "stdio.h”int b;int main() {
int a = 9;if (a % 2 == 1)
a = inc(a); int inc(int number) {return number + 1;
}
b = a;return 0;
}
34
Control-Flow Graph
4004b6: mov $0x9,%edi 4004bb: callq 4004b4 <inc>
400324: lea 0x1(%rdi),%eax400327: retq
4004c0: mov %eax,0x200b60(%rip) 4004c6: mov $0x0,%eax4004cb: retq
chunk 1: [400480-4004bf]
chunk 3: [4004c0-4004ff]
chunk 2: [400300-40033f]
35
Control-Flow Graph
chunk1[400480-4004bf]
chunk2[400300-40033f]
chunk 3[4004c0-4004ff]
36
An Attack NFA
{c1}
{c2, c3}
{c3}
Flush-Reload
Flush-Reload
Flush-Reload
{}
(c1, t1, t2)
(c2, t3, t4)
chunk1[400480-4004bf]
chunk2[400300-40033f]
chunk 3[4004c0-4004ff]
start
(c3, t7, t8)
(c3, t5, t6)
37
Three Example Attacks
• Inferring Sensitive User Data
• SAML-based Single Sign-on Attacks
• Password-Reset Attack
38
Three Example Attacks
• Inferring Sensitive User Data
• SAML-based Single Sign-on Attacks
• Password-Reset Attack
39
Password Reset
Password Reset
Token Verification
Pseudo-Random Number
GeneratorUser
Token
40
Password Reset Attack
Password Reset
Token Verification
Pseudo-Random Number
Generator
User
Token Pseudo-Random Number
Generator
Shared Library
gettimeofday()
Attacker
Attack NFA
Library calls
TokenPrediction
Shared OSApplication Application
41
Call Graph of Password Resetting
42
Attacker’s StrategyAttacker
applicationVictim
applicationAttacker’s
email account
Password reset against his own account
Password reset token(attack NFA)
Offline: Bruteforce value of getpid()
Password reset against victim account
Online: token guessing
(attack NFA)
(HTTP keepAlive)
43
The Attack NFA
44
Evaluation
• Demonstrated successful attacks against Magento (controlled by ourselves) in a public PaaS cloud.
• After 220 offline computation, the attacker can narrow down the password reset token to 22 possible values---easy to brute-force online.
Side-channel countermeasures
• Constant time code– Input-independent memory accesses, timing
• Very difficult to truly get!– Input-dependent instruction timing on CPU
• Best bet: don’t share physical resources