SPYING ON YOUR NEIGHBOR: CPU CACHE ATTACKS AND BEYOND
Ben Gras / @bjg, Kaveh Razavi, Cristiano Giuffrida, Herbert Bos. Vrije Universiteit Amsterdam. Black Hat USA 2018
ABOUT ME
• PhD student in VUsec, the VU University research group
• Academic group researching systems software security
• We do software hardening, exploitation
• Hardware attacks, side channels
• Academic recognition but also hacker scene (Pwnies!)
OVERVIEW
• Side channels
• Cache attacks
• Cache defences
• Hyperthreading
• TLBleed
• Evaluation
SIDE CHANNELS
• Leak secrets outside the regular interface
RICH HISTORY - SMART CARDS
• Power consumption (FPGA Security, by Shemal Shroff et al.)
• EM radiation: leak ECC bits (FPGA Security, by Shemal Shroff et al.)
• Execution time: leak ECC, RSA bits (Timing Attacks on ECC, by Shemal Shroff et al.)
• Acoustic cryptanalysis (RSA Key Extraction [..], by Adi Shamir et al.)
CACHE ATTACKS
CACHE: SOFTWARE EQUIVALENT
• Computing processes ought to be compartmented
• Different owners or privilege levels: trust boundaries

CACHE: SOFTWARE EQUIVALENT
• There are shared resources between processes
• RAM, CPU cache, TLB, computational resources ..
• Practically always, this allows signaling: a covert channel
• Sometimes, it allows spying: a side channel
CROSS-PROCESS SHARED STATE
• Shared RAM row (DRAMA paper, by Peter Peßl et al.)
• Shared cache set (FLUSH+RELOAD, by Yuval Yarom et al.; shown here, many others exist)
• Cache prefetch (Prefetch Side-Channel Attacks, by Daniel Gruss et al.)
[Figure: grid of cache sets 1-30]
CROSS-PROCESS / VM SHARED STATE
• This is only possible because of shared resources
EXAMPLE: FLUSH+RELOAD
• One of several cache attacks
• Relies on shared memory
• Can be a shared object (mmap()ed shared libraries)
• Or shared pages after deduplication (KSM in Linux)
EXAMPLE: FLUSH+RELOAD
• Work by Yuval Yarom, Katrina Falkner
• Memory access patterns can betray secrets
• Because access patterns frequently depend on secrets
• Example: RSA keys (n, e, d). Private: d. n = pq and d are 1024 bits or more
EXAMPLE: FLUSH+RELOAD
• Signing is: computing m^d (mod n)
• Often square-and-multiply, depending on the bits in d
• Shared cache activity betrays memory access patterns
• Quickly probing the cache can betray the bits in d
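The square-and-multiply leak can be sketched in Python. This is a hypothetical simulation, not the original attack code: instead of timing cache lines, it logs the square/multiply operation sequence directly, which is exactly the information FLUSH+RELOAD recovers by probing the code's cache lines.

```python
# Hypothetical simulation: left-to-right square-and-multiply for m^d mod n,
# logging an 'S' for each square and an 'M' for each multiply. FLUSH+RELOAD
# on the square/multiply routines recovers this same sequence from timing.
def leaky_modexp(m, d, n, trace):
    result = 1
    for bit in format(d, "b"):           # exponent bits, MSB first
        result = (result * result) % n   # square: happens for every bit
        trace.append("S")
        if bit == "1":
            result = (result * m) % n    # multiply: only for 1-bits
            trace.append("M")
    return result

def recover_bits(trace):
    """The attacker's view: 'S,M' means a 1-bit, a lone 'S' means a 0-bit."""
    bits, i = [], 0
    while i < len(trace):
        if i + 1 < len(trace) and trace[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return "".join(bits)

trace = []
leaky_modexp(12345, 0b101101, 99991, trace)
print(recover_bits(trace))  # prints 101101: the secret exponent's bits
```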
EXAMPLE: FLUSH+RELOAD
• Can also attack AES implementations with T tables
• A table lookup happens: Tj[xi], where xi = pi ⊕ ki
• pi is a plaintext byte, ki a key byte
EXAMPLE: FLUSH+RELOAD
• Again: secrets are betrayed by memory accesses
• Known plaintext + accesses = key recovery
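The T-table recovery can be sketched as a hypothetical simulation (SECRET_K and observed_line are invented for illustration). A real attack measures which 64-byte T-table cache line was loaded; since a line holds 16 four-byte entries, that leaks the top four bits of x = p ⊕ k, and intersecting candidates over known plaintexts pins down the key byte's high nibble.

```python
# Hypothetical simulation of known-plaintext key recovery from T-table
# cache-line observations. The side channel leaks only the line index,
# i.e. the high nibble of p ^ k.
import random

SECRET_K = 0xA7  # the key byte we pretend not to know

def observed_line(p):
    """What FLUSH+RELOAD would leak: the touched T-table line index."""
    return ((p ^ SECRET_K) >> 4) & 0xF

candidates = set(range(256))
for _ in range(50):
    p = random.randrange(256)
    line = observed_line(p)
    # keep only key bytes consistent with this observation
    candidates &= {k for k in range(256) if ((p ^ k) >> 4) & 0xF == line}

high_nibbles = {k >> 4 for k in candidates}
print(high_nibbles)  # prints {10}, i.e. the true high nibble 0xA
```

The line index alone can never separate the 16 keys sharing a high nibble; real attacks recover the remaining bits from the last round or from finer-grained channels.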
CACHE DEFENCES
CACHE COLOURING
• Figure out page colors
• These map to shared cache sets
• Do not share same colors across security boundaries
• Kernel arranges this
[Figure: cache sets 1-30 partitioned into two colors]
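The colouring idea can be made concrete with a small sketch. The cache geometry below (8 MiB, 16-way, 64-byte lines) is an assumed example, not a specific CPU: a 4 KiB page covers 64 consecutive cache sets, so its color is the page frame number modulo the number of colors, and the kernel can give mutually distrusting processes pages of disjoint colors.

```python
# Sketch: computing a page's cache color under an assumed cache geometry.
LINE_SIZE = 64
PAGE_SIZE = 4096
NUM_SETS = 8192          # e.g. 8 MiB, 16-way: 8 MiB / 64 B / 16 ways

# Each page spans PAGE_SIZE / LINE_SIZE = 64 consecutive sets.
NUM_COLORS = NUM_SETS // (PAGE_SIZE // LINE_SIZE)   # 128 colors here

def page_color(phys_addr):
    """Pages of different colors can never contend for the same set."""
    return (phys_addr // PAGE_SIZE) % NUM_COLORS

a, b = 0x100000, 0x101000     # adjacent page frames
print(page_color(a), page_color(b))  # adjacent colors: 0 1
```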
CACHE PARTITIONING: CAT
• Intel CAT: Cache Allocation Technology
• Intended for predictable performance for VMs
• Partitions caches in ways
• Hardware feature
[Figure: cache sets 1-30 partitioned by way]
CACHE PARTITIONING: TSX
• Intel TSX: Transactional Synchronization Extensions
• Intended for hardware transactional memory
• But relies on unshared cache activity
• Transactions must fit in the cache, otherwise they auto-abort
• We can use this as a defence
HYPERTHREADING
SUPERSCALAR CPU UTILISATION
• Share functional units (FU): increase utilisation
HYPERTHREADING
• Low investment, high utilisation yield
• Significant resource sharing: good and bad
TLBLEED
TLBLEED: TLB AS SHARED STATE
• Are other structures than the cache shared between threads?
• What about the TLB?
• Documented: the TLB has an L1 iTLB, an L1 dTLB, and an L2 TLB
• They have sets and ways
• Not documented: the structure
TLBLEED: TLB AS SHARED STATE
• Let's experiment with performance counters
• Try a linear structure first
• All combinations of ways (set size) and sets (stride)
• The smallest number of ways that works is it
• The smallest corresponding stride is the number of sets
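The sweep above can be simulated. This sketch is a re-implementation of the idea against a software-modelled LRU TLB, not the original tool; real measurements would read TLB-miss performance counters instead of a model, but the search logic is the same.

```python
# Simulate a set-associative, linearly-indexed TLB with LRU replacement,
# then rediscover (ways, sets) by probing: access N pages at a stride and
# check whether they still fit (no steady-state misses).
from collections import OrderedDict

SETS, WAYS = 16, 4       # ground truth the probe must rediscover

class TLB:
    def __init__(self, sets, ways):
        self.sets = [OrderedDict() for _ in range(sets)]
        self.ways = ways
    def access(self, page):               # returns True on a miss
        s = self.sets[page % len(self.sets)]
        miss = page not in s
        if not miss:
            s.move_to_end(page)           # LRU update on hit
        else:
            if len(s) >= self.ways:
                s.popitem(last=False)     # evict least-recently-used entry
            s[page] = True
        return miss

def misses(stride, count):
    tlb = TLB(SETS, WAYS)
    pages = [i * stride for i in range(count)]
    for p in pages: tlb.access(p)              # warm up
    return sum(tlb.access(p) for p in pages)   # steady-state misses

# A stride beyond any plausible set count (power of two, so a multiple of
# the real one) makes every page collide: misses start past associativity.
ways = next(n for n in range(1, 64) if misses(1024, n + 1) > 0)
# With associativity known, the smallest stride that still causes misses
# for ways+1 pages is the number of sets.
sets = next(s for s in range(1, 1024) if misses(s, ways + 1) > 0)
print(ways, sets)  # rediscovers 4 and 16
```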
TLBLEED: TLB AS SHARED STATE
• For the L2 TLB we reverse engineered a more complex hash function
• Skylake XORs 14 bits, Broadwell XORs 16 bits
• Represented by a matrix, using modulo 2 arithmetic
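The XOR hash can be sketched as follows. The exact field boundaries here (two adjacent fields of the virtual page number) are an illustrative assumption; the paper's matrix gives the precise bits. But the structure, an XOR fold, which is a binary-matrix multiply modulo 2, is the same.

```python
# Sketch of an XOR-folded L2 TLB set hash: the set index is the XOR of two
# bit-fields of the virtual page number. 2 x 7 = 14 bits XORed gives 128
# sets (the talk's Skylake figure); 2 x 8 = 16 bits gives 256 (Broadwell).
def l2tlb_set(vaddr, set_bits):
    vpn = vaddr >> 12                  # virtual page number (4 KiB pages)
    mask = (1 << set_bits) - 1
    return (vpn & mask) ^ ((vpn >> set_bits) & mask)

skylake = lambda va: l2tlb_set(va, 7)     # 128 sets
broadwell = lambda va: l2tlb_set(va, 8)   # 256 sets

# Consequence for the attacker: pages whose VPNs differ in both XORed
# fields can still land in the same set.
a, b = 0x1000, 0x80000                    # VPNs 0x01 and 0x80
print(hex(skylake(a)), hex(skylake(b)))   # both 0x1: same L2 TLB set
```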
TLBLEED: TLB AS SHARED STATE
• Let's experiment with performance counters
• Now we know the structure. Are TLBs shared between hyperthreads?
• Let's experiment with misses when accessing the same set
TLBLEED: TLB AS SHARED STATE
• We find more TLB properties
• Size, structure, sharing, miss penalty, hash function

TLBLEED: TLB AS SHARED STATE
• Can we use only latency?
• Map many virtual addresses to the same physical page
TLBLEED: TLB AS SHARED STATE
• Let's observe EdDSA ECC key multiplication
• The scalar is secret and ADD only happens if there's a 1 bit
• Like RSA square-and-multiply
• But: we cannot use code information! Only data..!
TLBLEED: TLB AS SHARED STATE
• Typical cache attacks rely on spatial separation
• I.e. different cache lines
TLBLEED: TLB AS SHARED STATE
• Let's find the spatial L1 dTLB separation
• There isn't any
• Too much activity in both the blue and green cases
TLBLEED: TLB AS SHARED STATE
• Monitor a single TLB set and use temporal information
• Use machine learning (an SVM classifier) to tell the difference
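The talk trains an SVM on latency traces; as a dependency-free stand-in, this sketch uses a nearest-centroid classifier on synthetic traces (all latency numbers are invented) to show why temporal information separates the two cases.

```python
# Classify per-set latency traces: 1-bit iterations (ADD runs) cause extra
# victim TLB activity, i.e. higher probe latency. Nearest-centroid is a
# simple stand-in for the SVM used in the talk.
import random

def trace(bit, length=32):
    """Synthetic probe latencies in cycles; the numbers are made up."""
    base = 70 if bit else 40
    return [random.gauss(base, 8) for _ in range(length)]

def centroid(samples):
    flat = [x for t in samples for x in t]
    return sum(flat) / len(flat)

random.seed(1)
c0 = centroid([trace(0) for _ in range(50)])   # 0-bit training traces
c1 = centroid([trace(1) for _ in range(50)])   # 1-bit training traces

def classify(t):
    m = sum(t) / len(t)
    return 1 if abs(m - c1) < abs(m - c0) else 0

tests = [(bit, trace(bit)) for bit in [0, 1] * 100]
accuracy = sum(classify(t) == bit for bit, t in tests) / len(tests)
print(accuracy > 0.95)  # True: the two distributions are far apart
```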
EVALUATION
TLBLEED RELIABILITY

CREDIT
• Work also by Kaveh Razavi, Cristiano Giuffrida, Herbert Bos
• Some diagrams in these slides were taken from other work: FLUSH+RELOAD, Prefetch, DRAMA
• Yuval Yarom, Katrina Falkner, Peter Peßl, Daniel Gruss
CONCLUSION
• Practical, reliable, high resolution side channels exist outside the cache
• They bypass defences
• @bjg @gober
• @vu5ec
• www.vusec.net
• Thank you for listening