Using Hardware Vulnerability Factors to Enhance AVF Analysis
Vilas Sridharan
RAS Architecture and Strategy
AMD, Inc.
International Symposium on Computer Architecture
June 23, 2010
David R. Kaeli
ECE Department
Northeastern University
What is this talk about?
Transient faults
  Cause data corruption without damage to the underlying device
  Modeled as a bit flip in the microarchitecture (0 → 1 or vice versa)
Vulnerability analysis
  Determines which faults matter and which do not
  Allows us to make informed decisions about which structures to protect
  We do this today using the Architectural Vulnerability Factor (AVF)
This talk focuses primarily on the techniques
  For results, please refer to the paper
Architectural Vulnerability Factor (AVF)
The fraction of bits in a hardware structure H that, when corrupted, will result in incorrect program output (an error)
  These bits are required for Architecturally Correct Execution (ACE bits)
  Other bits are unACE bits
\[
\mathrm{AVF}_H = \frac{\sum_{n=0}^{N} (\text{ACE bits in } H \text{ at cycle } n)}{B_H \times N}
\]
B_H: Size of H in bits
N: Number of cycles
S. S. Mukherjee et al., Int’l Symposium on Microarchitecture, Dec 2003
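As a minimal sketch of the AVF definition above, the fraction can be computed from a per-cycle count of ACE bits. The function name `avf`, its parameters, and the example trace are hypothetical and chosen only for illustration; they are not taken from the talk or the paper. The sketch sums over the observed cycles and divides by the total number of bit-cycles.

```python
from typing import Sequence


def avf(ace_bits_per_cycle: Sequence[int], size_bits: int) -> float:
    """Return ACE bit-cycles divided by total bit-cycles for one structure.

    ace_bits_per_cycle[n] is the number of ACE bits in the structure at
    cycle n (an assumed input format); size_bits plays the role of B_H.
    """
    n_cycles = len(ace_bits_per_cycle)
    if n_cycles == 0 or size_bits <= 0:
        raise ValueError("need at least one cycle and a positive size")
    if any(not 0 <= a <= size_bits for a in ace_bits_per_cycle):
        raise ValueError("ACE bit count must lie between 0 and size_bits")
    return sum(ace_bits_per_cycle) / (size_bits * n_cycles)


# Hypothetical 32-bit structure observed for 8 cycles.
example_avf = avf([0, 8, 8, 0, 0, 16, 0, 0], size_bits=32)
print(example_avf)  # 32 ACE bit-cycles out of 256 bit-cycles
```

In this made-up trace, 32 of the 256 bit-cycles are ACE, so the sketch reports an AVF of 0.125.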
Motivating Example
[Figure: two plots, one for a constant workload on a variable microarchitecture and one for a variable workload on a constant microarchitecture]
AVF depends on hardware and on software
This talk focuses on quantifying hardware vulnerability
V. Sridharan and D. R. Kaeli, Int’l Symposium on High Performance Computer Architecture, Feb 2009
Outline
Introduction
Quantifying Hardware Vulnerability
Using HVF for Microarchitectural Exploration
Estimating AVF at Runtime
Conclusions
A Typical System
[Figure: diagram of a typical system, with labels AVF and TVF]
The System Vulnerability Stack
[Figure: the system vulnerability stack. Layer labels: Timing VF, Program VF, Operating System VF, Virtual Machine VF, Hardware VF. Interface labels: ABI, ISA, Functional]
VF = Vulnerability Factor
ISA = Instruction Set Architecture
ABI = Application Binary Interface
Fault Visibility
[Figure: hardware-visible state and program-visible state. Structures shown: Reorder Buffer, Issue Queue, Load Buffer, Store Buffer, Physical Registers, Physical Memory, Architected Registers, Process (Virtual) Memory. Fault labels: hardware-visible fault; hardware-visible and program-visible fault]
Consequences of a Visible Fault
[Figure: the same structures as on the Fault Visibility slide (Reorder Buffer, Issue Queue, Load Buffer, Store Buffer, Physical Registers, Physical Memory, Architected Registers, Process (Virtual) Memory). Fault labels: masked fault, exposed fault, activated fault, hardware-visible fault, program-visible fault]
Hardware Vulnerability Factor
The fraction of activated and exposed hardware-visible faults in hardware structure H
  These are the faults that cause a perturbation of the ISA
  Masked hardware-visible faults do not contribute to HVF
\[
\mathrm{HVF}_H = \frac{\sum_{n=0}^{N} (\text{activated and exposed faults in } H \text{ at cycle } n)}{B_H \times N}
\]
B_H: Size of H in bits
N: Number of cycles
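The HVF definition can be sketched the same way as a count over bit-cycles. The sketch below assumes, purely for illustration, that every bit of the structure has already been classified in every cycle as masked, exposed, or activated; the `Outcome` enum, the `hvf` function, and the example trace are hypothetical names and data, not part of the talk. Masked bit-cycles are excluded, as the slide states.

```python
from enum import Enum
from typing import Sequence


class Outcome(Enum):
    """Per-bit, per-cycle classification, using the slide's three terms."""
    MASKED = "masked"
    EXPOSED = "exposed"
    ACTIVATED = "activated"


def hvf(outcomes: Sequence[Sequence[Outcome]], size_bits: int) -> float:
    """Return (activated + exposed bit-cycles) / (size_bits * cycles).

    outcomes[n][b] is the assumed classification of bit b at cycle n.
    """
    n_cycles = len(outcomes)
    if n_cycles == 0 or size_bits <= 0:
        raise ValueError("need at least one cycle and a positive size")
    if any(len(cycle) != size_bits for cycle in outcomes):
        raise ValueError("each cycle must classify every bit")
    counted = sum(
        1 for cycle in outcomes for o in cycle if o is not Outcome.MASKED
    )
    return counted / (size_bits * n_cycles)


M, E, A = Outcome.MASKED, Outcome.EXPOSED, Outcome.ACTIVATED
# Hypothetical 2-bit structure observed for 4 cycles.
trace = [[M, M], [A, M], [A, E], [M, M]]
example_hvf = hvf(trace, size_bits=2)
print(example_hvf)  # 3 counted bit-cycles out of 8
```

Here 3 of the 8 bit-cycles are activated or exposed, so the sketch reports an HVF of 0.375.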
Using HVF for Microarchitectural Exploration
Full AVF analysis is possible at hardware design time
  Software workloads are available at design time
Can HVF help?
  Provides additional insight to hardware designers
  Accelerates AVF simulation
Additional Insight Generated by HVF
[Figure: timeline over cycles 0 to 10 showing writes to and reads from registers P1 through P4. One read is marked live and three reads are marked dead. AVF = 10%]
Additional Insight Generated by HVF
[Figure: timeline over cycles 0 to 10 showing writes to and reads from registers P1 through P4, without live or dead markings on the reads. Labels: HVF = 40%, HVF = 70%]
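One way to read the two timelines is that AVF counts only register lifetimes that end in a live read, while HVF counts every write-to-read lifetime because the hardware cannot tell whether a read is dead. The sketch below illustrates that reading under stated assumptions: the `Lifetime` record, the `vulnerable_fraction` helper, and all cycle numbers are hypothetical and are not taken from the figure.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Lifetime:
    """One write-to-read interval of a register (hypothetical record)."""
    write_cycle: int
    read_cycle: int
    live: bool  # True if the value read affects program output


def vulnerable_fraction(
    lifetimes: Sequence[Lifetime],
    n_registers: int,
    n_cycles: int,
    live_only: bool,
) -> float:
    """Fraction of register-cycles spent between a write and its read.

    live_only=True keeps only lifetimes ending in a live read (AVF-style);
    live_only=False keeps every lifetime (HVF-style).
    """
    cycles = sum(
        lt.read_cycle - lt.write_cycle
        for lt in lifetimes
        if lt.live or not live_only
    )
    return cycles / (n_registers * n_cycles)


# Four registers over ten cycles; only the first read is live.
lifetimes = [
    Lifetime(write_cycle=1, read_cycle=4, live=True),
    Lifetime(write_cycle=0, read_cycle=5, live=False),
    Lifetime(write_cycle=2, read_cycle=6, live=False),
    Lifetime(write_cycle=3, read_cycle=8, live=False),
]
avf_like = vulnerable_fraction(lifetimes, 4, 10, live_only=True)
hvf_like = vulnerable_fraction(lifetimes, 4, 10, live_only=False)
print(avf_like, hvf_like)
```

With these made-up lifetimes the HVF-style fraction is several times the AVF-style fraction, because the three dead reads still keep their registers occupied from the hardware's point of view.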
Insight from HVF: Real-World Example
[Figure: plots for the equake and mgrid benchmarks, annotated "Regions of similar register usage". Labels: AVF ≈ 8%, AVF ≈ 15%]
Estimating AVF at Runtime
Allows a system to adapt to changing vulnerability environment
  Enable redundancy when AVF is high
  Increase performance when AVF is low
Prior predictors don’t let software designers influence AVF estimate
  Predictors are entirely encoded in hardware
  Rely on training benchmarks or invariants (e.g., stored data is vulnerable)
  Assumptions fall apart in atypical programs (e.g., SW redundancy, games)
We split AVF estimation into HVF and PVF components
  Allow software designers to measure PVF using a profiling step
  Estimate HVF in hardware at runtime using an HVF Monitor Unit
  < 3% error between measured and estimated AVF (see paper for details)
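The split into HVF and PVF components, together with the adapt-to-vulnerability policy above, can be sketched as follows. This is only an illustration under a loudly stated assumption: the two components are combined by simple multiplication per interval, which may differ from the method in the paper. The function names, the threshold value, and the sample numbers are all hypothetical.

```python
def estimate_avf(hvf_estimate: float, pvf_profile: float) -> float:
    """Combine a runtime HVF estimate with a profiled PVF value.

    ASSUMPTION: the combination is a plain product; the paper may use a
    different or finer-grained combination.
    """
    for name, value in (("hvf_estimate", hvf_estimate),
                        ("pvf_profile", pvf_profile)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return hvf_estimate * pvf_profile


def choose_mode(avf_estimate: float, threshold: float = 0.2) -> str:
    """Enable redundancy when the estimate is high, else favor performance.

    The threshold of 0.2 is an arbitrary illustrative value.
    """
    return "redundant" if avf_estimate >= threshold else "performance"


# Hypothetical per-interval inputs: (HVF from a monitor, PVF from a profile).
intervals = [(0.5, 0.5), (0.5, 0.125), (0.25, 0.5)]
modes = [choose_mode(estimate_avf(h, p)) for h, p in intervals]
print(modes)
```

Because the PVF input comes from a software profiling step, a software designer can change the resulting estimate without any change to the hardware monitor, which is the property the slide emphasizes.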
Summary
Transient faults are a challenge for all processor manufacturers
  AVF analysis is a key part of understanding transient fault behavior
HVF quantifies hardware vulnerability to transient faults
  HVF provides additional insight to hardware designers
  HVF simulation can accelerate AVF modeling during hardware design
Runtime AVF estimation can be split into HVF and PVF components
  Software designers can influence runtime AVF estimates
HVF generates meaningful insight into system vulnerability to transient faults
References
V. Sridharan and D. R. Kaeli, Using Hardware Vulnerability Factors to Enhance AVF Analysis, Int’l Symp. on Computer Architecture (ISCA-37), June 2010.
V. Sridharan and D. R. Kaeli, Eliminating Microarchitectural Dependency from Architectural Vulnerability, Int’l Symp. on High-Performance Computer Architecture (HPCA-15), February 2009.
A. Dixit et al., Trends from Ten Years of Soft Error Experimentation, Workshop on Silicon Errors in Logic – System Effects, March 2009.
V. Sridharan and D. R. Kaeli, The Effect of Input Data on Program Vulnerability, Workshop on Silicon Errors in Logic – System Effects (SELSE-5), March 2009.
V. Sridharan and D. R. Kaeli, Reliability in the Shadow of Long-Stall Instructions, Workshop on Silicon Errors in Logic – System Effects (SELSE-3), April 2007.
R. Baumann, Radiation-Induced Soft Errors in Advanced Semiconductor Technologies, IEEE Trans. on Device and Materials Reliability, September 2005.
S. S. Mukherjee et al., A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor, Int’l Symp. on Microarchitecture (MICRO-36), December 2003.
P. Roche et al., Comparisons of Soft Error Rate for SRAMs in Commercial SOI and Bulk Below the 130-nm Technology Node, IEEE Trans. on Nuclear Science, December 2003.
J. D. Dirk et al., Terrestrial Thermal Neutrons, IEEE Trans. on Nuclear Science, December 2003.