Process Variation in Near-threshold Wide SIMD Architectures
Sangwon Seo¹, Ronald G. Dreslinski¹, Mark Woh¹, Yongjun Park¹, Chaitali Chakrabarti², Scott Mahlke¹, David Blaauw¹, Trevor Mudge¹
¹University of Michigan, ²Arizona State University
Near Threshold Computing
Super-threshold: high performance, high energy consumption
Near-threshold: 10x energy reduction, 10x performance degradation
Sub-threshold: exponentially decreasing performance; increasing leakage becomes dominant
Near-threshold Computing
Advantage: High energy efficiency
Disadvantage: Low performance throughput
Compensated with a very wide SIMD architecture
Disadvantage: Sensitive to variations in threshold voltage
More critical in wide SIMD architectures: increased probability of timing errors and expensive error recovery mechanisms
How bad is the delay variation in wide SIMD architectures running at near-threshold voltages?
How to mitigate the variation-induced timing errors?
Delay Variations in 90nm
[Figure annotations: ~2.3x, ~1.6x]
Uncorrelated variations are averaged out over the chain.
Delay Variations – f(Vdd=0.55V, N)
A long chain helps, but the effect diminishes as N increases.
Variations are exacerbated with technology scaling.
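The averaging effect and its diminishing return can be illustrated with a small calculation. This is a minimal sketch, assuming the gate delays in a chain are independent and identically distributed; the function name and the 20% per-gate spread are hypothetical values chosen for illustration, not figures from the slides.

```python
import math

def chain_relative_sigma(gate_rel_sigma: float, n_gates: int) -> float:
    """Relative spread (sigma/mean) of the total delay of n_gates in series.

    ASSUMPTION: gate delays are independent and identically distributed.
    """
    # Means add linearly (n * mu) while variances add (n * sigma^2),
    # so sigma/mean of the chain shrinks by a factor of sqrt(n).
    return gate_rel_sigma / math.sqrt(n_gates)

# Hypothetical 20% per-gate spread, for illustration only.
for n in (1, 10, 50, 100, 200):
    print(n, round(chain_relative_sigma(0.20, n), 4))
```

Because the spread falls only as 1/sqrt(N), going from 1 to 50 gates helps far more than going from 50 to 100, which matches the observation that the benefit diminishes as N increases.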
Delay Variations – f(Vdd, N=50)
Line-edge roughness (LER) causes high variations in advanced technology nodes
Strict design rules
Metal gates with high-k material or SOI
Advanced lithography
Delay Distribution – 90nm GP
1 critical path delay = delay of a chain of 50 FO4 inverters
1-wide system delay = max(delays of 100 critical paths)
128-wide system delay = max(delays of 128 1-wide systems)
Performance Drop
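The max-of-max delay model on this slide can be sketched as a small Monte Carlo experiment. This is an illustrative sketch only: the Gaussian path-delay distribution and the 10% per-gate spread are assumptions, not the 90nm data behind the slide, and all function names are hypothetical.

```python
import math
import random

GATES_PER_PATH = 50    # from the slide: chain of 50 FO4 inverters
PATHS_PER_LANE = 100   # from the slide: 100 critical paths per 1-wide system
GATE_REL_SIGMA = 0.10  # ASSUMPTION: per-gate sigma/mean, illustrative only

def path_delay(rng: random.Random) -> float:
    """One critical-path delay in units of the mean gate delay.

    ASSUMPTION: the sum of 50 independent gate delays is approximated
    by a Gaussian with mean 50 and sigma 0.10 * sqrt(50).
    """
    return rng.gauss(GATES_PER_PATH, GATE_REL_SIGMA * math.sqrt(GATES_PER_PATH))

def lane_delay(rng: random.Random) -> float:
    """1-wide system delay: the slowest of its critical paths."""
    return max(path_delay(rng) for _ in range(PATHS_PER_LANE))

def system_delay(rng: random.Random, width: int) -> float:
    """width-wide system delay: the slowest of its lanes."""
    return max(lane_delay(rng) for _ in range(width))

def mean_system_delay(width: int, trials: int = 20, seed: int = 0) -> float:
    """Average system delay over several simulated chips."""
    rng = random.Random(seed)
    return sum(system_delay(rng, width) for _ in range(trials)) / trials

print("1-wide  :", mean_system_delay(1))
print("128-wide:", mean_system_delay(128))
```

The wider system takes the maximum over many more paths, so its expected delay sits further out in the tail of the path-delay distribution. That shift is the performance drop labeled in the figure.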
Variation Effects on 128-wide SIMD Architecture
- Structural Duplication
- Voltage margining
- Frequency margining
Near-threshold Wide SIMD Architecture: Diet SODA
[Seo et al. ISLPED 2010]
Structural Duplication
[Figure: SIMD Function Units #0 to #9 connected through a crossbar to Datapaths #0 to #7]
8-wide+2-spare system
Increase number of processing resources
Use the spares if required.
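The idea of routing around slow function units can be sketched as a simple lane-to-unit assignment. This is a hypothetical illustration: the function name and the "first working units win" policy are assumptions, and the slides do not describe how the crossbar is actually configured.

```python
def assign_units(num_lanes: int, faulty_units: set, total_units: int) -> list:
    """Map each datapath lane to a working function unit.

    ASSUMPTION: any lane can reach any unit through the crossbar, and
    units that fail timing are already known (e.g. from post-silicon test).
    Returns a list where entry i is the unit serving datapath lane i.
    """
    healthy = [u for u in range(total_units) if u not in faulty_units]
    if len(healthy) < num_lanes:
        raise ValueError("not enough working function units")
    return healthy[:num_lanes]

# 8-wide + 2-spare system with units #2 and #5 too slow (hypothetical faults).
mapping = assign_units(8, {2, 5}, 10)
print(mapping)  # [0, 1, 3, 4, 6, 7, 8, 9]
```

With no faulty units the spares (#8 and #9) stay unused; they are only switched in when one of the first eight units fails timing.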
Structural Duplication – 90nm GP
6 spares are required to match the chip delay of baseline architecture.
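One way to see why spares recover chip delay: if only the fastest lanes are kept, the system delay becomes an order statistic of the lane delays rather than their maximum. The sketch below assumes that selection policy and uses a made-up Gaussian lane-delay distribution; it does not reproduce the 90nm result on the slide.

```python
import random

def system_delay_with_spares(lane_delays: list, width: int) -> float:
    """Delay of a system that keeps only the `width` fastest lanes.

    ASSUMPTION: the slowest lanes are the ones swapped out for spares.
    """
    if len(lane_delays) < width:
        raise ValueError("fewer lanes than the required width")
    # The slowest lane still in use is the width-th smallest delay.
    return sorted(lane_delays)[width - 1]

WIDTH = 128
SPARES = 6  # number of spares quoted on the slide
rng = random.Random(1)
# Hypothetical normalized lane delays (mean 1.0, sigma 0.05).
lanes = [rng.gauss(1.0, 0.05) for _ in range(WIDTH + SPARES)]

no_spares = system_delay_with_spares(lanes[:WIDTH], WIDTH)
with_spares = system_delay_with_spares(lanes, WIDTH)
print(no_spares, with_spares)
```

Adding spares can never make the selected system slower, since the original 128 lanes remain available as candidates.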
Voltage Margining
Delay distributions: 45nm PTM model is used
Increase supply voltage
Frequency Margining
Increase clock period
Applicable to applications with relaxed time constraints
For advanced technology nodes, this is impractical
Caveat: consider its impact on the system
SIMD subsystem clock period (Tclk@NTV)
Memory subsystem clock period (Tclk@FV)
Structural Duplication vs. Voltage Margining
Combination of two schemes – 45nm GP
128-wide system @ 0.6V
26 spares
17mV boost
5mV + 8 spares
10mV + 2 spares
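The four design points above can be compared with a rough first-order power estimate. This is a crude sketch under stated assumptions, not the power analysis behind the slide: it assumes dynamic power scales with the square of the supply voltage and that every spare lane costs as much power as an active lane. The function name and cost model are hypothetical.

```python
NOMINAL_VDD = 0.6  # volts, from the slide
WIDTH = 128        # lanes, from the slide

def relative_power_overhead(boost_mv: float, spares: int) -> float:
    """Fractional power increase over the unmodified 128-wide system.

    ASSUMPTIONS: power ~ (number of lanes) * Vdd^2; leakage and
    power gating of unused spares are ignored.
    """
    voltage_scale = (NOMINAL_VDD + boost_mv / 1000.0) / NOMINAL_VDD
    lane_scale = (WIDTH + spares) / WIDTH
    return lane_scale * voltage_scale ** 2 - 1.0

# (voltage boost in mV, number of spares) for each point on the slide.
design_points = {
    "26 spares": (0, 26),
    "17mV boost": (17, 0),
    "5mV + 8 spares": (5, 8),
    "10mV + 2 spares": (10, 2),
}

for name, (boost, spares) in design_points.items():
    print(f"{name}: {relative_power_overhead(boost, spares):.1%}")

best = min(design_points,
           key=lambda name: relative_power_overhead(*design_points[name]))
print("lowest overhead under this model:", best)
```

Under this simplified model the 10mV + 2 spares point comes out lowest, but the ranking depends on the assumed cost model and should not be read as the measured result.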
Variation-Aware Diet SODA
Conclusions
Near-threshold operation of a wide SIMD system can have timing problems due to process variations.
Variation effects on a 128-wide SIMD architecture are marginal for 90nm technology node, but could be non-negligible for current/future technology nodes.
A combination of structural duplication and voltage margining provides a minimal power overhead solution to mitigate variation-induced timing problems in wide SIMD architectures.
Questions?
Thank you!
Backup Slides
Local Spares vs. Global Spares
Local sparing, 1 out of 4 (2 spares)
+ small overhead
- burst errors
Global sparing (2 spares)
+ burst errors
- large overhead
Global sparing is better than local sparing.
XRAM crossbar supports global sparing.
128 + 8 global spares
128 + 32 local spares(1 out of 4)
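The difference in burst-error tolerance between the two schemes can be sketched as a repairability check. This is an illustrative sketch: the function names are hypothetical, the "1 out of 4" scheme is read as one spare per group of four adjacent lanes, and the spare units themselves are assumed to work.

```python
def global_repairable(faulty_lanes: list, num_spares: int) -> bool:
    """Global sparing: any spare can replace any faulty lane."""
    return len(set(faulty_lanes)) <= num_spares

def local_repairable(faulty_lanes: list, group_size: int = 4,
                     spares_per_group: int = 1) -> bool:
    """Local sparing: a spare can only replace a lane in its own group.

    ASSUMPTION: lanes are grouped by index, group = lane // group_size.
    """
    faults_per_group = {}
    for lane in set(faulty_lanes):
        group = lane // group_size
        faults_per_group[group] = faults_per_group.get(group, 0) + 1
    return all(count <= spares_per_group
               for count in faults_per_group.values())

burst = [4, 5]       # two adjacent slow lanes in the same group
spread = [0, 4, 8]   # slow lanes in three different groups

print(global_repairable(burst, 8), local_repairable(burst))    # True False
print(global_repairable(spread, 8), local_repairable(spread))  # True True
```

A burst of two faults in one group defeats local sparing even though 32 spares are present, while 8 global spares cover it. The cost of that flexibility is routing overhead.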
Variation-Aware Diet SODA
Delay variations can be mitigated with little area and power overhead.