Trading Fault Tolerance for Performance in AN encoding
Norman A. Rink and Jeronimo Castrillon
Technische Universität [email protected]
Computing Frontiers, 15-17 May 2017, Siena, Italy
Outline
1. Introduction – hardware faults and software-based fault tolerance
2. AN encoding
3. Configurable compiler-implemented AN encoding
4. Fault experiments and results
5. Summary and outlook
Hardware faults and soft errors
- Faults are a long-standing and recurring issue.
  - In safety-critical embedded devices.
  - In servers/data centers and HPC workloads.
- What causes hardware faults?
  - Cosmic radiation.
  - “Dim silicon” (near-threshold computing to save energy).
  - Temperature variations, process variations.
→ Typically lead to transient hardware faults, aka soft errors.
- Non-negligible fault rates in emerging computing paradigms:
  - Silicon-/carbon-based nano-technologies, graphene.
  - Chemical information processing, bio-inspired/quantum computing, NVM.
taken from S. Borkar, “Designing reliable systems from unreliable components: …,” IEEE Micro, vol. 25, no. 6, 2005.
Hardware faults and soft errors – cont’d
- The trade-off of computing with faulty hardware:
[Figure: cost (runtime, energy) plotted against error probability/output degradation. Labeled regions:
  - approximate computing applications, e.g. image processing, machine learning
  - safety-/security-critical applications, e.g. automotive, operating system (kernels)
  - non-critical user applications, processes that can be restarted
  - future and emerging HW (perhaps)]
→ Hardware-implemented fault tolerance is not flexible.
Error detection through redundant code
- Typical approach to detecting/correcting hardware faults:
  - Replicate data flow (duplication is sufficient for error detection).
  - Insert checks.

Original code:
    %3 = add i64 %0, %1
    %4 = mul i64 %3, %2

Fault-tolerant code after the transformation (duplicated dataflow, then error check):
    %3 = add i64 %0, %1
    %r3 = add i64 %r0, %r1
    %4 = mul i64 %3, %2
    %r4 = mul i64 %r3, %r2
    %f0 = icmp eq i64 %4, %r4
    br i1 %f0, label %continue, label %recover

Runtime overhead of duplicated dataflow: typically < 2x.

- State of the art: EDDI ’02, SWIFT ’05.
  - Many variations and improvements exist.
  - It is usually assumed that the memory system is already protected, typically by ECC.
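The duplicated dataflow and the final comparison can be sketched in C. This is only an illustration: the names `dup_result` and `add_mul_duplicated` are hypothetical, and the replica inputs are passed as separate parameters so that the two computations stay distinct at the source level. Real implementations such as EDDI or SWIFT work on compiler IR or machine code, where the compiler cannot fold the copies together.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: the computed value plus a flag that tells
   the caller whether both copies of the dataflow agreed. */
typedef struct {
    uint64_t value;
    bool ok;
} dup_result;

/* Computes (a + b) * c twice, once on the original inputs and once on
   the replicas (ra, rb, rc), then compares the two results. */
dup_result add_mul_duplicated(uint64_t a, uint64_t b, uint64_t c,
                              uint64_t ra, uint64_t rb, uint64_t rc) {
    uint64_t sum = a + b;        /* %3  = add i64 %0, %1   */
    uint64_t rsum = ra + rb;     /* %r3 = add i64 %r0, %r1 */
    uint64_t prod = sum * c;     /* %4  = mul i64 %3, %2   */
    uint64_t rprod = rsum * rc;  /* %r4 = mul i64 %r3, %r2 */
    dup_result r = { prod, prod == rprod };  /* icmp eq */
    return r;
}
```

A caller would branch to its recovery path when `ok` is false, which corresponds to the `recover` label in the IR.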
AN encoding
- Correctness condition: Integer values are multiples of a fixed constant A.
  → Check for faults like this:
      if (n % A != 0) { exit(AN_ERROR_CODE); }
- Advantages over replication:
  - Data in memory is encoded (automatically protected): no need to replicate loads/stores.
  - Suitable for multi-threaded and shared memory applications.
- Disadvantages: Large runtime overheads (several 10x and more).
  - Checking is expensive (modulo operation).
  - Decoding is expensive (division by A).
  - Decoding is often required, e.g., for address operands.
→ Trade frequencies of these operations for runtime.
Aside: more eager variants of AN encoding have even larger overheads.
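A minimal C sketch of the three basic operations behind the correctness condition follows. The helper names `an_encode`, `an_check` and `an_decode` are hypothetical, the constant A = 3 is chosen only for illustration, and the check returns a flag instead of calling `exit` so that it can be exercised in isolation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static const int64_t A = 3;  /* illustrative choice of the constant A */

/* Encode: multiply by A. The assert guards against signed overflow,
   which this sketch does not otherwise handle. */
int64_t an_encode(int64_t v) {
    assert(v <= INT64_MAX / A && v >= INT64_MIN / A);
    return v * A;
}

/* Check: a valid code word is a multiple of A (the expensive modulo). */
bool an_check(int64_t n) {
    return n % A == 0;
}

/* Decode: divide by A (the expensive division). */
int64_t an_decode(int64_t n) {
    return n / A;
}
```

With A = 3, the value 14 is represented by the code word 42, while 43 is not a valid code word and fails the check.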
AN encoding – cont’d
- Make a program fault-tolerant by transforming it into an AN-encoded program.

1. Value encoding:
   integer constant c → integer constant c*A

2. Operation encoding, e.g.:
   multiplication:
       %3 = mul i64 %0, %1
   becomes
       %t3 = mul i64 %0, %1
       %3 = div i64 %t3, A
   load:
       %1 = load i64* %0
   becomes
       %t0 = ptrtoint i64* %0 to i64
       %t1 = div i64 %t0, A
       %t3 = inttoptr i64 %t1 to i64*
       %1 = load i64* %t3

3. Check insertion:
   - Where non-encoded values enter/leave the scope of AN encoding:
     - memory accesses, calls to external functions, return values
   → Often referred to as synchronization points.
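The operation encoding of arithmetic can be illustrated in C. Addition needs no correction because A*x + A*y = A*(x + y), whereas the product of two code words carries a factor A*A and is divided by A once, mirroring the `mul`/`div` pair above. The function names and the choice A = 3 are assumptions for this sketch, and overflow of the intermediate product is not handled.

```c
#include <stdint.h>

static const int64_t A = 3;  /* illustrative choice of the constant A */

/* Encoded addition: the sum of two multiples of A is a multiple of A. */
int64_t an_add(int64_t xe, int64_t ye) {
    return xe + ye;
}

/* Encoded multiplication: (A*x) * (A*y) = A*A*x*y, so one division
   by A restores a valid code word for x*y. */
int64_t an_mul(int64_t xe, int64_t ye) {
    int64_t t = xe * ye;  /* %t3 = mul i64 %0, %1 */
    return t / A;         /* %3  = div i64 %t3, A */
}
```

For example, with the code words 6 and 9 (values 2 and 3), `an_add` yields 15 (value 5) and `an_mul` yields 18 (value 6).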
Configurable compiler-implemented AN encoding
- optional pre-optimizations
  - AN encoding increases the complexity of code.
  - Hence, after AN encoding, the compiler may fail to spot opportunities for optimization.
- value encoding, operation encoding, check insertion
  - As previously discussed.
  - Remember, checks are inserted at so-called synchronization points.
- expansion
  - Turn encoding, decoding and checking operations into (sequences of) native machine operations.
Encoding variants
- check insertion
  - Always check before values are stored to memory.
  - Always check before values are decoded.
[Figure: code generated for the variants during compilation. Annotation: one local accumulator per function.]
Runtime overheads
label   test case
A-C     bubblesort
D       CRC (cyclic redundancy checker)
E       DES encryption
F       Dijkstra
G-I     lex, parse, eval
J       Fibonacci
K       integer matrix multiply
L       memcopy
M-O     quicksort

[Figure: runtime overheads per test case. Annotation: best results with pre-optimizations.]
Fault injection
- Assumptions (commonly justified by the rarity of faults; SEU: single event upset):
  - Only a single fault affects program execution.
  - Only single bit flips occur.
- Simulate symptoms of faults by …
  - … flipping a random bit in a random register (faults in the CPU).
  - … flipping a random bit in a random memory location that is accessed (faults in memory).
- Evaluate AN encoding on the conditional probability:
  pSDC = P( “silent data corruption” | “hardware fault (visible at the architecture level)” )
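The single-bit-flip fault model can be tried out on one encoded word with the C sketch below. It is not the fault injection setup used in the experiments; the names `flip_bit` and `undetected_flips` and the choice A = 3 are assumptions. A flip of bit k changes the word by ±2^k, and 2^k is never a multiple of an odd A > 1, so every flip that hits a code word directly fails the check. The sketch says nothing about faults that hit decoded values, addresses or instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static const uint64_t A = 3;  /* illustrative odd constant A */

/* Flip one bit (0..63) of a 64-bit word, emulating a single event upset. */
uint64_t flip_bit(uint64_t word, unsigned bit) {
    assert(bit < 64);
    return word ^ (UINT64_C(1) << bit);
}

/* A valid code word is a multiple of A. */
bool an_check(uint64_t n) {
    return n % A == 0;
}

/* Encode `value`, try all 64 single-bit flips of the code word, and
   count how many corrupted words still pass the check. */
unsigned undetected_flips(uint64_t value) {
    assert(value <= UINT64_MAX / A);  /* encoding must not wrap around */
    uint64_t encoded = value * A;
    unsigned undetected = 0;
    for (unsigned bit = 0; bit < 64; ++bit) {
        if (an_check(flip_bit(encoded, bit))) {
            ++undetected;
        }
    }
    return undetected;
}
```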
Faults in the CPU – full results
[Figure: stacked bar chart of relative frequencies (0.0 to 1.0) of the outcomes correct, detected, hang, crash and sdc (the sdc share is pSDC). One group of bars per test case A to O, labeled X, X.1, X.2, X.3, X.po.1, X.po.2, X.po.3 for test case X.]
Faults in the CPU and in memory – SDC
Memory:
[Figure: pSDC (axis from 0.00 to 0.15) for test cases A to O and their means, for the variants 1, 2, 3, po.1, po.2, po.3.]
- Memory accesses are relatively rare.
- A single vulnerable access has a much stronger effect.
- Stack accesses have been found to be particularly vulnerable.

CPU:
[Figure: pSDC (axis from 0.00 to 0.08) for test cases A to O and their means, for the variants 1, 2, 3, po.1, po.2, po.3.]

[Plot annotations: pSDC · 10^3; 24x, 55x, 7x, 15x.]
Summary and outlook
- On less reliable HW, some applications …
  - … can live with the occasional error (approximate computing).
  - … may require additional measures to tolerate faults (e.g. OS kernels).
- Encoding is an interesting alternative to instruction duplication.
  - Protection of entire computing systems, multi-threaded and shared memory applications.
- AN encoding has large runtime overheads (several 10x to 100x).
  - Here: reduction from 9.9x to 3.6x, accompanied by an increase in pSDC.
  - Further reduction to 2.1x if pointers are not encoded.
- Can we design HW that supports error detection by encoding?
- Must understand better the interaction of encoding with compiler optimizations.
- In assessments, would like to avoid long-running fault injection experiments.
Work supported by the German Research Foundation (DFG) within the Cluster of Excellence ‘Center for Advancing Electronics Dresden’ (cfaed). Thank you.
Back up
Worked example of AN encoding
(A = 3)

Source statement:
    *p = foo( (*p) + 42 );

Original code:
    %1 = load i64* %0
    %2 = add i64 %1, 42
    %3 = call i64 @foo(i64 %2)
    store i64 %3, i64* %0

After value encoding:
    %1 = load i64* %0
    %2 = add i64 %1, 126
    %3 = call i64 @foo(i64 %2)
    store i64 %3, i64* %0

After operation encoding:
    %t0 = ptrtoint i64* %0 to i64
    %t1 = div i64 %t0, 3
    %t2 = inttoptr i64 %t1 to i64*
    %1 = load i64* %t2
    %2 = add i64 %1, 126
    %t3 = div i64 %2, 3
    %t4 = call i64 @foo(i64 %t3)
    %3 = mul i64 %t4, 3
    %t5 = ptrtoint i64* %0 to i64
    %t6 = div i64 %t5, 3
    %t7 = inttoptr i64 %t6 to i64*
    store i64 %3, i64* %t7

After check insertion:
    call void @check(i64 %0)
    %t0 = ptrtoint i64* %0 to i64
    %t1 = div i64 %t0, 3
    %t2 = inttoptr i64 %t1 to i64*
    %1 = load i64* %t2
    call void @check(i64 %1)
    %2 = add i64 %1, 126
    call void @check(i64 %2)
    %t3 = div i64 %2, 3
    %t4 = call i64 @foo(i64 %t3)
    %3 = mul i64 %t4, 3
    call void @check(i64 %0)
    %t5 = ptrtoint i64* %0 to i64
    %t6 = div i64 %t5, 3
    %t7 = inttoptr i64 %t6 to i64*
    call void @check(i64 %3)
    store i64 %3, i64* %t7
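The final, checked version of the worked example can be mirrored in C as a rough sketch. Several things here are assumptions made for illustration: the body of `foo`, the helper names `an_encode_ptr` and `encoded_update`, and the fact that a failed check returns `false` rather than terminating the program as `@check` presumably does. The sketch also assumes that addresses are small enough for `address * A` to fit into a signed 64-bit integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static const int64_t A = 3;

/* Stand-in for the external, non-encoded function foo (assumed body). */
static int64_t foo(int64_t x) { return x + 1; }

static bool an_check(int64_t n) { return n % A == 0; }

/* Encode a pointer as address * A (hypothetical helper). */
static int64_t an_encode_ptr(int64_t *p) {
    int64_t addr = (int64_t)(intptr_t)p;
    assert(addr >= 0 && addr <= INT64_MAX / A);
    return addr * A;
}

/* Encoded form of `*p = foo((*p) + 42);`. enc_p is the encoded pointer,
   and the cell it designates holds an encoded value. */
bool encoded_update(int64_t enc_p) {
    if (!an_check(enc_p)) return false;           /* check before decoding the pointer */
    int64_t *p = (int64_t *)(intptr_t)(enc_p / A);
    int64_t v = *p;                                /* load */
    if (!an_check(v)) return false;                /* check the loaded value */
    int64_t sum = v + 42 * A;                      /* add i64 %1, 126 */
    if (!an_check(sum)) return false;              /* check before decoding the argument */
    int64_t r = foo(sum / A) * A;                  /* decode, call, re-encode */
    if (!an_check(enc_p)) return false;            /* check before decoding the pointer */
    int64_t *q = (int64_t *)(intptr_t)(enc_p / A);
    if (!an_check(r)) return false;                /* check before the store */
    *q = r;
    return true;
}
```

Starting from a cell that holds the code word 15 (value 5), a successful call leaves the code word for `foo(5 + 42)` in the cell. A cell that holds a non-multiple of 3 makes the call return `false` without storing anything.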