RISC-V fault tolerant processor implementation · 2019-09-25


RISC-V fault tolerant processor implementation

Alfonso Sánchez-Macián

ARIES Research Center

Agenda

• Project and ISA.

• Error sources and error protection.

• Characterizing and protecting the ISA.

• Implementations. Fault Tolerance.

• Characterizing implementations.

• Example of TMR protection.

• Example of module protection: TLB.

• Example of protection at RTL level: Register Set.

RISC-V

• RISC-V is an Instruction Set Architecture (ISA).

• Open source and free.

• Originally developed in Berkeley.

• Supported by a Foundation with more than 200 members.

• The Foundation defines the evolution of the specifications and of the HW/SW environment.

RISC-V Specifications

• User-level ISA.

• Modular design.

• Privileged ISA.

• Draft.

• Debug.

User-level ISA

• Base: RV32I/RV64I/RV128I. RV32E.

• Extensions:

• M: Integer multiplication and division.

• A: Atomic instructions.

• F: Single-Precision Floating-Point.

• D: Double-Precision Floating-Point.

• Q: Quad-Precision Floating-Point.

• L: Decimal Floating-Point.

• C: Compressed Instructions.

• B: Bit Manipulation.

• J: Dynamically Translated Languages.

• T: Transactional Memory.

• P: Packed-SIMD Instructions.

• V: Vector Operations.

• N: User-Level Interrupts.

User-level ISA

• Each implementation should state which modules are supported:

• Example: RV32IMAC

• RV32G = RV32IMAFD
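The naming rule above can be illustrated with a small parser. This is a minimal sketch, assuming only single-letter modules and the expansion given on the slide (G = IMAFD); the function name `parse_isa` is hypothetical, and multi-letter extension names are deliberately not handled.

```python
import re

# Expansion stated on the slide: RV32G = RV32IMAFD.
G_EXPANSION = "IMAFD"


def parse_isa(name: str) -> tuple[int, list[str]]:
    """Split a name such as 'RV32IMAC' into (register width, module letters)."""
    match = re.fullmatch(r"RV(32|64|128)([A-Z]+)", name.upper())
    if match is None:
        raise ValueError(f"unsupported ISA name: {name!r}")
    width = int(match.group(1))
    modules: list[str] = []
    for letter in match.group(2):
        # 'G' is shorthand for several modules; every other letter stands for itself.
        for module in (G_EXPANSION if letter == "G" else letter):
            if module not in modules:
                modules.append(module)
    return width, modules


if __name__ == "__main__":
    print(parse_isa("RV32IMAC"))
    print(parse_isa("RV64GC"))
```

A core can then be screened by checking that every module a program needs appears in the returned list.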

Error sources

• Processors may suffer from:

• Hard errors (permanent errors)

• Defects during the manufacturing process.

• Processor wear-out.

• Soft errors (temporary errors)

• In memories, the main cause is radiation.

• In logic, there is also:

• Power supply variations → wrong gate behavior, crosstalk.

• Temperature → delay.

Fault tolerance

• Reducing the error probability or the consequences.

• Example: interleaving, scrubbing.

• Error detection.

• Example: parity.

• Error correction.

• Example: error correction codes (ECC).

Fault tolerance II

• Radiation Hardening By Software:

• Software techniques. For instance: instruction duplication and verification, invariant monitoring.

• Application based detection. For instance: task duplication.

• Radiation Hardening By Design:

• ISA oriented.

• Hardware oriented:

• HW architecture (For example: TMR, ECCs)

• HW modules: ad-hoc approaches.

• RTL level.

• Radiation Hardening By Process

Fault tolerance III

• Spatial redundancy.

• N-Modular Redundancy (NMR).

• Diverse NMR.

• Reduced Precision Redundancy (RPR).

• Temporal redundancy.

• Information redundancy.
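As an illustration of spatial redundancy, the N = 3 case (TMR) can be sketched with a bitwise majority voter. This is a generic textbook voter rather than anything taken from the talk; the name `majority3` is an assumption.

```python
def majority3(a: int, b: int, c: int) -> int:
    """Bitwise majority vote: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    good = 0xDEADBEEF
    faulty = good ^ (1 << 7)  # one replica suffers a single bit flip
    print(hex(majority3(good, faulty, good)))
```

Because the vote is taken per bit, upsets in different bit positions of different replicas are still masked; two replicas wrong in the same bit are not.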

Failure modes: ASICs vs. FPGA

• ASICs:

• Voltage spike at a node of a circuit (Single Event Transient, SET) or bit flips in stored information (Single Event Upset, SEU).

• Solutions are simulated, or prototyped on FPGAs, before producing the actual circuit.

• Reconfigurable FPGAs:

• Configuration memory cross-section: upsets there change the circuit and its behavior.

• They can also suffer from errors in the user memory (SEUs) and in other resources (Single Event Functional Interrupt).

• Other types of errors, such as Multiple Cell/Bit Upsets (MCU/MBU), Adjacent Bit Upsets …

Errors and program execution

• Errors can be classified by their effect on the execution of a program.

• What is the effect of an SEU on an instruction of a program running on a microprocessor?

• No error (error is masked).

• Hard fault / Memory Access Exception.

• Program hangs.

• Silent Data Corruption (SDC).

Characterizing the ISA

• What happens when there is a bit flip in the binary representation of the instruction?

J. A. Martínez, J. A. Maestro and P. Reviriego, "Evaluating the Impact of the Instruction Set on Microprocessor Reliability to Soft Errors," in IEEE Transactions on Device and Materials Reliability, vol. 18, no. 1, pp. 70-79, March 2018.

Characterizing the ISA

• What happens when there is a bit flip in the binary representation of the instruction?

• The instruction turns into a different one (or into an invalid opcode).

• If the effect is a program fault (hard fault, hang), it is possible to detect the error. An SDC is the worst outcome.

• Example: changing the instruction into a Load or Store may produce a memory access exception.

• Instruction encoding differs among ISAs. It is usually optimized for the HW implementation.

Intrinsic protection

Example with ARM Cortex M0

Characterizing the ISA

• Are there bit positions with a lower probability of producing an SDC when a bit flip occurs?

• Analysis of SDC rates as a function of the flipped bit.

• RV32G: bit 3 is the one that produces the fewest SDCs.

SDC rates as a function of the flipped bit

Protecting the ISA

• Parity can be added to detect the error.

• But it requires adding a bit to all the structures that store instructions.

[Diagram: the 32 bits of the RV32G instruction are XORed into 1 parity bit, which is stored alongside the 32-bit instruction.]

J. A. Martínez, J. A. Maestro and P. Reviriego, "A Scheme to Improve the Intrinsic Error Detection of the Instruction Set Architecture," in IEEE Computer Architecture Letters, vol. 16, no. 2, pp. 103-106, 1 July-Dec. 2017.

Increasing the intrinsic protection

• Alternative: encode the parity into the bit that produces the fewest SDCs.

[Diagram: the 32 bits of the RV32G instruction are XORed (parity) and the result replaces bit b3 of the stored 32-bit word.]

• The original instruction is recovered by applying the same operation.

• If a bit flip occurs, the error is propagated to the bit where the parity is encoded → lower probability of SDC.
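The folding idea can be sketched in a few lines. This is one reading of the slide, assuming that bit 3 of the stored word is replaced by the XOR of all 32 instruction bits; the helper names are hypothetical and the instruction word is an arbitrary 32-bit value. Under that assumption the operation is its own inverse, and a flip in any other stored bit also flips bit 3 after decoding.

```python
PARITY_BIT = 3  # bit position named on the slide for RV32G
MASK32 = 0xFFFFFFFF


def xor_of_bits(word: int) -> int:
    """Return the XOR of all 32 bits of the word (0 or 1)."""
    return bin(word & MASK32).count("1") & 1


def fold_parity(word: int, pos: int = PARITY_BIT) -> int:
    """Replace bit `pos` with the XOR of all 32 bits; applying it twice restores the word."""
    word &= MASK32
    parity = xor_of_bits(word)
    return (word & ~(1 << pos)) | (parity << pos)


if __name__ == "__main__":
    instruction = 0x00A58533          # arbitrary 32-bit pattern used for illustration
    stored = fold_parity(instruction)
    corrupted = stored ^ (1 << 20)    # single event upset in bit 20 of the stored word
    decoded = fold_parity(corrupted)
    # The decoded word differs from the original in bit 20 and in bit 3.
    print(f"{decoded ^ instruction:#010x}")
```

No extra storage bit is needed, which is the advantage over the plain parity scheme; the cost is the XOR tree on both the write and the read path.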

Increasing the intrinsic protection

SDC rates when applying the proposed technique

Increasing the intrinsic protection

SDC rates as a function of the flipped bit

Implementations

• Open vs. proprietary

• Free vs. paying a license fee (IP).

• IoT, mobile devices, workstations, servers, AI, big data…

• Chisel, Verilog, VHDL …

• Cores, SoC platforms, SoCs.

• ISA variants: RV32I, RV64GC, RV32IMC, RV32EC…

Implementations

• Rocket Chip, LowRISC, PULPino, BOOM, ORCA, Ariane, SCR1, GAP8.

• And many others (https://github.com/riscv/riscv-cores-list). Which one to choose?

Implementations

• Some of them are Fault Tolerant:

• SHAKTI-F: SEC-DED for memories + DMR for ALU.

• Technolution (master thesis - Delft). RV32I. ECC+TMR

• Other academic proposals.

• Create a new one?

Implementations

• Choose an existing one considering:

• ISA extensions implemented by the core.

• License.

• Community / Support.

• Complexity/ learning curve (e.g. Chisel).

• Debugging environment and other available software.

Architectural Vulnerability Factor

• Architectural Vulnerability Factor (AVF):

• Probability that a failure in a specific processor structure affects the final output.

• ACE bits: the processor state bits that may affect the Architecturally Correct Execution of the program.

• AVF for a structure: percentage of time during which ACE bits are stored in the structure.

• Depends on the benchmark being executed.

S. S. Mukherjee, C. Weaver, J. Emer, S. K. Reinhardt and T. Austin, "A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor," Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36., San Diego, CA, USA, 2003, pp. 29-40.
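The AVF definition above can be turned into a short calculation. This is a minimal sketch, assuming a per-cycle trace of how many ACE bits a structure holds is already available (obtaining that trace is the hard part and is not shown); the function name `avf` and the trace values are hypothetical.

```python
def avf(ace_bits_per_cycle: list[int], structure_bits: int) -> float:
    """Fraction of the structure's bit-cycles that hold ACE bits."""
    cycles = len(ace_bits_per_cycle)
    if cycles == 0 or structure_bits <= 0:
        raise ValueError("need at least one cycle and a positive structure size")
    return sum(ace_bits_per_cycle) / (structure_bits * cycles)


if __name__ == "__main__":
    # Hypothetical trace: a 64-bit structure holding 32 ACE bits during 2 of 4 cycles.
    print(avf([32, 32, 0, 0], structure_bits=64))
```

Since the trace depends on the program, the same structure yields a different AVF for each benchmark.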

Instruction Vulnerability Factor

• Instruction Vulnerability Factor (IVF):

• Probability that an error in an instruction affects the final result.

• It also depends on the Benchmark.

A. Azarpeyvand, M. E. Salehi, F. Firouzi, A. Yazdanbakhsh and S. M. Fakhraie, "Instruction reliability analysis for embedded processors," 13th IEEE Symposium on Design and Diagnostics of Electronic Circuits and Systems, Vienna, 2010, pp. 20-23.

Characterizing implementations

• Which benchmarks? Which input data should be selected?

• Generate the “Golden” copy (without errors). Output, processor state…

• Errors in Hard processors.

• Simulate.

• Prototype using an FPGA and emulate user logic and memory errors.

• Radiate the ASIC.

• Errors in Soft processors.

• Simulate.

• Implement into the FPGA and simulate errors in configuration memory (e.g. using SEM IP) and user logic and memory.

• Radiate the FPGA.

• Compare results with the “Golden” copy.
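The comparison against the golden copy can be sketched as a small classifier. This is an illustrative sketch only: it uses the four outcome classes listed earlier in the talk (masked, hard fault, hang, SDC), and the `RunResult` record and its fields are hypothetical stand-ins for whatever the injection framework actually logs.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    MASKED = "no error (masked)"
    HARD_FAULT = "hard fault / memory access exception"
    HANG = "program hangs"
    SDC = "silent data corruption"


@dataclass
class RunResult:
    output: bytes            # program output captured after the run
    trapped: bool = False    # the run ended in an exception
    timed_out: bool = False  # the run exceeded its time budget


def classify(run: RunResult, golden_output: bytes) -> Outcome:
    """Compare one fault-injection run against the error-free golden output."""
    if run.trapped:
        return Outcome.HARD_FAULT
    if run.timed_out:
        return Outcome.HANG
    if run.output != golden_output:
        return Outcome.SDC
    return Outcome.MASKED


if __name__ == "__main__":
    golden = b"1 2 3 4"
    print(classify(RunResult(output=b"1 2 4 3"), golden))
```

Traps and hangs are checked first because they are detectable without the golden copy; only a run that finishes normally with a wrong output counts as an SDC.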

Characterization. Example.

• LowRISC – RV64G. Version 0.2.

• FPGA: Xilinx Nexys 4 DDR

• Error injection in configuration memory with SEM IP.

• Failure model: Single event upsets.

• Statistical fault injection campaign (99.8% confidence level with a 1.5% error margin).

• Classification of results: correct, hard fault, hang, application output mismatch, architectural state mismatch (output matches).

• Benchmarks: Quicksort, Towers of Hanoi, matrix multiplication, Dijkstra, Mergesort and FFT.

A. Ramos, J.A. Maestro, P. Reviriego, "Characterizing a RISC-V SRAM-based FPGA Implementation against Single Event Upsets Using Fault Injection", Microelectronics Reliability, Elsevier, Vol. 78, November 2017, pp. 205-211.
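To give a feel for what a 99.8% confidence level with a 1.5% error margin implies, the number of injections can be estimated with the usual finite-population sample-size formula. This is a hedged sketch: the slides do not say which formula the authors used, the population size (number of injectable bits) below is a made-up value, and p = 0.5 is the conservative worst-case assumption.

```python
import math
from statistics import NormalDist


def sample_size(population: int, confidence: float = 0.998,
                margin: float = 0.015, p: float = 0.5) -> int:
    """Injections needed to estimate a failure proportion within +/- margin."""
    # Two-sided critical value of the standard normal distribution.
    t = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    n = population / (1.0 + margin ** 2 * (population - 1) / (t ** 2 * p * (1.0 - p)))
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical population: 10 million configuration bits.
    print(sample_size(10_000_000))
```

For large populations the result is nearly independent of the population size, which is why a statistical campaign stays tractable even when the configuration memory has millions of bits.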

Characterization. Example (cont.)

Selective TMR

• For soft processors (using LowRISC, with the same settings as in the previous slides).

• Each program uses different resources with different frequencies.

• Reduce resource usage and power consumption by applying TMR only to the most used resources.

• Create a set of different configurations and reconfigure the FPGA depending on the subset of programs to be run.

A. Ramos, R. G. Toral, P. Reviriego and J. A. Maestro, "An ALU protection methodology for soft processors on SRAM-based FPGAs," in IEEE Transactions on Computers, 2019 (in press).

Selective TMR. Example: ALU

Translation Lookaside Buffer

• TLBs are based on a CAM (content addressable memory) + RAM approach.

• Cache for virtual-to-physical page translation. There might be several levels of cache.

• Querying and retrieving information from the TLB has to be as fast as possible.

• Parity is used at some TLB levels. ECCs have a higher encoding/decoding cost.

First solution: Shortened Hamming

• Shorten the Hamming code matrix so that:

• One of the parity bits only applies to the VPN bits.

• Correction is only executed when an error is detected.

• The other parity bits protect the VPN and PPN together.

• LowRISC:

A. Sánchez-Macián, P. Reviriego and J. A. Maestro, "Combined Modular Key and Data Error Protection for Content-Addressable Memories," in IEEE Transactions on Computers, vol. 66, no. 6, pp. 1085-1090, 1 June 2017.

Second solution: MSB for parity

• Parity is stored in the Most Significant Bit (MSB).

• If an SEU occurs, the error is propagated to the MSB, generating a remote VPN.

• Intrinsic protection increases (against false positives) due to the spatial locality of programs.

• Remote VPNs have a lower probability of being accessed before the erroneous entry is evicted.

• If the TLB already has a parity bit, it is also possible to provide protection against double-adjacent errors.

A. Sánchez-Macián, L. A. Aranda, P. Reviriego, V. Kiani and J. A. Maestro, "Enhancing Instruction TLB Resilience to Soft Errors," in IEEE Transactions on Computers, vol. 68, no. 2, pp. 214-224, 1 Feb. 2019.

Register Transfer Level Protection

• Take advantage of the resources already used by the system to provide protection.

• E.g. the LowRISC integer register set in Xilinx FPGAs.

• It is a dual-port memory: operations requiring two operands need both to be read in the same cycle.

A. Ramos, A. Ullah, P. Reviriego and J. A. Maestro, "Efficient Protection of the Register File in Soft-Processors Implemented on Xilinx FPGAs," in IEEE Transactions on Computers, vol. 67, no. 2, pp. 299-304, 1 Feb. 2018

Register Transfer Level Protection

• Vivado implements the dual-port memories by duplicating RAM32M primitives.

• One is used for the first operand and the other one for the second.

Register Transfer Level Protection

• Parity is stored next to Block 11.

• It is checked during the read operation.

• If an operand's parity does not match, the read is done from the other copy.

• If both operands use the same register, they are both read from the same copy.

• Both clock edges are used.
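The read-with-fallback behaviour can be modelled in software. This is a behavioural sketch only, not the RTL from the cited paper: it assumes two identical copies of the register file (one per operand port), one parity bit per entry, and a hypothetical `inject_bit_flip` helper to emulate an upset. Clock-edge timing and the same-register case are not modelled.

```python
def parity(value: int) -> int:
    """Even parity of a non-negative integer (XOR of its bits)."""
    return bin(value).count("1") & 1


class ProtectedRegisterFile:
    """Two copies of the register set, each entry stored with its parity bit."""

    def __init__(self, n_regs: int = 32) -> None:
        self.copies = [[(0, 0)] * n_regs for _ in range(2)]

    def write(self, idx: int, value: int) -> None:
        entry = (value, parity(value))
        for copy in self.copies:  # a write updates both copies
            copy[idx] = entry

    def read(self, port: int, idx: int) -> int:
        """Read through operand port 0 or 1, falling back to the other copy on a parity mismatch."""
        value, stored_parity = self.copies[port][idx]
        if parity(value) != stored_parity:
            value, _ = self.copies[1 - port][idx]
        return value

    def inject_bit_flip(self, port: int, idx: int, bit: int) -> None:
        """Emulate an SEU in the data bits of one copy (hypothetical test helper)."""
        value, stored_parity = self.copies[port][idx]
        self.copies[port][idx] = (value ^ (1 << bit), stored_parity)


if __name__ == "__main__":
    rf = ProtectedRegisterFile()
    rf.write(5, 0x1234)
    rf.inject_bit_flip(port=0, idx=5, bit=4)
    print(hex(rf.read(0, 5)))
```

The scheme corrects a single upset per register because the duplicate already exists for dual-port access; only the parity bits are added.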

• Thank you!

• Questions?