landoll david mapld09 pres 1
TRANSCRIPT
-
8/2/2019 Landoll David Mapld09 Pres 1
1/32
Avoiding Metastability inFPGA Devices
MAPLD 2009
David Landoll
Applications Architect
Mentor Graphics Corp.
-
8/2/2019 Landoll David Mapld09 Pres 1
2/32
Todays FPGAs
Onboard Computer
System on Chip /
FPGA
MemoryController
Timers IRQCtrl
UARTDMA
Controller
AHB/APBBridge
AMBA AHB
bit Data bus33
Data Memory Parity Memory
CAN
Controller
HDLC
Controller
SpaceWire
Controller
AMBA APB
FPU
HDLC
Controller
CAN
Switch
Leon CPU
EDAC
Controller
BootPROM
SDRAMSDRAM
Address/Control bus
RS33 SpaceWireHDLC3
Up/DownlinkHDLC3
Up/DownlinkCAN3CAN3
Dual CAN
transceiver
. V Linear11
Regulator
+ . V33
+ . V33 + . V33
CLK
Generator
Configuration
PROM
JTAG
I/O port
PPS outPPS in
DMAController
UART
CPUAMBAAHB
Fabrication advances provide more available silicon area More functionality can weigh less and take up less space Integrating/reusing capabilities lowers cost
2
-
8/2/2019 Landoll David Mapld09 Pres 1
3/32
Flight Management
MaintenanceDiagnostics
Image Processing
Integrated AvionicsProcessing
Flight Control &AvoidanceSystems
Boeing 787: Integrations Next StepFrom its central processor to its common data network,surveillance system and navigation system, the theme ofthe Boeing 787 Dreamliner is integration.
James W. Ramsey Communications
Weather Radar
3
Integration Presents New Challenges
Such integration usually involves multiple independent clock domains,which leads to clock-domain crossings and metastability errors!
-
8/2/2019 Landoll David Mapld09 Pres 1
4/32
Clock Domain Crossing (CDC) ErrorsUnpredictable Loss of Data
CDC problems corrupt control and data signals
are subtle, intermittent, unpredictable
are the 2nd major cause of respins
are difficult to reproduce and debug
are temperature, voltage, and process sensitive
willonly occur in hardware; often in the final design
Traditional verification techniques do notwork for CDCsignals
4
A CDC Verification methodology is needed to reducethe risk of CDC related data errors
4
-
8/2/2019 Landoll David Mapld09 Pres 1
5/32
Metastability
What the heck is it, anyway?
What is a clock? Periodic pulsing signal
Digital logic uniformly connected to this signal
Acts as the Symphony Conductor keeps logic in sync
Action happens across the logic at one specific point Typically the rising edge
55
Vee, Vss: GND: 0V
Vcc, Vdd : +5V, +3.3V
-
8/2/2019 Landoll David Mapld09 Pres 1
6/32
Metastability
What the heck is it, anyway?
Whats in a register? (Also known as a latch, flip-flop, etc)
Contain transistors that trap the input value at the
appropriate time E.g. rising edge of the clock
How does this happen?
66
-
8/2/2019 Landoll David Mapld09 Pres 1
7/32
Metastability
The Physics of a Register
Lets take a look at a register CMOS D-type transmission gate flip-flop
7
CLK
D Q
D
CLK
Q0
0
0
-- simple D-type flip-flopprocess(CLK)
begin
if rising_edge(CLK) then
Q
-
8/2/2019 Landoll David Mapld09 Pres 1
8/32
Metastability
The Physics of a Register
Lets take a look at a register CMOS D-type transmission gate flip-flop
8
CLK
D Q
D
CLK
Q1
0
0
-- simple D-type flip-flopprocess(CLK)
begin
if rising_edge(CLK) then
Q
-
8/2/2019 Landoll David Mapld09 Pres 1
9/32
Metastability
The Physics of a Register
Lets take a look at a register CMOS D-type transmission gate flip-flop
9
CLK
D Q
D
CLK
Q1
1
1
-- simple D-type flip-flopprocess(CLK)
begin
if rising_edge(CLK) then
Q
-
8/2/2019 Landoll David Mapld09 Pres 1
10/32
Transistor Model of a D flip-flop
Metastability
The Physics of a Register
Lets take a look at a register CMOS D-type transmission gate flip-flop
10
CLK
D Q
D
CLK
Q0
0
1
Only works if D has a good value at the rising edge of the clock
(no Set-up/hold time violations)
-- simple D-type flip-flopprocess(CLK)
begin
if rising_edge(CLK) then
Q
-
8/2/2019 Landoll David Mapld09 Pres 1
11/32
Metastability
The Physics of a Register
When setup/hold conditions are violated, the output of astorage element becomes unpredictable
This effect is called metastability If not contained, metastability can propagate
11
din tff
MTBFclk
=3
fclk = Clock Frequency
fin = Input Signal Frequency
td = Duration of critical time window
CLK
D
Q
CLK
D Q
Setup/hold window
Metastability is UNAVOIDABLE in designs with multipleasynchronous clocks
11
-
8/2/2019 Landoll David Mapld09 Pres 1
12/32
Clock Domain Crossing signal
Clock Domain Crossings
Guaranteed to Cause Metastability
When 2 or more designs run on disparate clocks: The clocks will continually skew, guaranteeing setup/hold violations
Signals from one design to another are Clock Domain Crossings (CDCs)
12
D
CLK
Q
Sensor System Guidance System
TxTx
Clock B
Clock AClock A
Setup/hold window
Signals that crossasynchronous clock
domains (CDC signals)WILL violate setup and hold
conditions
12
D
CLK
Q
-
8/2/2019 Landoll David Mapld09 Pres 1
13/32
Mitigating Clock Domain Crossing Issues
Problem: Signals crossing a clock domain will violate set-up/hold
Impact: Control/data signals will be dropped/corrupted Loss of Data
Approaches: Avoid having systems that have multiple clocks
Although sensible, its becoming impossible Design around the problem
Designer can add synchronizers to the design Metastability still happens, but nobody else sees it
E.g. 2DFF, FIFO, etc.
Fences in metastability
13 13
-
8/2/2019 Landoll David Mapld09 Pres 1
14/32
Isolate Metastability: Synchronizers
Designers add synchronizers to reduce the probability ofmetastable signals
Synchronizers are sub-circuits that can prevent metastable valuesfrom being sampled across clock domains
Take unpredictable metastable signals and create predictable behavior
14
-
8/2/2019 Landoll David Mapld09 Pres 1
15/32
Mitigating Clock Domain Crossing Issues
Isolate Metastability: Synchronizers
15
QQ
Clock AClock A Clock BClock B
ii i +1i +1 i +2i +2i -1i -1ii i +1i +1 i +2i +2i -1i -1
TxTx
Metastability window
When metastability occurs, the delay through asynchronizer becomes unpredictable
RxRx
ii i +1i +1 i +2i +2i -1i -1 i +3i +3
15
-
8/2/2019 Landoll David Mapld09 Pres 1
16/32
Invalid Command
Synchronizer Delays Can Reconverge
with unexpected resultswith unexpected results
CDC signals cross with an assumed relationship Can be combinational, sequential, or deeply sequential Unpredictable delays on CDC paths lead to reconvergence errors
Designs need logic to correctly handle reconvergence Can occur on single-bit or multiple-bit signals
16
S1
S2
S3 S4
FSM
Input
tx_d0
tx_d2
tx_d1
Sync 2
Sync 1
Sync 0
000
010
010
000
000
010
Valid Command but delayed
Grey
Encode
r
GreyDecod
er
000
111
111
000
101
111
Sync 1
000
000
111
16
-
8/2/2019 Landoll David Mapld09 Pres 1
17/32
And, Synchronizers Fail if Misused
Synchronization between clockdomains requires a transferprotocol
Ensures data ispredictablytransferred between domains
These protocols mustbe verified When protocol is violated
Data is lost
Simulation may notshow a failure
Silicon willeventually show a
functional error
Synchronizer wont function properly if the requiredTransfer Protocol is violated
17
-
8/2/2019 Landoll David Mapld09 Pres 1
18/32
Verification Must Cover
All Three CDC Problems
18
Clock domain crossings need: Structured synchronization
Transfer protocols
Global reconvergence checking
Reconvergenceproblem
Missing syncproblem Possible protocol
problem
18
-
8/2/2019 Landoll David Mapld09 Pres 1
19/32
Mitigating Clock Domain Crossing Issues
Problem: Signals crossing a clock domain will violate set-up/hold
Impact: Control/data signals will be dropped/corrupted
Approaches: Avoid having systems that have multiple clocks Designer can add synchronizers to the design
Designer-added synchronizers + full CDC verification Assures synchronizers are present and used correctly
19 19
-
8/2/2019 Landoll David Mapld09 Pres 1
20/32
Recommendations
During design planning1. Create systems/designs using 1 clk, 1 edge when possible
2. If multiple clocks are required, try to use 1 designer for both clock
domains, and use coding guidelines
1. Use signal naming conventions
2. Many clock domain errors come from design changes, not the initial design
3. Limit clock domain crossings to specific areas or blocks in the design, when possible.
4. NOTE: These techniques can help assure synchronizers are present, but are unlikely
to help identify reconvergence or CDC protocol issues.
3. When multi-clock design is required, plan for proper verification
1. How to we accomplish this?
20 20
My_sig_Tx
My_data_Tx_RegMy_data_Rx_sync1
My_data_Rx_sync2
Comb_sig_Rx
or Example:
Append _A_reg to signals leaving A-clk register, _A for A-clk combo signals
Leverage during code reviews - help identify missing synchronizers
Make sure ONLY _A_reg signals go to synchronizers (no combo logic)
-
8/2/2019 Landoll David Mapld09 Pres 1
21/32
Verifying CDC Synchronization
Problem: Missing synchronizers will create metastability
Correctly placed but misused synchronizers wont work
Reconvergence of synchronized signals can create unexpected
behavior Approaches: Simulation
Digital logic simulators do NOT model transistor behavior Do not model metastability
21 21
-
8/2/2019 Landoll David Mapld09 Pres 1
22/32
For example
22
Q
D
CLK
Simulation captures a 1 while
silicon produces either a 1 or0
Setup Violation
Q
D
CLK
Hold Violation
Simulation Does NOT Reflect Silicon Behavior
Q in silicon Q in silicon
Simulation captures a 0 while
silicon produces either a 1 or0
22
Q in simulation Q in simulation
-
8/2/2019 Landoll David Mapld09 Pres 1
23/32
Verifying CDC Synchronization
Problem: Missing synchronizers will create metastability
Correctly placed but misused synchronizers wont work
Reconvergence of synchronizedControl logic bugs
Approaches: Simulation Wont model CDCs correctly to detect errors Static Timing Analysis
Can be used to identify signals that cross domains Can be used as input for a manual review ButWont detect missing or incorrectly used synchronizers, or
reconvergence
23 23
-
8/2/2019 Landoll David Mapld09 Pres 1
24/32
Verifying CDC Synchronization
Problem: Missing synchronizers will create metastability
Correctly placed but misused synchronizers wont work
Reconvergence of synchronizedControl logic bugs
Approaches: Simulation Wont model CDCs correctly to detect errors Static Timing Analysis
Identifies signals for manual review, but otherwise useless Manual Design Reviews
Error prone (and very time consuming) Typically only identifies synchronizer structures, misses reconvergence
and invalid sync protocol usage
Evidence suggests at least some synchronizers will be missed
24 24
-
8/2/2019 Landoll David Mapld09 Pres 1
25/32
For ExampleTrivial Reconvergence Error
Reconverging synchronized CDC signals - timing is unpredictable. Need to verify the downstream logic can handle variations Manually identifying the reconvergence is very hard
Manually identifying all possible behaviors is harder
Manually assuring logic will behave correctly typically intractable
25 25
-
8/2/2019 Landoll David Mapld09 Pres 1
26/32
Verifying CDC Synchronization
Problem: Missing synchronizers will create metastability
Correctly placed but misused synchronizers wont work
Reconvergence of synchronizedControl logic bugs
Approaches: Simulation - Wont model CDCs correctly to detect errors Timing Analysis -Identifies signals for review, but otherwise useless
Manual Design Reviews - error prone, incomplete
Lab Verification?
Problem is intermittent, debug is impossible Spice simulation? It *does* model transistors, but
Where will you get the Spice deck? (transistor level model) Would be far too slow on a large FPGA
26 26
-
8/2/2019 Landoll David Mapld09 Pres 1
27/32
Verifying CDC Synchronization
Problem: Missing synchronizers will create metastability Correctly placed but misused synchronizers wont work
Reconvergence of synchronizedControl logic bugs
Approaches: So - we need a new method that reliably: Identifies ALL CDC signals, structures, reconvergence
Assures ALL connected, functioning correctly Creates reports for manual reviews The EDA industry has responded
6 commercial tools now availableand counting Butmost wont identify all 3 of our CDC issues
27 27
-
8/2/2019 Landoll David Mapld09 Pres 1
28/32
Mentors CDC Verification Technology
Whos using our technology?Mil-Aero Honeywell, Inc. L-3 Communications Lockheed Martin Co Ministry of Aerospace & Aeronautics
Northrop Grumman Corp Raytheon Rockwell Collins Inc. SAAB Group Thales Commercial Widely used in commercial space
The market leader in CDC verificationThe market leader in CDC verification
28 28
-
8/2/2019 Landoll David Mapld09 Pres 1
29/32
Example Value from One Customer
Design IEEE standard serial communications core Used in 50-60 other COMMERCIAL ASIC products Widely deployed (millions in use daily) Placed core in a sensor guidance system Found issues in the lab Debugged FPGA for weeks Suspected a CDC issue, but not sure
Deployed Mentors CDC solution Results same day Found 199 serious CDC bugs!
45 Missing Synchronizers
83 Incorrect Synchronizers
76 Reconverging Signals
11 other problems Most resulting from more stressful usage In production: Commercial ASIC : Customer issue device is erratic, locks up Avionics: Could result in an Airworthiness Directive
29 29
S
-
8/2/2019 Landoll David Mapld09 Pres 1
30/32
Summary
Recommendations
During design planning
1. Create systems/designs using 1 clk, 1 edge when possible
2. If multiple clocks are required, try to use 1 designer for all clock domains
3. When multi-clock design is required, plan for proper verification
During verification
1. Watch for multiple clocks in designs (Tip Count PLLs)
2. Ask how CDC issues are mitigated (remember there are 3)
Utilize commercial tools designed for detecting these problems
1. Verify all 3 classes of CDC problems
1. Structural Verification
2. Protocols Verification
3. Reconvergance Verification
2. Use reports to aid manual reviews
3. Use CDC tools to support ROBUSTNESS
30 30
-
8/2/2019 Landoll David Mapld09 Pres 1
31/32
In Conclusion
Every multi-clock design is subject to metastability Traditional verification methodologies CANNOT assurerobustness
To properly mitigate the dangers of CDC, we stronglyrecommend a solution that :
Supports Manual Reviews
Automatically reports all sources of CDC problems
Has a proven CDC verification methodology &
customer success
31 31
-
8/2/2019 Landoll David Mapld09 Pres 1
32/32
32