FPGA IP Verification for Use in Severe Environments
2005 MAPLD International Conference
September 2005
Paper #237
Ian Land
Ian Bryant
MAPLD 2005/237Land 2
Trends
- Smaller geometries allow more functions per device
- Synthesizable HDL makes design reuse practical
- Gate-level design is difficult at high density:
  - Resource-intensive
  - Takes a long time
  - Increases the likelihood of error
- Thus, block-level design is needed
- Intellectual property (IP) reduces effort and risk, if done right:
  - A robust design process is followed, with thorough verification
  - The IP is proven in many applications, including space and severe environments
- A MIL-STD-1553 example demonstrates this
Robust Design Process
Structured design flow should be phase-gated:
- Proposal: justification for development and creation of the project plan
- Definition and Planning: preliminary datasheet defining the core; a test plan is needed
- Development: the core is implemented and deliverables are created
- Verification and Validation: testing against the plan and the specification (e.g., MIL-STD-1553, PCI)
- Release: release of the product for volume sales
- Configuration Management, Feedback and Revision
MIL-STD-1553 Example
Actel has developed three products:
- A full-featured bus controller (BC), remote terminal (RT), and monitor terminal (MT)
- A 'simple' bus controller
- A 'simple' remote terminal
Highlight: the simple remote terminal, Core1553BRT
- Originally released in 2002 (first production August 2002), in 12 and 16 MHz versions
- Updated for minor changes in December 2002: loopback test, version text in code, etc.
- Updated for a Verilog translation issue in April 2004
- Updated in September and November 2004 to work with design tool updates
- Revised to include 20 and 24 MHz versions in January 2005
  - Manchester encoders/decoders tested as part of the full-featured BC, RT, MT
  - ProASIC3/E FPGA family support added
MIL-STD-1553 RT Development
Proposal
- Substantial customer demand for a MIL-STD-1553 bus interface
- Review of the specification and competitive products suggested we could improve market offerings with a rad-tolerant 1553 FPGA core
Definition
- MIL-STD-1553 specification
- Preliminary datasheet highlighting the features in the proposal
Development
- Developed the remote terminal
- Paid careful attention to the Manchester encoder/decoder blocks that would be reused across the product family
- Built two testbenches:
  - Verification: runs the full set of tests and mimics validation
  - User: runs fewer tests, for incorporation into a larger system design
RT Development (continued)
Verification and Validation
- Stable, tested code with reviewed test results
- Check corner cases and key parameters
  - Make sure parity errors are injected on every bit
  - 12 and 16 MHz; 12 MHz is the harder case due to clock extraction
- Tested against an existing MIL-STD-1553 COTS tester, and the development kit certified at Test Systems, Inc.: completely for 16 MHz and partially for 12 MHz
- Validated the Core1553 Evaluation Board
  - This is important to use with the verification testbench for future updates
Release gives first-rate integration
- Complete core builds, board release, release note, user guide, datasheet, certification papers
Solution improves integration
- Developed an application note, reference design and example designs since 2002
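A MIL-STD-1553 word carries 16 data bits and one odd-parity bit, so injecting a parity error on every bit means flipping each bit in turn and checking that the receiver rejects the word. The sketch below shows that check in VHDL. The package and function names (mil1553_parity_pkg, odd_parity, parity_ok, flip_bit) are hypothetical and not part of Core1553BRT; a real testbench would drive the corrupted words through the Manchester line interface rather than call functions directly.

```vhdl
-- Hypothetical helper package (not Core1553BRT code): odd parity for the
-- 16-bit data field of a MIL-STD-1553 word, plus single-bit error injection.
package mil1553_parity_pkg is
  subtype word16_t is bit_vector(15 downto 0);
  -- Parity bit that makes the total number of '1's (data + parity) odd.
  function odd_parity(w : word16_t) return bit;
  -- True when the data bits and the parity bit together have odd weight.
  function parity_ok(w : word16_t; p : bit) return boolean;
  -- Copy of w with bit i inverted: models an error injected on bit i.
  function flip_bit(w : word16_t; i : natural) return word16_t;
end package mil1553_parity_pkg;

package body mil1553_parity_pkg is
  function odd_parity(w : word16_t) return bit is
    variable x : bit := '0';
  begin
    for i in w'range loop
      x := x xor w(i);  -- x = '1' when w has an odd number of '1's
    end loop;
    return not x;       -- add a '1' only when the data weight is even
  end function odd_parity;

  function parity_ok(w : word16_t; p : bit) return boolean is
  begin
    return odd_parity(w) = p;
  end function parity_ok;

  function flip_bit(w : word16_t; i : natural) return word16_t is
    variable r : word16_t := w;
  begin
    r(i) := not r(i);
    return r;
  end function flip_bit;
end package body mil1553_parity_pkg;
```

A self-checking testbench can then loop i from 0 to 15 over flip_bit(w, i), add one case with the parity bit itself inverted, and require parity_ok to return false every time.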
Updates for Speed and Space
- Added 20 and 24 MHz in early 2005 (v2.2)
  - Manchester encoders/decoders validated in the full-featured BC, RT, MT core
  - Moved the CLKSPD generic to a 2-bit input port, allowing a single netlist to support four frequencies
  - Modified the top-level and backend timers
  - Updated the testbenches for 20 and 24 MHz and the new port maps
  - Fixed erroneous SYNCOUT pulses, which occurred with some non-Actel transmitters on the bus
- Updating for space in late 2005 (v3.0)
  - Protect the core from entering illegal states
  - Hardware test for a babbling transmitter
  - Re-qualify the core at Test Systems, Inc.
Severe Environment Considerations
- Level 3 verification minimum; level 4 validation
- MIL-STD-1553 cores have third-party review at Test Systems, Inc.
  - Requires a validation report review, with actions and responses
  - Have a certification envelope: test VHDL and Verilog versions at different speeds
- Have exceptional documentation and support
  - Tool flow documented with versions for exact design replication
  - Minimize the possibility of problems for the integration engineer
- High coverage standards and well-explained variances
  - Code coverage target of 100% for RTL
  - Consider using error detection and correction (EDAC) for memory
- Protect the core from entering illegal states and from memory upsets
  - With the Synplicity default state-machine synthesis, an SEU could lock up the FSM
  - Adds redundancy and reduces risk
  - Use EDAC for memory
- Avoid the possibility of a babbling transmitter
  - Can occur if a redundant system fails
- Continuously investigate other means to improve quality
  - Over-sampling
  - The need for incorporating DO-254
MIL-STD-1553B Tool Issues
- Limit the tools, and document them, for validated cores
- The version 3.0 core will be qualified in hardware with:
  - Synplicity 8.1 for synthesis
  - Designer 6.2 for layout
  - ModelSim 6.0c Actel OEM for simulation
- So what happens if a customer uses Exemplar, or even Synplicity 7.71? The qualification is not repeatable…
- The customer still needs to qualify their system
- IP vendors should document which tool versions were used for qualified IP cores intended for severe environments, for:
  - Repeatability
  - Reuse
Code Coverage
- A way to prove that the testbenches actually test all the designed-in functions
- Allows us to verify that all lines of code are covered
- Today's tools provide:
  - Statement coverage
  - Branch coverage
  - Condition coverage
  - Expression coverage
  - Toggle coverage
- BUT coverage does not guarantee that the design actually implements the specification
  - Both the core and the testbench may omit a function
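The gap between the coverage metrics listed above is easy to see on a two-input condition. The function below is a hypothetical example, not code from the core; its comments note what a single test vector (req = '1', busy = '0') would and would not cover.

```vhdl
-- Hypothetical example for illustrating coverage metrics (not from the core).
package coverage_demo_pkg is
  function grant(req, busy : bit) return bit;
end package coverage_demo_pkg;

package body coverage_demo_pkg is
  function grant(req, busy : bit) return bit is
    variable g : bit := '0';
  begin
    if req = '1' and busy = '0' then
      -- The vector (req='1', busy='0') executes every statement here,
      -- so statement coverage reaches 100% with that one vector.
      g := '1';
    end if;
    -- With only that vector, the implicit "else" path is never taken
    -- (branch coverage incomplete), and neither req nor busy is ever seen
    -- making the condition false (condition coverage incomplete).
    return g;
  end function grant;
end package body coverage_demo_pkg;
```

Adding (req = '1', busy = '1') exercises the false branch, and (req = '0', busy = '0') shows req false as well, so three vectors reach full statement, branch and condition coverage. None of them shows that grant matches the specification, which is the caveat on this slide.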
Core1553BRT Code Coverage
- Modular core design allows us to create tests that exercise a particular portion of code
- The verification testbench reaches >99%
- Non-covered lines are inspected and verified; typically they are conversion functions or branches coded purely for safety
Coverage Is Actually 100%
Branch coverage does not show 100%, but it effectively is. The reason is safe coding: the code checks conditions before acting on them. These conditions are always true in normal operation, but the code is better and safer with these checks in place. Other uncovered branches are "when others" choices, for example:
when INIT =>
  case MUXSEL is
    when "000" => DSTATE <= WRITE0;   -- RX Mode Code
    when "010" => DSTATE <= TXSTAT;   -- TX Mode Code
    when "001" => DSTATE <= WRITE0;   -- RX Data Transfer
    when "011" => DSTATE <= TXSTAT;   -- TX Data Transfer
    when "100" => DSTATE <= WRITE0;   -- Bcast RX Mode Code
    when "110" => DSTATE <= MSGSTAT;  -- Bcast TX Mode Code
                  LATCHSW <= '1';
    when "101" => DSTATE <= WRITE0;   -- Bcast RX Data Transfer
    when "111" => DSTATE <= MSGSTAT;  -- Bcast TX Data Transfer
                  LATCHSW <= '1';
    when others => null;              -- only reachable for metavalues ('U', 'X', 'Z', ...)
  end case;
The others branch is never executed, because all eight valid values ("000" to "111") are listed above, but VHDL requires the case statement to cover every possible value of a std_logic_vector, including metavalues such as "ZZZ". The case could be rewritten as follows, which gives 100% coverage but whose meaning is less obvious:
when INIT =>
  case MUXSEL is
    when "000" => DSTATE <= WRITE0;    -- RX Mode Code
    when "010" => DSTATE <= TXSTAT;    -- TX Mode Code
    when "001" => DSTATE <= WRITE0;    -- RX Data Transfer
    when "011" => DSTATE <= TXSTAT;    -- TX Data Transfer
    when "100" => DSTATE <= WRITE0;    -- Bcast RX Mode Code
    when "110" => DSTATE <= MSGSTAT;   -- Bcast TX Mode Code
                  LATCHSW <= '1';
    when "101" => DSTATE <= WRITE0;    -- Bcast RX Data Transfer
    when others => DSTATE <= MSGSTAT;  -- Bcast TX Data Transfer
                   LATCHSW <= '1';
  end case;
There is a trade-off here between coverage and readability: in the first version it is clear what the "111" condition does; in the second it is not. Both synthesize to the same circuit.
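One way to keep the readable form and still reach 100% coverage is to apply the bit_vector advice from the safe-state-machine design rules to the select signal: convert MUXSEL with the standard ieee.std_logic_1164 function to_bitvector, then case on a bit_vector(2 downto 0), which has exactly eight values and needs no others choice. This is a sketch under assumptions: the enumerated type, record and function names are hypothetical, only three of the core's DSTATE targets are modeled, and to_bitvector maps metavalues such as 'Z' or 'X' to '0' (its default xmap), so simulation no longer propagates them.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package; the state names mirror the slide's DSTATE targets.
package muxsel_decode_pkg is
  type dstate_t is (WRITE0, TXSTAT, MSGSTAT);
  type decode_t is record
    next_state : dstate_t;
    latchsw    : std_logic;
  end record;
  function decode_muxsel(muxsel : std_logic_vector(2 downto 0)) return decode_t;
end package muxsel_decode_pkg;

package body muxsel_decode_pkg is
  function decode_muxsel(muxsel : std_logic_vector(2 downto 0)) return decode_t is
    -- bit_vector elements are only '0'/'1', so the eight choices are complete.
    variable sel : bit_vector(2 downto 0);
    variable r   : decode_t := (next_state => WRITE0, latchsw => '0');
  begin
    sel := to_bitvector(muxsel);  -- metavalues map to '0' (default xmap)
    case sel is
      when "000" => r.next_state := WRITE0;   -- RX Mode Code
      when "010" => r.next_state := TXSTAT;   -- TX Mode Code
      when "001" => r.next_state := WRITE0;   -- RX Data Transfer
      when "011" => r.next_state := TXSTAT;   -- TX Data Transfer
      when "100" => r.next_state := WRITE0;   -- Bcast RX Mode Code
      when "110" => r.next_state := MSGSTAT;  -- Bcast TX Mode Code
                    r.latchsw    := '1';
      when "101" => r.next_state := WRITE0;   -- Bcast RX Data Transfer
      when "111" => r.next_state := MSGSTAT;  -- Bcast TX Data Transfer
                    r.latchsw    := '1';
    end case;
    return r;
  end function decode_muxsel;
end package body muxsel_decode_pkg;
```

Every choice is now a reachable line, and the "111" row keeps its comment, at the cost of hiding metavalues in simulation.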
Coverage: From 99% to 100%
- Getting the last 1% of coverage is time-consuming
  - Especially in designs that include lots of error detection and recovery logic
- In attempting this, you will often accidentally force the design into an unexpected state that highlights an issue
- Core1553BRT: in going from 99% to 100%, we discovered that when transmitting and verifying the looped-back data, if the last word of a burst (data or status) contained all zeros and the transceiver introduced a Manchester error, the error was not detected
  - We did detect Manchester errors alone
  - We did detect data errors alone
- Additional tests have been added to the testbenches to verify this in all future releases
Safe State Machines
- Although space FPGAs incorporate redundancy through triple flip-flops and voting, the RTL code also needs to be safe
- Commercial FPGA synthesis tools can generate 'unsafe' state machines
  - Optimized for small area or speed
  - One-hot state machines by default
  - Some have an option for safe state machines
- Make sure all illegal states are covered
- BUT HOW DO YOU PROVE IT IS SAFE?
  - Beware of hidden illegal conditions in the code, such as counters that count to a value and reset: what happens if the count toggles to a value greater than the reset condition?
- In reality, design the redundancy in and test it
  - Fix the state encoding so it is independent of the synthesis tool
  - Make testbenches that force illegal states
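The counter hazard above can be made concrete with a 4-bit counter meant to run 0 to 9. With an equality test, an upset that loads 10 to 15 is never recognized: the counter keeps incrementing until it wraps through 15 to 0. Comparing with >= instead returns any out-of-range value to 0 on the next clock. The package, the LIMIT constant and the function names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package contrasting an "equals" terminal count with a safe one.
package safe_counter_pkg is
  constant LIMIT : natural := 9;             -- assumed terminal count: 0..9 legal
  subtype count_t is unsigned(3 downto 0);   -- 4 bits, so 10..15 are illegal
  -- Unsafe: resets only on an exact match; an upset to 10..15 keeps
  -- counting until the register wraps from 15 to 0.
  function next_count_unsafe(c : count_t) return count_t;
  -- Safe: any value at or beyond LIMIT returns to 0 on the next clock.
  function next_count_safe(c : count_t) return count_t;
end package safe_counter_pkg;

package body safe_counter_pkg is
  function next_count_unsafe(c : count_t) return count_t is
  begin
    if c = LIMIT then
      return to_unsigned(0, c'length);
    end if;
    return c + 1;  -- numeric_std "+" wraps modulo 2**4
  end function next_count_unsafe;

  function next_count_safe(c : count_t) return count_t is
  begin
    if c >= LIMIT then
      return to_unsigned(0, c'length);
    end if;
    return c + 1;
  end function next_count_safe;
end package body safe_counter_pkg;
```

From 12, the unsafe version passes through 13, 14 and 15 before reaching 0, so in a timer that defines a protocol interval the recovery itself produces a wrong interval.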
Safe State Machines: Design
- Hard-code states using bit_vectors
  - Make sure all 2**N values are specified
- In the case statement
  - Do not use an others clause; list all states. The VHDL compiler reports an error if any state is missing
  - Using bit_vector means that you need not worry about the 'X' and 'Z' branches in the case
- In illegal states
  - Clear critical signals, e.g. transmit enable
  - Send the FSM back to the IDLE state
  - Create an FSM_ERROR output, one for each state machine
- Synthesis
  - Make sure state registers are not duplicated; if they are, you may not detect the illegal state
  - Make sure any FSM optimization in the synthesis tool is disabled
-- RT data word transfer signals
-- Hard-encoded for safe state machines
signal DSTATE : bit_vector(3 downto 0);
constant IDLE    : bit_vector(3 downto 0) := "0000";
…..
constant ALLDONE : bit_vector(3 downto 0) := "1100";
constant UNUSED0 : bit_vector(3 downto 0) := "1101";
constant UNUSED1 : bit_vector(3 downto 0) := "1110";
constant UNUSED2 : bit_vector(3 downto 0) := "1111";
-- Synplicity attributes (declare them here or take them from the tool's attribute package)
attribute syn_preserve  : boolean;
attribute syn_encoding  : string;
attribute syn_replicate : boolean;
attribute syn_preserve  of DSTATE : signal is true;       -- keep the state register
attribute syn_encoding  of DSTATE : signal is "original"; -- keep the hand-coded encoding
attribute syn_replicate of DSTATE : signal is false;      -- do not duplicate state flip-flops
case DSTATE is
  ….
  when UNUSED0 | UNUSED1 | UNUSED2 =>
    FSMD_ERROR <= '1';
    DSTATE     <= IDLE;  -- return to IDLE
    BENDREQ    <= '0';   -- clear critical controls
    ENC_STB    <= '0';
    DBUSY      <= '0';
    CMDDONE    <= '0';
end case;
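The design rules above can be combined in one small, self-contained state machine. The sketch is hypothetical (a 2-bit FSM with three legal states, not the core's DSTATE machine): the encoding is hard-coded with bit_vector constants, all four codes appear in the case statement with no others choice, and the unused code raises fsm_error, clears the critical tx_en output and returns to IDLE. Putting the next-state logic in a procedure is an assumption made so the illegal state can be exercised directly in a testbench; the register-forcing approach on the next slide is still needed to prove that the synthesized netlist keeps this logic.

```vhdl
-- Hypothetical 2-bit FSM (not Core1553BRT code) following the slide's rules.
package safe_fsm_pkg is
  subtype state_t is bit_vector(1 downto 0);
  constant IDLE    : state_t := "00";
  constant SEND    : state_t := "01";
  constant DONE    : state_t := "10";
  constant UNUSED0 : state_t := "11";  -- the only illegal code
  procedure fsm_step(state     : in  state_t;
                     go        : in  bit;
                     nxt       : out state_t;
                     tx_en     : out bit;
                     fsm_error : out bit);
end package safe_fsm_pkg;

package body safe_fsm_pkg is
  procedure fsm_step(state     : in  state_t;
                     go        : in  bit;
                     nxt       : out state_t;
                     tx_en     : out bit;
                     fsm_error : out bit) is
    variable s : state_t;
  begin
    s         := state;
    nxt       := state;  -- default: hold the current state
    tx_en     := '0';    -- default: critical output off
    fsm_error := '0';
    -- All 2**2 codes are listed with no "others" choice, so the
    -- analyzer rejects this case statement if a code is missing.
    case s is
      when IDLE =>
        if go = '1' then
          nxt := SEND;
        end if;
      when SEND =>
        tx_en := '1';
        nxt   := DONE;
      when DONE =>
        nxt := IDLE;
      when UNUSED0 =>
        fsm_error := '1';   -- flag the upset
        tx_en     := '0';   -- clear the critical control
        nxt       := IDLE;  -- recover to IDLE
    end case;
  end procedure fsm_step;
end package body safe_fsm_pkg;

use work.safe_fsm_pkg.all;

entity safe_fsm is
  port (clk, rst_n, go     : in  bit;
        tx_en, fsm_error   : out bit);
end entity safe_fsm;

architecture rtl of safe_fsm is
  signal state : state_t := IDLE;
  -- Synplicity-style attributes declared locally: keep the hand-coded
  -- encoding and do not replicate the state register.
  attribute syn_encoding  : string;
  attribute syn_replicate : boolean;
  attribute syn_encoding  of state : signal is "original";
  attribute syn_replicate of state : signal is false;
begin
  process (clk, rst_n)
    variable nxt     : state_t;
    variable en, err : bit;
  begin
    if rst_n = '0' then
      state     <= IDLE;
      tx_en     <= '0';
      fsm_error <= '0';
    elsif clk'event and clk = '1' then
      fsm_step(state, go, nxt, en, err);
      state     <= nxt;
      tx_en     <= en;   -- outputs are registered alongside the state
      fsm_error <= err;
    end if;
  end process;
end architecture rtl;
```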
Safe State Machines: Testing
How do you prove that the resulting netlist includes the safe state machine?
- Identify the state registers in the netlist
- Using the simulator, force the state register to every state
- Reset the core after each test to prevent side effects of forcing states
- Verify that the FSM_ERROR output is asserted for the illegal states
printf("Testing Main State Machine - 16 states, 13-15 Illegal");
for state in 0 to 15 loop
  resetcore(RSTNOW, CLK16);
  printf(" Testing State %d : Restart by typing : do forcefsm.do 0 %04b", fmt(state) & fmt(state));
  assert FALSE report "Ignore ERROR, restart simulation ^^^^^^" severity ERROR;
  -- before restarting, the state machine is forced to the state under test
  wait for 1 us;  -- allow time for the Tcl script to force the state
  check_state(state, (state >= 13), status, ERR);
end loop;
resetcore(RSTNOW, CLK16);
----------------------------------------------------------------------
# forcefsm.do (ModelSim Tcl): deposit the state bits on the netlist flip-flops
force -deposit sim:/tbench/u12__0/uut1/DSTATE_3/Q $state_bit3 0
force -deposit sim:/tbench/u12__0/uut1/DSTATE_2/Q $state_bit2 0
force -deposit sim:/tbench/u12__0/uut1/DSTATE_1/Q $state_bit1 0
force -deposit sim:/tbench/u12__0/uut1/DSTATE_0/Q $state_bit0 0
Safe State Machines: Results and Memory Protection
- Safe state machines affect gate count and performance compared to normal implementation flows
  - 7% increase in gate count
  - 1% drop in performance
  - But the design still fits in the device and meets its performance requirements
- Memory usage
  - Make sure that EDAC memory is used
  - Consider scrub rates, etc.
  - Avoid memory where possible, because it is more easily upset by radiation
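To illustrate what EDAC memory adds, the sketch below uses a Hamming(7,4) code: three check bits stored with four data bits let the read path locate and correct any single upset bit. This is an illustration under assumptions, not Actel's EDAC: production memory EDAC typically uses a SEC-DED code over a wider word, which also detects double errors. The package and function names are hypothetical.

```vhdl
-- Hypothetical Hamming(7,4) single-error-correcting code (illustration only).
package hamming74_pkg is
  subtype data_t is bit_vector(3 downto 0);
  subtype code_t is bit_vector(7 downto 1);  -- index = Hamming bit position
  function ham_encode(d : data_t) return code_t;
  function ham_decode(c : code_t) return data_t;  -- corrects one flipped bit
end package hamming74_pkg;

package body hamming74_pkg is
  function ham_encode(d : data_t) return code_t is
    variable c : code_t;
  begin
    -- Data bits go in positions 3, 5, 6, 7; check bits in 1, 2, 4.
    c(3) := d(0);
    c(5) := d(1);
    c(6) := d(2);
    c(7) := d(3);
    c(1) := c(3) xor c(5) xor c(7);  -- covers positions with bit 0 set
    c(2) := c(3) xor c(6) xor c(7);  -- covers positions with bit 1 set
    c(4) := c(5) xor c(6) xor c(7);  -- covers positions with bit 2 set
    return c;
  end function ham_encode;

  function ham_decode(c : code_t) return data_t is
    variable r   : code_t  := c;
    variable syn : natural := 0;
    variable d   : data_t;
  begin
    -- The syndrome is the binary position of a single flipped bit (0 = none).
    if (c(1) xor c(3) xor c(5) xor c(7)) = '1' then syn := syn + 1; end if;
    if (c(2) xor c(3) xor c(6) xor c(7)) = '1' then syn := syn + 2; end if;
    if (c(4) xor c(5) xor c(6) xor c(7)) = '1' then syn := syn + 4; end if;
    if syn /= 0 then
      r(syn) := not r(syn);  -- correct the upset bit
    end if;
    d := r(7) & r(6) & r(5) & r(3);
    return d;
  end function ham_decode;
end package body hamming74_pkg;
```

Scrubbing complements the code: periodically reading, correcting and writing back each word keeps single upsets from accumulating into multi-bit errors that the code cannot correct, which is why the scrub rate matters.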
What Is a 'Babbling' Transmitter?
- Requirements
  - All RTs are required to monitor their outputs to detect whether they are babbling and, if so, to stop; this is referred to as a fail-safe timer
  - If the bus controller detects it, it sends a message to the terminal over the other bus to stop the babbling transmitter
- How can an RT babble? Two errors (failures) have to occur within the terminal:
  1. The logic that controls the enable signal to the transmitter has to fail, and
  2. The terminal's fail-safe timer (maximum of 800.0 microseconds) has to fail.
- Some designs use a digital counter for the fail-safe timer; a single failure in a clock line could then cause a babbling transmitter
Avoid Babbling Transmitter Design
- Transmit timeout
  - MIL-STD-1553 requires that a separate circuit monitors the transmissions and stops the transmitter if a babbling transmission is detected, i.e., more than 33 words transmitted
  - Even though the protocol state machines should never, in theory, cause this, including this logic is a requirement
- A separate circuit monitors the transmit enables and detects whether they are active for more than 680 µs
  - If it triggers, the enable to the external transceiver is disabled and an error condition is generated
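-- Context clause assumed for this fragment:
--   library ieee; use ieee.std_logic_1164.all; use ieee.numeric_std.all;
-- HWTIMVALUE (assumed std_logic_vector(6 downto 0)) selects the timeout
-- for the clock frequency on the 2-bit CLKSPD input port.
process(CLKSPD)
begin
  case CLKSPD is
    when "00"   => HWTIMVALUE <= "0100001"; -- 12MHz
    when "01"   => HWTIMVALUE <= "0101011"; -- 16MHz
    when "10"   => HWTIMVALUE <= "0110110"; -- 20MHz
    when others => HWTIMVALUE <= "1000001"; -- 24MHz
  end case;
end process;

-- Transmit timeout: count clocks while the transmitter is busy and assert
-- TXT_ERROR when the upper 7 bits of the count equal HWTIMVALUE.
PTXTTIM: process(CLK, RSTn)
  variable TXT_TIMER : unsigned(14 downto 0);
begin
  if RSTn = '0' then
    TXT_TIMER := (others => '0');
    TXT_ERROR <= '0';
  elsif rising_edge(CLK) then
    TXT_ERROR <= '0';
    if TXT_TXBUSY = '1' then
      TXT_TIMER := TXT_TIMER + 1;        -- count busy clocks
    else
      TXT_TIMER := (others => '0');      -- restart when transmission ends
    end if;
    if std_logic_vector(TXT_TIMER(14 downto 8)) = HWTIMVALUE then
      TXT_ERROR <= '1';
    end if;
  end if;
end process;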
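Because TXT_TIMER advances by one per clock while TXT_TXBUSY is high, and TXT_ERROR fires when bits 14 downto 8 equal HWTIMVALUE, the timeout is 256 × HWTIMVALUE clock periods. Working the four constants through (our arithmetic from the values above, not figures stated in the paper) puts every setting above the 680 µs threshold and below the 800 µs fail-safe maximum. For comparison, a maximum-length RT transmission of 33 words (status plus 32 data words) at 20 µs per word lasts 660 µs.

```latex
t_{\mathrm{timeout}} = \frac{256 \cdot \mathrm{HWTIMVALUE}}{f_{\mathrm{CLK}}}

\begin{aligned}
12\,\mathrm{MHz}: &\quad \frac{256 \cdot 33}{12\,\mathrm{MHz}} = \frac{8448}{12\,\mathrm{MHz}} = 704.0\,\mu\mathrm{s} \\
16\,\mathrm{MHz}: &\quad \frac{256 \cdot 43}{16\,\mathrm{MHz}} = \frac{11008}{16\,\mathrm{MHz}} = 688.0\,\mu\mathrm{s} \\
20\,\mathrm{MHz}: &\quad \frac{256 \cdot 54}{20\,\mathrm{MHz}} = \frac{13824}{20\,\mathrm{MHz}} = 691.2\,\mu\mathrm{s} \\
24\,\mathrm{MHz}: &\quad \frac{256 \cdot 65}{24\,\mathrm{MHz}} = \frac{16640}{24\,\mathrm{MHz}} \approx 693.3\,\mu\mathrm{s}
\end{aligned}
```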
Babbling Transmitter Testing
- How do you test this? The protocol state machines do not do this in normal operation
- Create a test mode input, TESTTXTTOUT
  - Modifies the protocol state machine
  - When high, causes more than 32 data words to be transmitted
  - Testbenches set this input and verify that the core detects the babbling transmitter
- This allows testing, but does it create an additional failure mechanism?
  - The input may be pulled inactive by an external resistor; if that resistor were to fail, the core would fail
- The external input can be disabled
  - The logic can be removed from the core to prevent this error condition
  - Synthesis will then remove the error injection logic
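One way to disable the external input is a boolean generic that replaces TESTTXTTOUT with a constant in flight builds, so synthesis prunes the injection logic and a failed pull resistor can no longer activate it. TESTTXTTOUT is the input named on the slide; the entity, the ENABLE_TEST_MODE generic and the TEST_ACTIVE output are hypothetical, and the core's actual mechanism may differ.

```vhdl
-- Hypothetical sketch: gate the babbling-test input with a generic so that
-- flight builds tie it off and synthesis removes the logic it drives.
entity txt_test_gate is
  generic (ENABLE_TEST_MODE : boolean := false);
  port (
    TESTTXTTOUT : in  bit;  -- external test input named on the slide
    TEST_ACTIVE : out bit   -- hypothetical: drives the error-injection logic
  );
end entity txt_test_gate;

architecture rtl of txt_test_gate is
begin
  gen_on : if ENABLE_TEST_MODE generate
    TEST_ACTIVE <= TESTTXTTOUT;  -- test builds: pass the input through
  end generate gen_on;

  gen_off : if not ENABLE_TEST_MODE generate
    -- Flight builds: a constant '0', so the pin is ignored and the
    -- injection logic downstream becomes constant and is optimized away.
    TEST_ACTIVE <= '0';
  end generate gen_off;
end architecture rtl;
```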
Another Consideration: Over-Sampling
- Some systems can be improved by over-sampling input streams, then filtering or voting
- 1553B already has a well-protected data stream
  - Manchester coding: "00" and "11" patterns are error conditions
  - Parity on data words
- Core1553BRT samples incoming data at 6X, 8X, 10X or 12X the base 2 MHz rate
  - Required for clock extraction and to meet the 1553B jitter and noise requirements
- Additional over-sampling is not implemented at present because:
  - As is, Core1553BRT passes all requirements of the 1553B RT test
  - It would require higher-speed clocks, meaning higher power consumption and a larger device
  - It would require a major redesign, which adds risk
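The over-sampling ratios and line-coding checks above can be sketched as small functions. samples_per_half_bit follows the slide's ratios (6X to 12X the 2 MHz half-bit rate) using the CLKSPD codes from the transmit-timeout process; majority3 is a generic 2-of-3 voting filter of the kind an over-sampled input could use; and manchester_bit flags the "00" and "11" half-bit pairs as errors, with "10" decoding to logic 1 and "01" to logic 0 as in MIL-STD-1553 Manchester II coding. All names are hypothetical, and none of this is the core's decoder.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helpers (not Core1553BRT code).
package oversample_pkg is
  -- Clocks per 500 ns half-bit for each CLKSPD setting (f_CLK / 2 MHz).
  function samples_per_half_bit(clkspd : std_logic_vector(1 downto 0)) return positive;
  -- 2-of-3 majority vote: a simple filter for an over-sampled input.
  function majority3(a, b, c : bit) return bit;
  -- Manchester II half-bit pair: "10" = logic 1, "01" = logic 0,
  -- "00" and "11" are invalid (no mid-bit transition).
  procedure manchester_bit(h1, h2 : in bit; value : out bit; err : out boolean);
end package oversample_pkg;

package body oversample_pkg is
  function samples_per_half_bit(clkspd : std_logic_vector(1 downto 0)) return positive is
    variable s : std_logic_vector(1 downto 0);
  begin
    s := clkspd;
    case s is
      when "00"   => return 6;   -- 12 MHz
      when "01"   => return 8;   -- 16 MHz
      when "10"   => return 10;  -- 20 MHz
      when others => return 12;  -- 24 MHz (and any metavalue)
    end case;
  end function samples_per_half_bit;

  function majority3(a, b, c : bit) return bit is
  begin
    return (a and b) or (a and c) or (b and c);
  end function majority3;

  procedure manchester_bit(h1, h2 : in bit; value : out bit; err : out boolean) is
  begin
    value := h1;          -- first half high means logic 1
    err   := (h1 = h2);   -- both halves equal: Manchester error
  end procedure manchester_bit;
end package body oversample_pkg;
```

At 12 MHz only six samples fall in each half-bit, the fewest of the four settings, which is consistent with the earlier note that 12 MHz is the harder case for clock extraction.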
RTCA/DO-254: Design Assurance Guidance for Airborne Electronic Hardware
- Advisory Circular 20-152, ratified 6/30/05, calls for DO-254 compliance for design assurance levels A, B or C
- The DO-254 standard was originally developed in 2000
- DO-254 is a hardware standard, and IP is hardware
  - There are many misunderstandings about this standard
  - So far, there is no precedent for DO-254-certified IP
  - We are focusing on section 10, considering providing Hardware Design Life Cycle Data for relevant cores
- What does it require?
  - A DO-254 development flow in addition to the ISO-certified flow
  - More documentation
  - It forces the discipline to follow a test plan and to document against that plan
- The PHAC (Plan for Hardware Aspects of Certification) and HAS (Hardware Accomplishment Summary) are important elements
  - Without them, customers treat our IP as COTS products (section 11)
Lessons Learned
- High quality = attention to detail
- You cannot do too much verification for IP in severe environments
  - We found a bug by increasing code coverage from 98% to 100%
  - Have gate reviews backed with data
- Document variations from perfect
  - For example, if code coverage is 99%, understand why
- Experience matters: design, products, customers
- There needs to be a way to add objectivity to verification
  - Against a tester
  - By a third party
  - By having another person review the code or perform verification
- You can always improve
  - The core was originally tested at multiple speeds, but not in multiple languages
  - DO-254 adds further discipline to the development process
Conclusion
- Pre-built and verified IP can reduce risk, if:
  - A structured, robust development process is followed (a phase-gate process, even if simplified)
  - Additional concerns for severe environments are considered: safe state machines and a redundant check for babbling
  - Verification and validation are demonstrated: code coverage near 100% and certification of the demonstration board design
  - Deliverables and documentation ease use, helping integration and design reuse
- Many customers prove the core in a variety of environments, more than one company can do on its own
Conclusion: Block-Based Design Enables Development
[Figure: Spacecraft I/O Board Example. Blocks: ASM51 MCU (8051), sensor module, remote monitor, serial channel, programmable I/O, synchronous serial channel (SDLC), asynchronous serial channel (UART), shared memory (on- or off-chip), PCI core, 1553 RT core. Buses and ports: memory data bus, special function register bus, avionics control port, data transfer port. External links: PCI bus to instrument panel, 1553 bus to rest of craft.]