Download - Abstract Introduction Typical PC-based Imple- mentation The Protocol Layers What Was Tested?
Abstract
Introduction
Typical PC-based Imple-mentation
The Protocol Layers
What Was Tested?
Test Methodology for Characterizing the SEE Sensitivity of a Commercial IEEE 1394 Serial Bus (FireWire)
Christina SeidleckRaytheon ITSSLanham, MD
Stephen BuchnerQSS
Landover, MD
Hak KimJackson & Tull
Wahsington, DC
P.W. MarshallConsultant
Brookreal, VA
Kenneth LaBelNASA GSFC
Greenbelt, MD
Results LLC IrradiatedHard Errors
Results LLC IrradiatedSoft Errors
Results PHY IrradiatedHard Errors
Conclusions
References
Example of a Soft Error
Example of a Hard Error
SEFIs Categorized bySteps Required to Start
Communications
Results LLC AsynchronousMode
Results PHY AsynchronousMode
Radiation Test HardwareDiagram
Radiation Test Software
Software Flow forAsynchronous Mode
Software Flow for Isochronous Mode
Two Main Types of ErrorObserved
Modes of Operation
Packet-Based Transactions
Test Part Function and Acc
Radiation Characterization
Radiation Test HardwareSetup
AbstractThe Single Event Effect (SEE) responses of two FireWire serial buses based on the IEEE 1394 standard were tested with heavy ions and protons. A unique approach to testing and categorizing the SEEs is presented.
Introduction and Background
IEEE 1394 is a formal description of the architecture called FireWire originally developed by Apple Computer. FireWire is an advanced serial bus used for connecting numerous high performance devices together.
Why FireWire?
•Less Expensive Alternative to Parallel Buses - a variety of devices can connect directly to a single serial bus, ~4.5 meters allowed between devices (cable implementation)
•Backplane and Cable Implementations Supported -only the cable implementation is presented here
•Plug and Play Support - supports automatic configuration of devices without intervention from the host system
•Scalable Performance - support of transfer rates of 400Mb/s, 200Mb/s, and 100Mb/s
•Attachment Of Up To 63 Nodes On A Single Serial Bus
•Supports Two Transmission Modes: Isochronous and Asynchronous
•Peer to Peer Transfers - data can be transferred between individual nodes without intervention from the host system
DigitalVCR
CD-ROM
PC
LaserPrinter
DigitalCamera
1394 Cable
Typical PC-Based 1394 Implementation
The serial bus allows a variety of high-speed peripheral devices to be attached and supported
The Protocol Layers of the 1394
Bus
Management
Interface
Asynchronous
Transfer
Interface
Isochronous
Transfer
Interface
Software Driver
BusManager
IsochronousResourceManager
Cycle Master
NodeController
Serial BusManagementLayer
TransactionLayer
Physical Layer
Link Layer
Serial Bus
What Was Tested?
Physical Layer PHY
Link Layer LLC
Commercial 1394 Development Board
•FIFOs•PCI Registers •OHCI Registers
•16 Internal Registers
LLC Part Number
TI
NSC
TSB12LV26PZT
CS4210VJG
Vendor PHY Part Number Development Board
TSB41AB3PFP
CS4103VHG
TSBKOHCI403
CS4210A-DK
CA-OAAO45T
Lot Date Code Lot Date Code
VS052ABC4
OCC4RTT
VS052ABC4
Modes of OperationAsynchronous Isochronous
•Data transfers target a particular node based on a unique address (one-to-one transfers)
•Data transfers do not require a constant data rate
•All data transfers of this type are guaranteed 20% (min.) of overall bus bandwidth
•Verifies data delivery with acknowledge, CRC checks and response codes
•Supports data retransmits
•Used when data integrity is required/critical
•Data transfers target nodes based on a channel number of the transfer (like a broadcast, one-to-many)
•Receiving nodes “listen” to channel numbers to receive data packets
•No error detection or retransmits
•Uses constant bandwidth which is requested from the isochronous resource manager
•80% of bus bandwidth used for isochronous data transfers
•Used for time critical, error-tolerant data transfers
Packet-Based Transactions
Asynchronous Packets:•Reads•Writes•Locks
All transactions are transmitted over the bus in a packetized form. Different types of packets are defined for asynchronous and isochronous modes.
Destination Address Source ID Data Label CRCTransaction Type Acknowledge Code Parity
Sample Write Request Packet Sample Acknowledge Packet
Isochronous Packets•Stream Data
Channel Number Transaction Type Data CRC
Sample Stream Packet
Acknowledge
Test Part Function and Accessibility
Physical Layer PHYLink Layer LLC
Functions
•Forms packets for transmission•Provides address decoding for incoming asynchronous packets•Provides channel number decoding for incoming isochronous packets•Performs CRC error checking
Functions
•Electrical and mechanical interface for transmission and reception of packets transferred across the bus•Arbitration - ensures only one node at a time transmits on the bus
Registers Monitored
•FIFOs
•42 out of a possible 102 Open Host Controller Interface (OHCI) registers
•21 out of a possible 22 PCI registers
Registers Monitored
•Due to the volatility of the 16 registers on the PHY, none was monitored
Radiation Characterization
• Protons (TRIUMF) and heavy ions (BNL and TAMU) used to test parts from Texas Instruments and National Semiconductor.
• Irradiate PHY and LINK chips separately on DUT board.
• National Semiconductor part underwent destructive latchup when irradiated with ions having a LET = 27 MeV.cm2/mg. Therefore, did a full characterization on the TI parts only.
Radiation Test Hardware Setup
•Two personal computers (PCs) with PCI slots were used in the test
•Each had an IEEE1394 board
•One of the PCs with the devices-under-test (DUTs) was placed in the beam line while the other was placed in a remote area
•The two PCs were connected by their 1394 interface via a 10 ft 1394 cable for data communication
•A PCI bus isolation card was placed between the DUT board and its host PC
•This card enables current consumption readings from the +5V supply to the DUT board from the host PC via the PCI interface
•A HP34401A Digital Multi-Meter (DMM) was used to read and record this supply current
Radiation Test Hardware Diagram
1394 DUT
PHY LLCBeam
Host PC
1394 Board
Remote PC (CTRL)
1394 Cable
Monitor, Keyboard, Mouse
Monitor, Keyboard, Mouse
Target Area
PCI Bus Isolation Card HP34401ADMM
10ft.
Laptop
Radiation Test Software
•Custom device driver software was developed using C++ and Jungo’s WinDriver targeted for a PC Windows NT 4.0 platform
•Software was an interrupt driven program which established continuous communications between DUT and CTRL at 100 Mbps
•For SEL testing at BNL no registers were monitored
•For proton testing at TRIUMF only asynchronous mode was implemented
•For heavy ion testing at TAMU both asynchronous and isochronous modes were implemented
CTRLR DUT
Software Flow for Asynchronous Mode
Determine test type LLC or PHY Wait for interrupt
Form data request packet and send to DUT
Response Buffer
Request Buffer
Register data response packet
Register data request packet
Setup•Lockdown memory•Set node ID•Set Delay
•Enable receive buffers•Turn on interrupts
•Compare Data•Log Errors•Continue Test Loop
•Determine test type requested•Poll LLC or PHY registers•Form Data Response Packet
Setup•Lockdown memory•Set node ID•Set Delay
•Enable receive buffers•Turn on interrupts
Setup• Lockdown memory• Enable ARRS for bus reset packets• Turn on Isoch receive buffer• Turn on interrupts
Setup• Lockdown memory• Enable ARRS for bus reset packets• Turn on Isoch receive buffer• Turn on interrupts
Form register data solicit packet
• Compare register values
• Log errors • Continue loop
CTRLR
Software Flow for Isochronous Mode
Wait for interrupt
• Poll LLC registers• Build data packet
Register data stream
packet
Register data solicit
stream packet
DUT
Receive Buffer
Receive Buffer
Two Main Type of Errors Observed
Soft Errors
•Bit flips logged by software which occurred in registers, FIFOs or data that did not disrupt communications between the DUT and CTRLR during the test run
Hard Errors or SEFIs
•Errors occurring in registers which halted communications between the DUT and CTRLR during the test run •Errors of this type required a series of software and/or operator steps in order to recover communications
•SEFIs were further classified by the steps taken to re-establish communications
Example of a Soft Error
Asynchronous Request Filter Low Register on the LLC
Enables reception of asynchronous request packets on a per-node basis (handles lower node IDs). When an asynchronous request packet is received, the source node ID is examined. If the bit corresponding to the node ID is not set in this register, then the packet is not acknowledged and the request is not queued. In this example, the register is setup such that only asynchronous request packets from nodes 0 and 1 will be accepted.
Bit 012345678910111213141516171819202122232425262728293031
11000000000000000000000000000000
If bit 26 transitions to a 1, this incorrectly would enable asynchronous request packets from node 26 to be accepted.
Example of a Hard Error (SEFI)
Bit 012345678910111213141516171819202122232425262728293031
00000000000000000111000000000000
ReservedReserved
Host Controller Control Register on the LLC
Provides flags for controlling the TSB12LV26
Bit 17 is the Link Enable bit. This bit is set to 1 when the system is ready to begin operation. If an upset cleared it to 0,the TSB12LV26 would be logically and immediately disconnected from the 1394 bus. No packets would be received or transmitted. Communications would be halted between the CTRLR and DUT.
SEFIs Categorized By Steps Required to Start Communications
5
6
7
8
9
10
Step Action
1
2
3
4
SEU test loop is restarted on the CTRLR, i.e., a packet is sent to DUT requesting register informationSoftware bus reset. Force CTRLR to be root, initiate bus reset in the PHY, reset node on LLC. Restore registers and flush FIFOs. Set bus Ops, IRMC, CMC, ISC, configuration ROM, enable transmit and receive. Implies step 1.Reload software application. This refreshes the lockdown memory region shared by hardware and software. Implies steps 2,1.
Able to verify CTRLR is sending register data solicit packets to DUT. Able to verify that DUT receives the packets and sends data response packet to CTRLR. CTRLR cannot see response packet from DUT. Power cycle the CTRLR. Implies steps 3,2,1.
Disconnect/reconnect the 1394 cable. This causes hard bus reset, tree ID process.
Step 5, followed by steps 3, 2, 1.
Step 6 followed by cold rebooting DUT followed by steps 3, 2, 1.
Cold reboot DUT followed by steps 3, 2, 1.
Step 5 followed by step 8.
Reboot CTRLR followed by steps 3, 2, 1.
11 Reboot both CTRLR and DUT PCs followed by steps 3, 2, 1.
Results - LLC Running Asynchronous ModeERRORS IN LLC RUNNINGASYNCHRONOUS MODE 3 4.2 8.39 11.9 27.7 39.2 51.6 59.6 73
1
2
4
5
6
7
8
9
10
11
12
13
14
15
16
3
“Soft” ErrorsNo errors observed current jumped from 18mA >44mA
Register error, self corrected and no change in current
Register error, self corrected, current jumped 18mA >44mA
“Hard” ErrorsRestart communications from CTRLR
Software bus reset current junped from 18mA to 44mA
Reset CTRLR and/or DUT software
Software bus reset and reset software on DUT and CTRLR
CTRLR sends packet, does not listen cold reboot CTRLR
Disconnect/reconnect cable (hard bus reset)
Disconnect/reconnect cable, reload bus DUT software
Reset cable and cold reboot DUT
Cold reboot DUT after lockup, but no change in current
Cold reboot DUT after lockup, current jump 18mA to 44mA
Reset cable, reboot DUT and software, delta I=0
Reset cable, reboot DUT and software: 18- >44mA
Reboot CTRLR, reload software on bus, DUT and CTRLR
17 Reboot both computers, reset all software
0 0 0 0 0 0 0 0 x
1.3E-4 1.0E-5 4.6E-5 2.5E-5 8.8E-5 3.1E-4 2.4E-4 1.3E-4 x
0 0 0 0 0 0 0 0 x
xxxxxxxxx
xxxxx
0 0 0 8.3E-7 0 0 6.8E-6 0
2.6E-50 0 0 0 0 0 0
4.3E-6 4.2E-7 00 0 1.3E-5 0 0
0 0 4.3E-6 8.3E-7 2.3E-6 0 0 0
2.3E-60 0 0 0 0 0 0
0 0 0 0 0 0 0 06.8E-60 0 0 0 0 0 0
5.7E-52.3E-60 0 0 0 0 0
0 0 0 8.3E-7 4.5E-6 2.6E-5 1.4E-5 0
0 0 2.2E-6 1.7E-6 4.5E-6 0 1.4E-5 0
0
0
00
0
00
0
0
00
4.3E-6
0
0
00
0
0
0
0
0
00
0
0
0
006.8E-6
00
0
Results - PHY Running Asynchronous ModeERRORS IN PHY RUNNINGASYNCHRONOUS MODE 3 4.2 8.39 11.9 27.7 39.2 51.6 59.6 73
1
2
4
5
6
7
8
9
10
11
12
13
14
15
16
3
“Soft” ErrorsNo errors observed current jumped from 18mA >44mA
Register error, self corrected and no change in current
Register error, self corrected, current jumped 18mA >44mA
“Hard” ErrorsRestart communications from CTRLR
Software bus reset current junped from 18mA to 44mA
Reset CTRLR and/or DUT software
Software bus reset and reset software on DUT and CTRLR
CTRLR sends packet, does not listen cold reboot CTRLR
Disconnect/reconnect cable (hard bus reset)
Disconnect/reconnect cable, reload bus DUT software
Reset cable and cold reboot DUT
Cold reboot DUT after lockup, but no change in current
Cold reboot DUT after lockup, current junp 18mA to 44mA
Reset cable, reboot DUT and software, delta I=0
Reset cable, reboot DUT and software: 18- >44mA
Reboot CTRLR, reload software on bus, DUT and CTRLR
17 Reboot both computers, reset all software
0 0 x 0 0 0 0 x x
0 0 x 0 0 0 0 x x
0 0 x 0 0 0 0 x x
xxxxxxxxx
xxxx
x
0 0 x 0 0 1.0E-4 6.4E-5 x00 0 x 0 9.1E-6 0 x
x 0 00 0 0 0 x0 0 x 0 0 0 0 x
00 0 x 0 0 0 x9.1E-8 0 x 8.3E-7 0 0 0 x
00 0 x 3.3E-6 0 0 xx00 0 x 0 0 0
0 0 x 0 0 2.0E-4 0 x0 0 x 0 0 0 0 x0
0
0
0
0
00
0
xxxx
0
0
0
2.5E-6
0
0
0
3.6E-5
0
0
0
2.0E-4
x
xxx2.6E-4
00
0
10-7
10-6
10-5
10-4
0 10 20 30 40 50 60
y=6e-5*(1-exp(-((x-2)/20)1.15))y=2e-5*(1-exp(-((x-2)/40)0.645))IsochronousAsynchronous
Effective LET (MeV.cm2/mg)
SE
FI C
ross
Sect
ion (
cm2 /d
evi
ce)
Results LLC Irradiated Hard Errors
10-7
10-6
10-5
10-4
10-3
0 10 20 30 40 50 60
eqn: y=0.0003*(1-exp(-((x-2.5)/30)1.1))eqn: y=0.0001*(1-exp(-((x-2.5)/30)1.15))IsochronousAsynchronous
Effective LET (MeV.cm2/mg)
SE
U C
ross
Se
ctio
n (
cm2 /d
evi
ce)
Results - LLC Irradiated Soft Errors
10-6
10-5
10-4
10-3
0 10 20 30 40 50 60 70 80
IsochronousAsynchronous
Effective LET (MeV.cm2/mg)
SE
FI C
ross
Sect
ion (
cm2 /d
evi
ce)
Results - PHY Irradiated Hard Errors
Conclusions
• NSC part exhibited destructive latchup at LET=27 MeV.cm2/mg
• TI part exhibited both SEUs (soft errors) and SEFIs (hard errors)
• At low LETs, the errors are mostly soft errors
• The presence of SEFIs resulting in rebooting of the system makes this part problematic for space usage.– power cycling may be required
• An improved test would involve:– automatic reboot– another device
References
Anderson, Don and Mindshare, Inc. FireWire System Architecture. Addison-Wesley:Reading Massachusetts, 1999.
1394 Open Host Controller Interface Specification. Release 1.1, January, 2000.
S. Buchner, et al. Radiation Testing of the 1394 FireWire. Presentation SEU Symposium, Los Angeles. April, 2002.
SponsorsNEPP
NRL/NPOES
Special thanks to Kent Larson and Mike Worcester of Boeing