cs137: electronic design automation
DESCRIPTION
CS137: Electronic Design Automation. Day 8: February 4, 2004 Fault Detection. Today. Faults in Logic Error Detection Schemes Optimization Problem. Problem. Gates, wires, memories: built out of physical media may fail. Device Physics. Represent a 1 or 0 with charge - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/1.jpg)
CALTECH CS137 Winter2004 -- DeHon
CS137:Electronic Design Automation
Day 8: February 4, 2004
Fault Detection
![Page 2: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/2.jpg)
CALTECH CS137 Winter2004 -- DeHon
Today
• Faults in Logic
• Error Detection Schemes
• Optimization Problem
![Page 3: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/3.jpg)
CALTECH CS137 Winter2004 -- DeHon
Problem
• Gates, wires, memories:– built out of physical media– may fail
![Page 4: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/4.jpg)
CALTECH CS137 Winter2004 -- DeHon
Device Physics
• Represent a 1 or 0 with charge– On a gate, in a memory
• Charge may be disrupted -particle– Ground bounce– Noise coupling– Tunneling– Thermal noise– Behavior of individual electrons is statistical
![Page 5: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/5.jpg)
CALTECH CS137 Winter2004 -- DeHon
DRAMs
• Small cells
• Store charge dynamically on capacitor
• Store about 50,000 electrons
• Must be refreshed– Data leaks away through parasitic
resistance
-particle can be 1,000,000 carriers?
![Page 6: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/6.jpg)
CALTECH CS137 Winter2004 -- DeHon
System Reliability
• Device fail with Probability: Pfail
• Have N components in system• All must work for device to work• Psys = (1-Pfail)N
...32
1 32
failfailP
NP
NPNP failsys
![Page 7: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/7.jpg)
CALTECH CS137 Winter2004 -- DeHon
System Reliability
• If NPfail << 1 NPfail dominates higher order terms…
...32
1 32
failfailP
NP
NPNP failsys
failsys PNP 1
![Page 8: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/8.jpg)
CALTECH CS137 Winter2004 -- DeHon
System Reliability
• Psysfail N Pfail
failsys PNP 1
![Page 9: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/9.jpg)
CALTECH CS137 Winter2004 -- DeHon
Modern System
• 100 Million 1 Billion Transistors– Not to mention wiring…
• > GHz = > 1 Billion Transitions / sec.
• N = 1018 per second…
failsys PNP 1
![Page 10: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/10.jpg)
CALTECH CS137 Winter2004 -- DeHon
As we scale?
• N increases• Charge/gate decreases
– Less electrons– Higher probability they wander– Greater variability in behavior
• Voltage levels decrease– Smaller barriers
• Greater variability in device parameters
Pfail increases
failsys PNP 1
![Page 11: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/11.jpg)
CALTECH CS137 Winter2004 -- DeHon
Exacerbated at Nanoscale
• Small numbers of dopants (10s)– High variability
• Small numbers of electrons (10-1000s?)– High variability– Highly susceptible to noise
• Small number of molecules– May break, decay…
![Page 12: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/12.jpg)
CALTECH CS137 Winter2004 -- DeHon
What do we do about it?
• Tolerate faulty components• Detect faults
– Not do anything bad– Try it again
• If statistically unlikely error,–high likelihood won’t recur.
• …Focus on detection…
![Page 13: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/13.jpg)
CALTECH CS137 Winter2004 -- DeHon
Detect Faults
• Key Idea: redundancy
• Include enough redundancy in computation– Can tell that an error occurred
![Page 14: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/14.jpg)
CALTECH CS137 Winter2004 -- DeHon
What kind of redundancy can we use?
• Multiple copies of logic
• Compute something about result– Parity on number of outputs– Count of number of 1’s in output
![Page 15: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/15.jpg)
CALTECH CS137 Winter2004 -- DeHon
Error Detection
![Page 16: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/16.jpg)
CALTECH CS137 Winter2004 -- DeHon
What do we protect against?
• Any n errors– Worst-case selection of errors
![Page 17: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/17.jpg)
CALTECH CS137 Winter2004 -- DeHon
Single Error Detection
• If Pfail small:– No error: (1-Pfail)N 1-NPfail
– One error: NPfail (1-Pfail)N-1 NPfail
– Two errors: [N(N-1)/2] (Pfail )2(1-Pfail)N-1
• Probability of an error going undetected Goes from NPfail
to (NPfail )2
For: NPfail << 1
![Page 18: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/18.jpg)
CALTECH CS137 Winter2004 -- DeHon
Detection Overhead
• Correction and detection circuitry increase circuit size.
• Ndetect > Nlogic
• Ndetect = c Nlogic
• Probability of an error going undetected Goes from NPfail
to (cNPfail )2
Want: c2 << 1/(NPfail )
![Page 19: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/19.jpg)
CALTECH CS137 Winter2004 -- DeHon
Reliability Tuning
• Want NPfail small
– Want: (cNPfail )2 very small
• Idea:– Guard subsystems independently
– Make Nsub suitably small
– Smaller probability there is a double error localized in this small subsystem
![Page 20: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/20.jpg)
CALTECH CS137 Winter2004 -- DeHon
Guarding Subsystems
![Page 21: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/21.jpg)
CALTECH CS137 Winter2004 -- DeHon
Composing Subsystems
• Psysundetect = (Nsys/Ns) Psubundetect
• Psubundetect = (cNsPfail )2
• Psysundetect = (Nsys/Ns) (cNsPfail )2
• Psysundetect = Nsys Ns (cPfail )2
• Extermes:• Ns= Nsys
• Ns=1
![Page 22: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/22.jpg)
CALTECH CS137 Winter2004 -- DeHon
Problem
• Generate logic capable of detecting any single error
![Page 23: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/23.jpg)
CALTECH CS137 Winter2004 -- DeHon
Terminology
• Fault-secure: system never produces incorrect code word– Either produces correct result– Or detects the error
• Self-testing: for every fault, there is some input that produces an incorrect code word– That detects the error
![Page 24: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/24.jpg)
CALTECH CS137 Winter2004 -- DeHon
Terminology
• Totally Self Checking: system is both fault-secure and self-testing.
![Page 25: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/25.jpg)
CALTECH CS137 Winter2004 -- DeHon
Duplication
![Page 26: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/26.jpg)
CALTECH CS137 Winter2004 -- DeHon
Duplication
• N original gates• Duplicate: + N• O outputs
– O xors– O/2 2 2 ors
• O<N• 2<c<5
![Page 27: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/27.jpg)
CALTECH CS137 Winter2004 -- DeHon
Duplication with PLA
Logic
Duplicate
![Page 28: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/28.jpg)
CALTECH CS137 Winter2004 -- DeHon
PLA Duplication
• N product terms in original
• N in duplicate• 2 O product terms
for matching• O<=N• 2<c<4
![Page 29: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/29.jpg)
CALTECH CS137 Winter2004 -- DeHon
Can we do better?
• Seems like overkill to compute twice?
![Page 30: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/30.jpg)
CALTECH CS137 Winter2004 -- DeHon
Idea
• Encode so outputs have some checkable property– E.g. parity
![Page 31: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/31.jpg)
CALTECH CS137 Winter2004 -- DeHon
Will this work?
Original Logic
Extra cubes for parity
parity
![Page 32: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/32.jpg)
CALTECH CS137 Winter2004 -- DeHon
Problem
• Single fault may produce multiple output errors
![Page 33: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/33.jpg)
CALTECH CS137 Winter2004 -- DeHon
How Fix?
• How do we fix?
![Page 34: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/34.jpg)
CALTECH CS137 Winter2004 -- DeHon
No Logic Sharing
• No sharing
• Single fault effects single output
![Page 35: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/35.jpg)
CALTECH CS137 Winter2004 -- DeHon
Parity Checking
• To check parity– Need xor tree on outputs/parity– [(O+1)/2]22 = 2(O+1) xors
• For PLA– xor would blow up– Wrap multiple times– 2 product terms per xor– 4O product terms
![Page 36: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/36.jpg)
CALTECH CS137 Winter2004 -- DeHon
nanoPLA Wrapped xor
Note: two planes here just for buffering/inversion
![Page 37: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/37.jpg)
CALTECH CS137 Winter2004 -- DeHon
Better or Worse than Dual?
• Depends on sharing in logic
• Typical results from Mitra [ITC2002]
![Page 38: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/38.jpg)
CALTECH CS137 Winter2004 -- DeHon
Can we allow sharing?
• When?
![Page 39: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/39.jpg)
CALTECH CS137 Winter2004 -- DeHon
Multiple Parity Groups
• Can share with different parity groups
• Common error flagged in both groups
![Page 40: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/40.jpg)
CALTECH CS137 Winter2004 -- DeHon
Better or Worse than Dual?
• Typical results from Mitra [ITC2002]
(parity here includes sharing)
![Page 41: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/41.jpg)
CALTECH CS137 Winter2004 -- DeHon
Project Assignment
• Assignments #3 & #4– Out on Monday
• Provide an algorithm for identifying parity groups– Keep single error detection property– Minimize pterms
![Page 42: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/42.jpg)
CALTECH CS137 Winter2004 -- DeHon
Admin
• Assignment #2 due Friday
![Page 43: CS137: Electronic Design Automation](https://reader036.vdocuments.us/reader036/viewer/2022062314/568148b5550346895db5cbbd/html5/thumbnails/43.jpg)
CALTECH CS137 Winter2004 -- DeHon
Big Ideas
• Low-level physics imperfect– Statistical, noisy
• Larger devices greater likelihood of faults
• Redundancy
• Self-checking circuits