1
E0-286 “VLSI Test”
1. Delay Test: LOC, LOS, PD, SDD. Delay test with multiple clock domains. Scan architectures to improve delay test coverage. Testing memory interface logic.
2
At-speed test
• Stuck-at tests ensure that the manufactured design meets the functionality, based on structural fault coverage.
• At-speed tests ensure that the manufactured design meets the timing constraints, based on structural fault coverage.
  - Timing constraints: setup and hold
• How many (at-speed) capture clock pulses are needed to test a design for setup timing compliance…
  - for a full-scan design?
  - for a partial-scan design?
• How many clock pulses are needed to test a design for hold timing compliance…
  - for a full-scan design?
  - for a partial-scan design?
3
Fault models for testing setup delay defects
• Path delay fault model: a delay defect in a circuit may cause the cumulative delay of a combinational path to exceed the specified duration (or the clock period).
  - Number of potential fault sites = number of paths.
    - Path start (end) point: output (capture input) of a scan flip-flop.
    - Number of paths may be exponential in the number of gates!
  - Fault: a path and its transition direction.
• Transition delay fault model: a delay defect at a gate terminal is assumed to be large enough to prevent any transition passing through the gate from reaching any output within the specified duration (or the clock period).
  - Number of potential fault sites = number of gates
  - Fault: slow-to-rise (str) and slow-to-fall (stf) at gate terminals
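The claim that the number of paths may be exponential in the number of gates can be illustrated with a small sketch. The circuit below is a hypothetical chain of k reconvergent "diamond" stages (not a circuit from these slides), and the function names are made up for illustration; paths are counted by memoized traversal of the fanout graph.

```python
from functools import lru_cache


def count_paths(fanout, sources, sinks):
    """Count source-to-sink paths in a combinational DAG given as a fanout map."""
    sink_set = set(sinks)

    @lru_cache(maxsize=None)
    def paths_from(node):
        if node in sink_set:
            return 1
        return sum(paths_from(nxt) for nxt in fanout.get(node, ()))

    return sum(paths_from(s) for s in sources)


def diamond_chain(k):
    """Hypothetical circuit: k stages, each forking into two gates that reconverge."""
    fanout = {}
    for i in range(k):
        fanout[f"n{i}"] = (f"u{i}", f"d{i}")
        fanout[f"u{i}"] = (f"n{i + 1}",)
        fanout[f"d{i}"] = (f"n{i + 1}",)
    return fanout, ["n0"], [f"n{k}"]


for k in (1, 4, 10):
    fanout, sources, sinks = diamond_chain(k)
    nodes = 3 * k + 1  # node count grows linearly with k
    print(k, nodes, count_paths(fanout, sources, sinks))
```

In this toy structure the node count grows linearly with k while the path count doubles with every stage. Each path also carries two path delay faults, one per transition direction.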
4
Scan-based setup delay fault tests
• Two-pattern tests are enough to target setup in a testable full-scan circuit: V1 → V2
• V1 is the initialization pattern: it initializes the state.
• V2 is the launch pattern: it launches a transition.
  - The limitation lies in generating the launch pattern from the initialization pattern.
• The capture cycle then captures the result of the launch pattern.
• The at-speed requirement is between the launch pattern (V2) and the capture.
  - Initialization and launch may or may not be at-speed.
5
5-valued algebra
• S0 (S1): Steady 0 (1); V1 = V2 = 0 (1)
• U0 (U1): Final value is 0 (1); V1 = X, V2 = 0 (1)
  - V1 may be either 0 or 1
  - U0 = S0 OR (Fall)
• XX: V1 = V2 = X (don't-care)
• Algebra:
  - S1.U0 = U0; U0.U1 = U0; …
  - S0+U1 = U1; U0+U1 = U1; …
  - ~U0 = U1; …
• ATPG may use such an algebra to generate robust / non-robust tests as required.
  - Robust test requires only Rise or Fall along the path.
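One way to reproduce the identities listed above is to treat each of the five values as a (V1, V2) pair and to apply ordinary three-valued logic to each component. This is a minimal sketch under that assumption; the helper names (`and3`, `AND`, and so on) are illustrative and not taken from any ATPG tool.

```python
X = "X"  # unknown / don't-care


def and3(a, b):
    """Three-valued AND over {0, 1, X}."""
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return X


def or3(a, b):
    """Three-valued OR over {0, 1, X}."""
    if a == 1 or b == 1:
        return 1
    if a == 0 and b == 0:
        return 0
    return X


def not3(a):
    """Three-valued NOT over {0, 1, X}."""
    return X if a == X else 1 - a


# Each 5-valued symbol as a (V1, V2) pair, following the slide's definitions.
VALUES = {"S0": (0, 0), "S1": (1, 1), "U0": (X, 0), "U1": (X, 1), "XX": (X, X)}
NAMES = {pair: name for name, pair in VALUES.items()}


def AND(p, q):
    (a1, a2), (b1, b2) = VALUES[p], VALUES[q]
    return NAMES[(and3(a1, b1), and3(a2, b2))]


def OR(p, q):
    (a1, a2), (b1, b2) = VALUES[p], VALUES[q]
    return NAMES[(or3(a1, b1), or3(a2, b2))]


def NOT(p):
    v1, v2 = VALUES[p]
    return NAMES[(not3(v1), not3(v2))]


print(AND("S1", "U0"), AND("U0", "U1"), OR("S0", "U1"), OR("U0", "U1"), NOT("U0"))
```

With this encoding the five symbols are closed under AND, OR and NOT, so the lookup in `NAMES` never fails for these operators.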
6
Robust path-delay test
• Guarantees detection irrespective of delays in other parts of the circuit.
• One fault is targeted per pattern.
• Disjoint path tests may be merged in the same pattern.
[Figure: example circuit with scan flip-flops f0 to f9 and signals a0, a1, y0 to y3, annotated with 1 and F values]
7
Non-robust path-delay test
• Guarantees detection only when no other path-delay fault is present.
[Figure: example circuit with scan flip-flops f0 to f9 and signals a0, a1, y0 to y3, annotated with U0, U1 and F values; fault-free and faulty waveforms of a1, a0 and y3 with strobe points]
8
Transition delay fault test
• The transition at the fault site and the final value (V2) should be observed at an output. Example: slow-to-fall @ f6
• Any path may be used for justification and propagation.
• The initial value is not observed.
[Figure: example circuit with scan flip-flops f0 to f9 and signals a0, a1, y0 to y3, annotated with U1 and F values]
9
Stuck-at vs. Transition fault tests
• How many test patterns are required for the following circuit?
[Figure: two panels, "Stuck-at Test Example" and "Transition Fault Test Example", each showing a circuit with flip-flops f0 to f9 annotated with 1, 11 and R values]
• Can this be generalized?
10
Small Delay Defects
[Figure: histogram of number of nodes versus slack]
Earlier metric:
• # transition faults detected.
New metric:
• # transition faults detected, weighted as:
  • (Fault * Minimal slack) / (Slack along detected path)
  • Area under the slack curve.
• Take all patterns for a given slack: accept the coverage obtained.
• Take all patterns for a given coverage: accept the slack used.
[Figure: pattern count and coverage, each plotted against % of detection path slack w.r.t. minimum path slack (10% to 60%)]
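The difference between the earlier and the new metric can be sketched numerically. The fault list below is made-up data, and normalizing the weighted sum by the total number of faults is an assumption made here for illustration; the slide only gives the per-fault weight (minimal slack divided by slack along the detected path).

```python
def transition_coverage(faults):
    """Earlier metric: fraction of transition faults detected."""
    detected = sum(1 for f in faults if f["detected_slack"] is not None)
    return detected / len(faults)


def sdd_weighted_coverage(faults):
    """New metric: each detected fault weighted by min_slack / detected_slack.

    Dividing by the total fault count is an assumption of this sketch.
    """
    total = 0.0
    for f in faults:
        if f["detected_slack"] is not None:
            total += f["min_slack"] / f["detected_slack"]
    return total / len(faults)


# Hypothetical faults: slack of the tightest path through the fault site,
# and slack of the path actually used by the detecting pattern (None = undetected).
faults = [
    {"min_slack": 4, "detected_slack": 4},    # detected along its tightest path
    {"min_slack": 4, "detected_slack": 16},   # detected along a much slacker path
    {"min_slack": 8, "detected_slack": None}, # not detected
    {"min_slack": 10, "detected_slack": 20},
]

print(transition_coverage(faults))
print(sdd_weighted_coverage(faults))
```

A fault detected through a path with large slack counts fully in the earlier metric but only fractionally in the weighted one, which is why the weighted number is lower for the same pattern set.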
11
Scan-based transition fault tests
• Ideally, V2 should be generated independently of V1, so that any possible pattern can be applied to the DUT.
  - This may require special hardware (i.e. hold-scan flip-flops)
• Otherwise, there are three possible ways to generate the launch pattern (V2) after initialization (V1)
  - LOS: V2 is the shifted value of V1 (combinational). Different V2 for all V1 guaranteed.
  - LOC: V2 is the functional next state of V1 (sequential). Different V2 for all V1 not guaranteed.
  - LOC (multi-cycle): V2 is the functional state of V1 after many cycles. Higher number of launch states (V2) for a given V1.
[Figure: clock and scan_enable waveforms for launch off shift (shift_in, launch, capture, shift_out) and launch off capture, with V1 and V2 marked]
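The two ways of deriving V2 from V1 can be sketched for a toy design. The 3-flop next-state function below is hypothetical and chosen only for illustration; the scan chain is assumed to run from index 0 to index 2.

```python
from itertools import product


def los_launch(v1, scan_in):
    """LOS: V2 is V1 shifted one position along the scan chain."""
    return (scan_in,) + v1[:-1]


def toy_next_state(state):
    """Hypothetical functional next-state logic of a 3-flop design."""
    a, b, c = state
    return (b & c, a | c, a ^ b)


def loc_launch(v1):
    """LOC: V2 is the functional next state of V1."""
    return toy_next_state(v1)


v1 = (1, 0, 1)
print(los_launch(v1, 0), loc_launch(v1))

# Launch states reachable over all initializations (and scan-in bits for LOS).
all_states = list(product((0, 1), repeat=3))
los_reachable = {los_launch(s, si) for s in all_states for si in (0, 1)}
loc_reachable = {loc_launch(s) for s in all_states}
print(len(los_reachable), len(loc_reachable))
```

For this particular toy function, shifting can produce every 3-bit launch state, while the functional logic maps the eight initial states onto a smaller set. Real designs will differ, but the sketch shows why LOC launch states are constrained by the functional logic.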
12
LOS vs. LOC
• LOS
  - Trivial to generate V2 from V1.
    - Faster ATPG
  - Large number of parallel transitions possible => many faults detectable per pattern
    - Lower pattern count.
  - V2 -> capture is at-speed => SE switches at-speed
    - SE drives all flops, like the clock. At-speed SE => physical design considerations similar to those of the clock.
  - Scan chain ordering impacts coverage, as a different scan order results in a different V2 for a given V1.
• LOC
  - Sequential simulation required to generate V2 from V1
    - Time-consuming ATPG
  - V2 is a functionally reachable state from V1 => multiple arbitrary transitions may not be justifiable
    - Higher pattern count.
  - No at-speed requirement on SE and no dependence on scan order
13
Improving limitations of LOS and LOC
• LOS requires at-speed scan enable
  - Pipelined scan enable addresses this by generating a local at-speed SE.
  - For full-scan designs with no sequential feedback, bi-partitioning provides full LOS coverage with 2 slow-speed scan enables.
• Many techniques have been proposed to combine the best of LOC and LOS to improve coverage
  - LOS by definition has N shifts and 1 capture.
    - Last flop in each scan chain may not see a valid transition.
    - LOES (launch on extra shift) addresses this.
  - Scan enable partitioning techniques with multiple scan enables.
  - Combining multiple scan enables with multiple capture depths.
[Figure: GSE and LSE scan-enable signals]
14
Bi-partitioned scan
Phase I
- Launch from blue.
- Capture in red.
Phase II
- Launch from red.
- Capture in blue.
[Figure: scan flops partitioned into red and blue groups, controlled by scan enables SE_R and SE_B]
16
Proof of coverage improvement
• Coverage improvement is directly related to the number of launch states (V2) possible for a given initialization state (V1)
  - V2(LOS) = S(V1), for all V1
  - V2(LOC) = C(V1), for all V1
  - V2(LOS+LOC) = S(V1) + C(V1), for all V1
  - V2(2_SE) = SS(V1) + CC(V1) + SC(V1) + CS(V1), for all V1
  where S denotes shift and C denotes capture.
• Can this be generalized?
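The growth in launch states can be sketched by enumeration. This sketch reads "+" as set union, assumes a single 4-flop scan chain split into two partitions with one scan enable each, and uses a hypothetical next-state function; none of these details come from the slides.

```python
def shifted(v1, scan_in):
    """Value of every flop after one shift along the scan chain."""
    return (scan_in,) + v1[:-1]


def toy_next_state(state):
    """Hypothetical functional next-state logic of a 4-flop design."""
    a, b, c, d = state
    return (b ^ d, a & c, a | d, b ^ c)


def launch_states(v1, partition, modes):
    """Set of V2 reachable from v1.

    partition[i] is the partition index (0 or 1) of flop i.
    Each mode is a 2-letter string: S (shift) or C (capture) per partition.
    """
    captured = toy_next_state(v1)
    states = set()
    for mode in modes:
        for scan_in in (0, 1):
            sh = shifted(v1, scan_in)
            states.add(tuple(
                sh[i] if mode[partition[i]] == "S" else captured[i]
                for i in range(len(v1))
            ))
    return states


PARTITION = (0, 0, 1, 1)  # assumed split: first two flops vs. last two flops
v1 = (1, 0, 1, 1)

los = launch_states(v1, PARTITION, ["SS"])
loc = launch_states(v1, PARTITION, ["CC"])
both = launch_states(v1, PARTITION, ["SS", "CC"])
two_se = launch_states(v1, PARTITION, ["SS", "CC", "SC", "CS"])
print(len(los), len(loc), len(both), len(two_se))
```

By construction the LOS and LOC sets are subsets of the combined set, which in turn is a subset of the two-scan-enable set, so the number of launch states for a given V1 can only grow as more shift/capture combinations are allowed.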
17
Functional vs. scan states in transition faults
• The launch condition is enabled through scan initialization (non-functional).
• The next state after a scan shift is not necessarily functional. It depends on the synthesis of the next-state function, and is very likely reachable only for incompletely specified state machines.
• Reachable ≠ Functional. The former depends upon synthesis. The latter depends upon specification. For a completely specified machine, the two are the same.
• Launch-off capture and launch-off shift patterns can both have non-functional launch states.
18
Desirable Transitions
• A transition fault pattern does not necessarily cause a transition in the capture flip-flop.
  - A 0->1 transition on S is a valid transition fault test for A = 1 and B = 1->0.
  - Fault-free output does not change: 1->1. Faulty output: 1->0.
• Path delay pattern: a transition in the capture flip-flop is guaranteed. However, not necessarily through a valid path.
• Path delay tests can be robust (single launch transition), non-robust (other enabling launch transitions), or functionally sensitizable (other enabling paths). Considerations for multi-cycle paths and false paths.
• Multiple cycle launches may be required to achieve a functional launch state.
[Figure: multiplexer with data inputs A and B and select S, implementing Z = A.S + B.S’]
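The multiplexer example can be checked with a few lines of simulation. The sketch models a slow-to-rise fault on S in the simplest possible way, by assuming that S still shows its V1 value at capture time; the function names are illustrative.

```python
def mux(a, b, s):
    """Z = A.S + B.S'"""
    return (a & s) | (b & (1 - s))


def captured_z(v1, v2, slow_to_rise_on_s=False):
    """Value captured at Z after launching V2.

    Assumption of this sketch: with a slow-to-rise fault, a 0->1
    transition on S has not arrived by capture time, so S still reads 0.
    """
    s = v2["S"]
    if slow_to_rise_on_s and v1["S"] == 0 and v2["S"] == 1:
        s = 0
    return mux(v2["A"], v2["B"], s)


v1 = {"A": 1, "B": 1, "S": 0}  # initialization pattern
v2 = {"A": 1, "B": 0, "S": 1}  # launch pattern: S rises, B falls

z_initial = mux(v1["A"], v1["B"], v1["S"])
z_good = captured_z(v1, v2)
z_faulty = captured_z(v1, v2, slow_to_rise_on_s=True)
print(z_initial, z_good, z_faulty)
```

The fault-free output stays at 1 across V1 and V2, yet the faulty circuit captures 0, so the fault is detected without any transition at the capture flip-flop.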
19
Clocking considerations
• Capture with slow-speed tester clock
  - Use of two time-sets
  - Use of on-chip clock generation
• Handling multiple clock domains for capture
  - Intra-domain test is simple
    - One clock domain of interest at a time works fine.
  - Inter-domain test of synchronous clocks
    - Either launch or capture in each domain
      - ATPG with 2 cycles is simple
    - Both launch and capture in each domain
      - ATPG with 2 cycles is complex due to edge alignment and LOS/LOC.
      - Multi-time-frame sequential ATPG needed
  - Inter-domain interaction of asynchronous clocks may be false, with the use of synchronizers / FIFOs
20
Generating at-speed clocks
• At-speed clocks for capture, interleaved with slow-speed shift clocks, can be generated from
  - Tester: a high-cost tester that generates such precision
  - On-chip PLL: this is more likely
• Leaking two at-speed clocks from the on-chip PLL
[Figure: PLL and clock leaker blocks with Ref clock, Shift clock and SE inputs, producing a slow shift + fast capture clock]
21
Test time optimization with multiple clock domains
• Scan chain considerations with mixing multiple clocks: similar to mixing posedge and negedge flops
• Test time limitations with standard scan chains
  - Capture on one domain => wasted shift cycles on all other domains
    - Shift-per-clock-domain for testing intra-domain faults
  - Capture across multiple domains
    - Sequential multi-cycle (>2) ATPG => run-time intensive
    - Simpler two-cycle ATPG
      - Masking inter-domain logic for correctness
    - Handling test power issues with minimal test time impact
      - Staggered capture across domains
22
Testing logic-memory interface
• Control or observe flops do not exercise the true speed paths
  - Poor ATPG effectiveness though the coverage may be good
[Figure: normalized Fmax versus process spread]
23
Testing logic-memory interface
• Scan-enable partitioning can be extended here as well
  - RAM-sequential ATPG for logic->memory paths
    - Q fault RAM-sequential pattern: <load, W, W, R, R, C, unload>
    - Address fault RAM-sequential pattern: <load, W, W, R, C, unload>
  - Generating transitions becomes simpler with shift-only scan chains, similar to LOS
[Figure: RAM with ports ADR, D, ME and Q, multiplexed with test inputs TADR, TD and TME (ATPG_MODE = 0, BISTE = 0); flops labeled F and B are arranged in a shift-only scan chain and a capture scan chain]
24
Testing memory-memory interface
• Need to exercise true memory paths during ATPG
  - Key idea: maximal use of simpler non-functional (BIST or shift) paths along non-targeted memory paths
[Figure: memories M1 (ATPG_MODE = 0, BISTE = 1) and M2 (ATPG_MODE = 0, BISTE = 0), each with ports ADR, D, ME and Q and test inputs TADR, TD and TME, surrounded by flops labeled F and B. Operation sequence over cycles 1 to 7. At M1: Wr.D1A1, Wr.D2A2, Rd.A1, Rd.A2. At M2: Wr.CA, Wr.DB, Rd.B, Rd.A2, Capture.]
• Capture depth of 7 required for the M1 → M2 path
• Can you generalize the capture depth needed for ‘M’ back-to-back memories?
25
Hold delay ATPG
• A two-pattern ATPG <V1, V2> with one capture cycle
  - A slow capture clock is sufficient
• With the path-delay fault model, setup path-delay patterns are simpler to translate into hold path-delay patterns
  - Ensures that the path of interest (observing flip-flop) has a transition from one state to another
• Transition-fault hold-delay patterns cannot be translated from setup-delay patterns
  - Transitions are not ensured at the observing flip-flop
  - Need a customized ATPG algorithm
    - Similar to conventional setup delay, but with one less clock pulse, and the launch state is observed.
26
Other references
• V. R. Devanathan: Novel Bi-partitioned Scan Architecture to Improve Transition Fault Coverage. Asian Test Symposium 2005.
• Hiroyuki Nakamura, et al.: Low Cost Delay Testing of Nanometer SoCs Using On-Chip Clocking and Test Compression. Asian Test Symposium 2005.
• N. Ahmed, C. P. Ravikumar, M. Tehranipoor and J. Plusquellic: At-Speed Transition Fault Testing With Low Speed Scan Enable. VLSI Test Symposium 2005.
• V. R. Devanathan, C. P. Ravikumar, V. Kamakoti: Reducing SoC Test Time and Test Power in Hierarchical Scan Test: Scan Architecture and Algorithms. VLSI Design 2007.
• V. R. Devanathan, Srinivas Kumar Vooka: Techniques to Improve Memory Interface Test Quality for Complex SoCs. International Test Conference 2011.
27
Assignments
• Suggest a technique to achieve multiple independent at-speed scan enables for designs with only one slow-speed SE pin. (Hint: LOS)
• Express the total number of paths in the following circuit as a function of k.
• Would a test that detects str @ A also detect stuck-0 @ A? Similarly, would a test for stuck-0 @ A also detect str @ A?
• For the circuit in slide 7, if the scan order is SI -> f0 -> ... -> fi -> fi+1 -> ... -> SO, generate a test for a str fault at the output of f4 with LOC and LOS.
• For the circuit below, what would be the worst-case capture depth to test the memory interface logic with the standard scheme? What if we use BIST paths?
[Figure: memories M1 to M6, each with ADR and Q ports, with test inputs TADR and TD, BIST flops and functional flops]