Integrated Test Data Compression and Core Wrapper Design for Low-Cost System-on-a-Chip Testing
Paul Theo Gonciari, Bashir Al-Hashimi
Electronic Systems Design Group, University of Southampton, UK
Nicola Nicolici
Electrical and Computer Engineering, McMaster University, Canada
Overview
• Low-cost system-on-a-chip test
• Single vs. multiple scan chains compression
• Proposed add-on architecture
– TAM add-on architecture
  • Core wrapper design
  • Reduce control and area overhead
– Design flow integration
• Experimental results
• Conclusion
Low-cost SOC test
• Problems
– High volume of test data
– Increased chip/ATE frequency ratio
– Increased chip/ATE pin number ratio
– Increased scan-power dissipation
High ATE costs and yield loss
Low-cost SOC test
• Solutions
– Test data reduction
– Reuse existing ATE technology
– Exploit chip/ATE frequency ratio
– Reduced pin count testing (RPCT)
– Scan chain partitioning
TAM add-on architecture
[Figure: SOC with two cores and the TAM add-on.]
Low-cost solution for core based SOC test
Single scan chain TDC
[Figure: single scan chain TDC on an SOC. Labels: ATE Head, sync, decoder, counter, 5 FF, SISR, Core (si, so).]
Single scan chain TDC (cont)
• Exploit test set regularities (e.g., runs of 0s)
• Based on coding schemes
• Exploit frequency ratio
• Synchronization overhead – temporal deserialization [Gonciari, ETW02]
– External clock synchronization
– FIFO-like structures
• High scan power due to the long scan chain
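As a rough illustration of how a coding scheme can exploit runs of 0s, the sketch below uses Golomb run-length coding. The choice of Golomb coding is an assumption made for illustration only; the slides do not state which code the decoder implements. The function names and the group size `m` are hypothetical.

```python
def runs_of_zeros(bits):
    """Return the lengths of the 0-runs in a bit string, each run ending at a 1."""
    runs, count = [], 0
    for b in bits:
        if b == "0":
            count += 1
        else:
            runs.append(count)
            count = 0
    if count:
        # Trailing zeros have no terminating 1; the decoder drops the extra 1.
        runs.append(count)
    return runs


def golomb_encode(bits, m=4):
    """Encode a bit string; m is the group size (assumed a power of two >= 2)."""
    if m < 2 or m & (m - 1):
        raise ValueError("m must be a power of two >= 2")
    tail_bits = m.bit_length() - 1  # log2(m)
    out = []
    for run in runs_of_zeros(bits):
        q, r = divmod(run, m)
        # Prefix: q ones and a 0 separator. Tail: r in log2(m) bits.
        out.append("1" * q + "0" + format(r, f"0{tail_bits}b"))
    return "".join(out)


def golomb_decode(code, m, length):
    """Decode back to the original bit string of the given length."""
    tail_bits = m.bit_length() - 1
    out, i = [], 0
    while i < len(code):
        q = 0
        while code[i] == "1":
            q += 1
            i += 1
        i += 1  # skip the 0 separator
        r = int(code[i:i + tail_bits], 2)
        i += tail_bits
        out.append("0" * (q * m + r) + "1")
    return "".join(out)[:length]


test_bits = "0" * 7 + "1" + "0" * 13 + "1" + "0" * 3
encoded = golomb_encode(test_bits, m=4)
print(len(test_bits), "bits ->", len(encoded), "bits")  # 25 bits -> 13 bits
```

The longer the runs of 0s relative to `m`, the more the prefix saves; a test set with few or short runs can expand instead of shrinking.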
Multiple scan chain TDC
[Figure: multiple scan chain TDC. Labels: data in, ctrl, XOR network, SISR, WSC, cores with their scan chains.]
Multiple scan chain TDC (cont)
• Exploit care bits sparseness
• Uses XOR based spreading networks
• Temporal pattern lockout
– Extra control line
– Doubles the volume of test data
– Influences test application time
• Structural pattern lockout – can influence fault coverage
• High scan power due to driving of all scan chains
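The sketch below shows one way to see why sparse care bits help an XOR based spreading network and how a lockout can arise. It assumes each scan chain input is the XOR of a fixed subset of the network inputs, and it treats a scan slice as locked out when no input word produces all of its care bits. Both the network model and this reading of "lockout" are assumptions; the masks and function names are hypothetical.

```python
def apply_network(masks, word):
    """Return the bit each scan chain receives for a given input word."""
    return [bin(mask & word).count("1") & 1 for mask in masks]


def solve_slice(masks, care_bits):
    """Find an input word that produces the care bits of one scan slice.

    masks[i]  : bitmask of the network inputs XORed into scan chain i
    care_bits : dict {chain index: required bit}; other chains are don't-cares
    Returns an input word (int), or None when no such word exists.
    """
    pivots = {}  # highest set bit of a reduced row -> (row mask, right-hand side)
    for chain, rhs in care_bits.items():
        mask = masks[chain]
        # Gaussian elimination over GF(2), highest pivot first.
        for bit in sorted(pivots, reverse=True):
            if (mask >> bit) & 1:
                pmask, prhs = pivots[bit]
                mask ^= pmask
                rhs ^= prhs
        if mask == 0:
            if rhs:
                return None  # contradictory care bits: the slice cannot be produced
            continue
        pivots[mask.bit_length() - 1] = (mask, rhs)
    # Back-substitution with all free inputs set to 0.
    word = 0
    for bit in sorted(pivots):
        mask, rhs = pivots[bit]
        if rhs ^ (bin(mask & word).count("1") & 1):
            word |= 1 << bit
    return word


# Hypothetical network: 3 inputs spread over 4 scan chains.
masks = [0b001, 0b010, 0b011, 0b100]
print(solve_slice(masks, {0: 1, 2: 1, 3: 0}))  # 1
print(solve_slice(masks, {0: 1, 1: 1, 2: 1}))  # None
```

The fewer care bits a slice has, the fewer equations must hold at once, so the chance of a contradiction drops.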
Extend single scan chain TDC to multiple scan chains
Extend single scan chain TDC …
Use one decoder and shift register [Chandra, DATE02]
[Figure: one decoder and a shift register in front of the scan chains of the core.]
Use one decoder and shift register
• Loosened the ATE timing constraint
– Exploitation of frequency ratio
• Reduced peak scan power
– Shift register buffering
• Synchronization overhead
• Decrease in compression ratio
– Unbalanced scan chains
– Test set rotation
Extend single scan chain TDC … (cont)
Use one decoder per scan chain [Chandra, TCAD01] [Gonciari, ETW02]
[Figure: distribution unit (distr) with decoders dec1 to dec3, each with its own ctrl, one per scan chain of the core.]
Use one decoder per scan chain
• Loosened the ATE timing constraint
– Exploitation of frequency ratio
• Reduced scan-power
– Scan chain partitioning
• Good compression ratio
– No test set rotation
• Reduced synchronization overhead
Increased area and control overhead
Large number of scan chains
Unbalanced scan chains
TAM add-on architecture
[Figure: SOC with two cores and the TAM add-on.]
Low-cost solution for core based SOC test
Core wrapper design
[Figure: core wrapper with wrapper scan chains WSC1 to WSC4 and tb1 to tb4.]
Why core wrapper design?
• WSC partitioning [Gonciari, VTS02]
– Useless memory reduction
– Easy control
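To make the "useless memory" point concrete, the sketch below assumes that useless memory means the padding bits stored on the ATE when unequal WSCs are all filled up to the longest one. That reading is an interpretation, not something the slides state, and the WSC lengths and function names are hypothetical.

```python
def useless_memory(wsc_lengths):
    """Padding bits per test pattern when every WSC is padded to the longest one."""
    longest = max(wsc_lengths)
    return sum(longest - length for length in wsc_lengths)


def useless_memory_partitioned(partitions):
    """Padding bits per pattern when each partition is padded to its own longest WSC."""
    return sum(useless_memory(part) for part in partitions)


# Hypothetical WSC lengths for one core.
lengths = [120, 120, 118, 60, 60]
print(useless_memory(lengths))                                    # 122
print(useless_memory_partitioned([[120, 120, 118], [60, 60]]))    # 2
```

Grouping WSCs of similar length into the same partition is what removes most of the padding in this toy case.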
Reducing control and area overhead
Instead of
[Figure: distribution unit (distr) with one control unit (ctrl) and one decoder (dec1 to dec4) per WSC of the core.]
Reducing control and area overhead …
[Figure: core with four WSCs.]
• WSC partitioning
– 2 partitions
– 1 control unit per partition
– 1 decoder per partition
Exploit WSC partitioning for area and control reduction
Reducing control and area overhead …
[Figure: partitioned WSCs. Labels: WSC, dec1, length, no WSCs, diff.]
• Control
– Length of max scan chain
– No of scan chains
– Diff of partitions length
Easy control per partition
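The three control values listed above (length, no WSCs, diff) can be derived from the WSC lengths once the partitions are known. The sketch below is only one plausible reading: it splits the WSCs into two partitions at the largest gap between sorted lengths, which is an assumed heuristic, not the method from the talk. All names and lengths are hypothetical.

```python
def split_at_largest_gap(wsc_lengths):
    """Split WSC lengths into a long and a short partition (assumed heuristic)."""
    ordered = sorted(wsc_lengths, reverse=True)
    if len(ordered) < 2:
        raise ValueError("need at least two WSCs to form two partitions")
    gaps = [ordered[i] - ordered[i + 1] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1  # first index of the short partition
    return ordered[:cut], ordered[cut:]


def control_parameters(long_part, short_part):
    """Derive the three control values named on the slide."""
    length = max(long_part)  # length of the longest scan chain
    return {
        "length": length,
        "no_wscs": (len(long_part), len(short_part)),  # WSCs per partition
        "diff": length - max(short_part),  # difference between partition lengths
    }


long_part, short_part = split_at_largest_gap([96, 40, 95, 41, 96])
print(long_part, short_part)  # [96, 96, 95] [41, 40]
print(control_parameters(long_part, short_part))
```

With only these three numbers per core, a decoder shared by a partition would not need per-chain control, which is consistent with the "easy control per partition" claim.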
Extended decoder (xDec) – input
[Figure: xDec input side. Labels: dec, scan clk, data, length, no WSCs, diff.]
Extended decoder (xDec) – output
[Figure: xDec output side. Labels: dec, no WSCs, mux, SISR, WSC.]
Extended distribution architecture
[Figure: xDistr. Labels: distr, xDec1, xDec2, mux, SISR, core with four WSCs.]
Extended distribution architecture …
[Figure: two cores, each with four WSCs.]
Unequal partition size for some cores!
Extended distribution architecture
[Figure: add-on-xDistr. Labels: xDec1, xDec2, mux, cores, WSCs.]
Multiple TAM SOC test
[Figure: SOC with four cores, two add-on units and two 2xSISR blocks.]
Design flow integration
[Figure: design flow chart. Labels: TAM design, TAM width, VTD constraints, 1:k add-on-xDistr, Simulate, decision "Constr?" (YES / NO), Extend CW, Change CW.]
Overview
• Low-cost system-on-a-chip test
– Test data reduction
– Synchronization overhead
• Single vs. multiple scan chains compression
• Proposed add-on architecture
– TAM add-on architecture
  • Core wrapper design
  • Reduce control and area overhead
– Design flow integration
• Experimental results
• Conclusion
Minimum VTD vs. equal partitions
Test bus = 16, frequency ratio 2
[Figure: chart of volume of test data (axis 0 to 120000) for s13207, s15850, s35932 and s38584. Series: Minimum VTD, Equal partitions.]
Minimum VTD vs. equal partitions
Test bus = 16, frequency ratio 4
[Figure: chart of volume of test data (axis 0 to 100000) for s13207, s35932, s38417 and s38584. Series: Minimum VTD, Equal partitions.]
add-on-xDistr vs. SSC
Core s35932, frequency ratio 2
[Figure: chart of volume of test data (axis 0 to 25000) at x-axis values 4, 8, 16, 24 and 32. Series: add-on-xDistr, Single scan chain.]
add-on-xDistr vs. SSC
Core s35932, frequency ratio 4
[Figure: chart of volume of test data (axis 13000 to 17000) at x-axis values 4, 8, 16, 24 and 32. Series: add-on-xDistr, Single scan chain.]
add-on-xDistr vs. SSC
System 1, frequency ratio 2, test bus 24, reduction 19.29%
[Figure: chart of volume of test data (axis 0 to 160000). Series: Single scan chain, add-on-xDistr.]
add-on-xDistr vs. SSC
System 2, frequency ratio 2, test bus 24, reduction 26.88%
[Figure: chart of volume of test data (axis 0 to 160000). Series: Single scan chain, add-on-xDistr.]
Conclusion
• Low-cost solution for core based SOC test
• TAM add-on architecture
• Design flow integration
• Exploited core wrapper design features
– Reduced control overhead
– Reduced area overhead
• Reduced scan power through partitioning
• Small area overhead (3-4%) for Systems 1 and 2
Test data reduction
[Figure: test setup. Labels: ATE, Head, DIB, SOC, dec, CUT, SO.]
• Aims
– Volume of test data
– Area overhead
– Test application time
Generic on-chip decoder
[Figure: generic on-chip decoder. Labels: ATE, CI, PG, sync, ate clk, scan clk, data in, data out.]
• Serial decoder
– PG and CI cannot work independently
– Implicit communication between PG and CI
• Parallel decoder
– PG and CI can work independently
– Explicit communication between PG and CI
Synchronization overhead
• Extensions to the DIB
– Multiple ATE channels
– Deserialization units
– Latency FIFOs
– Clock synchronization
Synchronization overhead (cont)
[Figure: test setup. Labels: ATE, DIB, SOC, dec, CUT, SO.]
• New ATEs
– Source synchronous buses
– Require programming
Synchronization overhead (cont)
[Figure: test setup. Labels: ATE, DIB, SOC, dec, CUT, SO.]
Synchronization overhead (cont)
• Low-cost test through ATE reuse
– Small area overhead increase
– Solution for entire chip test
– Test application time reduction
[Figure: test setup. Labels: ATE, DIB, SOC, dec, CUT, SO.]
Synchronization overhead
• Old ATEs
– Latency FIFO
– Clock synchronization
[Figure: timing diagram over cycles 0 to 7. Labels: ATE clk, Chip clk, PG, CI, STOP.]
On-chip SO solution
[Figure: timing diagram over cycles 0 to 7. Labels: ATE clk, Chip clk, PG, CI, STOP.]
On-chip SO solution (cont)
Increased VTD and TAT
Exploit DUMMY bits and reduce VTD and TAT
[Figure: timing diagram over cycles 0 to 7. Labels: ATE clk, Chip clk, PG, CI, DUMMY.]
On-chip SO solution (cont)
• Distribution unit
– Any number of cores
– Self synchronous architecture
[Figure: timing diagram over cycles 0 to 7 with labels ATE clk, Chip clk, PG1, PG2, CI1, CI2; block diagram with distr, dec1 and dec2.]
XOR-network %tpl
[Figure: chart of % temporal pattern lockouts (axis 0 to 90) against care bit density (axis 0 to 25). Series: W=8, W=16, W=24, W=32.]
S38417: VTD / TAT for w = 32
[Figure: chart of VTD/TAT (axis 0 to 1000000) against care bit density (axis 0 to 25). Series: add-on-xDistr(32,6), XOR-network(32) VTD, XOR-network(32) TAT.]
S35932: VTD / TAT for w = 32
[Figure: chart of VTD/TAT (axis 0 to 120000) against care bit density (axis 0 to 80). Series: add-on-xDistr(32,6), XOR-network(32) VTD, XOR-network(32) TAT.]