[ieee 2010 ieee international test conference (itc) - austin, tx, usa (2010.11.2-2010.11.4)] 2010...




Towards Effective and Compression-friendly Test of Memory Interface Logic

V.R. Devanathan¹, Alan Hales², Sumant Kale², Dharmesh Sonkar¹

¹ Texas Instruments (India) Pvt. Ltd., Bangalore 560093, India
² Texas Instruments Inc., Dallas, TX 75266, USA

{vrd, alanh, sumant kale, dharmesh}@ti.com

Abstract: Cost and time-to-market considerations are strongly driving the need to improve the effectiveness of structural patterns for speed/voltage binning. In this paper, we focus on improving the quality of testing memory interface paths for speed/voltage binning. We propose DFT schemes that propagate faults through the memory and that are effective with test compression. We also propose memory architectural enhancements to improve the effectiveness of ATPG patterns for Fmax identification. Both synchronous and asynchronous memories are targeted. Experimental results on an industrial ASIC core show the effectiveness of the proposed schemes with test compression. Initial silicon results from a 40-nm testchip are also presented; they show that Fmax using the proposed scheme is very close to that of functional patterns, while Fmax using conventional schemes is more than 2X higher than that of functional patterns.

I. INTRODUCTION

Increasing process variation and design marginality have resulted in power-performance spread of manufactured dies in deep sub-micron technologies. It was seen in [1] that even for a 180-nm technology node, process variation can result in about 30% variation in performance and as much as 20X variation in leakage. Such variation in performance has resulted in wide use of speed-binning for microprocessors to increase the yield. On the other hand, meeting the functional performance specification is the only criterion for ASICs to be classified as “good” dies. Static Adaptive Voltage Scaling (AVS) or similar “voltage binning” schemes [2] may be used in ASICs to shrink the power-performance spread by controlling the supply voltage to increase the yield and/or reduce the power of “good” dies. For ICs using speed/voltage binning, it is essential that delay tests are effective in detecting gross and subtle delay defects, and are accurate in determining the speed of the die to identify the correct bin.

Traditionally, functional tests have been used for speed binning. Figure 1 shows an Fmax (maximum passing test frequency) plot for various kinds of test patterns for a 90-nm, 15 million-gate TI System-on-Chip (SoC) with 6 DSP cores. This data was collected across a large sample of dies. It may be noted from the figure that structural (path-delay and transition-fault ATPG) patterns have significantly higher Fmax compared to functional patterns. It may also be noted that Fmax of path-delay patterns is higher than that of transition fault patterns.

Fmax ascertained from path-delay patterns is a direct representation of the choice of paths provided to the ATPG tool (i.e. closeness of the selected paths to the actual timing-critical/speed-limiting paths found in silicon across dies), and the effectiveness of the ATPG tool in testing the provided paths. Fmax ascertained from transition fault patterns is representative of the path exercised by the ATPG tool compared to the worst-case functional critical/speed-limiting path in the design. Second-order effects such as switching activity/IR-drop induced delay may also impact Fmax. Lack of correlation between the paths detected by path-delay patterns and the actual timing-critical paths, along with second-order effects, seems to be the reason for higher Fmax with path-delay patterns.

Fig. 1. Fmax plot of various patterns for a 90-nm SoC. [Plot not reproduced: normalized Fmax (axis ticks 0.5, 0.75, 1.0) for Path Delay Patterns, Transition Fault Patterns, Functional Patterns 1 and Functional Patterns 2.]

For this SoC, most of the timing-critical paths reported by STA (Static Timing Analysis) happened to be along the logic interfacing with memories. Similar observations on the timing-criticality of memory interface paths have also been noted in the literature [3], [4]. Due to differences in the paths and their timings exercised by functional and structural patterns for the memory-interface logic (explained later in Section II), Fmax of structural patterns was found to be significantly higher than that of functional patterns. At the same time, it was also noted in [5] that for designs with timing-critical paths along digital logic (and not on the memory interface), structural patterns were indeed found effective for speed binning.

Unfortunately, generating robust functional patterns with good coverage is prohibitively effort-intensive. It is not uncommon for a large SoC to take many person-years of effort to generate and validate functional patterns, and also to ensure that they work on tester/silicon. There is hence a strong need to improve structural patterns for speed binning.

In this paper, we focus on DFT and memory design techniques that improve the effectiveness of structural patterns for delay-defect testing of memory interface logic and that work with test compression. The paper is organized as follows. Section II provides the background of the problem and discusses the prior work and issues posed by test compression. Section III proposes compression-friendly test schemes to improve the memory-interface test quality. Section IV discusses challenges with asynchronous memories and proposes novel memory architecture and test schemes for the same. The effectiveness of the proposed techniques is shown with experimental results on industrial SoCs with commercial ATPG tools in Section V. Section VI concludes the paper.

Paper 4.2, 978-1-4244-7207-9/10/$26.00 ©2010 IEEE, INTERNATIONAL TEST CONFERENCE

II. LIMITATIONS OF STRUCTURAL PATTERNS

A. Background

Figure 2 shows a block diagram of a typical synchronous memory architecture that supports memory BIST and scan test. The memory array (shown in the middle) consists of bit-cells and the related controls to read from and write into them. ADR denotes the address; D denotes the data; ME, WE and CLK denote memory-enable, write-enable, and the clock to the memory array, respectively. Q_mem denotes the output from the memory array. Memory BIST and scan test are supported using a collar/wrapper over the memory array. In this paper, we use the term “memory array” to refer to the array of bit-cells along with its relevant read/write controls, while we use the term “memory” to refer to the memory array along with the BIST and scan collar. Dedicated functional and test input ports are available at the memory for functional operation and memory BIST, respectively. These are multiplexed (muxed) based on a test mode signal (called BIST_MODE in Figure 2). Detailed controls and other ports are not shown in the figure for ease of illustration. To enable full controllability and observability of the memory interface logic during ATPG, the memory inputs have dedicated scan collar flip-flops to observe the inputs, and the memory output is controlled by a scan flip-flop. Typically, a dedicated port (called ATPG_MODE in Figure 2) is used to control the output of the memory during ATPG, functional and BIST modes. Memory output is driven by the memory array for both functional and memory BIST. In both functional and memory BIST modes, ATPG_MODE port is de-asserted. During ATPG mode (both scan shift and capture cycles), ATPG_MODE is asserted to enable scan flip-flop to control the memory output. When ATPG_MODE is asserted, the output from the memory array is bypassed. The memory array may also be disabled for power reduction during ATPG, as seen by the gating on memory enable (ME) with ATPG_MODE in Figure 2.
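The port selection described above can be sketched behaviorally. The following is a minimal Python model, not the authors' implementation: the function names, the dictionary representation of the ports and the active-high polarity of the mode signals are assumptions made only for illustration.

```python
def memory_output(q_mem, scan_ff, atpg_mode):
    """Value at the memory output Q: the scan flip-flop drives Q when
    ATPG_MODE is asserted, otherwise the memory array output Q_mem does."""
    return scan_ff if atpg_mode else q_mem


def array_inputs(func, test, bist_mode, atpg_mode):
    """Select functional or BIST test ports (muxed on BIST_MODE) and gate
    ME with ATPG_MODE so the array can be disabled during ATPG."""
    selected = dict(test if bist_mode else func)
    selected["ME"] = int(bool(selected["ME"]) and not atpg_mode)
    return selected


# Hypothetical port values, chosen only to exercise the muxes.
func_ports = {"ADR": 5, "D": 0xA, "WE": 1, "ME": 1}
bist_ports = {"ADR": 0, "D": 0xF, "WE": 0, "ME": 1}

print(array_inputs(func_ports, bist_ports, bist_mode=0, atpg_mode=0))
print(array_inputs(func_ports, bist_ports, bist_mode=0, atpg_mode=1))
print(memory_output(q_mem=0xA, scan_ff=0x3, atpg_mode=1))
```

With ATPG_MODE asserted, the array is disabled and Q follows the scan flip-flop, which is why conventional ATPG never exercises the path through the memory array.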

Fig. 2. Block diagram of a memory with scan and BIST collar. [Block diagram not reproduced: memory array with inputs D, WE, ME, ADR and CLK and output Q_mem; functional ports D, WE, ME, ADR and test ports TD, TWE, TME, TADR muxed by BIST_MODE; scan collar flip-flops (FF) on the inputs and at the output; output Q selected by ATPG_MODE.]

Figure 3 shows a typical memory use scenario with various paths exercised during functional and test modes. The functional mode input (output) path is from (to) functional flip-flops, marked ‘F’, through the functional interface port to (from) the memory array. This is illustrated with an arrow marked as FuncPath. Similarly, the memory BIST input (output) path is from (to) memory BIST controller flip-flops, marked ‘B’, through the test interface port to (from) the memory array. This is indicated with an arrow marked as BISTPath in Figure 3. On the other hand, the ATPG input path starts from either functional or BIST controller flip-flop and terminates at memory scan collar flip-flop. Similarly, the ATPG output path starts from the memory scan collar flip-flop and terminates at either functional or BIST controller flip-flop. ATPG path is indicated with a dashed arrow in Figure 3.

Fig. 3. Paths exercised during memory use/test modes. [Diagram not reproduced: functional flip-flops (F) and memory BIST flip-flops (B) around the memory array. Legend: Functional Path (FuncPath), Memory BIST Path (BistPath), ATPG Path (AtpgPath).]

It may be seen from the figure that neither the path exercised in memory BIST nor ATPG matches the functional path. As memory BIST is primarily targeted to identify defects within the memory, ATPG patterns must be additionally used to target the memory interface logic. To ensure that ATPG has similar Fmax as that of functional mode, it is necessary to ensure that ATPG exercises the functional path or an identically timed path. Further, the read access time also varies for different words within the memory array. Hence, it is also essential that ATPG targets the path with worst-case functional timing. Prior work and challenges are discussed further in this section.

B. Related Work

Various challenges in testing memories were discussed in [6], [7]. In [6], clocking and other controllability issues for logic within memories using scan patterns were discussed.


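TABLE I
TEST COVERAGE ON MEMORY OUTPUTS FOR AN INDUSTRIAL DESIGN

| ATPG Tool | Pattern Type | Test Mode | Test Coverage (%) | # Test Patterns | ATPG run-time (s) |
|---|---|---|---|---|---|
| Tool A | Transition Fault | Compression bypass | 100 | 103 | 39 |
| Tool A | Transition Fault | With compression | 100 | 256 | 67 |
| Tool A | RAM-sequential | Compression bypass | 99.7 | 199 | 4095 |
| Tool A | RAM-sequential | With compression | 70.13 | 489 | 8257 |
| Tool B | Transition Fault | Compression bypass | 100 | 50 | 51 |
| Tool B | Transition Fault | With compression | 100 | 51 | 60 |
| Tool B | RAM-sequential | Compression bypass | 99.8 | 87 | 2274 |
| Tool B | RAM-sequential | With compression | 37.7 | 200 | > 22 Hrs |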

Rules were proposed in [6] for memory design and scan test. It was observed that even with numerous such rules, generating scan patterns through memory remained a challenging task. In [7], it was noted that even with structural patterns and custom BIST patterns for memories, functional patterns were still needed for speed binning. Detailed correlation between functional and structural patterns was analyzed for speed binning a processor in [4]. Based on a comparison of Fmax data between functional patterns, flip-flop-to-flip-flop transition fault patterns, transition fault patterns through memories, path-delay patterns and other structural and memory BIST patterns, it was concluded that transition fault patterns through the memory array had the best correlation with functional patterns. Sequential ATPG through memories was discussed in [8]. It considers memories without a scan/BIST collar and discusses various design care-abouts needed for enabling ATPG through memories.

Commercial tools, such as FastScan™ [9] and TetraMaX™ [10], support RAM-sequential ATPG wherein ATPG propagates faults through the memories to detect faults in the memory-interface logic. These tools recommend design guidelines for controllability of memory clocks, enables and other control signals for pattern generation. A RAM-sequential pattern works as follows. During the scan shift operation, the scan flip-flops that control the memory are initialized for subsequent memory operations. During scan capture, write and read operations are performed at-speed. Faults are detected only by observing the result of the final read operation from memory at the functional scan flip-flops. RAM-sequential ATPG has the following characteristics: (a) inherent unknowns (Xs) at the memory output during scan capture, (b) increased sequential depth and its associated constraints for the ATPG tool, and (c) multiple scan loads (scan shifts and captures) within a pattern to control the memory. Use of test compression has a direct impact on (a) and (b). Decompressor effectiveness is limited by the satisfiability of sequential constraints from (b), while the compactor effectiveness is limited by the amount of Xs from (a).
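To illustrate why Xs limit compactor effectiveness, consider the following toy sketch. It assumes a plain XOR space compactor with three-valued logic; this is an illustrative assumption only, since the text does not specify the compactor architecture, and the X-tolerance and X-masking features of real compression logic are not modeled.

```python
X = "X"  # unknown logic value


def xor3(a, b):
    """XOR over {0, 1, X}: any unknown input makes the result unknown."""
    if a == X or b == X:
        return X
    return a ^ b


def compact(chain_bits):
    """Fold one scan-out slice (one bit per scan chain) into one output bit."""
    out = 0
    for bit in chain_bits:
        out = xor3(out, bit)
    return out


# Without Xs, a fault effect on the third chain changes the compacted output.
good, faulty = [1, 0, 1, 1], [1, 0, 0, 1]
print(compact(good), compact(faulty))

# With an X on the second chain, both responses compact to X and the
# fault effect on the third chain can no longer be observed.
good_x, faulty_x = [1, X, 1, 1], [1, X, 0, 1]
print(compact(good_x), compact(faulty_x))
```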

Table I provides test coverage, pattern count and ATPG run-times for a 40-nm, 2.5 million-gate industrial ASIC core using two different commercial ATPG tools with X-tolerant test compression. Details of this testcase are available in Section V. To understand the impact of test compression, results with test compression and with bypassing the compression are compared. Rows with “Transition Fault” pattern type denote the result from conventional two-cycle launch-off-capture [11] transition fault ATPG using memory collar scan flip-flops to control and observe, bypassing the memory array (ATPG_MODE=1). It may be noted that both compression and bypass modes ensure complete transition fault coverage of the memory output faults, with both the ATPG tools. Rows with “RAM-sequential” pattern type denote the result from a 5-cycle sequential ATPG through the memory array with ATPG_MODE=0. An example of such a pattern is illustrated in Figure 4 (Section III-A). Compression data is with the X-masking logic enabled. To simplify this experiment, only faults at the memory output interface were targeted. The following points are worth noting: (a) RAM-sequential ATPG results in increased pattern count and run-time compared to conventional transition fault ATPG. This is due to increased sequential depth and fault propagation complexities through the memory array. (b) Commercial ATPG tools are effective with uncompressed/bypass patterns for RAM-sequential ATPG. (c) Commercial ATPG tools have poor coverage and increased pattern count with test compression. While RAM-sequential ATPG bypassing test compression provides good output fault coverage, it is ineffective for testing input faults, for reasons described later in Section III-B.
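The impact of test compression on RAM-sequential ATPG can be quantified directly from the RAM-sequential rows of Table I. The short sketch below only restates the table values; the dictionary layout and function name are illustrative.

```python
# (tool, mode) -> (test coverage %, pattern count), RAM-sequential rows of Table I
ram_seq = {
    ("A", "bypass"): (99.7, 199),
    ("A", "compression"): (70.13, 489),
    ("B", "bypass"): (99.8, 87),
    ("B", "compression"): (37.7, 200),
}


def compression_impact(tool):
    """Return (coverage loss in percentage points, pattern count growth factor)."""
    cov_bypass, pat_bypass = ram_seq[(tool, "bypass")]
    cov_comp, pat_comp = ram_seq[(tool, "compression")]
    return round(cov_bypass - cov_comp, 2), round(pat_comp / pat_bypass, 2)


for tool in ("A", "B"):
    loss, growth = compression_impact(tool)
    print(f"Tool {tool}: coverage loss {loss} points, pattern count x{growth}")
```

For both tools, enabling compression loses tens of coverage points while more than doubling the pattern count, which is the behavior summarized in point (c).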

Further, for RAM-sequential ATPG to be effective for speed binning, it is essential that the slowest word is also accessed. Words within an array do not have the same access time. The access time of a memory cell depends upon the clock-to-wordline access time and the wordline-to-output access time [12]. Both of these parameters depend upon the physical location of the word with respect to the clock and the output. It is not uncommon to see differences on the order of hundreds of picoseconds between the slowest and fastest words for large instances. To summarize, the main care-abouts for effective RAM-sequential ATPG on synchronous memories are: (a) eliminating Xs at the memory output during scan capture, and (b) accessing the slowest word.
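The dependence of access time on word location can be sketched with a toy model. Everything numeric here is a hypothetical assumption: the linear delay model, the per-row coefficients, and the placement of both the clock driver and the output circuitry at row 0 are chosen only to illustrate how a spread of hundreds of picoseconds can arise in a large instance.

```python
def access_time_ps(row, base_ps=400.0, clk_ps_per_row=0.3, out_ps_per_row=0.5):
    """Hypothetical read access time of the word in `row`, in picoseconds."""
    clk_to_wordline = clk_ps_per_row * row   # grows with distance from the clock
    wordline_to_out = out_ps_per_row * row   # grows with distance from the output
    return base_ps + clk_to_wordline + wordline_to_out


def slowest_word(rows):
    """Row index with the largest access time."""
    return max(range(rows), key=access_time_ps)


def spread_ps(rows):
    """Difference between the slowest and fastest word."""
    times = [access_time_ps(r) for r in range(rows)]
    return max(times) - min(times)


rows = 512
print(slowest_word(rows), round(spread_ps(rows), 1))
```

A pattern that happens to read a word near row 0 in this model would report a noticeably higher Fmax than one that reads the farthest row, which is why the slowest word must be targeted explicitly.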

On the other hand, asynchronous memories/register files allow the data addressed by the read address to be asynchronously available at the output. We have also observed on many design instances that such asynchronous read path through memory happens to be timing-critical. For such memories, the main issues are: (a) eliminating Xs at the memory output during scan capture, (b) accessing the slowest word, and (c) testing the true asynchronous read path.

III. IMPROVING INTERFACE TEST FOR SYNCHRONOUS MEMORIES

In this section, we propose schemes to improve various aspects of RAM-sequential test for synchronous memories. Firstly, we propose schemes to eliminate Xs at the memory output during scan capture. Next, we propose schemes that are effective in detecting memory input faults and also target the slowest word for test. Lastly, we propose scan architecture improvements to reduce test time.

Fig. 4. Clocking sequence for conventional RAM-sequential pattern. [Timing diagram not reproduced: waveforms of CLK, ATPG_MODE, SE, D, ME, WE, ADR and Q across scan shift, scan capture and scan shift. Legend: V1: Write (D1) @ A1; V2: Write (D2) @ A2; V3: Read @ A1; V4: Read @ A2; V5: Capture D2. Q is shown as unknown (x) until D1 and D2 are read.]

Fig. 5. Modified block diagram of a synchronous memory. [Block diagram not reproduced: the memory of Figure 2 with a latch (LA) added at the output Q, clocked by f, where f = 1 if ((SE = 1) or memory-read-operation), and f = 0 otherwise.]

A. Memory output initialization

Figure 4 shows a sample RAM-sequential ATPG clocking sequence with 5 capture cycles: Write (D1) @ A1, Write (D2) @ A2, Read (D1) @ A1, Read (D2) @ A2, Capture. The first and second write cycles (V1 and V2) initialize the memory words A1 and A2. The third read cycle (V3) initializes the memory output with D1. The fourth read cycle (V4) reads the inverse and induces a (D1 → D2) transition at the memory output. The last cycle (V5) captures the transition at a functional scan flip-flop. It may be noted that ATPG_MODE signal is de-asserted throughout the pattern generation for the memory array to drive the memory output port. It may also be noted that the output of memory is unknown (X) during V1 and V2 as the memory output is initialized only at V3 upon a valid read operation. The Xs generated during V1 and V2 propagate to the subsequent functional logic during V3, V4 and V5. At the end of V5, Xs continue to remain in the scan flip-flops.
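The origin of the Xs in this sequence can be sketched with a small behavioral replay of cycles V1 to V4. This is a simplified model written for illustration: the function name and the dictionary used as the memory array are assumptions, and the propagation of Xs into downstream logic is not modeled.

```python
X = "X"  # unknown logic value


def run_conventional(d1, d2, a1=1, a2=2):
    """Replay V1..V4 of the conventional sequence and return the value
    at the memory output Q after each cycle."""
    mem = {}
    q = X  # Q is unknown until the first valid read
    trace = []
    ops = [
        ("write", a1, d1),   # V1: Write (D1) @ A1
        ("write", a2, d2),   # V2: Write (D2) @ A2
        ("read", a1, None),  # V3: Read @ A1, Q becomes D1
        ("read", a2, None),  # V4: Read @ A2, Q makes the D1 -> D2 transition
    ]
    for op, addr, data in ops:
        if op == "write":
            mem[addr] = data
        else:
            q = mem.get(addr, X)
        trace.append(q)
    return trace


print(run_conventional(0, 1))
```

Q stays unknown for the two write cycles, and those unknown values are what the downstream flip-flops capture before the transition is finally observed at V5.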

Fig. 6. Modified clocking sequence for RAM-sequential pattern. [Timing diagram not reproduced: waveforms of CLK, ATPG_MODE, SE, D, ME, WE, ADR and Q across scan shift, scan capture and scan shift. Legend: V0: Initialize D1 at Q (via last shift); V1: Write (D2) @ A1; V2: Read @ A1; V3: Capture D2.]

To prevent Xs during scan capture, it is essential to initialize the memory output at the end of scan shift operation. Figure 5 shows a modified synchronous memory architecture with a latch driving the output (compared to Figure 2). It may be noted that the latch is after the ATPG mux and is designed to be transparent during read and scan shift operations (i.e. clock f to the Q latch is 1 during scan shift and memory read). At the end of scan shift, the latch is initialized with the last value shifted into the scan flip-flops (that observe data - D/TD) and holds it until the next memory read operation or scan shift. Apart from preventing Xs, initializing memory output at the end of scan shift has another benefit in reducing the sequential depth. Figure 6 shows the modified ATPG clocking sequence with this memory architecture, when ATPG_MODE is 1 during scan shift and 0 during scan capture. For detecting faults at memory output (Q), it is enough to have patterns with 3 capture cycles (or sequential depth 3): {Write D2 @ A1 (Q is initialized to D1 at end of scan shift); Read D2 @ A1; Capture D1→D2}. Similarly, a sequential depth of 4 is sufficient for detecting faults on memory input ports.
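The effect of the output latch can be sketched in the same behavioral style. The model below is an illustration under stated assumptions: a memory read is taken to be ME = 1 with WE = 0 (the text only says "memory read operation"), and the function names are hypothetical.

```python
def latch_clock(se, me, we):
    """Clock f of the Q latch: 1 during scan shift (SE = 1) or a memory read.
    Assumption: a read is encoded as ME = 1 and WE = 0."""
    return int(se == 1 or (me == 1 and we == 0))


def run_with_latch(d1, d2, a1=1):
    """Replay V0..V2 of the modified sequence; V3 then captures D1 -> D2."""
    mem = {}
    q = "X"
    trace = []
    steps = [
        ("V0", 1, 0, 0),  # last scan shift: latch transparent, loads D1
        ("V1", 0, 1, 1),  # Write (D2) @ A1: latch opaque, holds D1
        ("V2", 0, 1, 0),  # Read @ A1: latch transparent, Q goes D1 -> D2
    ]
    for name, se, me, we in steps:
        if me and we and not se:
            mem[a1] = d2
        if latch_clock(se, me, we):
            q = d1 if se else mem[a1]
        trace.append((name, q))
    return trace


print(run_with_latch(0, 1))
```

Q is known in every cycle of this replay, and only two memory operations precede the capture, consistent with the sequential depth of 3 stated above.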

Though ATPG_MODE switches at the end of shift, a transition is induced at the Q latch only upon a valid read operation. It may be noted that no back-to-back read operation is performed at the memory array for output fault detection. The intent of this RAM-sequential pattern is only to test the interface logic for delay faults and not the memory itself. Hence, it is sufficient to initialize the Q latch from scan shift and use the memory read to induce a transition at the output. However, if the memory has separate write and read addresses, back-to-back reads at different addresses may be required. Even in such cases, initializing the Q latch from scan shift is required to prevent Xs.

This is a minor modification to the memory design, and the addition of the output latch and its associated clock control results in a small area overhead in the memory IP. As the ATPG_MODE port now switches during ATPG, care should be taken during memory design to ensure that ATPG_MODE has timing constraints similar to those of scan enable (SE). The SoC designers should ensure that ATPG_MODE is timing closed for switching (at slow speed) during ATPG. The proposed latch at the output is transparent during shift, but not during capture. A limitation with such a latch is that not all commercial ATPG tools are effective in understanding non-transparent latches with test compression when testing through memories.

B. Two-pass ATPG approach for effective input fault coverage

Conventionally, faults at memory input paths are observed at the memory collar scan flip-flop. To identify a limitation with this approach, let us consider the address input of a synchronous memory that operates at the rising edge of the clock. Address pre-decoding is widely used to enhance performance in the functional mode of operation [12], [13]. Pre-decoding of the address happens when the clock is low, so that the rising edge of the clock latches the pre-decoded addresses to generate the wordline as early as possible. So, it is essential for the address to stabilize well before the pre-decode operation starts. This is reflected in the setup timing constraints for the memory address port. Unfortunately, ATPG captures the address only at the rising edge of the clock. This allows addresses to be delayed for a much longer duration during ATPG even with successful capture at the collar scan flip-flop. This implies that any delay defect in the functional logic for address generation, within the size of the pre-decode logic delay, cannot be detected by capturing at the observe flip-flop. So, such pseudo coverage of the memory input paths obtained by capturing at the observe scan flip-flops should be prevented during RAM-sequential ATPG. Ideally, the memory collar scan flip-flops should be designed to prevent scan capture when ATPG_MODE = 0 and SE = 0. Unfortunately, a few vendor memories that we used did not have this feature. So, we explored various ATPG workarounds to address this limitation.
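The pseudo coverage argument can be made concrete with a small timing sketch. All numbers below are hypothetical and chosen only for illustration: a 1000 ps clock period, a 50 ps setup time at the collar scan flip-flop, and a 300 ps setup requirement at the memory address port to account for pre-decoding.

```python
def detected_at_collar(arrival_ps, period_ps, ff_setup_ps):
    """The collar scan flip-flop captures a wrong value only if the address
    arrives after its own (small) setup window before the rising edge."""
    return arrival_ps > period_ps - ff_setup_ps


def fails_functionally(arrival_ps, period_ps, mem_setup_ps):
    """The memory needs the address earlier, before pre-decoding starts."""
    return arrival_ps > period_ps - mem_setup_ps


def escapes(nominal_ps, defect_ps, period_ps=1000, ff_setup_ps=50, mem_setup_ps=300):
    """True if a delay defect breaks functional operation but is still
    captured correctly (and so reported as covered) at the collar flip-flop."""
    arrival = nominal_ps + defect_ps
    return (fails_functionally(arrival, period_ps, mem_setup_ps)
            and not detected_at_collar(arrival, period_ps, ff_setup_ps))


for defect in (20, 100, 400):
    print(defect, escapes(nominal_ps=650, defect_ps=defect))
```

In this sketch, the 100 ps defect violates the memory address setup requirement yet passes at the collar flip-flop, which is the kind of escape that observing through the memory array is meant to close.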

The simplest mechanism to prevent observation at the memory collar scan flip-flops is to exclude them from the scan chains during RAM-sequential ATPG. However, the scheme proposed in Section III-A requires the memory collar scan flip-flop to initialize the memory output latch at the end of scan shift, and hence it cannot be excluded from the scan chain for RAM-sequential ATPG. Another alternative is to constrain the ATPG tool to mask (ignore the captured value at) these memory collar scan flip-flops to prevent pseudo coverage. The drawback of this scheme is that masking flip-flops with test compression would result in Xs for the compactor, impacting test coverage and/or pattern count.

We must ensure that the ATPG tool does not detect faultsby observing at scan flip-flops, while still including them inscan chain and not masking them in ATPG. To achieve this,we propose a two-phase ATPG approach illustrated in Figure7. In the first phase, we generate RAM-sequential patterns(using the clocking sequence of Figure 6) with a modifiedATPG model. The ATPG model is modified (by Script #1 ofthe Figure 7) in such a way that the clock to the memorycollar scan flip-flops is (incorrectly) turned off during capturecycle to prevent observability. As a result, ATPG cannotobserve at the scan flip-flop and instead, uses paths throughthe memory array for detecting faults at the memory inputinterface. Further, this ATPG run does not generate Xs asthe output is initialized by scan shift and the clock is turnedoff during capture forcing the scan flip-flops to retain the

Fig. 7. Two-phase ATPG for input fault detection. [Flow diagram. Blocks: Design Netlist(s); ATPG Models; Script #1 (turn off memory scan flop clocks during capture); Capture-gated ATPG Models; ATPG 1st pass (for true coverage through memories); Intermediate Test Patterns; ATPG 2nd pass (good simulation of input patterns); Final Test Patterns (for tester).]

shifted-in value. However, these patterns fail simulation because the scan flip-flops, in reality, are updated with new data during the capture cycle. The second phase of pattern generation is used to obtain the correct final test patterns. In the second phase, these incorrect patterns are read back into the ATPG tool with the original (correct) ATPG model. The patterns are then subjected to good simulation within the ATPG tool, where the correct output response is recalculated and the corrected patterns are generated. It may be noted that the coverage from the first phase, obtained with the incorrect model, remains valid for the patterns from the second phase. This is because the coverage was achieved by detecting the input faults through the memory array and observing the memory outputs at functional scan flip-flops. The second phase is used only to calculate the correct output response to strobe for. A point worth noting is that the ATPG model we used in the first phase turned off the scan clocks to prevent capture. Constraining these scan flip-flops to be shift-only (by tying their scan enable to 1) may also be used to prevent capture.
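The two phases can be pictured with a deliberately small Python model. This is only a sketch under stated assumptions: the dictionary keys, function names and single-bit values are hypothetical and do not correspond to any ATPG tool's interface. It shows only that the stimulus is unchanged between the passes while the expected response at the collar flip-flop is recomputed.

```python
# Toy illustration of the two-phase flow; all names are hypothetical and
# do not correspond to any ATPG tool's API.

def expected_response(stimulus, capture_gated):
    """Expected flip-flop values after capture for one pattern.

    stimulus keys (single-bit ints):
      'collar_shift' - value shifted into the memory collar scan flip-flop
      'collar_d'     - value at the D input of that flip-flop during capture
      'mem_q'        - memory output captured by a functional scan flip-flop
    """
    if capture_gated:
        collar = stimulus["collar_shift"]   # modified model: clock off, value retained
    else:
        collar = stimulus["collar_d"]       # correct model: flip-flop captures new data
    return {"collar_ff": collar, "functional_ff": stimulus["mem_q"]}


def first_pass(stimuli):
    """Phase 1: intermediate patterns computed with the capture-gated model."""
    return [(s, expected_response(s, capture_gated=True)) for s in stimuli]


def second_pass(intermediate_patterns):
    """Phase 2: keep each stimulus, recompute the response with the correct model."""
    return [(s, expected_response(s, capture_gated=False))
            for s, _ in intermediate_patterns]


stimuli = [{"collar_shift": 0, "collar_d": 1, "mem_q": 1}]
intermediate = first_pass(stimuli)
final = second_pass(intermediate)
```

In this toy case the response at the functional flip-flop, where the fault effect is observed, is the same in both passes; only the collar flip-flop value differs.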

The limitations of this scheme are the increased ATPG run-time due to two ATPG runs and the additional ATPG models required for each memory configuration.

C. Enhanced two-pass ATPG for worst-case read access time

ATPG tools detect faults at the memory interface by sensitizing and propagating transitions through each of the data, address, control and output ports. However, the ATPG tool does not consider the variation in memory access time when detecting faults on the output. As the read access time of a word varies significantly with its location, we constrain the ATPG tool to generate additional patterns that exercise the slowest word for worst-case delay.

Figure 8 illustrates the modifications to the memory model that force the ATPG tool to use the slowest word. We control the memory enable (ME) so that the memory is enabled only if the most significant bits of the address match a pattern (say 𝑃, for the slowest word). We rely on the following observation to arrive


Fig. 8. Memory model updates for fault propagation along slowest word. [Block diagram. Labels: Memory Array; ME; TME; ADR; TADR; ATPG_MODE; (ADR == P); CLK; SE; CG; FF; clock to memory collar scan flops.]

at 𝑃. It is common for memory words to be laid out in rows of C words each, where C denotes the column-mux factor. It is also common for the bits within a word to be interleaved with the corresponding bits of other words in the same row. In such cases, it is sufficient to ensure that any word in the slowest row is chosen for the worst-case read.

The following example illustrates the proposed scheme. Let us assume that the slowest word corresponds to the row with the maximum address. Let us also assume that the column-mux decode bits (for words within a row) are the least significant address bits. Now, for a compiler with 640 words (the maximum address is 1001111111, in binary) and a column-mux factor of 4, the slowest row corresponds to the row address 10011111, in binary. For the proposed scheme, the memory enable is gated with the 8 most significant bits (MSBs) of the address in such a way that the memory is disabled if the address MSBs are not 10011111. With such a constraint, any access to the memory by the ATPG tool always corresponds to the slowest row.
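The address arithmetic of this example can be checked with a short Python sketch. The function names are hypothetical, and the sketch makes the same two assumptions as the example (slowest row is the highest-addressed row, column-mux select bits are the address LSBs). The exact form of the enable gating is likewise an assumption made for illustration, not the memory model itself.

```python
def slowest_row_pattern(num_words, col_mux):
    """Return (P, width): the row-address pattern of the highest-addressed row.

    Assumes the slowest word lies in the highest-addressed row and that the
    column-mux select bits are the least significant address bits.
    col_mux must be a power of two.
    """
    max_address = num_words - 1
    addr_bits = max_address.bit_length()
    col_bits = col_mux.bit_length() - 1      # log2(col_mux) for a power of two
    return max_address >> col_bits, addr_bits - col_bits


def gated_memory_enable(me, address, pattern, col_bits, atpg_mode):
    """Memory enable seen by the array (hypothetical gating).

    In ATPG mode the memory is enabled only when the address MSBs (the row
    address) equal the pattern; in functional mode ME passes through unchanged.
    """
    if not atpg_mode:
        return me
    return me & int((address >> col_bits) == pattern)


P, WIDTH = slowest_row_pattern(num_words=640, col_mux=4)
print(format(P, "0{}b".format(WIDTH)))   # row-address pattern in binary
```

For 640 words and a column-mux factor of 4, this yields the 8-bit pattern 10011111 quoted above, and any of the four words in that row passes the enable gating.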

Figure 8 also integrates the modification of Section III-B to gate the clocks to the memory collar scan flip-flops during scan capture, using a clock gate (CG) that is enabled only when SE=1. The model is also modified to ensure that the memory array output is valid during a read operation only when the address satisfies the pattern 𝑃. The two-pass ATPG of Section III-B is enhanced to first generate patterns with the model that constrains the address to the slowest word, and then to re-simulate these patterns within the ATPG tool with the correct model to deliver patterns for silicon/tester.

In this mode of test, only one row of the memory is accessed by the ATPG tool in order to target the slowest word. Additional patterns with unconstrained addresses are still required to fully target faults in the functional logic cone of the memory address port.

While this scheme has all the limitations of the two-pass ATPG scheme of Section III-B, it may be noted that these schemes do not impact the functional design with any additional gating. They rely only on modified ATPG models to constrain pattern generation to go through the slowest word and to prevent capture at the observe scan collar flip-flops.

D. Scan architectural optimizations for test time and coverage improvement

RAM-sequential patterns are generated in addition to the conventional transition-fault and path-delay structural patterns. They require multiple capture cycles, and the ATPG tool often ends up with multiple scan loads/shifts per pattern to set up the required memory controls. As a result, they increase test time and pattern volume. In this section, we propose scan architecture schemes that aim to minimize the test time increase due to RAM-sequential patterns.

Figures 4 and 6 showed that RAM-sequential ATPG involves multiple capture cycles in conventional launch-off-capture mode. For a RAM-sequential pattern with 3 capture cycles, faults need to propagate from up to two levels of sequential logic to translate to the required controls at the memory. To simplify fault sensitization for the ATPG tool, we propose a scan architecture in which only one level of sequential logic is required to control and observe the memory. The scan chains are constructed based on fan-in/fan-out cone analysis of the memories. We partition the design scan flip-flops into three sets: 𝐼𝑛𝑝𝑢𝑡, 𝑂𝑢𝑡𝑝𝑢𝑡 and 𝐷𝑜𝑛𝑡𝑐𝑎𝑟𝑒.

The set 𝐼𝑛𝑝𝑢𝑡 contains all flip-flops that control but do not observe memory instances; a separate scan enable 𝐼𝑛𝑆𝐸 is assigned to them. These scan flip-flops are made shift-only (𝐼𝑛𝑆𝐸 = 1), as they do not observe during RAM-sequential ATPG. Further, because they are shift-only, the control logic cone is restricted to only one sequential level, irrespective of the number of capture cycles. The set 𝑂𝑢𝑡𝑝𝑢𝑡 contains all flip-flops that directly observe memory instances; they are assigned a separate scan enable 𝑂𝑢𝑡𝑆𝐸. As these scan flip-flops are required to observe the memory output during scan capture, their scan enable is set to the design-level scan enable signal (𝑂𝑢𝑡𝑆𝐸 = 𝑆𝐸𝑡𝑜𝑝), which is asserted during scan shift and de-asserted during scan capture. Flip-flops that both control and observe memory instances are also classified under the set 𝑂𝑢𝑡𝑝𝑢𝑡. The set 𝐷𝑜𝑛𝑡𝑐𝑎𝑟𝑒 contains all flip-flops that neither control nor observe memory instances and therefore do not impact the RAM-sequential coverage. They may be excluded from the scan group during RAM-sequential ATPG.
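The classification rule can be sketched as set operations in Python. This is an illustration only: the function names are hypothetical, and the two cone sets are assumed to have been computed beforehand by fan-in/fan-out analysis, which is not shown.

```python
def partition_scan_flops(all_flops, controls_memory, observes_memory):
    """Split scan flip-flops into the Input, Output and Dontcare sets.

    controls_memory: flip-flops that control some memory instance
    observes_memory: flip-flops that directly observe some memory instance
    Both sets are assumed to come from a separate cone analysis.
    """
    all_flops = set(all_flops)
    output = all_flops & set(observes_memory)              # observe (may also control)
    inputs = (all_flops & set(controls_memory)) - output   # control only
    dontcare = all_flops - inputs - output                 # neither
    return inputs, output, dontcare


def scan_enables(se_top):
    """Scan-enable values per set: Input is shift-only, Output follows SEtop."""
    return {"InSE": 1, "OutSE": se_top}


inputs, output, dontcare = partition_scan_flops(
    all_flops={"ff_a", "ff_b", "ff_c", "ff_d"},
    controls_memory={"ff_a", "ff_c"},
    observes_memory={"ff_b", "ff_c"},
)
```

Here `ff_c` both controls and observes a memory, so it lands in the Output set, as the classification above requires.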

The overheads of this scheme are: (a) additional run-time to identify input and output flip-flops, (b) stitching these flip-flops into separate scan chains, and (c) additional scan muxes to select only the input and output flip-flops during RAM-sequential ATPG.

IV. IMPROVING INTERFACE TEST COVERAGE OF ASYNCHRONOUS MULTI-PORT MEMORIES

In this section, we first propose memory architectural schemes for effective testing of asynchronous two-port memories and later extend them to four-port memories.

A. Enhanced two-port memory architecture

A simple two-port asynchronous memory may be viewed as an extension of Figure 2 in which, instead of one address port for both read and write operations, there are separate write address (AW) and read address (AR) ports. Writes to the memory array are synchronous with the clock, while reads are asynchronous and the output (Q) depends only upon AR. For


Fig. 9. Proposed two-port asynchronous memory architecture. [Block diagram. Labels: Memory Array; data inputs D[0] to D[N]; data outputs Q[0] to Q[N]; SI; SO; SE; Write Address (AW); Read Address (AR); Write Enable (WE); ATPG_MODE; W.]

such memories, when ATPG_MODE is asserted, Q follows the scan flip-flop. But as soon as ATPG_MODE is de-asserted, Q immediately reflects the word addressed by AR. It may be noted that controlling ATPG_MODE in the same way as scan enable does not initialize Q at the end of shift. For the output to be controlled to the required value at the end of scan shift (before the first capture clock), it is essential to ensure that: (a) a memory word is initialized with the required value during scan shift, and (b) the read address at the end of shift points to that location.

To address the above concerns, our approach is to eliminate the separate RAM-sequential mode for testing asynchronous memories. Instead, we propose a memory architecture in which scan shift happens through the memory word itself and the timing in ATPG mode matches that of the worst-case functional use mode. Further, unlike synchronous memories, the asynchronous memory is not disabled when ATPG_MODE is asserted, as the memory word is also used during scan shift and capture.

For asynchronous-read memories, we note that the output timing is worst during a write-through operation. Write-through here refers to the condition in which a write operation and a read operation are exercised on the same location at the same time. During a write-through operation, the memory word is written and the result of the write is then reflected at the memory output. In such cases, the read time is the sum of the time taken to write to the word and the time taken to read from it. Hence, for ATPG to mimic the worst-case functional timing, we ensure a write-through operation on the slowest word.

Figure 9 illustrates a high-level block diagram of such an architecture. For simplicity, memory BIST muxes and associated test ports are not shown, and WE is used to denote both the memory enable and the write enable ports of the compiler. AR denotes the read address port and AW denotes the write address port of the compiler. D and Q are the data input and output ports. 𝑊 denotes the slowest word of the memory. The following points are worth noting in this architecture:

1) During scan shift (ATPG_MODE=1 and SE=1), the memory is enabled and the write operation is asserted. Unlike the conventional memory architecture of Figure 2, memory enable is not de-asserted by ATPG_MODE in the proposed scheme.

2) During scan capture (ATPG_MODE=1 and SE=0), the write operation is enabled based on the value at the WE port.

3) During scan shift, the write address to the memory array is forced to a particular word 𝑊, irrespective of the value at the AW port.

4) During scan capture, the write address is forced to 𝑊 only during a write operation, and the scan observe flip-flops on AW are disabled. During a read operation, the value driven by the functional logic at AW is captured at the scan collar flip-flops.

5) During scan shift and scan capture (ATPG_MODE=1), the read address to the memory array is forced to a particular word 𝑊, irrespective of the value at the AR port.

6) During scan shift, the bits of the memory word are daisy-chained (i.e. D[i] is driven by Q[i-1]) to form the data scan chain.

7) During scan shift, the input for the first data bit (D[0]) comes from the data scan input (SI); the last output bit (Q[N-1]) drives the data scan output (SO).

8) Address and other control signals have observe flip-flops stitched into a separate scan chain, not shown in the figure.
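Points 6 and 7 describe a shift register formed by the memory word itself. A minimal behavioural sketch in Python follows; it assumes an N-bit word 𝑊 whose read and write addresses are both forced to 𝑊 during shift (points 3 and 5). The function names are hypothetical and timing is not modelled.

```python
def shift_cycle(word, scan_in):
    """One scan-shift clock through the memory word W.

    The asynchronous read makes Q equal to the stored word; the synchronous
    write then stores D, where D[0] = SI and D[i] = Q[i-1].
    Returns the new word contents and the bit seen at SO before the write.
    """
    q = list(word)                  # Q reflects the current contents of W
    d = [scan_in] + q[:-1]          # daisy chain: D[0] <- SI, D[i] <- Q[i-1]
    scan_out = q[-1]                # SO is driven by the last output bit
    return d, scan_out


def load_word(word, target):
    """Shift a target value into W; the bit for the last position goes in first."""
    for bit in reversed(target):
        word, _ = shift_cycle(word, bit)
    return word


def unload_word(word):
    """Shift W out through SO, feeding zeros at SI."""
    observed = []
    for _ in range(len(word)):
        word, so = shift_cycle(word, 0)
        observed.append(so)
    return observed


loaded = load_word([0, 0, 0, 0], target=[1, 0, 1, 1])
unloaded = unload_word(loaded)
```

After N shift clocks the word holds the target value, so Q is already initialized at the end of shift without a separate memory write sequence.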

In the proposed scheme, the ATPG_MODE signal of the memory is always asserted during scan shift and scan capture, similar to conventional transition fault ATPG that bypasses the memory array. However, unlike conventional transition fault ATPG, each cycle of scan shift and capture performs a valid memory write or read operation. As a result, the above implementation causes the data and output timing in ATPG mode to match exactly that of the worst-case functional use mode. For example, during scan capture, the data port is observed by a valid write operation, while the output is controlled by a valid read operation. To observe the functional logic interfacing with these input signals, we rely on conventional memory collar scan flip-flops, and special care was taken to ensure that ATPG mode and functional mode timings are identical.

Thus, with the above scheme, we have ensured that Xs are prevented at the output and that the slowest word is used for worst-case functional timing.

B. Testing asynchronous read path with ATPG

The enhancements described in the above section cover all ports except the read address (AR) port. As AR is an asynchronous port, observing it at a scan flip-flop is not sufficient for testing Fmax along the true functional path. We still retain the observe flip-flop on AR for stuck-at coverage. To fully test the true asynchronous read through the memory, the memory architecture is further enhanced with a special control port called AR_0_ONLY.

AR_0_ONLY controls the read address to the memory array during scan capture. For testing transition faults on the output and other input ports, AR_0_ONLY is asserted to ensure that


Fig. 10. Enhancements to test asynchronous read path. [Block diagram. Labels: Read Address (AR); W; ATPG_MODE; AR_0_ONLY; SE; Memory Array.]

the read address is forced to 𝑊 (the slowest word) during scan capture. Further, the observe flip-flops at AR capture the AR port only when AR_0_ONLY and ATPG_MODE are asserted. So, in conventional transition fault testing, the AR port is still observed at the memory collar scan flip-flop. But this does not provide true coverage of the asynchronous read path. When AR_0_ONLY is de-asserted, the read address of the memory array is driven by the AR port. The scan collar flip-flops of AR do not capture in this mode, which forces the ATPG tool to propagate faults through the memory array for coverage credit. Figure 10 illustrates this scheme for read address control. The write address, on the other hand, remains unchanged and writes only to the slowest word during ATPG.

Thus, in this mode, Q depends upon the value of AR, and only one word 𝑊 is written during scan shift. At the end of shift, the output would remain X, as the memory word addressed by AR is not initialized by scan shift. To prevent Xs, the memory is initialized prior to this mode of ATPG. Memory BIST or custom logic may be used to initialize all locations of the memory with a deterministic pattern before ATPG.
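The read-address control and the need for initialization can be sketched behaviourally in Python. The gating expression below, in particular the use of SE to keep the read address at 𝑊 during shift, is our reading of the signal labels in Figure 10 and should be treated as an assumption; the function names and the use of `None` for an X value are likewise illustrative.

```python
X = None  # stands for an unknown (uninitialized) value


def array_read_address(ar, slowest_word, atpg_mode, se, ar_0_only):
    """Read address applied to the memory array (assumed gating).

    In ATPG mode the address is forced to W during shift, and also during
    capture when AR_0_ONLY is asserted; otherwise the AR port drives it.
    """
    if atpg_mode and (se or ar_0_only):
        return slowest_word
    return ar


def ar_collar_captures(atpg_mode, ar_0_only):
    """AR observe flip-flops capture only when both controls are asserted."""
    return bool(atpg_mode and ar_0_only)


def q_output(memory, ar, slowest_word, atpg_mode, se, ar_0_only):
    """Asynchronous read: Q is the word at the selected address, or X."""
    addr = array_read_address(ar, slowest_word, atpg_mode, se, ar_0_only)
    return memory.get(addr, X)


W = 3                                    # hypothetical slowest-word address
uninitialized = {W: 0b1011}              # only W was written by scan shift
initialized = {a: 0 for a in range(4)}   # e.g. all words set to 0 beforehand
initialized[W] = 0b1011

q_without_init = q_output(uninitialized, ar=1, slowest_word=W,
                          atpg_mode=1, se=0, ar_0_only=0)
q_with_init = q_output(initialized, ar=1, slowest_word=W,
                       atpg_mode=1, se=0, ar_0_only=0)
```

Without prior initialization the read of a word other than 𝑊 returns X; with initialization it returns a deterministic value.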

C. Extension to Four-Port Asynchronous Memory

The four-port memory under discussion has 2 synchronous write ports and 2 asynchronous read ports. We denote the first pair of one write port and one read port as port A, and the second write+read port pair as port B. Both ports are independently controlled as two asynchronous two-port memories, using the scheme explained above. Separate clocks (CLKA/CLKB), scan enables (SEA/SEB), ATPG modes (ATPG_MODEA/ATPG_MODEB) and AR controls (ARA_0_ONLY/ARB_0_ONLY) exist for each port, and they are controlled identically. In our implementation, the IOs for port A and port B lie on opposite sides. As a result, the fastest word for port A is the slowest word for port B and vice-versa. The slowest words for port A and port B are therefore non-overlapping and are part of the scan chains for port A and port B, respectively. Daisy chaining of the data scan chain for each side (read-write pair) of a four-port memory is similar to that of the two-port memory illustrated in Figure 9.

Apart from using the worst-case functional path and being compression-friendly, a key advantage of the proposed architecture is that ATPG requires only 2 capture cycles to detect both input and output interface faults. This is because Q is always initialized at the end of shift, and any write operation is immediately reflected at Q (as the read is asynchronous). Hence, a write followed by a scan capture suffices to detect all faults except those on AR. For testing the asynchronous read path, a read and a capture cycle are sufficient. For example, the read address is set to AR1 at the end of shift. The first capture induces an AR1 → AR2 transition at the read address, and the corresponding change at Q is captured during the second capture cycle.

D. Overhead and Limitation

The main limitation of the proposed asynchronous memory architecture is its impact on the performance and area of the memory, which is the price paid for improved testability. Primarily, the write enables (WEA/WEB) and addresses (AWA/AWB/ARA/ARB) are impacted due to additional gating. Write enable is impacted by two factors: (a) an additional mux at WE that forces write enable to 1 during scan shift, and (b) additional gating to force the write address to select the slowest word. Originally, a 2x1 mux was used for the separate memory BIST/test ports. With the new architecture, a custom 3x1 mux was designed to select the ATPG inputs with negligible impact on timing. On the other hand, forcing the write address to the slowest word resulted in approximately 100 ps degradation for WEA/B. This was recovered by using a separate latch for ATPG, so that the functional write-enable operation was decoupled from write-address generation for ATPG. Further, the WE latch for ATPG was timed in such a manner that it does not lie in the critical path for write-address generation. This had negligible impact on functional write-enable timing. For addresses, the degradation in timing due to additional gating was less than 40 ps. To put things in perspective, the compiler is specified at around 1 GHz with all these overheads. The area impact of the additional logic is approximately 5% to 10% of the overall area of the memory, depending on the compiler configuration.

Another limitation is that the proposed scheme requires memory initialization prior to ATPG for testing the asynchronous read path. This imposes restrictions on the chip-level test mode entry/exit scheme. As switching from memory BIST mode (or another initialization mode) to ATPG mode must not disturb the memory contents, care should be taken to gate off the memory clocks during the test mode setup that enters ATPG mode.

V. EXPERIMENTAL RESULTS

The proposed schemes were implemented in a 40-nm industrial ASIC core with 2.5 million gates, 235 memory instances and about 100K scan flip-flops in 3 clock domains. A commercial ATPG tool (Tool A of Table I) was used to implement 25X X-tolerant test compression with 8 scan chains interfacing the tester. The same testcase was also used in the experiments described in Table I. The longest scan chain length with compression was 497, while that with compression bypass was 12425. This design has two asynchronous four-port memory instances, apart from synchronous memories. The ATPG_MODE pin of all synchronous memories was controlled by one chip-level pin, while that of the four-port memories was driven by a separate chip-level pin. For pin-limited designs, these may be controlled using separate JTAG/P1500 mode control data bits.

Testchips (using 40-nm technology) were taped out implementing the proposed asynchronous four-port architecture and RAM-sequential tests for synchronous memories.


TABLE II

TESTING SYNCHRONOUS MEMORY INPUT+OUTPUT INTERFACE FAULTS

Test Mode            Test Coverage (%)   # Test Patterns   Tester Cycles   ATPG run-time (s)
Compression bypass   96.63               1621              20M             6576
With compression     96.69               2002              995K            8101

Fig. 11. Memory output interface fault test results for slowest word. [Chart. Series: Pattern Count; Run-time (s). Categories: Any word, no compression; Slowest word, no compression; Any word, compression; Slowest word, compression.]

A. Results for synchronous memories

Table II reports coverage, pattern count and ATPG run-time with the proposed schemes for synchronous memories. It also compares compression and bypass modes to show that the proposed schemes are effective even with test compression. Columns 2 to 5 correspond to pattern generation statistics when both input and output faults of the memory interface are targeted, using the schemes described in Sections III-A and III-B. Test coverage numbers are for the transition fault model, with faults propagated through the memory. Total tester cycles are computed as 𝑁𝑢𝑚𝑃𝑎𝑡𝑡𝑒𝑟𝑛𝑠 × 𝐿𝑜𝑛𝑔𝑒𝑠𝑡𝑆𝑐𝑎𝑛𝐶ℎ𝑎𝑖𝑛𝐿𝑒𝑛𝑔𝑡ℎ, where 𝐿𝑜𝑛𝑔𝑒𝑠𝑡𝑆𝑐𝑎𝑛𝐶ℎ𝑎𝑖𝑛𝐿𝑒𝑛𝑔𝑡ℎ denotes the length of the longest scan chain and 𝑁𝑢𝑚𝑃𝑎𝑡𝑡𝑒𝑟𝑛𝑠 corresponds to the pattern count of column 3. The results indicate that the proposed schemes are effective even with test compression. The coverage with test compression for memory input+output interface faults is almost the same as that in compression bypass mode, with about a 23% increase in both pattern count and run-time. The tester cycles indicate that the effective compression achieved for RAM-sequential patterns is about 20X for a 25X compression implementation.

Figure 11 shows the impact of the two-phase constrained ATPG of Section III-C (shown as 𝑆𝑙𝑜𝑤𝑒𝑠𝑡 𝑤𝑜𝑟𝑑) compared to the one-pass unconstrained ATPG (shown as 𝐴𝑛𝑦 𝑤𝑜𝑟𝑑) for detecting output faults. Both schemes gave 100% coverage on output faults and generated the same number of patterns, with a small impact on run-time. The run-time reported for the 𝑆𝑙𝑜𝑤𝑒𝑠𝑡 𝑤𝑜𝑟𝑑 scheme is the sum of the pattern generation ATPG run-time of pass 1 and the total wall-clock time for pattern re-simulation in pass 2, while the run-time reported for the 𝐴𝑛𝑦 𝑤𝑜𝑟𝑑 scheme is only the pattern generation run-time. The results show that the two-pass ATPG scheme for targeting the slowest word provides complete coverage with test compression. The pattern count is unaffected, and run-time increases by about 18% with respect to unconstrained ATPG. However, test compression results in an 80% higher pattern count for both schemes compared to compression bypass. This is possibly because the additional constraints on the address ports restrict the decompressor's ability to generate patterns.

Table III shows the effectiveness of the scan architectural optimizations of Section III-D. For the proposed scheme, only

TABLE III

RESULTS WITH SCAN ARCHITECTURAL OPTIMIZATIONS

Test Mode            Test Coverage (%)   # Test Patterns   Tester Cycles   ATPG run-time (s)
Compression bypass   96.78               1438              8M              4494
With compression     96.83               2051              525K            7823

Fig. 12. Testchip results for slowest word test. [Chart. Y axis: Normalized Vmin (0.96 to 1). X axis: parts across split lots (W1 to W4, S1 to S4, N1 to N4). Series: Any word; Slowest word.]

43K flip-flops were required and used for scan shift, compared to the total of 100K scan flip-flops in the design. Memory collar flip-flops were included in the 43K, as they must be in the scan chains to initialize the memory output at the end of scan shift. Unfortunately, the longest scan chain with test compression was limited by the longest memory collar scan chain, which reduced the effective compression obtained. Comparing the compression results of Tables II and III, we infer the following: (1) Pattern count increased marginally (by around 2%). This was due to the increase in care-bit density in the scan flip-flops, as adjacent scan flip-flops need to be set in a specific manner for transitions to be generated from scan shift; (2) Coverage increased marginally (by around 0.1%) due to improved controllability; (3) Despite the increased pattern count, tester cycles were reduced by around 47% (from 995K to 525K) due to the shorter scan chains resulting from fewer scan flip-flops. When test compression is bypassed, care-bit density does not impact pattern count. Instead, the simpler controllability via scan shift results in reduced pattern count and ATPG run-time. The reduced pattern count, together with the shorter scan chains resulting from fewer scan flip-flops, gives a significant reduction in tester cycles (by around 60%) when compression is bypassed.
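The percentages quoted in this comparison follow directly from the entries of Tables II and III. The short Python check below uses the rounded tester cycle counts as printed in the tables (20M, 8M, 995K, 525K); the helper name is ours.

```python
def percent_change(old, new):
    """Signed percentage change from old to new."""
    return 100.0 * (new - old) / old


# With compression: Table II -> Table III
pattern_change = percent_change(2002, 2051)
cycle_change_compressed = percent_change(995e3, 525e3)
coverage_gain = 96.83 - 96.69            # in percentage points

# Compression bypass: Table II -> Table III
cycle_change_bypass = percent_change(20e6, 8e6)
pattern_change_bypass = percent_change(1621, 1438)

print(round(pattern_change, 1), round(cycle_change_compressed, 1),
      round(coverage_gain, 2), round(cycle_change_bypass, 1))
```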

Figure 12 compares the Vmin from the testchip for unconstrained patterns that target any word and patterns that target the slowest word. Results are shown for a sample memory instance on 12 dies across process splits (parts S1 to S4 are from the strong process; W1 to W4 are from the weak process; and N1 to N4 are from the nominal process). It may be noted from the graph that Vmin is higher for patterns exercising the slowest word, by about 1% on average. The access time difference between the fastest and the slowest word increases as the number of rows in the compiler increases. As the instance used in this study is small (i.e. the instance configuration is close to the lower range of the compiler offering), the Vmin impact is also small.


TABLE IV

TESTING ASYNCHRONOUS FOUR-PORT MEMORY INTERFACE FAULTS

                     Asynchronous read not tested (AR_0_ONLY=1)               Asynchronous read tested (AR_0_ONLY=0)
Test Mode            Test Coverage (%)   # Test Patterns   ATPG run-time (s)   Test Coverage (%)   # Test Patterns   ATPG run-time (s)
Compression bypass   99.04               56                115                 99.31               44                43
With compression     99.31               66                135                 99.31               43                53

B. Results for asynchronous memories

Table IV shows the coverage, pattern count and ATPG run-times for the proposed asynchronous four-port memory architecture. Columns 2-4 are results for testing all input and output ports (except asynchronous read) with ARA_0_ONLY = ARB_0_ONLY = 1. In this mode, coverage is credited by capturing onto the scan flip-flops (including those on the read address). The slowest word 𝑊, which is part of the data scan chain, captures the data and controls the output. On the other hand, columns 5-7 are results for input and output ports (including true asynchronous read paths) obtained by setting ARA_0_ONLY = ARB_0_ONLY = 0, assuming that the memory is initialized prior to ATPG. In our experiments, memory BIST was used to initialize all words to 0. The results indicate that coverage with test compression is very similar to that with compression bypass, with about a 20% increase in pattern count and run-time. Further, testing the true asynchronous path results in the same coverage with a reduction in pattern count and run-time.

Figure 13 shows normalized Fmax results from the testchip for a four-port instance with 512 rows. Fmax data is presented for two different functional accesses, for ATPG with the proposed architecture and for ATPG with the conventional architecture of Figure 2. In the conventional architecture, the memory array is bypassed during ATPG. As only the proposed architecture was implemented in silicon, simulation results for ATPG timing with the conventional architecture at a similar process-voltage-temperature corner are shown (marked by #). Memory BIST controlled both test and functional ports in the testchip. So, functional accesses 1 and 2 were generated using BIST with different address sequences stressing different components within the memory array for worst-case Fmax. The following points are worth noting from Figure 13: (1) ATPG Fmax with the conventional architecture is highly inaccurate (more than 2X) compared to functional access, for the reasons explained in Section II; (2) ATPG with the proposed architecture has almost the same Fmax as functional access 1; (3) A small difference (about 20%) exists between the worst-case functional sequence (access 2) and the proposed ATPG. This difference is also found among the functional patterns (access 1 and 2) themselves.

VI. CONCLUSION AND FUTURE WORK

In this paper, we focused on improving the quality of ATPG patterns in determining Fmax for memory-interface logic. For synchronous memories, we proposed minor modifications to the memory along with DFT and pattern generation schemes that work with test compression. These pattern generation schemes are enhancements to conventional RAM-sequential ATPG that propagate faults through the memory array. For asynchronous memories, we proposed memory architectural enhancements that make a memory word scannable, so as to reflect true functional timing during ATPG operation. The proposed schemes also ensured that the slowest word was targeted for

Fig. 13. Testchip results for asynchronous four-port enhancement. [Bar chart. Y axis: Normalized Fmax (0 to 1). Bars: Functional Access 1; Functional Access 2; ATPG with proposed architecture; ATPG with conventional architecture (#).]

both synchronous and asynchronous memories. Experimental results on an industrial ASIC core indicate good coverage with test compression, similar to that with compression bypass, with small pattern count and run-time overhead. Preliminary silicon results on a 40-nm testchip show that the proposed scheme achieves an Fmax closer to the functional Fmax for asynchronous memories.

The proposed schemes use the conventional transition fault model for generating patterns with worst-case functional timing through memories. Future work includes integrating this scheme with a small delay defect ATPG that also propagates faults along longer logic paths to/from memories [14].

REFERENCES

[1] S. Borkar, et al., “Parameter Variations and Impact on Circuits and Microarchitecture,” in Proc. ACM/IEEE Design Automation Conference, 2003, pp. 338–342.

[2] V. Zolotov, C. Visweswariah, and J. Xiong, “Voltage Binning Under Process Variation,” in Proc. IEEE International Conference on Computer-Aided Design, 2009, pp. 425–432.

[3] L.-C. Chen, et al., “Transition Test on UltraSPARC T2 Microprocessor,” in Proc. IEEE International Test Conference, 2008.

[4] J. Zeng, et al., “On Correlating Structural Test with Functional Tests for Speed Binning of High Performance Design,” in Proc. IEEE International Test Conference, 2004, pp. 31–37.

[5] B. Cory, R. Kapur, and B. Underwood, “Speed Binning with Path Delay Test in 150-nm Technology,” IEEE Design & Test of Computers, pp. 41–45, Sep.-Oct. 2003.

[6] E. K. Vida-Torku and G. Joos, “Designing for scan test of high performance embedded memories,” in Proc. IEEE International Test Conference, 1998.

[7] M. Abadir and R. Raina, “Design-For-Test Methodology for Motorola PowerPC Microprocessors,” in Proc. IEEE International Test Conference, 1999, pp. 810–819.

[8] A. Jindal, “Testing Around Memories - an inside look,” TechOnline, www.techonline.com/electronics_directory/techpaper/216401057.

[9] “Testing RAM and ROM,” Scan and ATPG Process Guide, Mentor Graphics Inc., 8.2009_4, Nov 2009.

[10] “Test Pattern Data,” TetraMAX ATPG User Guide, Synopsys Inc., 2009.06, Nov 2009.

[11] J. Saxena, et al., “Scan-Based Transition Fault Testing - Implementation and Low-Cost Challenges,” in Proc. IEEE International Test Conference, 2002, pp. 1120–1129.

[12] S. K. Jain, K. Srivastva, and S. Kainth, “A Novel Circuit to Optimize Access Time and Decoding Schemes in Memories,” in Proc. IEEE International Conference on VLSI Design, 2010, pp. 117–121.

[13] S. Dhong, et al., “A 4.8GHz fully pipelined embedded SRAM in a streaming processor of a CELL processor,” in Proc. IEEE International Solid-State Circuits Conference, 2005, pp. 486–612.

[14] X. Lin, et al., “Timing-Aware ATPG for High Quality At-speed Testing of Small Delay Defects,” in Proc. IEEE Asian Test Symposium, 2006.
