efficient test for systems using counters*people.ee.duke.edu/~krish/chandra_vlsi_design.pdfvectors...

12
VLSI DESIGN 2001, Vol. 12, No. 4, pp. 475-486 Reprints available directly from the publisher Photocopying permitted by license only (C) 2001 OPA (Overseas Publishers Association) N.V. Published by license under the Gordon and Breach Science Publishers imprint, member of the Taylor & Francis Group. Efficient Test Application for Core-Based Systems Using Twisted-Ring Counters* ANSHUMAN CHANDRAa’t, KRISHNENDU CHAKRABARTY and MARK C. HANSEN b aDepartment of Electrical and Computer Engineering, Duke University, 130 Hudson Hall, Box 90291, Durham, NC 27708; bDelphi Delco Electronics Systems, IC Design Center, 2705 Goyer Road, P.O. Box 9005, Mail Station D18, Kokomo, IN 46904-9005 (Received 15 August 1999; In finalform 11 September 2000) We present novel test set encoding and pattern decompression methods for core-based systems. These are based on the use of twisted-ring counters and offer a number of important advantages-significant test compression (over 10X in many cases), less tester memory and reduced testing time, the ability to use a slow tester without compromising test quality or testing time, and no performance degradation for the core under test. Surprisingly, the encoded test sets obtained from partially-specified test sets (test cubes) are often smaller than the compacted test sets generated by automatic test pattern generation programs. Moreover, a large number of patterns are applied test-per-clock to cores, thereby increasing the likelihood of detecting non-modeled faults. Experimental results for the ISCAS benchmark circuits demonstrate that the proposed test architecture offers an attractive solution to the problem of achieving high test quality and low testing time with relatively slower, less expensive testers. Keywords: Embedded core testing; Slower-speed testers; Test set encoding; Test rate; Test vector decompression; Test-per-clock 1. INTRODUCTION System-on-a-chip designs containing embedded cores pose a number of difficult test challenges [1]. One of these involves the reduction of test application time. Testing time is especially im- portant for a core-based system since long test times can significantly increase cost and require design changes, thereby affecting time-to-market. Unfortunately, the testing time for a core-based system can be very high due to the fact that the test patterns for the embedded cores are applied serially via scan chains. In such test-per-scan methods, one test pattern is applied to an n-input core under test every n cycles. For embedded cores that use the same internal scan chain for applying * This research was supported in part by the National Science Foundation under grant no. 9875324, by a contract from Delphi Delco Electronics Systems, and by an equipment grant from Sun Microsystems. Corresponding author. Tel.: (919) 660-5230, Fax: (919) 660-5293, e-mail: [email protected] 475

Upload: others

Post on 24-Feb-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Efficient Test for Systems Using Counters*people.ee.duke.edu/~krish/Chandra_VLSI_Design.pdfvectors provided by the core vendor are stored in compressed form in the tester memory, and

VLSI DESIGN2001, Vol. 12, No. 4, pp. 475-486Reprints available directly from the publisherPhotocopying permitted by license only

(C) 2001 OPA (Overseas Publishers Association) N.V.Published by license under

the Gordon and Breach Science Publishers imprint,member of the Taylor & Francis Group.

Efficient Test Application for Core-Based SystemsUsing Twisted-Ring Counters*

ANSHUMAN CHANDRAa’t, KRISHNENDU CHAKRABARTY and MARK C. HANSENb

aDepartment of Electrical and Computer Engineering, Duke University, 130 Hudson Hall, Box 90291,Durham, NC 27708; bDelphi Delco Electronics Systems, IC Design Center, 2705 Goyer Road, P.O. Box 9005,

Mail Station D18, Kokomo, IN 46904-9005

(Received 15 August 1999; In finalform 11 September 2000)

We present novel test set encoding and pattern decompression methods for core-basedsystems. These are based on the use of twisted-ring counters and offer a number ofimportant advantages-significant test compression (over 10X in many cases), less testermemory and reduced testing time, the ability to use a slow tester without compromisingtest quality or testing time, and no performance degradation for the core under test.Surprisingly, the encoded test sets obtained from partially-specified test sets (test cubes)are often smaller than the compacted test sets generated by automatic test patterngeneration programs. Moreover, a large number of patterns are applied test-per-clock tocores, thereby increasing the likelihood of detecting non-modeled faults. Experimentalresults for the ISCAS benchmark circuits demonstrate that the proposed testarchitecture offers an attractive solution to the problem of achieving high test qualityand low testing time with relatively slower, less expensive testers.

Keywords: Embedded core testing; Slower-speed testers; Test set encoding; Test rate; Test vectordecompression; Test-per-clock

1. INTRODUCTION

System-on-a-chip designs containing embeddedcores pose a number of difficult test challenges[1]. One of these involves the reduction of testapplication time. Testing time is especially im-portant for a core-based system since long testtimes can significantly increase cost and require

design changes, thereby affecting time-to-market.Unfortunately, the testing time for a core-basedsystem can be very high due to the fact that thetest patterns for the embedded cores are appliedserially via scan chains. In such test-per-scanmethods, one test pattern is applied to an n-inputcore under test every n cycles. For embedded coresthat use the same internal scan chain for applying

* This research was supported in part by the National Science Foundation under grant no. 9875324, by a contract from DelphiDelco Electronics Systems, and by an equipment grant from Sun Microsystems.

Corresponding author. Tel.: (919) 660-5230, Fax: (919) 660-5293, e-mail: [email protected]

475

Page 2: Efficient Test for Systems Using Counters*people.ee.duke.edu/~krish/Chandra_VLSI_Design.pdfvectors provided by the core vendor are stored in compressed form in the tester memory, and

476 A. CHANDRA et al.

test patterns and capturing test responses, suchserialization of the test patterns is unavoidable.However, a test-per-scan method increases thetesting time unnecessarily for a core that employs aboundary scan chain for test application, and usesa separate circuit for capturing test responses.While parallel access to the core in such cases ispossible by multiplexing the core I/Os to chip I/Os[2], the routing and area overhead associated withsuch a test access mechanism can be prohibitive.The testing time for a core-based system can be

reduced by employing built-in self-test (BIST).However, BIST for logic cores is feasible onlyif the core vendor provides "BIST-able cores".Another problem in using BIST for core testinglies in the fact that practical (low-cost) patterngeneration methods that provide high fault cover-age require structural information about the IPcore, either for test-point placement [3], or forcarrying out fault simulation and ATPG [4]. Thecore vendor usually provides only a precomputedtest set for the core.

In order to reduce the testing time for "non-BIST-able cores", a recently proposed test vectorcompression/decompression method that uses anon-chip decoder and a cyclical scan register toapply a precomputed test set T to the core [5]. Testvectors provided by the core vendor are storedin compressed form in the tester memory, andtransferred serially to the core and decompressedduring test application. However, a problem withthis method is that it is based on a test-per-scanarchitecture and is therefore inefficient for corescontaining boundary scan registers that do notcapture test responses.

In this paper, we introduce a new test compres-sion/decompression method based on the use oftwisted-ring (Johnson) counters (TRCs) [6]. Unlikethe BIST method presented in [6], the testapplication method described here does notrequire reseeding of the twisted-ring counter. Forcores containing boundary scan, a test-per-clockarchitecture allows the application of a test pat-tern to the core every clock cycle. It imposes noperformance penalty beyond what is introduced

by scan design. Since a larger number of patternsare applied to the CUT every cycle, the proposedtest set-up can also be used with a slower-speedtester without affecting test quality. It is becomingincreasingly difficult for testers to keep up withthe high frequencies needed to sufficiently test forperformance-related defects in today’s IC’s. High-speed testers also pose greater interfacing prob-lems during test. Moreover, the SIA NationalTechnology Roadmap predicts that the cost ofhigh-speed testers will exceed $20 million by 2010[7]. Hence, test methods that can be used withslower-speed, less expensive testers are becomingespecially important [8].

In contrast to a conventional test-per-scanmethod, the proposed method applies patternsto the core at every clock cycle, hence the testresponses must be observed every clock cycle.We also show that the proposed compression meth-od is flexible in that the TRC-based test can bedesigned using both fully-specified and partially-specified test sets. Finally, we compare the sizeof the encoded test sets (number of bits to bestored) for partially-specified patterns to the testsets obtained from ATPG programs that attemptto generate compact test sets [9]. The motivationfor generating small test sets is to reduce testingtime. However, the high degree of compressionobtained for partially-specified test sets demon-strates that test set compaction is not alwaysnecessary-the testing time is reduced more effec-tively by encoding partially-specified test sets. Theproposed encoding method provides significanttest set compression (over 10X) for many full-scanand non-scan circuits.The paper is organized as follows. In Section 2,

we review twisted-ring counters and present ourtest set encoding method. As mentioned above,our encoding method is tailored towards the useof a TRC for test application. We describe ourpattern application technique using TRCs andillustrate the method for a boundary scan archi-tecture in which test responses are captured in aseparate scan chain. In Section 3, we presentexperimental results for the ISCAS 85 [11] and

Page 3: Efficient Test for Systems Using Counters*people.ee.duke.edu/~krish/Chandra_VLSI_Design.pdfvectors provided by the core vendor are stored in compressed form in the tester memory, and

EMBEDDED CORE TESTING 477

ISCAS 89 [12] benchmark circuits. Section 4 pre-sents the conclusions and outlines directions forfurther research.

2. TEST SET ENCODING AND PATTERNDECOMPRESSION

In this section, we first review a recently-proposedBIST approach based on twisted-ring counters(TRCs). We then present a TRC-based test vectorcompression/decompression approach for em-bedded core testing. We describe the proposedtest-per-clock pattern application technique usingTRCs and illustrate the method for a boundaryscan architecture in which test responses arecaptured in a separate scan chain.An n-bit ring counter is a group of n flip-flops

F1, F2,..., Fn connected as a shift register, withthe output of Fn fed back to the input of F1. Itbehaves as a counter with up to n distinct statesdepending on the initial value (seed). The TRC is aring counter with an inverter added between theoutput of Fn and the input of F1. An n-bit TRCbehaves as a counter with up to 2n distinct statesdepending on its seed. TRCs have recently beenproposed for the design of BIST pattern genera-tion circuits [6].

Test application in [6] is carried out by re-configuring the input scan register of the circuitunder test as a ring counter and a TRC. A smallnumber of seed patterns are stored in a ROM, andfor each seed, the pattern generator is clocked for3n clock cycles. For the first n cycles, the patterngenerator operates as a ring counter (shift mode)while for the remaining 2n cycles, it operates asa TRC (twist mode). A single 4-to-1 multiplexeris used to control the two-mode operation ofthe pattern generator. It was shown that a smallnumber of seeds are generally sufficient to embedan arbitrary precomputed test set. The set of seedsmay be viewed as an encoded test set, which is usedto generate a superset of the precomputed test set

during test application. A key advantage of thisapproach is that test-per-clock BIST pattern ap-plication is achieved without requiring any map-ping logic between the scan register flip-flopsand the circuit under test.We now present another application of

TRCs-test vector compression/decompression forembedded core testing that provides all theadvantages of the method described in [6]. Inaddition, the proposed technique allows moreefficient use of the TRC by removing a restrictionthat is inherent in the BIST architecture of [6],namely, the pattern generator can change modes(from shift to twist) only once for every seed. Thisrestriction limits the number of patterns that canbe generated from any starting state (seed) to 3n.We show in this paper that the entire precom-

puted test set can be generated from only onestarting state if the pattern generator is allowedto switch modes freely, i.e., at any clock cycle. Infact, any test pattern can be generated from anarbitrarily chosen initial state in at most n cycles.This observation forms the basis for the testarchitecture proposed in the paper. Figure showsthe main idea behind the pattern generator. Thetest set is encoded as a single-bit stream; duringtest application, each bit of the encoded test setdetermines the operation (shift or twist) to beperformed during the corresponding clock cycle.The encoded test set is stored in tester memoryand transferred to the IC serially during testapplication.For a precomputed test set Tz with m patterns,

the size of the encoded test set TE is at most mnbits, which equals the storage requirement whenno encoding is employed. However, we show laterthat the size of TE is generally much smaller thanmn bits. Another advantage of using the set-upshown in Figure is that the patterns are appliedin a test-per-clock fashion without adding delayson the functional logic paths. When no encodingis employed, i.e., TD is stored in the tester as intraditional testing, a total ofm patterns are applied

The proposed method is test-per-clock with respect to the scan clock, as opposed to the at-speed functional clock.

Page 4: Efficient Test for Systems Using Counters*people.ee.duke.edu/~krish/Chandra_VLSI_Design.pdfvectors provided by the core vendor are stored in compressed form in the tester memory, and

478 A. CHANDRA et al.

Multiplexer

Select(encoded test set)

Input scan register

Core under test

FIGURE Generic pattern application using a twisted-ring counter.

to the core under test in mn cycles (test-per-scan).However, with encoding, a total of ITel patternsare applied to the core under test in exactly ITelcycles. Therefore, even if no compression isachieved via encoding in the worst case, a totalof mn patterns are applied in mn cycles.For example, consider the example To in

Figure 2. We assume that TRC starts in the all-0state. This can easily be achieved using a globalreset. The encoded test set Te contains only 26bits, a saving of 47%, both in tester memory andtest application time. Furthermore, using thisencoded test set, a total of 26 patterns are applied

to the core under test in 26 cycles, instead of theonly 7 patterns that are applied in 49 cycles if To isstored in tester memory. The patterns from Tzthat are applied to the core under test arehighlighted in the figure. In general, it is notnecessary to assume an all-0 initial state. Anyinitial state can be scanned into the TRC usingthe serial scan-in mechanism shown in the testarchitecture of Figure 3. In this paper, however,we assume that the TRC always starts in the all-0state.

It is straightforward to show that any testpattern can be generated from any given state S

010101111010101001101000111011100010110011110110

An example TD

Initial state 6 cycles 3 cycles0000000 1110110 0001110

TTSTTT STT TSS

4 cycles 5 cycles /r3 cycles

010t011 0110011 1110001

S,s STT$ TSTTT2 cycle

3 cycles1101010 1001101

SST

Encoded test set (symbolic): TTSTTTSTTTSSTSTTTSTTSSSSST

Encoded test set (binary): t0111011100101 11011000

Size of TD 49 bits, size of TE 26 bits.

t0111000ooooooo/OOl lOO000000[ O01110I00000[11001110110000]01100111011000/i0110011101100/01011001110110/I01011001110110101010011101I010101ooo o

1010101000111 011010111000111110001 0011010

1001101

Test patterns applied to core under test

FIGURE 2 An example illustrating test set encoding and pattern application (T and S refer to twist and shift, respectively).

Page 5: Efficient Test for Systems Using Counters*people.ee.duke.edu/~krish/Chandra_VLSI_Design.pdfvectors provided by the core vendor are stored in compressed form in the tester memory, and

EMBEDDED CORE TESTING

Scan in [1Scan enebte

Serial test data fromtester (encoded)

Multiplexer

Core under test

FIGURE 3 Proposed pattern application scheme using a TRC.

479

of an n-bit pattern generator (Fig. 1) in at most ncycles. Let t-tit2.. "tn and S=SlS2"’’Sn. Then wecan generate from s in a minimum number ofcycles by determining the smallest integer r suchthat sl=tr+l, s2=tr+2,...,Sn-r=tn The testpattern can then be generated from s in exactly r

cycles. Intuitively, this also implies that the TRCcan always be made to generate any desired testpattern by "flushing out" s and carefully reloadingthe bits corresponding to t.

A block diagram of the interface between theATE and the SOC under test is shown in Figure 4.

The tester supplies patterns with frequency fext, Aysis the functional clock (system) frequency, andfcanis the scan clock frequency. Since a pattern isapplied to the core every cycle, fsoan=fext. Theon-chip clock generator is used to generate asynchronization clock to control the flow of databetween ATE and the SOC, and to synchronizefextwith fsan. We introduce the term test rate forexternal testing to measure the number of patternsthat are applied to the core under test per second.The test rate is simply fscan for the TRC-basedarchitecture. We also use the term relative test rate

ATEfscafext

ExternalSync.

ATEI/O

channel

Sync. Clock

SOC

fscafext Wrapper

Cre,

Wrapper

Core

On-chip clock generator

fsys’fscan

FIGURE 4 A conceptual architecture for testing a system-on-a-chip.

Page 6: Efficient Test for Systems Using Counters*people.ee.duke.edu/~krish/Chandra_VLSI_Design.pdfvectors provided by the core vendor are stored in compressed form in the tester memory, and

48O A. CHANDRA et al.

to measure the number of patterns applied to thecore under test per tester cycle. For the patternapplication scheme in Figure 3, the relative testrate is the maximum value possible, i.e., 1. Notethat for a test-per-scan scheme, the test rate is onlyfscan/n and the relative test rate is only 1/n. Letamn be the size of the encoded test set Te. In otherwords, a ]Tel/[ TD[ is directly proportional to theamount of compression achieved using the pro-posed encoding method.

Test-per-clock pattern application is thereforeachieved using a TRC without adding any addi-tional logic on the critical paths between the flip-flops and the inputs of the core under test. Hence,as in test-per-scan methods [14, 15], the proposedmethod imposes no additional performance de-gradation beyond scan. This is in contrast to mosttest-per-clock methods, which require mappinglogic between the flip-flops and the circuit undertest. Furthermore, the test-per-clock approachallows us to use slower, low-cost testers. Forexample, in order to apply all the test patterns intime T, a test-per-scan method requires the testerto run at frequency such that fscan--mn/T. Withour test-per-clock approach, the testing can becarried out in the same amount of time using atester that runs at frequency such thatfca amn/T. Moreover, an additional re(an- 1) patterns areapplied to the circuit in the same amount of time.This increases the likelihood of detecting non-modeled faults that are not explicitly targeted byTz.As in any test-per-clock approach, response

monitoring in the proposed method must be doneevery tester clock cycle. This can be carried outusing a combination of space compaction [16, 17]and multiple-input signature register. The responsemonitoring logic can be designed from the re-sponses of the core under test to the patterns inTo and to the additional patterns that are appliedby the TRC.We now address the issue of determining the

encoded test set Te from the given precomputedtest set To. We first note that for full-scan cores,since the patterns can be reordered, the encoding

problem can be formulated as the well-knowntraveling salesman problem (TSP) on a weighteddirected graph. Every pattern in To corresponds toa node in a directed graph G. An additional nodes in this graph corresponds to the all-0 initial stateof the pattern generator. The weight of an edge(x, y) in G equals the minimum number of cyclesrequired to generate Y=YlYz"’Yn from x---xlx2." "xn. This is determined from the smallestr _< n such that xl yr+ 1, x2 yr+ 2, Xn- y,,.It can now be easily seen that the minimum-sizeencoding of To corresponds to a minimum-costHamiltonian path (TSP) in G that starts at s.

Since the traveling salesman problem is NP-complete [10], we use a simple greedy algorithmto carry out the encoding. Starting from the all-0initial state S, we generate the pattern in To that isat the least "distance" from S. (The distance of yfrom x is given by the weight of the edge from xto y in G.) We then continue this process until allthe patterns in To have been generated. Figure 2illustrates the greedy algorithm for an exampletest set. The complexity of this algorithm is O(m2),and the complexity of generating the graph isO(man).The encoding algorithm can also be applied

to test sets containing partially-specified patterns(test cubes). In fact, significantly more compres-sion can be expected if To is partially-specified. Inorder to illustrate the encoding for test cubes, weintroduce the notion of compatibility between thebits xi and yj of test cubes x and y. We define xiand yj to be compatible if either (i) both xi and yare specified and equal, or (ii) at least one of thesetwo bits is a don’t care. Once again, we start fromthe all-0 s initial state and follow a greedy strategyof choosing a pattern at the least distance, i.e., anode in G connected by an outgoing edge from swith the least weight. The weight w(x,y) of theedge from x to y is determined as follows:w(x,y)= r, where r is the smallest integer suchthat xi and y+i are compatible, 1 <_ < n- r. Notethat the don’t-cares of every test cube generatedfrom an initial state are always uniquely mappedto ls and 0s.

Page 7: Efficient Test for Systems Using Counters*people.ee.duke.edu/~krish/Chandra_VLSI_Design.pdfvectors provided by the core vendor are stored in compressed form in the tester memory, and

EMBEDDED CORE TESTING

0lOXOX10XXlO11xx01lXlXOX

An example 7"0

Compressed 7"0

Initial state 2 cycles cycle000000 010XOX 1XlXOX (101000)

TS (o lOOOO) T|7"

cycle6 cycles 2 cycles

lOXXlX 11XXOl 1101ox (110100)(101111) TTSSTS(111101) 77"

Encoded test set (symbolic):

Encoded test set (binary): 101111110010

Size of TD (compressed) 24 bits, size of TE 12 bits.

’000000100000010000101000110100111010111101011110101110

Patterns applied tocore under test

FIGURE 5 An example illustrating compression and pattern generation for a partially-specified test set.

481

As an example, let us refer to Figure 5. Startingfrom the all-0 initial states, the first pattern fromTo that is applied to the core under test is

tl-010XOX. However, since the starting state iscompletely-specified, the two don’t-care bits oftl are mapped to 0s. The state of the patterngenerator is now 010000 (shown in the figure),and the next pattern from To that is applied tothe core under test is therefore t. 1X1XOX. Thethree don’t cares of this test cube are uniquelymapped to 0s. In general, this mapping is notunique; therefore long runs of ls and 0s can beobtained by suitably assigning values to the don’t-care bits. We have implemented the encodingalgorithm in C+ / and the experimental resultsfor the ISCAS benchmark circuits are reported inSection 4.

It is possible that a few of the patterns generatedby the TRC may be repetitions of patterns alreadyapplied to the core. However, this seems to occurrarely. For example, no patterns are repeated inthe examples of Figure 2 and Figure 4. A generalcharacterization of the sequences generated by thetest architecture of Figure 4 remains an interestingopen problem.The example in Figure 4 illustrates another key

advantage of the proposed compression/decom-pression approach. A test set containing test cubesis usually compressed to generate a compact,completely-specified test set that is stored in testermemory. This is typically carried out during auto-matic test pattern generation (ATPG) [9]. In the

example of the test set of Figure 4, the numberof bits required for storage can be reduced in thisway from 30 to 24. This also reduces the testingtime by 6 cycles for a test-per-scan testing scheme.On the other hand, if encoding is performed on thepartially-specified test set, only 12 bits are requiredfor storage. This represents a saving of 50% intester memory and test application time. Thepatterns applied to the core under test are alsoshown in Figure 4 (the patterns from Tz arehighlighted).

3. EXPERIMENTAL RESULTS

In this section, we evaluate the proposed testset compression/decompression method for theISCAS benchmark circuits. We use the followingmeasures in our evaluation: tester memory re-quirement, test application time, number ofpatterns applied to the core under test, and thetest rate. Test data compression is not the onlybenefit of using the proposed pattern applicationscheme. As our experimental results demonstrate,we achieve significantly higher test rates for test-per-clock architectures.

Table I presents the experimental results forfully-specified test sets generated using the MintestATPG program [9]. These experiments werecarried out using a Sun Ultra 10 workstation witha 333 MHz processor and 128 MB of DRAM. Wecompare the proposed test-per-clock scheme with

Page 8: Efficient Test for Systems Using Counters*people.ee.duke.edu/~krish/Chandra_VLSI_Design.pdfvectors provided by the core vendor are stored in compressed form in the tester memory, and

482 A. CHANDRA et al.

TABLE Experimental results for the full-scan ISCAS circuits with completely-specified testpatterns generated using Mintest [9]

Proposed method Test-per-scan

Circuit n rn a

Testing No. of Testing No. oftime patterns time patterns

(cycles) applied (cycles) applied

c432 36 27 0.8457c499 41 52 0.8908c880 60 16 0.9750c1355 41 84 0.9760c1908 33 106 0.8385c2670 233 44 0.9814c3540 50 84 0.8917c5315 178 37 0.9700c6288 32 12 0.8099c7552 207 73 0.9766s208 19 27 0.7837s298 17 23 0.7648s344 24 13 0.8334s349 24 23 0.8494s382 24 25 0.7934s386 13 63 0.5825s400 24 24 0.8125s420 35 43 0.8971s444 24 24 0.8334s510 25 54 0.7860s526 24 49 0.7241s641 54 21 0.9224s713 54 21 0.9400s820 23 93 0.7420s832 23 94 0.7392s838 67 75 0.9521s953 45 76 0.8886s1!96 32 113 0.8067s1238 32 121 0.7950s1423 91 20 0.9566s1488 14 101 0.5821s1494 14 100 0.5486s5378 214 97 0.9704s9234 247 105 0.9747s13207 700 234 0.9899s15850 611 95 0.9901

822 822 972 271899 1899 2132 52936 936 960 16

3361 3361 3444 842933 2933 3498 10610061 10061 10252 443745 3745 4200 846388 6388 6586 37311 311 384 12

14757 14757 15111 73402 402 513 27299 299 391 23260 260 312 13265 265 312 23476 476 600 25477 477 819 63468 468 576 241350 1350 1505 43480 480 576 241061 1061 1350 54838 838 1176 491046 1046 1134 211066 1066 1134 211587 1587 2139 931598 1598 2162 944784 4784 5025 753039 3039 3420 762917 2917 3616 1133078 3078 3872 1211741 1741 1820 20823 823 1414 101768 768 1400 100

20413 20413 20758 9725278 25278 25935 105162152 162152 163800 23457472 57472 58045 95

a conventional test-per-scan scheme for these testsets. The fractional reduction in tester memoryand testing time is given by 1-a. Furthercompression can be achieved by applying run-length coding (or other coding techniques) to theencoded test set, with the corresponding overheadof on-chip decoders. While the encoding procedurereduces the tester memory (and thereby testing

time) in all cases, the comparison becomes morestriking when we examine the number of testpatterns applied to the core under test during thisperiod. The number of patterns applied to the core(and hence the test rate) is increased several ordersof magnitude. The test quality is therefore signi-ficantly improved due to the increased likelihoodof detecting non-modeled faults.

Page 9: Efficient Test for Systems Using Counters*people.ee.duke.edu/~krish/Chandra_VLSI_Design.pdfvectors provided by the core vendor are stored in compressed form in the tester memory, and

EMBEDDED CORE TESTING 483

We next compare the proposed method with thecompression/decompression method of [5]. Thecompression reported in [5] is higher than thatobtained by us using compact test sets generatedby Mintest. However, the results reported in [5]were obtained using test sets with a considerablylarger number of patterns. Since we do not haveaccess to the test sets used in [5], we carry out acomparison using test sets generated by Atalanta[13]. The sizes of the test sets generated byAtalanta are similar to those used in [5], and thecompression achieved using by our method im-proves significantly for the larger test sets, therebyallowing a reasonable comparison.

Further compression can be expected by apply-ing run-length coding to TE. The results in Table IIshow that if the proposed encoding method is usedwith Atalanta test sets, the tester memory isreduced over [5] in a number of cases. Moreimportantly, the test-per-clock architecture allowsa much larger of patterns to be applied to the coreunder test.We next evaluate the proposed test-per-clock

method for partially-specified patterns (test cubes).We first use test patterns obtained by instrument-ing the Atalanta program, as well as test cubesfrom the Mintest program developed at University

of Illinois. Table III shows that considerable testcompression (over 10X in many cases) is obtainedfor all these test sets. The number of bits stored,i.e., the size of the encoded test, is equal to thetesting time (in cycles). It is also equal to thenumber of patterns applied to the core under test.

Surprisingly, these results also demonstrate thattest set compaction may not always be necessaryduring automatic test pattern generation. Wecompare the amount of test data obtained usingATPG compaction to the amount of test dataobtained by encoding the partially-specified testsets. (The amount of ATPG-compacted test datafor Mintest and Atalanta are obtained from TablesII and III, respectively.) The circuits for whichencoding of test cubes leads to less test data areshaded in Table III. We did not generate thecompacted test sets using Atalanta for the full-scanversions of the ISCAS 89 circuits hence a com-parison was not possible in this case.

Finally, we note that a >_ l/n, since the size ofthe encoded test set is at least m bits. Quiteremarkably, this lower bound is nearly achievedfor several circuits in Table III. For example, fors344 and s349, the lower bound on a is 0.0417.Using the Mintest test sets, we obtain a-0.0437and 0.0494, respectively.

TABLE II Comparison of TRC-based pattern generation for completely-specified Atalanta test sets with test-per-scan method of [5]

Proposed method

Circuit rn a IZl

Testingtime

(cycles)

Test per scan [5]

No. of Testing No. ofpatterns time patternsapplied m ITel (cycles) applied

c432 49 0.84 1489c499 53 0.87 !898c880 53 0.93 2956c1355 84 0.85 2930c1908 144 0.79 3791c2670 103 0.97 23279c3540 179 0.86 7678c5315 122 0.95 20775c6288 40 0.86 1100c7552 215 0.96 42952

14891898295629303791

232797678

207751100

42952

1489 59 1608 1608 591898 58 1449 1449 562956 80 3930 3930 802930 103 2910 2910 1033791 130 3159 3159 130

23279 133 24405 24405 1337678 198 7752 7752 198

20775 214 30501 30501 2141100 36 912 912 36

42952 271 42528 42528 271

Page 10: Efficient Test for Systems Using Counters*people.ee.duke.edu/~krish/Chandra_VLSI_Design.pdfvectors provided by the core vendor are stored in compressed form in the tester memory, and

484

Circuit m

A. CHANDRA et al.

TABLE III Experimental results (test-per-clock) for partially-specified test sets

Atalanta test sets Mintest test sets

No. of bits Amount of No. of bitsstored for test data stored forproposed obtained proposedencoding using ATPG encodingmethod compaction m a method

Amount oftest dataobtained

using ATPGcompaction

c432 128c499 93c880 355c1355 143c1908 434c2670 639c3540c6288 201c5315 1213c7552 845s208 64s298 93s344 85s349 85s382 102s386 102s400 96s420 129s444 107s510 103s526 160s641 200s713 192s820 195s832 199s838 257s953 227s1196 291s1238 164s1423 89s1488 252s1494 250s5378s9234s13207s15850

0.3273 1508 1764 211 0.23870.3913 1492 2173 114 0.65690.2605 5548 3180 406 0.16760.8136 4770 3444 198 0.63270.6919 9909 4752 262 0.57490.2652 39483 23999 775 0.1874

891 0.13060.5176 3329 1280 202 0.56800.3094 66789 21716 1552 0.20180.5030 87929 44505 1133 0.29290.2459 299 81 0.13910.0981 155 127 0.06490.1746 356 129 0.04370.1697 346 126 0.04940.1336 327 118 0.10350.3824 507 120 0.22310.1225 289 120 0.07960.2012 908 161 0.13210.1036 266 128 0.08890.2804 722 130 0.06870.1857 713 207 0.10450.0987 1065 231 0.13950.1396 1447 220 0.13980.2973 1333 253 0.12190.2740 1254 255 0.12520.1456 2506 319 0.12650.1513 1545 255 0.13090.1773 1651 346 0.31960.1916 1005 360 0.34310.1334 1080 524 0.12850.4159 1467 317 0.13700.2666 933 311 0.1282

1459 0.08271929 0.15843237 0.06343290 0.0779

1813 9723070 21324082 6405136 34444970 349833839 102525818 42033671 384

55748 658668694 15111

214 513140 391135 312149 552293 600348 819229 576744 1505273 576223 1350519 11761740 11341661 1134709 2139734 2162

2703 50251502 34203539 36163953 38726127 1820608 1414558 1400

25821 2075875471 25935143658 163800156593 58045

4. CONCLUSIONS

We have presented a novel test vector compressionmethod and decompression architecture for em-bedded core testing. These are based on the use ofa twisted-ring counter, which can operate in boththe shift and twist modes. The serial bit stream

that controls the operation of the countercomprises the encoded test set. The proposedmethod offers a number of important advan-tages- significant test compression (over 10X inmany cases), less tester memory and reducedtesting time, the ability to use a slow tester withoutcompromising test quality or testing time, and

Page 11: Efficient Test for Systems Using Counters*people.ee.duke.edu/~krish/Chandra_VLSI_Design.pdfvectors provided by the core vendor are stored in compressed form in the tester memory, and

EMBEDDED CORE TESTING 485

no performance degradation for the core undertest.The decompressed patterns are applied test-per-

clock, thereby applying a large number of patternsto the core under test and increasing the likelihoodof detecting non-modeled faults. Experimental re-suits for the ISCAS benchmark circuits demon-strate that the proposed test architecture offersan attractive solution to the problem of achievinghigh test quality and low testing time with rela-tively slower, low-cost testers. The experimen-tal results also show that the encoded test setsobtained from partially-specified test sets (testcubes) are often smaller than the compacted testsets generated by automatic test pattern generationprograms. The proposed scheme has a limitationthat it can be only applied to register based designssuch as the designs which employ built-in logic-block observation (BILBO) registers.

[11] Brglez, F. and Fujiwara, H. (1985). A neutral netlist of 10combinational benchmark circuits and a target simulatorin Fortran, Proc. International Symposium on Circuits andSystems, pp. 695-698.

[12] Brglez, F., Bryan, D. and Kozminski, K. (1989)."Combinational profiles of sequential benchmark cir-cuits", Proc. 1989 lnt. Symposium on Circuits and Systems,pp. 1929-1934.

[13] Lee, H. K. and Ha, D. S., "On the Generation of TestPatterns for Combinational Circuits", Tech. Rep. No.12_93, Dept. of Electrical Eng., Virginia Poly. Instituteand State Univ.

[14] Rajski, J., Tyszer, J. and Zacharia, N. (1998). "Testdata decompression for multiple scan designs withboundary scan", IEEE Transactions on Computers, 47(11),1188-1200.

[15] Zacharia, N., Rajski, J., Tyszer, J. and Waicukauski, J. A.(1996). "Two-dimensional test data decompressor formultiple scan designs", Proc. International Conference,pp. 186-194.

[16] Chakrabarty, K., Murray, B. T. and Hayes, J. P. (1998)."Optimal zero-aliasing space compaction of test re-sponses", IEEE Transactions on Computers, 47(11),1171-1187.

[17] Savir, J., "Shrinking wide compressors", IEEE Transac-tions on Computer-Aided Design, 14, 1379-1387, Nov.,1995.

References

[1] Zorian, Y., Marinissen, E. J. and Dey, S. (1998). "Testingembedded-core based system chips", Proc. InternationalTest Conference, pp. 130 143.

[2] Immaneni, V. and Raman, S. (1990). "Direct access testscheme-design of block and core cells for embeddedASICs", Proc. 1990 International Test Conference,pp. 488-492.

[3] Cheng, K.-T. and Lin, C.-J. (1995). "Timing-driven testpoint insertion for full-scan and partial-scan BIST", Proc.1995 International Test Conference, pp. 506-514.

[4] Fagot, C., Girard, P. and Landrault, C. (1997). "On usingmachine learning for logic BIST", Proc. International TestConference, pp. 338-346.

[5] Jas, A. and Touba, N. A. (1998). "Test vector decom-pression via cyclical scan chains and its application totesting core-based designs", Proc. International TestConference, pp. 458-464.

[6] Chakrabarty, K., Murray, B. T. and Iyengar, V. (1999)."Built-in test pattern generation for high performancecircuits using twisted-ring counters", Proc. VLSI TestSymposium, pp. 22-27.

[7] http://notes.sematech.org/97pelec.htm, The NationalTechnology Roadmap for Semiconductors (NTRS), Sili-con Industry Association (SIA), 1997.

[8] Agrawal, V. D. and Chakraborty, T. J. (1995). "High-performance circuit testing using slow-speed testers",Proc. International Test Conference, pp. 302-310.

[9] Hamzaoglu, I. and Patel, J. H. (1998). "Test set com-paction algorithms for combinational circuits", Proc.International Conference on CAD, pp. 283-289.

[10] Garey, M. S. and Johnson, D. S. (1979). Computers andIntractability: A Guide to the Theory of NP-Completeness,Freeman, W. H. and Company, New York.

Authors’ Biographies

Anshuman Chandra received the B.E. degree inElectrical engineering from the University ofRoorkee, Roorkee, India, in 1998, M.S. degree inElectrical and Computer Engineering from DukeUniversity, Durham, USA, in 2000, and iscurrently enrolled in the Ph.D. program at DukeUniversity. His research interests are in the fieldsof VLSI Design, Digital Testing and ComputerArchitecture. He is a recipient of TTTC JamesBeausang Student Paper award for DFT for apaper published in Proc. 2000 IEEE FLSI TestSymposium. He is currently working in the areasof test set compression/decompression, embeddedcore testing and built-in self test (BIST). He is astudent member of IEEE.Krishnendu Chakrabarty received the B. Tech.degree from the Indian Institute of Technology,Kharagpur, in 1990, and the M.S.E. and Ph.D.degrees from the University of Michigan, AnnArbor, in 1992 and 1995, respectively, all inComputer Science and Engineering. He is nowAssistant Professor of Electrical and Computer

Page 12: Efficient Test for Systems Using Counters*people.ee.duke.edu/~krish/Chandra_VLSI_Design.pdfvectors provided by the core vendor are stored in compressed form in the tester memory, and

486 A. CHANDRA et al.

Engineering at Duke University. Dr. Chakrabartyis a recipient of the National Science Foundation’sEarly Faculty (CAREER) award, the office ofNaval Research Young Investigator Award andthe Mercator Professor award from the DeutscheForschungsgemeinschaft, Germany for carryingout research at the University of Potsdam during2000-2001. His current research projects (sup-ported by NSF, ONR, DARPA and industrialsponsors) are in system-on-a-chip test, embeddedreal-time operating systems, distributed sensornetworks, and architectural optimization of micro-electrofluidic systems. He has published over 65papers in archival journal and refereed conferenceproceedings, and he holds a US patent in built-inself-test. He is a senior member of IEEE, member

of Sigma Xi, serves as Vice Chair of TechnicalActivities in IEEE’s Test Technology TechnicalCouncil, and is a member of the program com-mittees of several IEEE/ACM conferences andworkshops.Mark C. Hansen supervises the IC Design Auto-mation Operations groups at Delphi Delco Elec-tronics Systems in Kokomo, Indiana. His researchinterests include functional-level fault modeling,design for test, ATPG, BIST, and analog test. Heholds a dozen patents. Hansen received a BSEEfrom Michigan State University, and MSEE fromCarnegie Mellon University, and a Ph.D. in Com-puter Science and Engineering from Universityof Michigan, Ann Arbor. He is a member of theIEEE and the IEEE Computer Society.