
Algorithms for Design-Automation - Mastering Nanoelectronic Systems

Logic Diagnosis With Improved Resolution

ALEJANDRO COOK

INFOTECH

SUMMER SEMESTER 2007

SUPERVISOR:

STEFAN HOLST

July 17, 2007


Table of contents

1 Introduction
2 Diagnosis in general
2.1 Logic Diagnosis
2.1.1 Fault tuples
2.2 Volume and Precision Diagnosis
2.3 Cause-Effect and Effect-Cause Diagnosis
2.4 Fault Diagnosis Objectives
2.4.1 Fault localization
2.4.2 Fault Identification
3 DIAGNOSIX: A diagnosis methodology
3.1 Overview
3.2 Stage 1: Defect Localization
3.2.1 Path-tracing
3.2.2 Per-test diagnosis
3.2.3 Passing pattern validation
3.3 Stage 2: Behavior Identification
3.3.1 Cover Forest analysis
3.3.2 Neighborhood function extraction
3.4 Stage 3: Behavior validation
4 Results
4.1 Applicability
5 Conclusions
References


1 Introduction

The periodic advances in semiconductor technology have posed serious challenges to the manufacturing process of VLSI circuits. The higher level of integration in each process generation has led to an increasing level of complexity in today's designs, and thus more resources and effort have recently been devoted to test and diagnosis.

The goal of testing is to determine the presence of defects in a single chip; diagnosis, on the other hand, focuses on their location and identification. The diagnosis information can later be used to improve the outcome of the manufacturing process. This is especially important during the first spins of a new design in a new process technology, when the yield is usually very low and more detailed defect information can provide valuable insight into the cause of the problems.

Yield learning is defined as the collection and application of process knowledge to improve yield by identifying and locating systematic and random manufacturing events [8]. The complexity of today's designs is hard to model, and it is therefore very difficult to obtain accurate defect information by simulation; this makes yield learning extremely important during the first silicon of new designs, because it characterizes defects at the physical level. With this information in mind, the designers are able to modify the circuit in order to decrease the number of systematic defects, and hopefully obtain a steep yield ramp in the next process cycle. Rapid yield learning is key to the success of the electronics industry, where time-to-volume makes a difference in market share and lost revenue.

Diagnosis can be defined as the process of identifying and locating faults in a chip. It uses different levels of abstraction to discover potential physical defects. It is a software-based approach and, under certain assumptions, is essentially independent of the physical complexity of the device under diagnosis (DUD); it can therefore be used to analyze large amounts of data [4]. Fault diagnosis is often the first step towards identifying and locating defects, and its goal is to enable physical failure analysis (PFA), that is to say, a closer, more detailed look at the physical cause of the failure.

As the size and complexity of chips increase, diagnosis approaches using simple fault models are becoming more limited. In this paper, DIAGNOSIX, a new methodology for fault diagnosis, is introduced. DIAGNOSIX is not heavily constrained by the assumptions of traditional fault models;


instead of checking whether a static fault model explains the defect behavior, DIAGNOSIX extracts a consistent fault model from the given set of test patterns and the physical neighborhood surrounding suspect lines. Although DIAGNOSIX does rely on some key assumptions regarding defects, they are considered to be weaker than those used in traditional approaches.

In the next pages, fault diagnosis is presented in general to provide a context for the discussion of DIAGNOSIX. Then, a few definitions relevant to the new methodology are explained and the underlying defect assumptions are explicitly formulated. In the following pages, the complete methodology, its evaluation and its applicability are covered in detail.

2 Diagnosis in general

2.1 Logic Diagnosis

It is possible to distinguish between several strategies in the context of VLSI diagnosis. Logic diagnosis refers to the diagnosis of random logic, and it makes use of the knowledge and technology available in the logic testing domain. As in testing, if the structure of the target circuit is regular and known to fail in specific ways, specialized techniques may yield better results; for example, the diagnosis of on-chip memories calls for different algorithms and fault models from those used in logic diagnosis. Scan chain diagnosis also has particular needs and is addressed with a different set of tools.

2.1.1 Fault tuples

The single stuck-at line (SSL) fault model has so far been the most common fault model for testing random logic in VLSI circuits; however, according to recent studies [2], this abstraction does not accurately match real physical defect behavior, and thus makes diagnosis tasks difficult. Several more accurate fault models have been explored to overcome this limitation, and their number is likely to keep increasing in coming years due to more complex designs and the increasing parametric variations in the manufacturing process of future circuit geometries.

In this context, a new fault modeling mechanism has been proposed to represent arbitrary misbehaviors by making use of fault tuples. Fault tuples can be combined in order to describe arbitrarily complex faults, and they provide a generalization that expresses known fault models in a single notation. This way, it is possible to perform simultaneous analysis of many defect mechanisms using the same methodology.

A fault tuple is a 3-tuple, represented as ⟨l, v, t⟩, where l denotes a signal line, v a value, and t a clock condition. The parameter l ranges over the signal lines of the circuit, v ∈ {0, 1, D, D̄}, and t ranges over a small set of clock-cycle conditions, such as i (a reference clock cycle) and iN (within N cycles after cycle i).

A fault tuple is said to be satisfied if the line l is controlled to the value v within the time frame represented by t. Additionally, the misbehavior (if any) must be propagated to an observable output.

The parameter t specifies a clock cycle condition for controlling a line, and its meaning depends on the value to be assigned. For example, if v ∈ {0, 1} and t = i, l must be controlled to v by the ith clock cycle. On the other hand, if v ∈ {D, D̄} and t = i, the discrepancy D or D̄ must manifest itself by the ith clock cycle. Likewise, the condition t = iN means a constraint within N clock cycles after the reference cycle i. The remaining conditions can be explained in a similar manner.

A product of tuples of the form ⟨l1, v1, t1⟩ · ⟨l2, v2, t2⟩ is satisfied if and only if all tuples in the expression are satisfied. Several products can be joined together into a macrofault in order to model arbitrary misbehaviors.
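To make the notation concrete, the following Python sketch models fault tuples and checks the satisfaction of products and macrofaults against a simulated trace of line values. All class and function names are illustrative assumptions, not part of the fault-tuple notation of [2], and the clock condition is simplified to "by the ith cycle".

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class FaultTuple:
        line: str    # signal line l
        value: str   # one of '0', '1', 'D', 'Dbar'
        clock: int   # simplified clock condition t: "by the ith cycle"

    def tuple_satisfied(ft, trace):
        # trace[line] is the list of simulated values of a line, one per cycle.
        # The tuple <l, v, i> is taken as satisfied if l carries v by cycle i.
        values = trace.get(ft.line, [])
        return ft.value in values[:ft.clock + 1]

    def product_satisfied(product, trace):
        # A product of tuples is satisfied iff every tuple in it is satisfied.
        return all(tuple_satisfied(ft, trace) for ft in product)

    def macrofault_satisfied(macrofault, trace):
        # A macrofault joins several products; it is satisfied if any one is.
        return any(product_satisfied(p, trace) for p in macrofault)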

2.2 Volume and Precision Diagnosis

Fault diagnosis may pursue two different purposes in the yield learning process. In volume diagnosis, a large number of failing chips are analyzed to discover systematic defects. If a given number of chips show the same faulty behavior, it is likely that the same physical defect is present in all of them, and the manufacturing process can be improved to reduce its occurrence. Volume diagnosis is constrained by the large amounts of data it must handle and the time it takes to analyze the behavior of every chip.

Precision diagnosis, on the other hand, is performed on a small number of failing chips, like the first silicon or a sample chip from a group of chips with the same systematic fault. Its objective is to locate faults with sufficiently high resolution and provide detailed information on the nature of the physical defect. In precision diagnosis, considerable effort is devoted to obtaining enough accuracy to guide PFA.

2.3 Cause-Effect and Effect-Cause Diagnosis

There exist two classic approaches to fault diagnosis. Cause-effect diagnosis relies on a fault model to predict the output values of a faulty chip. In this approach, a fault dictionary is constructed by pre-calculating the set of possible responses to all input patterns for all modeled faults; in the next step, the output responses of the DUD are compared and matched against the fault dictionary to identify potential faults.
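A minimal sketch of this dictionary-based flow is given below, assuming a fault-simulation hook simulate(fault, pattern) that returns the predicted output response; the function names are illustrative.

    def build_fault_dictionary(faults, patterns, simulate):
        # Pre-calculate, for every modeled fault, the tuple of responses
        # to all input patterns (its signature).
        return {f: tuple(simulate(f, p) for p in patterns) for f in faults}

    def match_responses(dictionary, observed_responses):
        # Candidate faults are those whose precomputed signature matches
        # the responses observed on the DUD.
        observed = tuple(observed_responses)
        return [f for f, sig in dictionary.items() if sig == observed]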

The main disadvantage of this technique is that it depends on fault models, and these might not accurately describe the real defect mechanisms in complex CMOS technologies. For this reason, finding a suitable fault model becomes one of the main challenges in cause-effect analysis.

A second disadvantage of cause-effect algorithms is that they assume a single fault in the DUD, because the huge number of fault sites in a multiple fault model makes the generation of the fault dictionary infeasible. Furthermore, even when all individual faults can be identified, multiple faults are not guaranteed to be detected in the presence of fault masking [10]. The single fault assumption is of course broken when two or more faults are present in the chip, or when a defect manifests itself as multiple faults in the chosen fault model.

In effect-cause diagnosis, the chip outputs are observed and deductive reasoning is performed, from primary outputs towards primary inputs, in order to identify potential faults consistent with the output responses. This technique does not make use of a fault dictionary, since the fault information is processed directly when the defect symptom is encountered during testing. Moreover, this approach is better suited to analyzing the occurrence of multiple faults and their masking relations, because these problems can be handled within the logic reasoning used to detect and locate faults.

The main drawback of the algorithms in this approach stems from the computational effort required

to infer potential faults that explain the faulty behavior.


2.4 Fault Diagnosis Objectives

2.4.1 Fault localization

Fault localization techniques usually employ simple fault models and matching algorithms to identify a faulty signal line. The location of this signal is used as a first approximation for PFA. Diagnosis methods for fault localization, like the one introduced in [10], use a cause-effect approach and rely heavily on fault simulation to build a fault dictionary.

Fault simulation with diagnosis in mind requires a different approach from the one used in testing to compute fault coverage. Traditionally, in order to reduce simulation time, a fault is removed from the simulation set immediately after it has been detected by one input pattern; however, this measure limits the resolution of the fault dictionary. The reason is that many different faults can produce the same response to the same input pattern, and the responses to other patterns need to be known to narrow down the number of potential faults.

A diagnosis simulator needs to simulate a fault until it can uniquely identify it based on the output responses this fault produces; that is to say, a fault can be dropped only when the simulator finds a difference between the observed failing output and the failing output expected from this fault.
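This dropping criterion can be sketched as follows, assuming simulate(fault, pattern) returns the predicted failing outputs and observed_fails holds the tester data per pattern; all names are illustrative.

    def diagnostic_fault_simulation(faults, patterns, observed_fails, simulate):
        # Unlike coverage-oriented simulation, a fault is dropped only when
        # its predicted behavior diverges from the observed behavior.
        candidates = set(faults)
        for pattern, observed in zip(patterns, observed_fails):
            for fault in list(candidates):
                if simulate(fault, pattern) != observed:
                    candidates.discard(fault)  # contradicts the tester data
        return candidates  # faults consistent with every applied pattern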

The size of the fault dictionary becomes a problem, as the responses of all faults to many input patterns need to be stored; furthermore, most of the failure information is not used at all, since only a small fraction of the simulated faults will actually be present in the DUD. A way to overcome this situation is suggested in [10], where fault simulation is performed after testing and the diagnosis is guided towards the relevant faults according to the observed behavior. The key point in this optimization is to select the appropriate faults to include in the simulation, so as to achieve good fault resolution without so much computational overhead that the process becomes too time-consuming.

The methods outlined in the previous paragraphs still suffer from the disadvantages of the cause-effect paradigm, and their inability to describe defect mechanisms precisely has become a serious obstacle to their performance in recent years.

2.4.2 Fault Identification

Techniques for fault identification try to assess the presence of a presumed fault in the DUD. They attempt to circumvent the limitations of a simple fault model either by exploring more complex models [1], or by using a fault-model-independent approach [5], [6].

The advantage of fault identification over fault localization is that it captures the faulty behavior in a way that can be mapped back to real physical defects. When complex fault models are used, however, the success of this approach again depends on how accurately they represent the defects in the DUD. In addition, complex fault models need a higher computational effort to simulate, and their use might therefore become very time-consuming or even infeasible for large circuits. The use of multiple fault models is not guaranteed to yield optimal results either, since the chip may fail in unexpected ways, and it is unlikely that all defect mechanisms will be exactly matched.

Other fault identification methods weaken the requirements on the fault models by directly extracting the defect behavior from the DUD responses [4]. These techniques use an effect-cause analysis to obtain a first approximation of the faulty sites and, in a second stage of the algorithm, try to improve the resolution. One possible way to improve resolution is to use the faulty sites and DUD responses to guide automatic test pattern generation (ATPG) [5]. A second alternative is to reason over the behavior of the faulty lines by studying the passing patterns. This approach is used in DIAGNOSIX and will be explained further when the algorithm is detailed.

3 DIAGNOSIX: A diagnosis methodology

3.1 Overview

The goal of DIAGNOSIX is to identify logical faults that accurately capture the defect mechanism in the DUD. These logical faults are extracted from the logic behavior of the circuit and its layout information and hence need not be instances of a traditional fault model. Logical faults consist of both faulty signal lines and the set of logical conditions that produce the faulty behavior; they can thus model any defect that produces repeatable logic-level misbehavior.

The inputs to DIAGNOSIX, as in most diagnosis methods, include the complete logic-level description of the DUD, the set of passing test patterns, and the failing patterns with their corresponding failing outputs. Additionally, DIAGNOSIX requires some limited layout information so that the neighborhood of suspect faulty lines can be determined.

DIAGNOSIX was designed for combinational circuits, so it is assumed that the DUD is purely

combinational or implements full scan.


The methodology of DIAGNOSIX relies on a few simplifying assumptions in order to make the diagnosis problem tractable. The first assumption states that a defect must cause one or more failing outputs for at least one input test pattern; additionally, the faulty components in the circuit (transistors, gates, interconnects) must be located within a given intra-layer physical distance from the faulty line. In the case of multiple faults, the same methodology is applicable if all faults are, in this sense, localized to a specific region of the DUD. For instance, if a fault is detected on line s3 in Figure 1, the responsible defects are restricted to those within a radius r of s3. In this case, the lines s2, s4 and s5 are also considered in the following steps of the algorithm, while the lines s1 and s6 are discarded from further analysis.

The second simplifying assumption requires that the behavior caused by the defects not change over time; that is to say, defects must be repeatable, and their behavior must always be reproducible with the same set of input patterns. Finally, it is assumed that the scan circuitry is fault-free and cannot, by itself, produce failing outputs.

In the context of the DIAGNOSIX methodology, the neighborhood of a line li is defined as the set of lines that are physical neighbors of li, that drive li or one of its physical neighbors, or that drive a gate which is also driven by li (side-inputs).

Figure 1. The defects are localized to a region of radius r from the faulty line s3 [4].

Figure 2(a) shows a bridge fault between lines S6 and S9, and Figure 2(b) shows some layout information for the same circuit. It can be seen that S9 has two physical neighbors, namely S7 and S10. The neighborhood of S9 thus comprises S7 and S10, the drivers of S9 (S3, S4 and S8), the drivers of S7 (S1 and S2), and the drivers of S10 (S4 and S5). Even if S10 were not a physical neighbor of S9, it would still be part of the neighborhood of S9 because it drives S14, which is also driven by S9.

Likewise, the neighborhood state of a line li, for a pattern ti, is the set of fault-free logical values present in the neighborhood of li when ti is applied to the DUD. The localization assumption implies that the neighborhood state of a line determines when it becomes faulty [4]; in turn, this generalization usually captures the behavior of many real defects such as bridges, opens, transistor stuck-opens and gate-oxide shorts.
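Under the stated assumptions, both definitions can be sketched as follows; the drivers, fanout side-inputs and radius-r physical neighbors are assumed to come from the netlist and layout extraction, and all names are illustrative.

    def neighborhood(line, drivers, side_inputs, physical_neighbors):
        # drivers[l]: lines feeding the gate that drives l
        # side_inputs[l]: other inputs of the gates driven by l
        # physical_neighbors[l]: lines within radius r of l in the layout
        nbrs = set(physical_neighbors[line])       # physical neighbors of li
        nbrs |= set(drivers[line])                 # drivers of li itself
        for p in physical_neighbors[line]:
            nbrs |= set(drivers[p])                # drivers of its neighbors
        nbrs |= set(side_inputs[line])             # side-inputs
        nbrs.discard(line)
        return nbrs

    def neighborhood_state(line, fault_free_values, nbrs):
        # Fault-free logical values on the neighborhood lines for one pattern,
        # in a fixed line order so that states can be compared across patterns.
        return tuple(fault_free_values[n] for n in sorted(nbrs))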

Figure 2. Neighborhood of a line: (a) gate-level circuit.

DIAGNOSIX is an effect-cause methodology for precision diagnosis, and it is capable of performing both fault localization and fault identification. Figure 3 shows the procedural flow of the diagnosis activities. In the first stage, the failure information from the tester is analyzed and the suspect faulty lines are identified; then, in the second stage, the logical conditions that make the target lines faulty are determined by studying the test pattern responses, the logic-level description of the DUD and its layout information. The observed fault, the suspect lines and their neighborhood states make up an extracted fault model that explains the real behavior of the chip.


Figure 2. Neighborhood of a line: (b) layout information

This model is verified again in the behavior validation stage, when the derived faults are simulated and compared against the DUD responses. In order to better represent and simulate faults, fault tuples are used in this methodology, since they can describe arbitrary trigger conditions.

In the last stage of DIAGNOSIX, focused ATPG, new test patterns are generated to improve accuracy, confidence and resolution.

The final output of the algorithm is a set of candidate faults. The accuracy of a candidate is the ratio between the number of patterns that can be explained by this fault and the total number of patterns applied to the DUD; a fault explains a given pattern if the result of the fault simulation matches the response of the real DUD. Resolution is the inverse of the number of candidates. Finally, confidence is the ratio between the number of neighborhood states produced by all test patterns and the total number of possible states in the circuit.
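A sketch of the three metrics as defined above; explains(candidate, pattern) is an assumed predicate comparing the fault simulation of the candidate with the DUD response.

    def accuracy(candidate, patterns, explains):
        # Fraction of applied patterns whose responses the candidate explains;
        # 1.0 means the candidate explains every pattern.
        explained = sum(1 for p in patterns if explains(candidate, p))
        return explained / len(patterns)

    def resolution(candidates):
        # Fewer remaining candidates means better resolution.
        return 1.0 / len(candidates)

    def confidence(observed_states, total_possible_states):
        # Share of the possible neighborhood states exercised by the test set.
        return len(observed_states) / total_possible_states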


Figure 3. Overview of DIAGNOSIX [4]

3.2 Stage 1: Defect Localization

The main purpose of this stage is to find potential faulty lines, that is, to identify signal lines that

are likely to be influenced by a physical defect. To achieve this localization, three main steps are

performed: path-tracing [9], per-test diagnosis [6] and passing pattern validation (PPV) [3].

3.2.1 Path-tracing

Path-tracing, in its simplest form, statically traces the output responses back to the inputs

through the combinational gates so that an input cone can be constructed for each observable

output. All the signal lines in the input cone are identified as potential faulty lines that could

cause the observed faulty behavior.

It is possible to reduce the number of candidate lines if the results of fault-free simulation are taken into account when selecting lines from the input cone; this is called dynamic path-tracing. With this approach, it can be proven that some signal lines cannot be responsible for the faulty outputs, and they can therefore be safely removed from further consideration.

According to one of DIAGNOSIX's key assumptions, path-tracing in this step must only be

concerned with combinational logic. Fortunately, this problem is well understood, and the

existing algorithms are guaranteed to find all possible candidate signals for a given observed

failure [6].

Since the ultimate purpose of this methodology is to identify defect behavior, the output of the path-tracing procedure is augmented with additional information: the polarity of the error on a line is also annotated in the process output. This information is obtained anyway during dynamic path-tracing and can reduce simulation time in subsequent steps of the localization strategy.

Formally, the output of the path-tracing step is a set Sp holding entries of the form li/v, where li is a suspect faulty line and v ∈ {0, 1} is its error polarity.
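A simplified sketch of the tracing idea: starting from the failing outputs, only inputs that could actually have changed a gate's output under the fault-free values remain suspects. The gates and good_values structures and the gate types are illustrative assumptions; real implementations [6], [9] are considerably more involved.

    # Controlling input values per gate type: if such an input is present,
    # only the controlling inputs can be responsible for the gate's output.
    CONTROLLING = {"AND": "0", "NAND": "0", "OR": "1", "NOR": "1"}

    def dynamic_path_trace(failing_outputs, gates, good_values):
        # gates[l] = (gate_type, input_lines) describes the driver of line l;
        # good_values[l] is the fault-free value of l for the failing pattern.
        suspects, stack = set(), list(failing_outputs)
        while stack:
            line = stack.pop()
            if line in suspects:
                continue
            suspects.add(line)
            if line not in gates:
                continue                      # primary input: stop tracing
            gate_type, inputs = gates[line]
            cv = CONTROLLING.get(gate_type)
            ctrl = [i for i in inputs if good_values[i] == cv]
            # Trace only the controlling inputs when they exist; otherwise
            # every input could have flipped the output and all are traced.
            stack.extend(ctrl if ctrl else inputs)
        return suspects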

3.2.2 Per-test diagnosis

Although the number of lines in the output set of the path-tracing step is much smaller than the total number of lines in the DUD, it can still be large enough to be an obstacle for precision fault diagnosis. This is why per-test diagnosis is used to further reduce the number of potential faulty lines.

Per-test diagnosis is a fault localization technique that attempts to model realistic defect behavior by making use of simple fault models that can be efficiently simulated in software tools. This method is also known as single location at a time (SLAT), because it uses only those patterns for which the defect behavior can be explained by a single fault location.

SLAT makes use of the stuck-at fault model to gather defect information; however, it is not assumed that this model characterizes the defect behavior. When a test pattern is applied to the DUD, the defect is modeled as a set of stuck-at faults, but these faults need not be present consistently across all test patterns. For instance, one input pattern may reveal a stuck-at-0 on a given line, while for a different pattern the same line might behave as stuck-at-1, or even as a fault-free line.

In the context of combinational logic, one fundamental assumption is needed for SLAT diagnosis to become feasible: all observed fails for an input pattern can be explained exactly by at least one stuck-at fault that affects a single pin. This assumption is known as the SLAT property, and it means that, independently of the fault activation condition, the failing behavior of each pattern can be modeled by stuck-at faults on a single line. One of the advantages of the SLAT property is that it allows the already available set of tools and knowledge base for stuck-at faults to be used to study other kinds of defects more realistically.

The input patterns can be classified into SLAT patterns, that is, failing patterns with the SLAT property, non-SLAT patterns, and passing patterns. In per-test diagnosis only SLAT patterns are studied, whereas the passing patterns are considered in later stages of the DIAGNOSIX methodology.

There are two main risks in this methodology. Firstly, it may happen that a defect, in response to a given input pattern, causes errors on more than one line, but the errors propagate to observable outputs in such a way that the fails can be explained by a single stuck-at fault. This may lead to a simplified scenario where a complex defect mechanism is modeled as a simpler one. Secondly, a very serious risk arises when none of the patterns has the SLAT property and the diagnosis fails. The first risk is unavoidable in any diagnosis strategy, while the second is considered low, since there are usually enough patterns with the SLAT property [4].

The SLAT diagnosis within the DIAGNOSIX methodology proceeds as follows: the set of faults Sp, obtained from the path-tracing step, is used as the initial set of candidate faults, and the input patterns are simulated to identify all the SLAT patterns. Only the faults that independently explain the output responses for one or more patterns, and thereby justify the SLAT property, are stored in a table along with their SLAT patterns. As Figure 5 shows, all faults are simulated for all patterns in a double loop.

The information in the table is represented as a set of temporary stuck lines (TSL). A TSL is written li/v(TK), meaning that line li is stuck at value v for a given set of test patterns TK, which belongs to the total set of patterns T (TK ⊆ T).

Each TSL li/v(TK) is ranked by counting the number of patterns it explains: a given TSL li/vi(TK) is considered better than another TSL lj/vj(TL) if TL ⊂ TK ⊆ T. This ranking comes from the intuitive idea that the fault that explains the most patterns is more likely to be closer to the actual defect location. With this consideration in mind, a cover forest is constructed to represent the faults and their relationships. Formally, there is an edge between vertices li/vi(TK) and lj/vj(TL) if TL ⊂ TK ⊆ T; the higher ranked faults are always placed closer to the roots, and the faults that explain patterns for which there are no other explaining faults are placed at the roots of the forest. Figure 6 shows an example of a cover forest; in this case T1 ≠ T2, and T5 ⊂ T2 ⊂ T1 ⊆ T.
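A sketch of the ranking and the forest edges, assuming per-test diagnosis has produced tsl_table mapping each TSL fault (line, value) to the set of SLAT patterns it explains; all names are illustrative, and the root computation is a simple approximation of the rule above.

    def build_cover_forest(tsl_table):
        # Edge (a, b): fault a explains a strict superset of b's patterns,
        # so a is ranked higher and placed closer to the roots.
        edges = [(a, b)
                 for a, Ta in tsl_table.items()
                 for b, Tb in tsl_table.items()
                 if a != b and Tb < Ta]          # T_L proper subset of T_K
        subsumed = {b for _, b in edges}
        roots = set(tsl_table) - subsumed        # faults no other fault covers
        return roots, edges

    # E.g. for the running example: {('s9', '1'): {'t1', 't2'},
    #                                ('s12', '0'): {'t1', 't2'}}
    # neither fault subsumes the other, so both appear as roots.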


Figure 5. SLAT diagnosis flow [6]

Figure 6. Example cover forest


3.2.3 Passing pattern validation

The cover forest described in the previous section lists the identified potential fault locations; however, due to fault dominance and equivalence, some of these faults may lead to incorrect results. In this step of the methodology, the assumptions described previously are exercised to discard misleading TSL faults.

The problem at hand comes from the presence of equivalent faults in the cover forest, that is, faults that behave identically for the same input patterns. The simplest example of this situation is a defect on the input of a NOT gate that manifests itself as a stuck-at-0 li/0(TK). The cover forest for this example will include li/0(TK), the correct fault location, but will also contain the incorrect fault lo/1(TK) on the output of the NOT gate.

Passing pattern validation (PPV) relies on the observation that the logical conditions that activate a faulty line in a failing pattern cannot be present in a passing pattern that also sensitizes the line. According to the underlying assumptions, the neighborhood state of a TSL li/v(TK), when the patterns in TK are applied to the DUD, represents the activation condition of the fault; consequently, there can be no passing pattern that produces the same neighborhood state on the line and, at the same time, propagates the fault to an observable output. All TSL faults for which this check fails are removed from the cover forest.
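This check can be sketched as follows, assuming hooks nbr_state(line, pattern) for the fault-free neighborhood state and sensitizes(line, pattern) for error propagation to an observable output; the names are illustrative.

    def ppv_keep(line, failing_patterns, passing_patterns,
                 nbr_state, sensitizes):
        # Neighborhood states under which the line was observed faulty.
        fail_states = {nbr_state(line, p) for p in failing_patterns}
        for p in passing_patterns:
            # A passing pattern that sensitizes the line must not reproduce
            # any activating neighborhood state, or the TSL fault is bogus.
            if sensitizes(line, p) and nbr_state(line, p) in fail_states:
                return False
        return True   # the TSL fault survives passing pattern validation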

For example, Table 1 shows the fault-free simulation results for the DUD shown in Figure 2(a). The TSL fault s9/1(TK) explains both failing patterns, that is, TK = {t1, t2}, and is included in the cover forest. The TSL fault s12/0(TK) will also be included, since it is equivalent to s9/1(TK). If the neighborhood of s12 is assumed to be ⟨s5, s7, s8, s10⟩, it can be seen that, for the TSL fault s12/0(TK), the neighborhood state of the failing pattern t1, ⟨s5 s7 s8 s10⟩ = ⟨0110⟩, is the same as that of the passing pattern t3, so this fault is removed from the cover forest. For s9/1(TK), none of the failing patterns produces the same neighborhood state as the passing pattern, which means that this fault is retained.

The passing pattern validation step relies on two very important assumptions: first, the complete defect excitation is captured by the neighborhood state, so the defect locations must be confined to a radius r around the faulty line; second, there can be no faults in the neighborhood of the faulty lines. The latter condition is of course necessary to compare the values of the neighborhood states of the failing and passing patterns.

3.3 Stage 2: Behavior Identification

Once the potential defect locations are known, the behavior identification stage infers the logical conditions that produce the faulty lines. The goal of this process is to extract, from the information gathered so far, a fault model that captures the logic behavior of the physical defect. The inputs to this stage of the methodology are the reduced cover forest and the neighborhood information for each potential faulty line. Figure 7 shows the flow of activities for behavior identification and validation.

3.3.1 Cover Forest analysis

As Figure 7 shows, cover forest analysis is the first activity in the behavior identification procedure. Each TSL fault in the reduced cover forest represents a faulty line that explains a subset of the SLAT patterns used in the localization phase. The objective at this point is to find a group of TSL faults that together explain the complete set of SLAT patterns. Once such a cover is identified, all its failure information is used to create a macrofault. If a consistent fault model for this macrofault is identified, the focused ATPG stage is entered, and new patterns are generated, based on the candidate model, to further improve diagnosis accuracy and resolution.

Test Pattern   Fail Status   Neighborhood state (s1 s2 s3 s5 s7 s8 s10)
t1             failing       0 0 0 0 1 1 0
t2             failing       0 0 1 0 1 0 0
t3             passing       0 1 0 0 1 1 0

Table 1. Fault-free simulation values on the neighborhood lines for the circuit in Figure 2(a)

If the number of TSL faults is small, all their combinations can be considered for macrofault formation; however, if this is not the case, the analysis of every single cover becomes infeasible. For this reason, some heuristics need to be taken into account to guide the cover selection. The structure of the cover forest can provide a good selection heuristic: the TSL faults that explain more patterns are closer to the roots of the forest, so they should be given preference over the leaf nodes.

The DIAGNOSIX implementation builds covers comprised only of root nodes.

Figure 7. Flowchart for behavior identification and validation

3.3.2 Neighborhood function extraction

This phase of the methodology identifies, for each TSL fault li/v(TK) in the selected cover, the logical conditions that cause the DUD to behave like the fault li/v for the test set TK; a consistent fault model is then constructed in order to represent the defect activation mechanisms.

Neighborhood function extraction (NFE) finds a logical function that includes all the causes of defect excitation on a single line. To construct such a function, the neighborhood states resulting from the failing patterns are taken as minterms of a truth table. Likewise, the neighborhood states from the passing patterns represent maxterms, and the rest of the states are included as "don't cares". After the table is complete, Boolean minimization techniques can be applied to find a sum-of-products expression known as the neighborhood function. This procedure is repeated for all TSL faults in the selected cover.
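A sketch of NFE as two-level minimization, using sympy's Quine-McCluskey implementation (SOPform); enumerating all 2^n states is practical only for the small neighborhoods assumed here, and the function name is illustrative.

    from itertools import product
    from sympy import symbols
    from sympy.logic import SOPform

    def extract_neighborhood_function(nbr_lines, failing_states, passing_states):
        # failing_states / passing_states: sets of 0/1 tuples over nbr_lines.
        # Failing states are minterms, passing states are excluded (maxterms),
        # and every unobserved state is treated as a don't-care.
        n = len(nbr_lines)
        all_states = set(product((0, 1), repeat=n))
        minterms = [list(s) for s in failing_states]
        dontcares = [list(s)
                     for s in all_states - failing_states - passing_states]
        return SOPform(symbols(nbr_lines), minterms, dontcares)

Note that a TSL fault surviving PPV is guaranteed not to have a failing and a passing pattern with the same neighborhood state, so the minterms and the excluded states never collide.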

It is worth making clear that including a set of "don't cares" when deriving the neighborhood function is a heuristic; although the procedure addresses both passing and failing patterns, it does not capture all possible neighborhood states, that is, the states produced by patterns not included in the test set. Therefore, in order to improve the diagnosis confidence, additional patterns must be generated.

3.4 Stage 3: Behavior validation

Up to this point, the information on each of the TSL faults only partially models the DUD defect mechanism. In the behavior validation step, the TSL faults in the selected cover, together with their corresponding neighborhood functions, are joined to produce a single macrofault that, by itself, models the complete defect behavior. To ensure the consistency of the final model, the resulting macrofault is simulated using all available patterns (passing, SLAT and non-SLAT), and the obtained results are compared with the real responses of the DUD. Based on this comparison, some macrofaults are rejected on the grounds of poor accuracy, while others are considered model candidates. Within the DIAGNOSIX methodology, only macrofaults with 100% accuracy are deemed successful.

In order to merge the information about all TSL faults into a single simulatable macrofault, each fault and its neighborhood function are represented as fault tuples, and their product is constructed. Each product of tuples contains all the information necessary to explain some part of the defect behavior. Finally, the macrofault is constructed by joining all the products of tuples together.
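As a sketch under the same assumptions as the earlier fault-tuple example, each TSL fault plus one term of its neighborhood function could map to one product of tuples. The FaultTuple class is reused from that sketch, all other names are illustrative, and a single time frame (cycle 0) is assumed for the combinational, full-scan DUD.

    def macrofault_from_cover(cover):
        # cover: iterable of (faulty_line, stuck_value, activation_terms),
        # where activation_terms lists the SOP terms of the neighborhood
        # function, each term being a list of (line, value) literal pairs.
        macrofault = []
        for line, value, terms in cover:
            for literals in terms:
                prod = [FaultTuple(line, value, 0)]              # the fault itself
                prod += [FaultTuple(l, v, 0) for l, v in literals]  # trigger
                macrofault.append(prod)
        return macrofault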


4 Results

In [4], the authors of the DIAGNOSIX methodology analyze the diagnosis outcome for two sets of experiments. In the first experiment, a number of typical faults are injected into the description of a circuit and, after simulation, the resulting signal responses are fed to DIAGNOSIX; additionally, 5 real failing chips with PFA results are also analyzed. This part of the experiments serves to validate the diagnosis output because, in the first case, the defect behavior and its location are completely known, while in the latter, the already available physical analysis provides the real location of the chip defects.

In the second experiment, 830 failing chips, provided by an industrial partner, are diagnosed and analyzed. From the resulting outcome, and based on the resolution and accuracy obtained in the previous controlled experiment, it is possible to assess the value of this methodology in real industrial production.

For the first experiment set, several traditional and non-traditional fault models were used to evaluate the performance of the methodology. In particular, the following fault models were each injected five times into a separate instance of the same design: two-wire biased-voting bridge, wired bridge, interconnect open, input pattern, three-line bridge, multiple stuck line (MSL) and net faults, giving a total of 35 circuit responses. Out of these 35 devices, 4 of them, injected with MSL faults, have non-SLAT patterns, which account for at most 42% of the total number of patterns.

The localization stage of the DIAGNOSIX methodology is able to identify a set of faulty lines that includes the real defect lines for all but two of the 35 devices. The first localization error occurs with a two-wire biased-voting bridge; in this case, one of the lines in the bridge is never included in the cover forest. The reason is that a TSL fault on this line is never exercised by the initial test set, and therefore the methodology fails to consider this line faulty.

The second localization error is made in one of the circuits injected with MSL faults. In this case, the real faulty lines are included in the cover forest in the per-test diagnosis step, but they are later removed during passing pattern validation. If all the initial assumptions held, no signal would be incorrectly removed from the cover forest; however, for this circuit, the neighborhood of the signal also contains faults, which violates the given assumptions of PPV. It is worth noting that this localization result, even with the MSL error present, can be regarded as a total success in the traditional sense, since it identifies at least one of the faulty signal lines.

The analysis of the intermediate outputs for the 35 DUDs suggests that the methodology can greatly reduce the number of candidate fault lines in each step of the localization procedure. The fault localization of a typical example among those circuits starts with 3303 suspect lines after path-tracing, and this number is reduced to 211 and 94 by per-test diagnosis and PPV, respectively. In the average case, per-test diagnosis reduced the number of suspect lines by 93%, while PPV further reduced this number by 54%.

The analysis of the five real circuits with PFA results yielded slightly worse numbers for the reduction of suspect lines: the number of lines was reduced, on average, by 90.3% after PPV. However, for these 5 cases, PFA showed that all the real defect locations are included in the cover forest of every DUD. Table 2 shows the detailed number of identified faulty lines after each localization step.

In the behavior validation step, DIAGNOSIX found suitable candidate macrofaults for all but 4 of the 35 target DUDs. As expected, one of these circuits is the one for which a faulty line was dropped from the cover forest, so that no candidate macrofault can be identified. The other 3 DUDs were also injected with MSL faults, but this time the heuristic of choosing macrofaults from the roots of the cover forest does not achieve 100% accuracy in the validation step. It is hence necessary either to search the cover forest for another macrofault, or to relax the validation criteria and accept the available fault candidates.

            Size of Sp    TSL faults after per-test    TSL faults after PPV
Chip #1        1131                  51                         31
Chip #2        4141                 176                         89
Chip #3         408                 114                         67
Chip #4        2217                 106                         68
Chip #5         397                 136                         76

Table 2. Localization results for 5 chips

It is important to notice that, for 24 circuits, the number of macrofaults is not reduced by the behavior validation step, and the large number of possible candidates results in poor diagnosis resolution. The


problem lies in the limited diagnostic capabilities of the initial test set, and could be alleviated by generating new patterns specifically aimed at distinguishing between candidate macrofaults.

For each of the 5 real circuits with PFA results, DIAGNOSIX identifies at least one candidate macrofault. Furthermore, the number of lines in the neighborhood of each TSL fault can also be taken into account in order to reason about the nature of the defect in the chip. More specifically, for three of the DUDs there is one candidate consisting of a single TSL fault without neighbors; this means that the fault does not depend on any other signal and, therefore, there is either a front-end-of-line (FEOL) defect, that is, a defect that causes a faulty gate, or a short to one of the power lines. For the remaining two DUDs, the candidate macrofaults are composed of multiple TSL faults with as many as 25 neighborhood lines. On the same grounds as in the previous analysis, FEOL defects can be discarded, and back-end-of-line (BEOL) defects, that is to say, defects that cause interconnect errors like opens and bridges, are more likely to be present. The results of the physical analysis of these circuits confirm these observations about the nature of the defects in the chips.

4.1 Applicability

According to the authors of the methodology, DIAGNOSIX can be used to guide PFA by reducing the number of potential faulty sites to be observed. Even though the defects causing the faulty lines in a candidate macrofault could span several metal layers, or a large region in the same layer, the neighbors of the faulty line bound the physical region that has to be inspected to find physical imperfections.

Although the application of the methodology relies on a few weak assumptions, the accuracy and confidence of the diagnosis output depend more heavily on the characteristics of the observed fault; that is to say, the quality of the results in the localization and validation steps is sensitive to the defect behavior. For instance, the methodology may yield sub-optimal results in the presence of MSL faults, because they are not likely to be fully characterized by a single stuck-at fault in per-test diagnosis, or because another defect is present in the neighborhood of a faulty line and the validation step thus incorrectly removes the line from the cover forest.

The described experimental results show that MSL faults are the most severe limitation of the DIAGNOSIX methodology; however, for the real silicon chips, this shortcoming did not hinder the accuracy and confidence of the diagnosis results. Nonetheless, this situation could become a serious issue in other process technologies or design scenarios.

The results of the analysis of the 830 failing chips suggest that DIAGNOSIX is also able to model the characteristics of the failures in a manufacturing process; in particular, DIAGNOSIX identified five or fewer candidate macrofaults with 100% accuracy for 71% of the chips.

DIAGNOSIX found 530 circuits (61%) for which the lines in the macrofaults have no physical neighbors. As stated earlier, the number of physical neighbors can be considered a means to quantify the extent of the physical region that has to be inspected in PFA; consequently, DIAGNOSIX not only improves diagnostic localization by 70% with respect to previous approaches [10], but also hints that most of the errors are FEOL defects or shorts to the power rails.

Statistical analysis on the macrofaults of a large number of failing chips could be used to describe

and model systematic defects in a manufacturing process. Such volume analysis is part of the

future research efforts in the DIAGNOSIX methodology.

5 Conclusions

DIAGNOSIX is a new diagnosis methodology capable of both locating and identifying physical defects in a chip. This methodology extracts defect information from both failing and passing patterns and is, therefore, largely independent of the nature of the defects in the circuit. The resulting error behavior is further refined by simulating the complete available test set.

In the initial step, the set of possible faulty lines is obtained by path-tracing. This set is reduced by per-test diagnosis and further refined by considering the results of the passing patterns and the information on the physical neighborhood of each faulty line.

After this step, it is possible to find a set of faulty lines that together explain the complete failure behavior of the chip; this behavior and its activation conditions can then be extracted in the form of Boolean equations.

The extracted behavior is validated once more against all available patterns in order to discard any potentially inconsistent models.

The experimental results of the analysis of a large number of chips suggest that DIAGNOSIX can be used as a first approximation of the defect location in order to guide PFA, in some cases achieving large localization improvements in comparison to previous approaches. For a small number of chips with available PFA results, DIAGNOSIX correctly identified FEOL and BEOL defects.


References

[1] R. C. Aitken, "Finding Defects with Fault Models," in Proc. of the Int. Test Conf., pp. 498-505, 1995.

[2] R. D. Blanton, K. N. Dwarakanath and R. Desineni, "Defect Modeling Using Fault Tuples," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 25, no. 11, pp. 2450-2464, Nov. 2006.

[3] R. Desineni and R. D. Blanton, "Diagnosis of Arbitrary Defects Using Neighborhood Function Extraction," in Proc. of the VLSI Test Symposium, pp. 366-373, May 2005.

[4] R. Desineni, O. Poku and R. D. Blanton, "A Logic Diagnosis Methodology for Improved Localization and Extraction of Accurate Defect Behavior," in Proc. of the Int. Test Conf., pp. 1-10, Oct. 2006.

[5] S. Holst and H.-J. Wunderlich, "Adaptive Debug and Diagnosis without Fault Dictionaries," in Proc. of the 12th IEEE European Test Symposium, pp. 7-12, 2007.

[6] L. M. Huisman, "Diagnosing Arbitrary Defects in Logic Designs Using Single Location at a Time (SLAT)," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, no. 1, pp. 91-101, Jan. 2004.

[7] S. D. Millman, E. J. McCluskey and J. M. Acken, "Diagnosing CMOS Bridging Faults with Stuck-At Fault Dictionaries," in Proc. of the Int. Test Conf., pp. 860-870, Oct. 1990.

[8] Semiconductor Industry Association, "The International Technology Roadmap for Semiconductors," 2005 edition.

[9] S. Venkataraman and S. B. Drummonds, "POIROT: A Logic Fault Diagnosis Tool and Its Applications," in Proc. of the Int. Test Conf., pp. 253-262, Oct. 2000.

[10] J. A. Waicukauski and E. Lindbloom, "Logic Diagnosis of Structured VLSI," IEEE Design and Test of Computers, pp. 49-60, Aug. 1989.