Design for testability and fault tolerance (cdc.ioc.ee/final-wksh/jervan-slides.pdf)



Tallinn University of Technology
Department of Computer Engineering
Estonia
ati.ttu.ee

Design for testability and fault tolerance

Gert Jervan

Design for testability and fault tolerance © Gert Jervan, TTÜ/ATI

Overview

• Design for testability
  • Built-in self-test (BIST)
• Design and technology trends
• Fault tolerance
  • System-level fault tolerance

Design Flow

[Figure: design flow from idea to product. Specification and synthesis: IDEA, System-Level Synthesis, Behavioral Description, Behavioral Synthesis, RTL Description, RTL Synthesis, Gate-Level Description, Logic Synthesis. Implementation: Technology Mapping, Technology Dependent Network, Layout, Mask Data. Manufacturing: Product.]

Design Flow

[Figure: the same design flow with a Testing step added after manufacturing: Product, Testing, Good Product.]

Testing

Post-manufacturing
• Manufacturing defects: stuck-at, bridges, cross-talk, ...
• Assembly defects: misaligned components, wrong components, connections, wiring, ...

Operation & maintenance
• Aging (wearout), radiation (particles), ...

[Figure: ESD damage]

Test Application

Support from Automatic Test Equipment (ATE)

[Figure: the ATE applies stimuli and evaluates the response; the outcome is either Yes or No/diagnosis. Image © Agilent Technologies]
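A minimal sketch of the stimuli/response loop, assuming a hypothetical three-input circuit and a single stuck-at fault on one internal net. The circuit, the function names and the fault-injection mechanism are illustrative assumptions, not taken from the slides.

```python
from itertools import product


def circuit(a, b, c, stuck_at=None):
    """Hypothetical circuit y = (a AND b) OR c.

    `stuck_at` optionally forces the internal AND net to 0 or 1,
    modelling a stuck-at manufacturing defect on that net.
    """
    n = a & b
    if stuck_at is not None:
        n = stuck_at
    return n | c


def apply_test(stimuli, stuck_at=None):
    """Apply each stimulus and compare the response with the fault-free one.

    Returns (True, None) when every response matches ("Yes"), or
    (False, vector) with the first failing stimulus ("No/diagnosis").
    """
    for vec in stimuli:
        expected = circuit(*vec)                     # fault-free reference
        observed = circuit(*vec, stuck_at=stuck_at)  # device under test
        if observed != expected:
            return False, vec
    return True, None


# Exhaustive stimuli are feasible only because the example has 3 inputs.
stimuli = list(product((0, 1), repeat=3))

good = apply_test(stimuli)                   # fault-free device
faulty_sa0 = apply_test(stimuli, stuck_at=0)  # AND net stuck at 0
faulty_sa1 = apply_test(stimuli, stuck_at=1)  # AND net stuck at 1
```

The failing vector returned for a faulty device is the kind of information a diagnosis step would start from.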

External Testing: Drawbacks

• ATE is expensive (typically several million US$)
• ATE will become more and more inefficient
• Slow test throughput with long scan chains, especially for core-based designs
• External test data volume can be extremely high (a function of chip complexity)

[Figure: external test with a "super tester" that handles pattern generation, precision timing, diagnostics, power management and test control. It needs a very high pin count and deep memory, and has slow throughput. Targets under test: logic, mixed-signal, memory, I/Os & interconnects.]

Built-in Self-Test (BIST)

Solution: dedicated built-in H/W for embedded test functions
• Repartition the tester into embedded test and external test functions
• Include embedded test with low H/W cost and high data volume

[Figure: test functions repartitioned between external and embedded test. External test: standard digital tester with limited speed/accuracy and low cost per pin. Embedded (built-in) test: pattern generation, result compression, precision timing, diagnostics, power management, test control, and support for board-level and system-level test. Targets under test on the chip, board or system: logic, mixed-signal, memory, I/Os & interconnects.]

Problems with BIST

Typical test generator: Linear Feedback Shift Register (LFSR)

LFSR-generated vectors are pseudorandom by nature

Pseudorandom vectors:
• Very long test application time
• High fault coverage is not guaranteed
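To make the LFSR concrete, here is a small sketch of a 4-bit Fibonacci LFSR used as a pattern generator. The register width, tap positions and seed are assumptions chosen for illustration; the slides do not specify a particular LFSR.

```python
def lfsr_patterns(seed, taps, width, count):
    """Yield `count` successive states of a Fibonacci LFSR.

    seed  : non-zero initial register state
    taps  : 1-based bit positions XORed together to form the feedback bit
    width : register width in bits
    """
    if seed == 0:
        raise ValueError("an all-zero seed locks the LFSR in state 0")
    mask = (1 << width) - 1
    state = seed & mask
    for _ in range(count):
        yield state
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        # Shift left by one and insert the feedback bit at the LSB.
        state = ((state << 1) | feedback) & mask


# Taps (4, 3) give a maximal-length sequence for a 4-bit register:
# all 15 non-zero states appear before the sequence repeats.
patterns = list(lfsr_patterns(seed=0b0001, taps=(4, 3), width=4, count=15))
```

The sequence is fixed by the seed and taps, so a fault that is only detected by one specific vector may be reached late in the sequence, and the all-zero vector is never generated at all. This is one way to see why pseudorandom test can be long and still leave faults undetected.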

BIST Improvement

Possible solution: combine pseudorandom test and deterministic test into a Hybrid BIST solution in order to:
• increase the fault coverage
• reduce the test cost

[Figure: fault coverage plotted against time]

Hybrid BIST – System Level Test Architecture

[Figure: an SoC with five cores (C3540, C1908, C880, C1355, C2670), each with its own BIST logic, connected through a test access mechanism to an embedded tester that consists of a test controller and tester memory.]

Hybrid BIST – The Concept

[Figure: fault coverage (FC) against time for three different approaches, (1), (2) and (3). Labels: PRG, ATPG, max. fault coverage, amount of stored patterns; one approach is marked "the best".]

Cost Calculation for Hybrid BIST

[Figure: cost curves plotted against the number of pseudorandom test patterns applied, $k$: total cost $C_{TOTAL}$, cost of pseudorandom test patterns $C_{GEN}$, cost of stored test $C_{MEM}$, and the number of remaining faults after applying $k$ pseudorandom test patterns, $r_{NOT}(k)$.]

$C_{TOTAL} = C_{GEN} + C_{MEM} = \alpha L + \beta S$
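The cost formula can be explored with a short sketch. It assumes that L is the number of pseudorandom patterns applied, S is the number of stored deterministic patterns still needed afterwards, and α and β are weighting constants; the slides do not define these symbols, and the table of values below is invented purely for illustration.

```python
def total_cost(num_pseudorandom, num_stored, alpha, beta):
    """C_TOTAL = alpha * L + beta * S (symbol meanings are assumptions)."""
    return alpha * num_pseudorandom + beta * num_stored


def cheapest_switch_point(stored_needed_after, alpha, beta):
    """Return (k, cost) minimising the total cost over the candidate values of k.

    stored_needed_after maps k (pseudorandom patterns applied) to the
    number of stored patterns still required to cover the remaining faults.
    """
    best_k = min(
        stored_needed_after,
        key=lambda k: total_cost(k, stored_needed_after[k], alpha, beta),
    )
    return best_k, total_cost(best_k, stored_needed_after[best_k], alpha, beta)


# Hypothetical data: more pseudorandom patterns leave fewer faults,
# so fewer deterministic patterns have to be stored.
stored_needed_after = {0: 60, 50: 35, 100: 20, 200: 12, 400: 8, 800: 5}

best = cheapest_switch_point(stored_needed_after, alpha=1, beta=10)
```

With these made-up numbers the minimum is at k = 100: applying fewer pseudorandom patterns costs more memory, and applying more costs more test time than it saves in memory.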

Our Contributions

• Developed a fast estimation method for various optimization calculations
• Proposed algorithms for finding the shortest test length under given memory constraints
  • Single-core and multicore systems
  • Serial and parallel (broadcasting) test architectures
• Proposed algorithms for test cost minimization
• Proposed algorithms for test energy minimization
• Proposed algorithms for test time minimization and test power control in the abort-on-first-fail environment


Today and future

Technology Trends

• 1.0 μm, mid-1980s. Speed: 10 MHz
• 0.1 μm, early 2000s. Speed: 3 GHz

[Figure: the two technologies drawn to the same scale]

Today: 65 nm, going down to 22 nm by 2016

[Figure: 90 nm MOS transistor, with a 50 nm scale marker]

Benefits of Technology Scaling

Benefits of scaling the dimensions by 30%:
• Reduce gate delay by 30% (increase operating frequency by 43%)
• Double transistor density
• Reduce energy per transition by 65% (50% power savings @ 43% increase in frequency)
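The percentages above follow from simple arithmetic under classical constant-field scaling, sketched below. The assumption (not stated on the slide) is that gate delay, capacitance and supply voltage all scale with the same factor s = 0.7, and that switching energy is proportional to C·V².

```python
def scaling_effects(s=0.7):
    """Relative changes when linear dimensions scale by factor `s`.

    Assumes constant-field scaling: delay, capacitance and supply
    voltage all scale by `s`, and switching energy ~ C * V**2.
    """
    frequency = 1 / s              # delay scales by s
    density = 1 / s ** 2           # area per transistor scales by s**2
    energy = s * s ** 2            # C scales by s, V**2 by s**2
    power = energy * frequency     # energy per transition times frequency
    return {
        "delay": s,
        "frequency": frequency,
        "density": density,
        "energy": energy,
        "power": power,
    }


effects = scaling_effects(0.7)
# frequency ~ 1.43 (43% higher), density ~ 2.04 (about double),
# energy ~ 0.34 (about 65% lower), power ~ 0.49 (about 50% lower)
```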

The Evolution of Systems

• Algorithm on a Chip: hardwired computation, hardwired communication
• System on a Chip: programmable computation, hardwired communication
• Network on a Chip: programmable computation, programmable communication


Algorithm on a Chip


Design Trends

System on a Chip (SoC)


Network on a Chip

• Switch
• Wires
• Resources
  • Computational: RF / analog, processor cores, hardware blocks, FPGAs
  • Storage


NoC Example

Intel 80 core teraflops research chip


Implications to Design

• Design fabric will be regular
  • Will look like a sea of transistors interconnected with a regular interconnect fabric
• Shift in the design efficiency metric
  • From transistor density to balanced design

BUT

• Manufacturing these nanometer-scale chips defect-free is almost impossible (yield is below acceptable levels)
• Increasing importance of transient and intermittent faults (due to the environment)

Transient faults

Causes:
• Radiation
• Electromagnetic interference (EMI)
• Lightning storms

Properties:
• Happen for a short time
• Corrupt data or cause miscalculation in logic
• Do not cause permanent damage to circuits
• Causes are outside system boundaries

Intermittent faults

Causes:
• Internal EMI
• Crosstalk
• Power supply fluctuations
• Init (Data)
• Software errors (Heisenbugs)

Properties:
• Manifest similarly to transient faults
• Happen repeatedly
• Causes are inside system boundaries

Fault Tolerance

• Fault tolerance and reliability are system issues, requiring work with hardware, software, time and information
• It is increasingly hard to work with hardware issues
• More work will be done at the system level
  • Built-in self-repair:
    • Dynamic reconfiguration
    • Re-execution, replication, checkpointing, ...
    • Task remapping/rescheduling
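As an illustration of one of the system-level techniques listed above, the sketch below shows re-execution: a task that is hit by a transient fault is simply run again. It assumes that some error-detection mechanism exists and reports a fault by raising an exception; the `TransientFault` class, the helper names and the simulated task are all hypothetical.

```python
class TransientFault(Exception):
    """Raised by a (hypothetical) error detector when a run is corrupted."""


def run_with_reexecution(task, max_attempts=3):
    """Run `task`, re-executing it if a transient fault is detected.

    Returns (result, attempts_used). Gives up after `max_attempts`,
    since a fault that keeps recurring is probably not transient.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task(), attempt
        except TransientFault:
            continue  # the fault is assumed to have vanished; try again
    raise RuntimeError("fault persisted after %d attempts" % max_attempts)


def make_flaky_task(fail_first_n, result=42):
    """Build a task whose first `fail_first_n` executions are 'hit' by a fault."""
    calls = {"n": 0}

    def task():
        calls["n"] += 1
        if calls["n"] <= fail_first_n:
            raise TransientFault("simulated transient fault")
        return result

    return task


outcome = run_with_reexecution(make_flaky_task(fail_first_n=1))
```

Re-execution trades time for reliability, which is why it interacts with scheduling: the schedule has to leave enough slack for the extra runs.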

Fault Tolerance at System Level

[Figure: system-level design flow with feedback loops: System Specification, Architecture Selection, Mapping & Hardware/Software Partitioning, Scheduling, Back-end Synthesis. Fault tolerance techniques are applied within this flow.]


The problem to be solved:

How to design a reliable system out of unreliable hardware?