EE 587 SoC Design & Test Partha Pande School of EECS Washington State University pande@eecs.wsu.edu



Page 1: EE 587 SoC Design & Test Partha Pande School of EECS Washington State University pande@eecs.wsu.edu

EE 587 SoC Design & Test

Partha Pande
School of EECS
Washington State University
pande@eecs.wsu.edu

Page 2

SoC Physical Design Issues

Interconnect Architectures and Signal Integrity

Page 3

Design Challenges

1. Non-scalable global wire delay

2. Moving signals across a large die within one clock cycle is not possible.

3. Current interconnect architectures (buses) are inherently non-scalable.

4. Transmission of digital signals along wires is not reliable.

Page 4

Bus non-scalability

Clock cycle depends on the parasitics and the bus length

Multiple bus segments

• More than one design iteration

• Converges to a network

Page 5

Bus Architectures

Page 6

Split Bus Architecture

$$E_{sw} = 0.5\,V^2 \Big[ C_{BUS1} \sum_{i,j \in BUS1} xfer(M_i, M_j) + C_{BUS2} \sum_{i,j \in BUS2} xfer(M_i, M_j) + (C_{BUS1} + C_{BUS2}) \sum_{i \in BUS1} \sum_{j \in BUS2} \big( xfer(M_i, M_j) + xfer(M_j, M_i) \big) \Big]$$
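A minimal sketch of how the split-bus switching energy could be evaluated, assuming that transfers inside a segment charge only that segment's capacitance and that transfers crossing the split charge both segments. The function names, the transfer-count matrix `xfer`, the example numbers, and the assumption that an unsplit bus has capacitance `c_bus1 + c_bus2` are all hypothetical and not taken from the slide.

```python
from itertools import product


def split_bus_energy(xfer, bus1, bus2, c_bus1, c_bus2, v):
    """Switching energy of a bus split into two segments.

    xfer[i][j] is the number of transfers from module i to module j.
    bus1 and bus2 list the module indices attached to each segment.
    """
    # Transfers that stay inside one segment charge only that segment.
    local1 = sum(xfer[i][j] for i, j in product(bus1, bus1) if i != j)
    local2 = sum(xfer[i][j] for i, j in product(bus2, bus2) if i != j)
    # Transfers that cross the split charge both segments, in either direction.
    cross = sum(xfer[i][j] + xfer[j][i] for i, j in product(bus1, bus2))
    return 0.5 * v ** 2 * (
        c_bus1 * local1 + c_bus2 * local2 + (c_bus1 + c_bus2) * cross
    )


def single_bus_energy(xfer, c_bus, v):
    """Reference case (assumption): every transfer charges the whole bus."""
    total = sum(sum(row) for row in xfer)
    return 0.5 * v ** 2 * c_bus * total


# Hypothetical traffic: modules 0,1 talk mostly to each other, as do 2,3.
xfer = [
    [0, 10, 1, 0],
    [10, 0, 0, 1],
    [1, 0, 0, 10],
    [0, 1, 10, 0],
]
e_split = split_bus_energy(xfer, [0, 1], [2, 3], c_bus1=1.0, c_bus2=1.0, v=1.0)
e_single = single_bus_energy(xfer, c_bus=2.0, v=1.0)
print(e_split, e_single)
```

Under these assumptions the saving comes entirely from traffic that stays local to one segment, which is why the assignment of modules to segments matters.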

Page 7

Achievable Clock Cycle in a Bus segment

Page 8

Minimize Power Consumption

Modification of interconnect architectures
• Incorporate parallelism (ITRS 2003 & ISSCC 2004)
• Decoupling of communication and processing
• Modular architecture
• Minimize use of global wires
• Locality in communication

Page 9

SoC Microarchitecture Trend

• 50-100K gate blocks: no global wire delay problem.
• Block-based hierarchical design style that uses block sizes of 50-100K gates.
• Single synchronous clock regions will span only a small fraction of the chip area.
• Different self-synchronous IPs communicate via network-oriented protocols.
• Structured network wiring leads to deterministic electrical parameters, which reduces latency and increases bandwidth.
• Failures due to the inherently unreliable physical medium can be addressed by introducing error correction mechanisms.

Page 10

New design paradigm

New designs: very large number of functional blocks

Moving bits around efficiently

• Develop on-chip infrastructure to solve future inter-block communication bottlenecks

Development of infrastructure IPs

• SoC = Σ F-IP + Σ I²P (functional IPs plus infrastructure IPs)

Page 11

Silicon Backplane

Page 12

MIPS SoC-it

Page 13

The network-on-chip paradigm

Driven by

• Increased levels of integration
• Complexity of large SoCs
– New designs counting 100s of IP blocks
• Need for platform-based design methodologies
• DSM constraints (power, delay, time-to-market, etc.)

Page 14

NoC Features

• Decoupling of functionality from communication
• Dedicated infrastructure for data transport

[Figure: an AHB/APB bus-based system (high-performance ARM processor, high-bandwidth memory interface, DMA bus master, bridge, UART, PIO, keypad, timer) next to a NoC infrastructure built from switches and links]

Page 15

Some Common Architectures

[Figure: (a) Mesh, (b) Folded-Torus (FT), and (c) Butterfly Fat Tree (BFT). Legend: functional IP, switch]
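One way to see the difference between the mesh and the folded torus is to compare hop counts. The sketch below assumes switches addressed by (x, y) grid coordinates and minimal routing; it treats the folded torus as an ordinary torus for connectivity, since folding changes the physical layout (it shortens the wrap-around wires) and not which switches are neighbours. The function names are illustrative only.

```python
def mesh_hops(src, dst):
    """Minimal hop count between two switches in a 2D mesh."""
    (x1, y1), (x2, y2) = src, dst
    return abs(x1 - x2) + abs(y1 - y2)


def torus_hops(src, dst, cols, rows):
    """Minimal hop count in a (folded) torus with wrap-around links."""
    (x1, y1), (x2, y2) = src, dst
    dx, dy = abs(x1 - x2), abs(y1 - y2)
    # In each dimension, go whichever way around the ring is shorter.
    return min(dx, cols - dx) + min(dy, rows - dy)


# Corner-to-corner traffic in a 4 x 4 array of switches.
print(mesh_hops((0, 0), (3, 3)))
print(torus_hops((0, 0), (3, 3), cols=4, rows=4))
```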

Page 16

Data Transmission

• Packet-based communication
• Low memory requirement
• Packet switching
• Wormhole routing
• Packets are broken down into flow control units, or flits, which are then routed in a pipelined fashion
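The breakdown of a packet into flits can be sketched as follows. This is an illustration under assumed conventions, not a format given in the slides: the header flit carries only a destination identifier, the flit size is 4 bytes, and the `HEAD`/`BODY`/`TAIL` labels and the function name are hypothetical.

```python
def to_flits(dest, payload, flit_bytes=4):
    """Split a packet into a header flit followed by body flits and a tail flit.

    In wormhole routing only the header flit carries routing information;
    the remaining flits follow it through the switches in a pipeline.
    """
    flits = [("HEAD", dest)]
    chunks = [payload[i:i + flit_bytes] for i in range(0, len(payload), flit_bytes)]
    for n, chunk in enumerate(chunks):
        # The last chunk is marked as the tail so switches can release the path.
        kind = "TAIL" if n == len(chunks) - 1 else "BODY"
        flits.append((kind, chunk))
    return flits


for flit in to_flits(dest=5, payload=b"abcdefghij"):
    print(flit)
```

Because a switch only needs to buffer a few flits rather than a whole packet, this style of routing is consistent with the low memory requirement listed on the slide.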

Page 17

Connecting Different IP Blocks Using Tree Architecture

Page 18

Communication Pipelining

• Need to constrain the delay of each stage within 15 FO4

Page 19

Signal Integrity

• According to the ITRS, signal integrity will become a major issue in future technologies
• Causes for such inherent unreliability:
– Shrinking geometries, layout dimensions
– Reduction in the charge used for storing bits
• Increased probability of transient events like:
– Crosstalk
– Ground bounce
– Alpha particle hits

Page 20

Micro network Protocol Stack

Page 21

On Chip Signal Transmission

• Future global wires will function as lossy transmission lines
• Reduced-swing signaling
• Noise due to crosstalk, electromagnetic interference, and other factors will have increased impact.
• It will not be possible to abstract the physical layer of on-chip networks as a fully reliable, fixed-delay channel
• At the micro network stack layers atop the physical layer, noise is a source of local transient malfunctions.

Page 22

Coding Schemes

• Low-Power Coding
– Reducing self-transition activity
• Crosstalk Avoidance Coding
– Reducing coupling with adjacent lines
• Error Control Coding
– SEC, SECDED

Page 23

Low Power Coding

• Reduction of self-transition activity
• Bus-Invert Code
– Data is inverted and an invert bit is sent to the decoder if the current data word differs from the previous data word in more than half the number of bits
– Effectiveness decreases with increase in bus width
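The bus-invert rule can be sketched in a few lines. This is a minimal illustration, assuming the comparison is made against the value last driven onto the bus (which may itself have been inverted) and that the bus starts at all zeros; the function names are hypothetical.

```python
def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Return a list of (bus_value, invert_bit) pairs for the given data words."""
    mask = (1 << width) - 1
    bus = 0  # assumption: all bus lines start at 0
    encoded = []
    for word in words:
        word &= mask
        # Invert when more than half of the lines would otherwise toggle.
        if 2 * hamming_distance(word, bus) > width:
            bus, invert = word ^ mask, 1
        else:
            bus, invert = word, 0
        encoded.append((bus, invert))
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original data words from (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [bus ^ mask if invert else bus for bus, invert in encoded]


data = [0x00, 0xFF, 0x0F]
coded = bus_invert_encode(data, width=8)
print(coded)
print(bus_invert_decode(coded, width=8))
```

In this example the jump from `0x00` to `0xFF` would toggle all eight data lines, so the encoder keeps the bus at `0x00` and raises the invert bit instead, at the cost of one extra wire.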

Page 24

Error Control Coding

• Linear block codes
– In an (n, k) linear block code, a data block k bits long is mapped onto an n-bit code word
• Forward Error Correction or Automatic Repeat Request
• Redundant wires
• Possibility of voltage reduction
• Energy efficiency is an important criterion
• Codec overhead
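As a concrete instance of an (n, k) linear block code used for forward error correction, the sketch below implements the textbook (7, 4) Hamming code, which corrects any single bit error. It is chosen here only because it is small; the slides do not say which block code is used at this point, and the function names are illustrative.

```python
def hamming74_encode(data):
    """Map 4 data bits onto a 7-bit code word laid out as p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(word):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-based error position.
    position = s1 + 2 * s2 + 4 * s4
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


code_word = hamming74_encode([1, 0, 1, 1])
corrupted = list(code_word)
corrupted[4] ^= 1  # flip one wire
print(code_word, hamming74_decode(corrupted))
```

The three extra bits are the "redundant wires" of the slide: they cost area and codec logic, and in exchange the link tolerates a single upset per word.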

Page 25

Worst Case Crosstalk

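• Transition from the 101 pattern to the 010 pattern, or vice versa
• Due to Miller capacitance, the worst-case capacitance between adjacent wires becomes (1+4λ)C_L

[Figure: a victim wire between Aggressor Wire 1 and Aggressor Wire 2, with the aggressors switching in the opposite direction to the victim; victim and aggressor rise times are marked]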
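The worst-case figure can be reproduced with a simple lumped coupling model. The sketch below assumes that the effective load seen by a switching victim is C_L multiplied by 1 + λ(|Δv − Δl| + |Δv − Δr|), where Δ is +1 for a rising wire, −1 for a falling wire and 0 for a quiet wire, and λ is the ratio of coupling capacitance to bulk capacitance. The model and the function name are assumptions made for illustration.

```python
def coupling_factor(before, after, lam):
    """Effective capacitance of the middle (victim) wire, in units of C_L.

    before and after are 3-character bit strings ordered as
    aggressor 1, victim, aggressor 2.
    """
    left, victim, right = (int(a) - int(b) for b, a in zip(before, after))
    if victim == 0:
        return 0.0  # a quiet victim has no switching delay in this model
    return 1 + lam * (abs(victim - left) + abs(victim - right))


lam = 0.5  # hypothetical coupling-to-bulk capacitance ratio
print(coupling_factor("101", "010", lam))  # both neighbours oppose the victim
print(coupling_factor("000", "010", lam))  # both neighbours stay quiet
print(coupling_factor("000", "111", lam))  # all three wires switch together
```

The three calls correspond to loads of (1+4λ)C_L, (1+2λ)C_L and C_L respectively, which is why codes that forbid opposing transitions on adjacent wires reduce the worst case.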

Page 26

Joint Crosstalk Avoidance and Single Error Correction Codes

• Reduce crosstalk as well as correct errors due to other transient events
• Duplicate Add Parity (DAP)
• Dual Rail Code (DR)
• Boundary Shift Code (BSC)
• Modified Dual Rail Code (MDR)
• Worst-case crosstalk capacitance is reduced to (1+2λ)C_L

Page 27

Duplicate-Add-Parity Code

• Each bit is duplicated
• A parity bit from one copy is computed
• Same as Dual Rail Code
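A minimal sketch of a DAP encoder and decoder, assuming the two copies of each bit are placed on adjacent wires and the parity bit is appended last. The decoder recomputes the parity of one copy: if it matches the received parity that copy is used, otherwise the other copy is used. The wire ordering and function names are assumptions for illustration.

```python
def parity(bits):
    """XOR of all bits in the sequence."""
    result = 0
    for bit in bits:
        result ^= bit
    return result


def dap_encode(data):
    """Duplicate every data bit onto adjacent wires and append one parity bit."""
    word = []
    for bit in data:
        word += [bit, bit]
    return word + [parity(data)]


def dap_decode(word):
    """Return the data bits, correcting any single wire error."""
    copy_a = word[0:-1:2]  # first wire of each pair
    copy_b = word[1:-1:2]  # second wire of each pair
    # A parity mismatch means copy A (or the parity wire) was hit, so use copy B.
    return copy_a if parity(copy_a) == word[-1] else copy_b


sent = dap_encode([1, 0, 1, 1])
received = list(sent)
received[2] ^= 1  # single error on one wire
print(sent, dap_decode(received))
```

With k data bits the code uses 2k + 1 wires, and because the two wires of a pair always carry the same value, they never switch in opposite directions.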

Page 28

Crosstalk Avoidance Double Error Correction Code (CADEC)

The 32-bit flit is Hamming coded and then an overall parity is calculated

All bits apart from the overall parity are duplicated

The original 32-bit flit becomes 77 bits

Minimum Hamming distance is 7

Worst-case crosstalk capacitance is reduced to (1+2λ)C_L

[Figure: CADEC encoder. The 32-bit input is (38,32) Hamming encoded; the 38 coded bits are duplicated (DAP duplication) onto bits 0 to 75, and the overall parity is placed at bit 76, giving the 77-bit output]
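The code length and minimum distance quoted for CADEC can be checked with a short calculation. This is a sketch that assumes the (38,32) Hamming code has minimum distance 3, as single-error-correcting Hamming codes do, and that the overall parity is taken over the 38 Hamming-coded bits.

```latex
\begin{align*}
n_{\mathrm{Hamming}} &= 32 + 6 = 38 \\
n_{\mathrm{CADEC}}   &= 2 \times 38 + 1 = 77 \\
d_{\min}             &= 2 \times 3 + 1 = 7
\end{align*}
```

A weight-3 Hamming code word has odd parity, so duplication gives weight 6 and the parity bit adds 1. A weight-4 code word has even parity and gives weight 8. The smaller of the two, 7, exceeds the distance of 5 needed to correct double errors.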

Page 29

Energy Savings with Joint Codes

Due to increased error resilience lower noise margins can be tolerated and hence operating voltage can be reduced

Coding adds overhead in terms of extra wires and codec

Page 30

Voltage Swing Reduction for CADEC

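[Figure: voltage V (0.4 to 1) versus word error rate (10^-20 to 10^-10) for ED, DAP, and CADEC]

The probability of word error for DAP (k data bits, bit error probability ε):

$$P_{DAP}(\epsilon) \approx \frac{3k(k+1)}{2}\,\epsilon^2$$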

32 )4()( nnPCADEC
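To get a feel for these numbers, the sketch below evaluates the DAP word error probability, read here as P ≈ (3k(k+1)/2)ε² with ε the bit error probability, and compares it with an uncoded k-bit word under the assumption of independent bit errors. The value of ε is hypothetical, and the CADEC expression is left out because it cannot be read reliably from the transcript.

```python
def p_dap(k, eps):
    """Approximate word error probability of DAP for k data bits."""
    return 1.5 * k * (k + 1) * eps ** 2


def p_uncoded(k, eps):
    """Word error probability of k unprotected bits with independent errors."""
    return 1 - (1 - eps) ** k


k, eps = 32, 1e-6  # hypothetical flit width and bit error probability
print(p_uncoded(k, eps), p_dap(k, eps))
```

Because the coded error rate falls with ε² rather than ε, a coded link can meet the same word error target at a larger ε, which is the room that allows the voltage swing to be lowered.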

Page 31

Energy Savings with CADEC


Page 32

Communication Pipelining

Inter- and Intra-switch stages

Pipelined Data Transfer

[Figure: pipelined data transfer across inter-switch links, with encoder and decoder stages placed alongside the intra-switch pipelined stages]

Page 33

Latency Characteristics

[Figure: average message latency (cycles, 0 to 2000) versus injection load (0.1 to 1) for uncoded and coded links]

• The codes should be optimized
– The codec can be merged with existing pipeline stages
– No latency penalty

Page 34

Adaptive Supply Voltage Links

• Dynamic Voltage Scaling (DVS)
– DVS schemes dynamically adjust the processor clock frequency and supply voltage to just meet the instantaneous performance requirement, making the system energy aware.
• Communication architectures display a wide variance in their utilization, depending on the communication patterns of applications.
• An adaptive supply voltage link adapts its frequency and supply voltage in accordance with the instantaneous traffic bandwidth.
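The idea of matching a link's operating point to its traffic can be sketched as a table lookup. The voltage and frequency pairs, the link width, and the function names below are hypothetical; the only physical fact relied on is that dynamic power scales with V²f for a fixed switched capacitance.

```python
# Hypothetical (supply voltage in V, link clock in Hz) operating points.
LEVELS = [(0.6, 250e6), (0.8, 500e6), (1.0, 1000e6)]


def pick_level(required_bps, link_width_bits, levels=LEVELS):
    """Return the lowest operating point whose bandwidth meets the demand."""
    for voltage, freq in sorted(levels):
        if freq * link_width_bits >= required_bps:
            return voltage, freq
    return max(levels)  # demand exceeds the link: run at the top level


def relative_dynamic_power(voltage, freq, v_max, f_max):
    """Dynamic power relative to the top operating point (P scales with V^2 f)."""
    return (voltage ** 2 * freq) / (v_max ** 2 * f_max)


voltage, freq = pick_level(required_bps=10e9, link_width_bits=32)
print(voltage, freq, relative_dynamic_power(voltage, freq, 1.0, 1000e6))
```

A real controller would also have to predict traffic and account for the time and energy spent switching between levels, which this sketch ignores.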

Page 35

Repeater Insertion & Coding

• Repeater insertion reduces interconnect wire delay
• Increases power dissipation due to large drivers
• CACs reduce coupling capacitance
• Joint repeater insertion and CAC is a promising solution to reduce power in global wires

Page 36

Repeater Insertion & Coding

Reference: A Low-Power Bus Design Using Joint Repeater Insertion and Coding

130 nm

Page 37

Repeater Insertion & Coding

45 nm

Page 38

Reliability

• Crosstalk, electromigration, material ageing, …
• Transient failures
– Error control coding
– Crosstalk avoidance coding
– Power, area trade-off
• Permanent failures
– Spare switches and links
– Overall routing complexity
– Effect on system performance