communication architecture synthesis alessandro pinto, university of california at berkeley...

Communication Architecture Synthesis

Alessandro Pinto,

University of California at Berkeley

[email protected]

Introduction: what

Communication synthesisWhat is the “best way” of transferring information between entities

Problem FormulationWho wants to communicate with whomHow he aspects the communication mechanism to behaveWhat is possible to built todayHow to build a communication architecture that is cheap and satisfies the entities constraints while being feasible

Introduction: why

On-Chip applicationLots of components connected together

Distance is becoming important

There isn’t a formal approach to the problem

We build tools which are the glue in the platform-based design approach

More or less everything has been done in the design of “functions”

Outline

OverviewInternet vs. On-chipnet

Standard communication structuresVSIA + Standard comm. Platform

Constraint-Driven Communication Synthesis

A theoretic approach (A.Pinto,L.P.Carloni.ASV)

Current researchConclusion

Wire Scaling

“As designs scale to newer technologies, they get smaller and their wiresget shorter, and the relative change in the speed of wires to thespeed of gates is modest.” M.A.Horowitz et.al “The future of wires”

“The real wire problem arises with increasing chip complexity and global communication costs.” M.A.Horowitz et.al “The future of wires”

Scale

Local Wire

Global Wire

What is going to happen

From few closed connected components to a lot of distributed IP’s

A global net is most likely to be traveled in more that one clock cycle

High bandwidth and low power requirements

Fault tolerance requirement

There is need for a correct by construction methodology

Internet Network

Highly distributed

Fault Tolerant

Every node must reachable

Networks vs. On-Chip Networks

Internet major concern was to guarantee connectivity in case of attack

Cost is mostly static

Buffers memory is not a problem (512MB is still cheap)

Number and complexity of protocol layer is important only for performance issues

Full connectivity is not the major concern

Now cost is mostly dynamic (power)

Registers are expensive

The protocol must be very light

In the future error correction might become a reality

The beginning of internet

Back in 1969: the ARPAnet

SRI

UCLA

UCSB

UTAH

The beginning of On-Chipnet

A lot of effort on on-chip communicationOne of GSRC main theme: Communication-based design

Special section in conferences

Different approachesNot always really networks

Buses are still widely used

Effect of standardization

0

200

400

600

800

1000

1200

'69 '71 '74 '76 '78 '80 '82

Hosts

0

10000

20000

30000

40000

50000

60000

70000

80000

90000

'84 '85 '86 '87 '89 '90 '91

Host

1983: TCP/IP was introduced

1987: Commercialization of internetTCP/IP technology is transferred formInventors to vendors

Standardization

VSIA: Virtual Socket Interface AllianceSpecifies Virtual Component Interface

Peripheral VCI

Basic VCI

Advanced VCI

VCI Initiator VCI Target

VCI Target VCI Initiator

Bus Master Bus SlaveAny Bus

A B

http://www.vsi.org

Standard “Bus” Architecture

ARM (ARM Ltd.)*

CoreFrame (Palmchip)

CoreConnect (IBM)

WichBone (Silicore)

SiliconBackPlane (Sonics)*

AMBA Bus Hierarchy

AHB

APB

HP ARMCore On Chip RAM

DMA

Off-Chip RAM

Bridge

Keypad

UART

AHB: Advanced High-Performance Bus

AB: Advanced Peripheral Bus

SiliconBackplane

It’s possible to specify constraints for each point to point communication

It’s possible to specify different OPC

Synthesis of the network that guarantees performances

A Today’s Platform Example

Processor

USB, MMC, IrDA, Bluetooth

PCMCIA, SDRAM

CPU

PCMCIA

SDRAM

Bluetooth

MMC

USB

IrDA

Brid

ge

Based on Intel XScale

Where are we headed?

Communication is specified in terms of point to point Constraints

A target implementation technology is abstracted into a Library

A synthesis procedure generates a communication architecture

That satisfies the constraints

That uses only the library elements

A Platform Integrator EnvironmentArchPltBuilderArchPltBuilder

RTOS

μC1

HW

MEM

μC2

DSP

μC1

μC2

DSP

MEM

RTOS

HWHW

MEM

To Silicon

Communication Topology Design

Platform-based-design in communication

Application Space

Implementation Space

API

Topology Design

Protocol Design

Routing

Network Layer

MAC Layer

Physical Level

The Problem

M 1

M 2

M 3

M 4

M 5

Previous Work

S.Dey et al., “System-Level Performance Analysis for Designing On-Chip Communication Architectures”, TCAD, Vol. 20, NO. 6, June 2001

S.Dey et al., “Efficient Exploration of the SOC Communication Architecture Design Space”,Int. Conf. On CAD, August 1999

J.M. Daveau et al., “Protocol Selection and Interface Generation for HW-SW Codesign”, TVLSI, Vol. 5, NO.1, March 1997

J.M. Daveau et al., “Synthesis of system level communication by an allocation based approach”, Int. Symposium on System Synthesis, 1995

Our Approach

System modules communicate by means of point-to-point unidirectional channels

Each channel is characterized by a set of communication constraints (distance, minimum bandwidth)

The specific functionality of each module is “abstracted away”

Abstract Model

The Goal – Overview of the Method

Application Space

Implementation Space

API

Constraint Propagation

Performance Abstraction

Abstraction (Intuition)

cost

Length

b1

b2

clk

Relay Station Switches

Cr(b)

Cs(b)

Cs(b)

This is the basic library of Communication Components

Communication-Constraint Graph

G=(V,A)

v V, p(v) is the vertex position in our coordinate system

a A:The arc distance d(a)

The arc bandwidth b(a)

M 1 M 5

v1 v2

a

Library

= L N, communication links and communication nodes

l L:d(l) is the link length

b(l) is the link bandwidth

c(l) link cost

n N c(n) is the cost of the communication node

l1

l2

l3

n

r

Communication Implementation

(1,3)

(b(a),d(a))=(4,5)

(1,2)(1,2)

(2,2)

n r

G=(V,A)


Arc Matching

(1,3)

(4,5)

(1,2)

(2,2)

n r


Arc Segmentation

(4,5)

(1,2)

(2,2)

n r


Arc Duplication + Arc Segmentation

(1,2)

(2,2)

n r


(1,3)

Arc Merging

(1,2)

(2,2)

n r

Implementation Graph

G’(G, ) = (V’ N’, A’) v’ V’, v’ V

n’N’, n’ is an instance of an element of N

a’A’, a’ is an instance of an element of L

a(u,v) A, P(a) set of paths s.t.P(a) connects u’ with v’ without passing through any other computational vertex

P(a) satisfies the bandwidth constraint of a

The cost of and implementation graph is

N'n' A'a'

)c(a' )c(n' )C(G'

TheOptimization Problem

Gtosbj

GCGG

)'(min),('

Assumptions

Assumption on the Libraryd(a1) > d(a2) b(a1) > b(a2) c(a1) > c(a2)

K-Way Merging structure

q*

Candidate Implementations

Generate only k-way mergings that could be part of the final implementation

Derive mergeability conditions based only on positions and bandwidth of the constraint graph and the library

2-Way Mergeablility

u1

u2

v1

v2

u1

u2

v1

v2

u1

u2

v1

v2

a1

a2

d(a1) + d(a2)

|| p(u1) – p(u2)||

|| p(v1) – p(v2)||

|| p(u1) – p(u2)|| + || p(v1) – p(v2)||

{a1, a2} not mergeable

2-Way Mergeablility

{a1… ak} not mergeable

u1

uk

v1

vk

a1

ak

1k

1i

(k-1)d(ak) + d(ai)

|| p(u1) – p(uk)||

|| p(v1) – p(vk)||

1k

1i

|| p(ui) – p(uk)|| + || p(vi) – p(vk)||

2-Way Mergeablility

b(ai) b(al) + b(aj) {a1… ak} not mergeable

k

1i Llmax

k][1,jmin

u1

uk

v1

vk

n-Way Mergeablility

Expansion Rule

If a A is not mergeable with any set of k-1 arcs in A,

then a is not mergeable with any set of k arcs in A

A

a

Akmax

Summary of Pruning Conditions

K-way mergeablility condition over distance

K-way mergeablility condition over bandwidth

Kk+1 mergeability condition

1

2

3

All the pruning conditions are based on Positions of ports andProperties of arcs in the Constraint Graphand maximum bandwidth in the Library

Solving the Optimization Problem

Constrains

Library

CandidateMerging

Generation

Non LinearOptimizer

UnateCovering

Solution

K-way Mergings

Point to point

Costs

Constrained Distance Sum Matrix

Dij= d(ai) + d(aj) aj

ak

am

Dkj + Dmj = d(ak) + d(aj)+ d(am) + d(aj) = 2d(aj) + d(ak) + d(am)

= (k-1)d(aj) + d(ai)

1k

1i

Merging Distance Sum Matrix

ij= ||p(ui) – p(uj)|| + ||p(vi) - p(vj)||

aj

ak

am

kj + mj = ||p(uk) – p(uj)|| + ||p(vk) - p(vj)|| + ||p(um) – p(uj)|| + ||p(vm) - p(vj)||=

= || p(ui) – p(uk)|| + || p(vi) – p(vk)||

1k

1i

Algorithm K 1;foundcanditatemapping TRUE;while (foundcanditatemapping) { for all column jcol() kWayMergingNotFound TRUE; for all subset of row R={i1…ik} row() if sum(D,R) < sum(,R) if l L s.t. b(l) + bmin(R,j) < sum(B,R) + B[j] {

S S kmergingimplementation(R,j);

kWayMergingNotFound FALSE } if (kWayMergingNotFound)

col() col() \ j;

if col() = foundcanditatemapping FALSE;

else k k+1;}

12

3

A Simple Example

a1 a3a2

(0,0) (0.4,0) (1.4,0)

(0,1) (0.4,1) (1.4,1)

2 2

2

a1

a1

a2

a2

a3

a3

D

0.8 2.8

2

a1

a1

a2

a2

a3

a3

1 1 1

a1 a2 a3

B

b(l)=2 d(l)=1 c(l)=1L

A Simple Example

a1 a3a2

(0,0) (0.4,0) (1.4,0)

(0,1) (0.4,1) (1.4,1)

2 2

2

a1

a1

a2

a2

a3

a3

0.8 2.8

2

a1

a1

a2

a2

a3

a3

1 1 1

a1 a2 a3

B

b(l)=2 d(l)=1 c(l)=1L

Examples

Lib1

3 3-way mergings

1 9-waymerging

Cost per unit length

0

2

4

6

8

10

Bw

C/l Lib1

Lib2

Lib2

All p-to-p

1 9-waymerging

Cost per unit length

0

2

4

6

8

10

Bw

C/l Lib1

Lib2

Examples

Limitations

u1 u2 u3 v1 v2 v3

Alternation

u1u2 u3 v1 v2 v3

Bi-directionality

Library Characterization

Main concerns (costs)Area

Power

Costs are affected byGeometric and physic properties of wires (our basic communication primitives)

Communication Speed

Wires Geometry and Physics

Pitch

Thickness

DielectricSpace

Mn

Mn+1

lcrit

buffer buffer

sopt

Depends only on the technology, not on the metal layer

An Example

Based on Intel 0.13um

0

0.2

0.4

0.6

0.8

1

1.2

1.4

M1 M2 M3 M4 M5 M6

Energy(pJ)

0

50

100

150

200

250

300

350

400

450

M1 M2 M3 M4 M5 M6

Sopt(*minsize)

Energy consumptionfor a series of lcrit and buffer sopt

Optimum buffer size in terms of minimum size

pscrit 17

Wires trade off

crit is constant

for the same length, different metal layers carry different bandwidth

A wire in M(n) is going to cost more in terms of area and power then a wire in M(n-1)

Buffers are bigger

The library is convex

critcritl

lT

Conlcusion

A chip will look like a network of componentsHighly distributedMulticlock cycle nets

CAD tools for On-Chipnet design mustAllow specification of communication constraintsFind the minimum cost network. Costs are:

PowerArea (statefull/stateless repeaters)Blocking time (queues)

Ensure Network reliability and correctness

Constraint-Driven Communication Synthesis is our approach to the problem

Future Work (Available Projects)

Ad hoc flow control protocol for low power, minimum buffering On-Chipnet

Implementation and performance/cost abstraction of fast low power On-Chip routers

Interface with Metropolis

On-Chipnet VHDL netlist generator

communication architecture synthesis alessandro pinto, university of california at berkeley...

Documents

chip communication

reachable slide

reality slide

feasible slide

construction methodology

communication mechanism

global communication

chip application