communication architecture synthesis alessandro pinto, university of california at berkeley...
Post on 21-Dec-2015
212 views
TRANSCRIPT
Communication Architecture Synthesis
Alessandro Pinto,
University of California at Berkeley
Introduction: what
Communication synthesisWhat is the “best way” of transferring information between entities
Problem FormulationWho wants to communicate with whomHow he aspects the communication mechanism to behaveWhat is possible to built todayHow to build a communication architecture that is cheap and satisfies the entities constraints while being feasible
Introduction: why
On-Chip applicationLots of components connected together
Distance is becoming important
There isn’t a formal approach to the problem
We build tools which are the glue in the platform-based design approach
More or less everything has been done in the design of “functions”
Outline
OverviewInternet vs. On-chipnet
Standard communication structuresVSIA + Standard comm. Platform
Constraint-Driven Communication Synthesis
A theoretic approach (A.Pinto,L.P.Carloni.ASV)
Current researchConclusion
Wire Scaling
“As designs scale to newer technologies, they get smaller and their wiresget shorter, and the relative change in the speed of wires to thespeed of gates is modest.” M.A.Horowitz et.al “The future of wires”
“The real wire problem arises with increasing chip complexity and global communication costs.” M.A.Horowitz et.al “The future of wires”
Scale
Local Wire
Global Wire
What is going to happen
From few closed connected components to a lot of distributed IP’s
A global net is most likely to be traveled in more that one clock cycle
High bandwidth and low power requirements
Fault tolerance requirement
There is need for a correct by construction methodology
Internet Network
Highly distributed
Fault Tolerant
Every node must reachable
Networks vs. On-Chip Networks
Internet major concern was to guarantee connectivity in case of attack
Cost is mostly static
Buffers memory is not a problem (512MB is still cheap)
Number and complexity of protocol layer is important only for performance issues
Full connectivity is not the major concern
Now cost is mostly dynamic (power)
Registers are expensive
The protocol must be very light
In the future error correction might become a reality
The beginning of internet
Back in 1969: the ARPAnet
SRI
UCLA
UCSB
UTAH
The beginning of On-Chipnet
A lot of effort on on-chip communicationOne of GSRC main theme: Communication-based design
Special section in conferences
Different approachesNot always really networks
Buses are still widely used
Effect of standardization
0
200
400
600
800
1000
1200
'69 '71 '74 '76 '78 '80 '82
Hosts
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
'84 '85 '86 '87 '89 '90 '91
Host
1983: TCP/IP was introduced
1987: Commercialization of internetTCP/IP technology is transferred formInventors to vendors
Standardization
VSIA: Virtual Socket Interface AllianceSpecifies Virtual Component Interface
Peripheral VCI
Basic VCI
Advanced VCI
VCI Initiator VCI Target
VCI Target VCI Initiator
Bus Master Bus SlaveAny Bus
A B
http://www.vsi.org
Standard “Bus” Architecture
ARM (ARM Ltd.)*
CoreFrame (Palmchip)
CoreConnect (IBM)
WichBone (Silicore)
SiliconBackPlane (Sonics)*
AMBA Bus Hierarchy
AHB
APB
HP ARMCore On Chip RAM
DMA
Off-Chip RAM
Bridge
Keypad
UART
AHB: Advanced High-Performance Bus
AB: Advanced Peripheral Bus
SiliconBackplane
It’s possible to specify constraints for each point to point communication
It’s possible to specify different OPC
Synthesis of the network that guarantees performances
A Today’s Platform Example
Processor
USB, MMC, IrDA, Bluetooth
PCMCIA, SDRAM
CPU
PCMCIA
SDRAM
Bluetooth
MMC
USB
IrDA
Brid
ge
Based on Intel XScale
Where are we headed?
Communication is specified in terms of point to point Constraints
A target implementation technology is abstracted into a Library
A synthesis procedure generates a communication architecture
That satisfies the constraints
That uses only the library elements
A Platform Integrator EnvironmentArchPltBuilderArchPltBuilder
RTOS
μC1
HW
MEM
μC2
DSP
μC1
μC2
DSP
MEM
RTOS
HWHW
MEM
To Silicon
Communication Topology Design
Platform-based-design in communication
Application Space
Implementation Space
API
Topology Design
Protocol Design
Routing
Network Layer
MAC Layer
Physical Level
The Problem
M 1
M 2
M 3
M 4
M 5
The Problem
M 1
M 2
M 3
M 4
M 5
Previous Work
S.Dey et al., “System-Level Performance Analysis for Designing On-Chip Communication Architectures”, TCAD, Vol. 20, NO. 6, June 2001
S.Dey et al., “Efficient Exploration of the SOC Communication Architecture Design Space”,Int. Conf. On CAD, August 1999
J.M. Daveau et al., “Protocol Selection and Interface Generation for HW-SW Codesign”, TVLSI, Vol. 5, NO.1, March 1997
J.M. Daveau et al., “Synthesis of system level communication by an allocation based approach”, Int. Symposium on System Synthesis, 1995
Our Approach
System modules communicate by means of point-to-point unidirectional channels
Each channel is characterized by a set of communication constraints (distance, minimum bandwidth)
The specific functionality of each module is “abstracted away”
Abstract Model
The Goal – Overview of the Method
Application Space
Implementation Space
API
Constraint Propagation
Performance Abstraction
Abstraction (Intuition)
cost
Length
b1
b2
clk
Relay Station Switches
Cr(b)
Cs(b)
Cs(b)
This is the basic library of Communication Components
Communication-Constraint Graph
G=(V,A)
v V, p(v) is the vertex position in our coordinate system
a A:The arc distance d(a)
The arc bandwidth b(a)
M 1 M 5
v1 v2
a
Library
= L N, communication links and communication nodes
l L:d(l) is the link length
b(l) is the link bandwidth
c(l) link cost
n N c(n) is the cost of the communication node
l1
l2
l3
n
r
Communication Implementation
(1,3)
(b(a),d(a))=(4,5)
(1,2)(1,2)
(2,2)
n r
G=(V,A)
Communication Implementation
Arc Matching
(1,3)
(4,5)
(1,2)
(2,2)
n r
Communication Implementation
Arc Segmentation
(4,5)
(1,2)
(2,2)
n r
Communication Implementation
Arc Duplication + Arc Segmentation
(1,2)
(2,2)
n r
Communication Implementation
(1,3)
Arc Merging
(1,2)
(2,2)
n r
Implementation Graph
G’(G, ) = (V’ N’, A’) v’ V’, v’ V
n’N’, n’ is an instance of an element of N
a’A’, a’ is an instance of an element of L
a(u,v) A, P(a) set of paths s.t.P(a) connects u’ with v’ without passing through any other computational vertex
P(a) satisfies the bandwidth constraint of a
The cost of and implementation graph is
N'n' A'a'
)c(a' )c(n' )C(G'
TheOptimization Problem
Gtosbj
GCGG
)'(min),('
Assumptions
Assumption on the Libraryd(a1) > d(a2) b(a1) > b(a2) c(a1) > c(a2)
K-Way Merging structure
q*
Candidate Implementations
Generate only k-way mergings that could be part of the final implementation
Derive mergeability conditions based only on positions and bandwidth of the constraint graph and the library
2-Way Mergeablility
u1
u2
v1
v2
u1
u2
v1
v2
2-Way Mergeablility
u1
u2
v1
v2
u1
u2
v1
v2
2-Way Mergeablility
u1
u2
v1
v2
u1
u2
v1
v2
u1
u2
v1
v2
a1
a2
d(a1) + d(a2)
|| p(u1) – p(u2)||
|| p(v1) – p(v2)||
|| p(u1) – p(u2)|| + || p(v1) – p(v2)||
{a1, a2} not mergeable
2-Way Mergeablility
{a1… ak} not mergeable
u1
uk
v1
vk
a1
ak
1k
1i
(k-1)d(ak) + d(ai)
|| p(u1) – p(uk)||
|| p(v1) – p(vk)||
1k
1i
|| p(ui) – p(uk)|| + || p(vi) – p(vk)||
2-Way Mergeablility
b(ai) b(al) + b(aj) {a1… ak} not mergeable
k
1i Llmax
k][1,jmin
u1
uk
v1
vk
n-Way Mergeablility
Expansion Rule
If a A is not mergeable with any set of k-1 arcs in A,
then a is not mergeable with any set of k arcs in A
A
a
Akmax
Summary of Pruning Conditions
K-way mergeablility condition over distance
K-way mergeablility condition over bandwidth
Kk+1 mergeability condition
1
2
3
All the pruning conditions are based on Positions of ports andProperties of arcs in the Constraint Graphand maximum bandwidth in the Library
Solving the Optimization Problem
Constrains
Library
CandidateMerging
Generation
Non LinearOptimizer
UnateCovering
Solution
K-way Mergings
Point to point
Costs
Constrained Distance Sum Matrix
Dij= d(ai) + d(aj) aj
ak
am
Dkj + Dmj = d(ak) + d(aj)+ d(am) + d(aj) = 2d(aj) + d(ak) + d(am)
= (k-1)d(aj) + d(ai)
1k
1i
Merging Distance Sum Matrix
ij= ||p(ui) – p(uj)|| + ||p(vi) - p(vj)||
aj
ak
am
kj + mj = ||p(uk) – p(uj)|| + ||p(vk) - p(vj)|| + ||p(um) – p(uj)|| + ||p(vm) - p(vj)||=
= || p(ui) – p(uk)|| + || p(vi) – p(vk)||
1k
1i
Algorithm K 1;foundcanditatemapping TRUE;while (foundcanditatemapping) { for all column jcol() kWayMergingNotFound TRUE; for all subset of row R={i1…ik} row() if sum(D,R) < sum(,R) if l L s.t. b(l) + bmin(R,j) < sum(B,R) + B[j] {
S S kmergingimplementation(R,j);
kWayMergingNotFound FALSE } if (kWayMergingNotFound)
col() col() \ j;
if col() = foundcanditatemapping FALSE;
else k k+1;}
12
3
A Simple Example
a1 a3a2
(0,0) (0.4,0) (1.4,0)
(0,1) (0.4,1) (1.4,1)
2 2
2
a1
a1
a2
a2
a3
a3
D
0.8 2.8
2
a1
a1
a2
a2
a3
a3
1 1 1
a1 a2 a3
B
b(l)=2 d(l)=1 c(l)=1L
A Simple Example
a1 a3a2
(0,0) (0.4,0) (1.4,0)
(0,1) (0.4,1) (1.4,1)
2 2
2
a1
a1
a2
a2
a3
a3
D
0.8 2.8
2
a1
a1
a2
a2
a3
a3
1 1 1
a1 a2 a3
B
b(l)=2 d(l)=1 c(l)=1L
A Simple Example
a1 a3a2
(0,0) (0.4,0) (1.4,0)
(0,1) (0.4,1) (1.4,1)
2 2
2
a1
a1
a2
a2
a3
a3
0.8 2.8
2
a1
a1
a2
a2
a3
a3
1 1 1
a1 a2 a3
B
b(l)=2 d(l)=1 c(l)=1L
Examples
Lib1
3 3-way mergings
1 9-waymerging
Cost per unit length
0
2
4
6
8
10
Bw
C/l Lib1
Lib2
Lib2
All p-to-p
1 9-waymerging
Cost per unit length
0
2
4
6
8
10
Bw
C/l Lib1
Lib2
Examples
Limitations
u1 u2 u3 v1 v2 v3
Alternation
u1u2 u3 v1 v2 v3
Bi-directionality
Library Characterization
Main concerns (costs)Area
Power
Costs are affected byGeometric and physic properties of wires (our basic communication primitives)
Communication Speed
Wires Geometry and Physics
Pitch
Thickness
DielectricSpace
Mn
Mn+1
lcrit
buffer buffer
sopt
Depends only on the technology, not on the metal layer
An Example
Based on Intel 0.13um
0
0.2
0.4
0.6
0.8
1
1.2
1.4
M1 M2 M3 M4 M5 M6
Energy(pJ)
0
50
100
150
200
250
300
350
400
450
M1 M2 M3 M4 M5 M6
Sopt(*minsize)
Energy consumptionfor a series of lcrit and buffer sopt
Optimum buffer size in terms of minimum size
pscrit 17
Wires trade off
crit is constant
for the same length, different metal layers carry different bandwidth
A wire in M(n) is going to cost more in terms of area and power then a wire in M(n-1)
Buffers are bigger
The library is convex
critcritl
lT
Conlcusion
A chip will look like a network of componentsHighly distributedMulticlock cycle nets
CAD tools for On-Chipnet design mustAllow specification of communication constraintsFind the minimum cost network. Costs are:
PowerArea (statefull/stateless repeaters)Blocking time (queues)
Ensure Network reliability and correctness
Constraint-Driven Communication Synthesis is our approach to the problem
Future Work (Available Projects)
Ad hoc flow control protocol for low power, minimum buffering On-Chipnet
Implementation and performance/cost abstraction of fast low power On-Chip routers
Interface with Metropolis
On-Chipnet VHDL netlist generator