vlsi programming systolic design - tu/ewsinmak/education/2imn35/2imn35-2016... · 2016-05-18 ·...
TRANSCRIPT
18-May-16 Rudolf Mak TU/e Computer Science Systolic 1
VLSI programming
Systolic Design
Book: Parhi, Chp. 7
Rudolf Mak
Agenda
• Systolic arrays (what, where)
• Regular Iterative Algorithms (RIAs)
• Dependence graphs (regular, reduced)
• Systolic design techniques
– Binding (computations to PEs)
– Scheduling (computations to time slots)
• Examples
– FIR filters, matrix multipliers
FSM reminder
[Figure: a Moore machine and a Mealy machine, each drawn as a combinational logic (CL) block with a state register]
Chaining Mealy machines may lead to overly long critical paths!
Systolic system (Leiserson)
A systolic system is a set of interconnected Moore
machines that operate synchronously and satisfy
certain smallness (boundedness) conditions:
1. # states is bounded
2. # input ports is bounded
3. # output ports is bounded
4. # neighbor machines is bounded
“#” stands for “number of”
Systolic = Uniform Pipelined SDF
• Uniform:
– Each PE (Moore machine) computes the same set of combinational functions.
• Regular:
– All PEs are connected to a small, finite number of neighboring PEs via one or more D-elements according to a regular topology. All connections are point-to-point connections.
• Synchronous operation:
– All PEs operate in lock step (fire concurrently); data is pumped through the system, much like the heart pumps blood through the body (hence the name systolic).
Relaxations
• To obtain better systems, small relaxations of the systolic model are allowed:
1. Not all PEs need to be identical; small deviations are allowed, especially for PEs at the border of the system.
2. (A limited form of) broadcasting is allowed. This means that the PEs become Mealy machines.
– These systems are called semi-systolic by Leiserson.
– Parhi does not make the distinction. Instead he uses the notion fully pipelined for the Moore machine variant.
3. Connections need not be to nearest neighbors, but locality needs to be maintained.
Systolic system
[Figure: a host connected to a systolic array of PEs]
• Host: a Turing-equivalent machine, such as a PowerPC on an FPGA
• Systolic array: Moore machines (PEs), such as a dedicated computing engine on an FPGA
Application areas
• Computationally intensive, regular
– Basic linear algebra operations
– Signal processing
– Image processing
– Order statistics, sorting
– Dynamic programming
– High performance computing
• e.g., many particle simulations (in chemistry,
physics or astronomy)
FIR filter (N-tap)
Spec:
$y(i) = \sum_{j:\, 0 \le j < N} h(j)\, x(i-j), \quad 0 \le i$
RIA:
$Y(i,j) = \sum_{k:\, j \le k < N} h(k)\, x(i+j-k), \quad 0 \le j \le N$, so that $Y(i,0) = y(i)$ and $Y(i,N) = 0$
$Y(i,j) = h(j)\, x(i) + Y(i-1, j+1)$
$H(i,j) = h(j)$, hence $H(i,j) = H(i-1, j)$
$X(i,j) = x(i)$, hence $X(i,j) = X(i, j-1)$ or $X(i,j) = X(i, j+1)$
[Note on the slide: one alternative index displacement is marked "does not work!!!"; the expression itself is not legible in this transcript.]
Regular Iterative Algorithm
A RIA is a triple consisting of
1. An index space, e.g., $\{(i,j) \mid 0 \le i,\ 0 \le j < N\}$
2. A finite set of variables, e.g., $\{H, X, Y\}$
3. A set of direct dependencies among indexed variables (given as equalities)
• with associated index displacement vectors
• also called fundamental edges by Parhi
Canonical forms:
1. Standard input (the variable indexed $(i, j)$ is an input)
2. Standard output
FIR-filter: RIA description
Standard output canonical form:
$Y(i,j) = H(i,j)\,X(i,j) + Y(i-1, j+1), \quad Y(i,N) = 0$
$H(i,j) = H(i-1, j), \quad H(-1,j) = h(j)$
$X(i,j) = X(i, j-1), \quad X(i,-1) = x(i)$
Index displacement vectors (RHS → LHS), with LHS = RHS + IDV:
$X \to X$: $(0, 1)$
$H \to H$: $(1, 0)$
$H \to Y$: $(0, 0)$
$X \to Y$: $(0, 0)$
$Y \to Y$: $(1, -1)$
Example: $X(i,j) = X(i, j-1)$ gives $(i, j) = (i, j-1) + (0, 1)$.
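As a sanity check on the RIA, the following minimal Python sketch evaluates the recurrences over a finite part of the index space and compares $Y(i,0)$ with the FIR specification. It assumes $x(k) = 0$ for $k < 0$ (so that $Y(-1, j) = 0$), which the slides do not state; the names `fir_spec` and `fir_ria` are illustrative only.

```python
def fir_spec(h, x):
    """Direct specification: y(i) = sum over 0 <= j < N of h(j) * x(i - j).

    Assumption (not on the slides): x(k) = 0 for k < 0.
    """
    n_taps = len(h)
    return [
        sum(h[j] * x[i - j] for j in range(n_taps) if i - j >= 0)
        for i in range(len(x))
    ]


def fir_ria(h, x):
    """Evaluate the RIA recurrences node by node; y(i) is read off as Y(i, 0)."""
    n_taps = len(h)
    Y = {}
    for i in range(len(x)):
        for j in range(n_taps):
            H_ij = h[j]  # H(i, j) = H(i-1, j) with H(-1, j) = h(j)
            X_ij = x[i]  # X(i, j) = X(i, j-1) with X(i, -1) = x(i)
            # Y(i, N) = 0 on the boundary; Y(-1, j) = 0 is an assumption.
            prev = Y.get((i - 1, j + 1), 0)
            Y[(i, j)] = H_ij * X_ij + prev
    return [Y[(i, 0)] for i in range(len(x))]


h = [1, 2, 3]
x = [1, 0, 0, 2, 5]
assert fir_ria(h, x) == fir_spec(h, x)
```

Each dictionary entry `Y[(i, j)]` corresponds to one node of the index space, so the loop order only has to respect the displacement $(1, -1)$ of the $Y \to Y$ dependency.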
Computational node
[Figure: node g with inputs a, b and outputs c, d, e]
$c = f_1(a, b)$, $d = f_2(a, b)$, $e = f_3(a, b)$
$g = (f_1, f_2, f_3)$
Computational node from RIA
[Figure: node g at index I(g) = (i, j), with inputs H(i-1, j), X(i, j-1), Y(i-1, j+1) and outputs H(i, j), X(i, j), Y(i, j)]
$Y(i,j) = H(i,j)\,X(i,j) + Y(i-1, j+1)$
$I(g)$ is the index vector, i.e., the sequence of coordinates of $g$ in index space.
Dependence graphs
1. The nodes of a dependence graph represent (small) computations. There is a separate node for each computation.
2. The edges of a dependence graph represent causal dependencies between computations, i.e., an edge from node $x$ to node $y$ indicates that the result of the computation performed by $x$ is used in the computation performed by $y$.
3. There is no notion of time in a dependence graph. It is an (index-)space representation.
FIR: Dependence graph
[Figure: dependence graph for N = 3 with axes i and j, inputs x(0)..x(4), coefficients h(0), h(1), h(2), and outputs y(0)..y(4)]
$y(i) = h(0)\,x(i) + h(1)\,x(i-1) + h(2)\,x(i-2)$
[Figure: the same dependence graph, with the value 0 supplied on its boundary edges]
Regular dependence graphs
A dependence graph $G$ is regular when:
1. There is an injective mapping $I$ from the nodes of $G$ to a grid of points in the $n$-dimensional index space.
2. There exists a finite set $E$ of vectors, called fundamental edges, such that every pair $(x, y)$ of neighboring nodes is mapped to a pair of grid locations that differ by a fundamental edge $e \in E$, i.e., $I(y) = I(x) + e$.
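The regularity condition can be checked mechanically on a finite graph. The sketch below builds the edges of a finite FIR dependence graph from the dependencies of the FIR RIA in these slides and tests that every edge is a translation of one of the FIR fundamental edges $(1,0)$, $(0,1)$, $(1,-1)$. The function names and the finite size (3 taps, 5 samples) are choices made for this illustration.

```python
from itertools import product

# Fundamental edges of the FIR dependence graph, as (di, dj) displacements.
E = {"h": (1, 0), "x": (0, 1), "y": (1, -1)}


def fir_dg_edges(n_taps, n_samples):
    """List the edges (source index, target index) of a finite FIR dependence graph."""
    nodes = set(product(range(n_samples), range(n_taps)))
    edges = []
    for (i, j) in sorted(nodes):
        # Values read by node (i, j): H(i-1, j), X(i, j-1), Y(i-1, j+1).
        for src in [(i - 1, j), (i, j - 1), (i - 1, j + 1)]:
            if src in nodes:  # boundary inputs come from outside the graph
                edges.append((src, (i, j)))
    return edges


def is_regular(edges, fundamental):
    """Check I(y) = I(x) + e with e in the fundamental set, for every edge x -> y."""
    return all((y[0] - x[0], y[1] - x[1]) in fundamental for x, y in edges)


edges = fir_dg_edges(n_taps=3, n_samples=5)
assert is_regular(edges, set(E.values()))
```

The index mapping is injective by construction here, because each node is identified with its grid point.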
FIR: DG in space representation
[Figure: the FIR dependence graph (inputs x(0)..x(4), coefficients h(0)..h(2), outputs y(0)..y(4)) with its edge directions (1,0), (0,1) and (1,-1)]
Fundamental edges:
$E = (e_h \mid e_x \mid e_y) = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \end{pmatrix}$
Systolic array design
The design of a systolic array for a computation
given in the form of a regular dependence graph
involves:
1. Choosing a processor space, i.e., a set of dimensions
and a number of PEs per dimension (the array).
2. Mapping each computational node of the graph to a
PE of the array.
3. For each PE scheduling the computations of the
nodes mapped onto it, i.e., assigning each individual
computation to a distinct time slot.
Similar to folding
Design parameters
An $(n-1)$-dimensional systolic design for an $n$-dimensional regular dependence graph is characterized by:
1. An $n \times (n-1)$ processor space matrix $P$:
$P^T I(x)$ is the processor that executes node $x$
2. An $n$-dimensional scheduling vector $s$:
$s^T I(x)$ is the time slot at which node $x$ is executed
3. A projection (iteration) vector $d$:
$I(x) - I(y) = \alpha d$ implies $P^T I(x) = P^T I(y)$
Design constraints
• Computations whose grid locations differ by a multiple of the projection vector execute on the same PE
– $I(x) - I(y) = \alpha d$ implies $P^T I(x) = P^T I(y)$
– hence $P^T d = 0$
• Computations that execute on the same PE must be scheduled in different time slots
– $s^T I(x)$ is the time slot at which node $x$ is executed
– hence $s^T d \neq 0$
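Both constraints are plain dot products, so a candidate design can be checked in a few lines. The following sketch is only an illustration; `satisfies_constraints` is a hypothetical helper that takes the columns of $P$ as tuples.

```python
def dot(u, v):
    """Inner product of two equally long integer vectors."""
    return sum(a * b for a, b in zip(u, v))


def satisfies_constraints(P_columns, s, d):
    """Return True when P^T d = 0 and s^T d != 0.

    P_columns holds the columns of the processor space matrix P.
    """
    same_pe = all(dot(p, d) == 0 for p in P_columns)
    distinct_slots = dot(s, d) != 0
    return same_pe and distinct_slots


# FIR design with d = (1, 0), p = (0, 1), s = (1, 0): both constraints hold.
assert satisfies_constraints([(0, 1)], s=(1, 0), d=(1, 0))
# A schedule orthogonal to d would put two nodes of one PE in the same slot.
assert not satisfies_constraints([(0, 1)], s=(0, 1), d=(1, 0))
```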
Processor allocation:
[Figure: the FIR dependence graph with its nodes grouped into processors along the projection direction]
$d^T = (1, 0)$, $p^T = (0, 1)$
$p^T (i, j)^T = j$
Scheduling:
[Figure: the FIR dependence graph with each node labelled by its time slot (0 to 4)]
$d^T = (1, 0)$, $s^T = (1, 0)$
$s^T (i, j)^T = i$
Hardware Utilization Efficiency (HUE)
Let $x$ and $y$ be computations with index vectors $I(x)$, $I(y)$ that are executed on the same PE.
• Then $I(x) - I(y) = \alpha d$.
Let $t_x$ be the time at which $x$ is scheduled and $t_y$ be the time at which $y$ is scheduled.
• Then $|t_x - t_y| = |s^T (I(x) - I(y))| = |\alpha\, s^T d| \ge |s^T d|$ (for $x \neq y$, so $\alpha \neq 0$).
Hence, any PE executes at most 1 computation per $|s^T d|$ time slots. So
HUE = $1/|s^T d|$
Question: What do we call $s^T d$?
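The bound can be observed on a small grid. The sketch below groups the nodes of a finite FIR index space by processor and lists their time slots, using the values $d^T = (1,-1)$, $p^T = (1,1)$, $s^T = (1,-1)$ (design R1 in these slides). The grid size and the helper names are assumptions made for the illustration.

```python
from collections import defaultdict
from fractions import Fraction


def dot(u, v):
    """Inner product of two equally long integer vectors."""
    return sum(a * b for a, b in zip(u, v))


def pe_schedules(p, s, nodes):
    """Map each processor p^T I(x) to the sorted time slots s^T I(x) of its nodes."""
    slots = defaultdict(list)
    for node in nodes:
        slots[dot(p, node)].append(dot(s, node))
    return {pe: sorted(ts) for pe, ts in slots.items()}


def hue(s, d):
    """Hardware utilization efficiency 1 / |s^T d|."""
    return Fraction(1, abs(dot(s, d)))


nodes = [(i, j) for i in range(5) for j in range(3)]
schedule = pe_schedules(p=(1, 1), s=(1, -1), nodes=nodes)
# Gaps between consecutive computations on the same PE.
gaps = {b - a for ts in schedule.values() for a, b in zip(ts, ts[1:])}
assert gaps == {2}
assert hue(s=(1, -1), d=(1, -1)) == Fraction(1, 2)
```

Every PE in this example fires once per two time slots, which matches HUE = 1/2.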
From DG to systolic array
Map a DG onto a systolic array as follows:
• Nodes:
– map $x$ to processing element $P^T I(x)$
• Edges:
– map $x \to y$ to connection $P^T I(x) \to P^T I(y)$
– insert $s^T e$ D-elements in this edge, where $e = I(y) - I(x)$ is a fundamental edge
Note that there are only finitely many fundamental edges (independent of the size of the DG), and recall that each edge is a translation of a fundamental edge.
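Because every edge is a translation of a fundamental edge, the whole interconnect follows from one small table. The sketch below computes, for a one-dimensional processor space, the PE displacement $p^T e$ and the number of D-elements $s^T e$ per fundamental edge; `map_edges` is a hypothetical helper, and the FIR edges and the vectors $p^T = (0,1)$, $s^T = (1,0)$ are taken from these slides.

```python
def dot(u, v):
    """Inner product of two equally long integer vectors."""
    return sum(a * b for a, b in zip(u, v))


def map_edges(p, s, fundamental_edges):
    """Return {edge name: (PE displacement p^T e, number of D-elements s^T e)}."""
    return {name: (dot(p, e), dot(s, e)) for name, e in fundamental_edges.items()}


E = {"h": (1, 0), "x": (0, 1), "y": (1, -1)}
b1 = map_edges(p=(0, 1), s=(1, 0), fundamental_edges=E)

# h: displacement 0, one delay    -> the coefficient stays in its PE
# x: displacement 1, zero delays  -> the input is broadcast
# y: displacement -1, one delay   -> the partial sum moves to the neighbor
assert b1 == {"h": (0, 1), "x": (1, 0), "y": (-1, 1)}
```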
B1: H-stay, X-broadcast, Y-move
$d^T = (1, 0)$, $p^T = (0, 1)$, $s^T = (1, 0)$

| $e$ | $p^T e$ | $s^T e$ |
|---|---|---|
| $e_h = (1, 0)$ | 0 | 1 |
| $e_x = (0, 1)$ | 1 | 0 |
| $e_y = (1, -1)$ | -1 | 1 |

[Figure: linear array of three PEs with the H, X and Y connections]
B1: H-stay, X-broadcast, Y-move
HUE = $1/|s^T d| = 1$
[Figure: three PEs holding h0, h1 and h2; x(i) is broadcast to all PEs, the partial sums u(i) and v(i) are passed on through one D-element each, a constant 0 enters at the h2 end and y(i) leaves at the h0 end]
y(i) = h0·x(i) + v(i-1), v(i) = h1·x(i) + u(i-1), u(i) = h2·x(i) + 0
y(i) = h0·x(i) + h1·x(i-1) + h2·x(i-2)
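The three equations can be simulated cycle by cycle. In the minimal sketch below the two D-elements are modelled as the variables `v_reg` and `u_reg`; their initial value 0 is an assumption (it corresponds to $x(k) = 0$ for $k < 0$), and the function name is illustrative.

```python
def b1_array(h, x):
    """Simulate the 3-tap B1 array: one loop iteration is one clock cycle."""
    h0, h1, h2 = h
    v_reg = 0  # D-element holding v(i-1); initial value 0 is an assumption
    u_reg = 0  # D-element holding u(i-1); initial value 0 is an assumption
    ys = []
    for xi in x:  # x(i) is broadcast to all three PEs
        y = h0 * xi + v_reg  # y(i) = h0*x(i) + v(i-1)
        v = h1 * xi + u_reg  # v(i) = h1*x(i) + u(i-1)
        u = h2 * xi + 0      # u(i) = h2*x(i) + 0
        ys.append(y)
        v_reg, u_reg = v, u  # clock edge: all registers update together
    return ys


h = (1, 2, 3)
x = [1, 0, 0, 2, 5]
expected = [
    sum(h[j] * x[i - j] for j in range(3) if i - j >= 0) for i in range(len(x))
]
assert b1_array(h, x) == expected
```

Feeding a unit impulse returns the coefficients in order, which is a quick way to see that the delays sit on the Y path only.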
Determining $P$, $s$, and $d$
• Trial-and-error approach
– Pick a combination and check whether the design constraints are fulfilled.
• Constructive approach
1. Determine a schedule $s$.
2. Determine a projection vector $d$ such that $s^T d \neq 0$.
3. Let $Q = d^T d\, I - d\, d^T$. Then $Q$ is a matrix of rank $n-1$ such that $Q^T d = 0$. By sweeping, a zero column can be created in $Q$. Drop this column to obtain an $n \times (n-1)$ matrix $P$.
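Step 3 of the constructive approach can be sketched in pure Python. Since $Q d = 0$, a column of $Q$ at a position $c$ with $d_c \neq 0$ is a linear combination of the other columns, so sweeping it to zero leaves the other columns unchanged; the sketch therefore simply drops that column. This shortcut is an observation of the sketch, not a statement from the slides, and the resulting $P$ is one valid choice among many.

```python
def q_matrix(d):
    """Q = (d^T d) I - d d^T, returned as a list of rows."""
    n = len(d)
    dd = sum(x * x for x in d)
    return [
        [(dd if r == c else 0) - d[r] * d[c] for c in range(n)]
        for r in range(n)
    ]


def processor_matrix(d):
    """Drop from Q a column that sweeping can make zero; returns the n x (n-1) matrix P."""
    Q = q_matrix(d)
    # Any position with a nonzero entry of d works; take the first one.
    c = next(k for k, x in enumerate(d) if x != 0)
    return [[row[k] for k in range(len(d)) if k != c] for row in Q]


def pt_times_d(P, d):
    """Compute P^T d, which must be the zero vector."""
    return [sum(P[r][c] * d[r] for r in range(len(d))) for c in range(len(P[0]))]


assert processor_matrix((1, 0)) == [[0], [1]]   # p^T = (0, 1)
assert processor_matrix((1, -1)) == [[1], [1]]  # p^T = (1, 1)
assert pt_times_d(processor_matrix((1, 1, 1)), (1, 1, 1)) == [0, 0]
```

For $d^T = (1,0)$ and $d^T = (1,-1)$ this reproduces the vectors $p^T = (0,1)$ and $p^T = (1,1)$ used for the FIR designs in these slides.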
FIR-designs (Parhi)

| Design | $s^T$ | $d^T$ | $p^T$ | $p^T(e_h \mid e_x \mid e_y)$ | $s^T(e_h \mid e_x \mid e_y)$ | Reverse direction of fundamental edge |
|---|---|---|---|---|---|---|
| B1 | (1, 0) | (1, 0) | (0, 1) | (0, 1, -1) | (1, 0, 1) | |
| F | (1, 1) | (1, 0) | (0, 1) | (0, 1, -1) | (1, 1, 0) | |
| W1 | (2, 1) | (1, 0) | (0, 1) | (0, 1, -1) | (2, 1, 1) | |
| W2 | (1, 2) | (1, 0) | (0, 1) | (0, 1, -1) | (1, 2, -1) | $e_y \to -e_y$ |
| DW2 | (1, -1) | (1, 0) | (0, 1) | (0, 1, -1) | (1, -1, 2) | $e_x \to -e_x$ |
| B2 | (1, 0) | (1, -1) | (1, 1) | (1, 1, 0) | (1, 0, 1) | |
| R1 | (1, -1) | (1, -1) | (1, 1) | (1, 1, 0) | (1, -1, 2) | $e_x \to -e_x$ |
| R2 | (2, 1) | (1, -1) | (1, 1) | (1, 1, 0) | (2, 1, 1) | |
| DR2 | (1, 2) | (1, -1) | (1, 1) | (1, 1, 0) | (1, 2, -1) | $e_y \to -e_y$ |
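The "reverse direction" annotations coincide with the negative entries of $s^T(e_h \mid e_x \mid e_y)$: an edge cannot carry a negative number of D-elements, so its direction is flipped. The sketch below recomputes this for the nine scheduling vectors in the table; the dictionary layout and function name are choices made for this illustration.

```python
def dot(u, v):
    """Inner product of two equally long integer vectors."""
    return sum(a * b for a, b in zip(u, v))


# Fundamental edges of the FIR dependence graph.
E = {"eh": (1, 0), "ex": (0, 1), "ey": (1, -1)}

# Scheduling vectors s^T of the nine designs, copied from the table.
SCHEDULES = {
    "B1": (1, 0), "F": (1, 1), "W1": (2, 1), "W2": (1, 2), "DW2": (1, -1),
    "B2": (1, 0), "R1": (1, -1), "R2": (2, 1), "DR2": (1, 2),
}


def edges_to_reverse(s):
    """Names of the fundamental edges whose delay count s^T e is negative."""
    return [name for name, e in E.items() if dot(s, e) < 0]


reversals = {name: edges_to_reverse(s) for name, s in SCHEDULES.items()}
flipped = {name: edges for name, edges in reversals.items() if edges}
assert flipped == {"W2": ["ey"], "DW2": ["ex"], "R1": ["ex"], "DR2": ["ey"]}
```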
Design R1: dependence graph
[Figure: the FIR dependence graph for design R1 (inputs X(0)..X(4), coefficients h(0)..h(2), outputs y(0)..y(4)) with edge directions (1,0), (0,-1) and (1,-1)]
Fundamental edges:
$E = (e_h \mid -e_x \mid e_y) = \begin{pmatrix} 1 & 0 & 1 \\ 0 & -1 & -1 \end{pmatrix}$
Space-time diagram R1
[Figure: space-time diagram of design R1; one axis is ticked 0, 2, ..., 12 and the other 2, 4, ..., 10]
$d^T = (1, -1)$, $p^T = (1, 1)$, $s^T = (1, -1)$
$m = p^T I(g) = i + j$ (processor), $t = s^T I(g) = i - j$ (time)
Processor allocation R1:
[Figure: the R1 dependence graph with its nodes grouped into processors along the projection direction]
$d^T = (1, -1)$
$m = p^T (i, j)^T \bmod 3$
Scheduling R1:
[Figure: the R1 dependence graph with each node labelled by its time slot (0 to 10)]
$d^T = (1, -1)$, $s^T = (1, -1)$
[The slide also gives a closed formula for the time slot, starting from $s^T (i, j)^T$; it is not legible in this transcript.]
R1: H-move, X-move, Y-stay
$d^T = (1, -1)$, $p^T = (1, 1)$, $s^T = (1, -1)$

| $e$ | $p^T e$ | $s^T e$ |
|---|---|---|
| $e_h = (1, 0)$ | 1 | 1 |
| $-e_x = (0, -1)$ | -1 | 1 |
| $e_y = (1, -1)$ | 0 | 2 |

[Figure: linear array of three PEs with the H, X and Y connections]
R1: H-move, X-move, Y-stay
HUE = $1/|s^T d| = 1/2$ (2-slow)
[Figure: the R1 array with coefficients h0, h1, h2 and inputs x0, x1 interleaved with zeros, annotated with the time at which each value enters the array]
[Figure: the same R1 array with its signals labelled H, V, W, X and Y]
[Table: cycle-by-cycle trace of the R1 array for time slots 0 to 13; the individual entries are not recoverable from this transcript]
Matrix multiplication ($N \times N$): RIA
$C(i,j) = \sum_{k:\, 0 \le k < N} A(i,k)\, B(k,j)$
$c(i,j,k) = \sum_{l:\, 0 \le l < k} A(i,l)\, B(l,j)$
$a(i,j,k) = A(i, k-1)$
$b(i,j,k) = B(k-1, j)$
which gives the recurrences
$c(i,j,0) = 0$
$c(i,j,k) = c(i,j,k-1) + a(i,j,k)\, b(i,j,k)$
$a(i,j,k) = a(i,j-1,k)$
$b(i,j,k) = b(i-1,j,k)$
for $0 \le i, j < N$ and $0 < k \le N$
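The recurrences can be run directly to confirm that $c(i,j,N) = C(i,j)$. The sketch below is a minimal Python rendering; the boundary values $a(i,-1,k) = A(i,k-1)$ and $b(-1,j,k) = B(k-1,j)$ are assumptions chosen to agree with the definitions of $a$ and $b$, and `matmul_ria` is an illustrative name.

```python
def matmul_ria(A, B):
    """Evaluate the matrix-multiplication RIA for square matrices A and B."""
    n = len(A)
    a, b, c = {}, {}, {}
    for i in range(n):
        for j in range(n):
            c[(i, j, 0)] = 0  # c(i, j, 0) = 0
            for k in range(1, n + 1):
                # a(i, j, k) = a(i, j-1, k); boundary a(i, -1, k) = A(i, k-1) (assumed)
                a[(i, j, k)] = a[(i, j - 1, k)] if j > 0 else A[i][k - 1]
                # b(i, j, k) = b(i-1, j, k); boundary b(-1, j, k) = B(k-1, j) (assumed)
                b[(i, j, k)] = b[(i - 1, j, k)] if i > 0 else B[k - 1][j]
                # c(i, j, k) = c(i, j, k-1) + a(i, j, k) * b(i, j, k)
                c[(i, j, k)] = c[(i, j, k - 1)] + a[(i, j, k)] * b[(i, j, k)]
    return [[c[(i, j, n)] for j in range(n)] for i in range(n)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert matmul_ria(A, B) == [[19, 22], [43, 50]]
```

The three index displacement vectors are visible in the dictionary lookups: $(0,1,0)$ for $a$, $(1,0,0)$ for $b$ and $(0,0,1)$ for $c$.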
[Figure: three-dimensional dependence graph for $N = 3$ (finite!) with axes i, j and k, each labelled 0, 1, 2, and the matrices A, B and C marked]
Kung-Leiserson design
• Scheduling vector
– $s^T = (1, 1, 1)$
• Projection vector
– $d^T = (1, 1, 1)$
• Processor space matrix
– $P^T = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \end{pmatrix}$
• HUE = 1/3

| Variable | $e$ | $P^T e$ | $s^T e$ |
|---|---|---|---|
| a | (0, 1, 0) | (0, 1) | 1 |
| b | (1, 0, 0) | (1, 0) | 1 |
| c | (0, 0, 1) | (-1, -1) | 1 |
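The design values above can be checked, and the resulting binding inspected, with a short script. The sketch below maps every node of a $3 \times 3 \times 3$ index space to its PE and counts the nodes per PE; indexing all three axes from 0 to $N-1$ is an assumption of this illustration (a shift in $k$ only translates the array).

```python
from collections import Counter


def dot(u, v):
    """Inner product of two equally long integer vectors."""
    return sum(a * b for a, b in zip(u, v))


P_T = [(1, 0, -1), (0, 1, -1)]  # rows of P^T
s = (1, 1, 1)
d = (1, 1, 1)


def processor(node):
    """PE coordinates (x, y) = (i - k, j - k) of a node (i, j, k)."""
    return tuple(dot(row, node) for row in P_T)


N = 3
nodes = [(i, j, k) for i in range(N) for j in range(N) for k in range(N)]
load = Counter(processor(node) for node in nodes)

assert all(dot(row, d) == 0 for row in P_T)  # P^T d = 0
assert dot(s, d) == 3                         # HUE = 1/3
assert len(load) == 19                        # number of PEs used
assert load[(0, 0)] == 3 and min(load.values()) == 1
```

The PE at $(0,0)$ executes three nodes while the PEs on the rim execute one, so the workload is not evenly distributed over the array.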
Kung-Leiserson (3x3)-matrix multiplication systolic array
[Figure: the array of PEs with coordinates x = i-k and y = j-k; delay elements are not drawn: there is one on each edge!]
KL-array processor allocation (binding)
[Figure: dependence-graph nodes mapped onto the array; the workload is unbalanced]
[Figure: dependence graph for $N = 3$ with axes i, j and k, each labelled 0, 1, 2, the matrices A, B and C, and the projection vector d]
KL-array: 3-slow schedule
HUE = 1/3
KL-array details
In addition to the previous slides, the following issues must be addressed:
• For both A and B there are 5 input streams
– How are the matrix values distributed over them?
• For C there are 5 output streams
– How are the resulting values distributed over them?
– How are results that become available at an internal PE propagated to the border?
• How to operate this array for multiple multiplications?
– Flushing old values can be combined with getting internal results out.
Summary
1. Systolic architectures are attractive for implementation media like VLSI circuits and FPGAs.
2. Starting point for systolic design is a RIA (or a dependence graph).
3. RIAs can be mapped to systolic arrays in a systematic fashion.
4. Mapping uses simple linear algebra techniques.
5. A large variety of designs for a single problem can be obtained.
Exercise (systolic design)
1. An OCL system is a system that counts (#), for each window of size $N$ on its input stream, the number of times the last received value occurs in that window, i.e., for $N - 1 \le i$
$y(i) = (\# j : 0 \le j < N : x(i) = x(i-j))$,
where $x$ is the input stream and $y$ the output stream.
a) Derive a RIA (in standard output form) for this system that satisfies the equations
� �, � � � � , 0 � � � �� �, � � #�: � � � � � �: � � � � � � , 0 � � � �l �, � � � � � � � , 0 � � � � Note that l��, �� � � �, � ‼!
b) Draw the dependence graph of this RIA for $N = 4$ (you need to draw only the part with $0 \le i < 6$).
Exercise (systolic design)
2. Consider the scheduling, projection and processor vectors
$s = (2, 1)^T$, $d = (1, 2)^T$, $p = (0, 1)^T$
a) Construct the systolic array that corresponds to these vectors. You may assume the existence of a comparator operator that takes two input streams and produces an output stream of ones and zeros, for equal and unequal input pairs respectively.
b) Determine the slowness of your design.
3. Assume that the times to perform comparison and addition are given by Tcmp = 1 ns and Tadd = 3 ns, respectively. Give the maximum throughput and the latency of your design (taking slowness into account). Give the latency both in number of delays and in real time (ns).
Exercise (systolic design)
4. Next, replace the scheduling vector by $s^T = (1, 0)$. Compare the throughput and latency of the resulting systolic array with that of the one with $s^T = (2, 1)$.
5. Consider the design of 4.
a) Eliminate redundant operators, and optimize the throughput by pipelining. Give the resulting throughput and latency.
b) Next, retime the result of a), keeping throughput and latency fixed, to obtain the minimum number of delays.