principles of digital design -...
TRANSCRIPT
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine
Principles Of Digital Design
C to RTL
Control/Data flow graphs Finite-state-machine with data IP design Component selection Connection selection Operator and variable mapping Scheduling and pipelining
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 2
Topic preview
Logic gates and flip-flops
3 Boolean algebra
3
Finite-state machine
6
2
8
4
5
6
7
8
9
Logic design techniques
Binary system and data
representation
Generalized finite-state machines
Combinational components
Sequential design techniques
Storage components
Register-transfer design
Processor components
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 3
Register-transfer-level design
Each standard or custom IP components consists of one or more datapaths and control units.
To synthesize such IP we use the models of a CDFG and FSMD.
We demonstrate IP synthesis (RTL Design) including component and connectivity selection, expression mapping scheduling and pipelining
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 4
Motivation: Ones-counter Control
unit
Control signals
Start
Done
Data=0
Input
Output
OcountTempMaskData
s5
s4
s3
s0
s1
s2
s6
s7
Data = Input
Data = Data >> 1
Mask = 1
Temp = Data AND Mask
Ocount = Ocount + Temp
Ocount = 0
Done=1; Output =Ocount
Start = 1
Data = 0
Data = 0
Start = 0
/
Done=1;
ALUM
Selector 01
Shifter S0
S1
S0
S1
IRIL
BA
S
S2
WAWE
RAAREA
RABREB
8 X m register
file
“0” “0”
3
3
3
20-bit control words
Problem:
Generate controller & control words for given FSMD & Datapath
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 5
Ones Counter from C Code
Function-based C code RTL-based C code
01: int OnesCounter(int Data){
02: int Ocount = 0;
03: int Temp, Mask = 1;
04: while (Data > 0) {
05: Temp = Data & Mask;
06 Ocount = Data + Temp;
07: Data >>= 1;
08: }
09: return Ocount;
10: }
01: while(1) {
02: while (Start == 0);
03: Done = 0;
04: Data = Input;
05: Ocount = 0;
06: Mask = 1;
07: while (Data>0) {
08: Temp = Data & Mask;
09: Ocount = Ocount + Temp;
10: Data >>= 1;
11: }
12: Output = Ocount;
13: Done = 1;
14: }
•Programming language semantics • Sequential execution,
• Coding style to minimize coding
•HW design • Parallel execution,
• Communication through signals
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 6
CDFG for Ones Counter
0
1
>0
0
DoneOutput
Input
0
Done
0
Ocount
1
MaskData
&
>>1 +
DoneData
DoneOcountData
Start
Data
1
Mask Ocount
Control/Data flow graph •Resembles programming language
•Loops, ifs, basic blocks (BBs)
•Explicit dependencies •Control dependences between BBs
•Data dependences inside BBs
•Missing dependencies between BBs
01: while(1) {
02: while (Start == 0);
03: Done = 0;
04: Data = Input;
05: Ocount = 0;
06: Mask = 1;
07: while (Data>0) {
08: Temp = Data & Mask;
09: Ocount = Ocount + Temp;
10: Data >>= 1;
11: }
12: Output = Ocount;
13: Done = 1;
14: }
RTL-based C code CDFG
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 7
CDFG to FSMD for Ones Counter
S5
S0
S2
S7
Start = 0
Data = InputOcount = 0;
Temp = Data AND Mask;Ocount = Ocount + Temp;Data = Data >> 1
Done = 1;
Data = 0
Start = 1
Data = 0/
Done = 0; Mask = 1;
Output = Ocount;
0
1
>0
0
DoneOutput
Input
0
Done
0
Ocount
1
MaskData
&
>>1 +
DoneData
DoneOcountData
Start
Data
1
Mask Ocount
S5
S4
S3
S0
S1
S2
S6
S7
Start = 0
Done = 0; Data = Input
Ocount = 0
Mask = 1
Temp = Data AND Mask
Ocount = Ocount + Temp
Data = Data >> 1
Done = 1; Output = OcountData = 0
Start = 1
Data = 0/
CDFG Super-state FSMD Cycle-accurate FSMD
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 8
FSMD for Ones Counter
•FSMD more detailed then CDFG
•States may represent clock cycles
•Conditionals and statements executed concurrently
• All statement in each state executed concurrently
•Control signal and variable assignments executed concurrently
•FSMD includes scheduling
•FSMD doesn't specify binding or connectivity
S5
S4
S3
S0
S1
S2
S6
S7
Start = 0
Done = 0; Data = Input
Ocount = 0
Mask = 1
Temp = Data AND Mask
Ocount = Ocount + Temp
Data = Data >> 1
Done = 1; Output = OcountData = 0
Start = 1
Data = 0/
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 9
FSMD Definition We defined an FSM as a quintuple < S, I, O, f, h > where S is a set of
states, I and O are the sets of input and output symbols: f : S × I S , and h : S O More precisely, I = A1 × A2 ×… Ak
S = Q1 × Q2 ×… Qm O = Y1 × Y2 ×… Yn
Where Ai, is an input signal, Qi, is the state register output and Yi, is an output signal.
To define a FSMD, we define a set of variables, V = V1 × V2 ×…Vq ,
which defines the state of the datapath by defining the values of all variables in each state with the set of expressions Expr(V):
Expr(V) = Const U V U {ei # ej | ei, ej el of Expr(V), # is an operation} Notes: 1. Status signal is a signal in I; 2. Control signals are signals in O; 3. Datapath inputs and outputs are variables in V
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 10
RTL Design Model
Control unit
Datapath
Control signalsStatus signals
Control inputs
Datapathinputs
Datapathoutputs
Control outputs
Control unit
Datapath
Control signalsStatus signals
Control inputs
Datapathinputs
Datapathoutputs
Control outputs
High-level block diagram
Register-transfer-level block diagram
Control unit Datapath
Bus 1Bus 2
Bus 3Statussignals
Control signals
Controloutputs
Datapathoutputs
Datapathinputs
Control inputs
Out Reg
Register
ALU */÷
RF Mem
Selector
Output logic
Next-state logic
-
D Q
D Q
D Q
.
.
....
.
.
.
Stateregister
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 11
RTL Design Model
Control unit
Datapath
Control signalsStatus signals
Control inputs
Datapathinputs
Datapathoutputs
Control outputs
Control unit
Datapath
Control signalsStatus signals
Control inputs
Datapathinputs
Datapathoutputs
Control outputs
High-level block diagram
Register-transfer-level block diagram
Control unit Datapath
Bus 1Bus 2
Bus 3Statussignals
Control signals
Controloutputs
Datapathoutputs
Datapathinputs
Control inputs
Out Reg
Register
ALU */÷
RF Mem
Selector
ProgramMemory
Next-address
logic
-
.
.
....
ProgramCounter
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 12
C-to-RTL design
RTL generation requires definition of controller datapath
RTL generation of a controller requires choice of state register (program counter) output logic (program memory) next-state logic (next-address generator)
RTL generation of a datapath RTL component and connectivity selection, expression mapping (variable and operation mapping) scheduling and pipelining
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 13
Square Root Approximation: C to CDFG
0Start
t1=|a|t2=|b|
x=max (t1, t2)y=min(t1, t2)
t3=x>>3t4=y>>1t5=x-t3t6=t4+t5
t7= max(t6,x)Done=1Out=t7
a=In 1b=In 2
1
0
1
Start
In1 In 2
a b
a b
min
|a| |b|
max
>>1 >>3
-+
max
1
Out Done
Flowchart Control/Data flow graph
Example: Sq root (a + b) = max(0.875 x + 0.5 y), where x = max(|a|, |b|), y = min (|a|, |b|)
s5
s6
s7
Out
a b
min
|a| |b|
max
>>1 >>3
-
+
max
t1 t2
y x
t4 t3
t5
t6
t7
s1
s2
s3
s4
s5
s6
s7
Out
a b
min
|a| |b|
max
>>1
>>3
-
+
max
t1 t2
y
x
t4
t3
t5
t6
t7
s1
s2
s3
s4
ASAP ALAP
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 14
Square Root Approximation: Scheduling
0Start
t1=|a|t2=|b|
x=max (t1, t2)y=min(t1, t2)
t3=x>>3t4=y>>1t5=x-t3t6=t4+t5
t7= max(t6,x)Done=1Out=t7
a=In 1b=In 2
1
0
1
Start
In1 In 2
a b
a b
min
|a| |b|
max
>>1 >>3
-+
max
1
Out Done
Flowchart Control/Data flow graph
Example: Sq root (a + b) = max(0.875 x + 0.5 y), where x = max(|a|, |b|), y = min (|a|, |b|)
s5
s6
s7
Out
a b
min
|a| |b|
max
>>1 >>3
-
+
max
t1 t2
y x
t4 t3
t5
t6
t7
s1
s2
s3
s4
s5
s6
s7
s1
s2
s3
s4
Out
min
|a|
|b|
max
>>1
>>3
-
+
max
s8
a b
t1
t2
x
y t3
t4
t7
t6
t5
Resource-constrained
ASAP
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 15
Square Root Approximation: CDFG to FSMD
0
1
Start
In1 In 2
a b
a b
min
|a| |b|
max
>>1 >>3
-+
max
1
Out Done
Control/Data flow graph
Example: Sq root (a + b) = max(0.875 x + 0.5 y), where x = max(|a|, |b|), y = min (|a|, |b|)
s5
s6
s7
Out
a b
min
|a| |b|
max
>>1 >>3
-
+
max
t1 t2
y x
t4 t3
t5
t6
t7
s1
s2
s3
s4
s0a = In 1b = In 2
Start = 0Start = 1
s1
s2
s3
s4
s5
s6
s7
t1 = |a|t2 = |b|
t5 = x – t3
x = max( t1 , t2 )y = min ( t1 , t2 )
t3 = x >> 3t4 = y >>1
t6 = t4 + t5
t7 = max ( t6 , x )
Done = 1Out = t7
ASAP FSMD
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 16
Square Root Approximation: FSMD Design Example: Sq root (a + b) = max(0.875 x + 0.5 y), where x = max(|a|, |b|), y = min (|a|, |b|)
s0a = In 1b = In 2
Start = 0Start = 1
s1
s2
s3
s4
s5
s6
s7
t1 = |a|t2 = |b|
t5 = x – t3
x = max( t1 , t2 )y = min ( t1 , t2 )
t3 = x >> 3t4 = y >>1
t6 = t4 + t5
t7 = max ( t6 , x )
Done = 1Out = t7
Control Start
Done
In 1
Out
In 2
• Storage allocation and sharing
• Functional unit allocation and sharing
• Bus allocation and sharing
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 17
Resource usage in SRA
Square-root approximation
No. of live variables
1233222Xt7
Xt6
Xt5
XXt4
Xt3
XyXXXXx
Xt2
Xt1
XbXa
s7s6s5s4s3s2s1
No. of live variables
1233222Xt7
Xt6
Xt5
XXt4
Xt3
XyXXXXx
Xt2
Xt1
XbXa
s7s6s5s4s3s2s1
Max. no.of units
No. of operations
111212
1+
1-
2>>
11max
1min
211211
2abs
s7s6s5s4s3s2s1
Max. no.of units
No. of operations
111222
1+
1-
2>>
11max
1min
21
12
1
1
2abs
s7s6s5s4s3s2s1
Variable usage
Operation usage
s0a = In 1b = In 2
Start = 0Start = 1
s1
s2
s3
s4
s5
s6
s7
t1 = |a|t2 = |b|
t5 = x – t3
x = max( t1 , t2 )y = min ( t1 , t2 )
t3 = x >> 3t4 = y >>1
t6 = t4 + t5
t7 = max ( t6 , x )
Done = 1Out = t7
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 18
Resource usage in SRA
Square-root approximation
Max. no.of units
No. of operations
111212
1+
1-
2>>
11max
1min
211211
2abs
s7s6s5s4s3s2s1
Max. no.of units
No. of operations
111222
1+
1-
2>>
11max
1min
21
12
1
1
2abs
s7s6s5s4s3s2s1
Connectivity usage
Operation usage
s0a = In 1b = In 2
Start = 0Start = 1
s1
s2
s3
s4
s5
s6
s7
t1 = |a|t2 = |b|
t5 = x – t3
x = max( t1 , t2 )y = min ( t1 , t2 )
t3 = x >> 3t4 = y >>1
t6 = t4 + t5
t7 = max ( t6 , x )
Done = 1Out = t7
a b t1 t2 x y t3 t4 t5 t6 t7
abs1 i o abs2 i o min i i o max i i i o i o >>3 i o >>1 i o
- i i o + i i o
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 19
Register sharing (Variable merging)
Group variables with non-overlaping lifetimes
Each group shares one register
Grouping reduces number of registers needed in the design
There are many partitioning algorithms
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 20
Merging variables with common sources and destination
FSMD Datapath without register sharing Datapath with register sharing
x = a + b
y = c + dsj
siSelector Selector
Selector
a , c b , d
x , y
+
a
Selector Selector
Selector Selector
c b d
x y
+
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 21
Register sharing (Variable merging) Compatibility graph
t1
t2
t3
t4
t5 t6
t7x
a y
b
1/0 0/1
1/0
1/0
1/0
1/0 0/1
0/1
No. of live variables
1233222Xt7
Xt6
Xt5
XXt4
Xt3
XyXXXXx
Xt2
Xt1
XbXa
s7s6s5s4s3s2s1
No. of live variables
1233222Xt7
Xt6
Xt5
XXt4
Xt3
XyXXXXx
Xt2
Xt1
XbXa
s7s6s5s4s3s2s1
s0a = In 1b = In 2
Start = 0Start = 1
s1
s2
s3
s4
s5
s6
s7
t1 = |a|t2 = |b|
t5 = x – t3
x = max( t1 , t2 )y = min ( t1 , t2 )
t3 = x >> 3t4 = y >>1
t6 = t4 + t5
t7 = max ( t6 , x )
Done = 1Out = t7
Square-root approximation Variable usage
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 22
Register sharing (Variable merging) Compatibility graph
t1
t2
t3
t4
t5 t6
t7x
a y
b
1/0 0/1
1/0
1/0
1/0
1/0 0/1
0/1
Selector Selector
R1 R2 R3
| a | | b | min max + - >>1 >>3
R1 = [ a , t1 , x , t7 ] R2 = [ b , t2 , y , t3 , t5 , t6 ] R3 = [ t4 ]
Partitioned compatibility graph
t1
t2
t3
t4
t5 t6
t7x
a y
b
1/0 0/1
1/0
1/0
1/0
1/0 0/1
0/1
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 23
FU sharing (Operator merging)
Group non-concurrent operations Each group shares one functional unit Sharing reduces number of functional units Grouping also reduces connectivity Clustering algorithms are used for grouping
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 24
FU-sharing motivation
Partial FSMD Non-shared design Shared design
x = a + b
y = c - dsj
si
a
Selector Selector
c b d
x y
+/-
a b
x
+
c d
y
-
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 25
Operator-merging for SRA |a| |b|
max min
+ -
Compatibility graph
s0a = In 1b = In 2
Start = 0Start = 1
s1
s2
s3
s4
s5
s6
s7
t1 = |a|t2 = |b|
t5 = x – t3
x = max( t1 , t2 )y = min ( t1 , t2 )
t3 = x >> 3t4 = y >>1
t6 = t4 + t5
t7 = max ( t6 , x )
Done = 1Out = t7
Square-root approximation
|a| |b|
max min
+ -
Partitioned compatibility graph
Selector Selector
R1 R2 R3
[ abs/max]>>1 >>3Selector
[ abs/min/+/- ]
Datapath after variable and operator merging
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 26
Bus sharing ( connection merging )
Group connections that are not used concurrently
Each group forms a bus
Connection merging reduces number of wires
Clustering algorithm work well
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 27
Connection merging in SRA datapath
XLXM
Xs7
XN
XKXXXXJ
XXXIXHXG
XXXXFXE
XXDXXXCXXB
As6s5s4s3s2s1s0
XLXM
Xs7
XN
XKXXXXJ
XXXIXHXG
XXXXFXE
XXDXXXCXXB
As6s5s4s3s2s1s0
D
A
B
C
E
F
G
H I
J
K
L
M
N
Bus1 = [ A, C, D, E, H ]
Bus2 = [ B, F, G ]
Bus3 = [ I, K, M ]
Bus4 = [ J, L, N ]
Bus assignment
Connectivity usage table Compatibility graph for input buses Compatibility graph for output buses
Selector Selector
R1 R2 R3
[ abs/max]>>1 >>3Selector
[ abs/min/+/- ]
A B C D E F G H
IJ
K L
M NIn 1 In 2
Out
Datapath after variable and operator merging
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 28
Connection merging in SRA datapath
Bus1 = [ A, C, D, E, H ]
Bus2 = [ B, F, G ]
Bus3 = [ I, K, M ]
Bus4 = [ J, L, N ]
Bus assignment
Connectivity usage table
Selector Selector
R1 R2 R3
[ abs/max]>>1 >>3Selector
[ abs/min/+/- ]
A B C D E F G H
IJ
K L
M NIn 1 In 2
Out
Datapath after variable and operator merging
R1 R2 R3
>>1 >>3
Bus 1
[ abs/min] [ abs/max/+/- ]
Bus 2
Bus 3
Bus 4
Datapath after variable, operator and connectivity merging
XLXM
Xs7
XN
XKXXXXJ
XXXIXHXG
XXXXFXE
XXDXXXCXXB
As6s5s4s3s2s1s0
XLXM
Xs7
XN
XKXXXXJ
XXXIXHXG
XXXXFXE
XXDXXXCXXB
As6s5s4s3s2s1s0
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 29
Register merging into Register files
Group register with non-overlapping accesses
Each group assigned to one register file
Register grouping reduces number of ports, and therefore number of buses
Use some clustering algorithms
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 30
Register merging
s0
R2
R3
R1
s7s6s5s4s3s2s1s0
R2
R3
R1
s7s6s5s4s3s2s1
H
Out
In 1 In 2
R1
R2
>>3 >>1
Bus 1
[ abs/max] [ abs/min/+/- ]
Bus 2
Bus 3
Bus 4
R3
R1 = [ a, t1, x, t7 ] R2 = [ b, t2, y, t3, t5, t6 ] R3 = [ t4 ]
Register assignment
Datapath after register merging
Register access table
R1 R2
R3
Compatibility graph s0
a = In 1b = In 2
Start = 0Start = 1
s1
s2
s3
s4
s5
s6
s7
t1 = |a|t2 = |b|
t5 = x – t3
x = max( t1 , t2 )y = min ( t1 , t2 )
t3 = x >> 3t4 = y >>1
t6 = t4 + t5
t7 = max ( t6 , x )
Done = 1Out = t7
Square-root approximation
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 31
Chaining and multi-cycling
Chaining allows serial execution of two or more operations in each state
Chaining reduces number of states and increases performance
Multi-cycling allows one operation to be executed over two or more clock cycles
Multi-cycling reduces size of functional units
Multi-cycling is used on noncritical paths to improve resource utilization
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 32
SRA datapath with chained units
Datapath schematic
In 1
R1 R2 R3
>>1
Bus 1
[ abs/max] [ abs/min/+/- ]
Bus 2
Bus 3
Bus 4
>>3
In 2
Out
s0a = In 1b = In 2
Start = 0Start = 1
s1
s2
s3
s4
s5
s6
t1 = |a|t2 = |b|
t5 = x – t3
x = max( t1 , t2 )t3 = max( t1 , t2 )>>3t4 = min( t1 , t2 )>>1
t6 = t4 + t5
t7 = max ( t6 , x )
Done = 1Out = t7
Square-root approximation
R1 = [ a, t1, x, t7 ] R2 = [ b, t2, y, t3, t5, t6 ] R3 = [ t4 ]
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 33
SRA datapath with multi-cycle units
In 1
R1 R2 R3
>>1
Bus 1
[ abs/max] [ abs/+/- ]
Bus 2
Bus 3
Bus 4
>>3
In 2
Out
min
Datapath schematic
s0a = In 1b = In 2
Start = 0Start = 1
s1
s2
s3
s4
s5
s6
t1 = |a|t2 = |b|
t5 = x – t3
x = max( t1 , t2 )t3 = max( t1 , t2 )>>3t4 = min( t1 , t2 )>>1
t6 = t4 + t5
t7 = max ( t6 , x )
Done = 1Out = t7
Square-root approximation
R1 = [ a, t1, x, t7 ] R2 = [ b, t2, y, t3, t5, t6 ] R3 = [ t4 ]
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 34
Pipelining
Pipelining improves performance at a very small additional cost
Pipelining divides design into stages and uses all stages concurrently for different data (assembly line principle)
Pipelining principles works on several levels:
(a) Unit pipelining
(b) Control pipelining
(c) Datapath pipelining
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 35
SRA datapath with single AU In 1
R1 R2 R3
>>1
Bus 1
Bus 3Bus 4
>>3
In 2
Out
AU
Bus 2
Datapath schematic
s0a = In 1b = In 2
Start = 0Start = 1
s2
s3
s4
s5
s6
s7
s8
t2 = |b|
t5 = x – t3
x = max( t1 , t2 )t3 = max ( t1 , t2 )>>3
t4 = min ( t1 , t2 )>>1
t6 = t4 + t5
t7 = max ( t6 , x )
Done = 1Out = t7
t1 = |a|s1
Square-root approximation
for single AU t7Outport
t4rite R3
t6t5t3t2bWrite R2
t7xt1aWrite R1
>>1>>3shiftersmax+-minmax|b||a|AU stage 2
max+-minmax|b||a|AU stage 1t4Read R3
t6t5t3t2t 2bRead R2
t7xxt1t1aRead R1
s12s11s10s9s8s7s6s5s4s3s2s1s0
t7Outportt43
t6t5t3t2bWrite R 2
t7xt1aWrite R 1
>>1>>3shiftersmax+-minmax|b||a|AU stage 2
max+-minmax|b||a|AU stage 1t4Read R 3
t6t5t3t2t2bRead R 2
t7xxt1t1aRead R 1
s 8s 7s6s5s4s3s2s1s0
Write R
Timing diagram
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 36
Pipelined FSMD implementation
ALU
Selector
RF
Bus 2Bus 1
Statussignals
Controlsignals
Control inputs Datapath inputs
Datapath
OutputLogic
Stateregister
Next-Statelogic
Control unitControl outputs Datapath outputs
Standard FSMD implementation
Out Reg
s0
a =< ba > b
s2
y = x - 1
x = c + ds1
Outportt4rite R3
t6t5t3t2bWrite R2
xt1a>>1>>3shifters
+-minmax|b||a|AU stage 2+-minmax|b||a|AU stage 1t4Read R3
t6t5t3t2t 2bRead Rxxt1t1aRead R1
s10s9s8s7s6s5s4s3s2s1s
0
Write RF
Write SR
Write Status
ALU
Read Status
Write ALUInRead RF
Read CReg
s
2s
2s /s1
s0
Write CRegRead SReg
Read ALUIn
s0
s 0
a,b
a>b
a,ba,b
a>ba>b
1
s1
1sc,d
c,dc,d
c+d
x
x
xx
x-1
y
s
s2
2
s2Example Timing diagram
Copyright © 2010-20011 by Daniel D. Gajski EECS31/CSE31/, University of California, Irvine 37
Summary We introduced RTL design: FSMD model RTL specification with
Procedure for synthesis from RTL specification Scheduling of basic blocks Design Optimization through
Design Pipelining
FSMD CDFG
Register sharing Functional unit sharing Bus sharing Unit chaining Multi-clocking
Unit pipelining Control pipelining Datapath pipelining