Copyright © 2010-2011 by Daniel D. Gajski, EECS31/CSE31, University of California, Irvine

Principles Of Digital Design

C to RTL

• Control/Data flow graphs
• Finite-state-machine with data
• IP design
• Component selection
• Connection selection
• Operator and variable mapping
• Scheduling and pipelining

Topic preview

[Figure: chapter map of the course; the chapter numbers are not recoverable from the extracted text. Topics shown:]
• Logic gates and flip-flops
• Boolean algebra
• Finite-state machine
• Logic design techniques
• Binary system and data representation
• Generalized finite-state machines
• Combinational components
• Sequential design techniques
• Storage components
• Register-transfer design
• Processor components

Register-transfer-level design

Each standard or custom IP component consists of one or more datapaths and control units.

To synthesize such IP, we use the CDFG and FSMD models.

We demonstrate IP synthesis (RTL design), including component and connectivity selection, expression mapping, scheduling, and pipelining.

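Motivation: Ones-counter

[Figure: FSMD of the ones counter (states s0 to s7) next to its control unit and datapath. The control unit has the input Start and the output Done, receives the status signal Data = 0, and drives the datapath with 20-bit control words. The datapath has the ports Input and Output and contains a selector, an 8 × m register file (write port WA/WE, read ports RAA/REA and RAB/REB), an ALU and a shifter; the variables Data, Mask, Temp and Ocount are labeled in the figure.]

FSMD statements:
• Data = Input
• Ocount = 0
• Mask = 1
• Temp = Data AND Mask
• Ocount = Ocount + Temp
• Data = Data >> 1
• Done = 1; Output = Ocount

Problem: Generate the controller and the control words for the given FSMD and datapath.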

Ones Counter from C Code

Function-based C code

int OnesCounter(int Data) {
    int Ocount = 0;
    int Temp, Mask = 1;
    while (Data > 0) {
        Temp = Data & Mask;        /* lowest bit of Data */
        Ocount = Ocount + Temp;    /* add it to the count */
        Data >>= 1;                /* move on to the next bit */
    }
    return Ocount;
}

• Programming language semantics
  • Sequential execution
  • Coding style to minimize coding

RTL-based C code

while (1) {
    while (Start == 0);            /* wait for the Start signal */
    Done = 0;
    Data = Input;
    Ocount = 0;
    Mask = 1;
    while (Data > 0) {
        Temp = Data & Mask;
        Ocount = Ocount + Temp;
        Data >>= 1;
    }
    Output = Ocount;
    Done = 1;                      /* signal completion */
}

• HW design
  • Parallel execution
  • Communication through signals

CDFG for Ones Counter

Control/Data flow graph
• Resembles programming language
  • Loops, ifs, basic blocks (BBs)
• Explicit dependencies
  • Control dependences between BBs
  • Data dependences inside BBs
  • Missing dependencies between BBs

[Figure: the RTL-based C code of the ones counter (left) and its CDFG (right). The CDFG waits on Start, then initializes Done = 0, Data = Input, Ocount = 0 and Mask = 1. The loop block, guarded by the test Data > 0, computes Data & Mask, adds the result to Ocount and shifts Data right by 1. The final block sets Done = 1 and Output = Ocount.]

CDFG to FSMD for Ones Counter

[Figure: three views of the ones counter, from left to right: CDFG, super-state FSMD, cycle-accurate FSMD.]

Super-state FSMD (state labels S0, S2, S5 and S7 in the figure):
• Wait while Start = 0; continue when Start = 1
• Done = 0; Mask = 1; Data = Input; Ocount = 0
• Temp = Data AND Mask; Ocount = Ocount + Temp; Data = Data >> 1 (repeated until Data = 0)
• Done = 1; Output = Ocount

Cycle-accurate FSMD (states S0 to S7), statements in order:
• Done = 0; Data = Input
• Ocount = 0
• Mask = 1
• Temp = Data AND Mask
• Ocount = Ocount + Temp
• Data = Data >> 1
• Done = 1; Output = Ocount

FSMD for Ones Counter

• FSMD is more detailed than CDFG
• States may represent clock cycles
• Conditionals and statements are executed concurrently
• All statements in each state are executed concurrently
• Control signal and variable assignments are executed concurrently
• FSMD includes scheduling
• FSMD doesn't specify binding or connectivity

Cycle-accurate FSMD:
S0: wait while Start = 0; go to S1 when Start = 1
S1: Done = 0; Data = Input
S2: Ocount = 0
S3: Mask = 1
S4: Temp = Data AND Mask
S5: Ocount = Ocount + Temp
S6: Data = Data >> 1; go to S4 while Data ≠ 0, go to S7 when Data = 0
S7: Done = 1; Output = Ocount
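The cycle-accurate FSMD of the ones counter can be tried out in software. The following is a minimal sketch, not the course's implementation: the function name `ones_counter_fsmd`, the `cycles` counter, the numbering of the states S1 to S7, and the choice to test the shifted value of Data in S6 are all assumptions made for illustration. Each pass through the `switch` stands for one clock cycle.

```c
#include <assert.h>
#include <stddef.h>

/* States of the cycle-accurate FSMD (numbering assumed from the slide). */
enum State { S0, S1, S2, S3, S4, S5, S6, S7 };

/* Runs the FSMD once for `input` with Start held at 1 and returns Output.
   If `cycles` is not NULL, it receives the number of states visited
   from S1 through S7 (hypothetical helper for illustration). */
unsigned ones_counter_fsmd(unsigned input, unsigned *cycles)
{
    unsigned Data = 0, Mask = 0, Temp = 0, Ocount = 0, Output = 0;
    unsigned n = 0;
    int Done = 0;
    enum State s = S1;                 /* Start = 1, so S0 is left at once */

    while (!Done) {
        n++;                           /* one state per clock cycle */
        switch (s) {
        case S1: Done = 0; Data = input;     s = S2; break;
        case S2: Ocount = 0;                 s = S3; break;
        case S3: Mask = 1;                   s = S4; break;
        case S4: Temp = Data & Mask;         s = S5; break;
        case S5: Ocount = Ocount + Temp;     s = S6; break;
        case S6: Data = Data >> 1;
                 /* assumption: the branch tests the shifted value */
                 s = (Data == 0) ? S7 : S4;  break;
        case S7: Done = 1; Output = Ocount;  s = S0; break;
        default: assert(0 && "S0 is never executed in this sketch"); break;
        }
    }
    if (cycles != NULL) {
        *cycles = n;
    }
    return Output;
}
```

For the input 1011 (binary) the loop body S4, S5, S6 runs four times, so the sketch visits 3 + 4 × 3 + 1 = 16 states and returns 3.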

FSMD Definition

We defined an FSM as a quintuple $\langle S, I, O, f, h \rangle$, where $S$ is a set of states, $I$ and $O$ are the sets of input and output symbols, and

$f : S \times I \to S$ and $h : S \to O$.

More precisely,

$I = A_1 \times A_2 \times \dots \times A_k$
$S = Q_1 \times Q_2 \times \dots \times Q_m$
$O = Y_1 \times Y_2 \times \dots \times Y_n$

where $A_i$ is an input signal, $Q_i$ is a state register output, and $Y_i$ is an output signal.

To define an FSMD, we define a set of variables, $V = V_1 \times V_2 \times \dots \times V_q$, which defines the state of the datapath by defining the values of all variables in each state with the set of expressions $Expr(V)$:

$Expr(V) = Const \cup V \cup \{ e_i \,\#\, e_j \mid e_i, e_j \in Expr(V),\ \# \text{ is an operation} \}$

Notes:
1. Status signals are signals in $I$
2. Control signals are signals in $O$
3. Datapath inputs and outputs are variables in $V$

RTL Design Model

[Figure: the RTL design model as a high-level block diagram and as a register-transfer-level block diagram. In both, a control unit and a datapath exchange control signals and status signals; the control unit has control inputs and control outputs, and the datapath has datapath inputs and datapath outputs. At the register-transfer level the control unit consists of a state register (D flip-flops), next-state logic and output logic. The datapath consists of a selector, a register, a register file (RF), a memory (Mem), an ALU, a multiplier/divider (*/÷) and an output register (Out Reg), connected by Bus 1, Bus 2 and Bus 3.]

RTL Design Model

[Figure: the same high-level and register-transfer-level block diagrams, with a programmable control unit. The control unit now consists of a program counter, next-address logic and a program memory. The datapath is unchanged: selector, register, register file (RF), memory (Mem), ALU, multiplier/divider (*/÷) and output register (Out Reg), connected by Bus 1, Bus 2 and Bus 3.]

C-to-RTL design

RTL generation requires definition of
• controller
• datapath

RTL generation of a controller requires choice of
• state register (program counter)
• output logic (program memory)
• next-state logic (next-address generator)

RTL generation of a datapath requires
• RTL component and connectivity selection
• expression mapping (variable and operation mapping)
• scheduling and pipelining

Square Root Approximation: C to CDFG

Example: $\sqrt{a^2 + b^2} \approx \max(0.875x + 0.5y,\ x)$, where $x = \max(|a|, |b|)$ and $y = \min(|a|, |b|)$

Flowchart:
• a = In1; b = In2 (repeated while Start = 0)
• t1 = |a|; t2 = |b|
• x = max(t1, t2); y = min(t1, t2)
• t3 = x >> 3; t4 = y >> 1; t5 = x - t3; t6 = t4 + t5
• t7 = max(t6, x); Done = 1; Out = t7

[Figure: the flowchart, the corresponding control/data flow graph, and the ASAP and ALAP schedules of its dataflow graph over the states s1 to s7.]
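The flowchart statements translate almost directly into C. The sketch below keeps the temporaries t1 to t7 from the flowchart; the function name `sra` and the helpers `max_int` and `min_int` are hypothetical names chosen here, and the operands are assumed to be ordinary `int` values.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

static int max_int(int p, int q) { return p > q ? p : q; }
static int min_int(int p, int q) { return p < q ? p : q; }

/* Approximates sqrt(a*a + b*b) with two shifts, one subtraction and one
   addition, using the same temporaries as the flowchart. */
int sra(int a, int b)
{
    /* abs(INT_MIN) is undefined, so it is excluded up front. */
    assert(a != INT_MIN && b != INT_MIN);

    int t1 = abs(a);
    int t2 = abs(b);
    int x  = max_int(t1, t2);
    int y  = min_int(t1, t2);
    int t3 = x >> 3;          /* 0.125 x */
    int t4 = y >> 1;          /* 0.5 y   */
    int t5 = x - t3;          /* 0.875 x */
    int t6 = t4 + t5;         /* 0.875 x + 0.5 y */
    int t7 = max_int(t6, x);
    return t7;
}
```

For a = 3 and b = 4 the sketch returns 5, which happens to be exact; for a = b = 100 it returns 138, against an exact value of about 141.4.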

Square Root Approximation: Scheduling

Example: $\sqrt{a^2 + b^2} \approx \max(0.875x + 0.5y,\ x)$, where $x = \max(|a|, |b|)$ and $y = \min(|a|, |b|)$

[Figure: flowchart and control/data flow graph of the square-root approximation, shown with two schedules of the dataflow graph: the ASAP schedule over the states s1 to s7 and a resource-constrained schedule over the states s1 to s8.]
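ASAP and ALAP scheduling can be sketched in a few lines of C for the dataflow graph of the square-root approximation. The operation indices, the `preds` table and the function names below are assumptions introduced for this illustration; the data dependences themselves are read from the statements t1 to t7. The resource-constrained schedule from the slide is not reproduced here.

```c
#include <assert.h>

#define NUM_OPS 9
#define MAX_PREDS 2

/* Predecessors of each operation; -1 means "no predecessor".
   Operations are listed in topological order. */
static const int preds[NUM_OPS][MAX_PREDS] = {
    {-1, -1},   /* 0: t1 = |a|         */
    {-1, -1},   /* 1: t2 = |b|         */
    { 0,  1},   /* 2: x  = max(t1, t2) */
    { 0,  1},   /* 3: y  = min(t1, t2) */
    { 2, -1},   /* 4: t3 = x >> 3      */
    { 3, -1},   /* 5: t4 = y >> 1      */
    { 2,  4},   /* 6: t5 = x - t3      */
    { 5,  6},   /* 7: t6 = t4 + t5     */
    { 7,  2},   /* 8: t7 = max(t6, x)  */
};

/* ASAP: every operation is placed in the earliest state that follows all
   of its predecessors. Returns the number of states used. */
int schedule_asap(int asap[NUM_OPS])
{
    int length = 0;
    for (int i = 0; i < NUM_OPS; i++) {
        asap[i] = 1;
        for (int k = 0; k < MAX_PREDS; k++) {
            int p = preds[i][k];
            if (p >= 0) {
                assert(p < i);          /* relies on topological order */
                if (asap[p] + 1 > asap[i]) {
                    asap[i] = asap[p] + 1;
                }
            }
        }
        if (asap[i] > length) {
            length = asap[i];
        }
    }
    return length;
}

/* ALAP: every operation is placed in the latest state that still lets all
   of its successors finish within `length` states. */
void schedule_alap(int length, int alap[NUM_OPS])
{
    for (int i = 0; i < NUM_OPS; i++) {
        alap[i] = length;
    }
    for (int i = NUM_OPS - 1; i >= 0; i--) {
        for (int k = 0; k < MAX_PREDS; k++) {
            int p = preds[i][k];
            if (p >= 0 && alap[i] - 1 < alap[p]) {
                alap[p] = alap[i] - 1;
            }
        }
    }
}
```

In this sketch only y = min(t1, t2) and t4 = y >> 1 receive different states in the two schedules (one state later under ALAP); all other operations lie on the critical path.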

Square Root Approximation: CDFG to FSMD

Example: $\sqrt{a^2 + b^2} \approx \max(0.875x + 0.5y,\ x)$, where $x = \max(|a|, |b|)$ and $y = \min(|a|, |b|)$

[Figure: control/data flow graph, its ASAP schedule over the states s1 to s7, and the resulting FSMD.]

ASAP FSMD:
s0: a = In1; b = In2; stay in s0 while Start = 0, go to s1 when Start = 1
s1: t1 = |a|; t2 = |b|
s2: x = max(t1, t2); y = min(t1, t2)
s3: t3 = x >> 3; t4 = y >> 1
s4: t5 = x - t3
s5: t6 = t4 + t5
s6: t7 = max(t6, x)
s7: Done = 1; Out = t7

Square Root Approximation: FSMD Design

Example: $\sqrt{a^2 + b^2} \approx \max(0.875x + 0.5y,\ x)$, where $x = \max(|a|, |b|)$ and $y = \min(|a|, |b|)$

[Figure: the ASAP FSMD of the square-root approximation (states s0 to s7) and the design to be synthesized, a block with the control input Start, the control output Done, the data inputs In1 and In2, and the data output Out.]

• Storage allocation and sharing
• Functional unit allocation and sharing
• Bus allocation and sharing

Resource usage in SRA

[Figure: ASAP FSMD of the square-root approximation, states s0 to s7.]

Variable usage

                        s1  s2  s3  s4  s5  s6  s7
a                       X
b                       X
t1                          X
t2                          X
x                               X   X   X   X
y                               X
t3                                  X
t4                                  X   X
t5                                      X
t6                                          X
t7                                              X
No. of live variables   2   2   2   3   3   2   1

Operation usage

                    s1  s2  s3  s4  s5  s6  s7  Max. no. of units
abs                 2                           2
min                     1                       1
max                     1               1       1
>>                          2                   2
-                               1               1
+                                   1           1
No. of operations   2   2   2   1   1   1
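The bottom row of the variable usage table can be recomputed from the FSMD. The sketch below assumes that a variable occupies a register from the state after the one that writes it up to the last state that reads it; the struct `VarUse`, the table `sra_vars` and the function `count_live` are hypothetical names, and the write and read states are read off the ASAP FSMD.

```c
#include <assert.h>

#define NUM_VARS 11
#define NUM_STATES 7            /* s1..s7 */

/* A variable is written in state `written` and read for the last time in
   state `last_read`; it is live in states written+1 .. last_read. */
struct VarUse {
    const char *name;
    int written;
    int last_read;
};

static const struct VarUse sra_vars[NUM_VARS] = {
    {"a",  0, 1}, {"b",  0, 1}, {"t1", 1, 2}, {"t2", 1, 2},
    {"x",  2, 6}, {"y",  2, 3}, {"t3", 3, 4}, {"t4", 3, 5},
    {"t5", 4, 5}, {"t6", 5, 6}, {"t7", 6, 7},
};

/* Fills live[s-1] with the number of live variables in state s and returns
   the maximum, below which the register count cannot drop. */
int count_live(int live[NUM_STATES])
{
    int max_live = 0;
    for (int s = 1; s <= NUM_STATES; s++) {
        live[s - 1] = 0;
        for (int v = 0; v < NUM_VARS; v++) {
            assert(sra_vars[v].written < sra_vars[v].last_read);
            if (sra_vars[v].written < s && s <= sra_vars[v].last_read) {
                live[s - 1]++;
            }
        }
        if (live[s - 1] > max_live) {
            max_live = live[s - 1];
        }
    }
    return max_live;
}
```

With these lifetimes the sketch reproduces the row 2, 2, 2, 3, 3, 2, 1, so at least three registers are needed.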

Resource usage in SRA

[Figure: ASAP FSMD and operation usage table of the square-root approximation, repeated from the previous slide.]

Connectivity usage (i = the variable is an input of the unit, o = the variable is an output of the unit)

        a    b    t1   t2   x    y    t3   t4   t5   t6   t7
abs1    i         o
abs2         i         o
min               i    i         o
max               i    i    i/o                      i    o
>>3                         i         o
>>1                              i         o
-                           i         i         o
+                                          i    i    o

Register sharing (Variable merging)

Group variables with non-overlapping lifetimes

Each group shares one register

Grouping reduces number of registers needed in the design

There are many partitioning algorithms
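One simple way to group variables with non-overlapping lifetimes is the left-edge algorithm, shown here as a sketch. This is a technique swapped in for illustration; the slides themselves partition a compatibility graph. The struct `Interval`, the table `sra_vars` and the function `left_edge` are hypothetical names, and the lifetimes are those of the variable usage table for the square-root approximation.

```c
#include <assert.h>

#define NUM_VARS 11

/* A variable is live in the states first..last (states numbered from 1). */
struct Interval {
    const char *name;
    int first;
    int last;
};

/* Lifetimes from the variable usage table, sorted by first live state. */
static const struct Interval sra_vars[NUM_VARS] = {
    {"a",  1, 1}, {"b",  1, 1}, {"t1", 2, 2}, {"t2", 2, 2},
    {"x",  3, 6}, {"y",  3, 3}, {"t3", 4, 4}, {"t4", 4, 5},
    {"t5", 5, 5}, {"t6", 6, 6}, {"t7", 7, 7},
};

/* Left-edge allocation: fill one register at a time, always taking the
   next unassigned variable that starts after the register becomes free.
   reg[i] receives the 1-based register number of variable i.
   Returns the number of registers used. */
int left_edge(int reg[NUM_VARS])
{
    int assigned = 0;
    int nregs = 0;

    for (int i = 0; i < NUM_VARS; i++) {
        reg[i] = 0;
        if (i > 0) {
            assert(sra_vars[i - 1].first <= sra_vars[i].first);
        }
    }
    while (assigned < NUM_VARS) {
        int last_end = 0;               /* register is free before s1 */
        nregs++;
        for (int i = 0; i < NUM_VARS; i++) {
            if (reg[i] == 0 && sra_vars[i].first > last_end) {
                reg[i] = nregs;
                last_end = sra_vars[i].last;
                assigned++;
            }
        }
    }
    return nregs;
}
```

On this input the sketch uses three registers and groups the variables as [a, t1, x, t7], [b, t2, y, t3, t5, t6] and [t4]. Other tie-breaking orders or other partitioning algorithms may produce different, equally valid groupings.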

Merging variables with common sources and destination

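[Figure: a partial FSMD with x = a + b in state si and y = c + d in state sj, and two datapaths that implement it with a single adder and selectors. In the datapath without register sharing, a, b, c, d, x and y each have their own register. In the datapath with register sharing, the variables are merged into the registers (a, c), (b, d) and (x, y).]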

Register sharing (Variable merging)

[Figure: ASAP FSMD and variable usage table of the square-root approximation, and the compatibility graph derived from them. The graph has one node per variable (a, b, t1 to t7, x, y); its edges carry weights written as 1/0 or 0/1.]

Register sharing (Variable merging)

[Figure: compatibility graph of the variables, the partitioned compatibility graph, and the resulting datapath with the registers R1, R2 and R3, two selectors, and the functional units |a|, |b|, min, max, +, -, >>1 and >>3.]

Register assignment:
R1 = [ a, t1, x, t7 ]
R2 = [ b, t2, y, t3, t5, t6 ]
R3 = [ t4 ]

FU sharing (Operator merging)

Group non-concurrent operations

Each group shares one functional unit

Sharing reduces number of functional units

Grouping also reduces connectivity

Clustering algorithms are used for grouping
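Whether two operations are non-concurrent can be checked directly against the schedule. The sketch below tests pairs of operations of the square-root approximation; the enum `Op`, the table `used` and the function `can_share` are hypothetical names, and the state assignments are taken from the ASAP FSMD. It covers only the compatibility test, not the clustering step.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_OPS 6
#define NUM_STATES 7            /* s1..s7 */

enum Op { ABS_A, ABS_B, MIN, MAX, SUB, ADD };

/* used[op][s-1] is true when the operation executes in state s. */
static const bool used[NUM_OPS][NUM_STATES] = {
    /*          s1 s2 s3 s4 s5 s6 s7 */
    /* |a| */ { 1, 0, 0, 0, 0, 0, 0 },
    /* |b| */ { 1, 0, 0, 0, 0, 0, 0 },
    /* min */ { 0, 1, 0, 0, 0, 0, 0 },
    /* max */ { 0, 1, 0, 0, 0, 1, 0 },
    /* -   */ { 0, 0, 0, 1, 0, 0, 0 },
    /* +   */ { 0, 0, 0, 0, 1, 0, 0 },
};

/* Two operations may share a functional unit only if no state uses both. */
bool can_share(enum Op p, enum Op q)
{
    assert(p != q);
    for (int s = 0; s < NUM_STATES; s++) {
        if (used[p][s] && used[q][s]) {
            return false;
        }
    }
    return true;
}
```

The two abs operations cannot share a unit because both run in s1, and min and max collide in s2. One abs can be merged with max, and the other with min, + and -, since none of those pairs is used in the same state.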

FU-sharing motivation

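[Figure: a partial FSMD with x = a + b in state si and y = c - d in state sj, implemented in two ways. The non-shared design uses an adder for a, b and x and a separate subtractor for c, d and y. The shared design uses a single adder/subtractor (+/-) whose operands are chosen by two selectors.]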

Operator-merging for SRA

[Figure: ASAP FSMD of the square-root approximation, with the compatibility graph and the partitioned compatibility graph over the operations |a|, |b|, max, min, + and -.]

Functional unit assignment: [ abs/max ] and [ abs/min/+/- ]

Datapath after variable and operator merging: the registers R1, R2 and R3, selectors, the functional units [ abs/max ] and [ abs/min/+/- ], and the shifters >>1 and >>3.

Bus sharing (Connection merging)

Group connections that are not used concurrently

Each group forms a bus

Connection merging reduces number of wires

Clustering algorithms work well

Connection merging in SRA datapath

[Figure: datapath after variable and operator merging, with its connections labeled A to N, the inputs In1 and In2, and the output Out. A connectivity usage table marks, for each connection A to N, the states s0 to s7 in which it is used; the individual marks are not recoverable from the extracted text. Two compatibility graphs, one for the input buses and one for the output buses, are derived from the table.]

Bus assignment:
Bus1 = [ A, C, D, E, H ]
Bus2 = [ B, F, G ]
Bus3 = [ I, K, M ]
Bus4 = [ J, L, N ]

Connection merging in SRA datapath

Bus assignment:
Bus1 = [ A, C, D, E, H ]
Bus2 = [ B, F, G ]
Bus3 = [ I, K, M ]
Bus4 = [ J, L, N ]

[Figure: connectivity usage table and datapath after variable and operator merging (connections A to N), next to the datapath after variable, operator and connectivity merging. In the merged datapath the registers R1, R2 and R3, the functional units labeled [ abs/min ] and [ abs/max/+/- ] on this slide, and the shifters >>1 and >>3 are connected through Bus 1, Bus 2, Bus 3 and Bus 4.]

Register merging into Register files

Group registers with non-overlapping accesses

Each group is assigned to one register file

Register grouping reduces number of ports, and therefore number of buses

Use some clustering algorithms

Register merging

[Figure: ASAP FSMD of the square-root approximation; a register access table for R1, R2 and R3 over the states s0 to s7; the compatibility graph of R1, R2 and R3; and the datapath after register merging, with In1, In2, Out, the functional units [ abs/max ] and [ abs/min/+/- ], the shifters >>3 and >>1, and Bus 1 to Bus 4.]

Register assignment:
R1 = [ a, t1, x, t7 ]
R2 = [ b, t2, y, t3, t5, t6 ]
R3 = [ t4 ]

Chaining and multi-cycling

Chaining allows serial execution of two or more operations in each state

Chaining reduces number of states and increases performance

Multi-cycling allows one operation to be executed over two or more clock cycles

Multi-cycling reduces size of functional units

Multi-cycling is used on noncritical paths to improve resource utilization

SRA datapath with chained units

[Figure: datapath schematic with In1, In2, Out, the registers R1, R2 and R3, the functional units [ abs/max ] and [ abs/min/+/- ], the shifters >>1 and >>3, and Bus 1 to Bus 4.]

FSMD with chained units:
s0: a = In1; b = In2; stay in s0 while Start = 0, go to s1 when Start = 1
s1: t1 = |a|; t2 = |b|
s2: x = max(t1, t2); t3 = max(t1, t2) >> 3; t4 = min(t1, t2) >> 1
s3: t5 = x - t3
s4: t6 = t4 + t5
s5: t7 = max(t6, x)
s6: Done = 1; Out = t7

Register assignment:
R1 = [ a, t1, x, t7 ]
R2 = [ b, t2, y, t3, t5, t6 ]
R3 = [ t4 ]

SRA datapath with multi-cycle units

[Figure: datapath schematic with In1, In2, Out, the registers R1, R2 and R3, the functional units [ abs/max ], [ abs/+/- ] and a separate min unit, the shifters >>1 and >>3, and Bus 1 to Bus 4.]

FSMD:
s0: a = In1; b = In2; stay in s0 while Start = 0, go to s1 when Start = 1
s1: t1 = |a|; t2 = |b|
s2: x = max(t1, t2); t3 = max(t1, t2) >> 3; t4 = min(t1, t2) >> 1
s3: t5 = x - t3
s4: t6 = t4 + t5
s5: t7 = max(t6, x)
s6: Done = 1; Out = t7

Register assignment:
R1 = [ a, t1, x, t7 ]
R2 = [ b, t2, y, t3, t5, t6 ]
R3 = [ t4 ]

Pipelining

Pipelining improves performance at a very small additional cost

Pipelining divides design into stages and uses all stages concurrently for different data (assembly line principle)

The pipelining principle works on several levels:

(a) Unit pipelining

(b) Control pipelining

(c) Datapath pipelining
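The assembly-line principle can be put into numbers with a small sketch. It assumes an idealized pipeline in which every stage takes one clock cycle and there are no stalls; the function names `pipelined_cycles` and `unpipelined_cycles` are hypothetical and the model is not taken from the slides.

```c
#include <assert.h>

/* Cycles needed to process `items` data sets in a pipeline of `stages`
   stages: the first result appears after `stages` cycles, and each
   further result follows one cycle later. */
unsigned pipelined_cycles(unsigned stages, unsigned items)
{
    assert(stages > 0);
    return items == 0 ? 0 : stages + items - 1;
}

/* Cycles needed when each data set must pass through all stages before
   the next one may enter. */
unsigned unpipelined_cycles(unsigned stages, unsigned items)
{
    assert(stages > 0);
    return stages * items;
}
```

Under these assumptions, 10 data sets in a 3-stage design take 12 cycles with pipelining and 30 cycles without it.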

SRA datapath with single AU

[Figure: datapath schematic with In1, In2, Out, the registers R1, R2 and R3, a single arithmetic unit (AU), the shifters >>1 and >>3, and Bus 1 to Bus 4.]

FSMD for single AU:
s0: a = In1; b = In2; stay in s0 while Start = 0, go to s1 when Start = 1
s1: t1 = |a|
s2: t2 = |b|
s3: x = max(t1, t2); t3 = max(t1, t2) >> 3
s4: t4 = min(t1, t2) >> 1
s5: t5 = x - t3
s6: t6 = t4 + t5
s7: t7 = max(t6, x)
s8: Done = 1; Out = t7

[Figure: timing diagrams, one spanning the states s0 to s12 and one spanning s0 to s8, with the rows Read R1, Read R2, Read R3, AU stage 1, AU stage 2, shifters, Write R1, Write R2, Write R3 and Outport. The position of the individual entries is not recoverable from the extracted text.]

Pipelined FSMD implementation

[Figure: standard FSMD implementation. The control unit (state register, next-state logic, output logic) has control inputs and control outputs. The datapath (selector, register file RF, ALU, output register, Bus 1 and Bus 2) has datapath inputs and datapath outputs. Control signals and status signals connect the two.]

Example FSMD:
s0: branch on the comparison of a and b (edge labels a > b and a <= b)
s1: x = c + d
s2: y = x - 1

[Figure: timing diagram of the example over the states s0, s1 and s2, with the rows Read CReg, Read RF, Write ALUIn, Read ALUIn, ALU, Write RF, Write Status, Read Status, Write SR, Read SReg and Write CReg. The position of the individual entries is not recoverable from the extracted text.]

Summary

We introduced RTL design:
• FSMD model
• RTL specification with
  • FSMD
  • CDFG
• Procedure for synthesis from RTL specification
• Scheduling of basic blocks
• Design optimization through
  • Register sharing
  • Functional unit sharing
  • Bus sharing
  • Unit chaining
  • Multi-clocking
• Design pipelining
  • Unit pipelining
  • Control pipelining
  • Datapath pipelining