1 synthesizing datapath circuits for fpgas with emphasis on area minimization andy ye, david lewis,...

1

Synthesizing Datapath Circuits for FPGAs With Emphasis on Area Minimization

Andy Ye, David Lewis, Jonathan Rose

Department of Electrical and Computer Engineering, University of Toronto

{yeandy, lewis, jayar}@eecg.utoronto.ca

2

Motivation: Datapath Regularity

• Larger FPGAs

– Larger applications on FPGAs

– More datapath logic in larger applications

– Datapath logic is highly regular

• Utilize regularity to improve logic density

3

Utilizing Datapath Regularity

• A new datapath-oriented FPGA

• New CAD tools supporting the new FPGA

– Synthesis

– Packing

– Placement

– Routing

• This talk focuses on synthesis

4

Background: Datapath-oriented FPGA

• Architected to utilize datapath regularity

• Architectural features

– Capture regularity using special logic blocks

– Increase logic density by coarse grain routing

5

Background: FPGA Overview

[Figure: FPGA overview. Logic clusters (L) and switch boxes (S) are connected by routing channels that contain both coarse-grain and fine-grain routing tracks.]

6

Background: Logic Cluster

[Figure: A logic cluster contains four subclusters (Subcluster 1 to Subcluster 4) and a local routing network. Each subcluster contains four basic logic elements (BLEs). A basic logic element (BLE) consists of a LUT, a DFF, and a MUX.]
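The BLE in the figure can be modeled behaviorally. The following is a minimal Python sketch, assuming a 4-input LUT stored as a 16-bit truth table and a MUX that selects either the LUT output or the DFF output; the class and parameter names are hypothetical and say nothing about the actual circuit implementation.

```python
from typing import Sequence


class BLE:
    """Behavioral model of a basic logic element: 4-LUT + DFF + output MUX.

    Hypothetical sketch: `truth_table` is a 16-bit integer whose bit i is the
    LUT output for input pattern i; `registered` drives the MUX select.
    """

    def __init__(self, truth_table: int, registered: bool) -> None:
        if not 0 <= truth_table < (1 << 16):
            raise ValueError("a 4-LUT truth table has exactly 16 bits")
        self.truth_table = truth_table
        self.registered = registered
        self.q = 0  # DFF state, assumed to reset to 0

    def lut(self, inputs: Sequence[int]) -> int:
        """Look up the LUT output; inputs[0] is the least significant index bit."""
        if len(inputs) != 4:
            raise ValueError("a 4-LUT takes exactly 4 inputs")
        index = sum(bit << i for i, bit in enumerate(inputs))
        return (self.truth_table >> index) & 1

    def output(self, inputs: Sequence[int]) -> int:
        """MUX: registered output (DFF) or combinational output (LUT)."""
        return self.q if self.registered else self.lut(inputs)

    def clock(self, inputs: Sequence[int]) -> None:
        """Clock edge: the DFF captures the current LUT output."""
        self.q = self.lut(inputs)


# Usage: a 4-input XOR (parity) function; 0x6996 is its truth table.
xor4 = BLE(0x6996, registered=False)
```

Under the organization shown in the figure, one logic cluster would hold 16 such BLEs (four subclusters of four BLEs each).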

7

Background: FPGA Overview

[Figure: FPGA overview. Logic clusters (L) and switch boxes (S) are connected by routing channels that contain both coarse-grain and fine-grain routing tracks.]

8

Background: Coarse Grain Routing Tracks

[Figure: Coarse-grain routing tracks. Labels: a logic cluster containing four sub-clusters, a switch box, coarse-grain routing, fine-grain routing, and several blocks marked M.]
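The slides state that coarse-grain routing increases logic density but do not spell out the mechanism. One mechanism used in bus-based FPGA proposals, and assumed here purely for illustration, is that the tracks of a coarse-grain bus are switched together and share one set of configuration bits. The back-of-the-envelope Python sketch below counts routing configuration bits under that assumption; the function name, track counts, and switch counts are hypothetical.

```python
def routing_config_bits(fine_tracks: int, coarse_tracks: int,
                        switches_per_track: int, bus_width: int = 4) -> int:
    """Count routing configuration bits in one channel segment.

    Assumptions (hypothetical): every programmable switch on a fine-grain
    track needs its own configuration bit, while the `bus_width` tracks of a
    coarse-grain bus are switched together and share one bit per switch.
    """
    if coarse_tracks % bus_width != 0:
        raise ValueError("coarse-grain tracks must form whole buses")
    fine_bits = fine_tracks * switches_per_track
    coarse_bits = (coarse_tracks // bus_width) * switches_per_track
    return fine_bits + coarse_bits


# 32 tracks with 10 switches each: all fine-grain vs. half coarse-grain (4-bit buses).
all_fine = routing_config_bits(fine_tracks=32, coarse_tracks=0, switches_per_track=10)
mixed = routing_config_bits(fine_tracks=16, coarse_tracks=16, switches_per_track=10)
```

Such a saving only materializes when datapath buses actually map onto the coarse-grain tracks, which is why the downstream tools need the synthesized netlist to keep its regularity.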

9

Datapath Synthesis

• Synthesis

– The first step in a fully automated CAD flow

– Transforms high-level descriptions into logic

• Conventional synthesis (flat synthesis)

– Minimizes area and delay metrics

– Destroys datapath regularity

• Datapath synthesis

– Preserves datapath regularity

– Supports downstream CAD tools

10

Datapath Representation

• Datapath circuits are represented by netlists of datapath components (VHDL or Verilog)

• Datapath component library

– Multiplexers

– Adders/subtracters

– Shifters

– Comparators

– Registers

• Each component consists of identical bit-slices
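To make "identical bit-slices" concrete, the following Python sketch builds an n-bit adder component from n copies of one full-adder bit-slice. It is an illustrative behavioral model, not the component library used in this work; all function names are hypothetical.

```python
from typing import List, Tuple


def full_adder_slice(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One bit-slice of an adder: returns (sum, carry-out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_adder(a_bits: List[int], b_bits: List[int], cin: int = 0) -> Tuple[List[int], int]:
    """An n-bit adder built by chaining n identical full-adder bit-slices.

    Bits are ordered least significant first; the carry is the only signal
    passed from one bit-slice to the next.
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    sums = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder_slice(a, b, carry)
        sums.append(s)
    return sums, carry


def to_bits(value: int, width: int) -> List[int]:
    """Integer to a list of bits, least significant first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: List[int]) -> int:
    """List of bits (least significant first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

Every bit-slice runs the same `full_adder_slice`, so a synthesis tool that optimizes one slice can reuse the result for all of them; that reuse is the regularity the rest of this talk tries to preserve.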

11

Hard Boundary Hierarchical Synthesis

• Optimize within the boundaries of bit-slices

• Keep identical bit-slices identical

• Optimized 15 datapath circuits from the Pico-java processor using Synopsys [sun]

– Good regularity

– Poor area: 38% area inflation

• The FPGA architecture aims to increase logic density

– A better synthesis tool is needed

12

Causes of Area Inflation

• Examined circuits to determine the causes

• Constraint of preserving bit-slice boundaries

– Common sub-expressions exist across bit-slices

– Harder to discover in datapath synthesis

• Constraint of preserving datapath regularity

– Identical bit-slices have different external connections

– Some bit-slices have more optimization opportunities than others

– These opportunities are missed when all bit-slices must be kept identical

13

Enhanced Module Compaction

[Figure: Synthesis flow. Netlist of Datapath Components → Word-level Optimization (manual operation) → Module Compaction → Bit-slice I/O Optimization → Flat Synthesis & Optimization Within Bit-slice Boundaries → Netlist of Synthesized Bit-slices]

14

Word-level Optimization

• Currently done manually; to be automated in the future

• Optimizes across bit-slice boundaries

• Uses the functionality of each datapath component to create optimization opportunities

• Two optimizations are performed

– Multiplexer tree collapsing

– Operation reordering

• More optimizations are planned

15

Multiplexer Tree Collapsing

• Datapath circuits contain multiplexers in a tree topology

• Collapses several multiplexers in a multiplexer tree into a single multiplexer

• The collapsing operation creates common sub-expressions

• These common sub-expressions are extracted out of multiple bit-slices to save area

16

An Example

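[Figure: Multiplexer tree collapsing example. Labels: flip-flops (FF), multiplexers mux1 and mux2, select signals S1 and S2, inputs R and A, and random logic (rl).]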
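The collapsing step can be sketched behaviorally in Python. The tree shape below (mux1 chooses between a and b on s1, mux2 chooses between the output of mux1 and c on s2) is an assumption for illustration, not the exact circuit in the figure, and the function names are hypothetical. The point is that after collapsing, the select decoding depends only on s1 and s2, so it is a common sub-expression that can be computed once per word instead of once per bit-slice.

```python
from typing import List, Sequence


def mux_tree_bit(a: int, b: int, c: int, s1: int, s2: int) -> int:
    """Original bit-slice: mux1 picks a/b on s1, then mux2 picks mux1/c on s2."""
    mux1 = b if s1 else a
    return c if s2 else mux1


def decode_select(s1: int, s2: int) -> int:
    """Select decoding for the collapsed 3:1 multiplexer.

    It depends only on the control signals, so it is shared by every
    bit-slice and can be extracted and computed once per word.
    """
    if s2:
        return 2
    return 1 if s1 else 0


def collapsed_word(a: Sequence[int], b: Sequence[int], c: Sequence[int],
                   s1: int, s2: int) -> List[int]:
    """Collapsed version: one shared decoder, one 3:1 multiplexer per bit-slice."""
    sel = decode_select(s1, s2)  # computed once, outside the bit-slices
    return [(ai, bi, ci)[sel] for ai, bi, ci in zip(a, b, c)]


def tree_word(a: Sequence[int], b: Sequence[int], c: Sequence[int],
              s1: int, s2: int) -> List[int]:
    """Reference: apply the original two-multiplexer tree in every bit-slice."""
    return [mux_tree_bit(ai, bi, ci, s1, s2) for ai, bi, ci in zip(a, b, c)]
```

The sketch only demonstrates functional equivalence and where the shared logic comes from; whether collapsing saves LUTs in practice depends on how the per-bit multiplexer and the shared decoder map onto 4-LUTs.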

17

Operation Reordering

• Transforms result selection into operand selection

• Accepts the transformation only if it results in a smaller area

18

An Example

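[Figure: Operation reordering example. Before: two adders compute a + b and c + d, and a multiplexer with select s picks the result e. After: multiplexers with select s pick the operands (a or c, b or d), and a single adder produces e. The lower half shows the same transformation for bit-slice 0 built from sum and carry cells: before reordering, inputs a0, b0, c0, d0, carry-ins cin0a and cin0b, carry-outs cout0a and cout0b, select s0, and output e0; after reordering, a single carry-in cin0 and carry-out cout0.]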
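The transformation and its acceptance test can be sketched in Python. The functional check is exact: selecting between a + b and c + d gives the same result as adding the selected operands. The per-bit LUT costs in `LUT_COST` and all function names are hypothetical placeholders, not figures from this work; a real tool would compare areas after flat synthesis.

```python
from typing import Dict, List, Tuple

# Hypothetical per-bit area costs (in LUTs), used only for illustration.
LUT_COST: Dict[str, int] = {"adder": 2, "mux2": 1}


def result_selection(a: int, b: int, c: int, d: int, s: int, width: int) -> int:
    """Original form: two adders, then a multiplexer selects one result."""
    mask = (1 << width) - 1
    return ((c + d) if s else (a + b)) & mask


def operand_selection(a: int, b: int, c: int, d: int, s: int, width: int) -> int:
    """Reordered form: two multiplexers select the operands, then one adder."""
    mask = (1 << width) - 1
    x = c if s else a
    y = d if s else b
    return (x + y) & mask


def area(components: List[str], width: int) -> int:
    """Area estimate: every component is `width` identical bit-slices."""
    return width * sum(LUT_COST[name] for name in components)


def choose_form(width: int) -> Tuple[str, int]:
    """Accept the reordering only if it results in a smaller area."""
    before = area(["adder", "adder", "mux2"], width)
    after = area(["mux2", "mux2", "adder"], width)
    if after < before:
        return "operand_selection", after
    return "result_selection", before
```

With these placeholder costs a 4-bit instance drops from 20 to 16 LUTs, so the reordering is accepted; with other costs the same test could reject it, which is why the transformation is applied conditionally.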

19

Module Compaction

• Merges bit-slices into larger bit-slices

• Based on connectivity between datapath components

• Larger bit-slices have more optimization opportunities for flat synthesis

• Avoids merging based on carry chains

• Similar to the algorithm proposed by Koch

20

An Example

[Figure: Module compaction example with multiplexer bit-slices mux0 to mux3 and full-adder bit-slices FA0 to FA4.]
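A simplified Python sketch of connectivity-based merging follows. It is not Koch's algorithm or the exact procedure used here; it assumes that two components are merged only when they have equal width and every connection between them joins bit i to bit i, and it ignores connections inside a component (such as a carry chain from bit i to bit i + 1) when deciding what to merge. The function name, tuple encoding, and component names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (source component, source bit, sink component, sink bit)
Connection = Tuple[str, int, str, int]


def compact_modules(widths: Dict[str, int],
                    connections: List[Connection]) -> List[List[str]]:
    """Group components whose bit-slices can be merged into larger bit-slices."""
    parent = {name: name for name in widths}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Collect the bit offsets used between each pair of distinct components.
    offsets: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    for src, src_bit, dst, dst_bit in connections:
        if src == dst:
            continue  # e.g. a carry chain: never a reason to merge
        offsets[(src, dst)].add(dst_bit - src_bit)

    # Merge only bit-aligned pairs of equal width, so merged slices stay identical.
    for (src, dst), offs in offsets.items():
        if offs == {0} and widths[src] == widths[dst]:
            parent[find(src)] = find(dst)

    groups: Dict[str, List[str]] = defaultdict(list)
    for name in widths:
        groups[find(name)].append(name)
    return sorted(sorted(group) for group in groups.values())


# Usage with hypothetical 4-bit components.
widths = {"mux": 4, "fa": 4, "reg": 4, "shl": 4}
conns = [("mux", i, "fa", i) for i in range(4)]        # mux bit i feeds adder bit i
conns += [("fa", i, "fa", i + 1) for i in range(3)]    # carry chain inside the adder
conns += [("fa", i, "reg", i) for i in range(4)]       # adder bit i feeds register bit i
conns += [("reg", i, "shl", i + 1) for i in range(3)]  # shifted: bit i feeds bit i + 1
groups = compact_modules(widths, conns)
```

Here the multiplexer, adder, and register merge into one larger bit-slice, while the shifter stays separate because its connections cross bit positions.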

21

Bit-slice I/O Optimization

• Granularity of bit-slice I/O optimization, m

• Breaks datapath components into m-bit wide chunks

• The m bit-slices in each chunk are kept identical to each other

• Allows some bit-slices in a datapath component to be optimized more than others

22

Bit-slice I/O Optimization

• Converts bit-slice I/O signals into internal signals if all m bit-slices meet an optimization criterion

• More optimization opportunities for flat synthesis

• Four types of I/O optimizations

– Constant absorption

– Feedback absorption

– Duplicated input absorption

– Unused output absorption
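Two of the four absorptions can be sketched in Python to show how the granularity m decides what is absorbed. The encoding of port connections and the function names are hypothetical; feedback absorption and duplicated input absorption would follow the same rule (the condition must hold in all m bit-slices of a chunk) and are omitted for brevity.

```python
from typing import List, Union

# Per-bit connection of one component port (hypothetical encoding):
# an int is a tied constant (0 or 1), a str is a net name, None is unconnected.
Conn = Union[int, str, None]


def chunk_ranges(width: int, m: int) -> List[range]:
    """Break a width-bit component into m-bit wide chunks."""
    return [range(lo, min(lo + m, width)) for lo in range(0, width, m)]


def constant_absorption(port: List[Conn], m: int) -> List[bool]:
    """Per chunk: can this input be absorbed as an internal constant?

    Every bit-slice in the chunk must be tied to the same constant, so the
    m bit-slices remain identical after the optimization.
    """
    result = []
    for r in chunk_ranges(len(port), m):
        values = [port[i] for i in r]
        result.append(all(isinstance(v, int) for v in values) and len(set(values)) == 1)
    return result


def unused_output_absorption(port: List[Conn], m: int) -> List[bool]:
    """Per chunk: can this output be made internal because no bit of it is used?"""
    return [all(port[i] is None for i in r) for r in chunk_ranges(len(port), m)]


# Usage on a hypothetical 8-bit component.
b_in: List[Conn] = [0, 0, 0, 0, 0, 1, 0, 0]
flags_out: List[Conn] = [None] * 4 + ["n4", "n5", None, None]
```

With m = 1 every constant bit can be absorbed individually, while with m = 8 a single differing bit blocks absorption for the whole word; this mirrors the trade-off between area and regularity discussed under granularity.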

23

Experimental Results

• Fifteen benchmark circuits

– From the Pico-java processor

– Synthesized into 4-LUTs and DFFs

• Experiments

– Area

– Regularity

– Area against m (the granularity of bit-slice I/O optimization)

24

Area

• m (granularity of bit-slice I/O optimization) = 4

• Compare datapath synthesis with flat synthesis

25

Post-synthesis Area (LUT Count)

Circuit        Flat Synthesis Area   Datapath Synthesis Area   Area Inflation
icu_dpath      3120                  3235                      3.7%
ex_dpath       2530                  2553                      0.91%
multmod_dp     1558                  1634                      4.9%
ucode_dat      1243                  1304                      4.9%
imdr_dpath     1182                  1219                      3.1%
dcu_dpath      960                   966                       0.63%
mantissa_dp    846                   878                       3.8%
incmod_dp      779                   865                       11%
smu_dpath      490                   493                       0.61%
exponent_dp    477                   501                       5.0%
pipe_dpath     443                   471                       6.3%
prils_dp       377                   388                       2.9%
rsadd_dp       346                   305                       -12%
code_seq_dp    218                   223                       2.3%
ucode_reg      78                    82                        5.1%
Total Area     14647                 15117                     3.2%
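The inflation column is consistent with the definition inflation = 100% × (datapath synthesis area - flat synthesis area) / flat synthesis area. The following Python sketch recomputes it from the LUT counts in the table as a check; the dictionary and function names are hypothetical.

```python
# LUT counts copied from the table: circuit -> (flat synthesis, datapath synthesis)
LUT_COUNTS = {
    "icu_dpath": (3120, 3235),
    "ex_dpath": (2530, 2553),
    "multmod_dp": (1558, 1634),
    "ucode_dat": (1243, 1304),
    "imdr_dpath": (1182, 1219),
    "dcu_dpath": (960, 966),
    "mantissa_dp": (846, 878),
    "incmod_dp": (779, 865),
    "smu_dpath": (490, 493),
    "exponent_dp": (477, 501),
    "pipe_dpath": (443, 471),
    "prils_dp": (377, 388),
    "rsadd_dp": (346, 305),
    "code_seq_dp": (218, 223),
    "ucode_reg": (78, 82),
}


def area_inflation(flat: int, datapath: int) -> float:
    """Extra area of datapath synthesis relative to flat synthesis, in percent."""
    return 100.0 * (datapath - flat) / flat


total_flat = sum(flat for flat, _ in LUT_COUNTS.values())
total_datapath = sum(dp for _, dp in LUT_COUNTS.values())

for name, (flat, dp) in LUT_COUNTS.items():
    print(f"{name:12s} {area_inflation(flat, dp):7.2f}%")
print(f"{'Total':12s} {area_inflation(total_flat, total_datapath):7.2f}%")
```

The total row is computed from the summed LUT counts (470 extra LUTs over 14647), so larger circuits weigh more than they would in a plain average of the per-circuit percentages, which comes to roughly 2.9%.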

26

Regularity

• m (granularity of bit-slice I/O optimization) = 4

• Two-terminal connections captured by

– 4-bit wide buses

– 4-bit wide control groups

27

Regularity

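[Figure: A 4-bit wide bus, connecting bit-slices S1 to S4 of one component to bit-slices S1 to S4 of another, and a 4-bit wide control group, connecting to bit-slices S1 to S4 of a single component.]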
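The slides do not define exactly how connections are assigned to buses and control groups. One plausible reading, assumed here, is that a 4-bit wide bus is four connections from consecutive bits of one component into an aligned 4-bit chunk of a sink port with a constant bit offset, and that a 4-bit wide control group is one source bit driving all four bits of such a chunk. The Python sketch below measures both fractions under those assumptions; the tuple encoding and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# One two-terminal connection (hypothetical encoding):
# (source component, source bit, sink component, sink port, sink bit)
Conn = Tuple[str, int, str, str, int]


def regularity(conns: List[Conn], m: int = 4) -> Tuple[float, float]:
    """Return the fractions of connections captured by m-bit buses and control groups."""
    if not conns:
        return 0.0, 0.0
    bus_groups: Dict[tuple, Set[int]] = defaultdict(set)
    ctrl_groups: Dict[tuple, Set[int]] = defaultdict(set)
    for src, src_bit, dst, port, dst_bit in conns:
        chunk = dst_bit // m  # aligned m-bit chunk of the sink port
        # Bus: consecutive source bits feed the chunk with one constant bit offset.
        bus_groups[(src, dst, port, chunk, src_bit - dst_bit)].add(dst_bit)
        # Control group: a single source bit feeds every bit of the chunk.
        ctrl_groups[(src, src_bit, dst, port, chunk)].add(dst_bit)
    in_bus = sum(len(g) for g in bus_groups.values() if len(g) == m)
    in_ctrl = sum(len(g) for g in ctrl_groups.values() if len(g) == m)
    return in_bus / len(conns), in_ctrl / len(conns)


# Usage with hypothetical components.
conns = [("alu", i, "reg", "d", i) for i in range(4)]    # a 4-bit wide bus
conns += [("ctl", 0, "reg", "en", i) for i in range(4)]  # a 4-bit wide control group
conns += [("x", 0, "reg", "rst", 0)]                     # an irregular connection
bus_frac, ctrl_frac = regularity(conns)
```

In this toy netlist four of nine connections fall in a bus and four in a control group; the results table reports the same two fractions for each benchmark circuit.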

28

Regularity Results

Circuit        Two-Terminal Connections   4-bit Wide Buses   4-bit Wide Control Groups
dcu_dpath      2232                       49%                43%
ex_dpath       6547                       52%                39%
icu_dpath      8047                       47%                36%
imdr_dpath     3100                       50%                36%
pipe_dpath     1049                       48%                42%
smu_dpath      1167                       48%                25%
ucode_data     3143                       52%                41%
ucode_reg      194                        72%                21%
code_seq_dp    799                        58%                18%
exponent_dp    1362                       32%                23%
incmod_dp      2013                       42%                33%
mantissa_dp    2533                       47%                36%
multmod_dp     3380                       39%                25%
prils_dp       864                        41%                32%
rsadd_dp       722                        52%                27%
Total          37152                      48%                35%

• 94% of LUTs remain in regular datapath components

29

Granularity (m) Vs. Area

• Higher m (the granularity of bit-slice I/O optimization)

– Keeps more bit-slices identical

– Preserves more regularity

– Higher area cost

30

Granularity Vs. Area Inflation

[Figure: Area inflation (%, axis from 0 to 8) plotted against granularity m (1, 4, 8, 12, 16, 20, 24, 28, 32).]

31

Conclusion

• Presented a datapath-oriented FPGA architecture

• Presented an enhanced module compaction algorithm

• Empirically demonstrated the area efficiency of the algorithm

– 3%-8% area inflation

• Good regularity

– 48% of two-terminal connections are in 4-bit wide buses

– 35% of two-terminal connections are in 4-bit wide control groups
