high-level synthesis: creating custom circuits from high-level code greg stitt ece department...

52
High-Level Synthesis: Creating Custom Circuits from High- Level Code Greg Stitt ECE Department University of Florida

Upload: daniel-fox

Post on 13-Dec-2015

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

High-Level Synthesis: Creating Custom Circuits from High-Level Code

Greg Stitt

ECE Department

University of Florida

Page 2: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Existing FPGA Tool Flow Register-transfer (RT) synthesis

Specify RT structure (muxes, registers, etc) + Allows precise specification - But, time consuming, difficult, error prone

HDL

Netlist

Bitfile

Processor FPGA

RT Synthesis

Physical Design

Technology Mapping

Placement

Routing

Page 3: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Future FPGA Tool Flow?

HDL

Netlist

Bitfile

Processor FPGA

RT Synthesis

Physical Design

Technology Mapping

Placement

Routing

High-level Synthesis

C/C++, Java, etc.

Page 4: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

High-level Synthesis Wouldn’t it be nice to write high-level code?

Ratio of C to VHDL developers (10000:1 ?) + Easier to specify + Separates function from architecture

+ More portable - Hardware potentially slower

Similar to assembly code era Programmers could always beat compiler But, no longer the case

Hopefully, high-level synthesis will catch up to manual effort

Page 5: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

High-level Synthesis More challenging than compilation

Compilation maps behavior into assembly instructions

Architecture is known to compiler High-level synthesis creates a custom architecture

to execute behavior Huge hardware exploration space Best solution may include microprocessors Should handle any high-level code

Not all code appropriate for hardware

Page 6: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

High-level Synthesis

First, consider how to manually convert high-level code into circuit

Steps 1) Build FSM for controller 2) Build datapath based on FSM

acc = 0;for (i=0; i < 128; i++) acc += a[i];

Page 7: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Manual Example

Build a FSM (controller) Decompose code into states

acc = 0;for (i=0; i < 128; i++) acc += a[i];

if (i < 128)

acc=0, i = 0

load a[i]

acc += a[i]

i++

Done

Page 8: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Manual Example

Build a datapath Allocate resources for each state

acci

if (i < 128)

acc=0, i = 0

load a[i]

acc += a[i]

i++

Done

<

addra[i]

++ +

1 128 1

acc = 0;for (i=0; i < 128; i++) acc += a[i];

Page 9: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Manual Example

Build a datapath Determine register inputs

acci

if (i < 128)

acc=0, i = 0

load a[i]

acc += a[i]

i++

Done

<

addra[i]

++ +

1 128

2x1

0

2x1

0

1

2x1

&a

In from memory

acc = 0;for (i=0; i < 128; i++) acc += a[i];

Page 10: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Manual Example

Build a datapath Add outputs

acci

if (i < 128)

acc=0, i = 0

load a[i]

acc += a[i]

i++

Done

<

addra[i]

++ +

1 128

2x1

0

2x1

0

1

2x1

&a

In from memory

Memory addressacc

acc = 0;for (i=0; i < 128; i++) acc += a[i];

Page 11: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Manual Example

Build a datapath Add control signals

acci

if (i < 128)

acc=0, i = 0

load a[i]

acc += a[i]

i++

Done

<

addra[i]

++ +

1 128

2x1

0

2x1

0

1

2x1

&a

In from memory

Memory addressacc

acc = 0;for (i=0; i < 128; i++) acc += a[i];

Page 12: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Manual Example

Combine controller+datapath

acci

<

addra[i]

++ +

1 128

2x1

0

2x1

0

1

2x1

&a

In from memory

Memory addressaccDone Memory Read

Controller

acc = 0;for (i=0; i < 128; i++) acc += a[i];

Page 13: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Manual Example

Alternatives Use one adder (plus muxes)

acci

<

addra[i]

+

128

2x1

0

2x1

0

1

2x1

&a

In from memory

Memory addressacc

MUX MUX

Page 14: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Manual Example

Comparison with high-level synthesis Determining when to perform each

operation => Scheduling

Allocating resource for each operation => Resource allocation

Mapping operations onto resources => Binding

Page 15: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Another Example

Your turnx=0;for (i=0; i < 100; i++) { if (a[i] > 0) x ++; else x --;

a[i] = x;}//output x

Steps1) Build FSM (do not perform if conversion)2) Build datapath based on FSM

Page 16: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

High-Level Synthesis

High-level Code

Custom Circuit

High-Level Synthesis

Could be C, C++, Java, Perl, Python, SystemC, ImpulseC, etc.

Usually a RT VHDL description, but could as low level as a bit file

Page 17: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

High-Level Synthesis

High-Level Synthesis

acc = 0;for (i=0; i < 128; i++) acc += a[i];

acci

<

addra[i]

++ +1 128

2x1

0

2x1

0

1

2x1

&a

In from memory

Memory addressaccDone Memory Read

Controller

Page 18: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Main Steps

Syntactic Analysis

Optimization

Scheduling

Resource Allocation

Binding

High-level Code

Intermediate Representation

Controller + Datapath

Converts code to intermediate representation - allows all following steps to use common format.

Determines when each operation will execute

Determines physical resources to be used

Maps operations onto physical resources

Front-end

Back-end

Page 19: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Syntactic Analysis Definition: Analysis of code to verify syntactic

correctness Converts code into intermediate representation

2 steps 1) Lexical analysis (Lexing) 2) Parsing

Syntactic Analysis

High-level Code

Intermediate Representation

Lexical Analysis

Parsing

Page 20: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Lexical Analysis Lexical analysis (lexing) breaks code into a

series of defined tokens Token: defined language constructs

x = 0;if (y < z) x = 1;

Lexical Analysis

ID(x), ASSIGN, INT(0), SEMICOLON, IF, LPAREN, ID(y), LT, ID(z), RPAREN, ID(x), ASSIGN, INT(1), SEMICOLON

Page 21: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Lexing Tools

Define tokens using regular expressions - outputs C code that lexes input Common tool is “lex”

/* braces and parentheses */"[" { YYPRINT; return LBRACE; }"]" { YYPRINT; return RBRACE; }"," { YYPRINT; return COMMA; }";" { YYPRINT; return SEMICOLON; }"!" { YYPRINT; return EXCLAMATION; }"{" { YYPRINT; return LBRACKET; }"}" { YYPRINT; return RBRACKET; }"-" { YYPRINT; return MINUS; }

/* integers[0-9]+ { yylval.intVal = atoi( yytext ); return INT;}

Page 22: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Parsing

Analysis of token sequence to determine correct grammatical structure Languages defined by context-free

grammar

Program = ExpExp = Stmt SEMICOLON |

IF LPAREN Cond RPAREN Exp |

Exp Exp

Cond = ID Comp ID

Stmt = ID ASSIGN INTx = 0;if (y < z) x = 1;

x = 0; x = 0; y = 1;

if (a < b) x = 10;

if (var1 != var2) x = 10;

x = 0;if (y < z) x = 1; y = 5; t = 1;

GrammarCorrect Programs

Comp = LT | NE

Page 23: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Parsing

Program = ExpExp = S SEMICOLON |

IF LPAREN Cond RPAREN Exp |

Exp Exp

Cond = ID Comp ID

S = ID ASSIGN INT

Grammar

Comp = LT | NE

x = y;

x = 3 + 5;

x = 5;;

if (x+5 > y) x = 2;

Incorrect Programs

x = 5

Page 24: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Parsing Tools Define grammar in special language

Automatically creates parser based on grammar Popular tool is “yacc” - yet-another-compiler-

compiler

program: functions { $$ = $1; } ;

functions: function { $$ = $1; } | functions function { $$ = $1; } ; function: HEXNUMBER LABEL COLON code { $$ = $2; } ;

Page 25: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Intermediate Representation

Parser converts tokens to intermediate representation Usually, an abstract syntax tree

x = 0;if (y < z) x = 1;d = 6;

Assign

if

cond assign assign

x 0

x 1 d 6y < z

Page 26: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Intermediate Representation Why use intermediate representation?

Easier to analyze/optimize than source code Theoretically can be used for all languages

Makes synthesis back end language independent

Syntactic Analysis

C Code

Intermediate Representation

Syntactic Analysis

Java

Syntactic Analysis

Perl

Back End

Scheduling, resource allocation, binding, independent of source language - sometimes optimizations too

Page 27: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Intermediate Representation

Different Types Abstract Syntax Tree Control/Data Flow Graph (CDFG) Sequencing Graph

Etc.

We will focus on CDFG Combines control flow graph (CFG) and

data flow graph (DFG)

Page 28: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Control flow graphs CFG

Represents control flow dependencies of basic blocks

Basic block is section of code that always executes from beginning to end

I.e. no jumps into or out of block

acc = 0;for (i=0; i < 128; i++) acc += a[i];

if (i < 128)

acc=0, i = 0

acc += a[i]i ++

Done

Page 29: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Control flow graphs

Your turn

Create a CFG for this code

i = 0;while (j < 10) { if (x < 5) y = 2; else if (z < 10) y = 6;}

Page 30: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Data Flow Graphs

DFG Represents data dependencies between

operations

x = a+b;y = c*d;z = x - y;

+ *

-

a b c d

x y z

Page 31: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Control/Data Flow Graph

Combines CFG and DFG Maintains DFG for each node of CFG

acc = 0;for (i=0; i < 128; i++) acc += a[i];

if (i < 128)

acc=0; i=0;

acc += a[i]i ++

Done

acc

0

i

0

+

acc a[i]

acc

+

i 1

i

Page 32: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

High-Level Synthesis: Optimization

Page 33: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Synthesis Optimizations After creating CDFG, high-level synthesis

optimizes graph Goals

Reduce area Improve latency Increase parallelism Reduce power/energy

2 types Data flow optimizations Control flow optimizations

Page 34: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Data Flow Optimizations

Tree-height reduction Generally made possible from commutativity, associativity,

and distributivity

+

+

+

+ +

+

a b c da b c d

+

+

*

a b c d

+ *

+

a b c d

Page 35: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Data Flow Optimizations

Operator Strength Reduction Replacing an expensive (“strong”) operation with a faster

one Common example: replacing multiply/divide with shift

b[i] = a[i] * 8; b[i] = a[i] << 3;

a = b * 5; c = b << 2;a = b + c;

1 multiplication 0 multiplications

a = b * 13;c = b << 2;d = b << 3;a = c + d + b;

Page 36: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Data Flow Optimizations

Constant propagation Statically evaluate expressions with

constants

x = 0;y = x * 15;z = y + 10;

x = 0;y = 0;z = 10;

Page 37: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Data Flow Optimizations

Function Specialization Create specialized code for common inputs

Treat common inputs as constants If inputs not known statically, must include if statement for each

call to specialized function

int f (int x) { y = x * 15; return y + 10;}

for (I=0; I < 1000; I++) f(0); …}

int f_opt () { return 10;}

for (I=0; I < 1000; I++) f_opt(0); …}

Treat frequent input as a constant

int f (int x) { y = x * 15; return y + 10;}

Page 38: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Data Flow Optimizations

Common sub-expression elimination If expression appears more than once,

repetitions can be replaced

a = x + y; . . . . . . . . . . . . b = c * 25 + x + y;

a = x + y; . . . . . . . . . . . . b = c * 25 + a;

x + y already determined

Page 39: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Data Flow Optimizations

Dead code elimination Remove code that is never executed

May seem like stupid code, but often comes from constant propagation or function specialization

int f (int x) { if (x > 0 ) a = b * 15; else a = b / 4; return a;}

int f_opt () { a = b * 15; return a;}

Specialized version for x > 0 does not need else branch - “dead code”

Page 40: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Data Flow Optimizations

Code motion (hoisting/sinking) Avoid repeated computation

for (I=0; I < 100; I++) { z = x + y; b[i] = a[i] + z ;}

z = x + y;for (I=0; I < 100; I++) { b[i] = a[i] + z ;}

Page 41: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Control Flow Optimizations

Loop Unrolling Replicate body of loop

May increase parallelism

for (i=0; i < 128; i++) a[I] = b[I] + c[I];

for (i=0; i < 128; i+=2) { a[I] = b[I] + c[I]; a[I+1] = b[I+1] + c[I+1]}

Page 42: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Control Flow Optimizations

Function Inlining Replace function call with body of function

Common for both SW and HW SW - Eliminates instructions/branches HW - Eliminates unnecessary control states

for (i=0; i < 128; i++) a[I] = f( b[I], c[I] );. . . .int f (int a, int b) { return a + b * 15;}

for (i=0; i < 128; i++) a[I] = b[I] + c[I] * 15;

Page 43: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Control Flow Optimizations

Conditional Expansion Replace if with logic expression

Execute if/else bodies in parallel

y = ab;If (a) x = b+d;Else x =bd;

y = ab;x = a(b+d) + a’bd;

y = ab;x = y + d(a+b);

[DeMicheli]

Can be further optimized to:

Page 44: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Example

Optimize this

x = 0;y = a + b;if (x < 15) z = a + b - c;else z = x + 12;output = z * 12;

Page 45: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

High-Level Synthesis:Scheduling

Page 46: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Scheduling

Scheduling assigns a start time to each operation Start times must not violate dependencies

in DFG Start times must meet performance

constraints Alternatively, resource constraints

Page 47: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Examples

+

+

+

+ +

+

a b c da b c d

+ +

+

a b c d

Cycle1

Cycle2

Cycle3

Cycle3

Cycle1 Cycle2

Cycle1

Cycle2

Page 48: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Scheduling Problems Several types of scheduling problems

Usually some combination of performance and resource constraints

Problems: Unconstrained

Not very useful, every schedule is valid Minimum latency Latency constrained Mininum-latency, resource constrained

i.e. Find the scheduling with the shortest latency, that uses less than a specified # of resources

NP-Complete Mininum-resource, latency constrained

i.e. find the schedule that meets the latency constraint (which may be anything), and uses the minimum # of resources

NP-Complete

Page 49: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Minimum Latency Scheduling

ASAP (as soon as possible) algorithm Find a node whose predecessors are scheduled

Schedule node one cycle later than max cycle of predecessor

Repeat until all nodes scheduled

+ +

*

a b c d

*

- <

e f g h

Cycle1

Cycle2

Cycle3

+Cycle4

Minimum possible latency - 4 cycles

Page 50: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Minimum Latency Scheduling ALAP (as late as possible) algorithm

Run ASAP, get minimum latency L Find a node whose successors are scheduled

Schedule node one cycle before than min cycle of predecessor Nodes with no successors scheduled to cycle L

Repeat until all nodes scheduled

+ +

*

a b c d

*

- <

e f g h

Cycle1

Cycle2

Cycle3

+Cycle4

Cycle4

Cycle3

*

Page 51: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Minimum Latency Scheduling ALAP

Has to run ASAP first, seems pointless But, many heuristics need the mobility/slack of

each operation ASAP gives the earliest possible time for an operation ALAP gives the latest possible time for an operation

Slack = difference between earliest and latest possible schedule

Slack = 0 implies operation has to be done in the current scheduled cycle

The larger the slack, the more options a heuristic has to schedule the operation

Page 52: High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida

Latency-Constrained Scheduling

Instead of finding the minimum latency, find latency less than L Solutions:

Use ASAP, verify that minimum latency less than L

Use ALAP starting with cycle L instead of minimum latency (don’t need ASAP)