High-Level Synthesis: Creating Custom Circuits from High-Level Code

Greg Stitt
ECE Department
University of Florida

Existing FPGA Tool Flow

Register-transfer (RT) synthesis: the designer specifies the RT structure (muxes, registers, etc.)
+ Allows precise specification
- But time-consuming, difficult, and error-prone

[Figure: HDL → RT Synthesis → Netlist → Physical Design (Technology Mapping, Placement, Routing) → Bitfile, which configures the FPGA that sits next to a processor]

Future FPGA Tool Flow?

[Figure: the same tool flow with a High-level Synthesis step added in front: C/C++, Java, etc. → High-level Synthesis → HDL → RT Synthesis → Netlist → Physical Design (Technology Mapping, Placement, Routing) → Bitfile → Processor/FPGA]

High-level Synthesis

Wouldn’t it be nice to write high-level code?
+ Far more C developers than VHDL developers (10000:1?)
+ Easier to specify
+ Separates function from architecture
+ More portable
- Hardware potentially slower

This is similar to the assembly-code era: programmers could always beat the compiler, but that is no longer the case. Hopefully, high-level synthesis will catch up to manual effort.

High-level Synthesis

High-level synthesis is more challenging than compilation.
Compilation maps behavior into assembly instructions, and the architecture is known to the compiler.
High-level synthesis creates a custom architecture to execute the behavior:
Huge hardware exploration space
Best solution may include microprocessors
Should handle any high-level code, although not all code is appropriate for hardware

High-level Synthesis

First, consider how to manually convert high-level code into a circuit.
Steps:
1) Build an FSM for the controller
2) Build a datapath based on the FSM

acc = 0; for (i = 0; i < 128; i++) acc += a[i];

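Manual Example

Build an FSM (controller): decompose the code into states.

acc = 0; for (i = 0; i < 128; i++) acc += a[i];

[Figure: FSM with states "acc = 0, i = 0" → "if (i < 128)" → "load a[i]" → "acc += a[i]" → "i++" → back to "if (i < 128)"; when the test fails, the FSM goes to "Done"]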

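Manual Example

Build a datapath: allocate resources for each state.

[Figure: the FSM beside the datapath resources allocated for it: registers acc and i, a < comparator with the constant 128, adders (+) with the constant 1, and address logic (addr) for reading a[i]]

acc = 0; for (i = 0; i < 128; i++) acc += a[i];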

Manual Example

Build a datapath: determine register inputs.

[Figure: the datapath with 2x1 muxes in front of the registers, selecting between the constant 0 (initialization) and the adder outputs; the address logic uses &a, and a[i] comes in from memory]

acc = 0; for (i = 0; i < 128; i++) acc += a[i];

Manual Example

Build a datapath: add outputs.

[Figure: the same datapath with its outputs added: Memory address and acc]

acc = 0; for (i = 0; i < 128; i++) acc += a[i];

Manual Example

Build a datapath: add control signals.

[Figure: the same datapath (registers acc and i, comparator with 128, adders, 2x1 muxes, &a, memory interface, outputs Memory address and acc) with control signals added]

acc = 0; for (i = 0; i < 128; i++) acc += a[i];

Manual Example

Combine controller + datapath.

[Figure: the controller connected to the datapath; the circuit's outputs are Memory address, acc, Done, and Memory Read]

acc = 0; for (i = 0; i < 128; i++) acc += a[i];
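To make the controller + datapath concrete, here is a minimal C sketch (a software model for illustration, not synthesizable HDL) that steps the FSM from this example one state per clock cycle. The state names, the function name `simulate_acc`, and the generalization from 128 elements to `n` are assumptions of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the controller states of the manual example. */
typedef enum { S_INIT, S_CHECK, S_LOAD, S_ACC, S_INC, S_DONE } State;

/* Cycle-by-cycle software model of the FSM + datapath for
       acc = 0; for (i = 0; i < n; i++) acc += a[i];
   Each pass through the while loop is one clock cycle in one FSM state.
   Returns acc; *cycles receives the number of cycles before Done. */
int32_t simulate_acc(const int32_t *a, int n, int *cycles) {
    assert(n >= 0);
    State state = S_INIT;
    int32_t acc = 0, data = 0;   /* datapath registers */
    int i = 0, count = 0;

    while (state != S_DONE) {
        count++;
        switch (state) {
        case S_INIT:             /* acc = 0, i = 0 (muxes select constant 0) */
            acc = 0;
            i = 0;
            state = S_CHECK;
            break;
        case S_CHECK:            /* comparator drives the branch: i < n ? */
            state = (i < n) ? S_LOAD : S_DONE;
            break;
        case S_LOAD:             /* memory read of a[i] */
            data = a[i];
            state = S_ACC;
            break;
        case S_ACC:              /* adder: acc += a[i] */
            acc += data;
            state = S_INC;
            break;
        case S_INC:              /* incrementer: i++ */
            i++;
            state = S_CHECK;
            break;
        case S_DONE:             /* never reached inside the loop */
            break;
        }
    }
    *cycles = count;
    return acc;
}
```

In this model each element costs 4 cycles (check, load, accumulate, increment) plus 2 more overall (initialization and the final failed check), so n = 128 takes 514 cycles; a tool that merges states would produce a different count.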

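Manual Example

Alternative: use one adder (plus muxes).

[Figure: the same datapath built around a single adder, with a mux on each adder input selecting its operands; the < comparator with 128, the 2x1 register-input muxes, &a, the memory interface, and the outputs Memory address and acc remain]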

Manual Example

Comparison with high-level synthesis:
Determining when to perform each operation => Scheduling
Allocating resources for each operation => Resource allocation
Mapping operations onto resources => Binding

Another Example

Your turn:
x = 0; for (i = 0; i < 100; i++) { if (a[i] > 0) x++; else x--; a[i] = x; } // output x

Steps:
1) Build the FSM (do not perform if-conversion)
2) Build the datapath based on the FSM

High-Level Synthesis

[Figure: High-level Code → High-Level Synthesis → Custom Circuit]

The high-level code could be C, C++, Java, Perl, Python, SystemC, ImpulseC, etc.
The custom circuit is usually an RT VHDL description, but could be as low-level as a bitfile.

High-Level Synthesis

[Figure: the code acc = 0; for (i = 0; i < 128; i++) acc += a[i]; goes through High-Level Synthesis, which produces the controller + datapath from the manual example, with outputs Memory address, acc, Done, and Memory Read]

Main Steps

[Figure: High-level Code → Syntactic Analysis → Intermediate Representation → Optimization → Scheduling/Resource Allocation → Binding/Resource Sharing → Controller + Datapath, with the steps grouped into a front-end and a back-end]

Syntactic Analysis: converts code to an intermediate representation, which allows all following steps to use a language-independent format.
Scheduling/Resource Allocation: determines when each operation will execute, and the resources used.
Binding/Resource Sharing: maps operations onto physical resources.

Syntactic Analysis

Definition: analysis of code to verify syntactic correctness. Converts code into an intermediate representation.
2 steps:
1) Lexical analysis (lexing)
2) Parsing

[Figure: within Syntactic Analysis, High-level Code → Lexical Analysis → Parsing → Intermediate Representation]

Lexical Analysis

Lexical analysis (lexing) breaks code into a series of defined tokens. Token: a defined language construct.

x = 0; if (y < z) x = 1;
→ Lexical Analysis →
ID(x), ASSIGN, INT(0), SEMICOLON, IF, LPAREN, ID(y), LT, ID(z), RPAREN, ID(x), ASSIGN, INT(1), SEMICOLON
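A minimal hand-written lexer in C for the tokens used above (lex would generate the equivalent code from regular expressions). The `Token` struct, the `lex` function, the 31-character name limit, and the `NE`, `END`, and `ERROR` kinds are assumptions of this sketch; `!=` is included because the parsing examples use it.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Token kinds used on the slide, plus NE for "!=", END, and ERROR. */
typedef enum { ID, INT, ASSIGN, SEMICOLON, IF, LPAREN, RPAREN, LT, NE, END, ERROR } TokKind;

typedef struct {
    TokKind kind;
    char text[32];              /* identifier name or integer digits */
} Token;

/* Hypothetical lexer: writes at most max - 1 tokens plus a final END
   token into out, and returns the number of tokens before END. */
int lex(const char *src, Token *out, int max) {
    assert(max >= 1);
    const char *p = src;
    int n = 0;
    while (n < max - 1) {
        while (isspace((unsigned char)*p)) p++;          /* skip whitespace */
        if (*p == '\0') break;
        Token t = { ERROR, "" };
        int len = 0;
        if (isalpha((unsigned char)*p) || *p == '_') {   /* identifier or keyword */
            while (isalnum((unsigned char)*p) || *p == '_') {
                if (len < 31) t.text[len++] = *p;        /* truncate long names */
                p++;
            }
            t.text[len] = '\0';
            t.kind = (strcmp(t.text, "if") == 0) ? IF : ID;
        } else if (isdigit((unsigned char)*p)) {          /* integer literal */
            while (isdigit((unsigned char)*p)) {
                if (len < 31) t.text[len++] = *p;
                p++;
            }
            t.text[len] = '\0';
            t.kind = INT;
        } else if (p[0] == '!' && p[1] == '=') {          /* two-character operator */
            t.kind = NE;
            p += 2;
        } else {                                          /* one-character tokens */
            switch (*p) {
            case '=': t.kind = ASSIGN;    break;
            case ';': t.kind = SEMICOLON; break;
            case '(': t.kind = LPAREN;    break;
            case ')': t.kind = RPAREN;    break;
            case '<': t.kind = LT;        break;
            default:  t.kind = ERROR;     break;          /* not in this language */
            }
            p++;
        }
        out[n++] = t;
    }
    out[n].kind = END;
    out[n].text[0] = '\0';
    return n;
}
```

Calling `lex("x = 0;if (y < z) x = 1;", toks, 32)` produces the 14 tokens listed above, followed by `END`.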

Lexing Tools

Define tokens using regular expressions; the tool outputs C code that lexes the input. A common tool is “lex”.

    /* brackets, braces, and punctuation */
"[" { YYPRINT; return LBRACE; }
"]" { YYPRINT; return RBRACE; }
"," { YYPRINT; return COMMA; }
";" { YYPRINT; return SEMICOLON; }
"!" { YYPRINT; return EXCLAMATION; }
"{" { YYPRINT; return LBRACKET; }
"}" { YYPRINT; return RBRACKET; }
"-" { YYPRINT; return MINUS; }

    /* integers */
[0-9]+ { yylval.intVal = atoi( yytext ); return INT; }

Parsing

Analysis of the token sequence to determine the correct grammatical structure. Languages are defined by a context-free grammar.

Grammar:
Program = Exp
Exp = Stmt SEMICOLON | IF LPAREN Cond RPAREN Exp | Exp Exp
Cond = ID Comp ID
Stmt = ID ASSIGN INT
Comp = LT | NE

Correct programs:
x = 0; if (y < z) x = 1;
x = 0; x = 0; y = 1;
if (a < b) x = 10;
if (var1 != var2) x = 10;
x = 0; if (y < z) x = 1; y = 5; t = 1;

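Parsing

Grammar:
Program = Exp
Exp = Stmt SEMICOLON | IF LPAREN Cond RPAREN Exp | Exp Exp
Cond = ID Comp ID
Stmt = ID ASSIGN INT
Comp = LT | NE

Incorrect programs:
x = y;  (Stmt must assign an INT, not an ID)
x = 3 + 5;  (no rule allows +)
x = 5;;  (extra SEMICOLON)
if (x+5 > y) x = 2;  (Cond must be ID Comp ID, and > is not a Comp)
x = 5  (missing SEMICOLON)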
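A minimal recursive-descent recognizer for this grammar, written in C over an array of token kinds terminated by `END` (as a lexer would produce). The ambiguous rule Exp = Exp Exp is handled by accepting one or more items in a loop, which recognizes the same set of programs. The function names and the extra `PLUS`/`GT` token kinds (used only to spell out the incorrect programs) are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Token kinds as a lexer would produce them; PLUS and GT appear only in
   incorrect programs, since no grammar rule uses them. */
typedef enum { ID, INT, ASSIGN, SEMICOLON, IF, LPAREN, RPAREN, LT, NE, PLUS, GT, END } TokKind;

typedef struct {
    const TokKind *toks;   /* token sequence terminated by END */
    int pos;               /* index of the next unread token */
} Parser;

/* Consume the next token if it has kind k. */
static bool match(Parser *ps, TokKind k) {
    if (ps->toks[ps->pos] != k) return false;
    ps->pos++;
    return true;
}

/* Stmt = ID ASSIGN INT */
static bool parse_stmt(Parser *ps) {
    return match(ps, ID) && match(ps, ASSIGN) && match(ps, INT);
}

/* Cond = ID Comp ID, with Comp = LT | NE */
static bool parse_cond(Parser *ps) {
    return match(ps, ID) && (match(ps, LT) || match(ps, NE)) && match(ps, ID);
}

/* One Exp item: Stmt SEMICOLON | IF LPAREN Cond RPAREN item */
static bool parse_item(Parser *ps) {
    if (match(ps, IF))
        return match(ps, LPAREN) && parse_cond(ps) && match(ps, RPAREN) && parse_item(ps);
    return parse_stmt(ps) && match(ps, SEMICOLON);
}

/* Program = Exp; since Exp = Exp Exp, a program is one or more items. */
bool parse_program(const TokKind *toks) {
    assert(toks != NULL);
    Parser ps = { toks, 0 };
    do {
        if (!parse_item(&ps)) return false;
    } while (ps.toks[ps.pos] != END);
    return true;
}
```

With this sketch, the token sequences for the correct programs above are accepted, and those for the incorrect programs are rejected at the first token that no rule allows.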

Parsing Tools

Define the grammar in a special language; the tool automatically creates a parser based on the grammar. A popular tool is “yacc” (yet another compiler-compiler).

program: functions { $$ = $1; }
       ;
functions: function { $$ = $1; }
         | functions function { $$ = $1; }
         ;
function: HEXNUMBER LABEL COLON code { $$ = $2; }
        ;

Intermediate Representation

The parser converts tokens to an intermediate representation, usually an abstract syntax tree.

x = 0; if (y < z) x = 1; d = 6;

[Figure: abstract syntax tree for this code: Assign(x, 0); an if node with cond (y < z) and body Assign(x, 1); Assign(d, 6)]

Intermediate Representation

Why use an intermediate representation?
Easier to analyze/optimize than source code.
Theoretically can be used for all languages, which makes the synthesis back end language-independent.

[Figure: C code, Java, and Perl each pass through their own Syntactic Analysis into a common Intermediate Representation, which feeds a single Back End]

The back end (scheduling, resource allocation, binding) is independent of the source language; sometimes the optimizations are too.

Intermediate Representation

Different types: Abstract Syntax Tree, Control/Data Flow Graph (CDFG), Sequencing Graph, etc.
We will focus on the CDFG, which combines the control flow graph (CFG) and the data flow graph (DFG).

Control Flow Graphs

CFG: represents the control flow dependencies of basic blocks.
A basic block is a section of code that always executes from beginning to end, i.e., there are no jumps into or out of the block.

acc = 0; for (i = 0; i < 128; i++) acc += a[i];

[Figure: CFG with blocks "acc = 0, i = 0" → "if (i < 128)" → "acc += a[i]; i++" (looping back to the test), and "Done" when the test fails]

Control Flow Graphs

Your turn: create a CFG for this code.

i = 0; while (j < 10) { if (x < 5) y = 2; else if (z < 10) y = 6; }

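Data Flow Graphs

DFG: represents data dependencies between operations.

x = a + b; y = c * d; z = x - y;

[Figure: DFG in which a and b feed a + node producing x, c and d feed a * node producing y, and x and y feed a - node producing z]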

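Control/Data Flow Graph

Combines CFG and DFG: maintains a DFG for each node of the CFG.

acc = 0; for (i = 0; i < 128; i++) acc += a[i];

[Figure: the CFG ("acc = 0; i = 0;" → "if (i < 128)" → "acc += a[i]; i++" → back, with "Done" on exit), where each node holds a DFG: the first assigns the constant 0 to acc and to i; the loop body has a + node with inputs acc and a[i] producing acc, and a + node with inputs i and 1 producing i]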

High-Level Synthesis: Optimization

Synthesis Optimizations

After creating the CDFG, high-level synthesis optimizes the graph.
Goals: reduce area, improve latency, increase parallelism, reduce power/energy.
2 types: data flow optimizations and control flow optimizations.

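Data Flow Optimizations

Tree-height reduction: generally made possible by commutativity, associativity, and distributivity.

[Figure: two examples over inputs a, b, c, d. A chain of three additions (height 3) is rebalanced into a tree of height 2; an expression mixing + and * is restructured the same way to reduce its height]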

Data Flow Optimizations

Operator strength reduction: replacing an expensive (“strong”) operation with a faster one. Common example: replacing multiply/divide with shifts.

b[i] = a[i] * 8;  →  b[i] = a[i] << 3;
a = b * 5;  →  c = b << 2; a = b + c;  (1 multiplication → 0 multiplications)
a = b * 13;  →  c = b << 2; d = b << 3; a = c + d + b;
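The shift-and-add rewriting generalizes to any constant multiplier: add one shifted copy of the operand for each 1 bit of the constant. The C sketch below (function names are hypothetical) computes the product that way and counts the adders it needs; this matches the slide, since 13 (binary 1101) needs two adders, 5 (binary 101) needs one, and 8 needs none.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply x by a constant using only shifts and adds, e.g.
   x * 13 = (x << 3) + (x << 2) + x. One shifted copy of x is added per
   1 bit in the constant; unsigned arithmetic wraps modulo 2^32 exactly
   like x * constant. In hardware, constant shifts are just wiring. */
uint32_t mul_const_shift_add(uint32_t x, uint32_t constant) {
    uint32_t result = 0;
    for (unsigned bit = 0; bit < 32; bit++) {
        if (constant & (UINT32_C(1) << bit))
            result += x << bit;
    }
    return result;
}

/* Adders needed: (number of 1 bits) - 1, or 0 when at most one bit is set. */
int adders_needed(uint32_t constant) {
    int ones = 0;
    for (; constant != 0; constant &= constant - 1)   /* clear lowest set bit */
        ones++;
    return ones > 1 ? ones - 1 : 0;
}
```

A real tool may do better than one adder per extra 1 bit (for example by also using subtraction), so this count is an upper bound for the plain shift-and-add form only.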

Data Flow Optimizations

Constant propagation: statically evaluate expressions with constants.

x = 0; y = x * 15; z = y + 10;  →  x = 0; y = 0; z = 10;

Data Flow Optimizations

Function specialization: create specialized code for common inputs by treating the common inputs as constants. If the inputs are not known statically, each call to the specialized function must be guarded by an if statement.

int f(int x) { int y = x * 15; return y + 10; }
for (i = 0; i < 1000; i++) { f(0); … }

Treat the frequent input as a constant:

int f_opt() { return 10; }
for (i = 0; i < 1000; i++) { f_opt(); … }
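When the argument is only known at run time, the specialized version has to be selected by a check at the call site. A minimal C sketch, where the wrapper name `f_dispatch` is hypothetical:

```c
#include <assert.h>

/* General version from the slide. */
int f(int x) {
    int y = x * 15;
    return y + 10;
}

/* Specialized for the common input x == 0; constant propagation
   reduces the body to a constant. */
int f_opt(void) {
    return 10;
}

/* Hypothetical call-site guard used when x is not known statically:
   take the cheap specialized path only when the input matches. */
int f_dispatch(int x) {
    return (x == 0) ? f_opt() : f(x);
}
```

In hardware, the guard becomes a comparator and a mux, so specialization only pays off when the common case is frequent enough to justify that extra logic.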

Data Flow Optimizations

Common sub-expression elimination: if an expression appears more than once, the repetitions can be replaced with the previously computed result.

a = x + y; . . . b = c * 25 + x + y;  →  a = x + y; . . . b = c * 25 + a;

x + y has already been determined (provided x and y do not change in between).

Data Flow Optimizations

Dead code elimination: remove code that is never executed. Such code may seem pointless, but it often results from constant propagation or function specialization.

int f(int x) { if (x > 0) a = b * 15; else a = b / 4; return a; }
int f_opt() { a = b * 15; return a; }

The version specialized for x > 0 does not need the else branch: it is “dead code”.

Data Flow Optimizations

Code motion (hoisting/sinking): avoid repeated computation.

for (i = 0; i < 100; i++) { z = x + y; b[i] = a[i] + z; }
→ z = x + y; for (i = 0; i < 100; i++) { b[i] = a[i] + z; }

Control Flow Optimizations

Loop unrolling: replicate the body of the loop. May increase parallelism.

for (i = 0; i < 128; i++) a[i] = b[i] + c[i];
→ for (i = 0; i < 128; i += 2) { a[i] = b[i] + c[i]; a[i+1] = b[i+1] + c[i+1]; }
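The unrolled loop above relies on 128 being even. A minimal C sketch of unrolling by 2 for an arbitrary trip count adds a cleanup loop for the leftover iteration; the function names are hypothetical.

```c
#include <assert.h>

/* Original loop: one addition per iteration. */
void vadd(int *a, const int *b, const int *c, int n) {
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Unrolled by 2: two independent additions per iteration, which a
   synthesis tool could schedule in the same cycle on two adders.
   The cleanup loop handles odd n (n = 128 needs none). */
void vadd_unroll2(int *a, const int *b, const int *c, int n) {
    int i = 0;
    for (; i + 1 < n; i += 2) {
        a[i]     = b[i]     + c[i];
        a[i + 1] = b[i + 1] + c[i + 1];
    }
    for (; i < n; i++)          /* remainder iteration, if any */
        a[i] = b[i] + c[i];
}
```

Both functions compute the same result; the unrolled one halves the number of loop-control steps at the cost of a second adder if the two additions are to run in parallel.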

Control Flow Optimizations

Function inlining: replace a function call with the body of the function. Common for both SW and HW. SW: eliminates function call instructions. HW: eliminates unnecessary control states.

for (i = 0; i < 128; i++) a[i] = f(b[i], c[i]);
. . .
int f(int a, int b) { return a + b * 15; }
→ for (i = 0; i < 128; i++) a[i] = b[i] + c[i] * 15;

Control Flow Optimizations

Conditional expansion: replace an if with a logic expression, so the if/else bodies execute in parallel.

y = ab; if (a) x = b + d; else x = bd;
→ y = ab; x = a(b + d) + a’bd;
Can be further optimized to:
y = ab; x = y + d(a + b);
[DeMicheli]

Here ab denotes AND, + denotes OR, and a’ denotes NOT a.
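Because a, b, and d are single bits, the three forms can be checked exhaustively. The C sketch below (function names are hypothetical) maps AND to `&&`, OR to `||`, and NOT to `!`, and compares all eight input combinations.

```c
#include <assert.h>
#include <stdbool.h>

/* Original control flow. */
static bool x_original(bool a, bool b, bool d) {
    bool x;
    if (a) x = b || d;        /* x = b + d */
    else   x = b && d;        /* x = bd    */
    return x;
}

/* Conditional expansion: x = a(b + d) + a'bd. */
static bool x_expanded(bool a, bool b, bool d) {
    return (a && (b || d)) || (!a && b && d);
}

/* Further optimized form reusing y = ab: x = y + d(a + b). */
static bool x_optimized(bool a, bool b, bool d) {
    bool y = a && b;
    return y || (d && (a || b));
}

/* Compare all 8 input combinations; true if the three forms agree. */
bool check_conditional_expansion(void) {
    for (int v = 0; v < 8; v++) {
        bool a = v & 1, b = (v >> 1) & 1, d = (v >> 2) & 1;
        bool ref = x_original(a, b, d);
        if (x_expanded(a, b, d) != ref || x_optimized(a, b, d) != ref)
            return false;
    }
    return true;
}
```

The optimized form works because ab + ad + bd already covers the a’bd term: when a is 0 it reduces to bd, and when a is 1 it reduces to b + d.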

Example

Optimize this:
x = 0; y = a + b; if (x < 15) z = a + b - c; else z = x + 12; output = z * 12;

High-Level Synthesis: Scheduling/Resource Allocation

Scheduling

Scheduling assigns a start time to each operation in the DFG.
Start times must not violate dependencies in the DFG.
Start times must meet performance constraints (alternatively, resource constraints).
Scheduling is performed on the DFG of each CFG node => multiple CFG nodes cannot execute in parallel.

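Examples

[Figure: schedules for the tree-height example over inputs a, b, c, d. The chain of three additions needs Cycle1 to Cycle3; the rebalanced version performs two additions in Cycle1 and the final addition in Cycle2]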

Scheduling Problems

There are several types of scheduling problems, usually some combination of performance and resource constraints:
Unconstrained: not very useful, since any schedule that respects the dependencies is acceptable.
Minimum latency
Latency constrained
Minimum-latency, resource constrained: find the schedule with the shortest latency that uses no more than a specified number of resources. NP-complete.
Minimum-resource, latency constrained: find the schedule that meets the latency constraint (which may be anything) and uses the minimum number of resources. NP-complete.

Minimum Latency Scheduling

ASAP (as soon as possible) algorithm:
Find a candidate node. A candidate is a node whose predecessors have all been scheduled and completed (or that has no predecessors).
Schedule the node one cycle later than the maximum cycle of its predecessors.
Repeat until all nodes are scheduled.

[Figure: ASAP schedule of a DFG with inputs a through h and operations +, +, *, *, -, <, + spread over Cycle1 to Cycle4]

Minimum possible latency: 4 cycles
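A minimal C sketch of ASAP for single-cycle operations. It assumes the DFG nodes are numbered in topological order (every predecessor has a smaller index), so visiting nodes by index always reaches a candidate whose predecessors are already scheduled; the `Dfg` struct and the `MAX_NODES` limit are assumptions of the sketch.

```c
#include <assert.h>

#define MAX_NODES 16

/* Hypothetical DFG layout: node i depends on pred[i][0..npred[i]-1],
   and nodes are numbered in topological order. */
typedef struct {
    int n;
    int npred[MAX_NODES];
    int pred[MAX_NODES][MAX_NODES];
} Dfg;

/* ASAP: a node without predecessors starts in cycle 1; otherwise it starts
   one cycle after the latest of its predecessors. Returns the minimum
   latency, i.e., the largest start cycle. */
int asap(const Dfg *g, int cycle[]) {
    int latency = 0;
    assert(g->n <= MAX_NODES);
    for (int i = 0; i < g->n; i++) {
        int start = 1;
        for (int k = 0; k < g->npred[i]; k++) {
            int p = g->pred[i][k];
            assert(p < i);                       /* topological numbering */
            if (cycle[p] + 1 > start) start = cycle[p] + 1;
        }
        cycle[i] = start;
        if (start > latency) latency = start;
    }
    return latency;
}
```

For x = a + b; y = c * d; z = x - y, the + and * nodes land in cycle 1 and the - node in cycle 2. For the tree-height example, the chain ((a + b) + c) + d needs 3 cycles, while the rebalanced (a + b) + (c + d) needs 2.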

Minimum Latency Scheduling

ALAP (as late as possible) algorithm:
Run ASAP to get the minimum latency L.
Find a candidate. A candidate is a node whose successors are all scheduled (or that has none).
Nodes with no successors are scheduled to cycle L; other candidates are scheduled one cycle before the minimum cycle of their successors.
Repeat until all nodes are scheduled.

[Figure: ALAP in progress on the same DFG (inputs a through h), L = 4 cycles; the nodes in Cycle4 and Cycle3 have been placed]

[Figure: completed ALAP schedule of the same DFG over Cycle1 to Cycle4, L = 4 cycles]

Minimum Latency Scheduling: ALAP

ALAP has to run ASAP first, which seems pointless. But many heuristics need the mobility/slack of each operation:
ASAP gives the earliest possible time for an operation.
ALAP gives the latest possible time for an operation.
Slack = difference between the latest and earliest possible start times.
Slack = 0 implies the operation has to be done in its scheduled cycle.
The larger the slack, the more options a heuristic has to schedule the operation.
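A minimal C sketch of ALAP and slack, assuming single-cycle operations and a hypothetical `Dfg` layout with nodes numbered in topological order. ALAP visits nodes in reverse order, so every successor of a node is placed before the node itself.

```c
#include <assert.h>

#define MAX_NODES 16

/* Hypothetical DFG layout: node i depends on pred[i][0..npred[i]-1],
   and nodes are numbered in topological order. */
typedef struct {
    int n;
    int npred[MAX_NODES];
    int pred[MAX_NODES][MAX_NODES];
} Dfg;

/* Earliest start cycles (ASAP); returns the minimum latency. */
static int asap(const Dfg *g, int t[]) {
    int latency = 0;
    for (int i = 0; i < g->n; i++) {
        t[i] = 1;
        for (int k = 0; k < g->npred[i]; k++)
            if (t[g->pred[i][k]] + 1 > t[i]) t[i] = t[g->pred[i][k]] + 1;
        if (t[i] > latency) latency = t[i];
    }
    return latency;
}

/* Latest start cycles (ALAP) for latency L: nodes without successors stay
   in cycle L; a predecessor goes one cycle before its earliest successor. */
static void alap(const Dfg *g, int L, int t[]) {
    for (int i = 0; i < g->n; i++) t[i] = L;
    for (int i = g->n - 1; i >= 0; i--)      /* reverse topological order */
        for (int k = 0; k < g->npred[i]; k++) {
            int p = g->pred[i][k];
            if (t[i] - 1 < t[p]) t[p] = t[i] - 1;
        }
}

/* Slack (mobility) of each node = ALAP cycle - ASAP cycle. Returns L. */
int compute_slack(const Dfg *g, int slack[]) {
    int early[MAX_NODES], late[MAX_NODES];
    assert(g->n <= MAX_NODES);
    int L = asap(g, early);
    alap(g, L, late);
    for (int i = 0; i < g->n; i++) slack[i] = late[i] - early[i];
    return L;
}
```

For a hypothetical graph where node 2 depends on node 1 and node 3 combines nodes 0 and 2, the minimum latency is 3 and only node 0 has slack (1 cycle): it can run in cycle 1 or cycle 2.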

Latency-Constrained Scheduling

Instead of finding the minimum latency, find a schedule with latency no greater than L. Solutions:
Use ASAP and verify that the minimum latency is no greater than L.
Use ALAP starting with cycle L instead of the minimum latency (ASAP is not needed).

Scheduling with Resource Constraints

The schedule must use no more than the specified number of resources.

[Figure: schedule of a DFG with inputs a through g (+, +, *, *, +, - operations) that occupies Cycle1 to Cycle5]

Constraints: 1 ALU (+/-), 1 Multiplier

Scheduling with Resource Constraints

With a second ALU, the same DFG fits in fewer cycles.

[Figure: schedule of the same DFG that occupies Cycle1 to Cycle4]

Constraints: 2 ALU (+/-), 1 Multiplier

Minimum-Latency, Resource-Constrained Scheduling

Definition: given resource constraints, find the schedule that has the minimum latency. Example:

[Figure: a valid schedule of a DFG with inputs a through g that takes 6 cycles]

Constraints: 1 ALU (+/-), 1 Multiplier

Minimum-Latency, Resource-Constrained Scheduling

Example (continued):

[Figure: a different schedule of the same DFG, under the same constraints, that takes 5 cycles]

Constraints: 1 ALU (+/-), 1 Multiplier
Different schedules may use the same resources but have different latencies.

Minimum-Latency, Resource-Constrained Scheduling

Hu’s Algorithm: assumes one type of resource.
Basic idea. Input: graph, number of resources r.
1) Label each node by its max distance from the output, i.e., use path length as priority.
2) Determine C, the set of scheduling candidates. A node is a candidate if it has no predecessors, or if all its predecessors were scheduled in earlier cycles.
3) From C, schedule up to r nodes to the current cycle, using the label as priority.
4) Increment the current cycle; repeat from 2) until all nodes are scheduled.
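A minimal C sketch of Hu’s algorithm as described above, assuming single-cycle operations, one resource type with `r` units, and a hypothetical `Dfg` layout with nodes numbered in topological order. The label counts the nodes on the longest path to an output (an output node has label 1); breaking ties by lowest node index is an arbitrary choice of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 16

/* Hypothetical DFG layout: node i depends on pred[i][0..npred[i]-1],
   and nodes are numbered in topological order. */
typedef struct {
    int n;
    int npred[MAX_NODES];
    int pred[MAX_NODES][MAX_NODES];
} Dfg;

/* Hu's algorithm: one resource type with r units, single-cycle operations.
   cycle[i] receives the start cycle of node i; returns the latency. */
int hu_schedule(const Dfg *g, int r, int cycle[]) {
    int label[MAX_NODES];
    assert(r >= 1 && g->n <= MAX_NODES);

    /* 1) Label = number of nodes on the longest path to an output. */
    for (int i = 0; i < g->n; i++) { label[i] = 1; cycle[i] = 0; }
    for (int i = g->n - 1; i >= 0; i--)          /* successors first */
        for (int k = 0; k < g->npred[i]; k++) {
            int p = g->pred[i][k];
            if (label[i] + 1 > label[p]) label[p] = label[i] + 1;
        }

    int done = 0, cur = 0;
    while (done < g->n) {
        cur++;                                   /* 4) next cycle */
        for (int used = 0; used < r; used++) {
            /* 2) Candidates: unscheduled nodes whose predecessors were all
               scheduled in earlier cycles; 3) pick the highest label. */
            int best = -1;
            for (int i = 0; i < g->n; i++) {
                if (cycle[i] != 0) continue;
                bool ready = true;
                for (int k = 0; k < g->npred[i]; k++) {
                    int c = cycle[g->pred[i][k]];
                    if (c == 0 || c >= cur) { ready = false; break; }
                }
                if (ready && (best < 0 || label[i] > label[best])) best = i;
            }
            if (best < 0) break;                 /* fewer than r candidates */
            cycle[best] = cur;
            done++;
        }
    }
    return cur;
}
```

On a hypothetical six-node addition tree (three independent additions, then two, then one), r = 3 gives a 3-cycle schedule, r = 2 gives 4 cycles, and r = 1 gives 6.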

Minimum-Latency, Resource-Constrained Scheduling

Hu’s Algorithm Example

[Figure: DFG with inputs a through g, j, k and a mix of +, -, and * nodes]

r = 3

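Minimum-Latency, Resource-Constrained Scheduling

Hu’s Algorithm Step 1: label each node by its max distance from the output, i.e., use path length as priority.

[Figure: the example DFG with each node labeled by its distance to the output, from 4 for the topmost nodes down to 1 for the output nodes]

r = 3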

Minimum-Latency, Resource-Constrained Scheduling

Hu’s Algorithm Step 2: determine C, the set of scheduling candidates.

[Figure: the labeled DFG with the candidate set C for Cycle 1 highlighted]

r = 3, Cycle 1

Minimum-Latency, Resource-Constrained Scheduling

Hu’s Algorithm Step 3: from C, schedule up to r nodes to the current cycle, using the label as priority.

[Figure: up to r candidates placed in Cycle1; a remaining candidate is not scheduled due to its lower priority]

r = 3, Cycle 1

Minimum-Latency, Resource-Constrained Scheduling

Hu’s Algorithm Step 2 (Cycle 2): determine the new candidate set C.

[Figure: the labeled DFG with Cycle1 scheduled and the candidate set C for Cycle 2 highlighted]

r = 3, Cycle 2

Minimum-Latency, Resource-Constrained Scheduling

Hu’s Algorithm Step 3 (Cycle 2): schedule up to r candidates by priority.

[Figure: the labeled DFG with nodes placed in Cycle1 and Cycle2]

r = 3, Cycle 2

Minimum-Latency, Resource-Constrained Scheduling

Hu’s Algorithm: skipping to the finish.

[Figure: the final schedule of the example DFG, spanning Cycle1 to Cycle4]

r = 3

Minimum-Latency, Resource-Constrained Scheduling

Hu’s is a simplified problem. Common extensions:
Multiple resource types
Multi-cycle operations

[Figure: inputs a through d feeding +, -, /, and * operations, with a multicycle operation spanning Cycle1 and Cycle2]

Minimum-Latency, Resource-Constrained Scheduling

List scheduling (minimum-latency, resource-constrained version): extension for multiple resource types.
Basic idea: Hu’s algorithm for each resource type.
Input: graph, set of constraints R for each resource type.
1) Label nodes based on max distance to output.
2) For each resource type t:
   3) Determine candidate nodes, C (those with no predecessors or with scheduled predecessors).
   4) Schedule up to Rt operations from C, based on priority, to the current cycle. Rt is the constraint on resource type t.
5) Increment the cycle; repeat from 2) until all nodes are scheduled.

Minimum-Latency, Resource-Constrained Scheduling

List scheduling (minimum latency), step 1: label nodes based on max distance to output (labels not shown, so the operations stay visible). Nodes are given IDs for illustration purposes.

[Figure: DFG with inputs a through g, j, k and eight operations: 1 (*), 2 (+), 3 (+), 4 (-), 5 (*), 6 (*), 7 (+), 8 (-)]

Constraints: 2 ALUs (+/-), 2 Multipliers

Minimum-Latency, Resource-Constrained Scheduling

List scheduling (minimum latency). For each resource type t:
3) Determine candidate nodes, C (those with no predecessors or with scheduled predecessors).
4) Schedule up to Rt operations from C, based on priority, to the current cycle. Rt is the constraint on resource type t.

[Figure: the DFG with nodes 1 to 8]

Candidates:
Cycle 1: Mult {1}, ALU {2, 3, 4}

Constraints: 2 ALUs (+/-), 2 Multipliers

Minimum-Latency, Resource-Constrained Scheduling

List scheduling (minimum latency), Cycle 1: node 1 takes a multiplier and two of the ALU candidates take the two ALUs. Node 4 is a candidate but is not scheduled, due to its low priority.

Candidates:
Cycle 1: Mult {1}, ALU {2, 3, 4}

Constraints: 2 ALUs (+/-), 2 Multipliers

Minimum-Latency, Resource-Constrained Scheduling

List scheduling (minimum latency), Cycle 2: update the candidates.

[Figure: nodes 1, 2, 3 placed in Cycle1]

Candidates:
Cycle 1: Mult {1}, ALU {2, 3, 4}
Cycle 2: Mult {5, 6}, ALU {4}

Constraints: 2 ALUs (+/-), 2 Multipliers

Minimum-Latency, Resource-Constrained Scheduling

List scheduling (minimum latency), Cycle 2: schedule the candidates.

[Figure: nodes 1, 2, 3 in Cycle1 and nodes 4, 5, 6 in Cycle2]

Candidates:
Cycle 1: Mult {1}, ALU {2, 3, 4}
Cycle 2: Mult {5, 6}, ALU {4}

Constraints: 2 ALUs (+/-), 2 Multipliers

Minimum-Latency, Resource-Constrained Scheduling

List scheduling (minimum latency), Cycle 3: update the candidates.

[Figure: nodes 1, 2, 3 in Cycle1 and nodes 4, 5, 6 in Cycle2]

Candidates:
Cycle 1: Mult {1}, ALU {2, 3, 4}
Cycle 2: Mult {5, 6}, ALU {4}
Cycle 3: ALU {7}

Constraints: 2 ALUs (+/-), 2 Multipliers

Minimum-Latency, Resource-Constrained Scheduling

List scheduling (minimum latency): final schedule.

[Figure: Cycle1: nodes 1, 2, 3; Cycle2: nodes 4, 5, 6; Cycle3: node 7; Cycle4: node 8]

Constraints: 2 ALUs (+/-), 2 Multipliers

Note: ASAP would require more resources. ALAP would not for this example, but in general it could.

Minimum-Latency, Resource-Constrained Scheduling

Extension for multicycle operations. Same idea; the differences are in steps 1, 3, and 4.
Input: graph, set of constraints R for each resource type.
1) Label nodes based on max cycle latency to output.
2) For each resource type t:
   3) Determine candidate nodes, C (those with no predecessors or with scheduled and completed predecessors).
   4) Schedule up to (Rt - nt) operations from C, based on priority, to the current cycle (at the earliest, one cycle after the predecessors complete). Rt is the constraint on resource type t; nt is the number of units of type t still in use by operations started in previous cycles.
5) Increment the cycle; repeat from 2) until all nodes are scheduled.
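A minimal C sketch of the list-scheduling loop that covers both the multiple-resource-type version and this multicycle extension: each node has a resource type and a delay, priorities are longest paths measured in cycles, candidates need completed predecessors, and units still busy from earlier cycles (nt) count against Rt. The struct layout, `MAX_TYPES`, the type numbering, and tie-breaking by node index are assumptions of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 16
#define MAX_TYPES 2   /* hypothetical: type 0 = ALU (+/-), type 1 = multiplier */

typedef struct {
    int n;
    int type[MAX_NODES];            /* resource type of each operation */
    int delay[MAX_NODES];           /* cycles each operation takes (>= 1) */
    int npred[MAX_NODES];
    int pred[MAX_NODES][MAX_NODES]; /* predecessors; topological numbering */
} Dfg;

/* limit[t] = Rt. start[i] receives the start cycle of node i;
   returns the overall latency. */
int list_schedule(const Dfg *g, const int limit[MAX_TYPES], int start[]) {
    int label[MAX_NODES];
    assert(g->n <= MAX_NODES);

    /* 1) Priority = longest path to an output, measured in cycles. */
    for (int i = 0; i < g->n; i++) {
        assert(g->type[i] >= 0 && g->type[i] < MAX_TYPES);
        assert(g->delay[i] >= 1 && limit[g->type[i]] >= 1);
        label[i] = g->delay[i];
        start[i] = 0;                              /* 0 = not scheduled yet */
    }
    for (int i = g->n - 1; i >= 0; i--)
        for (int k = 0; k < g->npred[i]; k++) {
            int p = g->pred[i][k];
            if (g->delay[p] + label[i] > label[p]) label[p] = g->delay[p] + label[i];
        }

    int done = 0, cur = 0, latency = 0;
    while (done < g->n) {
        cur++;
        for (int t = 0; t < MAX_TYPES; t++) {      /* 2) each resource type */
            int busy = 0;                          /* nt: units still in use */
            for (int i = 0; i < g->n; i++)
                if (start[i] && g->type[i] == t && start[i] + g->delay[i] > cur) busy++;

            while (busy < limit[t]) {              /* 4) up to Rt - nt ops */
                int best = -1;
                for (int i = 0; i < g->n; i++) {   /* 3) candidates */
                    if (start[i] || g->type[i] != t) continue;
                    bool ready = true;             /* preds scheduled and completed */
                    for (int k = 0; k < g->npred[i]; k++) {
                        int p = g->pred[i][k];
                        if (!start[p] || start[p] + g->delay[p] > cur) { ready = false; break; }
                    }
                    if (ready && (best < 0 || label[i] > label[best])) best = i;
                }
                if (best < 0) break;
                start[best] = cur;
                busy++;
                done++;
                if (cur + g->delay[best] - 1 > latency) latency = cur + g->delay[best] - 1;
            }
        }
    }
    return latency;
}
```

On a hypothetical eight-node graph with 2 ALUs and 2 multipliers, unit delays give a 4-cycle schedule; making the multiplications take 2 cycles stretches the same graph to 6 cycles.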

Minimum-Latency, Resource-Constrained Scheduling

Example:

[Figure: multicycle schedule of an eight-node DFG (inputs a through g, j, k; nodes 1 to 8 using +, -, and * operations) spanning Cycle1 to Cycle7]

Constraints: 2 ALUs (+/-), 2 Multipliers

List Scheduling (Min Latency)

Your turn (2 ALUs, 1 Mult).
Steps (will be on test):
1) Label nodes with priority
2) Update the candidate list for each cycle
3) Redraw the graph to show the schedule

[Figure: exercise DFG with eleven nodes (IDs 1 to 11) using +, -, and * operations]

List Scheduling (Min Latency)

Your turn (2 ALUs, 1 Mult, Mults take 2 cycles)

[Figure: exercise DFG with inputs a through g and numbered + and * nodes]

Minimum-Resource, Latency-Constrained

Note that if no resource constraints are given, the schedule determines the number of required resources: the max number of each resource type used in a single cycle.

[Figure: a 4-cycle schedule of a DFG with inputs a through g whose busiest cycles require 3 ALUs and 2 Mults]

Minimum-Resource, Latency-Constrained

Minimum-resource, latency-constrained scheduling: of all schedules whose latency meets the constraint, find the one that uses the fewest resources.

[Figure: two schedules of the same DFG, both meeting the latency constraint <= 4: the first uses 3 ALUs and 2 Mults, the second only 2 ALUs and 1 Mult]

Minimum-Resource, Latency-Constrained

List scheduling (minimum-resource version). Basic idea:
1) Compute the latest start time of each op using ALAP with the specified latency constraint. Latest start times must account for multicycle operations.
2) For each resource type:
   3) Determine candidate nodes.
   4) Compute the slack of each candidate: slack = latest possible cycle - current cycle.
   5) Schedule ops with 0 slack, updating the required number of resources (assume 1 of each to start with).
   6) Schedule ops that require no extra resources.
7) Increment the cycle; repeat from 2) until all nodes are scheduled.
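A minimal C sketch of the minimum-resource version, assuming single-cycle operations, two resource types (ALU and multiplier), and a hypothetical `Dfg` layout with nodes numbered in topological order. Zero-slack operations are always scheduled (adding a unit if needed); the remaining candidates only fill units that already exist.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 16
#define MAX_TYPES 2   /* hypothetical: type 0 = ALU (+/-), type 1 = multiplier */

typedef struct {
    int n;
    int type[MAX_NODES];
    int npred[MAX_NODES];
    int pred[MAX_NODES][MAX_NODES];   /* topological numbering */
} Dfg;

/* Fills start[] (start cycle per node) and units[] (units needed per type).
   Returns false if no schedule can meet latency L. */
bool min_resource_schedule(const Dfg *g, int L, int start[], int units[MAX_TYPES]) {
    int lpc[MAX_NODES];                          /* latest possible cycle */
    assert(g->n <= MAX_NODES);

    /* 1) ALAP with the latency constraint L gives each node's LPC. */
    for (int i = 0; i < g->n; i++) lpc[i] = L;
    for (int i = g->n - 1; i >= 0; i--)
        for (int k = 0; k < g->npred[i]; k++) {
            int p = g->pred[i][k];
            if (lpc[i] - 1 < lpc[p]) lpc[p] = lpc[i] - 1;
        }
    for (int i = 0; i < g->n; i++) {
        if (lpc[i] < 1) return false;            /* constraint too tight */
        start[i] = 0;                            /* 0 = not scheduled yet */
    }
    for (int t = 0; t < MAX_TYPES; t++) units[t] = 1;   /* assume 1 of each */

    for (int cur = 1; cur <= L; cur++) {
        for (int t = 0; t < MAX_TYPES; t++) {    /* 2) each resource type */
            int used = 0;
            /* pass 0: 5) zero-slack ops; pass 1: 6) ops needing no new unit */
            for (int pass = 0; pass < 2; pass++) {
                for (int i = 0; i < g->n; i++) {
                    if (start[i] || g->type[i] != t) continue;
                    bool ready = true;           /* 3) candidate test */
                    for (int k = 0; k < g->npred[i]; k++) {
                        int s = start[g->pred[i][k]];
                        if (s == 0 || s >= cur) { ready = false; break; }
                    }
                    if (!ready) continue;
                    int slack = lpc[i] - cur;    /* 4) latest possible - current */
                    if (pass == 0 && slack == 0) {
                        start[i] = cur;
                        if (++used > units[t]) units[t] = used;   /* add a unit */
                    } else if (pass == 1 && used < units[t]) {
                        start[i] = cur;
                        used++;
                    }
                }
            }
        }
    }
    return true;
}
```

Using the node types and latest possible cycles of the worked example that follows (numbered from 0 here, with assumed edges: node 5 uses nodes 1 and 2, node 6 uses node 3, node 7 uses nodes 5 and 6, node 4 is independent), a latency constraint of 3 yields 2 ALUs and 2 multipliers and the same cycle assignment as the example, while a constraint of 2 is reported as infeasible.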

Minimum-Resource, Latency-Constrained

1) Find the ALAP schedule (latency constraint = 3 cycles). It defines the last possible cycle (LPC) of each operation.

[Figure: DFG with inputs a through g, j, k and seven operations, 1 (*), 2 (+), 3 (+), 4 (-), 5 (*), 6 (*), 7 (-), shown unscheduled and after ALAP scheduling into Cycle1 to Cycle3]

Node  LPC
1     1
2     1
3     1
4     3
5     2
6     2
7     3

Minimum-Resource, Latency-Constrained

Cycle 1:
2) For each resource type: 3) determine the candidate nodes C; 4) compute the slack of each candidate, where slack = latest possible cycle - current cycle.

Candidates = {1, 2, 3, 4}
Node  LPC  Slack
1     1    0
2     1    0
3     1    0
4     3    2

Initial resources = 1 Mult, 1 ALU

Minimum-Resource, Latency-Constrained

Cycle 1:
5) Schedule ops with 0 slack, updating the required number of resources.
6) Schedule ops that require no extra resources.

Candidates = {1, 2, 3, 4}
Node  LPC  Slack  Cycle
1     1    0      1
2     1    0      1
3     1    0      1
4     3    2      X

Resources = 1 Mult, 2 ALU
Node 4 would require one more ALU, so it is not scheduled.

Minimum-Resource, Latency-Constrained

Cycle 2: determine the candidates and compute their slack.

Candidates = {4, 5, 6}
Node  LPC  Slack  Cycle
1     1           1
2     1           1
3     1           1
4     3    1
5     2    0
6     2    0
7     3

Resources = 1 Mult, 2 ALU

Minimum-Resource, Latency-Constrained

5)Schedule ops with 0 slack Update required number of resources

6) Schedule ops that require no extra resources

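Cycle 2: Candidates = {4, 5, 6}

Node  LPC  Slack  Cycle
1     1    -      1
2     1    -      1
3     1    -      1
4     3    1      2
5     2    0      2
6     2    0      2
7     3    -      -

Resources = 2 Mult, 2 ALU

An ALU is already available in cycle 2, so operation 4 can be scheduled without adding resources

[Figure: the same DFG, with operations 4, 5 and 6 scheduled in cycle 2]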

Page 87

Minimum-Resource, Latency-Constrained

2) For each resource type

3) Determine candidate nodes C

4) Compute slack for each candidate

Slack = latest possible cycle - current cycle

Cycle 3: Candidates = {7}

Node  LPC  Slack  Cycle
1     1    -      1
2     1    -      1
3     1    -      1
4     3    -      2
5     2    -      2
6     2    -      2
7     3    0      -

Resources = 2 Mult, 2 ALU

[Figure: the same DFG, with the cycle-3 candidate highlighted]

Page 88

Minimum-Resource, Latency-Constrained

Final Schedule

Node  LPC  Cycle
1     1    1
2     1    1
3     1    1
4     3    2
5     2    2
6     2    2
7     3    3

[Figure: final schedule of the DFG: operations 1, 2, 3 in Cycle1; 4, 5, 6 in Cycle2; 7 in Cycle3]

Required Resources = 2 Mult, 2 ALU
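Steps 2) through 7) can be combined into a short list scheduler that reproduces this walkthrough. The sketch makes three assumptions: operations take one cycle, the LPC values come from step 1, and `PREDS` is a hypothetical predecessor set. The slide's exact edges cannot be recovered, but this set is consistent with the candidate sets shown in each cycle.

```python
# Resource type of each operation, as implied by the walkthrough.
OP_TYPE = {1: "mult", 2: "alu", 3: "alu", 4: "alu", 5: "mult", 6: "mult", 7: "alu"}
# Hypothetical predecessor lists, consistent with the candidate sets above.
PREDS = {1: [], 2: [], 3: [], 4: [], 5: [1, 2], 6: [3], 7: [5, 6]}
LPC = {1: 1, 2: 1, 3: 1, 4: 3, 5: 2, 6: 2, 7: 3}


def min_resource_schedule(op_type, preds, lpc):
    """Minimum-resource, latency-constrained list scheduling."""
    resources = {t: 1 for t in set(op_type.values())}  # start with 1 of each
    cycle_of = {}
    cycle = 0
    while len(cycle_of) < len(op_type):
        cycle += 1
        for rtype in sorted(resources):
            # Candidates: unscheduled ops of this type whose predecessors all
            # finished in an earlier cycle.
            cands = [n for n, t in op_type.items()
                     if t == rtype and n not in cycle_of
                     and all(cycle_of.get(p, cycle) < cycle for p in preds[n])]
            cands.sort(key=lambda n: lpc[n] - cycle)  # least slack first
            urgent = [n for n in cands if lpc[n] - cycle <= 0]
            # Zero-slack ops must run now, adding resources if needed.
            resources[rtype] = max(resources[rtype], len(urgent))
            # Any remaining free units take other candidates at no extra cost.
            for n in cands[:resources[rtype]]:
                cycle_of[n] = cycle
    return cycle_of, resources


schedule, resources = min_resource_schedule(OP_TYPE, PREDS, LPC)
print(dict(sorted(schedule.items())))  # {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3}
print(resources["mult"], resources["alu"])  # 2 2
```

In cycle 1, operation 4 has slack 2. The scheduler therefore defers it instead of adding a third ALU, which matches the decision shown on the slides.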

Page 89

Other extensions

Chaining: multiple operations in a single cycle

Pipelining: the input is a DFG and a data delivery rate. A fully pipelined circuit must have one resource per operation (remember systolic arrays)

[Figure: chaining example: DFG with inputs a-d and outputs e, f, containing adds (+), a subtract (-) and a divide (/)]

Multiple adds may be faster than one divide, so perform the adds in one cycle
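Chaining can be illustrated by packing a path of dependent operations into clock cycles. The delays in `DELAY` and the `chain_into_cycles` helper are hypothetical. They only demonstrate the slide's point: when the clock period has to accommodate a divide, several dependent adds fit into one cycle.

```python
# Hypothetical operation delays in ns; real values depend on the target
# device and bit widths.
DELAY = {"+": 2.0, "-": 2.0, "/": 9.0}


def chain_into_cycles(path, clock_period):
    """Greedily pack a path of dependent ops into cycles, chaining
    consecutive ops while their combined delay fits in one clock period."""
    cycles, current, used = [], [], 0.0
    for op in path:
        d = DELAY[op]
        if d > clock_period:
            raise ValueError(f"{op} does not fit in one cycle")
        if used + d > clock_period:  # no room left: start a new cycle
            cycles.append(current)
            current, used = [], 0.0
        current.append(op)
        used += d
    if current:
        cycles.append(current)
    return cycles


# The divide sets the clock period, so the three adds chain into one cycle.
print(chain_into_cycles(["+", "+", "+", "/"], clock_period=DELAY["/"]))
# [['+', '+', '+'], ['/']]
```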

Page 90

Summary

Scheduling assigns each operation in a DFG a start time; this is done for each DFG in the CDFG

Different types:

Minimum latency: ASAP, ALAP

Latency-constrained: ASAP, ALAP

Minimum-latency, resource-constrained: Hu’s Algorithm, List Scheduling

Minimum-resource, latency-constrained: List Scheduling

Page 91

High-level Synthesis: Binding/Resource Sharing

Page 92

Binding

During scheduling, we determined:

When ops will execute

How many resources are needed

We still need to decide which ops execute on which resources => Binding

If multiple ops use the same resource => Resource Sharing

Page 93

Binding

Basic idea: map operations onto resources such that operations in the same cycle don’t use the same resource

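[Figure: scheduled DFG with eight operations over Cycle1-Cycle4; operations 1, 5, 6 are multiplies and operations 2, 3, 4, 7, 8 are ALU operations (+/-)]

Available resources: 2 ALUs (+/-), 2 Multipliers (Mult1, ALU1, ALU2, Mult2)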

Page 94

Binding

Many possibilities: a bad binding may increase resources, require huge steering logic, reduce the clock frequency, etc.

[Figure: one possible binding of the scheduled DFG onto Mult1, ALU1, ALU2 and Mult2 (2 ALUs (+/-), 2 Multipliers)]

Page 95

Binding

Can’t do this: one resource can’t perform multiple ops simultaneously!

[Figure: an invalid binding that assigns one resource two operations from the same cycle (2 ALUs (+/-), 2 Multipliers)]
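The rule on this slide, that no resource may execute two operations in the same cycle, is easy to check mechanically. The per-operation cycles in `CYCLE` are an assumption. Only the following facts come from the slides: 2 and 3 share a cycle, 5 and 6 share a cycle, and the operations in each of the groups {2, 4, 7, 8} and {1, 5} all occupy different cycles. The resource names follow the slides' ALU1/MULT1 labels.

```python
from collections import defaultdict

# Assumed cycle of each operation (only partly recoverable from the slides).
CYCLE = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 4}
OP_TYPE = {1: "mult", 2: "alu", 3: "alu", 4: "alu",
           5: "mult", 6: "mult", 7: "alu", 8: "alu"}


def is_valid_binding(binding, cycle, op_type):
    """binding maps a resource name to (resource type, list of ops).
    Valid if every op is bound exactly once, to a resource of its type,
    and no resource runs two ops in the same cycle."""
    busy = defaultdict(set)  # resource name -> cycles in which it is used
    bound = []
    for res, (rtype, ops) in binding.items():
        for op in ops:
            if op_type[op] != rtype or cycle[op] in busy[res]:
                return False
            busy[res].add(cycle[op])
            bound.append(op)
    return sorted(bound) == sorted(op_type)


GOOD = {"ALU1": ("alu", [2, 8, 7, 4]), "ALU2": ("alu", [3]),
        "MULT1": ("mult", [1, 5]), "MULT2": ("mult", [6])}
# Invalid: ALU1 would have to execute 2 and 3 in the same cycle.
BAD = {"ALU1": ("alu", [2, 3, 4, 7, 8]),
       "MULT1": ("mult", [1, 5]), "MULT2": ("mult", [6])}

print(is_valid_binding(GOOD, CYCLE, OP_TYPE), is_valid_binding(BAD, CYCLE, OP_TYPE))
# True False
```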

Page 96

Binding: How to automate?

More graph theory: Compatibility Graph

Each node is an operation; edges represent compatible operations

Compatible: two ops can share a resource, i.e., ops that use the same type of resource (ALU, etc.) and are scheduled in different cycles
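The compatibility definition maps directly to an edge-set construction. This sketch reuses an assumed cycle assignment for the eight-operation example. Only the same-cycle pairs {2, 3} and {5, 6} are taken from the slides.

```python
from itertools import combinations

# Assumed cycle of each operation and resource type of each operation.
CYCLE = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 4}
OP_TYPE = {1: "mult", 2: "alu", 3: "alu", 4: "alu",
           5: "mult", 6: "mult", 7: "alu", 8: "alu"}


def compatibility_graph(cycle, op_type):
    """Return the edge set: an edge (a, b) joins two ops that use the same
    resource type and are scheduled in different cycles."""
    return {(a, b) for a, b in combinations(sorted(cycle), 2)
            if op_type[a] == op_type[b] and cycle[a] != cycle[b]}


edges = compatibility_graph(CYCLE, OP_TYPE)
print((2, 3) in edges, (5, 6) in edges, (2, 8) in edges)  # False False True
print(len(edges))  # 11
```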

Page 97

Compatibility Graph

[Figure: scheduled DFG (Cycle1-Cycle4) beside its compatibility graph; ALU subgraph with nodes 2, 3, 4, 7, 8 and multiplier subgraph with nodes 1, 5, 6]

5 and 6 are not compatible (same cycle); 2 and 3 are not compatible (same cycle)

Page 98

Compatibility Graph

[Figure: scheduled DFG beside its compatibility graph (ALUs: 2, 3, 4, 7, 8; Mults: 1, 5, 6)]

Note: fully connected subgraphs can share a resource (all involved nodes are compatible)

Page 101

Compatibility Graph

Binding: find the minimum number of fully connected subgraphs (cliques) that cover the entire graph

Well-known problem: clique partitioning (NP-complete)

Cliques = { {2,8,7,4}, {3}, {1,5}, {6} }

ALU1 executes 2, 8, 7, 4; ALU2 executes 3; MULT1 executes 1, 5; MULT2 executes 6

[Figure: compatibility graph partitioned into these four cliques]
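Clique partitioning is NP-complete, so practical tools rely on heuristics. The sketch below is a simple greedy heuristic, not necessarily the method used by any particular tool. It assumes the same hypothetical cycle assignment as the eight-operation example. On this graph it finds the same partition as the slide's cliques.

```python
# Assumed cycle of each operation and resource type of each operation.
CYCLE = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 4}
OP_TYPE = {1: "mult", 2: "alu", 3: "alu", 4: "alu",
           5: "mult", 6: "mult", 7: "alu", 8: "alu"}


def compatible(a, b):
    """Same resource type and different cycles (compatibility graph edge)."""
    return OP_TYPE[a] == OP_TYPE[b] and CYCLE[a] != CYCLE[b]


def greedy_clique_partition(ops):
    """Put each op into the first clique whose members are all compatible
    with it, otherwise start a new clique (i.e. a new resource).
    Not guaranteed to find the minimum number of cliques in general."""
    cliques = []
    for op in ops:
        for clique in cliques:
            if all(compatible(op, member) for member in clique):
                clique.append(op)
                break
        else:  # no existing clique accepted the op
            cliques.append([op])
    return cliques


print(greedy_clique_partition(sorted(CYCLE)))  # [[1, 5], [2, 4, 7, 8], [3], [6]]
```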

Page 102

Compatibility Graph

[Figure: scheduled DFG beside its compatibility graph (ALUs: 2, 3, 4, 7, 8; Mults: 1, 5, 6)]

Final Binding: [Figure: the scheduled DFG with each operation labeled by its bound resource]

Page 103

Compatibility Graph

[Figure: scheduled DFG beside its compatibility graph (ALUs: 2, 3, 4, 7, 8; Mults: 1, 5, 6)]

Alternative Final Binding: [Figure: a different valid binding of the same scheduled DFG]

Page 104

Translation to Datapath

[Figure: datapath for the bound DFG: resources Mult(1,5), Mult(6), ALU(2,7,8,4) and ALU(3), with input muxes selecting among inputs a-i and registers holding intermediate results]

1) Add resources and registers

2) Add a mux for each resource input

3) Add an input to the left mux for each left input in the DFG of the operations bound to that resource

4) Do the same for the right mux

5) If a mux has only 1 input, remove the mux
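Steps 2) through 5) can be sketched as collecting the distinct sources that feed each resource port. The binding below is the one from the clique-partitioning slide. The operand sources in `OPERANDS`, including the `rN` register names, are hypothetical because the DFG's edges cannot be recovered.

```python
# Hypothetical (left, right) operand sources for each operation; "rN" names
# stand for registers holding earlier results. Not taken from the slide.
OPERANDS = {1: ("a", "b"), 2: ("c", "d"), 3: ("e", "f"), 4: ("r1", "g"),
            5: ("r2", "h"), 6: ("r3", "i"), 7: ("r4", "r5"), 8: ("r6", "e")}
# Binding from the clique-partitioning slide.
BINDING = {"Mult(1,5)": [1, 5], "Mult(6)": [6],
           "ALU(2,7,8,4)": [2, 7, 8, 4], "ALU(3)": [3]}


def mux_inputs(binding, operands):
    """Steps 2-4: collect the distinct sources feeding each resource's left
    and right ports (dict.fromkeys drops duplicates, keeping first-seen order)."""
    ports = {}
    for res, ops in binding.items():
        left = list(dict.fromkeys(operands[op][0] for op in ops))
        right = list(dict.fromkeys(operands[op][1] for op in ops))
        ports[res] = (left, right)
    return ports


def mux_count(ports):
    """Step 5: a port driven by a single source is wired directly, no mux."""
    return sum(len(side) > 1 for left, right in ports.values() for side in (left, right))


ports = mux_inputs(BINDING, OPERANDS)
print(ports["ALU(2,7,8,4)"])  # (['c', 'r4', 'r6', 'r1'], ['d', 'r5', 'e', 'g'])
print(mux_count(ports))       # 4
```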

Page 105

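Left Edge Algorithm: alternative to clique partitioning

Take the scheduled DFG and rotate it 90 degrees, so that each operation becomes a horizontal interval from its start cycle (left edge) to its end cycle (right edge)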

2 ALUs (+/-), 2 Multipliers

[Figure: scheduled DFG (inputs a-g, operations 1-8, outputs j, k) spanning Cycle1-Cycle7, rotated 90 degrees so that cycles run horizontally]

Page 106

Left Edge Algorithm

2 ALUs (+/-), 2 Multipliers

[Figure: the rotated DFG, with operations 1-8 drawn as intervals across Cycle1-Cycle7 and a right_edge marker]

1) Initialize right_edge to 0

2) Find the unbound node N with the smallest left edge that is >= right_edge

3) Bind N to the current resource

4) Update right_edge to the right edge of N

5) Repeat from 2) for nodes using the same resource type until no remaining node fits

6) Repeat from 1) with a new resource until all nodes are bound
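The minimal sketch below applies the left edge algorithm separately to each resource type. The intervals cannot be recovered from this figure, so they are an assumption. They reuse the hypothetical single-cycle schedule of the eight-operation binding example, where an operation in cycle c occupies [c - 1, c). Right edges are exclusive, and left edges are assumed non-negative.

```python
def left_edge(intervals):
    """intervals maps op -> (left_edge, right_edge) in cycles, right edge
    exclusive. Returns one list of ops per resource; the ops in each list
    have non-overlapping intervals and can share that resource."""
    unbound = sorted(intervals, key=lambda op: intervals[op][0])  # by left edge
    resources = []
    while unbound:                       # 6) repeat until all nodes are bound
        right_edge = 0                   # 1) left edges assumed >= 0
        bound, remaining = [], []
        for op in unbound:               # scan in increasing left-edge order
            left, right = intervals[op]
            if left >= right_edge:       # 2) first node that fits
                bound.append(op)         # 3) bind it to this resource
                right_edge = right       # 4) advance right_edge
            else:
                remaining.append(op)     # 5) leave it for a later resource
        resources.append(bound)
        unbound = remaining
    return resources


# Hypothetical single-cycle intervals: an op in cycle c occupies [c - 1, c).
ALU_OPS = {2: (0, 1), 3: (0, 1), 4: (1, 2), 7: (2, 3), 8: (3, 4)}
MULT_OPS = {1: (0, 1), 5: (1, 2), 6: (1, 2)}
print(left_edge(ALU_OPS))   # [[2, 4, 7, 8], [3]]
print(left_edge(MULT_OPS))  # [[1, 5], [6]]
```

For interval lifetimes like these, scanning in left-edge order uses the minimum number of resources, which equals the maximum number of overlapping intervals. This is why the algorithm is a practical alternative to general clique partitioning.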

Page 115

Extensions

The algorithms presented so far find a valid binding, but do not consider the amount of steering logic required

Different bindings can require significantly different numbers of muxes

One solution: extend the compatibility graph

Use weighted edges/nodes, with a cost function representing steering logic

Perform clique partitioning, finding the set of cliques that minimizes weight
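One way to make the cost function concrete is to count 2:1 mux equivalents per resource port. This is an assumed, simplified cost model; real mux cost also depends on bit width and the target architecture. The operand sources below are hypothetical. The sketch compares two valid candidate bindings of three multiplies, where 5 and 6 share a cycle.

```python
# Hypothetical (left, right) operand sources of the three multiplies.
OPERANDS = {1: ("a", "b"), 5: ("a", "c"), 6: ("d", "c")}


def steering_cost(binding, operands):
    """Assumed cost model: a port fed by k distinct sources needs a k:1 mux,
    counted as k - 1 two-input muxes."""
    cost = 0
    for ops in binding.values():
        for port in (0, 1):  # 0 = left input, 1 = right input
            sources = {operands[op][port] for op in ops}
            cost += len(sources) - 1
    return cost


BINDING_A = {"MULT1": [1, 5], "MULT2": [6]}
BINDING_B = {"MULT1": [1, 6], "MULT2": [5]}
print(steering_cost(BINDING_A, OPERANDS), steering_cost(BINDING_B, OPERANDS))  # 1 2
```

Both bindings are valid, but under this cost model binding A needs one fewer mux. A weighted clique partitioning would therefore prefer it.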

Page 116

Binding Summary

Binding maps operations onto physical resources; this determines sharing among resources

Binding may greatly affect steering logic

Binding is trivial for fully pipelined circuits (1 resource per operation)

Straightforward translation from a bound DFG to a datapath

Page 117

High-level Synthesis: Summary

Page 118

Main Steps

Front-end (lexing/parsing) converts code into an intermediate representation; we looked at the CDFG

Scheduling assigns a start time to each operation in a DFG

CFG node start times are defined by control dependencies

Resource allocation is determined by the schedule

Binding maps scheduled operations onto physical resources; this determines how resources are shared

Big picture: a scheduled/bound DFG can be translated into a datapath, and the CFG can be translated into a controller => High-level synthesis can create a custom circuit for any CDFG!

Page 119

Limitations: Pipeline parallelism

The techniques discussed only exploit arithmetic and bit-level parallelism

Not much potential for speedup

How to create a pipelined circuit?

Difficult for general code; must try to transform the code into known “templates”, e.g., if-conversion, loop fusion, etc.

Not always possible/practical

Existing tools

Academic: ROCCC (Riverside Optimizing Compiler for Configurable Computing), SPARK

Commercial: Catapult C (Mentor Graphics), Celoxica DK Design Suite

Page 120

Limitations: Task-level parallelism

Parallelism in a CDFG is limited to individual control states

Multiple states can’t execute concurrently

Potential solution: use a model other than the CDFG

Kahn Process Networks: nodes represent parallel processes/tasks; edges represent communication between processes

The techniques discussed can create a controller+datapath for each process

Must also consider communication buffers

Challenge: most high-level code does not have explicit parallelism, and it is difficult or impossible to extract task-level parallelism from code
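A Kahn process network can be modeled in software with one thread per process and FIFO channels that use blocking reads. This sketch relies on Python's `threading` and `queue` modules. The three processes and the `None` end-of-stream marker are illustrative assumptions. In hardware, the unbounded queues would become finite FIFO buffers, which is where the communication-buffer concern arises.

```python
import queue
import threading


def producer(out_ch, n):
    for i in range(n):
        out_ch.put(i)
    out_ch.put(None)  # end-of-stream marker (an assumption of this sketch)


def scale(in_ch, out_ch, factor):
    # Blocking read: the process waits until a token arrives on its input.
    while (x := in_ch.get()) is not None:
        out_ch.put(x * factor)
    out_ch.put(None)


def consumer(in_ch, results):
    while (x := in_ch.get()) is not None:
        results.append(x)


def run_network(n=5, factor=3):
    # Each channel is an unbounded FIFO (maxsize=0), as in the KPN model.
    c1, c2 = queue.Queue(), queue.Queue()
    results = []
    processes = [threading.Thread(target=producer, args=(c1, n)),
                 threading.Thread(target=scale, args=(c1, c2, factor)),
                 threading.Thread(target=consumer, args=(c2, results))]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return results


print(run_network())  # [0, 3, 6, 9, 12]
```

Each channel has a single writer and a single reader, and reads block, so the output does not depend on thread timing. This determinism is the property that makes KPNs attractive for mapping tasks onto parallel hardware.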

Page 121

Limitations: Coding practices limit circuit performance

Very often, languages contain constructs not appropriate for circuit implementation: recursion, pointers, virtual functions, etc.

Potential solution: use specialized languages that remove problematic constructs and add task-level parallelism

Challenges: new languages are difficult to learn, and many designers resist changes to the tool flow

Page 122

Limitations: Expert designers can achieve better circuits

High-level synthesis has to work with the specification in code

It can be difficult to automatically create an efficient pipeline, which may require dozens of optimizations applied in a particular order

An expert designer can transform the algorithm; synthesis can transform code, but can’t change the algorithm

Potential solution: ??? New language? New methodology? New tools?