A parallel for loop memory template for a high level synthesis compiler
Euromicro Conference on Digital System Design
Lille, France
02/09/2010
Craig Moore
Wim Meeus, Harald Devos, and Dirk Stroobandt
30/06/2010 Craig Moore, DSD 02/09/2010 2
Outline
● High Level Synthesis
● Hardware Development
● External Memory
● Burst memory transfers
● Parallel For Loops
● Memory Template Overview
● Small Example
● Future Work
● Conclusions
High Level Synthesis (HLS) Missing Pieces
Memory Templates as Tools
● HDL Programmers have:
  ● Toolkit of memory designs
  ● Use the right tool for the job
  ● Manually adapt their designs
● HLS Compilers should:
  ● Have a toolkit of templates
  ● Adapt the template to the app
  ● Evaluate each template
  ● Suggest the best template
Basic Steps for any Algorithm
1) Read values from memory
2) Process each value
3) Store output in memory
for (int i = start; i < end; i++) {
    b[i] = func(a[i]);
}
Implement on Hardware
External Memory for FPGAs
● A bottleneck
● Sequential in nature
● Number of values returned each cycle depends on bus width
● Each memory request requires a handshake
Adapting to the Bottleneck
● Stream values from memory
● Pre-fetch values
● Read/Write more than one value each clock cycle
● Store values locally to mask latency
● Reduce number of requests
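A rough cycle count shows why reducing the number of requests helps. The sketch below assumes a simplified cost model that is not taken from the slides: every request pays a fixed handshake overhead, then delivers one bus word per cycle. The function names and the model are illustrative only.

```c
#include <assert.h>

/* Assumed cost model (not from the slides): each memory request costs
 * `handshake` cycles of overhead plus one cycle per bus word moved. */

/* One request per bus word: the handshake is paid for every word. */
unsigned long cycles_single(unsigned long words, unsigned long handshake)
{
    return words * (handshake + 1);
}

/* Requests of up to `burst_len` words: the handshake is paid once per
 * request instead of once per word. */
unsigned long cycles_burst(unsigned long words, unsigned long handshake,
                           unsigned long burst_len)
{
    assert(burst_len > 0);
    unsigned long bursts = (words + burst_len - 1) / burst_len; /* ceiling */
    return bursts * handshake + words;
}
```

Under this model, 64 words with a 4-cycle handshake take 320 cycles as single requests and 80 cycles as requests of 16 words each.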
Burst Transfers
● Burst of consecutive memory operations
Read Transfer Start Address: 3
Transfer: 4
[Figure: memory locations 0 to 6]
Write Transfer Start Address: 2
Transfer: 5
[Figure: memory locations 0 to 6]
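The two transfers above can be modeled in software. This is a minimal sketch, assuming "Transfer" is the number of consecutive words moved, so the read touches locations 3 to 6 and the write touches locations 2 to 6. `burst_read`, `burst_write`, and `MEM_WORDS` are hypothetical names, not part of the template.

```c
#include <assert.h>
#include <stddef.h>

#define MEM_WORDS 7  /* locations 0..6, as in the figure */

/* Read `len` consecutive words starting at `start` into `dst`. */
void burst_read(const int mem[MEM_WORDS], size_t start, size_t len, int *dst)
{
    assert(start + len <= MEM_WORDS);
    for (size_t k = 0; k < len; k++)
        dst[k] = mem[start + k];
}

/* Write `len` consecutive words from `src` starting at `start`. */
void burst_write(int mem[MEM_WORDS], size_t start, size_t len, const int *src)
{
    assert(start + len <= MEM_WORDS);
    for (size_t k = 0; k < len; k++)
        mem[start + k] = src[k];
}
```

Each call stands for one memory request: a single start address and length replace one request per word.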
Parallel for Loop
● Each iteration is run in parallel
● No loop dependencies
  ● Loop Transformations to remove them
Example with Dependencies
for i = 1 to 4 {
    a(i) = a(i) + 1
    b(i) = a(i - 1) + a(i + 1)
}
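To make the dependency concrete, here is the slide's loop in C next to one possible transformation. The transformed version is an assumption for illustration, not the method used by the compiler: it splits the loop in two and folds in the +1 that a(i - 1) would already have received for i > 1.

```c
#include <assert.h>

#define LOOP_N 4  /* loop runs i = 1..LOOP_N; arrays need indices 0..LOOP_N+1 */

/* Loop from the slide: iteration i reads a[i-1], which the previous
 * iteration has already incremented, so iterations cannot run in parallel. */
void with_dependencies(int a[LOOP_N + 2], int b[LOOP_N + 2])
{
    for (int i = 1; i <= LOOP_N; i++) {
        a[i] = a[i] + 1;
        b[i] = a[i - 1] + a[i + 1];
    }
}

/* Hypothetical transformation: compute b from the original a first,
 * then update a. Within each loop every iteration is independent. */
void without_dependencies(int a[LOOP_N + 2], int b[LOOP_N + 2])
{
    assert(LOOP_N >= 1);
    for (int i = 1; i <= LOOP_N; i++)
        b[i] = a[i - 1] + (i > 1 ? 1 : 0) + a[i + 1];
    for (int i = 1; i <= LOOP_N; i++)
        a[i] = a[i] + 1;
}
```

Both functions leave the same values in a and b; only the second form matches the "no loop dependencies" requirement.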
Template Overview
Requests read bursts and controls execution of data paths; waits for the output buffer if it is full.
Non-pipelined loop bodies executing in parallel.
Manual Design
With enough values, performs write bursts.
Starts and stops execution
Controls access to memory, grants permission based on request (output buffer priority)
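Two of the control decisions described above are small enough to sketch in software: the read side stalling while the output buffer is full, and the memory arbiter giving the output buffer priority. The type and function names are hypothetical, and the real template is hardware, so this only models the decision logic.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { GRANT_NONE, GRANT_READ, GRANT_WRITE } grant_t;

/* The read-side controller may only issue work while the output buffer
 * still has room; otherwise it waits. */
bool controller_may_issue(unsigned buffered, unsigned capacity)
{
    assert(buffered <= capacity);
    return buffered < capacity;
}

/* Fixed-priority arbiter: when both sides request the memory at once,
 * the output buffer (write side) wins. */
grant_t arbitrate(bool read_request, bool write_request)
{
    if (write_request)
        return GRANT_WRITE;
    if (read_request)
        return GRANT_READ;
    return GRANT_NONE;
}
```

Giving writes priority drains the output buffer first, which in turn lets the stalled read side resume.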
Byte-Enable Signal
● Multiple values for each memory transaction
● Tells which bytes to replace and preserve
[Figure legend: Ignore, Enable]
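A byte-enable write can be modeled as a masked copy. The sketch below assumes one enable bit per byte, with bit k controlling byte k; the actual bit ordering depends on the memory interface, and `masked_write` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write one bus word to memory under a byte-enable mask.
 * Bit k set   -> byte k of mem_word is replaced by byte k of bus_word.
 * Bit k clear -> byte k of mem_word is preserved. */
void masked_write(uint8_t *mem_word, const uint8_t *bus_word,
                  size_t bus_bytes, uint32_t byte_enable)
{
    assert(bus_bytes <= 32);  /* the mask has 32 bits */
    for (size_t k = 0; k < bus_bytes; k++) {
        if ((byte_enable >> k) & 1u)
            mem_word[k] = bus_word[k];
    }
}
```

With a 4-byte bus and a mask of 0x6, only bytes 1 and 2 are replaced; bytes 0 and 3 keep their old contents.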
Parametrized Template
Parameters
● Memory Bus Width = M
● Word Width = W
● Max Words = A = M / W
● Input FIFOs = X = C_x * A
● Iterations = Output FIFOs = N = C_N * X
● Burst Length
● Input FIFO Length
● Iteration Length
● Output FIFO Length
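The derived parameters follow directly from the formulas above. This sketch assumes that M is a multiple of W and that C_x and C_N are positive integers, which the slides do not state; the struct and function names are hypothetical.

```c
#include <assert.h>

typedef struct {
    unsigned bus_width;   /* M, in bits */
    unsigned word_width;  /* W, in bits */
    unsigned c_x;         /* C_x, multiplier for the input FIFOs */
    unsigned c_n;         /* C_N, multiplier for the iterations */
} template_params;

typedef struct {
    unsigned max_words;    /* A = M / W */
    unsigned input_fifos;  /* X = C_x * A */
    unsigned iterations;   /* N = C_N * X, also the number of output FIFOs */
} template_sizes;

template_sizes derive_sizes(template_params p)
{
    assert(p.word_width > 0 && p.bus_width % p.word_width == 0);
    template_sizes s;
    s.max_words = p.bus_width / p.word_width;
    s.input_fifos = p.c_x * s.max_words;
    s.iterations = p.c_n * s.input_fifos;
    return s;
}
```

For example, M = 64, W = 16, C_x = 2, and C_N = 1 give A = 4, X = 8, and N = 8.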
Example – Reading Values
Example – Processing Values
Example – Writing Values
[Figures: memory contents while reading, processing, and writing. Legend: Values in Memory, Values to be read, Byte enabled, Byte disabled, Values processed]
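The three stages of the example can be approximated by a small software model. It is a sketch under several assumptions: the bus carries 4 words, enables are tracked per word rather than per byte, the arrays are padded to a whole number of bus lines, and `func` is a placeholder loop body. None of these values come from the slides.

```c
#include <assert.h>
#include <stddef.h>

#define WORDS_PER_BUS 4  /* A = M / W, assumed value */

static int func(int x) { return 2 * x; }  /* placeholder loop body */

/* Model of b[i] = func(a[i]) for start <= i < end, moving data one bus
 * line (WORDS_PER_BUS words) at a time. */
void run_template(const int *a, int *b, size_t start, size_t end)
{
    assert(start <= end);
    size_t first_line = start / WORDS_PER_BUS;
    size_t last_line = (end + WORDS_PER_BUS - 1) / WORDS_PER_BUS;

    for (size_t line = first_line; line < last_line; line++) {
        int in[WORDS_PER_BUS], out[WORDS_PER_BUS];
        unsigned enable = 0;

        /* Read: a whole bus line comes back, wanted or not. */
        for (size_t k = 0; k < WORDS_PER_BUS; k++)
            in[k] = a[line * WORDS_PER_BUS + k];

        /* Process: only words inside [start, end). These iterations are
         * independent, so hardware could run them in parallel. */
        for (size_t k = 0; k < WORDS_PER_BUS; k++) {
            size_t i = line * WORDS_PER_BUS + k;
            if (i >= start && i < end) {
                out[k] = func(in[k]);
                enable |= 1u << k;
            }
        }

        /* Write: enabled words are replaced, the rest are preserved. */
        for (size_t k = 0; k < WORDS_PER_BUS; k++) {
            if ((enable >> k) & 1u)
                b[line * WORDS_PER_BUS + k] = out[k];
        }
    }
}
```

Running the model over indices 2 to 6 of an 8-word array touches two bus lines but changes only the five words inside the loop range.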
Future Work
● More templates for other parallel for loops
  ● Pipelined loop body
  ● Data reuse
● Compiler identifies parallel for loop
  ● No keywords
  ● Check for loop dependencies, and do loop transformations if required
● Compiler suggests best memory template
  ● Chosen based on performance estimate
  ● Design space exploration using templates
Conclusions
● HLS Tools don't create memory designs
● Manual memory designs can take days/weeks/months to complete
● Parametrized memory template designs are generated in seconds
● Easy to perform design space exploration using different parameter values and/or templates
Thank You!
Questions?
craig.moore@elis.ugent.be
http://www.elis.ugent.be/~cmoore
Wim Meeus*, Harald Devos‡, and Dirk Stroobandt*
*{wim.meeus, dirk.stroobandt}@elis.ugent.be, ‡devos.harald@gmail.com