A Computing Origami: Folding Streams in FPGAs

S. M. Farhad, PhD Student, University of Sydney
DAC 2009, California, USA

Outline

- Motivation
  - Stream programming
  - FPGA
  - Problem
- Stream Folding
- Results
- Conclusion

Stream Programming Paradigm

- Programs expressed as stream graphs
  - Streams: sequences of data elements
  - Actors: functions applied to streams
- Independent actors with explicit communication
- Regular and repeating computation

[Figure: a stream graph of actors (filters) connected by streams]
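
As a rough illustration of the paradigm, the sketch below models streams as Python iterators and actors as generator functions chained into a pipeline. The actor names (`source`, `scale`, `moving_sum`) are hypothetical and not taken from the talk. `moving_sum` keeps state between firings, unlike the stateless `scale`.

```python
from collections import deque
from typing import Iterable, Iterator


def source(n: int) -> Iterator[int]:
    """Hypothetical source actor: emits the stream 0, 1, ..., n - 1."""
    yield from range(n)


def scale(stream: Iterable[int], k: int) -> Iterator[int]:
    """Stateless actor: pops one token and pushes one token per firing."""
    for x in stream:
        yield k * x


def moving_sum(stream: Iterable[int], window: int) -> Iterator[int]:
    """Stateful actor: each output depends on the last `window` inputs."""
    recent: deque = deque(maxlen=window)
    for x in stream:
        recent.append(x)
        yield sum(recent)


def pipeline(n: int) -> list:
    # Actors communicate only through the streams (iterators) passed between them.
    return list(moving_sum(scale(source(n), 2), window=2))


print(pipeline(5))  # [0, 2, 6, 10, 14]
```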

FPGA

- FPGAs are widely available as programmable coprocessors
- Opportunities to exploit FPGA-based acceleration in multimedia, networking, graphics, and security codes

Problem

- Maximize throughput subject to area and latency constraints
- Resolve bottleneck actors by replication; the replicated filters do not require resynthesis
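
To make the optimization problem concrete, here is a brute-force sketch for a tiny pipeline. It rests on assumptions that are not the talk's definitions: the slowest actor bounds pipeline throughput, replicating a stateless actor r times multiplies its rate by r, stateful actors keep one copy, and area grows linearly with the number of copies. The latency constraint is ignored, and the actor data are hypothetical.

```python
from itertools import product
from typing import NamedTuple, Optional, Sequence, Tuple


class Actor(NamedTuple):
    name: str
    rate: float     # tokens per cycle for one copy (hypothetical numbers)
    area: int       # LUTs per copy (hypothetical numbers)
    stateful: bool  # stateful actors keep a single copy in this sketch


def throughput(actors: Sequence[Actor], reps: Tuple[int, ...]) -> float:
    # The slowest (bottleneck) actor bounds the throughput of the pipeline.
    return min(a.rate * r for a, r in zip(actors, reps))


def area(actors: Sequence[Actor], reps: Tuple[int, ...]) -> int:
    return sum(a.area * r for a, r in zip(actors, reps))


def best_replication(actors: Sequence[Actor], area_budget: int,
                     max_reps: int = 8) -> Optional[Tuple[int, ...]]:
    """Exhaustive search for the highest-throughput replication vector within budget."""
    choices = [(1,) if a.stateful else range(1, max_reps + 1) for a in actors]
    best = None
    for reps in product(*choices):
        if area(actors, reps) > area_budget:
            continue
        if best is None or throughput(actors, reps) > throughput(actors, best):
            best = reps
    return best


actors = [Actor("read", 1.0, 100, True),
          Actor("fir", 0.25, 300, False),
          Actor("write", 0.5, 100, False)]
print(best_replication(actors, area_budget=1500))  # (1, 4, 2)
```

Exhaustive search grows exponentially with the number of actors, which is why a heuristic such as design folding is needed for real stream graphs.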

Motivating Example

[Figures: motivating example, spread over three slides]

Area/Throughput Design Folding

    designPointArea = 0
    foreach Filter f in S do
        workFactor[f] = f.latency * S.runs(f)
        designPointArea += f.area * workFactor[f]
    scaleLimit = min over filters f with f.hasState of (1 / workFactor[f])
    scaling = min(AREA / designPointArea, scaleLimit)
    foreach Filter f in S do
        replication[f] = workFactor[f] * scaling
    while area(replication) > AREA do
        replication = reduceThroughput(replication)
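
The folding loop above can be sketched in Python. This is an illustration under stated assumptions, not the talk's implementation: `runs` stands for S.runs(f), the number of firings of a filter per steady-state iteration; fractional replication counts are rounded up to whole copies (at least one); and `reduce_throughput`, which the slide leaves undefined, is a hypothetical policy that removes one copy from the most-replicated stateless filter. Exact fractions keep a stateful filter's count from drifting above one through rounding noise.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Filter:
    name: str
    latency: int     # cycles per firing
    runs: int        # firings per steady-state iteration of S, i.e. S.runs(f)
    area: int        # LUTs for one copy
    has_state: bool


def area_of(filters, replication):
    return sum(f.area * replication[f.name] for f in filters)


def reduce_throughput(filters, replication):
    # Hypothetical policy: drop one copy from the most-replicated stateless filter.
    candidates = [f for f in filters if not f.has_state and replication[f.name] > 1]
    if not candidates:
        raise ValueError("design cannot fit in the area budget")
    victim = max(candidates, key=lambda f: replication[f.name])
    return {**replication, victim.name: replication[victim.name] - 1}


def fold(filters, area_budget):
    work = {f.name: Fraction(f.latency) * f.runs for f in filters}
    design_point_area = sum(f.area * work[f.name] for f in filters)
    # Stateful filters cannot be replicated, so they cap the scaling factor.
    scale_limit = min((1 / work[f.name] for f in filters if f.has_state),
                      default=math.inf)
    scaling = min(area_budget / design_point_area, scale_limit)
    # Assumption: round up to whole copies, keeping at least one copy per filter.
    replication = {f.name: max(1, math.ceil(work[f.name] * scaling)) for f in filters}
    while area_of(filters, replication) > area_budget:
        replication = reduce_throughput(filters, replication)
    return replication


filters = [Filter("A", 2, 1, 100, True),
           Filter("B", 8, 1, 50, False),
           Filter("C", 4, 1, 80, False)]
print(fold(filters, 1000))  # {'A': 1, 'B': 4, 'C': 2}
print(fold(filters, 400))   # {'A': 1, 'B': 2, 'C': 2}
```

With the generous budget the stateful filter A limits scaling, so copies are balanced to A's work factor; with the tight budget the rounded-up design overshoots and the reduction loop trims copies of B.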

Calculating Throughput

$$t_{out}(F_i) = \frac{push(F_i)}{latency(F_i)}$$

[The remaining equations on this slide combine such rates over components P_j (1 ≤ j ≤ n), split-joins SJ with weights w, and the whole stream graph S using minima; they are not recoverable from the transcript.]
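
As a small illustration, the sketch below computes a filter's output rate as t_out(F_i) = push(F_i) / latency(F_i). It also assumes a simplified whole-graph model that is not taken from the slide: filter F_i needs runs_i firings of latency_i cycles per steady-state iteration, its r_i copies share that work, and the graph completes iterations no faster than its slowest filter, so the rate is min_i r_i / (runs_i * latency_i). The filter names and numbers are hypothetical.

```python
from fractions import Fraction


def t_out(push: int, latency: int) -> Fraction:
    """Output rate of one filter copy: t_out(F_i) = push(F_i) / latency(F_i)."""
    return Fraction(push, latency)


def iteration_rate(filters, replication) -> Fraction:
    """Steady-state iterations of the graph per cycle (simplified model).

    filters: (name, runs, latency) triples, where `runs` is how often the
    filter fires per iteration; replication maps names to copy counts.
    """
    return min(Fraction(replication[name], runs * latency)
               for name, runs, latency in filters)


filters = [("A", 1, 2), ("B", 1, 8), ("C", 1, 4)]
print(t_out(push=2, latency=8))                           # 1/4
print(iteration_rate(filters, {"A": 1, "B": 1, "C": 1}))  # 1/8 (B is the bottleneck)
print(iteration_rate(filters, {"A": 1, "B": 4, "C": 2}))  # 1/2
```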

Calculating Latency

- FPGAs that are coupled to host processors
  - Initiation interval (DMA)
- Replication improves throughput, but it often increases the latency
- Major factors for latency variation:
  - Non-periodic data arrival
  - Data-token reordering
  - Local congestion
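
One of the listed factors, data-token reordering, can be illustrated with a short simulation. Everything in it is an assumption for illustration: tokens are dealt round-robin to identical copies of one filter, each token has its own service time, and results must leave in arrival order, so a token that finishes early waits for its slower predecessors.

```python
def replica_latencies(arrivals, service, replicas):
    """Per-token latency through a replicated filter with in-order output."""
    free_at = [0] * replicas  # cycle at which each copy becomes idle
    finish = []
    for i, (arrive, work) in enumerate(zip(arrivals, service)):
        r = i % replicas  # round-robin dispatch (assumed)
        start = max(arrive, free_at[r])
        free_at[r] = start + work
        finish.append(start + work)
    latencies, released = [], 0
    for arrive, done in zip(arrivals, finish):
        # A token leaves only after every earlier token has left.
        released = max(released, done)
        latencies.append(released - arrive)
    return latencies


print(replica_latencies([0, 1, 2, 3], [5, 1, 1, 1], replicas=2))  # [5, 4, 4, 3]
```

Here token 1 finishes at cycle 2 but cannot leave until token 0 does at cycle 5, so its latency grows from 1 to 4 cycles.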

Latency-Constrained Design Folding

    latConf = null; T = ∞
    while throughput(thrConf) ≤ T do
        if feasibleImprovement(thrConf) then
            candidates = simAnnealing(thrConf, T)
            foreach candidate in candidates do
                if throughput(candidate) < T then
                    latConf = candidate
                    T = throughput(latConf)
        thrConf = reduceThroughput(thrConf)
    return latConf
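
The latency-constrained loop can be sketched in Python with every helper passed in as a function, since the slide does not define them. Two assumptions are worth flagging: `throughput()` is read here as an initiation interval (II) where lower is better, which is what the comparisons against T = ∞ suggest, and the stand-in for `simAnnealing` in the toy example is an exhaustive filter over a tiny design space, not simulated annealing.

```python
import math


def latency_constrained_folding(thr_conf, ii, feasible_improvement,
                                sim_annealing, reduce_throughput):
    """Return the lowest-II candidate found, or None.

    Assumes reduce_throughput eventually yields a design whose II exceeds
    the best one found, otherwise the loop does not terminate.
    """
    lat_conf, best = None, math.inf
    while ii(thr_conf) <= best:
        if feasible_improvement(thr_conf):
            for cand in sim_annealing(thr_conf, best):
                if ii(cand) < best:
                    lat_conf, best = cand, ii(cand)
        thr_conf = reduce_throughput(thr_conf)
    return lat_conf


# Toy design space (hypothetical): a configuration is the number of copies c
# of a single filter; II shrinks and latency grows as c increases.
def toy_ii(c):
    return math.inf if c <= 0 else -(-12 // c)  # ceil(12 / c)


def toy_latency(c):
    return 100 + 10 * c


def toy_candidates(c, best):
    # Stand-in for simulated annealing: every design up to c with latency <= 130.
    return [k for k in range(1, c + 1) if toy_latency(k) <= 130]


result = latency_constrained_folding(6, toy_ii, lambda c: True,
                                     toy_candidates, lambda c: c - 1)
print(result)  # 3
```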

Results

             Minimum area          Best throughput       Constrained design
Benchmark    LUTs    Latency  II   LUTs    Latency  II   LUTs    Latency  II   Constraint       Run time
MatrixMult   1498    480      19   7618    185      3    4558    175      7    Latency ≤ 175    1.14 s
Serpent      3028    1027     4    3878    773      2    3053    901      4    Latency ≤ 910    0.73 s
FFT2         37610   1199     3    43370   764      2    39530   868      7    AREA ≤ 40000     34.7 s
FMRadio      37458   371      39   87564   371      13   62511   371      20   AREA ≤ 65000     1.01 s
DCT          45752   349      3    137256  349      1    91504   349      2    AREA ≤ 120000    0.73 s
BitonicSort  43920   1042     3    131760  1042     1    47400   1282     2    AREA ≤ 50000     18.3 s
Synthetic    350     309      135  15990   504      2    1490    309      47   AREA ≤ 1500      0.43 s

II = initiation interval; AREA bounds are in LUTs.
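
As a quick consistency check on the table, the script below re-enters its rows, confirms that every constrained design meets its stated constraint, and reports how the constrained design's II and LUT count compare with the minimum-area design. It only re-derives ratios from the published numbers; reading the II ratio as a throughput gain assumes throughput is inversely proportional to II.

```python
# Rows re-entered from the table; each design is (LUTs, latency, II).
RESULTS = {
    "MatrixMult":  ((1498, 480, 19), (7618, 185, 3), (4558, 175, 7), ("latency", 175)),
    "Serpent":     ((3028, 1027, 4), (3878, 773, 2), (3053, 901, 4), ("latency", 910)),
    "FFT2":        ((37610, 1199, 3), (43370, 764, 2), (39530, 868, 7), ("area", 40000)),
    "FMRadio":     ((37458, 371, 39), (87564, 371, 13), (62511, 371, 20), ("area", 65000)),
    "DCT":         ((45752, 349, 3), (137256, 349, 1), (91504, 349, 2), ("area", 120000)),
    "BitonicSort": ((43920, 1042, 3), (131760, 1042, 1), (47400, 1282, 2), ("area", 50000)),
    "Synthetic":   ((350, 309, 135), (15990, 504, 2), (1490, 309, 47), ("area", 1500)),
}


def satisfies(design, constraint):
    """True if a (LUTs, latency, II) design meets an ("area" | "latency", bound) constraint."""
    luts, latency, _ = design
    metric, bound = constraint
    return (luts if metric == "area" else latency) <= bound


for name, (min_area, _, constrained, constraint) in RESULTS.items():
    ii_ratio = min_area[2] / constrained[2]   # > 1 means a shorter II than the min-area design
    lut_ratio = constrained[0] / min_area[0]  # LUT cost relative to the min-area design
    print(f"{name:<12} meets constraint: {satisfies(constrained, constraint)}  "
          f"II ratio: {ii_ratio:.2f}  LUT ratio: {lut_ratio:.2f}")
```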

Questions?