Computing Models for FPGA-Based Accelerators* Martin Herbordt Tom VanCourt Yongfeng Gu Josh Model Bharat Sukhwani Matt Chiu Computer Architecture and Automated Design Laboratory Department of Electrical and Computer Engineering Boston University http://www.bu.edu/caadlab *This work is supported in part by the U.S. NIH/NCRR and NSF, and by MIT Lincoln Lab

Source: rssi.ncsa.illinois.edu/proceedings/academic/Herbordt.pdf (2010-11-23)

HPRC Computing Models, RSSI, 7/9/2008

Computing Models for

FPGA-Based Accelerators*

Martin Herbordt Tom VanCourt Yongfeng Gu

Josh Model Bharat Sukhwani Matt Chiu

Computer Architecture and Automated Design Laboratory

Department of Electrical and Computer Engineering

Boston University

http://www.bu.edu/caadlab

*This work is supported in part by the U.S. NIH/NCRR and NSF, and by MIT Lincoln Lab


The Promise of HPRC* …

*Trimberger/Xilinx, FPL07


Reality Check …

Lately … (just as everything seems to be going great for FPGAs)

Reported speed-ups have become more modest, with low single digits frequently being reported …

Why? Some hypotheses

– More ambitious applications

• Large codes in established systems

• True HPC: large, complex, data types

– More realistic reporting

• end-to-end numbers

• production reference codes

– More “ambitious” development tools

– “Broader” developer base

– FPGA stagnation for two generations (4 years)

• Smaller chips (relative to microprocessors)

• Relatively fewer “hard” components: Block RAMs, multipliers


One key to successful HPRC application development …

Use an appropriate computing model* when formulating a problem#

*neither programming language substitute nor HDL

Computing Model ≡ an abstraction of a target machine used to ease application development

# “One of the key challenges addressed by the ACS program was Problem Formulation”

-- Dr. J. Muñoz, ACS Program Manager, RSSI 2008


A good computing model* …

• Ignores machine details

• Ignores programming language details

• Expresses enough of the underlying machine to enable the user to develop/choose an optimal algorithm (for a given application)

• Enables some amount of portability

*see, e.g., L. Snyder 1986 Annual Review of Computer Science


The more machine detail …

… that is expressed in the computing model:

• the greater the potential performance

• the less the portability

• the more experience required to use effectively


Historically (ca. 1990) …

“If we only had the right computing model, then we could port programs among all of our different parallel computers.”

-- theme of several parallel computing conferences


Nowadays …

“We need to develop the right computing model so that programmers can use multicore efficiently” [without having to learn anything new].


Serial Architecture Computing Model

von Neumann machine

• Single thread

• Random access memory


Parallel Architecture Computing Models

Three (and combinations) are in common use:

1. multiple threads with shared address space

2. multiple threads with message passing

3. single thread with data-parallel constructs


Good (and bad) Computing Models

Just about any programming model can map to any non-trivial computer
(C pointers to FPGAs? Parallel LISP to SIMD? It’s been done!)

Good models (w.r.t. a target architecture):

1. Is it convenient to use (in comparison with programming)?
– Inconvenient: microcode, VHDL, etc.

2. Do constructs map efficiently to the target architecture?
– Inefficient: mapping functional parallelism to a SIMD architecture

3. Can critical target machine features be expressed?
– Not expressible: for a multi-core architecture, parallel independent functions in the data-parallel model


A great question …

(that is beyond the scope of this talk):

Is it possible to support a “universal” computing model?

In which we can create applications that port among target architectures, where:

– the programmer effort is the same as for one version of the software

– the target architectures are unrestricted

– the performance is optimal for all target architectures

– there is no constraint on application domain


Outline

1. Computing Models

2. FPGA Basics – functional models

3. Things that FPGAs do really well – FPGA Computing Models

4. Sample application mappings

5. What this says about how FPGA-based computing can advance


What’s a good computing model?


A Basic FPGA Computing Model

Historically, FPGAs were a configurable “bag of gates”

Trimberger/Xilinx, FPL07


Nowadays, “bag of computer parts” is more accurate

Is no longer sufficient …

Trimberger/Xilinx, FPL07


Should also account for board-level …

… especially memory hierarchy!


and the system interface …

Bhatt/Intel, FPL07


FPGA Functional Model

• Millions of gate equivalents & connections
• ~500 ALU equivalents
• ~1,000 small on-chip caches
– Total on-chip memory ~16MB
• Several off-chip caches
– Capacity for 512b data transfers per cycle (8x64b)
• Several high-performance I/O streams
• Host w/ simple interface, e.g., FSB


Another Candidate Model


We know we’ve succeeded when …

… we’ve restructured the problem into something we know works really well on FPGAs

Effective computing models:

Streaming

Associative computing – broadcast, compare tags

HW structures – FIFOs, priority queues, systolic arrays

Cellular automata, SIMD PEs, Vector processing

Highly parallel (possibly complex) memory access

Overlapped parallel structures

Also assumed:

– Explicit memory control, e.g., to swap working sets

– High-bandwidth I/O


Some Observations

• Most architectures have a single preferred model, FPGAs have many

• FPGA models are surrogates for the component they replace


Model: Streaming
Ex: DSP replacement

Characteristics:

• Pass streams of data through a series of arithmetic units

• Iterative streaming computation with data beginning and ending in Block RAMs
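
As a software analogy for the streaming model, the sketch below chains two arithmetic stages with Python generators. The stage names (`scale`, `moving_sum`), the gain, and the tap count are hypothetical and only illustrate the data flow; nothing here comes from the talk's hardware designs.

```python
from typing import Iterable, Iterator, List


def scale(stream: Iterable[float], gain: float) -> Iterator[float]:
    """Stage 1 (hypothetical): multiply every sample by a constant gain."""
    for sample in stream:
        yield sample * gain


def moving_sum(stream: Iterable[float], taps: int) -> Iterator[float]:
    """Stage 2 (hypothetical): sum of the most recent `taps` samples."""
    window = [0.0] * taps
    for sample in stream:
        window = window[1:] + [sample]  # shift register
        yield sum(window)


def run_pipeline(input_buffer: List[float]) -> List[float]:
    """Data begins in one buffer, flows through both stages, ends in another."""
    return list(moving_sum(scale(input_buffer, gain=2.0), taps=3))


result = run_pipeline([1.0, 2.0, 3.0, 4.0])
```

Each stage consumes one sample and produces one sample, which loosely mirrors a pipeline that accepts one operand per cycle.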


Typical Streaming Scenarios

1. On-line signal/video processing
– Stream originates from I/O
– Stream processed with computational filters

2. Complex computation of large array
– Stream originates in memory
– Stream processed with pipelined instantiation of computation


Model: Associative Computing
Ex: SIMD array replacement

Characteristic operations:

• Broadcast query/data

• Tag check

• Collective response

• Reduction of responses

Krikelis: Associative String Processor
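
The four characteristic operations can be sketched in software as follows. The `Cell` type, its fields, and the choice of "maximum payload" as the reduction are assumptions made for illustration; in hardware every cell would perform its tag check in the same cycle.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Cell:
    tag: int   # value compared against the broadcast query
    data: int  # payload contributed by responders


def associative_query(cells: List[Cell], query: int) -> Dict[str, Optional[int]]:
    # Broadcast query + tag check: every cell compares its tag with the query.
    responders = [cell.tag == query for cell in cells]
    # Collective response: did any cell match, and how many?
    any_match = any(responders)
    count = sum(responders)
    # Reduction of responses: here, the maximum payload among responders.
    matched = [cell.data for cell, hit in zip(cells, responders) if hit]
    return {"any": any_match, "count": count, "max": max(matched) if matched else None}


cells = [Cell(3, 10), Cell(7, 20), Cell(3, 35), Cell(1, 5)]
answer = associative_query(cells, query=3)
```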


Typical Associative Scenario

Query/response

Example: Optimization with successive approximation

Class: Successive approximation
– Scoring function Fi(x): TBS
– Initial state X0: TBS
– Next state selection Xi+1 = NS[ F1(Xi), F2(Xi), … ]: TBS

[Figure: starting from X0, the state Xi is broadcast to scoring units F1, F2, F3, F4, F5; their results select Xi+1 = NextState[ F1, F2, F3, … ]]


Model: MSI HW Structures
Ex: ASIC replacement

[Figure: systolic array of PEs; labels A[0], …, A[L-2], A[L-1], A[L], A[k], B[i], C[k], Init_A, 0]

Characteristics:

Standard HW versions of common data structures –

• FIFOs

• Priority queues

• Systolic arrays


Typical HW structure scenario

• Wherever HW instantiation does not have an immediate SW analog

Example: Find palindromes in a character string

[Figure: palindrome-finding structure around a gap. Stages: character comparison (=), length summation (+), threshold detection (T1? … T4? for Len=1 … Len=4), and length reporting through a priority encoder that outputs the maximal palindrome length]
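
A rough software rendering of the palindrome structure, under the assumption that the stages act as follows: one comparator per candidate length, then a count of consecutive matches outward from the gap, then a maximum over positions. The function names, the `gap` and `max_len` parameters, and the plain character-equality test are illustrative choices, not details taken from the slide.

```python
def palindrome_arm_length(text: str, center: int, gap: int, max_len: int) -> int:
    """Number of matching character pairs around `center`, skipping `gap` middle characters."""
    # Character comparison: one comparator per candidate length.
    matches = []
    for k in range(max_len):
        left, right = center - 1 - k, center + gap + k
        in_range = left >= 0 and right < len(text)
        matches.append(in_range and text[left] == text[right])
    # Length summation and reporting: count consecutive matches from the center.
    length = 0
    for hit in matches:
        if not hit:
            break
        length += 1
    return length


def longest_palindrome_arm(text: str, gap: int = 0, max_len: int = 4) -> int:
    """Maximal arm length over every center position in the string."""
    return max(palindrome_arm_length(text, c, gap, max_len) for c in range(len(text) + 1))
```

In hardware the inner loop over `k` would be unrolled into parallel comparators, and the loop over centers would correspond to the string shifting past the structure.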


Model: Highly Parallel Memory Access
“The advantage of having an MPP is having lots of memory pipes”

Characteristics:

• Source and sink up to 2000 operands per cycle

• Possible complex access patterns

Example: access vectors in all possible 3-way combinations

Divide n objects into subsets of size m so that every size-3 subset is in just one size-m subset

[Figure: Vector Data Memory (VDM) holding X1-9 (m = 9) and Y, feeding DPS0, DPS1, DPS2, DPS3, …, DPS83]
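
A quick count suggests where the figure's range DPS0 … DPS83 may come from: a subset of size m = 9 contains 84 distinct 3-way combinations. Reading the DPS units as one per combination is an inference from the labels, not something the slide states.

```python
from itertools import combinations
from math import comb

m = 9  # subset size shown in the figure
triples = list(combinations(range(m), 3))  # every 3-way combination of the m vectors

# 9 choose 3 = 84, matching the 84 labels DPS0 .. DPS83 (assumed correspondence).
number_of_triples = len(triples)
```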


Model: Functional Parallelism

Characteristics: (just what it says)

• Potentially expensive computations can be hidden completely
– random # generation
– coordinate transformations

Example: rigid molecule docking

• Access original molecule grid in rotated order
– Express (i,j,k) in (x,y,z) basis: i=(xi, yi, zi), j=(xj, yj, zj), k=(xk, yk, zk)
– Traverse (i,j,k) index space
– Find (x,y,z) from (i,j,k):

$$\begin{pmatrix} x_i & x_j & x_k \\ y_i & y_j & y_k \\ z_i & z_j & z_k \end{pmatrix} \begin{pmatrix} i \\ j \\ k \end{pmatrix} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$

– Round and range check

• Pipelined, parallel computation gives ~0 ns overhead for rotation

[Figure: blocks labeled Molecule voxel rotation, Systolic 3D correlation array, and Data reduction filter; axes x, y and i, j]
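
The rotated-order access can be sketched as a small index calculation: multiply (i,j,k) by the matrix whose columns are the basis vectors, round, and range check. The function name, the cubic grid of side `size`, and the quarter-turn test basis are assumptions for illustration.

```python
from typing import Optional, Tuple

Vec = Tuple[float, float, float]


def rotated_voxel(i: int, j: int, k: int,
                  basis_i: Vec, basis_j: Vec, basis_k: Vec,
                  size: int) -> Optional[Tuple[int, int, int]]:
    """Map index (i, j, k) to a voxel (x, y, z) of the original grid, or None if outside."""
    # Find (x, y, z) from (i, j, k): the basis vectors are the matrix columns.
    x = basis_i[0] * i + basis_j[0] * j + basis_k[0] * k
    y = basis_i[1] * i + basis_j[1] * j + basis_k[1] * k
    z = basis_i[2] * i + basis_j[2] * j + basis_k[2] * k
    # Round and range check.
    voxel = (round(x), round(y), round(z))
    if all(0 <= c < size for c in voxel):
        return voxel
    return None


# Hypothetical basis: a quarter turn about the z axis.
QUARTER_TURN = ((0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
inside = rotated_voxel(1, 0, 2, *QUARTER_TURN, size=4)
outside = rotated_voxel(0, 1, 0, *QUARTER_TURN, size=4)
```

Because each coordinate is a fixed multiply-add chain, the calculation pipelines naturally, which is consistent with the slide's point that rotation adds essentially no overhead.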


BLAST*

• For a biological sequence (DNA, Protein) and a database of such sequences, find the database sequences that are most biologically relevant to the query sequence.

*FCCM06, ParCo07


Sequence Alignment – Basics

Example: GCGATCT versus an entire database

• Each character-character match (G-G, C-G, etc.) is scored independently with a scoring matrix

• An alignment is a possible way for sequences to match (char-char)

• To score an alignment (evaluate a single ScoreSequence) …

• Simple algorithm to find maximal ungapped local alignment of all possible ungapped alignments (i.e. N ScoreSequences) …

• Complexity of gapped alignments is potentially unbounded

# Find maximal local alignment of all ungapped alignments
Traverse Database – Foreach Alignment
    Generate ScoreSequence
    Do SimpleScoring    # Find max cumulative score with cut-off = 0
# Complexity = O(MN)
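
The pseudocode above can be sketched in Python as shown here. The +1/-1 match/mismatch scores stand in for a real scoring matrix and are an assumption, as are the function names; `simple_scoring` keeps a running sum that is reset whenever it would fall below the cut-off of 0.

```python
from typing import List


def score_sequence(query: str, database: str, offset: int,
                   match: int = 1, mismatch: int = -1) -> List[int]:
    """Scores for aligning query[i] with database[offset + i] (one ungapped alignment)."""
    scores = []
    for i, q in enumerate(query):
        d = offset + i
        if 0 <= d < len(database):
            scores.append(match if q == database[d] else mismatch)
    return scores


def simple_scoring(scores: List[int]) -> int:
    """Maximum cumulative score with cut-off = 0."""
    best = running = 0
    for s in scores:
        running = max(0, running + s)
        best = max(best, running)
    return best


def best_ungapped(query: str, database: str) -> int:
    """Traverse the database: one ScoreSequence per alignment, O(MN) work in total."""
    offsets = range(-(len(query) - 1), len(database))
    return max(simple_scoring(score_sequence(query, database, o)) for o in offsets)
```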


Gapped alignments (DP-based methods)

• Create query/database tableau:

                 Database Sequence (length N)
                 G  C  G  A  T  C  T
Query         G  1  0  1  0  0  0  0
(length M)    C  0  1  0  0  0  1  0
              A  0  0  0  1  0  0  0
              T  0  0  0  0  1  0  1
              T  0  0  0  0  1  0  1
              T  0  0  0  0  1  0  1
              A  0  0  0  1  0  0  0

Parallel to main diagonal: match/mismatch
Vertical or horizontal: indel

Example alignment:
GCGATCT-
GC-ATTTA

• Traverse the tableau with a Dynamic Programming algorithm

Score each grid cell (i,j): Si,j is computed using the following recurrence: [equation not recoverable]

• Complexity: O(MN)
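
The slide's recurrence did not survive extraction, so the sketch below substitutes a standard local-alignment recurrence of the Smith-Waterman kind with a linear indel penalty. The scores (+1 match, -1 mismatch, -1 indel) are assumptions and may differ from what the talk used.

```python
def local_alignment_score(query: str, database: str,
                          match: int = 1, mismatch: int = -1, indel: int = -1) -> int:
    """Best gapped local alignment score via an O(MN) dynamic-programming tableau."""
    m, n = len(query), len(database)
    s = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Diagonal move: match or mismatch.
            diag = s[i - 1][j - 1] + (match if query[i - 1] == database[j - 1] else mismatch)
            # Vertical or horizontal move: indel. The 0 is the local-alignment cut-off.
            s[i][j] = max(0, diag, s[i - 1][j] + indel, s[i][j - 1] + indel)
            best = max(best, s[i][j])
    return best
```

For example, aligning `AACCGGTT` against `AACCGTT` scores 6 under these assumptions: seven matches and one indel.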


An even better way (BLAST)

The BLAST heuristic …

1. Look for small clusters of matches on main diagonal

2. Try to extend those (and only those) clusters

3. Try to merge those extended clusters
– e.g. using DP methods on regions of interest

• Complexity: O(N) + O(M^2) with M << N

[Figure: three query/database tableaus illustrating steps 1, 2, and 3]


Systolic HW Implementation of DP/ASM

DP processing follows main diagonal …

… leading to a wavefront dependency (A), which is easily computed with a linear array (B).

• Complexity with M cells: O(N)

[Figure: (A) query/database tableau with cells labeled by wavefront step 0, 1, 2, 3, …, N-2, N-1, N; (B) linear array with the database streaming past the query]
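
To illustrate the wavefront dependency, this sketch evaluates a local-alignment tableau one anti-diagonal at a time, as a linear array of M cells would, and counts the steps. The recurrence and scores are the same assumed ones (standard local alignment, +1/-1/-1), not the talk's exact design.

```python
from typing import Tuple


def systolic_local_alignment(query: str, database: str,
                             match: int = 1, mismatch: int = -1,
                             indel: int = -1) -> Tuple[int, int]:
    """Return (best score, number of wavefront steps) for an M-cell linear array."""
    m, n = len(query), len(database)
    s = [[0] * (n + 1) for _ in range(m + 1)]
    best, steps = 0, 0
    for t in range(m + n - 1):        # one anti-diagonal (wavefront) per step
        steps += 1
        for cell in range(m):         # in hardware, all cells work in the same cycle
            j = t - cell              # database character seen by this cell at step t
            if 0 <= j < n:
                i1, j1 = cell + 1, j + 1
                same = query[cell] == database[j]
                diag = s[i1 - 1][j1 - 1] + (match if same else mismatch)
                s[i1][j1] = max(0, diag, s[i1 - 1][j1] + indel, s[i1][j1 - 1] + indel)
                best = max(best, s[i1][j1])
    return best, steps
```

Every value a cell needs at step t was produced at step t-1 or t-2, so the M cells never wait on each other, and the step count is M + N - 1, which is O(N) when M << N.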


What’s hard about HW BLAST?

• Random access into multi-GB database for extensions

• The serial version is already O(N)

• HW DP is already O(N) and handles gaps!

[Figure: query/database tableau for step 2 (cluster extension)]


An Observation

DP and BLAST are duals of each other …

DP processes M alignments simultaneously; processing is perpendicular to main diagonal

BLAST processes 1 extension at a time; processing is parallel to main diagonal

DP HW advances one db character per cycle

BLAST HW advances one db character per cycle??

[Figure: two query/database tableaus contrasting the DP and BLAST processing directions]


TreeBLAST – Optimize the HW

Operation:

• Query string held in place, database streams over it

• On each cycle (alignment), one ScoreSequence generated

• ScoreSequences evaluated systolically by the tree structure

TreeBLAST
# In a single cycle
Dimension 1: Foreach Alignment
    generate ScoreSequence
# In log2(M) cycles for each ScoreSequence
# process log2(M) ScoreSequences simultaneously
Dimension 2: Foreach ScoreSequence
    use tree structure to generate local alignment
# Time Complexity = O(N)
# Area Complexity = O(M)

[Figure: the database streams past the fixed query string; each character pair yields one entry of the ScoreSequence (e.g. 8, -2, -3, -3, -3, -1, 8, -2), which feeds Leaf nodes and then Intern. nodes of a tree that outputs the local alignment score]
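
One way a tree can evaluate a ScoreSequence is the classic pairwise combine for the maximum-sum segment, sketched below. The slide does not say what the leaf and internal nodes carry, so the four-field node state and the padding with a neutral leaf are assumptions.

```python
from typing import List, Tuple

# (total, best prefix sum, best suffix sum, best segment sum), each clipped at 0
Node = Tuple[int, int, int, int]


def leaf(score: int) -> Node:
    clipped = max(0, score)
    return (score, clipped, clipped, clipped)


def internal(left: Node, right: Node) -> Node:
    lt, lp, ls, lb = left
    rt, rp, rs, rb = right
    return (lt + rt,                # total of both halves
            max(lp, lt + rp),       # best prefix may extend into the right half
            max(rs, rt + ls),       # best suffix may extend into the left half
            max(lb, rb, ls + rp))   # best segment may straddle the boundary


def tree_score(scores: List[int]) -> int:
    """Local alignment score of one ScoreSequence, reduced level by level."""
    if not scores:
        return 0
    level = [leaf(s) for s in scores]
    while len(level) > 1:
        if len(level) % 2:
            level.append(leaf(0))   # pad odd levels with a neutral leaf
        level = [internal(level[k], level[k + 1]) for k in range(0, len(level), 2)]
    return level[0][3]


figure_example = tree_score([8, -2, -3, -3, -3, -1, 8, -2])
```

The reduction takes about log2(M) levels, and successive ScoreSequences can occupy successive levels at once, which matches the pipelining noted in the pseudocode comments.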


FPGA BLAST Summary

Key Methods:

• Streaming

• 2D Systolic array

• Custom pipeline

• Thousands of comparisons per cycle

Performance – Time to stream a database through an FPGA

• Average of 4-5 parallel streams

• 200 MHz

• ~1GB/sec


Time-Step Driven Molecular Dynamics*

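MD – An iterative application of Newtonian mechanics to ensembles of atoms and molecules

Runs in phases:
• Force update: initially O(n^2), done on coprocessor
• Motion update (Verlet): generally O(n), done on host

Many forces typically computed,

$$F^{total} = F^{bond} + F^{angle} + F^{torsion} + F^{H} + F^{non\text{-}bonded}$$

but complexity lies in the non-bonded, spatially extended forces: van der Waals (LJ) and Coulombic (C)

$$F_i^{LJ} = \sum_{j \neq i} \frac{\epsilon_{ab}}{\sigma_{ab}^{2}} \left\{ 12 \left( \frac{\sigma_{ab}}{|r_{ji}|} \right)^{14} - 6 \left( \frac{\sigma_{ab}}{|r_{ji}|} \right)^{8} \right\} \vec{r}_{ji}$$

$$F_i^{C} = q_i \sum_{j \neq i} \frac{q_j}{|r_{ji}|^{3}} \, \vec{r}_{ji}$$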

*FPL05,FPL06,ParCo08


Make Short-Range Forces O(N) with Cell Lists

Observation:

• Typical volume to be simulated = 100 Å^3

• Typical LJ cut-off radius = 10 Å

Therefore, for all-to-all O(N^2) computation, most work is wasted

Solution:

Partition space into “cells,” each roughly the size of the cut-off

Compute forces on P only w.r.t. particles in adjacent cells.
– Issue: shape of cell. Spherical would be more efficient, but cubic is easier to control
– Issue: size of cell. Smaller cells mean fewer useless force computations, but more difficult control. Limit is where the cell is the atom itself.

[Figure: particle P inside a grid of cells]

$$F_i^{LJ} = \sum_{j \neq i} \frac{\epsilon_{ab}}{\sigma_{ab}^{2}} \left\{ 12 \left( \frac{\sigma_{ab}}{|r_{ji}|} \right)^{14} - 6 \left( \frac{\sigma_{ab}}{|r_{ji}|} \right)^{8} \right\} \vec{r}_{ji}$$
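
A minimal cell-list sketch, assuming cubic cells whose side equals the cut-off and no periodic boundary conditions (both simplifications). Only particles in the same or adjacent cells are tested, so the work per particle stays roughly constant as the system grows.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float, float]
CellIndex = Tuple[int, int, int]


def build_cells(positions: Sequence[Point], cutoff: float) -> Dict[CellIndex, List[int]]:
    """Bin particle indices into cubic cells of side `cutoff`."""
    cells: Dict[CellIndex, List[int]] = defaultdict(list)
    for index, (x, y, z) in enumerate(positions):
        cells[(int(x // cutoff), int(y // cutoff), int(z // cutoff))].append(index)
    return cells


def neighbor_pairs(positions: Sequence[Point], cutoff: float) -> Set[Tuple[int, int]]:
    """Pairs (a, b) with a < b that lie within the cut-off of each other."""
    cells = build_cells(positions, cutoff)
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        # Visit the cell itself and its 26 neighbors.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for a in members:
                for b in cells.get((cx + dx, cy + dy, cz + dz), ()):
                    if a < b and math.dist(positions[a], positions[b]) <= cutoff:
                        pairs.add((a, b))
    return pairs


example = neighbor_pairs([(1, 1, 1), (9, 1, 1), (11, 1, 1), (50, 50, 50)], cutoff=10)
```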


Short-Range Force Computation

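Problem:
– Compute force equations such as

$$F_i^{LJ} = \sum_{j \neq i} \frac{\epsilon_{ab}}{\sigma_{ab}^{2}} \left\{ 12 \left( \frac{\sigma_{ab}}{|r_{ji}|} \right)^{14} - 6 \left( \frac{\sigma_{ab}}{|r_{ji}|} \right)^{8} \right\} \vec{r}_{ji}$$

Difficulty:
– It requires expensive division operations for r^-x.

Method: Use table look-up with interpolation, but on individual terms (r^-4, r^-7)
– Also used for short-range component of Coulombic
– Three tables are needed, plus further computation

$$f(x) = C_0 + C_1 (x-a) + C_2 (x-a)^2 + C_3 (x-a)^3 + \dots + C_M (x-a)^M + o(x^M)$$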


Interpolation Pipeline with Semi-FP

• r^-x interpolation pipeline

• ‘a’ is the starting point of an interval

Fixed-point word for x = r2 (e.g. 00001111001100) with fields: Section (format), Interval (a), Offset (x-a)

Find most significant 1 to:
– get format
– extract a
– extract (x-a)

Pipeline stages (coefficients C3, C2, C1, C0 are read from Coefficient Memory using format and a):
1. C3*(x-a)
2. C3*(x-a)+C2
3. (C3*(x-a)+C2)*(x-a)
4. (C3*(x-a)+C2)*(x-a)+C1
5. ((C3*(x-a)+C2)*(x-a)+C1)*(x-a)
6. ((C3*(x-a)+C2)*(x-a)+C1)*(x-a)+C0

Output: r^-14, r^-8, or r^-3

$$f_{section}(x) = \sum_{i=0}^{M} C_i (x-a)^i$$

[Figure: plot of r^-x versus r2 marking an interval start a, a point x, and the offset x-a]
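
The pipeline can be imitated in software as below. The field widths (`INTERVAL_BITS`, `frac_bits`) are hypothetical, and Taylor coefficients computed on the fly stand in for the coefficient memory, which in the real design would hold precomputed values.

```python
from typing import List, Tuple

INTERVAL_BITS = 3  # hypothetical: 8 intervals per section


def split(x_fixed: int) -> Tuple[int, int, int]:
    """Return (section, interval start a, offset x-a) from the leading 1 of x."""
    if x_fixed <= 0:
        raise ValueError("x must be positive")
    msb = x_fixed.bit_length() - 1      # find most significant 1: the format
    shift = max(msb - INTERVAL_BITS, 0)
    a = (x_fixed >> shift) << shift     # leading 1 plus the interval bits
    return msb, a, x_fixed - a


def taylor_coefficients(a: float, power: int, order: int = 3) -> List[float]:
    """Stand-in for the coefficient memory: Taylor coefficients of x**power around a."""
    coeffs = [a ** power]
    for i in range(1, order + 1):
        coeffs.append(coeffs[-1] * (power - i + 1) / (i * a))
    return coeffs


def interpolate(x_fixed: int, power: int, frac_bits: int = 8) -> float:
    """Approximate x**power with the third-order, Horner-form pipeline."""
    _, a_fixed, offset_fixed = split(x_fixed)
    scale = float(1 << frac_bits)
    a, d = a_fixed / scale, offset_fixed / scale
    c0, c1, c2, c3 = taylor_coefficients(a, power)
    return ((c3 * d + c2) * d + c1) * d + c0


x = 589 / 256                  # x = r^2 is about 2.3 in this fixed-point format
approx = interpolate(589, -4)  # x^-4, i.e. r^-8
exact = x ** -4
```

Splitting on the leading 1 keeps the offset small relative to a, so a low-order polynomial stays accurate across a wide range of r2 without any division in the data path.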


Force Pipeline Array

[Figure: block diagram. Memory side: Host Memory, BUS, Pos/Type Memory, Acceleration Memory, POS/Type Cache, Acceleration Cache. Pipeline blocks: Boundary Condition Check, Distance Squared (pos to r2), Cutoff Check, Extract format, a, (x-a), interpolation ((C3*(x-a)+C2)*(x-a)+C1)*(x-a)+C0 producing r^-14, r^-8, r^-3. Outputs: Lennard-Jones force, short-range part of CL force, pseudo force]


FPGA MD Summary

Key Method – cast as a streaming problem

• Very deep (70 stage) pipeline

• Multiple (2-8) pipelines

• Optimize interpolation w.r.t. FPGA architecture

• Explicit control of off-chip cell swaps to maintain constant stream flow


Discrete Event Simulation of MD*

• Simulation with simplified models

• Approximate forces with barriers and square wells

• Classic discrete event simulation

*FPL07, FCCM08


An Alternative ...

Only update particle state when “something happens”

• “Something happens” = a discrete event

• Advantage: DMD runs 10^5 to 10^9 times faster than traditional MD

• Disadvantage: laws of physics are continuous


DMD step-wise force approximation

[Figure: four potential-versus-distance plots of step-wise approximations: Covalent Bond, Hard Sphere, Single-well, Multi-well]


Discrete Event Simulation

• Simulation proceeds as a series of discrete element-wise interactions
– NOT time-step driven

• Seen in simulations of …
– Circuits
– Networks
– Traffic
– Systems Biology
– Combat

[Figure: discrete event simulation loop. Blocks: Time-Ordered Event Queue (arbitrary insertions and deletions), Event Processor, Event Predictor (& Remover), System State. Flows: events, new state info, state info, events & invalidations]
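
The loop in the figure can be sketched with a toy system: equal-mass point particles on a line, where an elastic collision simply swaps velocities. The toy physics, the function name, and the use of per-particle stamps for invalidation are all assumptions chosen to keep the example short; they are not the DMD models from the talk.

```python
import heapq
from typing import List, Sequence, Tuple


def simulate(positions: Sequence[float], velocities: Sequence[float],
             t_end: float) -> Tuple[List[Tuple[float, int, int]], List[float]]:
    """Return (collision log, final velocities) for particles sorted by position."""
    pos, vel = list(positions), list(velocities)
    last = [0.0] * len(pos)   # time of each particle's last state update
    stamp = [0] * len(pos)    # bumped on every update; detects stale events
    queue: list = []          # time-ordered event queue
    log = []

    def predict(i: int, now: float) -> None:
        """Event predictor: schedule a collision between neighbors i and i+1, if any."""
        if 0 <= i < len(pos) - 1:
            xi = pos[i] + vel[i] * (now - last[i])
            xj = pos[i + 1] + vel[i + 1] * (now - last[i + 1])
            closing = vel[i] - vel[i + 1]
            if closing > 0:
                heapq.heappush(queue, (now + (xj - xi) / closing, i, stamp[i], stamp[i + 1]))

    for i in range(len(pos) - 1):
        predict(i, 0.0)
    while queue:
        t, i, si, sj = heapq.heappop(queue)
        if t > t_end:
            break
        if (si, sj) != (stamp[i], stamp[i + 1]):
            continue          # invalidated by an earlier event
        # Event processor: commit the event and update system state.
        for p in (i, i + 1):
            pos[p] += vel[p] * (t - last[p])
            last[p] = t
            stamp[p] += 1
        vel[i], vel[i + 1] = vel[i + 1], vel[i]
        log.append((t, i, i + 1))
        for n in (i - 1, i, i + 1):
            predict(n, t)     # new events for the affected neighbors
    return log, vel


log, final_velocities = simulate([0.0, 4.0, 10.0], [1.0, 0.0, -1.0], t_end=10.0)
```

Stale events are skipped lazily when popped rather than deleted from the heap; a hardware queue supporting arbitrary deletions, as in the figure, would remove them eagerly instead.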


Overview - Dataflow

Main idea: DMD in one big pipeline

• Events processed with a throughput of one event per cycle

• Therefore, in a single cycle:

• State is updated (event is committed)

• Invalidations are processed

• New events are inserted – up to four are possible

[Figure: DMD pipeline. Blocks: Event Predictor Units, Collider, Commit, On-Chip Event Priority Queue, Off-Chip Event Heap. Flows: event flow, update state, invalidations, new event insertions, stall-inducing insertions]


DMD Summary

Key Methods:

• Associative processing: broadcast, compare, etc.

• Standard HW components: priority queue, etc.

Performance –

• 200x – 400x for small to medium sized models


Improving HPRC application development

Language support for

– Streams (common)

– Associative operators (less common)

– Complex memory interleaving (less common)

Libraries with support for

– Common HW functions (some, but low-level)

User knowledge & experience

– FPGA models, applying models (not HW design!)


Questions?