High-Performance Reconfigurable Computing Group University of Toronto Reconfigurable Computing with the Partitioned Global Address Space model Cascadia 2012 Ruediger Willenberg and Paul Chow August 14, 2012


Page 1

High-Performance Reconfigurable Computing Group

University of Toronto

Reconfigurable Computing with the Partitioned Global Address Space model

Cascadia 2012

Ruediger Willenberg and Paul Chow

August 14, 2012

Page 2

Parallelizing computation:

How to partition, communicate and synchronize data?


Page 3

Parallel Programming Models


Page 4

Partitioned Global Address Space

• Any thread can access any memory location, but:

• There is a visible difference between local and remote memory locations

• One-sided communication (remote reads and writes need no involvement of the thread that is local to the accessed memory)
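The three properties above can be modelled in a few lines of plain C. This is only a single-process sketch: the names NODES, SEG_WORDS, owner_of, pgas_put and pgas_get are hypothetical and not taken from the talk or from any PGAS library. Each node owns one partition of a global index space, any caller may read or write any index, and a counter makes the local/remote difference visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: NODES partitions of SEG_WORDS words each form one
   global index space. Everything runs in one process for illustration. */
enum { NODES = 4, SEG_WORDS = 8 };

static int segment[NODES][SEG_WORDS];  /* segment[n] is the memory local to node n */
static unsigned remote_accesses;       /* accesses that left the caller's partition */

/* Node whose partition holds a global index. */
static int owner_of(size_t global_idx)
{
    assert(global_idx < (size_t)NODES * SEG_WORDS);
    return (int)(global_idx / SEG_WORDS);
}

/* One-sided write: only the caller acts; the owning node runs no code. */
static void pgas_put(int caller, size_t global_idx, int value)
{
    int owner = owner_of(global_idx);
    if (owner != caller)
        remote_accesses++;             /* local and remote are visibly different */
    segment[owner][global_idx % SEG_WORDS] = value;
}

/* One-sided read, with the same local/remote accounting. */
static int pgas_get(int caller, size_t global_idx)
{
    int owner = owner_of(global_idx);
    if (owner != caller)
        remote_accesses++;
    return segment[owner][global_idx % SEG_WORDS];
}
```

In a real system the remote branch would turn into a network transfer, which is why a PGAS model exposes the distinction instead of hiding it.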


Page 5

Language Level PGAS:

Unified Parallel C (UPC) example


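#define N (100*THREADS)

/* [*] gives every thread one contiguous block of N/THREADS = 100 elements */
shared [*] int v1[N], v2[N], sum[N];

int main(void)
{
    int i;

    /* the affinity expression &v1[i] runs iteration i on the thread that owns v1[i] */
    upc_forall(i=0; i<N; i++; &v1[i])
        sum[i]=v1[i]+v2[i]; // all work is local

    return 0;
}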

Others: Co-Array Fortran, Titanium (Java), Chapel (Cray), X10 (IBM)
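To see why all work in the UPC example is local, the ownership rule can be written out in plain C. This is a sketch under assumptions: THREADS is fixed at 4 here, and thread_of and forall_on_thread are hypothetical helper names. It models a blocked layout with 100 elements per thread and lets one thread run only the iterations whose element it owns, which is what the affinity expression of upc_forall selects.

```c
#include <assert.h>

/* Assumed fixed thread count for this sketch; UPC supplies THREADS itself. */
enum { THREADS = 4, BLOCK = 100, N = BLOCK * THREADS };

static int v1[N], v2[N], sum[N];

/* Thread that owns element i when each thread holds one block of BLOCK elements. */
static int thread_of(int i)
{
    assert(i >= 0 && i < N);
    return (i / BLOCK) % THREADS;
}

/* The share of the loop that one thread executes: only iterations whose
   element it owns. Returns the number of iterations it ran. */
static int forall_on_thread(int mythread)
{
    int done = 0;
    for (int i = 0; i < N; i++) {
        if (thread_of(i) != mythread)
            continue;                  /* another thread handles this index */
        sum[i] = v1[i] + v2[i];        /* all three elements share one owner */
        done++;
    }
    return done;
}
```

Because v1, v2 and sum use the same layout, index i lives on the same thread in all three arrays, so no iteration touches remote memory.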

Page 6

Application Library Level PGAS:

Global Arrays


Page 7

Communication Level PGAS:

GASNet (Global Address Space Networking)


Others: ARMCI (Global Arrays), SHMEM (App level)

Page 8

Network Level PGAS:

Remote DMA (RDMA)


Examples: InfiniBand, Myrinet, iWARP, RoCE

Page 9

CPUs+FPGAs: Co-processor Style


Page 10

CPUs+FPGAs: Symmetric Style


Page 11

What does "symmetric" mean?

• CPU code and FPGA components can both initiate data sends and requests

• Both use a similar or identical API to ease migration

• For distributed-memory/message-passing, TMD-MPI / ArchES-MPI implement this

• Our work strives to build a symmetric PGAS system based on GASNet


Page 12

GASNet Active Messages

Remote Write: Long Request Message


Page 13

GASNet Active Messages

Remote Read: Long Reply Message
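The two message patterns named on these slides can be sketched as an in-process model in plain C. Nothing below is the GASNet API: am_long, am_short, the handler names and the two-node setup are hypothetical. The sketch also assumes that the read is triggered by a payload-free request. A remote write is a Long request whose payload lands in the target's segment before a handler runs there. A remote read sends a request to the node holding the data, whose handler answers with a Long reply that deposits the data in the requester's segment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { NODES = 2, SEG_BYTES = 64 };

/* Handler run on the receiving node; all names here are hypothetical. */
typedef void (*am_handler)(int self, int sender,
                           size_t off, size_t nbytes, size_t arg);

static unsigned char segment[NODES][SEG_BYTES]; /* per-node shared segment */
static int writes_seen[NODES];  /* bumped on the target of a remote write */
static int reads_done[NODES];   /* bumped on the requester when data arrives */

/* Long message: the payload is placed in the receiver's segment, then the
   handler runs on the receiver. */
static void am_long(int sender, int dest, am_handler h,
                    const void *src, size_t nbytes, size_t dest_off, size_t arg)
{
    assert(dest >= 0 && dest < NODES);
    assert(dest_off <= SEG_BYTES && nbytes <= SEG_BYTES - dest_off);
    memmove(&segment[dest][dest_off], src, nbytes);
    h(dest, sender, dest_off, nbytes, arg);
}

/* Short message: arguments only, no payload. */
static void am_short(int sender, int dest, am_handler h,
                     size_t a0, size_t a1, size_t a2)
{
    assert(dest >= 0 && dest < NODES);
    h(dest, sender, a0, a1, a2);
}

static void on_write(int self, int sender, size_t off, size_t nbytes, size_t arg)
{
    (void)sender; (void)off; (void)nbytes; (void)arg;
    writes_seen[self]++;
}

static void on_read_reply(int self, int sender, size_t off, size_t nbytes, size_t arg)
{
    (void)sender; (void)off; (void)nbytes; (void)arg;
    reads_done[self]++;
}

/* Runs on the node that holds the data: answer with a Long reply. */
static void on_read_request(int self, int sender,
                            size_t off, size_t nbytes, size_t reply_off)
{
    assert(off <= SEG_BYTES && nbytes <= SEG_BYTES - off);
    am_long(self, sender, on_read_reply, &segment[self][off], nbytes, reply_off, 0);
}

/* Remote write: one Long request carrying the data. */
static void remote_write(int from, int to, const void *src,
                         size_t nbytes, size_t dest_off)
{
    am_long(from, to, on_write, src, nbytes, dest_off, 0);
}

/* Remote read: a request that the data holder answers with a Long reply. */
static void remote_read(int from, int to, size_t src_off,
                        size_t nbytes, size_t local_off)
{
    am_short(from, to, on_read_request, src_off, nbytes, local_off);
}
```

In both cases the application thread on the target node never takes part; only the message handlers run there, which is what lets a hardware block stand in for a CPU on that side.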


Page 14

GAScore FPGA component


Hardware Processing Element

Page 15

GAScore FPGA system


Page 16

BEE3 multi-FPGA system


Page 17

Next steps: Hardware

• External DRAM support (caching...?)

• Strided and scatter/gather transfers

• Messaging management for custom hardware cores
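As an illustration of what strided and scatter/gather transfers move, here is a minimal plain-C sketch. The functions strided_copy and gather_copy are hypothetical names, not part of GASNet or of the hardware described in the talk. A strided transfer copies equally sized chunks whose start addresses are a fixed distance apart; a gather collects elements from arbitrary offsets into a dense buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `count` chunks of `chunk` bytes. Consecutive chunks start
   src_stride bytes apart in the source and dst_stride bytes apart
   in the destination. */
static void strided_copy(unsigned char *dst, size_t dst_stride,
                         const unsigned char *src, size_t src_stride,
                         size_t chunk, size_t count)
{
    assert(chunk <= src_stride && chunk <= dst_stride);
    for (size_t k = 0; k < count; k++)
        memcpy(dst + k * dst_stride, src + k * src_stride, chunk);
}

/* Gather: pick `count` elements at the given indices into a dense buffer. */
static void gather_copy(int *dst, const int *src,
                        const size_t *idx, size_t count)
{
    for (size_t k = 0; k < count; k++)
        dst[k] = src[idx[k]];
}
```

A typical use is pulling one column out of a row-major matrix: the chunk is one element and the source stride is one full row.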


Page 18

Next steps: Hardware


Page 19

Programmable Active Message Sequencer (PAMS)

• Programmable/re-programmable through GASNet messages

• Controls/synchronizes custom hardware

• Handles reception and transmission of GASNet active messages

• Sequences based on: custom hardware state, timer, amount of received data, number of received messages of a specific type
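One way to picture such a Programmable Active Message Sequencer is as a small interpreter over a list of steps, each pairing a wait condition from the list above with an action. The plain-C model below is purely illustrative: the step format, the struct fields and every function name are assumptions made for this sketch and do not describe the actual hardware design.

```c
#include <assert.h>
#include <stddef.h>

enum { MSG_TYPES = 4 };
enum wait_kind { WAIT_HW_DONE, WAIT_TIMER, WAIT_BYTES, WAIT_MSG_COUNT };
enum action    { ACT_START_HW, ACT_SEND_AM, ACT_HALT };

/* One program step: wait for a condition, then perform an action. */
struct step {
    enum wait_kind wait;
    unsigned threshold;   /* ticks, bytes or message count to wait for */
    int msg_type;         /* only used by WAIT_MSG_COUNT */
    enum action act;
};

struct pams {
    const struct step *prog;
    size_t len, pc;
    unsigned ticks, bytes_rx, msgs_rx[MSG_TYPES];
    int hw_done;                  /* state reported by the custom hardware */
    unsigned hw_starts, ams_sent; /* actions performed so far */
};

/* Load a new program, as a (re)programming message might do. */
static void pams_load(struct pams *p, const struct step *prog, size_t len)
{
    for (size_t i = 0; i < len; i++)
        assert(prog[i].msg_type >= 0 && prog[i].msg_type < MSG_TYPES);
    p->prog = prog;
    p->len = len;
    p->pc = 0;
}

/* Record an incoming active message of the given type and size. */
static void pams_on_message(struct pams *p, int type, unsigned nbytes)
{
    assert(type >= 0 && type < MSG_TYPES);
    p->msgs_rx[type]++;
    p->bytes_rx += nbytes;
}

static int condition_met(const struct pams *p, const struct step *s)
{
    switch (s->wait) {
    case WAIT_HW_DONE:   return p->hw_done;
    case WAIT_TIMER:     return p->ticks >= s->threshold;
    case WAIT_BYTES:     return p->bytes_rx >= s->threshold;
    case WAIT_MSG_COUNT: return p->msgs_rx[s->msg_type] >= s->threshold;
    }
    return 0;
}

/* Execute steps until one has to wait; returns how many steps ran. */
static unsigned pams_run(struct pams *p)
{
    unsigned executed = 0;
    while (p->pc < p->len && condition_met(p, &p->prog[p->pc])) {
        switch (p->prog[p->pc].act) {
        case ACT_START_HW: p->hw_starts++; p->hw_done = 0; break;
        case ACT_SEND_AM:  p->ams_sent++; break;
        case ACT_HALT:     break;
        }
        p->pc++;
        executed++;
    }
    return executed;
}
```

A two-step program such as "after two messages of type 1, start the hardware; when the hardware is done, send an active message" covers the common receive, compute, respond cycle of a custom core.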


Page 20

Next steps: Toolchain

challenges for FPGAs in HPC

• PGAS languages without heterogeneity support (UPC, CAF, Titanium)

• PGAS languages without clear HLL-to-FPGA path (Chapel, X10)

• Lack of FPGA programming experts in HPC


Page 21

Next Steps: Toolchain

[Figure: toolchain diagram. Labels: DSL application; C++ generated code; C++ PGAS Application; Compile; Heterogeneous C++ PGAS Library; GASNet Library; GASNet; CPU-based Host (three instances); Static generation; Dynamic generation; PAMS; Custom FPGA Hardware; manual or C-to-gates]

Page 22

Heterogeneous C++ PGAS library

• Concepts stolen from Global Arrays, Chapel, X10

• Specialized data classes for multi-dimensional arrays, etc.

• Location and subgroup classes

• Distribution and layout types; assigned to arrays to define storage and computation patterns

• Can generate and distribute PAMS code at compile time as well as at runtime

• Can be used as a runtime library for code generation from Domain-Specific Languages (DSLs)


Page 23

Thank you for your attention!

Questions?
