Source: icl.utk.edu/jlesc9/files/PTA3.2/jlesc9_yoshii.pdf

Cultivating a community about FPGA-HPC platforms

9th JLESC workshop. Knoxville, TN, USA. April 16, 2019

Kazutomo Yoshii <[email protected]>

Mathematics and Computer Science

Argonne National Laboratory

JLESC project: Evaluating high-level programming models for FPGA platforms

Carlos Alvarez (BSC), Daniel Jimenez-Gonzalez (BSC), Xavier Martorell (BSC), Osman Unsal (BSC), Eric Rutten (INRIA), Kentaro Sano (R-CCS), Zheming Jin (ANL), Hal Finkel (ANL), Franck Cappello (ANL)

CMOS Scaling Is Coming to An End

● Requires significant investment

– e.g., Intel spent $5B on 14nm

– Rock’s law: cost of new plant doubles every four years

● Benefits are shrinking

– thermal, leakage, reliability, etc

● The number of manufacturing companies has dropped from 20 to 5 in the past 15 years!

● “The number of people predicting the death of Moore’s law doubles every two years.”

IEEE International Roadmap for Devices and Systems (2017)

How Can We Survive in The Post-Moore Era?

● Our demand for computational power keeps growing exponentially

– scientific discoveries depend on computational power

● New types of computers?

– quantum computers may not be ready in a timely manner

– a different concept; the applicability of classical algorithms is questionable

● Still depend on general-purpose processors

– Performance is driven by transistor scaling

[Diagram labels: general-purpose processors; specialization (e.g., AI processors); reconfigurable (co-design); quantum computers, brain-inspired computers, etc.; new switching technologies]

“Re-form” LDRD project was funded in 2015

Investigators: Kazutomo Yoshii, Franck Cappello, Hal Finkel, Fangfang Xia

Data and learning are new requirements for HPC

FPGAs for data analytics?

[Figure: pipeline from FASTQ to VCF: mapping/aligning → position sorting → duplicate marking → variant calling]

● Complicated pipelines

– different implementations

– integer-heavy computation in some stages

● Scaling study is still new

– ends up in runtime system development

● Edico Genome currently holds the Guinness world record for fastest time

– 1,000 FPGAs on Amazon EC2 F1 instances for 1,000 human genomes

FPGAs for learning?

● Machine learning (ML) acceleration is becoming mandatory for future HPC!

● Workload characteristics are different from ordinary numerical computing

– ML algorithms and implementation techniques keep evolving

● mixed/reduced precision, stochastic rounding, zero-pruning, etc.

● hard for ASICs

– Latency-sensitive for some applications

● Already many success stories using FPGAs
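
As a concrete illustration of one technique named above, stochastic rounding can be sketched in a few lines. This is a minimal software sketch, not an FPGA implementation; the function name `stochastic_round`, the fixed-point step of 1/16, and the sample value 0.3 are assumptions chosen for illustration.

```python
import math
import random


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x to a multiple of step.

    The value is rounded up with probability equal to its fractional
    position between the two neighbouring grid points, so the expected
    value of the result equals x.
    """
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower  # in [0, 1)
    round_up = rng.random() < frac
    return (lower + (1 if round_up else 0)) * step


def round_to_nearest(x: float, step: float) -> float:
    """Conventional deterministic rounding to the same grid, for comparison."""
    return round(x / step) * step


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    step = 1 / 16           # hypothetical 4-bit fractional precision
    x = 0.3
    samples = [stochastic_round(x, step, rng) for _ in range(100_000)]
    print("round-to-nearest:", round_to_nearest(x, step))
    print("mean of stochastic roundings:", sum(samples) / len(samples))
```

On this grid, round-to-nearest always maps 0.3 to 0.3125, whereas the stochastic version returns either 0.25 or 0.3125 such that the average stays close to 0.3. That unbiasedness is the usual motivation for using it when many low-precision updates are accumulated.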

Edge and Near-Sensor Computing

[Diagram labels: sensors; edge node; data acquisition node; cloud; HPC; disk]

– more sensors, higher resolution: the increase in data rate is exponential!

– limited bandwidth, higher latency

– Opportunity here!

Unfortunately FPGAs are not adopted in HPC

● Despite numerous successes in the cloud and data center space

● Individual FPGA research projects are often impressive

– innovative architecture designs, dataflow studies, etc.

● Hard to transfer one person’s knowledge to others

– the nature of the platforms

● large, complicated heterogeneous architecture

– lack of abstraction

– lack of community

● lack of common platforms

Field Programmable Gate Array (FPGA)

● The first FPGA chip (1985)

– 64 flip-flops, 128 3-input lookup tables

● Practical reconfigurable architecture

● Lower non-recurring engineering cost compared to ASICs

– once a design is fixed, no one touches it

● Applications

– prototype ASICs

– signal processing

– data acquisition system
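
The lookup tables mentioned above are small truth-table memories: a 3-input LUT stores 8 configuration bits and can therefore implement any Boolean function of three inputs. The sketch below models that idea in software. The class `Lut3` and the helper `program_lut3` are hypothetical names for illustration and do not correspond to any vendor tool.

```python
from typing import Callable


class Lut3:
    """Software model of a 3-input lookup table (8 configuration bits)."""

    def __init__(self, truth_table: int) -> None:
        if not 0 <= truth_table < 256:
            raise ValueError("a 3-input LUT holds exactly 8 bits")
        self.truth_table = truth_table

    def __call__(self, a: int, b: int, c: int) -> int:
        # The three input bits (each 0 or 1) select one stored bit.
        index = (a << 2) | (b << 1) | c
        return (self.truth_table >> index) & 1


def program_lut3(func: Callable[[int, int, int], int]) -> Lut3:
    """Fill the LUT contents by evaluating func on all 8 input combinations."""
    table = 0
    for index in range(8):
        a, b, c = (index >> 2) & 1, (index >> 1) & 1, index & 1
        if func(a, b, c):
            table |= 1 << index
    return Lut3(table)


# A 1-bit full adder needs two such LUTs: one for the sum, one for the carry.
sum_lut = program_lut3(lambda a, b, c: a ^ b ^ c)
carry_lut = program_lut3(lambda a, b, c: (a & b) | (a & c) | (b & c))

if __name__ == "__main__":
    for index in range(8):
        a, b, c = (index >> 2) & 1, (index >> 1) & 1, index & 1
        print(a, b, c, "-> sum", sum_lut(a, b, c), "carry", carry_lut(a, b, c))
```

Reconfiguring the device amounts to loading different truth-table bits (and different switch-block routing), which is why the same fabric can realize many different circuits.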

[Figure labels: logic block, switch block]

Today’s FPGA technology

● Heterogeneous

– not only logic elements but also DSP, BRAM, etc.

● Floating-point capability

– Intel Stratix 10’s theoretical peak is 10 TFLOPS (SP)

● Runs faster

– technology like Hyperflex helps

– > 600 MHz is not a dream

● Large internal memory

– up to ~40 MB of SRAM (e.g., Xilinx VU37P)

– very high internal bandwidth

● Off-chip memory improvement

– HBM2 integration

● High-speed transceivers

– ~56 Gbps

– can be used for direct FPGA-FPGA communication

● The advent of FPGA-CPU hybrid platforms

– ARM-FPGA, Xeon-FPGA, etc.

● Embedded-class FPGAs
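
A theoretical peak such as the one quoted above is typically derived as (number of floating-point units) × (operations per unit per cycle) × (clock frequency). The sketch below shows that arithmetic. The DSP block count, the assumption of one fused multiply-add (2 FLOPs) per block per cycle, and the clock values are illustrative assumptions, not figures taken from the slides or from a datasheet.

```python
def peak_sp_tflops(num_dsp_blocks: int,
                   clock_hz: float,
                   flops_per_block_per_cycle: int = 2) -> float:
    """Theoretical single-precision peak in TFLOPS.

    Assumes every DSP block issues one fused multiply-add per cycle,
    counted as 2 floating-point operations.
    """
    return num_dsp_blocks * flops_per_block_per_cycle * clock_hz / 1e12


def clock_needed_hz(target_tflops: float,
                    num_dsp_blocks: int,
                    flops_per_block_per_cycle: int = 2) -> float:
    """Clock frequency required to reach a target peak with the given blocks."""
    return target_tflops * 1e12 / (num_dsp_blocks * flops_per_block_per_cycle)


if __name__ == "__main__":
    blocks = 5760  # hypothetical DSP block count, for illustration only
    print(f"{peak_sp_tflops(blocks, 800e6):.3f} TFLOPS at 800 MHz")
    print(f"{clock_needed_hz(10, blocks) / 1e6:.0f} MHz needed for 10 TFLOPS")
```

The point of the calculation is that the peak scales with the achieved clock, so a design that closes timing well below the nominal frequency loses peak throughput proportionally.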

Intel Agilex

● Scalable

– edge, networking, cloud

● Heterogeneous system in package (SiP)

– possible eASIC integration

● Bfloat16, HBM, CXL

● CXL

– Compute Express Link

● One API

Lack of abstraction and portability

● So many FPGA chips, different boards/platforms

● No compatibility, even source-level

– longer compilation time

– so many different Xeon chips, too, but they offer binary compatibility and shorter compilation time

● HLS can abstract FPGA resources to some degree

● Need to abstract off-chip memory, I/O, debugging APIs, etc

[Figure labels: PCIe; tightly coupled w/ beefy CPUs; standalone]

Every FPGA chip has a big product table!

bash $ cc hello.c
bash $ ./a.out
hello
bash $ cc app.c -lm -l....

Abstraction layers for FPGAs are emerging

● OpenCL BSPs

– provide low-level software APIs

– limit board choices, less extensible

● or build a custom BSP

– OpenCL features may be overkill

● Intel Open Programmable Acceleration Engine (OPAE)

– abstracts accelerators such as FPGAs

– consists of kernel drivers, userspace libraries and tools (e.g., discover, reconfigure)

– Supported platforms?

● AWS EC2 FPGA hardware and software development kits

– FPGA shells and software APIs

– https://github.com/aws/aws-fpga.git

● Questions

– support our edge-to-HPC needs?

– support various programming models?

● offload, streaming, pure dataflow, hybrid dataflow, etc

Presented at Intel’s 2018 Architecture Day

Paradigm shift in cluster designs

[Diagram: two cluster layouts built from CPU and FPGA nodes and an interconnect, plus a single CPU-FPGA node with memory]

– Communication via CPU vs. direct communication (form a larger FPGA)

– Sequential codes to FPGA designs: more memory references

– Data flow models: minimize memory references

– FPGAs have a rich set of I/O (transceiver, GPIO)

– Catapult (Microsoft), Novo-G# (Boston U.), PEACH (U. Tsukuba)

Common software stack is missing!

Efficient data movement

● Complicated memory hierarchy

– fast on-chip: register, BRAM

– off-chip: QDR, DDR, HMC, HBM2

– remote: over transceiver, Gen-Z, CXL, etc

– storage: SSD, 3D XPoint

● Minimize costly data movement

– exploit data locality (e.g., cache)

– FPGAs in data path

● reduced precision, compression

● stream processing

● Address space management (OS level)

– OpenCL host memory, hybrid dataflow models

– shared virtual memory is becoming the norm

● IOMMU, e.g., PCIe ATS

● efficient TLB management scheme

– dataflow address region like DMA region
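
The idea of placing processing in the data path (reduced precision, filtering, stream processing) can be sketched in software as a chain of generators, where each stage consumes samples one at a time and nothing is buffered in between. This is only an analogy for what an FPGA pipeline would do in hardware; the stage names, the synthetic sensor data, and the threshold are assumptions made for illustration.

```python
import struct
from typing import Iterable, Iterator


def sensor_readings(count: int) -> Iterator[float]:
    """Synthetic, deterministic stand-in for a stream of sensor samples."""
    for i in range(count):
        yield (i % 100) / 100.0


def threshold_filter(samples: Iterable[float], threshold: float) -> Iterator[float]:
    """Drop samples below the threshold so they never leave the data path."""
    for value in samples:
        if abs(value) >= threshold:
            yield value


def to_half_precision(samples: Iterable[float]) -> Iterator[bytes]:
    """Re-encode each sample as a 2-byte IEEE 754 half-precision value."""
    for value in samples:
        yield struct.pack("<e", value)


def bytes_moved(encoded: Iterable[bytes]) -> int:
    """Total number of bytes that would be sent downstream."""
    return sum(len(chunk) for chunk in encoded)


if __name__ == "__main__":
    count = 1000
    raw_bytes = count * struct.calcsize("<d")  # 8 bytes per double
    reduced = bytes_moved(
        to_half_precision(threshold_filter(sensor_readings(count), 0.5))
    )
    print("raw:", raw_bytes, "bytes; after in-path reduction:", reduced, "bytes")
```

Because every stage is a generator, the chain processes one sample at a time, which mirrors the streaming style above: data is reduced before it reaches the slower levels of the memory hierarchy.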

[Figure labels: custom cache, processing unit, FPGA; storage, FPGA]

HPC-FPGA community

● Historically, FPGA electronic design automation (EDA) communities had little interaction with software folks

● With the emergence of high-level synthesis tools, more software folks started evaluating FPGAs

● Successful FPGA stories in cloud space

– driven by specific workloads

– custom solutions

● written in HDL

● Still at an early stage, but an HPC-FPGA community has started to emerge

– organizing events to cultivate HPC-FPGA community

Past workshops and conference events (1)

● 2016 Jan: Workshop on FPGAs for scientific simulation and data analytics, Argonne, IL

– Highlights: 14 speakers; universities, national labs, vendors (Xilinx, Altera), high-level synthesis, SYCL/SPIR-V

– https://collab.cels.anl.gov/display/REFORM/Workshop20160121

● 2016 Oct: Workshop on FPGAs for scientific simulation and data analytics, Urbana, IL

– Highlights: 19 speakers; programming models, scalable results

– http://www.ncsa.illinois.edu/Conferences/FPGA16/agenda.html

Past workshops and conference events (2)

● 2017 Nov: SC17 Birds of a Feather session, "Reconfigurable Computing in Exascale", Denver

– Highlights: 8 speakers; CERN, Maxeler, Micron, Intel, Cray, BSC, universities; no overlap with our previous workshops

– https://sc17.supercomputing.org/SC17%20Archive/bof/bof_pages/bof148.html

● 2018 Mar: 3rd International Workshop on FPGA for HPC (IWFH), Tokyo

– Highlights: 8 speakers; Microsoft, national labs, universities; successful large-scale FPGA cluster, programming models, tightly coupled FPGA clusters, runtime system

– https://www.ccs.tsukuba.ac.jp/hpc-iwfh/

Past workshops and conference events (3)

● 2018 Nov: SC18 Birds of a Feather session, "Benchmarking Scientific Reconfigurable/FPGA Computing", Dallas

– Highlights: 8 speakers; focus on benchmarking

– https://sc18.supercomputing.org/proceedings/bof/bof_pages/bof190.html

● 2018 Nov: SC18 panel session, "Reconfigurable computing for HPC: Will it make it this time?", Dallas

– Highlights: 7 speakers; Intel, Xilinx, national labs, universities

– https://sc18.supercomputing.org/presentation/?id=pan112&sess=sess299

Past workshops and conference events (4)

● 2018 Dec: FPT18 workshop, "Workshop on Integrating HPC and FPGAs", Okinawa, Japan

– Highlights: 4 speakers; RIKEN, INRIA, BSC, Boston University; programming paradigms (dataflow, task), runtime, FPGA cluster

– https://collab.cels.anl.gov/display/HPCFPGA/HPC-FPGA

● 2018 Dec: FPT18 workshop, "Workshop on Reconfigurable High-Performance Computing", Okinawa, Japan

– Highlights: 9 speakers; people from the FPGA community; abstraction, virtualization, runtime, dataflow

– https://collab.cels.anl.gov/display/RECONFHPC/RECONF-HPC

Next workshop

● The 9th JLESC workshop helped initiate the conversation!

● Planning to submit a workshop proposal to FPL2019

– Barcelona, Spain

– Sep 12 or 13 (tentative)

● Format

– Full day; one or two keynotes; short talks; panel; possibly a call for extended abstracts

● Possible topics

– low-level runtime and operating systems

– high-level front-ends

– programming models

● general or domain specific

– virtualization, coarse-grained reconfigurable architecture

– heterogeneous cluster

● not only FPGA, but also other accelerators like GPUs, AI

● efficient data movement

– common playground/platform for collaborative works