Page 1: Enabling Rapid Design Space Exploration and Prototyping of DNN Accelerators

Tushar Krishna
Georgia Tech

http://synergy.ece.gatech.edu

http://synergy.ece.gatech.edu/tools/maeri/maeri-tutorial-hpca-2019

Tutorial @ HPCA 2019
Feb 16, 2019

Page 2: Deep Learning Landscape

February 16, 2019 | MAERI Tutorial @ HPCA 2019 | Tushar Krishna | Georgia Institute of Technology

[Figure: the deep learning landscape. Recoverable labels: Model Creation; Training; Inference; Design Tools; MLSL; TensorRT; accelerators ShiDianNao, Eyeriss (on-chip buffer, 168 PE array), NVDLA, ARM Trillium, Apple Neural Engine, Cambricon-X; "This Tutorial".]

Page 3: Spatial (or Dataflow) Accelerators

• Millions of parameters (i.e., weights)

• Billions of computations

• Heavy data movement

Spread computations across hundreds of ALUs.

Reuse data within the array via local memories and direct communication.

Examples: MIT Eyeriss, Google TPU, …

[Figure*: a 4×4 grid of ALUs connected to a memory hierarchy; a Processing Element (PE) with Control and Register/FIFO/SRAM.]

*Y. Chen et al., “Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks,” ISCA, 2016.
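To make the "millions of parameters, billions of computations" scale concrete, here is a minimal sketch that counts weights and multiply-accumulate (MAC) operations for a convolution layer. The `ConvLayer` class and its field names are hypothetical helpers written for this illustration, not part of MAESTRO or MAERI; the layer shapes are assumed to be the first two 3×3 convolutions of VGG16 with 224×224 outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Shape of one convolution layer (hypothetical helper for illustration)."""
    K: int  # output channels (number of filters)
    C: int  # input channels
    R: int  # filter height
    S: int  # filter width
    Y: int  # output height
    X: int  # output width

    def weights(self) -> int:
        # One weight per (filter, input channel, filter row, filter column).
        return self.K * self.C * self.R * self.S

    def macs(self) -> int:
        # Every output pixel of every filter needs C * R * S multiply-accumulates.
        return self.weights() * self.Y * self.X


# Assumed shapes for the first two convolution layers of VGG16.
vgg16_conv1 = ConvLayer(K=64, C=3, R=3, S=3, Y=224, X=224)
vgg16_conv2 = ConvLayer(K=64, C=64, R=3, S=3, Y=224, X=224)

print(vgg16_conv1.weights(), vgg16_conv1.macs())  # 1728 86704128
print(vgg16_conv2.weights(), vgg16_conv2.macs())  # 36864 1849688064
```

Under these assumed shapes, the second layer alone already needs about 1.8 billion MACs, which is why spreading the work across many ALUs and reusing data locally matters.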

Page 4: Two Key HW Design Challenges

• How do we map billions of computations over limited compute and memory resources (aka Dataflow)?

• How do we design the accelerator to efficiently map arbitrary layer types and dataflows?
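One way to picture a dataflow is as a choice of loop order and of which loop is spread across processing elements. The sketch below is an illustration under assumptions, not the tutorial's own code: `conv_reference` is a plain convolution loop nest, and `conv_k_spatial` is a hypothetical variant that tiles the output-channel loop across `num_pes` PEs while keeping each weight fixed during the inner output loops. Both function names, the tiny layer sizes, and the test data are invented for the example.

```python
import itertools


def conv_reference(W, I, K, C, R, S, Y, X):
    """Plain loop nest: O[k][y][x] = sum over c, r, s of W[k][c][r][s] * I[c][y+r][x+s]."""
    O = [[[0] * X for _ in range(Y)] for _ in range(K)]
    for k, y, x, c, r, s in itertools.product(
        range(K), range(Y), range(X), range(C), range(R), range(S)
    ):
        O[k][y][x] += W[k][c][r][s] * I[c][y + r][x + s]
    return O


def conv_k_spatial(W, I, K, C, R, S, Y, X, num_pes):
    """Same computation with output channels tiled and spread across PEs."""
    O = [[[0] * X for _ in range(Y)] for _ in range(K)]
    for k_tile in range(0, K, num_pes):             # temporal loop over tiles of K
        for pe in range(min(num_pes, K - k_tile)):  # spatial loop: one iteration per PE
            k = k_tile + pe
            for c, r, s in itertools.product(range(C), range(R), range(S)):
                w = W[k][c][r][s]                   # weight held fixed in the inner loops
                for y, x in itertools.product(range(Y), range(X)):
                    O[k][y][x] += w * I[c][y + r][x + s]
    return O


# Tiny, made-up layer so the example runs instantly.
K, C, R, S, Y, X = 4, 2, 3, 3, 2, 2
W = [[[[(k + c + r + s) % 5 for s in range(S)] for r in range(R)]
      for c in range(C)] for k in range(K)]
I = [[[(c + h + w) % 7 for w in range(X + S - 1)] for h in range(Y + R - 1)]
     for c in range(C)]

same = conv_reference(W, I, K, C, R, S, Y, X) == conv_k_spatial(
    W, I, K, C, R, S, Y, X, num_pes=3
)
print(same)  # True
```

Both versions compute the same outputs; what changes is which data each PE touches and when, and that in turn drives buffer sizes, bandwidth, and data reuse.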

Page 5: MAESTRO: Analytical cost model for DNN dataflows

[Figure: MAESTRO inputs, infrastructure and target DNN, and outputs.]

MAESTRO inputs:

• Physical Resources Description: Buffer Size, Connectivity, NoC Bandwidth, …

• DNN Layer Description, e.g. "Layer VGG1 K = 64; … S = 3;"

• Dataflow Description, e.g. "Layer VGG1 Spatial[5] K … Unroll R"

Infrastructure and target DNN:

• Accelerator Architecture: Shared Buffer, NoC, PEs; data classes IFMap, Weight, PSum, PSum/OFMap, …

• Neural Network Structure: IN, CONV, POOL, CONV, …, FC, FC, OUT

• Dataflow: Loop Ordering; Loop Tiling + Tile Mapping of the nested sums ∑ ∑ … ∑ W * I onto PE0, PE1, …, PE N

MAESTRO outputs (bar charts per dataflow style NLR, WS, Shi, DLA, RS, for VGG16-CONV1 and VGG16-CONV11):

• NoC Bandwidth: Peak Bandwidth Requirement (Gbps)

• L1 Memory Requirement (KB)

• Roofline Throughput (GFLOPS)

• Energy Analysis: MAC, L1 Read, L1 Write, L2 Read, L2 Write

https://arxiv.org/abs/1805.02566
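The "Roofline Throughput" output can be illustrated with the generic roofline bound: attainable throughput is the lesser of the compute peak and the bandwidth times arithmetic intensity. The sketch below is not MAESTRO's actual model; the function name, its parameters, and every number in the usage lines are assumptions chosen for the example.

```python
def roofline_throughput(peak_macs_per_cycle, noc_words_per_cycle, macs, words_moved):
    """Attainable MACs per cycle under a simple roofline bound.

    peak_macs_per_cycle: compute roof (e.g. one MAC per PE per cycle)
    noc_words_per_cycle: words the NoC can deliver per cycle
    macs, words_moved:   work done and NoC traffic for one layer under one dataflow
    """
    intensity = macs / words_moved  # MACs performed per word delivered
    return min(peak_macs_per_cycle, noc_words_per_cycle * intensity)


# Hypothetical accelerator: 64 PEs, NoC delivering 16 words per cycle.
# Dataflow A reuses little data (2 MACs per word); dataflow B reuses more (8 MACs per word).
low_reuse = roofline_throughput(64, 16, macs=2000, words_moved=1000)
high_reuse = roofline_throughput(64, 16, macs=8000, words_moved=1000)

print(low_reuse)   # 32.0  (bandwidth bound)
print(high_reuse)  # 64    (compute bound)
```

In this toy setting the same hardware runs at half of its peak under the low-reuse dataflow, which is the kind of difference the per-dataflow throughput charts are comparing.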

Page 6: Schedule: Morning

Time            Topic                              Presenter
8:30 – 9:00     Introduction and Background        Tushar
9:00 – 10:00    A primer on DNN Dataflows          Michael
10:00 – 10:30   MAESTRO Data Directives            Prasanth
10:30 – 10:50   Coffee Break
10:50 – 11:10   MAESTRO Data Directives [contd]    Prasanth
11:10 – 11:45   MAESTRO Analytical Model           Hyoukjun
11:45 – 12:30   MAESTRO Hands-on Exercises         Hyoukjun & Prasanth
12:30 – 2:00    Lunch

Page 7: MAERI – DNN Accelerator for Flexible Dataflows

[Figure, tool flow. Recoverable labels: Deep Neural Network (Neurons); MAERI Mapper (Find Optimal Dataflow); Dataflow Configs; MAERI RTL (Verilog); Cycle-Accurate Sims.]

[Figure, architecture. Recoverable labels: rows of multipliers (X) feeding trees of adders (+); Virtual Neurons VN0, VN1, VN2; Weights/Inputs entering and Output Activations leaving; Weight, Input, Output SRAM; To/From DRAM; numbered callouts 1 to 7.]

Kwon et al., ASPLOS 2018; Zhao et al., ISPASS 2019
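The idea of virtual neurons can be sketched as partitioning a row of multipliers into groups, with each group's products reduced to one output. This is a simplified illustration under assumptions, not the MAERI RTL or mapper: the function names, the contiguous-assignment policy, and the sizes used below are hypothetical.

```python
def map_virtual_neurons(num_multipliers, vn_sizes):
    """Assign contiguous multiplier index ranges to virtual neurons (VNs)."""
    if sum(vn_sizes) > num_multipliers:
        raise ValueError("virtual neurons need more multipliers than available")
    ranges, start = [], 0
    for size in vn_sizes:
        ranges.append(range(start, start + size))
        start += size
    return ranges


def run_virtual_neurons(weights, inputs, vn_ranges):
    """Each multiplier computes weight * input; each VN sums its own products."""
    products = [w * x for w, x in zip(weights, inputs)]
    return [sum(products[i] for i in vn) for vn in vn_ranges]


# Hypothetical setup: 16 multipliers split into three VNs of size 5 (one multiplier idle).
vn_ranges = map_virtual_neurons(16, [5, 5, 5])
weights = list(range(1, 17))  # 1, 2, ..., 16
inputs = [2] * 16
outputs = run_virtual_neurons(weights, inputs, vn_ranges)

print(outputs)  # [30, 80, 130]
```

Because the group sizes are a parameter rather than fixed in the structure, the same row of multipliers can be regrouped for layers whose neurons need different numbers of products.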

Page 8: Schedule: Afternoon

Time            Topic                   Presenter
2:00 – 2:20     Overview of MAERI       Tushar
2:20 – 3:00     MAERI Mapper            Zhongyuan
3:00 – 3:20     MAERI RTL               Hyoukjun
3:20 – 3:40     MAERI Demo              Hyoukjun
3:40 – 4:00     Coffee Break
4:00 – 4:30     Hands-on Exercises      Hyoukjun & Zhongyuan
4:30 – 5:00     Extensions              Michael
5:00 – 5:10     Wrap-Up                 Tushar

Page 9: Tool Release and Resources

• Slides and video will be posted on the tutorial page
  • http://synergy.ece.gatech.edu/tools/maeri/maeri-tutorial-hpca2019/

• All codebases will be added to GitHub by tomorrow evening
  • Link will be added on the tutorial website

• Feedback
  • Please add your name to the sign-up if you have not
    • For statistics
  • We will send out a feedback form

• This is all work in progress
  • Please reach out to us if you find a bug
  • Better still – fix it and contribute back on GitHub!

Page 10: Future Extensions

• MAESTRO
  • Validation
  • Support for sparsity
  • Support for other layer types

• MAERI
  • Testbenches for other layer types and networks
  • Mapper-to-testbench auto-generator
  • Code optimization for FPGAs

Page 11: Presenters

Michael Pellauer
Sr. Research Scientist, NVIDIA
Intel VSSAD (2010-2015)
PhD (MIT) in 2010

Tushar Krishna
Assistant Professor, School of ECE, Georgia Tech
PhD (MIT) in 2014

Hyoukjun Kwon
PhD Candidate, School of CS, Georgia Tech

Prasanth Chatarasi
PhD Candidate, School of CS, Georgia Tech

Zhongyuan Zhao
PhD Candidate, School of CS, Shanghai Jiao Tong University
