
Frans Sijstermans (NVIDIA) & Yunsup Lee (SiFive)

NVIDIA'S DEEP LEARNING ACCELERATOR MEETS SIFIVE'S FREEDOM PLATFORM


©2018 NVIDIA CORPORATION

NVDLA — NVIDIA DEEP LEARNING ACCELERATOR

IP core for deep learning, part of NVIDIA’s Xavier SoC

Optimized for convolutional neural networks (CNNs) and computer vision

Targeted at edge devices and IoT

Uses industry-standard formats; parameterized (configurable) design

Why open-source NVDLA?

Encourage Deep Learning applications

Invite contributions from the community


NVDLA – TOP-LEVEL ARCHITECTURE

[Block diagram: configuration and control block on the control bus; memory interface to SDRAM and on-chip RAM; convolution buffer; convolution core (MAC array); post-processing]
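To give a feel for what the size of the convolution core's MAC array means, here is a rough, idealized sketch that counts the multiply-accumulates in one convolution layer and divides by the number of MAC units. It assumes every MAC unit is busy every cycle and ignores memory bandwidth, the convolution buffer, and post-processing. The function names are hypothetical, and the example layer (ResNet-50's first 7x7 convolution) is chosen only for illustration.

```python
def conv_macs(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-accumulates for one k x k convolution layer.

    Stride and padding are assumed to be already folded into h_out and w_out.
    """
    return h_out * w_out * c_out * c_in * k * k


def ideal_cycles(macs: int, mac_array_size: int) -> int:
    """Lower bound on cycles, assuming 100% MAC-array utilization."""
    # Integer ceiling division: a partially filled last cycle still counts.
    return (macs + mac_array_size - 1) // mac_array_size


# Illustrative layer: 7x7 conv, 3 -> 64 channels, 112x112 output.
macs = conv_macs(112, 112, 3, 64, 7)
for array_size in (512, 2048):  # MAC counts that appear in this deck
    print(f"{array_size} MACs: {macs} MACs total, >= {ideal_cycles(macs, array_size)} cycles")
```

Real utilization is likely lower than this bound, for example when a layer has too few input channels to fill the array, so the result is best read as a best case rather than a prediction.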


AREA, PERFORMANCE, AND POWER

SMALL CONFIGURATION

• INT8 data path
• 1 RAM interface
• No advanced features

EXAMPLE

• Config: 512 MACs, 256KB buffer
• Area: 1.4 mm² in 16nm
• Perf: 93 fps ResNet-50
• Power: 107 mW

LARGE CONFIGURATION

• INT8, INT16, FP16 data path
• 2 RAM interfaces
• Weight compression

EXAMPLE

• Config: 1024 8-bit + 512 16-bit MACs, 256KB buffer
• Area: 2.4 mm² in 16nm (excl. TCM)
• Perf: 230 fps (INT8) / 115 fps (FP16) ResNet-50
• Power: 348 mW (INT8) / 475 mW (FP16)
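The example figures above can be turned into efficiency metrics with simple arithmetic: energy per frame is power divided by frame rate (mW divided by frames/s gives mJ per frame), and throughput per watt is the inverse. The sketch below uses only the numbers quoted for the two example configurations; the class and field names are illustrative. Because the large-configuration area excludes TCM, the fps/mm² values are not a like-for-like comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DlaExample:
    name: str
    area_mm2: float   # 16nm area as quoted
    fps: float        # ResNet-50 frames per second
    power_mw: float   # power in milliwatts

    def energy_per_frame_mj(self) -> float:
        # mW / (frames/s) = mJ per frame
        return self.power_mw / self.fps

    def fps_per_watt(self) -> float:
        return self.fps / (self.power_mw / 1000.0)

    def fps_per_mm2(self) -> float:
        return self.fps / self.area_mm2


EXAMPLES = [
    DlaExample("small, INT8", 1.4, 93.0, 107.0),
    DlaExample("large, INT8", 2.4, 230.0, 348.0),  # area excludes TCM
    DlaExample("large, FP16", 2.4, 115.0, 475.0),  # area excludes TCM
]

for ex in EXAMPLES:
    print(f"{ex.name}: {ex.energy_per_frame_mj():.2f} mJ/frame, "
          f"{ex.fps_per_watt():.0f} fps/W, {ex.fps_per_mm2():.1f} fps/mm2")
```

By this measure the small INT8 example spends roughly 1.15 mJ per ResNet-50 frame, while the large configuration trades some energy efficiency for higher absolute throughput and FP16 support.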


SW ARCHITECTURE

Compile time: Caffe model → parser → compiler (with params) → loadable

Run time: Application → User-Mode Driver (UMD) → Kernel-Mode Driver (KMD) → DLA hardware
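The split between compile time and run time can be illustrated with a toy model: a parser and compiler produce a serialized "loadable" offline, and at run time a user-mode driver loads it and hands the work to a kernel-mode driver that owns the hardware. Everything below is hypothetical (names, the JSON encoding of the loadable, the op list). It is a sketch of the layering only and does not reflect the real NVDLA loadable format or driver interfaces.

```python
import json

# --- compile time (hypothetical stand-ins for the parser and compiler) ---

def parse_model(layers):
    """Parser: turn a framework model description into a network graph."""
    return [dict(layer) for layer in layers]


def compile_network(graph, params):
    """Compiler: lower the graph for a given config into a serialized 'loadable'."""
    ops = [{"op": layer["type"], "precision": params["precision"]} for layer in graph]
    return json.dumps({"config": params["config"], "ops": ops}).encode()


# --- run time ---

class KernelModeDriver:
    """Stands in for the KMD: owns the hardware and executes submitted ops."""

    def __init__(self):
        self.executed = []

    def submit(self, ops):
        for op in ops:
            # A real KMD would program accelerator registers here.
            self.executed.append(op["op"])


class UserModeDriver:
    """Stands in for the UMD: loads a loadable and submits it through the KMD."""

    def __init__(self, kmd):
        self.kmd = kmd

    def run(self, loadable):
        task = json.loads(loadable.decode())
        self.kmd.submit(task["ops"])
        return len(task["ops"])


graph = parse_model([{"type": "conv"}, {"type": "relu"}, {"type": "pool"}])
loadable = compile_network(graph, {"config": "small", "precision": "int8"})
kmd = KernelModeDriver()
count = UserModeDriver(kmd).run(loadable)
print(count, kmd.executed)  # 3 ['conv', 'relu', 'pool']
```

Keeping the loadable as an opaque byte blob is what lets compilation happen offline, on a different machine, while the target device runs only the drivers.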


[Video]


COPYRIGHT 2018 SIFIVE. ALL RIGHTS RESERVED.

There are lots of custom chips you can build with NVDLA



SiFive Chip Designer

[Diagram: IP, Design, EDA, Infrastructure, Fab, Package/Test]

Come to my keynote talk “Opportunities and Challenges of Building Silicon in the Cloud” tomorrow morning at 9:20am!


• SiFive Tapes Out Multiple Base Platforms

– Demonstrates silicon capability of each platform

– Enables RISC-V software development

– Reduces risk for customer

– Proves out and matures the design flow for each platform

• Customization Capabilities

– Add/remove DesignShare and SiFive IP

– Customization of SiFive CPU IP

– Customer can add own IP into Platform (accelerators, co-processors, other IP)

• From Prototype to Production

– SiFive handles all logistics, incl. fab, package, test

– SiFive scales to production

– Final delivery is packaged, tested, qualified Silicon

SiFive Freedom Chip Platforms


Low-cost, 32-bit microcontroller platform:

– Edge computing
– Industrial IoT
– AI smart camera
– Edge inference (AI)
– Wearables

High-performance, 64-bit multi-core platform:

– Storage system controllers (SSD)
– Datacenter accelerators
– Linux applications
– Networking/baseband

The next talk in the same room will introduce the brand-new Freedom Revolution Chip Platform with HBM2 and 56–112 Gb/s SerDes.


Freedom Unleashed 64-bit Multi-Core RISC-V Linux Platform

• 1.5+ GHz U54-MC SiFive CPU
  – 1x E51: 16KB L1I$, 8KB DTIM with ECC support
  – 4x U54: 32KB L1I$, 32KB L1D$ with ECC support
  – Single- and double-precision floating-point support
  – 2MB banked L2$ with directory-based cache coherence and ECC support

• ChipLink
  – Serialized chip-to-chip coherent TileLink interconnect

• DDR3/4, GbE, peripherals

Freedom U540, FCBGA, manufactured in TSMC 28nm


HiFive Unleashed: World’s First Multi-Core RISC-V Linux Dev Board

• SiFive FU540-C000 (built in 28nm)

• 8 GB 64-bit DDR4 with ECC

• Gigabit Ethernet Port

• 32 MB Quad SPI Flash

• MicroSD card for removable storage

• MicroUSB for debug and serial communication

• Digital GPIO pins

• FMC connector for future expansion with add-in cards


HiFive Unleashed with Microsemi PolarFire Expansion Board


HiFive Unleashed with Xilinx VCU118 Evaluation Kit


Freedom Development Kit Comes with a Linux BSP Based on Debian or Fedora

[Chart: Debian RISC-V port progress, 0% in March, 65% in April (https://wiki.debian.org/RISC-V)]


DEMO Setup: HiFive Unleashed + NVDLA

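[Block diagram: HiFive Unleashed (RISC-V CPU with its memory interface) connected to an FPGA that hosts NVDLA, the NVDLA memory interface to DRAM, and I/O interfaces]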

• NVDLA small config

– 2048 MACs, 512 KB

• NVDLA mapped onto Xilinx VCU118 Evaluation Kit

• NVDLA running open-source YOLOv3 object recognition

• Linux OS running on HiFive Unleashed

– User-mode and kernel-mode drivers (UMD/KMD) were easy to port over from ARM

• Demo application built with OpenCV, available thanks to the Debian RISC-V port



Check out the HiFive Unleashed + NVDLA demo at the SiFive booth!


• Open-source IP cores further lower the bar to implement RISC-V-based products

• Freedom chip platform offers a complete template SoC with software support

• Freedom Unleashed + NVDLA is a great starting point for smart IoT SoCs and devices

• Everything is open-sourced, so check it out and contribute yourself!

– NVDLA

• https://github.com/nvdla/hw

• https://github.com/nvdla/sw

• http://nvdla.org

– Freedom Platform

• https://github.com/sifive/freedom

• https://github.com/sifive/nvidia-dla-blocks

• Once you’re ready, please come talk to us for your RISC-V AI chip needs!

Customize your Freedom Chip with NVDLA Today!


End