Seamless Prediction at the Edge Using TensorFlow on FPGAs

©2018 Micron Technology, Inc. All rights reserved. Information, products, and/or specifications are subject to change without notice. All information is provided on an “AS IS” basis without warranties of any kind. Statements regarding products, including regarding their features, availability, functionality, or compatibility, are provided for informational purposes only and do not modify the warranty, if any, applicable to any product. Drawings may not be to scale. Micron, the Micron logo, and all other Micron trademarks are the property of Micron Technology, Inc. All other trademarks are the property of their respective owners.

Brad Spiers, Principal Solutions Architect

Linley Spring Processor Conference: April 12, 2018

Prediction... At the Edge

▪ Limited Weight, Space and Power

▪ Very Limited External Bandwidth

▪ Cannot Move Data: Must Compute Locally

▪ FPGAs Have Speed, Efficiency & Memory Capability

▪ Now Program FPGAs – with No Code Change!

What are Field Programmable Gate Arrays (FPGAs)?

▪ Unlike a CPU, no Pre-Defined Instructions

▪ Can be Dynamically Reprogrammed

▪ Massive Inherent Parallelism

[Figure: block diagrams of CPU, GPU and FPGA; the labeled blocks are Control, Cache and ALUs]

Current Customer Challenges

▪ Person and Face Recognition

▪ Body Pose Recognition

▪ Fingerprint Recognition

▪ Voice and Speaker Identification

▪ Object Categorization

▪ Time-Series Pattern Recognition (LSTM-based RNNs)

FWDNXT Performance on FPGAs

From Just 24 Watts to Handle Power Constraints on “The Edge”

FWDNXT’s Approach

▪ Speed up Traces, not Layers

▪ Key Idea: Hide non-essential Work Behind Long Traces

▪ Traces Stretch Across Network Layers

▪ With Long Traces, Bandwidth Becomes Key

FWDNXT Has a Hierarchical Architecture

▪ Hierarchical Memory Design Achieves Efficiency

▪ Hidden, Long Memory Fetches Fill Buffers

▪ Full Buffers Feed Compute Units
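The idea of hiding long memory fetches behind compute can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not FWDNXT's actual design: it assumes work is split into equal tiles, that two buffers are available so the next tile can be fetched while the current one is computed, and that time is counted in abstract units. The function names `serial_time` and `overlapped_time` are hypothetical.

```python
def serial_time(n_tiles: int, fetch: int, compute: int) -> int:
    """Total time when every fetch must finish before its compute starts
    and nothing overlaps."""
    return n_tiles * (fetch + compute)


def overlapped_time(n_tiles: int, fetch: int, compute: int) -> int:
    """Total time with two buffers: while tile i is computed,
    tile i+1 is fetched into the other buffer (assumed model)."""
    if n_tiles <= 0:
        return 0
    # The first fetch cannot be hidden, and the last compute has no fetch
    # to overlap with. Every step in between costs the slower of the two.
    return fetch + (n_tiles - 1) * max(fetch, compute) + compute


if __name__ == "__main__":
    for fetch, compute in [(3, 5), (5, 3)]:
        print(
            f"fetch={fetch} compute={compute} "
            f"serial={serial_time(4, fetch, compute)} "
            f"overlapped={overlapped_time(4, fetch, compute)}"
        )
```

In this model, once a fetch takes longer than the compute it feeds, the total time tracks fetch time. That is one way to read the claim that bandwidth becomes the key resource.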

Micron Hybrid Memory Cube

April 6, 2018

Low-Power Bandwidth to Feed Long Traces

8.5x more bandwidth than DDR4

70% less energy per bit

How?

▪ Stacked DRAM

▪ Multiple “banks” per layer

▪ “Light up” a smaller bank, use less energy
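As a rough worked example of what the two headline figures mean for moving a fixed amount of data, the sketch below normalizes DDR4 to 1.0 for both bandwidth and energy per bit. It reads "8.5x more bandwidth" as 8.5 times the DDR4 figure, which is an assumption about the slide's wording, and the function name `relative_cost` is hypothetical.

```python
# Ratios quoted on the slide, relative to DDR4 (normalized to 1.0).
BANDWIDTH_RATIO = 8.5        # assumed reading: 8.5 times DDR4 bandwidth
ENERGY_PER_BIT_RATIO = 0.30  # 70% less energy per bit


def relative_cost(bandwidth_ratio: float, energy_per_bit_ratio: float):
    """Time and energy to move a fixed number of bits, relative to the
    baseline memory, assuming the transfer runs at full bandwidth."""
    relative_time = 1.0 / bandwidth_ratio   # more bandwidth means less time
    relative_energy = energy_per_bit_ratio  # same bits, less energy per bit
    return relative_time, relative_energy


if __name__ == "__main__":
    t, e = relative_cost(BANDWIDTH_RATIO, ENERGY_PER_BIT_RATIO)
    print(f"time: {t:.3f} of baseline, energy: {e:.2f} of baseline")
```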

Problem: How to Program FPGAs?

▪ Programming Has Been a Barrier in the Past

− Verilog, HDL --> Months to Deploy

▪ FWDNXT’s Snowflake Compiler & Micron FPGA Modules: ML for IoT

[Figure: pipeline diagram with the labels Your Network, Your Framework, Network Description, Snowflake Compiler, Micron FPGA Module, Machine Learning at the Edge]

What Model Types Can FWDNXT Handle?

▪ Any Model

− CNN

− RNN

− LSTM

− …

▪ Any Framework

− PyTorch

− Caffe

− TensorFlow

− …

FWDNXT Representations

▪ Now, 16-bit Fixed Point Used for Inputs

▪ Fixed Point: 5-bit integer, 11-bit fraction

▪ Moving to 16-bit Floating Point

▪ Now, 32-bit Fixed Point Used for Multiplication Outputs and Adds

Fixed Point Representation
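The fixed-point format above can be made concrete with a small sketch. The assumptions here are not from the slides: values are stored in two's complement with the sign bit counted among the 5 integer bits, conversion rounds to nearest and saturates, and the 32-bit result keeps 22 fraction bits because it is the product of two 11-bit-fraction operands. The helper names `to_fixed`, `to_float` and `dot_fixed` are hypothetical.

```python
FRAC_BITS = 11                       # 11-bit fraction, per the slide
WORD_BITS = 16                       # 5 integer bits + 11 fraction bits
SCALE = 1 << FRAC_BITS               # one unit equals 2**-11
Q_MIN = -(1 << (WORD_BITS - 1))      # assumed two's-complement range
Q_MAX = (1 << (WORD_BITS - 1)) - 1
ACC_MIN = -(1 << 31)                 # 32-bit accumulator range
ACC_MAX = (1 << 31) - 1


def to_fixed(x: float) -> int:
    """Quantize a float to 16-bit fixed point: round to nearest
    (Python's round, ties to even), then saturate to the 16-bit range."""
    q = int(round(x * SCALE))
    return max(Q_MIN, min(Q_MAX, q))


def to_float(q: int, frac_bits: int = FRAC_BITS) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def dot_fixed(a, b) -> int:
    """Multiply 16-bit operands and accumulate in a saturating 32-bit
    register. Each product carries 2 * FRAC_BITS = 22 fraction bits."""
    acc = 0
    for x, y in zip(a, b):
        acc = max(ACC_MIN, min(ACC_MAX, acc + x * y))
    return acc


if __name__ == "__main__":
    weights = [to_fixed(v) for v in (1.5, -2.0)]
    inputs = [to_fixed(v) for v in (2.0, 0.25)]
    acc = dot_fixed(weights, inputs)
    print(to_float(acc, 2 * FRAC_BITS))
```

Under these assumptions the representable range is -16 to just under 16 in steps of 2**-11, so larger inputs saturate rather than wrap around.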

Steps to Deploy Models on FPGAs

1. Define Model in PyTorch, Caffe or TensorFlow

2. Train Model with Data on GPUs

3. Input Framework-Trained Model into Snowflake Compiler

4. Deploy Snowflake Output Directly onto Micron FPGA Module

NO CODE CHANGE

Hybrid Memory Cube

Up to 512GB DDR Footprints

Advanced FPGAs

▪ Xilinx UltraScale+

▪ Intel Stratix 10

What New Problems Can We Solve?


▪ Some Domains Have Problems that Require Larger Memory Footprints

− Medical Imaging

− Oil Exploration

− Videos

− Government

▪ Need both High-Bandwidth and High-Capacity Memory

▪ Micron FPGA Cards Plus FWDNXT Snowflake Compiler Provide Missing Links

Summary


▪ The Edge Poses Challenges in Power and Bandwidth

▪ FPGAs Can Help, but Programming Was a Challenge—Until Now

▪ Memory Bandwidth Is Now Key to Machine Learning Performance

▪ Plus, Solve Larger Problems on Boards with up to 512GB of Memory

www.micron.com/tensorflow
