©2018 Micron Technology, Inc. All rights reserved. Information, products, and/or specifications are subject
to change without notice. All information is provided on an “AS IS” basis without warranties of any kind.
Statements regarding products, including regarding their features, availability, functionality, or
compatibility, are provided for informational purposes only and do not modify the warranty, if any,
applicable to any product. Drawings may not be to scale. Micron, the Micron logo, and all other Micron
trademarks are the property of Micron Technology, Inc. All other trademarks are the property of their
respective owners.
Seamless Prediction at the Edge Using TensorFlow on FPGAs
Brad Spiers, Principal Solutions Architect
Linley Spring Processor Conference: April 12, 2018
Prediction... At the Edge
▪ Limited Weight, Space and Power
▪ Very Limited External Bandwidth
▪ Cannot Move Data, Must Compute Locally
▪ FPGAs Have Speed, Efficiency & Memory Capability
▪ Now Program FPGAs – with No Code Change!
Micron Confidential
What are Field Programmable Gate Arrays (FPGAs)?
▪ Unlike a CPU, no Pre-Defined Instructions
▪ Can be Dynamically Reprogrammed
▪ Massive Inherent Parallelism
(Figure: block diagrams of a CPU, a GPU and an FPGA; labels include Control, Cache and ALU)
Current Customer Challenges
▪ Person and Face Recognition
▪ Body Pose Recognition
▪ Fingerprint Recognition
▪ Voice and Speaker Identification
▪ Object Categorization
▪ Time-Series Pattern Recognition (LSTM-based RNNs)
FWDNXT Performance on FPGAs
Using Just 24 Watts, Within the Power Constraints of “The Edge”
FWDNXT’s Approach
▪ Speed Up Traces, Not Layers
▪ Key Idea: Hide Non-Essential Work Behind Long Traces
▪ Traces Stretch Across Network Layers
▪ With Long Traces, Bandwidth Becomes Key
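To make the trace idea concrete, here is a toy timing model in Python. It is a sketch under stated assumptions, not FWDNXT's scheduler: a trace is reduced to a hypothetical `Trace` with a count of operations and a count of bytes fetched from external memory, and the hypothetical helper `trace_seconds` compares running the fetch and the compute back to back with overlapping them. Once the fetch is hidden behind the compute, a trace takes as long as the slower of the two phases, so a memory-heavy trace only gets faster with more bandwidth.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical unit of work that spans several network layers."""
    gops: float    # billions of operations to compute
    gbytes: float  # billions of bytes fetched from external memory


def trace_seconds(t: Trace, peak_gops_s: float, bw_gb_s: float,
                  overlapped: bool = True) -> float:
    """Estimate the run time of one trace on an idealized accelerator."""
    compute_s = t.gops / peak_gops_s
    fetch_s = t.gbytes / bw_gb_s
    if overlapped:
        # The shorter phase is hidden behind the longer one.
        return max(compute_s, fetch_s)
    # No overlap: fetch and compute run one after the other.
    return compute_s + fetch_s


if __name__ == "__main__":
    t = Trace(gops=10.0, gbytes=4.0)  # illustrative numbers only
    for bw in (20.0, 40.0):
        serial = trace_seconds(t, 1000.0, bw, overlapped=False)
        hidden = trace_seconds(t, 1000.0, bw)
        print(f"bw={bw} GB/s  serial={serial:.3f} s  overlapped={hidden:.3f} s")
```

With these made-up numbers the overlapped trace takes about 0.2 s at 20 GB/s and about 0.1 s at 40 GB/s, while doubling peak compute to 2000 GOP/s leaves it at 0.2 s: the trace is bandwidth-bound.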
FWDNXT Has a Hierarchical Architecture
▪ Hierarchical Memory Design Achieves Efficiency
▪ Hidden, Long Memory Fetches Fill Buffers
▪ Full Buffers Feed Compute Units
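The buffer arrangement can be sketched with a bounded queue: a producer thread plays the role of the long, hidden memory fetches and fills buffer slots, while the consumer, standing in for a compute unit, only ever receives a full buffer. This is an illustrative Python sketch, assuming two buffer slots (classic double buffering); `run_double_buffered` is a hypothetical helper, and in CPython the threads show the structure of the overlap rather than real hardware parallelism.

```python
import queue
import threading
from typing import Callable, Iterable, List


def run_double_buffered(tiles: Iterable[List[int]],
                        compute: Callable[[List[int]], int],
                        depth: int = 2) -> List[int]:
    """Overlap filling buffers with computing on them, using `depth` slots."""
    buffers: "queue.Queue[object]" = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def fetcher() -> None:
        for tile in tiles:
            # Copy the tile into a slot; blocks while all slots are full.
            buffers.put(list(tile))
        buffers.put(done)

    producer = threading.Thread(target=fetcher, daemon=True)
    producer.start()

    results = []
    while True:
        buf = buffers.get()  # blocks until a full buffer is available
        if buf is done:
            break
        results.append(compute(buf))
    producer.join()
    return results
```

For example, `run_double_buffered([[1, 2], [3, 4], [5, 6]], sum)` returns `[3, 7, 11]`. The bounded queue is the design choice that matters: the fetcher can run at most `depth` buffers ahead, which caps on-chip buffer space while still hiding fetch latency.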
Micron Hybrid Memory Cube
April 6, 2018
Low-Power Bandwidth to Feed Long Traces
8.5x more bandwidth than DDR4
70% less energy per bit
How?
▪ Stacked DRAM
▪ Multiple “banks” per layer
▪ “Light up” a smaller bank, use less energy
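The two ratios on this slide also pin down the trade-off at full speed. Assuming memory I/O power is dominated by energy per bit times bit rate (static power ignored, an assumption the slide does not state), power scales with the product of the two ratios. The sketch below uses only the slide's 8.5x and 70% figures; `hmc_vs_ddr4` is a hypothetical helper name.

```python
def hmc_vs_ddr4(bandwidth_ratio: float = 8.5,
                energy_per_bit_ratio: float = 0.30) -> dict:
    """HMC relative to DDR4 (DDR4 = 1.0), from the slide's two ratios.

    energy_per_bit_ratio = 0.30 encodes "70% less energy per bit".
    Assumes power = energy per bit x bits per second.
    """
    return {
        # Moving the same amount of data costs 30% of the DDR4 energy.
        "energy_for_same_data": energy_per_bit_ratio,
        # Running at the full 8.5x rate draws 8.5 x 0.3 = 2.55x the power...
        "power_at_full_bandwidth": bandwidth_ratio * energy_per_bit_ratio,
        # ...while delivering 8.5x the bandwidth: about 3.3x bandwidth per watt.
        "bandwidth_per_watt": 1.0 / energy_per_bit_ratio,
    }


print(hmc_vs_ddr4())
```

Under that assumption, the cube draws more total power only because it moves far more data; per bit moved, it is the cheaper memory.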
Problem: How to Program FPGAs?
▪ Programming Has Been a Barrier in the Past
− Verilog and Other HDLs --> Months to Deploy
▪ FWDNXT’s Snowflake Compiler & Micron FPGA Modules: ML for IoT
(Figure: Your Network, Your Framework --> Network Description --> Snowflake Compiler --> Micron FPGA Module --> Machine Learning at the Edge)
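To illustrate why a framework-neutral network description separates the framework from the compiler, here is a hypothetical layer list in Python and a pass that derives each layer's output shape and weight count from it, without any framework installed. The schema (`op`, `kernel`, `stride`, `pad`, `out_channels`) and the `analyze` helper are invented for this sketch; they are not the Snowflake compiler's input format, which the slides do not describe.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical framework-neutral description of a small CNN (not a real format).
NETWORK: List[Dict] = [
    {"op": "conv2d", "out_channels": 16, "kernel": 3, "stride": 1, "pad": 1},
    {"op": "maxpool", "kernel": 2, "stride": 2},
    {"op": "conv2d", "out_channels": 32, "kernel": 3, "stride": 1, "pad": 1},
    {"op": "maxpool", "kernel": 2, "stride": 2},
]


def analyze(net: List[Dict], in_shape: Tuple[int, int, int]):
    """Return per-layer (channels, height, width) shapes and total weight count."""
    c, h, w = in_shape
    shapes, n_weights = [], 0
    for layer in net:
        op = layer["op"]
        if op not in ("conv2d", "maxpool"):
            raise ValueError(f"unsupported op: {op}")
        k, s, p = layer["kernel"], layer["stride"], layer.get("pad", 0)
        # Standard output-size formula for a sliding window.
        h = (h + 2 * p - k) // s + 1
        w = (w + 2 * p - k) // s + 1
        if op == "conv2d":
            out_c = layer["out_channels"]
            n_weights += out_c * c * k * k + out_c  # kernels plus one bias each
            c = out_c
        shapes.append((c, h, w))
    return shapes, n_weights


if __name__ == "__main__":
    print(json.dumps(NETWORK))  # plain data, easy to hand to another tool
    print(analyze(NETWORK, (3, 32, 32)))
```

For a 3×32×32 input, the shapes come out as (16, 32, 32), (16, 16, 16), (32, 16, 16) and (32, 8, 8), with 5,088 weights in total.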
What Model Types Can FWDNXT Handle?
▪ Any Model
− CNN
− RNN
− LSTM
− …
▪ Any Framework
− PyTorch
− Caffe
− TensorFlow
− …
FWDNXT Representations
▪ Now, 16-bit Fixed Point Used for Inputs
▪ Fixed Point: 5-bit integer, 11-bit fraction
▪ Moving to 16-bit Floating Point
▪ Now, 32-bit Fixed Point Used for Multiplication Outputs and Additions
(Figure: Fixed Point Representation)
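These number formats can be mimicked in integer arithmetic. The Python sketch below assumes details the slide does not state: two's-complement storage with the sign counted among the 5 integer bits (so inputs span roughly -16 to +16 in steps of 1/2048), round-to-nearest quantization (Python's `round`, ties to even), and saturation instead of wrap-around on overflow. Multiplying two 16-bit values with 11 fraction bits gives a product with 22 fraction bits, which always fits in 32 bits; sums stay in a 32-bit accumulator. The helper names are hypothetical.

```python
FRAC_BITS = 11                                   # 11-bit fraction (from the slide)
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def saturate(v: int, lo: int, hi: int) -> int:
    """Clamp to the representable range instead of wrapping around (assumed)."""
    return max(lo, min(hi, v))


def to_fixed16(x: float) -> int:
    """Quantize a float to a 16-bit integer with 11 fraction bits."""
    return saturate(round(x * (1 << FRAC_BITS)), INT16_MIN, INT16_MAX)


def from_fixed(q: int, frac_bits: int = FRAC_BITS) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def dot_fixed(xs, ws) -> int:
    """Dot product: 16-bit inputs, 32-bit products and 32-bit accumulation."""
    acc = 0
    for x, w in zip(xs, ws):
        # |product| <= 2**30: fits a signed 32-bit word with 22 fraction bits.
        prod = to_fixed16(x) * to_fixed16(w)
        acc = saturate(acc + prod, INT32_MIN, INT32_MAX)
    return acc  # read back with from_fixed(acc, 2 * FRAC_BITS)
```

For example, `to_fixed16(1.5)` is 3072, `from_fixed(to_fixed16(20.0))` saturates to 15.99951171875, and `from_fixed(dot_fixed([1.5, -0.25], [2.0, 4.0]), 22)` gives 2.0.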
Steps to Deploy Models on FPGAs
1. Define Model in PyTorch, Caffe or TensorFlow
2. Train Model with Data on GPUs
3. Input Framework-Trained Model into Snowflake Compiler
4. Deploy Snowflake Output Directly onto Micron FPGA Module
NO CODE CHANGE
▪ Hybrid Memory Cube
▪ Up to 512GB DDR Footprints
▪ Advanced FPGAs
− Xilinx UltraScale+
− Intel Stratix 10
What New Problems Can We Solve?
▪ Some Domains Have Problems that Require Larger Memory Footprints
− Medical Imaging
− Oil Exploration
− Videos
− Government
▪ Need both High-Bandwidth and High-Capacity Memory
▪ Micron FPGA Cards Plus FWDNXT Snowflake Compiler Provide the Missing Links
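A quick footprint calculation shows why capacity matters alongside bandwidth. The Python sketch below multiplies out the size of a dense input. The video dimensions are purely illustrative, 2 bytes per element matches the 16-bit inputs mentioned earlier, and "512GB" is read as decimal gigabytes (an assumption). `footprint_bytes` and `fits_on_board` are hypothetical helpers.

```python
from math import prod
from typing import Sequence

BOARD_CAPACITY_BYTES = 512 * 10**9  # "up to 512GB", read as decimal GB (assumption)


def footprint_bytes(shape: Sequence[int], bytes_per_element: int = 2) -> int:
    """Bytes needed to hold a dense array (2 bytes = 16-bit values)."""
    return prod(shape) * bytes_per_element


def fits_on_board(shape: Sequence[int], bytes_per_element: int = 2,
                  capacity: int = BOARD_CAPACITY_BYTES) -> bool:
    """True if the whole array fits in the board's memory at once."""
    return footprint_bytes(shape, bytes_per_element) <= capacity


# Illustrative: two minutes of 1080p RGB video at 30 frames per second.
clip = (2 * 60 * 30, 1080, 1920, 3)
print(footprint_bytes(clip) / 1e9, "GB; fits:", fits_on_board(clip))
```

The illustrative clip needs about 44.8 GB and fits; a 8192×8192×8192 volume of 16-bit values needs 2^40 bytes (about 1.1 TB) and does not.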
Summary
▪ The Edge Poses Challenges in Power and Bandwidth
▪ FPGAs Can Help, but Programming Was a Challenge—Until Now
▪ Memory Bandwidth Is Now Key to Machine Learning Performance
▪ Plus, Solve Larger Problems on Boards with up to 512GB of Memory
www.micron.com/tensorflow