Introduction to machine learning on FPGAs
Arthur Ruder ¦ Enclustra GmbH ¦ AI seminar EPFL Lausanne & ZHAW Winterthur ¦ 19 & 21/11/2019
Quick reminder: neural network
[Figure: neural network with an input layer (e.g. pixels), hidden layer 1, hidden layer 2 and an output layer (e.g. probability); labels: inputs x1, x2, x3, weights w1, w2, w3, activation a]
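The labels in the figure (inputs x1 to x3, weights w1 to w3, activation a) correspond to the computation of a single neuron. A minimal sketch, assuming a sigmoid activation and made-up numeric values; the slide names neither, and the function name `neuron` is only illustrative:

```python
import math

def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by a sigmoid activation (assumed)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

x = [0.5, -1.0, 2.0]   # hypothetical inputs x1, x2, x3
w = [0.8, 0.2, -0.4]   # hypothetical weights w1, w2, w3
a = neuron(x, w)       # activation a, a value between 0 and 1
print(f"a = {a:.3f}")
```

A layer repeats this computation for many neurons at once, which is why the workload is dominated by multiply-accumulate operations.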
Machine learning concepts: training phase
• Goal: obtain trained weights
• Inputs: training set
• Forward-propagation through the untrained network
• Outputs: classification probability, e.g. 40 % dog, 60 % cat
• But: label says 100 % dog
• Back-propagation
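The mismatch between the network output and the label can be expressed as a single number that back-propagation then tries to reduce. A minimal sketch, assuming a cross-entropy loss (the slides do not name a loss function) and using the dog/cat probabilities quoted on the training and inference slides:

```python
import math

def cross_entropy(predicted, label):
    """Cross-entropy between a predicted distribution and a label distribution."""
    return -sum(t * math.log(p) for p, t in zip(predicted, label) if t > 0.0)

# Class order: [dog, cat]
label = [1.0, 0.0]              # label says 100 % dog
untrained = [0.40, 0.60]        # output during training: 40 % dog, 60 % cat
trained = [0.9907, 0.0093]      # output at inference: 99.07 % dog, 0.93 % cat

loss_untrained = cross_entropy(untrained, label)
loss_trained = cross_entropy(trained, label)
print(f"loss before training: {loss_untrained:.3f}")
print(f"loss after training:  {loss_trained:.3f}")
```

The loss shrinks as the predicted probability for the labelled class approaches 100 %.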
Machine learning concepts: inference
• Inputs: e.g. photographs
• Forward-propagation through the trained network
• Outputs: classification probability, e.g. 99.07 % dog, 0.93 % cat
Quick reminder: Deep Learning
[Figure: classification error [%] (axis 0 to 30) of the image recognition challenge winner per year, 2010 to 2015, with human error as reference; labelled networks: AlexNet, VGG, GoogleNet, ResNet; labelled depths: shallow, 8 layers, 19 layers, 22 layers, 152 layers]
Hardware platform
What hardware do we need for this?
CPUs, GPUs, FPGAs, ASICs??
• What are the requirements for…?
  a) training
  b) inference
• What type of hardware is best suited for each task?
Neural network training: computational complexity
For one picture: image classification
• Labelled data, untrained neural network: ResNet50
• Forward-propagation, back-propagation
• Result: 50 % cat, 50 % dog
• Label: 100 % dog
• 7.7 billion operations, ~35 MB parameter storage
• 23 billion operations, ~380 MB parameter storage
* for forward propagation only, backward propagation similar
Neural network training: computational complexity
For the whole training process:
• ResNet50: forward-propagation, back-propagation
• 7.7 billion operations, ~35 MB parameter storage
• 23 billion operations, ~380 MB for parameter storage
• ImageNet: 1.2 million pictures
• Result? 1 epoch: $1.2\,\mathrm{M} \times 30.7\,\mathrm{B} \approx 37 \times 10^{15}$ operations (majority MAC)
• ResNet50 needs 100 epochs for training…
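The arithmetic on this slide can be checked in a few lines. The sketch uses only the figures quoted on the slide; treating the 30.7 B per picture as the sum of the 7.7 B and 23 B figures is an assumption that matches the numbers but is not stated explicitly, and the variable names are illustrative:

```python
# Figures quoted on the slide
ops_a = 7.7e9        # operations per picture (first figure)
ops_b = 23e9         # operations per picture (second figure)
pictures = 1.2e6     # ImageNet training pictures
epochs = 100         # epochs quoted for ResNet50

ops_per_picture = ops_a + ops_b              # assumed origin of the 30.7 B
ops_per_epoch = pictures * ops_per_picture   # operations for one pass over ImageNet
ops_total = epochs * ops_per_epoch           # operations for the whole training run

print(f"per picture: {ops_per_picture:.3g}")
print(f"per epoch:   {ops_per_epoch:.3g}")
print(f"100 epochs:  {ops_total:.3g}")
```

One epoch comes out at roughly 37 × 10^15 operations, so 100 epochs land in the range of 10^18 operations.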
Requirements breakdown: training
• Typically not time-critical
• Compute billions of floating point calculations
• Handle large data sets (GBs to hundreds of GBs)
• Flexibility to train a wide variety of neural networks
Clear answer (for now): GPUs do the heavy lifting of neural network training
Requirements: inference
• Edge requirements
  • Low (deterministic) latency (e.g. real-time object detection)
  • Power efficiency (limited battery capacity)
  • Sensor fusion (e.g. industrial surveillance)
  • Robustness (e.g. temperature)
• Cloud requirements
  • Low latency, e.g. search engines
  • Power efficiency (heat dissipation/cooling cost)
Resource requirements overview
[Figure: resource requirements of models for image classification, object detection, semantic segmentation, speech recognition and OCR]
Main takeaway points:
• Inference is challenging
• Huge variation in compute and memory requirements (even within subgroups)
• Models typically don’t fit into local memory
Inference Accelerator
Architectural challenges
[Figure: accelerator block diagram with DMA, external memory, buffer, weight buffer, compute array, partial sums, activation functions, …; input and result]
Challenges marked in the diagram:
• Huge amount of computations
• Memory bandwidth (marked twice)
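A back-of-the-envelope estimate helps show why memory bandwidth appears as a challenge. The sketch assumes, hypothetically, that the weights do not fit on chip and are streamed from external memory once per inference; the model size, frame rate and function name are illustrative values, not figures from the talk:

```python
def weight_bandwidth_gb_per_s(model_bytes, inferences_per_second):
    """External-memory bandwidth needed if all weights are re-read for every inference."""
    return model_bytes * inferences_per_second / 1e9

model_bytes = 100e6   # hypothetical: 100 MB of weights
fps = 30              # hypothetical: real-time frame rate

bandwidth = weight_bandwidth_gb_per_s(model_bytes, fps)
print(f"{bandwidth:.1f} GB/s for weights alone")
```

Input data, activations and partial sums add further traffic on top of the weights, which is where on-chip buffers such as those in the diagram come in.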
Qualitative hardware comparison
[Figure: hardware platforms compared along two axes, performance & power efficiency and flexibility & ease of use]
Qualitative hardware comparison
[Table: requirements rated for GPU, FPGA and ASIC; ratings shown as graphics on the slide]
• Low (deterministic) latency
• High throughput
• Power efficiency
• Sensor fusion
• Robustness
• Programmability
• Flexibility
• Ease-of-use
• (Development) cost
![Page 62: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/62.jpg)
FPGA ML workflow
21/11/201965
![Page 63: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/63.jpg)
FPGA ML workflow
21/11/2019
Challenge: efficient mapping of floating point model to FPGA implementation
without losing accuracy
FP32
Trained network
Floating point model
66
![Page 64: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/64.jpg)
FPGA ML workflow
21/11/2019
Challenge: efficient mapping of floating point model to FPGA implementation
without losing accuracy
FP32
Trained network
Floating point model
Compression67
![Page 65: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/65.jpg)
FPGA ML workflow
21/11/2019
Challenge: efficient mapping of floating point model to FPGA implementation
without losing accuracy
FP32
Pruning
Pruned network
Trained network
Floating point model
Compression68
![Page 66: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/66.jpg)
Quick digression
21/11/201969
![Page 67: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/67.jpg)
![Page 68: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/68.jpg)
![Page 69: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/69.jpg)
FPGA ML workflow
Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy
• Trained network: floating point model (FP32)
• Compression: pruning (gives the pruned network), then quantization
• Compilation
• FPGA implementation: fixed point
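The two compression steps named in the workflow can be illustrated with a minimal Python sketch. It assumes magnitude-based pruning (zero the smallest weights) and signed fixed-point quantization with saturation; the slides do not say which pruning or quantization scheme a given toolchain uses. The function names, the 8-bit word with 6 fractional bits and the example weights are all hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Assumption: plain magnitude pruning, chosen only for illustration.
    """
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_fixed_point(weights, total_bits=8, frac_bits=6):
    """Map floats to signed fixed-point integer codes, saturating at the range limits."""
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    return [max(q_min, min(q_max, round(w * scale))) for w in weights]


def dequantize(codes, frac_bits=6):
    """Convert fixed-point codes back to floats to inspect the rounding error."""
    return [c / (1 << frac_bits) for c in codes]


# Hypothetical FP32 weights of one small layer.
weights = [0.82, -0.03, 0.41, 0.007, -0.65, 0.12]

pruned = prune_by_magnitude(weights, sparsity=0.5)   # half of the weights become zero
codes = quantize_fixed_point(pruned)                 # 8-bit integer codes
restored = dequantize(codes)                         # values the fixed-point model would use

print(pruned)
print(codes)
print(restored)
```

The rounding error of each kept weight stays within half a quantization step (here 1/128), which is the sense in which accuracy is traded for a smaller representation. In practice a pruned network is usually retrained before quantization to recover accuracy; that step is omitted here.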
![Page 70: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/70.jpg)
Impact of compression
https://www.hotchips.org/hc30/0tutorials/T2_Part_2_Song_Hanv3.pdf
![Page 71: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/71.jpg)
![Page 72: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/72.jpg)
Impact of compression
https://www.hotchips.org/hc30/0tutorials/T2_Part_2_Song_Hanv3.pdf
Compression allows using significantly fewer resources when deploying a neural network, with minimal impact on network accuracy
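To make the resource claim concrete, the following back-of-the-envelope Python sketch estimates weight storage before and after compression. All numbers (one million weights, 90 % sparsity, 8-bit values, a 4-bit sparse index per kept weight) are hypothetical and are not taken from the cited tutorial; the helper `weight_storage_bytes` is made up for illustration.

```python
def weight_storage_bytes(n_weights, bits_per_weight, sparsity=0.0, index_bits=0):
    """Estimate the storage needed for the weights of a model.

    sparsity   : fraction of weights removed by pruning (0.0 = dense model)
    index_bits : extra bits stored per kept weight to record its position
                 in a sparse format (assumption: a fixed per-weight overhead)
    """
    kept = round(n_weights * (1.0 - sparsity))
    return kept * (bits_per_weight + index_bits) / 8


# Hypothetical model with one million weights.
dense_fp32 = weight_storage_bytes(1_000_000, bits_per_weight=32)
compressed = weight_storage_bytes(1_000_000, bits_per_weight=8,
                                  sparsity=0.9, index_bits=4)

print(f"dense FP32 : {dense_fp32:,.0f} bytes")
print(f"compressed : {compressed:,.0f} bytes")
print(f"reduction  : {dense_fp32 / compressed:.1f}x")
```

Under these assumed numbers the weights shrink from 4 MB to 150 kB. Whether accuracy is preserved at such settings depends on the network and has to be measured.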
![Page 73: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/73.jpg)
Hardware implementation architectures
• Streaming architecture
[Figure: host (CPU, memory) connected to an FPGA that contains a chain of dedicated layer blocks (CONV, POOL, CONV, …, FC)]
![Page 74: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/74.jpg)
![Page 75: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/75.jpg)
Hardware implementation architectures
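• Streaming architecture
[Figure: host (CPU, memory) connected to an FPGA that contains a chain of dedicated layer blocks (CONV, POOL, CONV, …, FC)]
• Single computation engine
[Figure: host (CPU, memory) connected to an FPGA that contains a DMA, a control unit and one engine with CONV/FC, NL and POOL units; the network layers (CONV LAYER, ACTIVATION, POOL, CONV LAYER, ACTIVATION, FC) are run on this engine]
[Table: streaming architecture vs. single computation engine, compared on customizability, flexibility and power efficiency; the cell values are not preserved in this transcript]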
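As a software analogy only, the difference between the two architectures can be sketched in Python. In the streaming version every layer is its own stage and data flows from stage to stage; in the single-engine version one shared engine is stepped through the layer list, with intermediate results held in a buffer that stands in for memory. The layer functions and inputs are toy placeholders, and nothing here models FPGA resources or timing.

```python
def conv_like(v):
    """Toy stand-in for a CONV layer: scale every value by 2."""
    return [2.0 * x for x in v]


def relu(v):
    """Toy activation layer."""
    return [max(0.0, x) for x in v]


def max_pool(v):
    """Toy POOL layer: maximum over non-overlapping pairs."""
    return [max(v[i], v[i + 1]) for i in range(0, len(v), 2)]


def stage(layer, upstream):
    """One dedicated pipeline stage: consumes items as they arrive upstream."""
    for item in upstream:
        yield layer(item)


def run_streaming(inputs, layers):
    """Streaming architecture: one stage per layer, chained together."""
    stream = iter(inputs)
    for layer in layers:
        stream = stage(layer, stream)
    return list(stream)


def run_single_engine(inputs, layers):
    """Single computation engine: a control loop schedules one layer at a time."""
    results = []
    for x in inputs:
        buffer = x                  # stands in for data fetched from memory
        for layer in layers:        # the "control unit" selects the next layer
            buffer = layer(buffer)  # the shared engine executes it
        results.append(buffer)
    return results


layers = [conv_like, relu, max_pool]
inputs = [[1.0, -2.0, 3.0, -4.0], [0.5, 0.5, -1.0, 2.0]]

out_streaming = run_streaming(inputs, layers)
out_single = run_single_engine(inputs, layers)
print(out_streaming)
print(out_single)
```

Both variants compute the same result; they differ in how the work is organised, which is what the comparison of customizability, flexibility and power efficiency refers to.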
![Page 76: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/76.jpg)
Toolchains for AI on FPGAs
Columns: Provider | Edge (computer vision, language processing) | Cloud (computer vision, language processing)
• Xilinx: DNNDK (Deep Neural Network Development Kit) | - | ML (Machine Learning) Suite
• Intel: - | - | OpenVINO
• Omnitek: DPU (Deep Learning Processing Unit) + software framework
• Lattice: sensAI | -
![Page 77: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/77.jpg)
![Page 78: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/78.jpg)
Summary
![Page 79: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/79.jpg)
Summary
• Neural network inference is viable on FPGAs
• Low power (~mW – W)
• Sensor integration
• Flexibility
• Low deterministic latency
• Edge examples
![Page 80: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/80.jpg)
![Page 81: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/81.jpg)
![Page 82: Introduction to machine learning on FPGAs...FPGA ML workflow 21/11/2019 Challenge: efficient mapping of floating point model to FPGA implementation without losing accuracy FP32 Pruning](https://reader034.vdocuments.us/reader034/viewer/2022042620/5f41cd0d7a976204c321b2a4/html5/thumbnails/82.jpg)
Summary
• Neural network inference is viable on FPGAs
• Low power (~mW – W)
• Sensor integration
• Flexibility
• Low deterministic latency
• Edge and cloud examples:
  • Xnor.ai: solar powered person detection
  • CERN: sensor data filtering and classification
  • Microsoft: Azure cloud AI