Ultra Fast AI with Azure Machine Learning and Project Brainwave
TRANSCRIPT
Ted Way, PhD
Senior Program Manager, Azure ML
Ultra Fast AI with Azure Machine Learning and Project Brainwave
FPGAs in Microsoft’s Intelligent Cloud
Brainwave
real-time AI
Deep neural networks have enabled major
advances in machine learning and AI
Computer vision
Language translation
Speech recognition
Question answering
And more…
Problem
DNNs are challenging to serve and deploy
in large-scale online services
Convolutional Neural Networks
Recurrent Neural Networks
[diagram: an unrolled recurrent network with inputs x_t, hidden states h_t, and outputs y_t]
Performance Flexibility Scale
Rapidly adapt to evolving ML
Inference-optimized numerical precision
Exploit sparsity, deep compression
Excellent inference at low batch sizes
Ultra-low latency: 10x lower than CPU/GPU
World’s largest cloud investment in FPGAs
Multiple Exa-Ops of aggregate AI capacity
Runs on Microsoft’s scale infrastructure
Low cost
$0.21/million images on Azure FPGA
Brainwave
[diagram: DNN layers (L0, L1) mapped as a microservice across a pool of FPGAs (F)]
Pretrained DNN model in TensorFlow, CNTK, etc.
Scalable DNN hardware microservice
Brainwave Soft DPU: instruction decoder & control, neural functional units
A Scalable FPGA-Powered DNN Serving Platform
Fast:
Flexible:
Friendly:
Network switches
FPGAs
Runs on Azure’s
configurable cloud
at massive scale
Azure ML integration
End-to-end deployment and model lifecycle support
Hardware Accelerated Model Gallery
Brainwave Compiler & Runtime
“Brainslice” Soft Neural Processing Unit
Unique advantage
No batching required
Brainwave delivers the ideal combination:
High hardware utilization
Low latency
Low batch sizes
[chart: performance vs. batch size; the Brainwave NPU sustains high performance even at batch size 1]
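The tradeoff that chart illustrates can be sketched with a toy latency model. The constants below are illustrative assumptions, not measured Brainwave or CPU/GPU numbers: batch-oriented hardware gains throughput from larger batches, but every request in a batch then waits out the full batch latency, which is why batch-1 performance matters for real-time serving.

```python
# Toy model of the latency/throughput tradeoff created by batching.
# The constants are illustrative assumptions, not measured numbers.

def latency_ms(batch_size, fixed_overhead_ms=5.0, per_item_ms=0.5):
    """Time to process one batch: fixed launch cost plus per-item cost."""
    return fixed_overhead_ms + per_item_ms * batch_size

def throughput_ips(batch_size):
    """Items per second when requests are grouped into batches."""
    return batch_size / (latency_ms(batch_size) / 1000.0)

# Batch 1: each request waits only its own latency, but throughput is low.
# Batch 64: throughput rises sharply, but every request waits ~37 ms.
for b in (1, 64):
    print(f"batch={b:3d}  latency={latency_ms(b):6.1f} ms  "
          f"throughput={throughput_ips(b):8.1f} items/s")
```

A batch-1-efficient accelerator escapes this tradeoff: it reaches high hardware utilization without forcing requests to queue into batches.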
Azure Machine Learning and Project Brainwave
Model Management
Service
Azure ML orchestrator (Python and TensorFlow)
Featurize images and train classifier
Classifier (TF/LGBM)
Preprocessing (TensorFlow C++ API)
Control Plane Service
Brainwave Runtime
FPGA
CPU
Regions: EUS (East US), SEA (Southeast Asia), WEU (West Europe), WUS (West US)
Stamp: 20 racks
Azure box: 24 CPU cores, 4 FPGAs
Brainwave
Azure ML
Wire service
AML FPGA VM Extension
Azure Host
MonAgent
DNN pipeline
Cloud services
at the edge
Azure ML, Azure Stream
Analytics, Azure Functions,
custom
Manage from
the cloud
Devices and services
from Azure Portal
Flexible
connectivity
Intermittent, low, or
no connectivity
Reduced latency
and cost
Bring compute to the data,
reduced bandwidth cost
http://aka.ms/aml-real-time-ai
Models are easy to create and deploy into Azure cloud
Write once, deploy anywhere – to intelligent cloud or edge
Manage and update your models using Azure IoT Edge
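Once deployed, such a model is exposed as a web service that clients score over the network. A minimal client sketch follows; the endpoint URL, header names, and JSON payload schema are hypothetical placeholders, not the documented Azure ML or Brainwave wire format.

```python
import json
import urllib.request

def build_score_request(url, image_bytes, api_key):
    """Build an HTTP request that scores one image against a deployed service.

    The URL, header names, and JSON schema here are hypothetical placeholders
    for whatever the actual deployed service expects.
    """
    body = json.dumps({"data": image_bytes.hex()}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# Build (but do not send) a request against a placeholder endpoint.
req = build_score_request(
    "https://example.invalid/score",  # placeholder endpoint
    b"\x89PNG...",                    # placeholder image bytes
    "my-api-key",                     # placeholder key
)
print(req.full_url, req.get_method())
```

Keeping the request-building step separate from the network call makes the payload easy to unit-test and easy to retarget when the model is redeployed to a different endpoint, cloud or edge.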
Thank you