analog compute-in- memory for inference...© 2019 mythic. all rights reserved. confidential. for...
TRANSCRIPT
© 2019 MYTHIC. ALL RIGHTS RESERVED.© 2019 MYTHIC. ALL RIGHTS RESERVED.
Analog Compute-in-Memory for InferenceMike Henry, CEO & Founder
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
Large Growth Opportunity for AI Inference
2
Semiconductor TAM for AI ($B) – Barclay’s Research
0
5
10
15
20
25
30
2016 2017 2018 2019 2020 2021 2022 2023
Training Data center inference Edge inference
AUTOMOTIVEINDUSTRIAL
AEROSPACE VIDEO ANALYTICS
DATACENTER
CONSUMER
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
Tough Requirements for Inference
▪ Accuracy requirements dictatemany millions of weights per model
▪ Trillions of operations per second
▪ Real-time, low-latency, with high levels of concurrency
▪ Affordable, scalable, low-power package
3
DATASET SIZE
AC
CU
RA
CY
Traditional ML algorithms/Statistical Learning
Small NN
Medium NN
Large NN
Large models needed for edge cases
© 2019 MYTHIC. ALL RIGHTS RESERVED.
On-Chip Compute-and-Memory
4 © 2019 MYTHIC. ALL RIGHTS RESERVED.
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
Principles of On-Chip Compute-and-Memory
▪ Minimize journey from memory to processing units
▪ Many parallel compute units to maximize throughput and minimize latency
▪ Maximize memory bandwidth for entire set of DNNs
5
CPU
DRAM
L1 L2 L3
Traditional Architecture
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
Principles of On-Chip Compute-and-Memory
6
Compute-and-Memory Architecture
Mem
CPU
Mem
CPU
Mem
CPU
Mem
CPU
Mem
CPU
Mem
CPU
Mem
CPU
Mem
CPU
. . .
. . .
. . .. . .. . .. . .
▪ Minimize journey from memory to processing units
▪ Many parallel compute units to maximize throughput and minimize latency
▪ Maximize memory bandwidth for entire set of DNNs
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
Principles of On-Chip Compute-and-Memory
7
Weights are Stationary for Inference
Model A
Model B
Model C Model D
▪ Minimize journey from memory to processing units
▪ Many parallel compute units to maximize throughput and minimize latency
▪ Maximize memory bandwidth for entire set of DNNs
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
Innovate Faster and Easily Build Complex Applications
8D
ET
EC
TIO
NS
Final Layers
conv1
pool
1
conv3_x
conv4_x conv5_x
conv2_x
▪ Easily trade-off performance, power, latency, accuracy
▪ Enable concurrent and dynamic applications with deterministic execution
▪ Scalability from Very Large to Very Small
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
Innovate Faster and Easily Build Complex Applications
9
▪ Easily trade-off performance, power, latency, accuracy
▪ Enable concurrent and dynamic applications with deterministic execution
▪ Scalability from Very Large to Very Small
NeuralNetwork A
NeuralNetwork B
FAA 4563
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
Scalability from Very Large to Very Small
10
GPU
On-Chip Stationary Weights = OK Off-Chip Weights = Major BottlenecksWeight bandwidth = 10-20X data bandwidth
HD Video30 fps
186MB/s
HD Video30 fps
100M WeightsDistributed
20M WeightsDistributed
186MB/s 186MB/s
HD Video30 fps
20MWeights
HD Video30 fps
100MWeights
186MB/s6 GB/s 30 GB/s
CPU
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
Best-in-Class Performance and Power for Inference
11
Traditional
CPU / GPU
Compute-and-
Memory
Total model capacity ✓
System Simplicity ✓
Low Latency ✓
Energy Efficiency ✓
High Performance ✓
Deterministic Execution ✓
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
Implementation Challenges in Digital
▪ True compute-in-memory not possible
▪ Large die-area from compute and memory
▪ High power-consumption
▪ High development costs due to state-of-the-art process nodes
12
© 2019 MYTHIC. ALL RIGHTS RESERVED.
Enter Analog
13 © 2019 MYTHIC. ALL RIGHTS RESERVED.
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
Unique Analog Compute-in-Memory
14
▪ Mythic computes directly inside the flash memory array– Weights stored in flash– Inputs provided as voltages– Outputs collected as currents
▪ Result: High-performance, yet amazingly efficient processing
X0
X1
X2
V0
V1
V2
YA YB YC
ADC ADC ADC
DAC
DAC
DAC
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
Power, Performance, and Latency
15
Incredibly high parallelization and “effective” memory
bandwidth
1000 x 1000vector-matrix product in
a few microseconds
Energy efficiency of4 TOPS/W for convolutional
and dense networks
Accounts for all system aspects including processor control
cores and PCI express
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
▪ NOR Flash: Up to 50x denser weight storage
▪ No need to add more compute – the storage is the compute
Analog Compute Gives Us Model Capacity
16
SRAM = 48 transistors per 8-bit weight Analog = 1 flash transistor per 8-bit weight
© 2019 MYTHIC. ALL RIGHTS RESERVED.
Rich Roadmap for Analog Compute Improvements
17
© 2019 MYTHIC. ALL RIGHTS RESERVED.
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
DNN Tile - Ultra-dense Storage + Matrix Multiplication
18
Tiles Connected in a Grid
Expandable Grid of Tiles
Inp
ut
Da
ta
Ac
tiv
ati
on
s
Weight Storage
+Analog Matrix
MultiplierDig
ita
l to
An
alo
g
An
alo
g T
o D
igit
al
Single Tile
Network Connections
SRAM Processor
SIMD Router
Scene Segmentation
ObjectTracking
Camera Enhancement
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
AI Inference Processor Landscape
19
On-Chip Compute-and-MemoryTraditional Von-Neuman-like Architectures
AnalogCompute-In-Memory
DigitalCompute-Near-MemorySoC CPU/GPU TPU
(IP)
Mythic is a trademark of Mythic. All other product or company names may be trademarks of their respective owners.
© 2019 MYTHIC. ALL RIGHTS RESERVED.
The Mythic IPU
20 © 2019 MYTHIC. ALL RIGHTS RESERVED.
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
Mythic Initial Product Lineup
21
Mythic PCIe CardsMythic IPU
PCIe x16 FHFLM.2 x4 2280
Mythic IPU x1 Mythic IPU x4 Mythic IPU x16
PCIe x4 HHHL
PCIeSwitchPCIe
Switch
Up to 120M Weights
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
Low-Power DNN Inference Solution
22
Inference Capability per Watt for Common DNN ProcessorsResNet-50, INT8, and Batch=1
Frames per Second/Watt
Mythic IPU (est)
Habana Goya HL – 100
nVidia Tesla T4
nVidia Jetson Xavier
Qualcomm 835
nVidia Jetson TX2
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
Mythic SDK - Develop with Latest Networks / Frameworks
23
▪ Post-training quantization▪ Retraining libraries
▪ Graph decomposition and mapping
▪ Code generation
▪ Profiling and logging
▪ Power, speed and memory estimates
Performance Estimator
Runtime API Visualizer▪ Host runtime API and OS drivers
▪ Annotated
Graph Compiler
Optimization Suite
© 2019 MYTHIC. ALL RIGHTS RESERVED.
The Next 10 years
24 © 2019 MYTHIC. ALL RIGHTS RESERVED.
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
What the Hardware Industry is Used to
25
$ / Transistor (Intel) (Normalized)
130nm 90nm 65nm 45nm 32nm 22nm 14nm 10nm
1
0.1
0.01
2001 2014 ??
$10,000
$1,000
$100
$10
$1
$0
NAND Flash: Cost per GB
1999 2002 2005 2008 2011
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
Reality: Cost Improvements are Flat-Lining
26
https://wccftech.com/apple-5nm-3nm-cost-transistors/
$-
$1.00
$2.00
$3.00
$4.00
$5.00
$6.00
Cost per 1B Transistors
This should be a huge concern
16nm 10nm 7nm 5nm 3nm
Chip area (mm2) 125.00 87.66 83.27 85.00 85.00
No. of transistors (BU) 3.3 4.3 6.9 10.5 14.1
Gross die per wafer 478 686 721 707 707
Net die per wafer 359.74 512.44 545.65 530.25 509.04
Wafer price ($) 5,912.00 8,389.00 9,965.00 12,500 15,500
Die cost ($) 16.43 16.37 18.26 23.57 30.45
Transistor cost per 1B transistors ($)
4.98 3.81 2.65 2.25 2.16
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
NVM Scalability With Analog Compute-In-Memory
27
340
290
240
190
140
90
40
Density Improvements
2009 2012 2013 2015 2016 2018 2020 2021 2022
Core Transistors NAND Flash
340
290
240
190
140
90
40
Density Improvements
2009 2012 2013 2015 2016 2018 2020 2021 2022
Core Transistors NAND Flash
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.
Mythic Analog Compute is the Clear Path Forward
28
Mythic Analog Compute-in-Memory Digital
Performance per Dollar
+ 10 years
100x
On-chip Model Capacity
+ 10 years
100x
Performance per Watt
+ 10 years
100x
© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.29
Mythic Analog Compute is the Clear Path Forward
© 2019 MYTHIC. ALL RIGHTS RESERVED.© 2019 MYTHIC. ALL RIGHTS RESERVED.