dnn engine: a 16nm sub-uj deep neural network ......dnn engine: a 16nm sub -uj dnn inference...
TRANSCRIPT
![Page 1: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/1.jpg)
DNN ENGINE: A 16nm Sub-uJ DNN Inference Accelerator
for the Embedded Masses
Paul N. Whatmough1,2
S. K. Lee2, N. Mulholland2, P. Hansen2, S. Kodali3, D. Brooks2, G.-Y. Wei2
1ARM Research, Boston, MA2Harvard University, Cambridge, MA3Princeton University, NJ
![Page 2: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/2.jpg)
The age of deep learning
More Compute
Bigger Data
Deeper Nets
![Page 3: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/3.jpg)
The embedded masses
3
Interpret noisy real-world data from sensor-rich systems
Keep inference at the edge: always-on sensing
Large memory footprint and compute load
![Page 4: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/4.jpg)
DNN inference at the edge
Normalizemean=0, variance=1
Pre-Processing Feature Extraction Classifier
Trained weights
Features
FFT, MFCC, Sobel, HOG, etc
Fully-connected Deep neural network
(DNN)
Norm. DataRaw Data Classes
![Page 5: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/5.jpg)
DNN classifier design flow
Data- Training, validation, test
Training- Optimize hyper-parameters- Minimize size and test error
Quantization- Fixed-point 8-bit / 16-bit
Inference- CPU, DSP, GPU, Accelerator
5
[Reagen et al., ISCA 2016]
![Page 6: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/6.jpg)
DNN ENGINE accelerator architecture
Inpu
t Vec
tor
Out
put C
lass
es
DNN ENGINE
Hidden Layers
Outputs
+DNN Topology
+Weights
Inputs
Sensor Data
WeightNeuron Classification
Argmax
Parallelism and data reuse
Sparse data and small data types
Algorithmic resilience
![Page 7: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/7.jpg)
Outline
Background and motivation
DNN ENGINE- Parallelism and data reuse- Sparse data and small data types- Algorithmic resilience
Measurement Results
Summary
7
![Page 8: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/8.jpg)
Fully-connected DNN graph
Inpu
t Vec
tor
Out
put C
lass
es
Hidden Layers
Input Layer Output Layer
8
![Page 9: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/9.jpg)
Fully-connected DNN graph
Inpu
t Vec
tor
Out
put C
lass
es
NeuronWeight
9
![Page 10: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/10.jpg)
Fully-connected DNN graph
Inpu
t Vec
tor
Out
put C
lass
es
SIMD Window
10
Process a group of neurons in parallel
![Page 11: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/11.jpg)
Fully-connected DNN graph
Inpu
tVec
tor
Out
put C
lass
es
Weights
11
Re-use of activation data across neurons
![Page 12: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/12.jpg)
Fully-connected DNN graph
Inpu
t Vec
tor
Out
put C
lass
es
12
![Page 13: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/13.jpg)
Fully-connected DNN graph
Inpu
t Vec
tor
Out
put C
lass
es
13
![Page 14: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/14.jpg)
Fully-connected DNN graph
Inpu
t Vec
tor
Out
put C
lass
es
14
![Page 15: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/15.jpg)
Fully-connected DNN graph
Inpu
t Vec
tor
Out
put C
lass
es
15
Add bias and apply activation function
![Page 16: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/16.jpg)
Fully-connected DNN graph
Inpu
t Vec
tor
Out
put C
lass
es
16
![Page 17: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/17.jpg)
Fully-connected DNN graph
Inpu
t Vec
tor
Out
put C
lass
es
17
![Page 18: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/18.jpg)
Fully-connected DNN graph
Inpu
t Vec
tor
Out
put C
lass
es
18
![Page 19: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/19.jpg)
Fully-connected DNN graph
Inpu
t Vec
tor
Out
put C
lass
es
19
![Page 20: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/20.jpg)
Balancing efficiency and bandwidthNo Data Reuse
20
Parallelism increases throughput and data reuse
![Page 21: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/21.jpg)
Balancing efficiency and bandwidthBandwidth
too high
Parallelism increases throughput and data reuse- But also increases Memory Bandwidth demands
No Data Reuse
21
![Page 22: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/22.jpg)
Balancing efficiency and bandwidth
Parallelism increases throughput and data reuse- But also increases Memory Bandwidth demands
8-Way SIMD is efficient at reasonable memory BW- 10x Activation reuse, with 128b AXI channel
Bandwidth too high
No Data Reuse
128b
/cyc
le
22
![Page 23: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/23.jpg)
DNN ENGINE micro-architecture
MAC DatapathWeight Load
NBUF
IPBUF
XBUF
Data In
Data Out
CFG
ActivationDMA
Host
Pro
cess
or
Writeback
Data Read
W-MEM
23
8-Way SIMD accelerator architecture
![Page 24: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/24.jpg)
MAC DatapathWeight Load
NBUF
IPBUF
XBUF
Data In
Data Out
CFG
ActivationDMA
Host
Pro
cess
or
Writeback
Data Read
W-MEM
DNN ENGINE micro-architecture
Host Processor loads configuration and input data
24
![Page 25: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/25.jpg)
MAC DatapathWeight Load
NBUF
IPBUF
XBUF
Data In
Data Out
CFG
ActivationDMA
Host
Pro
cess
or
Writeback
Data Read
W-MEM
DNN ENGINE micro-architecture
Accumulate products of Activation and Weights
25
![Page 26: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/26.jpg)
MAC DatapathWeight Load
NBUF
IPBUF
XBUF
Data In
Data Out
CFG
ActivationDMA
Host
Pro
cess
or
Writeback
Data Read
W-MEM
DNN ENGINE micro-architecture
Add bias, apply ReLU activation and writeback
26
![Page 27: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/27.jpg)
MAC DatapathWeight Load
NBUF
IPBUF
XBUF
Data In
Data Out
CFG
ActivationDMA
Host
Pro
cess
or
Writeback
Data Read
W-MEM
DNN ENGINE micro-architecture
IRQ to host, which retrieves output data
27
![Page 28: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/28.jpg)
Outline
Background and motivation
DNN ENGINE- Parallelism and data reuse- Sparse data and small data types- Algorithmic resilience
Measurement Results
Summary
28
![Page 29: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/29.jpg)
Exploiting sparse data
Inpu
t Vec
tor
Out
put C
lass
es
29
![Page 30: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/30.jpg)
Exploiting sparse data
Inpu
t Vec
tor
Out
put C
lass
es
30
Discard small activation data
![Page 31: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/31.jpg)
Exploiting sparse data
Inpu
t Vec
tor
Out
put C
lass
es
31
Dynamically prune graph connectivity
![Page 32: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/32.jpg)
Exploiting sparse data
Inpu
t Vec
tor
Out
put C
lass
es
32
![Page 33: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/33.jpg)
Exploiting sparse data
Inpu
t Vec
tor
Out
put C
lass
es
33
![Page 34: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/34.jpg)
Exploiting sparse data
Inpu
t Vec
tor
Out
put C
lass
es
34
![Page 35: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/35.jpg)
Exploiting sparse data
Inpu
t Vec
tor
Out
put C
lass
es
35
Discard small activation data
![Page 36: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/36.jpg)
Exploiting sparse data
Inpu
t Vec
tor
Out
put C
lass
es
36
Dynamically prune graph connectivity
![Page 37: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/37.jpg)
Exploiting sparse data
Inpu
t Vec
tor
Out
put C
lass
es
37
![Page 38: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/38.jpg)
Exploiting sparse data
Inpu
t Vec
tor
Out
put C
lass
es
38
![Page 39: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/39.jpg)
Exploiting sparse data
Inpu
t Vec
tor
Out
put C
lass
es
39
![Page 40: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/40.jpg)
Exploiting sparse data
Inpu
t Vec
tor
Out
put C
lass
es
40
![Page 41: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/41.jpg)
DNN ENGINE micro-architecture
MAC DatapathWeight Load
NBUF
IPBUF
XBUF
Data In
Data Out
CFG
ActivationDMA
Host
Pro
cess
or
Writeback
Data Read
W-MEM
41
![Page 42: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/42.jpg)
DNN ENGINE micro-architecture
MAC DatapathWeight Load
NBUF
IPBUF
XBUF
Data In
Data Out
CFG
ActivationDMAIndex
Host
Pro
cess
or
Writeback
SKIPData Read
W-MEM
42
![Page 43: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/43.jpg)
DNN ENGINE micro-architecture
MAC DatapathWeight Load
NBUF
IPBUF
XBUF
Data In
Data Out
CFG
ActivationDMAIndex
Host
Pro
cess
or
Writeback
SKIPData Read
W-MEM
43
W-MEM supports 8b or 16b data types
![Page 44: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/44.jpg)
Outline
Background and motivation
DNN ENGINE- Parallelism and data reuse- Sparse data and small data types- Algorithmic resilience
Measurement Results
Summary
44
![Page 45: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/45.jpg)
MAC DatapathWeight Load
RZNBUF
IPBUF
XBUF
RZ
Data In
Data Out
CFG
ActivationDMAIndex
Host
Pro
cess
or
Writeback
SKIPData Read
W-MEM
DNN ENGINE micro-architecture
45
[Whatmough et al., ISSCC’17]
![Page 46: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/46.jpg)
Timing error tolerance
10-1
10-3
10-5
10-7
MNIST @ 98.36% AccuracyMemory Datapath Combined
Tim
ing
Viol
atio
n R
ate
TC SM BM TC SM TB TC ALL
> 1,
000,
000
X
46
[Whatmough et al., ISSCC’17]
![Page 47: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/47.jpg)
Outline
Background and motivation
DNN ENGINE- Parallelism and data reuse- Sparse data and small data types- Algorithmic resilience
Measurement Results
Summary
47
![Page 48: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/48.jpg)
16nm SoC for always-on applications
48
![Page 49: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/49.jpg)
16nm SoC for always-on applications
49
![Page 50: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/50.jpg)
Measured energy for always-on applications
Nominal Vdd / signoff Fmax
Energy varies with application Exponential with accuracy
On-chip memory critical A few KBs from DRAM >1uJ Constrains model size
50
667MHz / 0.8V
2.5x energy0.3% accuracy
![Page 51: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/51.jpg)
Measured energy improvement
Joint architecture and circuit optimizations
Typical silicon at room temp
Measurements demonstrate 10x energy reduction 4x throughput increase
51
10x
667MHz / 0.8V 667MHz / 0.67V
![Page 52: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/52.jpg)
State of the art classifiers
52
Dedicated NN accelerators Different performance points Different technologies
Accuracy critical metric Linear classifier 88% Exponential cost
The next 10x improvement Algorithm innovations?
![Page 53: DNN ENGINE: A 16nm Sub-uJ Deep Neural Network ......DNN ENGINE: A 16nm Sub -uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough1,2 S. K. Lee 2, N. Mulholland , P](https://reader030.vdocuments.us/reader030/viewer/2022012918/5e55ea7362900633a4007d19/html5/thumbnails/53.jpg)
Summary
Inference moving from cloud to the device: always-on sensing DNN ENGINE architecture optimizations
- Parallelism and data reuse- Sparse data and small data types- Algorithmic resilience
16nm test chip measurements- Critical to store the model in on-chip memory- 10x energy and 4x throughput improvement- 1uJ per prediction for always-on applications
We are grateful for support from DARPA CRAFT and PERFECT projects
53