analog compute-in- memory for inference...© 2019 mythic. all rights reserved. confidential. for...

© 2019 MYTHIC. ALL RIGHTS RESERVED.© 2019 MYTHIC. ALL RIGHTS RESERVED.

Analog Compute-in-Memory for InferenceMike Henry, CEO & Founder

© 2019 MYTHIC. ALL RIGHTS RESERVED. CONFIDENTIAL. FOR INTERNAL USE ONLY.© 2019 MYTHIC. ALL RIGHTS RESERVED.

Large Growth Opportunity for AI Inference

2

Semiconductor TAM for AI ($B) – Barclay’s Research

0

5

10

15

20

25

30

2016 2017 2018 2019 2020 2021 2022 2023

Training Data center inference Edge inference

AUTOMOTIVEINDUSTRIAL

AEROSPACE VIDEO ANALYTICS

DATACENTER

CONSUMER


Tough Requirements for Inference

▪ Accuracy requirements dictatemany millions of weights per model

▪ Trillions of operations per second

▪ Real-time, low-latency, with high levels of concurrency

▪ Affordable, scalable, low-power package

3

DATASET SIZE

AC

CU

RA

CY

Traditional ML algorithms/Statistical Learning

Small NN

Medium NN

Large NN

Large models needed for edge cases

© 2019 MYTHIC. ALL RIGHTS RESERVED.

On-Chip Compute-and-Memory

4 © 2019 MYTHIC. ALL RIGHTS RESERVED.


Principles of On-Chip Compute-and-Memory

▪ Minimize journey from memory to processing units

▪ Many parallel compute units to maximize throughput and minimize latency

▪ Maximize memory bandwidth for entire set of DNNs

5

CPU

DRAM

L1 L2 L3

Traditional Architecture



6

Compute-and-Memory Architecture

Mem

CPU

Mem

CPU

Mem

CPU

Mem

CPU

Mem

CPU

Mem

CPU

Mem

CPU

Mem

CPU

. . .

. . .

. . .. . .. . .. . .






7

Weights are Stationary for Inference

Model A

Model B

Model C Model D





Innovate Faster and Easily Build Complex Applications

8D

ET

EC

TIO

NS

Final Layers

conv1

pool

1

conv3_x

conv4_x conv5_x

conv2_x

▪ Easily trade-off performance, power, latency, accuracy

▪ Enable concurrent and dynamic applications with deterministic execution

▪ Scalability from Very Large to Very Small


Innovate Faster and Easily Build Complex Applications

9

▪ Easily trade-off performance, power, latency, accuracy

▪ Enable concurrent and dynamic applications with deterministic execution

▪ Scalability from Very Large to Very Small

NeuralNetwork A

NeuralNetwork B

FAA 4563


Scalability from Very Large to Very Small

10

GPU

On-Chip Stationary Weights = OK Off-Chip Weights = Major BottlenecksWeight bandwidth = 10-20X data bandwidth

HD Video30 fps

186MB/s

HD Video30 fps

100M WeightsDistributed

20M WeightsDistributed

186MB/s 186MB/s

HD Video30 fps

20MWeights

HD Video30 fps

100MWeights

186MB/s6 GB/s 30 GB/s

CPU


Best-in-Class Performance and Power for Inference

11

Traditional

CPU / GPU

Compute-and-

Memory

Total model capacity ✓

System Simplicity ✓

Low Latency ✓

Energy Efficiency ✓

High Performance ✓

Deterministic Execution ✓


Implementation Challenges in Digital

▪ True compute-in-memory not possible

▪ Large die-area from compute and memory

▪ High power-consumption

▪ High development costs due to state-of-the-art process nodes

12


Enter Analog



Unique Analog Compute-in-Memory

14

▪ Mythic computes directly inside the flash memory array– Weights stored in flash– Inputs provided as voltages– Outputs collected as currents

▪ Result: High-performance, yet amazingly efficient processing

X0

X1

X2

V0

V1

V2

YA YB YC

ADC ADC ADC

DAC

DAC

DAC


Power, Performance, and Latency

15

Incredibly high parallelization and “effective” memory

bandwidth

1000 x 1000vector-matrix product in

a few microseconds

Energy efficiency of4 TOPS/W for convolutional

and dense networks

Accounts for all system aspects including processor control

cores and PCI express


▪ NOR Flash: Up to 50x denser weight storage

▪ No need to add more compute – the storage is the compute

Analog Compute Gives Us Model Capacity

16

SRAM = 48 transistors per 8-bit weight Analog = 1 flash transistor per 8-bit weight


Rich Roadmap for Analog Compute Improvements

17

▪ More on this later



DNN Tile - Ultra-dense Storage + Matrix Multiplication

18

Tiles Connected in a Grid

Expandable Grid of Tiles

Inp

ut

Da

ta

Ac

tiv

ati

on

s

Weight Storage

+Analog Matrix

MultiplierDig

ita

l to

An

alo

g

An

alo

g T

o D

igit

al

Single Tile

Network Connections

SRAM Processor

SIMD Router

Scene Segmentation

ObjectTracking

Camera Enhancement


AI Inference Processor Landscape

19

On-Chip Compute-and-MemoryTraditional Von-Neuman-like Architectures

AnalogCompute-In-Memory

DigitalCompute-Near-MemorySoC CPU/GPU TPU

(IP)

Mythic is a trademark of Mythic. All other product or company names may be trademarks of their respective owners.


The Mythic IPU



Mythic Initial Product Lineup

21

Mythic PCIe CardsMythic IPU

PCIe x16 FHFLM.2 x4 2280

Mythic IPU x1 Mythic IPU x4 Mythic IPU x16

PCIe x4 HHHL

PCIeSwitchPCIe

Switch

Up to 120M Weights


Low-Power DNN Inference Solution

22

Inference Capability per Watt for Common DNN ProcessorsResNet-50, INT8, and Batch=1

Frames per Second/Watt

Mythic IPU (est)

Habana Goya HL – 100

nVidia Tesla T4

nVidia Jetson Xavier

Qualcomm 835

nVidia Jetson TX2


Mythic SDK - Develop with Latest Networks / Frameworks

23

▪ Post-training quantization▪ Retraining libraries

▪ Graph decomposition and mapping

▪ Code generation

▪ Profiling and logging

▪ Power, speed and memory estimates

Performance Estimator

Runtime API Visualizer▪ Host runtime API and OS drivers

▪ Annotated

Graph Compiler

Optimization Suite


The Next 10 years



What the Hardware Industry is Used to

25

$ / Transistor (Intel) (Normalized)

130nm 90nm 65nm 45nm 32nm 22nm 14nm 10nm

1

0.1

0.01

2001 2014 ??

$10,000

$1,000

$100

$10

$1

$0

NAND Flash: Cost per GB

1999 2002 2005 2008 2011


Reality: Cost Improvements are Flat-Lining

26

https://wccftech.com/apple-5nm-3nm-cost-transistors/

$-

$1.00

$2.00

$3.00

$4.00

$5.00

$6.00

Cost per 1B Transistors

This should be a huge concern

16nm 10nm 7nm 5nm 3nm

Chip area (mm2) 125.00 87.66 83.27 85.00 85.00

No. of transistors (BU) 3.3 4.3 6.9 10.5 14.1

Gross die per wafer 478 686 721 707 707

Net die per wafer 359.74 512.44 545.65 530.25 509.04

Wafer price ($) 5,912.00 8,389.00 9,965.00 12,500 15,500

Die cost ($) 16.43 16.37 18.26 23.57 30.45

Transistor cost per 1B transistors ($)

4.98 3.81 2.65 2.25 2.16

https://wccftech.com/apple-5nm-3nm-cost-transistors/


NVM Scalability With Analog Compute-In-Memory

27

340

290

240

190

140

90

40

Density Improvements

2009 2012 2013 2015 2016 2018 2020 2021 2022

Core Transistors NAND Flash

340

290

240

190

140

90

40

Density Improvements

2009 2012 2013 2015 2016 2018 2020 2021 2022

Core Transistors NAND Flash


Mythic Analog Compute is the Clear Path Forward

28

Mythic Analog Compute-in-Memory Digital

Performance per Dollar

+ 10 years

100x

On-chip Model Capacity

+ 10 years

100x

Performance per Watt

+ 10 years

100x

analog compute-in- memory for inference...© 2019 mythic. all rights reserved. confidential. for...

Documents