
The N3XT Technology for Brain-Inspired Computing

H.-S. Philip Wong
Department of Electrical Engineering, Stanford University
Stanford SystemX Alliance
2015.04.15

[Slides 2-8: introductory images. Sources: Google, vrworld.com, BDC Magazine, Gogamguro.com. Slide 5: 1988 Winter Olympic Games in Calgary, Canada. Slide 8: data centers drawing 100's of kW (source: Google).]

Scale Up Requires Energy Efficiency

Application | Hardware used | Estimated power consumption
Emulating 4.5% of human brain: 10^13 synapses, 10^9 neurons | Blue Gene/P: 36,864 nodes, 147,456 cores | 2.9 MW (LINPACK)
Deep sparse autoencoder: 10^9 synapses, 10M images | 1,000 CPUs (16,000 cores) | ~100 kW (cores only)
Convolutional neural net: 60M synapses, 650K neurons | 2 GPUs | 1,200 W
Restricted Boltzmann Machine: 28M synapses, 69,888 neurons | GPU | 550 W
Restricted Boltzmann Machine: 28M synapses, 69,888 neurons | CPU | 65 W
Processing 1 s of speech using a deep neural network | GPU | 238 W
Processing 1 s of speech using a deep neural network | CPU (4 cores) | 80 W

(The brain-emulation row is the large-scale example; the remaining rows are small- to moderate-scale.)

S. B. Eryilmaz et al., IEDM 2015
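To make the gap concrete, here is a back-of-the-envelope calculation (the arithmetic is ours; the wattages and synapse counts are taken directly from the table above) of power per synapse for four of the rows:

```python
# Watts per synapse, computed from the numbers in the table above.
systems = [
    ("Blue Gene/P brain emulation", 2.9e6, 1e13),  # 2.9 MW, 10^13 synapses
    ("Deep sparse autoencoder",     1.0e5, 1e9),   # ~100 kW, 10^9 synapses
    ("ConvNet on 2 GPUs",           1.2e3, 60e6),  # 1,200 W, 60M synapses
    ("RBM on GPU",                  550.0, 28e6),  # 550 W, 28M synapses
]
for name, watts, synapses in systems:
    print(f"{name}: {watts / synapses:.1e} W per synapse")
```

Every entry sits orders of magnitude above the brain, which, as the next slide notes, runs on less power than an incandescent light bulb.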


These nanotechnology innovations will have to be developed in close coordination with new computer architectures, and will likely be informed by our growing understanding of the brain—a remarkable, fault-tolerant system that consumes less power than an incandescent light bulb.

Approaches of Neuromorphic Hardware

The approaches map onto two axes: the algorithm (biology-based models/algorithms vs. conventional ML algorithms) and the hardware (conventional hardware such as CPUs, GPUs, and supercomputers; conventional hardware with analog non-volatile memory synapses; and neuromorphic hardware):

- Biology-based models on conventional hardware: brain emulation on BlueGene [7]; HTM [3]
- Conventional ML on conventional hardware: "Cats on YouTube" ANNs: ConvNets, DNNs, DBNs [10-13]
- Biology-based models on neuromorphic hardware: Human Brain Project [20]; TrueNorth [16]; SpiNNaker [19]
- Biology-based models with analog non-volatile memory synapses: Hebbian learning and spike-based ANNs using PCM, RRAM, CBRAM
- Conventional ML with analog non-volatile memory synapses: ANN, RBM, and sparse learning using PCM, RRAM


Many of these breakthroughs will require new kinds of nanoscale devices and materials integrated into three-dimensional systems and may take a decade or more to achieve.

N3XT Nanosystems: Computation Immersed in Memory

Memory and computing logic are interleaved and linked by ultra-dense vertical connections, a structure that is impossible with today's technologies.

N3XT: Computation Immersed in Memory

The stack combines, with thermal management layers interleaved:
- MRAM for quick access
- 3D Resistive RAM for massive storage
- 1D CNFET / 2D FET layers for compute and RAM access
- 1D CNFET / 2D FET layers for compute, power, and clock

The process is silicon compatible, and the layers are connected by ultra-dense, fine-grained vias (not TSVs).

Aly et al., IEEE Computer, 2015

Non-Volatile Memory (NVM)

- Metal oxide resistive switching memory (RRAM): a filament of oxygen vacancies forms and dissolves in a metal oxide (e.g., TiOx/HfOx) between top and bottom electrodes (TiN); demonstrated at 10-25 nm dimensions.
- Phase change memory (PCM): a switching region of phase change material (GST) between top and bottom electrodes with oxide isolation, programmed between a fully set (polycrystalline), partially reset, and fully reset (amorphous) state.
- Conductive bridge memory (CBRAM): metal atoms (e.g., Cu ions) from an active top electrode form a filament through a solid electrolyte to the bottom electrode.

D. Kuzum et al., Nano Lett. 2013; Y. Wu et al., IEDM 2013; A. Calderoni et al., IMW 2014

Non-Volatile Memory (NVM) → Synapse

These memories are attractive as synapses because they are:
- Analog programmable
- Scalable to a few nm
- Stackable in 3D

D. Kuzum et al., Nano Lett. 2013; Y. Wu et al., IEDM 2013; A. Calderoni et al., IMW 2014

Nanoscale Memory as Synaptic Weights

Synaptic updates in the brain are the basis for learning; the requirement for a synaptic device is analog resistance change. In a phase change synapse, partial SET and partial RESET pulses control the size and nature of the amorphous region, yielding a 100-step grey scale (1% resolution): measured resistance is swept between ~10^3 and ~10^5 Ω over a few thousand programming pulses.

D. Kuzum et al., Nano Lett., p. 2179 (2012)
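As a rough illustration of this programming scheme, the following toy model (ours, not the measured device physics; the log-spaced levels and one-level-per-pulse granularity are assumptions) steps a cell's resistance through a 100-level grey scale with partial RESET and partial SET pulses:

```python
import numpy as np

class PhaseChangeSynapse:
    """Toy model: partial SET/RESET pulses step resistance through ~100 levels."""

    def __init__(self, lrs=1e3, hrs=1e5, steps=100):
        # Log-spaced resistance levels (ohms) between the fully SET (LRS)
        # and fully RESET (HRS) states; 100 steps ~ 1% resolution.
        self.levels = np.geomspace(lrs, hrs, steps)
        self.state = 0  # index into levels (0 = fully SET)

    def partial_reset(self, pulses=1):
        # Each pulse grows the amorphous region -> higher resistance.
        self.state = min(self.state + pulses, len(self.levels) - 1)
        return self.levels[self.state]

    def partial_set(self, pulses=1):
        # Each pulse recrystallizes part of the region -> lower resistance.
        self.state = max(self.state - pulses, 0)
        return self.levels[self.state]

syn = PhaseChangeSynapse()
print(syn.partial_reset(50))  # mid-scale resistance after 50 RESET pulses
print(syn.partial_set(25))    # partially recovered after 25 SET pulses
```

Real devices are nonlinear and noisy; the point is only that a multilevel cell behaves as an analog weight.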

Nanoscale Memory Can Emulate Biological Synaptic Behavior

Measured phase change synapses reproduce the key signatures of STDP (spike-timing-dependent plasticity):
- Synaptic weight change Δw (%) vs. spike timing Δt (ms), with LTP and LTD branches fitted by time constants between roughly 11 and 30 ms
- A variety of STDP kernels (Δw from -50% to +100% over Δt = ±50 ms)
- Weight update saturation as the number of pre/post spike pairs grows, for spike-pair intervals of 10-45 ms

D. Kuzum et al., Nano Lett., p. 2179 (2012)
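The exponential kernel underlying plots like these is simple to write down. A minimal sketch (the amplitudes and time constants are illustrative values within the ~11-30 ms range reported on the slide, not fitted device data):

```python
import numpy as np

def stdp_kernel(dt_ms, a_ltp=100.0, tau_ltp=20.7, a_ltd=-40.0, tau_ltd=18.6):
    """Synaptic weight change (%) vs. spike timing dt = t_post - t_pre (ms).

    Potentiation (LTP) for dt > 0, depression (LTD) for dt < 0, each
    decaying exponentially with its own time constant.
    """
    dt = np.asarray(dt_ms, dtype=float)
    return np.where(dt >= 0,
                    a_ltp * np.exp(-dt / tau_ltp),
                    a_ltd * np.exp(dt / tau_ltd))

dts = np.array([-40.0, -10.0, 10.0, 40.0])
for dt, dw in zip(dts, stdp_kernel(dts)):
    print(f"dt = {dt:+.0f} ms -> dw = {dw:+.1f} %")
```

Sweeping tau_ltp and tau_ltd reproduces the family of kernels shown above.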

Hyperdimensional (HD) Computing

- Information is represented in high-dimensional vectors (e.g., D = 10,000)
- Variables and values are combined into a "holistic" record using vector algebra:
  - Multiplication for binding
  - Addition for bundling
- A composed vector can in turn become a component in further composition
- The holistic record is decoded with (inverse) multiplication
- Approximate results of vector operations are matched to exact ones using a content-addressable memory

ENIGMA

Elements of hyperdimensional computing, also known as vector symbolic architectures or holographic reduced representation (Kanerva, Cognitive Computation, 1(2):139-159, 2009)
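A minimal numpy sketch of these elements, assuming bipolar {-1, +1} hypervectors: binding by elementwise multiplication (which is its own inverse for ±1), bundling by addition with a sign threshold, and cleanup by nearest neighbor over an item memory. The variable/value names are only an example:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hyperdimensional, as on the slide

def hv():
    """Random bipolar hypervector in {-1, +1}^D."""
    return rng.choice([-1, 1], size=D)

# Item memory: atomic vectors for variables and values.
item = {name: hv() for name in ["country", "capital", "USA", "WashingtonDC"]}

# Binding (multiplication) and bundling (addition) into a holistic record.
record = np.sign(item["country"] * item["USA"]
                 + item["capital"] * item["WashingtonDC"])

# Decoding: multiplying by a variable's vector again strips it off,
# leaving a noisy copy of the bound value.
noisy_value = record * item["capital"]

# Cleanup: the content-addressable memory step, here a brute-force
# nearest neighbor by dot-product similarity over the item memory.
def cleanup(v):
    return max(item, key=lambda name: np.dot(item[name], v))

print(cleanup(noisy_value))  # -> "WashingtonDC" with overwhelming probability
```

At D = 10,000 the decoded vector agrees with the stored value on far more components than with any unrelated vector, which is what makes the approximate-then-cleanup style work.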

HD Computing Layers

- Application: letters, image features, phonemes, DNA sequences, ...
- Representation: inputs projected into hyperdimensional space as random vectors of 1k ~ 10k bits (e.g., 011101010101..., 101100100101...)
- Computation: MAP kernels (Multiplication-Addition-Permutation): v1 * v2, sum(v1, v2, ...), perm(v1)
- Inference: measure the "distance" between learned and unseen vectors and pick the "closest" (recognition, classification, reasoning, etc.)

Algorithms and system/circuit design come together in an associative memory enabled by novel device technologies.
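Continuing the sketch from the previous slide, permutation is what makes sequences encodable. Below is a toy end-to-end pipeline in the spirit of these four layers: letters are projected onto random hypervectors, trigrams are composed with the MAP operations, and an unknown string is classified by similarity to learned profile vectors. The trigram scheme and the two "classes" are our illustration, not from the slide:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 10_000

# Representation: each letter gets a random bipolar hypervector.
letters = {ch: rng.choice([-1, 1], size=D) for ch in "abcdefghijklmnopqrstuvwxyz "}

def perm(v, k=1):
    """Permutation (the P in MAP), here a cyclic shift."""
    return np.roll(v, k)

def encode(text):
    """Computation: bundle (add) bound (multiplied) trigrams."""
    acc = np.zeros(D)
    for a, b, c in zip(text, text[1:], text[2:]):
        acc += perm(letters[a], 2) * perm(letters[b], 1) * letters[c]
    return np.sign(acc)

# Learned profile vector per class (toy training data).
profiles = {"english": encode("the quick brown fox jumps over the lazy dog"),
            "gibberish": encode("zxq vqz xqv qzx vxq zqx xvz qxz zvx")}

# Inference: nearest profile by dot-product similarity.
query = encode("the lazy dog sleeps")
print(max(profiles, key=lambda k: np.dot(profiles[k], query)))  # -> "english"
```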

Hyperdimensional (HD) Computing

Monolithic 3D enables:
- Energy-efficient classification
- Area-efficient HD projection → use RRAM variability and stochastic switching

3D RRAM + low-power access transistors + address decoders provide low-power computation through high-density inter-layer vias.
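The projection point above can be sketched as well: if a weak programming pulse switches each RRAM cell only probabilistically, reading the array yields a random binary hypervector with no pseudo-random-number hardware. A conceptual model only (the 50% switching probability and the interface are assumptions, not measured device behavior):

```python
import numpy as np

rng = np.random.default_rng(2)

def stochastic_rram_vector(d=10_000, p_switch=0.5):
    """Model one weak SET pulse applied to d cells: each cell switches
    with probability p_switch (cycle-to-cycle and device-to-device
    variability), and the read-out is a random binary hypervector."""
    return (rng.random(d) < p_switch).astype(np.uint8)

v1, v2 = stochastic_rram_vector(), stochastic_rram_vector()
# Independent draws are quasi-orthogonal: Hamming distance ~ d/2.
print(np.count_nonzero(v1 != v2))  # ~5000 for d = 10,000
```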

3D Enables In-Memory Computing

[Cross-section: 4-layer (L1-L4) 3D vertical RRAM with FinFET bit-line (BL) select; TiN/Ti top electrode (50 nm), TiN bottom electrode (20 nm).]

H. Li et al., Symp. VLSI Tech., 2016

MAP Kernels: 3D RRAM Approach

Key HD operations: multiplication, addition, permutation.

- Multiplication: XOR in the {0, 1} system is equivalent to multiplication in the {-1, 1} system.
- Addition and permutation: demonstrated on the array, with 1-pillar experiments (symbols) matching 2-pillar emulation (lines) over ~10 to ~100G addition cycles, using 200 ns VDD/gnd pulse patterns across layers L1-L4.

[Measured data on a 4-layer 3D vertical RRAM: HRS of 400 kΩ-1 MΩ, LRS of ~10 kΩ; logic evaluation shown from 1k to ~1T cycles for pillar addresses AB = 00, 01, 10, 11 with inputs C, D.]
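The multiplication mapping on this slide is easy to verify: sending a bit b in {0, 1} to the bipolar value 1 - 2b in {+1, -1} turns XOR into multiplication. A quick exhaustive check:

```python
from itertools import product

# XOR in {0,1} corresponds to multiplication in {-1,+1} under b -> 1 - 2b.
for a, b in product([0, 1], repeat=2):
    xor = a ^ b
    prod = (1 - 2 * a) * (1 - 2 * b)
    assert 1 - 2 * xor == prod
    print(f"a={a} b={b}: a XOR b = {xor}, ({1-2*a})*({1-2*b}) = {prod}")
```

This equivalence is why a dense XOR array (here, 3D RRAM) can implement the HD multiply.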

3D Integration of Memory and Logic Circuits within the Same Layer and Across Layers

- Logic (Si CMOS): neuron circuits, communication
- Synapses/weights: (CNFET / 2D FET) + RRAM
- Synapses/weights: 3D RRAM

N3XT: Nano-Engineered Computing Systems Technology

Aly et al., IEEE Computer, 2015

Students and Post-Docs

Collaborators
- Gert Cauwenberghs, Siddharth Joshi, Emre Neftci (UC San Diego)
- Jinfeng Kang (Peking U)
- Chung Lam, SangBum Kim, Matt Brightsky
- K.S. Lee, J.M. Shieh, W.K. Yen… (NDL, Taiwan)

Sponsors
- Non-Volatile Memory Technology Research Initiative (NMTRI) @ Stanford University
- E2CDA "ENIGMA"
- Stanford SystemX Alliance


End of Talk

Questions?


Open Research Questions
1. Functionality → performance/Watt, performance/m² → variability → reliability
2. Scale up (system size), scale down (device size)
3. Role of variability (functionality, performance)
4. Fan-in / fan-out, hierarchical connections, power delivery
5. Low voltage (wire energy ≅ device energy)
6. Stochastic learning behavior → statistical learning rules
7. Meta-plasticity (internal state variables)
8. Timing as an internal variable
9. Learning rules: biological? AI?
10. Algorithm-device co-design
11. Materials/fabrication: monolithic 3D integration is a must, and it MUST be low temperature