Workshop on Brain-inspired Hardware
@AIResearchAIST, March 30, 2017
Takashi Morie
Kyushu Institute of Technology, Japan
Time-domain analog computing and VLSI systems toward ultimately high-efficient brain-like hardware
Outline
• Introduction
– Our brain-like VLSI chips
– My approach toward brain
• Time-domain analog computing and VLSI systems
– Time-domain energy-efficient weighted sum calculation based on simple spiking neuron model
– Chaotic Boltzmann machine circuit based on oscillator neuron model
• Conclusion
Our brain-like VLSI chips
[Figure: timeline of the group's chips along a Year axis (1995, 2000, 2005, 2010), grouped by approach: Analog, PWM/PPM, Spike based, Nonlinear oscillator.]
• Chips shown: BP nets, Floating-gate memory, BP/DBM nets, Gabor filter, Conv net, Spiking neural net, Chaos circuit, Coupled chaotic system, Anisotropic propagation, Matching processor, Spiking coupled MRF
• Venues shown: JSC 1994, IEICE Trans. 1997, ESSCIRC 2002, VLSI Cir. 2004, VLSI Cir. 2005, NCSP2007, ISCAS2008, ISSCC2009, ECCTD2011, ICONIP2011, ISSCC2012 SRP
Different approaches to brain functions
[Figure: neuron and synapse models arranged along a faithfulness/plausibility/complexity axis, from poor/simple to good/complex, with one region marked "My approach".]
• Neuron: Analog, Spiking, LIF, H-H, Izhikevich model, ...
• Synapse: MAC (weighted sum), STDP, various nonlinear properties, ...
• Annotation in the figure: used in DL algorithms
• Operation freq.: 10~100 Hz (brain), ~1 MHz (VLSI)
Digital and analog processor architectures
• Digital processing (von Neumann architecture)
– Processing unit and memory (RAM) are separate
– RAM: row address, row decoder, memory cell array
– RAM only accesses one row of data at a time, and no calculation can be performed in RAM.
• Analog/pulse processing (cross-bar architecture)
– Analog input, memory/processing cell array, MAC output
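The difference between the two architectures can be illustrated with a small sketch. It is not a circuit model: the function names and the numeric values are hypothetical, and the cross-bar is idealized so that each column output is simply the sum of input value times stored cell value.

```python
def ram_row_read(memory, row_address):
    """von Neumann style: RAM returns one stored row; no arithmetic happens here."""
    return memory[row_address]


def von_neumann_mac(memory, inputs):
    """The processing unit fetches one row per access, then multiplies and accumulates."""
    outputs = [0.0] * len(memory[0])
    for row_address, x in enumerate(inputs):
        row = ram_row_read(memory, row_address)  # one memory access per row
        for col, w in enumerate(row):
            outputs[col] += w * x
    return outputs


def crossbar_mac(cells, inputs):
    """Idealized cross-bar: every column sums cell * input over all rows at once."""
    n_cols = len(cells[0])
    return [sum(row[col] * x for row, x in zip(cells, inputs))
            for col in range(n_cols)]


weights = [[1.0, 2.0], [3.0, 4.0]]
inputs = [0.5, 1.0]
digital = von_neumann_mac(weights, inputs)
analog = crossbar_mac(weights, inputs)
```

Both functions return the same MAC result; the point of the sketch is that the first needs one memory access per row, while the second models an array in which storage and summation happen in the same place.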
DL Processors in ISSCC 2017
Latest digital DL processors: ~10 TOPS/W
Measure of energy efficiency of processors
• Operation performance: TOPS/GOPS (Tera/Giga Operations Per Second)
– FLOPS -> OPS (fixed-point operations per sec.)
– e.g. PC (Core i7): ~500 GFLOPS
• Energy efficiency: TOPS/W = Tera ops. per sec. / Joule per sec. = Tera ops. / Joule
• Energy consumption per op.: 1/(TOPS/W) [pJ/op], so 1 TOPS/W = 1 pJ/op
• Latest digital DL processors: ~10 TOPS/W
• Brain
– # of neurons: ~10^11
– # of synapses: ~10^15
– Power cons.: 20 (~1) W
– Op. freq.: 10~100 Hz
– Activity: ~10 %
• Synapse op. in brain: 0.1~1 fJ/op = 1,000~10,000 TOPS/W = 1~10 POPS/W
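The unit conversions on this slide can be checked with a short sketch. The function names are hypothetical. The brain estimate assumes that the number of synaptic operations per second is (number of synapses) x (operation frequency) x (activity); the slide does not state how its figures were combined, so this is only one plausible reading. Under that assumption, the quoted 0.1~1 fJ/op corresponds to the ~1 W figure rather than the 20 W figure.

```python
def pj_per_op(tops_per_watt):
    """Energy per operation in pJ for a given efficiency in TOPS/W."""
    return 1.0 / tops_per_watt


def tops_per_watt(joules_per_op):
    """Efficiency in TOPS/W for a given energy per operation in joules."""
    return 1.0 / (joules_per_op * 1e12)


def brain_energy_per_synapse_op(power_w, n_synapses, freq_hz, activity):
    """Assumed model: ops per second = synapses * frequency * activity."""
    return power_w / (n_synapses * freq_hz * activity)


digital_pj = pj_per_op(10.0)                                  # ~10 TOPS/W processors
brain_slow = brain_energy_per_synapse_op(1.0, 1e15, 10.0, 0.1)   # 10 Hz
brain_fast = brain_energy_per_synapse_op(1.0, 1e15, 100.0, 0.1)  # 100 Hz
brain_tops_low = tops_per_watt(1e-15)    # 1 fJ/op
brain_tops_high = tops_per_watt(1e-16)   # 0.1 fJ/op
```

With these inputs the digital figure comes out at 0.1 pJ/op, and the brain figures at roughly 1 fJ/op and 0.1 fJ/op, i.e. about 1,000 and 10,000 TOPS/W.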
Two types of neuron models
• Biological neurons output "spikes".
• Analog neuron models: coded by analog values representing firing rate or population of spike pulses
– Synapse inputs are analog values; summation (Σ) is followed by a saturation function (sigmoid).
• Spiking neuron models: coded by spatiotemporal patterns of spike pulses
– Integrate-and-fire neuron: spike pulses at the synapses generate PSPs; summation (Σ) is followed by thresholding.
– PSP: post-synaptic potential
Energy consumption using resistive elements
Voltage-domain circuits based on analog neuron model
[Figure: inputs Vin1, Vin2, ..., VinN connected through resistors to a dendrite line, which an op-amp holds at virtual ground.]
$\frac{V_{out}}{R_f} = -\sum_{j=1}^{N} \frac{V_{in,j}}{R_j}$
• Uses DC-mode operation
• $E_w = t V^2 / R$ = 0.1~10 fJ, assuming R = 100 MΩ, Vin = 0.1~1 V, t = 1 µs
• Op-amps consume much more energy than resistors.
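The voltage-domain figures can be reproduced with a small sketch. The function names are hypothetical. The quoted range of 0.1~10 fJ is obtained when the operation time t is taken as 1 µs together with R = 100 MΩ and Vin = 0.1~1 V; the feedback resistor value used below is an arbitrary choice for illustration.

```python
def summing_amp_output(v_in, r_in, r_f):
    """Ideal inverting summing amplifier: Vout = -Rf * sum(Vin_j / R_j)."""
    return -r_f * sum(v / r for v, r in zip(v_in, r_in))


def resistor_energy(v, r, t):
    """Energy dissipated in one input resistor held at voltage v for time t."""
    return t * v ** 2 / r


R = 100e6   # 100 Mohm
T = 1e-6    # 1 microsecond (assumed reading of the slide's time value)

e_low = resistor_energy(0.1, R, T)    # Vin = 0.1 V
e_high = resistor_energy(1.0, R, T)   # Vin = 1 V
v_out = summing_amp_output([0.1, 0.2], [R, R], r_f=R)  # r_f chosen arbitrarily
```

This only accounts for the resistors. As the slide notes, the op-amp that maintains the virtual ground consumes much more energy than the resistors themselves.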
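Functions of PSPs in spiking neurons
• ISI: inter-spike interval; PSP: post-synaptic potential; τ_PSP: PSP time constant
• ISI << τ_PSP: the neuron works as an integrator.
• ISI >> τ_PSP: the neuron works as a coincidence detector.
[Figure: spike inputs i1, i2 generate PSPs P1, P2; the internal state Vn is their sum over time, and a spike output is generated when Vn reaches Vth.]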
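The two regimes can be illustrated with a deliberately simplified sketch. It assumes unit-amplitude PSPs that rise instantly and decay exponentially with time constant tau_psp, which is not the waveform drawn on the slide, and the threshold of 1.5 is a hypothetical value chosen so that one PSP alone cannot fire the neuron.

```python
import math


def peak_of_two_psps(isi, tau_psp):
    """Peak of the summed potential for two spikes separated by isi.

    The peak occurs when the second spike arrives: the first PSP has
    decayed to exp(-isi / tau_psp) and the second contributes 1.
    """
    return 1.0 + math.exp(-isi / tau_psp)


def fires(isi, tau_psp, threshold=1.5):
    """True if the two PSPs together reach the (hypothetical) threshold."""
    return peak_of_two_psps(isi, tau_psp) >= threshold


integrator = fires(isi=0.1, tau_psp=10.0)          # ISI << tau_PSP
separated = fires(isi=10.0, tau_psp=0.1)           # ISI >> tau_PSP
coincident = fires(isi=0.0, tau_psp=0.1)           # simultaneous spikes
```

With a long PSP the neuron fires even for well-separated spikes, so it integrates its input. With a short PSP only nearly simultaneous spikes reach the threshold, so it detects coincidence.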
Integrate & fire neuron
• Typical time course of PSPs: α-function
• Vn: spatiotemporal summation of all linear rising waveforms of PSPs
[Figure: input spikes i1, i2, i3 at times t1, t3, t2 generate PSPs P1, P2, P3 with slopes w1, w2, w3; Vn reaches the threshold θ at time tν, where the neuron outputs spike spkn.]
W. Maass et al., 1999
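Time-domain weighted-sum calculation model
• Weighted summation (MAC: multiply and accumulate) of given inputs x_i and weights w_i
– x_i --> spike timing t_i
– w_i --> slope of PSP
– spike timing tν --> y
[Figure: input spikes i1, i2, i3 at times t1, t3, t2 within the input window Tin generate PSPs P1, P2, P3 with slopes w1, w2, w3; Vn reaches the threshold θ at time tν, where the neuron outputs spike spkn.]
T. Morie et al. IEEE NANO 2016, Q. Wang et al. ICONIP 2016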
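The mapping above can be made concrete with a minimal sketch. It rests on assumptions that are not spelled out in this transcript: each PSP is a linear ramp of slope w_i starting at t_i, an input x_i in [0, 1] is encoded as t_i = Tin (1 - x_i), and the threshold is high enough that the neuron fires only after all input spikes have arrived. The function names are hypothetical. Under these assumptions the firing time satisfies sum_i w_i (tν - t_i) = θ, which can be solved for the weighted sum.

```python
def encode_inputs(x, t_in):
    """Assumed encoding: larger inputs spike earlier, t_i = Tin * (1 - x_i)."""
    return [t_in * (1.0 - xi) for xi in x]


def firing_time(weights, spike_times, theta):
    """Time at which the summed linear PSPs reach theta.

    Valid only if the result is later than every input spike time.
    """
    beta = sum(weights)  # total slope once all PSPs are rising
    t_nu = (theta + sum(w * t for w, t in zip(weights, spike_times))) / beta
    if t_nu < max(spike_times):
        raise ValueError("threshold reached before all input spikes arrived")
    return t_nu


def decode_weighted_sum(t_nu, beta, theta, t_in):
    """Recover sum_i w_i * x_i from the output spike timing."""
    return (theta + beta * (t_in - t_nu)) / t_in


weights = [0.5, 1.0, 1.5]
x = [0.2, 0.9, 0.4]
t_in, theta = 1.0, 3.2

t_nu = firing_time(weights, encode_inputs(x, t_in), theta)
recovered = decode_weighted_sum(t_nu, sum(weights), theta, t_in)
expected = sum(w * xi for w, xi in zip(weights, x))
```

In this sketch the multiplications are never performed explicitly during the forward pass: the weighted sum is read off from a single output spike time.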
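Time-domain MAC calculation
• Numerical simulation with a 3-layer MLP (Input 784, 500, Output 10)
• Positive weight distribution
• Input X and weights W: $x_i \in [0, 1]$, $\sum_{i=1}^{N} w_i x_i = 8.99$
• $t_\nu^{max} = \frac{\theta}{\beta}$, $t_\nu^{min} = \frac{\theta}{\beta} + T_{in}$
• Parameters and result:
– $N$ = 249
– $T_{in}$ = 1
– $\beta$ = 24.01
– $\theta$ = 24.25
– $t_\nu$ = 1.635
– $\left(\theta/\lambda + \beta (T_{in} - t_\nu)\right) / T_{in}$ = 8.99
[Figure: V(t) versus time, showing the input window Tin, the threshold θ, and the output timings tν, tν^max, tν^min.]
Q. Wang et al. ICONIP 2016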
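As a rough check of the numbers on this slide, the tabulated values can be substituted into the timing relation. This is a sketch under an assumption: the θ term is taken without the λ factor, which this transcript does not define.

```latex
\sum_{i=1}^{N} w_i x_i
  \approx \frac{\theta + \beta\,(T_{in} - t_\nu)}{T_{in}}
  = \frac{24.25 + 24.01\,(1 - 1.635)}{1}
  \approx 9.00

\frac{\theta}{\beta} = \frac{24.25}{24.01} \approx 1.010,
\qquad
\frac{\theta}{\beta} + T_{in} \approx 2.010
```

The result agrees with the directly computed weighted sum of 8.99 to within the rounding of tν (a value of tν ≈ 1.6356 would give 8.99), and tν = 1.635 lies between the two limiting output timings.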
Energy consumption using resistive elements
Time-domain analog circuits
[Figure: inputs Vin1, Vin2, ..., VinN (input lines) connected through resistors R to the dendrite line with capacitance C, followed by a comparator with threshold θ. The circuit uses the transient state of the RC circuit: for a step input Vin, the voltage rises with initial slope Vin/τ, where τ = RC.]
• $E_{wsum} = C V^2$. If C = 1 fF, V = 1 V, then Ewsum = 1 fJ.
• Assuming N = 100, Ew = Ewsum/N = 10 aJ.
• Energy consumption is independent of resistance values.
• To guarantee time resolution, τ = 1 µs (1,000 steps with 1 ns).
• Extremely low-energy computation due to parallelism, but very high resistance is needed: RON ~ 1 GΩ (ROFF ~ 1 TΩ).
• To reduce energy, reduce (parasitic) interconnect capacitance.
Q. Wang et al. ICONIP 2016
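The arithmetic on this slide can be reproduced directly. The function names are hypothetical, and the time constant is taken as 1 µs, which is consistent with the stated 1,000 steps of 1 ns.

```python
def weighted_sum_energy(c, v):
    """Energy for one weighted-sum operation, using the slide's E = C * V^2."""
    return c * v ** 2


def energy_per_weight(c, v, n_inputs):
    """Energy attributed to each of the n_inputs synapses sharing one capacitor."""
    return weighted_sum_energy(c, v) / n_inputs


def required_resistance(tau, c):
    """Resistance needed for a time constant tau = R * C."""
    return tau / c


C = 1e-15    # 1 fF
V = 1.0      # 1 V
N = 100      # number of inputs
TAU = 1e-6   # 1 microsecond = 1,000 steps of 1 ns

e_wsum = weighted_sum_energy(C, V)     # about 1 fJ
e_w = energy_per_weight(C, V, N)       # about 10 aJ
r_needed = required_resistance(TAU, C) # about 1 Gohm
```

The resistance only sets the time scale, not the energy, which is why a 1 fF capacitor with microsecond resolution calls for a resistance of about 1 GΩ, in line with the RON value quoted on the slide.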
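Time-domain analog cross-bar circuit
[Figure: cross-bar circuit in which pre-neuron axons (input) cross post-neuron dendrites (output), with capacitances Coi, Cii, Ci. Input spikes i1, i2, i3 at times t1, t3, t2 within Tin generate PSPs P1, P2, P3 with slopes w1, w2, w3; Vn reaches the threshold θ at time tν, where the neuron outputs spike spkn.]
• Resistance change memory [ReRAM device]
• Low parasitic (interconnect) capacitance [Nanowire]
• Low-power neuron circuit (comparator)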
Comparison among different approaches
Approach | # of neurons / synapses | Energy per synapse operation | Processing frequency | Power consumption
Biological: Human brain | ~10^11 / ~10^15 | 10^-16~10^-15 J (0.1~1 fJ) | 10~100 Hz (10% active) | 20 (~1) W
Digital: Super Comput. (Kei「京」) (*1) | 1.7x10^9 / 1.0x10^13 | (6.5x10^-4 J) (~1 mJ) | Brain: 1 s = Kei: 2,400 s (4.4 fires/s) | ~12 MW
Digital: Digital chips (TrueNorth~) | 1x10^6 / 256x10^6 | ~10^-13 J (< 0.1 pJ) | 1 kHz | 86 mW
Analog: Voltage/current-domain | [1x10^6 / 256x10^6] (*2) | >~10^-15 J (~1 fJ) (*3) | <~MHz | [~300 mW] (*4)
Analog: Time-domain | [1x10^6 / 256x10^6] (*2) | >~10^-17 J (~10 aJ) (*3) | <~MHz | [~3 mW] (*4)
(*1) http://www.riken.jp/pr/topics/2013/20130802_2/
(*2) Assuming the same values as TrueNorth
(*3) Estimation when assuming ideal condition
(*4) Only for weighted summation
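The bracketed power figures for the two analog approaches are consistent with a simple product of synapse count, energy per operation, and operating frequency. The sketch below assumes that every synapse performs one operation per cycle and that the frequency is 1 MHz, the upper end of the "<~MHz" entry; neither assumption is stated in the table, and the function name is hypothetical.

```python
def weighted_sum_power(n_synapses, energy_per_op_j, op_freq_hz):
    """Power if every synapse performs one operation per cycle (assumed)."""
    return n_synapses * energy_per_op_j * op_freq_hz


N_SYNAPSES = 256e6   # same as TrueNorth, per note (*2)
FREQ = 1e6           # 1 MHz (assumed)

p_voltage_domain = weighted_sum_power(N_SYNAPSES, 1e-15, FREQ)  # ~1 fJ per op
p_time_domain = weighted_sum_power(N_SYNAPSES, 1e-17, FREQ)     # ~10 aJ per op
```

This gives about 0.26 W and 2.6 mW, matching the ~300 mW and ~3 mW entries to the precision shown. As note (*4) says, these figures cover weighted summation only.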
Original and chaotic Boltzmann machines
Boltzmann machines (BMs) [1]
• Stochastic operation of binary neurons
• Symmetrically connected networks
• Solving optimization problems using energy minimization
• Success of deep learning by restricted BMs
Chaotic Boltzmann machines (CBMs) [2]
• Deterministic operation
• Using chaotic dynamics instead of stochastic operation
• Computing ability comparable to BMs
[Figure: simulation results for BMs and CBMs]
[1] D. H. Ackley et al., Cog. Sci. 9, 1985.
[2] H. Suzuki et al., Sci. Rep., 2013.
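Chaotic Boltzmann machines (CBMs)
Dynamics of CBMs
$\frac{dx_i}{dt} = (1 - 2S_i)\left(1 + \exp\frac{(1 - 2S_i)\, z_i}{T}\right)$
$S_i \leftarrow 0$ when $x_i = 0$; $S_i \leftarrow 1$ when $x_i = 1$
$z_i = \sum_{j=1}^{N} w_{ij} S_j + \theta_i$, with $w_{ij} = w_{ji}$
• $S_i$: Binary output of i-th neuron
• $x_i$: Internal state of i-th neuron
• $T$: Temperature parameter
• $w_{ij}$: Synaptic weight between i and j
• $\theta_i$: Constant bias of i-th neuron
[Figure: neuron i receives the outputs of neuron 1, neuron 2, ..., neuron j through weights wi1, ..., wij; the x_i waveform shows the slope without input and the slope change in x_i when S_j changes.]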
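The CBM dynamics can be explored numerically with a minimal sketch. It uses forward-Euler integration with a fixed step and clips x_i at 0 and 1, which is a simplification of the continuous-time model; the function name, step size, and parameter values are hypothetical. For a single neuron with constant z_i, the time spent rising is 1/(1 + exp(z_i/T)) and the time spent falling is 1/(1 + exp(-z_i/T)), so the fraction of time with S_i = 1 equals 1/(1 + exp(-z_i/T)), the firing probability of a stochastic BM neuron.

```python
import math


def simulate_cbm(weights, biases, temperature, steps, dt=1e-3):
    """Euler simulation of CBM dynamics; returns the fraction of time each S_i is 1."""
    n = len(biases)
    x = [0.5] * n          # internal states, kept within [0, 1]
    s = [0] * n            # binary outputs
    time_on = [0.0] * n
    for _ in range(steps):
        # z_i = bias_i + sum_j w_ij * S_j, evaluated from the current outputs
        z = [biases[i] + sum(weights[i][j] * s[j] for j in range(n))
             for i in range(n)]
        for i in range(n):
            sign = 1 - 2 * s[i]   # +1 while S_i = 0 (rising), -1 while S_i = 1
            x[i] += dt * sign * (1.0 + math.exp(sign * z[i] / temperature))
            if x[i] >= 1.0:       # reached the upper boundary: switch on
                x[i], s[i] = 1.0, 1
            elif x[i] <= 0.0:     # reached the lower boundary: switch off
                x[i], s[i] = 0.0, 0
            time_on[i] += dt * s[i]
    return [t / (steps * dt) for t in time_on]


# Single isolated neuron with z = 1 and T = 1.
fraction_on = simulate_cbm([[0.0]], [1.0], 1.0, steps=100000)[0]
boltzmann_probability = 1.0 / (1.0 + math.exp(-1.0))
```

The single-neuron case is only a sanity check. The chaotic behaviour referred to on the slides arises when several neurons are coupled through the symmetric weights, which the same function accepts as a full weight matrix.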
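A CMOS unit circuit for CBMs
$\frac{dx_i}{dt} = (1 - 2S_i)\left(1 + \exp\frac{(1 - 2S_i)\, z_i}{T}\right)$
$z_i = b_i + \sum_{j} w_{ij} S_j$, with $w_{ij} = w_{ji}$
$S_i \leftarrow 0$ when $x_i = 0$; $S_i \leftarrow 1$ when $x_i = 1$
• The exp term is implemented using the subthreshold region of a MOSFET.
[Figure, panels (a) to (c): log(IDS) versus VGS of an NMOSFET (terminals D, G, S), marking the threshold voltage Vth and the sub-threshold region; the unit circuit with inputs S1, S2, ... through weights wi1, wi2, an exp element, internal state x_i between Vmin and Vmax, and output S_i; time courses of z_i, x_i, and S_i.]
M. Yamaguchi et al. ICONIP 2016.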
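The reason a subthreshold MOSFET can supply the exponential term is that its drain current depends exponentially on the gate-source voltage. The sketch below uses the standard first-order subthreshold model I = I0 exp(VGS / (n UT)) with thermal voltage UT = kT/q. The prefactor I0 and the slope factor n are hypothetical values chosen for illustration, not parameters of the chip described here.

```python
import math

BOLTZMANN = 1.380649e-23          # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_k=300.0):
    """U_T = k * T / q, about 25.9 mV at 300 K."""
    return BOLTZMANN * temp_k / ELEMENTARY_CHARGE


def subthreshold_current(v_gs, i0=1e-12, slope_factor=1.5, temp_k=300.0):
    """First-order subthreshold drain current; i0 and slope_factor are illustrative."""
    return i0 * math.exp(v_gs / (slope_factor * thermal_voltage(temp_k)))


def volts_per_decade(slope_factor=1.5, temp_k=300.0):
    """Gate voltage increase that multiplies the subthreshold current by 10."""
    return slope_factor * thermal_voltage(temp_k) * math.log(10.0)


step = volts_per_decade()
ratio = subthreshold_current(0.2 + step) / subthreshold_current(0.2)
```

With these illustrative values the current grows by a factor of ten for roughly every 90 mV of gate voltage, which is the straight-line region of the log(IDS) versus VGS plot. A current of this form charging a capacitor gives a rate of change that is exponential in the input, as the CBM equation requires.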
Measurement results of CBM VLSI chip
• 3 neurons with 3 synapses
[Figure: measured waveforms S1, S2, S3, and Vx3 in continuous-scan and single-scan modes; time scale 20ms.]
Conclusions
• Time-domain analog VLSI implementation can achieve extremely energy-efficient operation, including nonlinear transforms, which are difficult for digital VLSI implementation.
• Weighted-sum calculation as a simple synaptic function can be achieved with extremely low energy consumption based on time-domain operation of a simple spiking neuron model, but a high-resistance element is required. Reducing parasitic interconnect capacitance is another challenge for achieving energy-efficient operation.
• Nonlinear transforms performed in the charging operation of a capacitor were successfully applied to the implementation of nonlinear dynamical models, such as chaotic Boltzmann machines.