G. Cauwenberghs 520.776 Learning on Silicon
Model-Free Stochastic Perturbative Adaptation and Optimization

Gert Cauwenberghs
Johns Hopkins University
520.776 Learning on Silicon
http://bach.ece.jhu.edu/gert/courses/776
Model-Free Stochastic Perturbative Adaptation and Optimization

OUTLINE
• Model-Free Learning
  – Model Complexity
  – Compensation of Analog VLSI Mismatch
• Stochastic Parallel Gradient Descent
  – Algorithmic Properties
  – Mixed-Signal Architecture
  – VLSI Implementation
• Extensions
  – Learning of Continuous-Time Dynamics
  – Reinforcement Learning
• Model-Free Adaptive Optics
  – AdOpt VLSI Controller
  – Adaptive Optics “Quality” Metrics
  – Applications to Laser Communication and Imaging
The Analog Computing Paradigm
• Local functions are efficiently implemented with minimal circuitry, exploiting the physics of the devices.
• Excessive global interconnects are avoided:
  – Currents or charges are accumulated along a single wire.
  – Voltage is distributed along a single wire.
• Pros:
  – Massive Parallelism
  – Low Power Dissipation
  – Real-Time, Real-World Interface
  – Continuous-Time Dynamics
• Cons:
  – Limited Dynamic Range
  – Mismatches and Nonlinearities (WYDINWYG: “what you design is not what you get”)
Effect of Implementation Mismatches
• Associative Element:
  – Mismatches can be properly compensated by adjusting the parameters p_i accordingly, provided sufficient degrees of freedom are available to do so.
• Adaptive Element:
  – Requires precise implementation.
  – The accuracy of the implemented polarity (rather than amplitude) of the parameter update increments Δp_i is the performance-limiting factor.

[Block diagram: system ε(p) with inputs, outputs, adjustable parameters p_i, and a reference defining the error.]
Example: LMS Rule
A linear perceptron under supervised learning:

    y_i(k) = Σ_j p_ij x_j(k)
    ε = ½ Σ_k Σ_i ( y_i_target(k) − y_i(k) )²

with gradient descent:

    Δp_ij(k) = −η ∂ε(k)/∂p_ij = η ( y_i_target(k) − y_i(k) ) x_j(k)

reduces to an incremental outer-product update rule, with scalable, modular implementation in analog VLSI.
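The LMS rule above can be sketched in a few lines; this is my own illustration (the target weights, data, and learning rate are hypothetical, not from the slides), showing the outer-product update Δp_ij = η e_i x_j driving the parameters to the supervised solution.

```python
import numpy as np

rng = np.random.default_rng(0)
P_true = rng.normal(size=(2, 3))        # hypothetical target weights
X = rng.normal(size=(100, 3))           # training inputs x(k)
Y_target = X @ P_true.T                 # supervised targets y_target(k)

P = np.zeros((2, 3))                    # learned parameters p_ij
eta = 0.05                              # learning rate
for _ in range(10):                     # a few epochs over the data
    for x, y_t in zip(X, Y_target):
        e = y_t - P @ x                 # output error y_target - y
        P += eta * np.outer(e, x)       # incremental outer-product update
```

Each parameter update uses only the locally available error e_i and input x_j, which is what makes the rule modular in analog VLSI.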
Incremental Outer-Product Learning in Neural Nets
Multi-Layer Perceptron:

    x_i = f( Σ_j p_ij x_j )

Outer-Product Learning Update:

    Δp_ij = η x_j e_i

  – Hebbian (Hebb, 1949):  e_i = x_i
  – LMS Rule (Widrow-Hoff, 1960):  e_i = f′_i ( x_i_target − x_i )
  – Backpropagation (Werbos; Rumelhart; LeCun):  e_j = f′_j Σ_i p_ij e_i
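A sketch of the same outer-product rule in a two-layer perceptron (my own toy example, not from the slides), with f = tanh so that f′ = 1 − f², and the error backpropagated as e_j = f′_j Σ_i p_ij e_i:

```python
import numpy as np

rng = np.random.default_rng(1)
P1 = 0.5 * rng.normal(size=(4, 3))      # input -> hidden weights
P2 = 0.5 * rng.normal(size=(2, 4))      # hidden -> output weights
eta = 0.2
x = np.array([1.0, -0.5, 0.3])          # a single training pattern
x_target = np.array([0.5, -0.2])

errs = []
for _ in range(1000):
    h = np.tanh(P1 @ x)                 # hidden activations
    y = np.tanh(P2 @ h)                 # output activations
    e_out = (1 - y**2) * (x_target - y) # e_i = f'_i (x_target_i - x_i)
    e_hid = (1 - h**2) * (P2.T @ e_out) # e_j = f'_j sum_i p_ij e_i
    P2 += eta * np.outer(e_out, h)      # outer-product updates dp = eta x e
    P1 += eta * np.outer(e_hid, x)
    errs.append(float(np.sum((x_target - y)**2)))
```

Both layers use the identical local update Δp_ij = η x_j e_i; only the origin of the error signal e differs.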
Gradient Descent Learning
Minimize ε(p) by iterating:

    p_i(k+1) = p_i(k) − η ∂ε/∂p_i (k)

from calculation of the gradient (chain rule through the network variables):

    ∂ε/∂p_i = Σ_l ( ∂ε/∂y_l ) Σ_m ( ∂y_l/∂x_m ) ( ∂x_m/∂p_i )

Implementation Problems:
  – Requires an explicit model of the internal network dynamics.
  – Sensitive to model mismatches and noise in the implemented network and learning system.
  – Amount of computation typically scales strongly with the number of parameters.
Gradient-Free Approach to Error-Descent Learning
Avoid the model sensitivity of gradient descent by observing the parameter dependence of the performance error on the network directly, rather than calculating gradient information from a pre-assumed model of the network.

Stochastic Approximation:
  – Multi-dimensional Kiefer-Wolfowitz (Kushner & Clark, 1978)
  – Function Smoothing Global Optimization (Styblinski & Tang, 1990)
  – Simultaneous Perturbation Stochastic Approximation (Spall, 1992)

Hardware-Related Variants:
  – Model-Free Distributed Learning (Dembo & Kailath, 1990)
  – Noise Injection and Correlation (Anderson & Kerns; Kirk et al., 1992-93)
  – Stochastic Error Descent (Cauwenberghs, 1993)
  – Constant Perturbation, Random Sign (Alspector et al., 1993)
  – Summed Weight Neuron Perturbation (Flower & Jabri, 1993)
Stochastic Error-Descent Learning
Minimize ε(p) by iterating:

    p(k+1) = p(k) − η ε̂(k) π(k)

from observation of the gradient in the direction of π(k):

    ε̂(k) = ½ [ ε( p(k) + π(k) ) − ε( p(k) − π(k) ) ]

with random uncorrelated binary components of the perturbation vector π(k):

    π_i(k) = ±σ ;  E( π_i(k) π_j(l) ) = σ² δ_ij δ_kl

Advantages:
  – No explicit model knowledge is required.
  – Robust in the presence of noise and model mismatches.
  – Computational load is significantly reduced.
  – Allows simple, modular, and scalable implementation.
  – Convergence properties similar to exact gradient descent.
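A minimal sketch of stochastic error descent (my own illustration; the quadratic cost and its optimum are hypothetical): the gradient is never computed, only the cost ε is observed at p + π and p − π for a random binary perturbation π, exactly as in the update rule above.

```python
import numpy as np

rng = np.random.default_rng(2)
p_opt = np.array([1.0, -2.0, 0.5])                 # hypothetical optimum
eps = lambda p: float(np.sum((p - p_opt) ** 2))    # only eps(p) is observable

p = np.zeros(3)
eta, sigma = 0.5, 0.1
for k in range(2000):
    pi = sigma * rng.choice([-1.0, 1.0], size=3)   # uncorrelated +/-sigma
    eps_hat = 0.5 * (eps(p + pi) - eps(p - pi))    # two-sided difference
    p = p - eta * eps_hat * pi                     # descend along pi
```

Note that a single scalar measurement ε̂ updates all parameters in parallel, which is why the computational load per iteration does not grow with the parameter count.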
Stochastic Perturbative Learning: Cell Architecture

    p(k+1) = p(k) − η ε̂(k) π(k) ;  ε̂(k) = ½ [ ε( p(k) + π(k) ) − ε( p(k) − π(k) ) ]

[Architecture: each local cell holds p_i(t), applies the perturbation π_i(t), and accumulates the increment −η ε̂(t) π_i(t) through a delay element z⁻¹; the perturbed error ε( p(t) + π(t) ) and the difference ε̂(t) are evaluated globally over the network.]
Stochastic Perturbative Learning: Circuit Cell

[Circuit: the parameter p_i(t) + σ π_i(t) is held on storage capacitor Cstore and perturbed through Cperturb; a charge pump with enable inputs ENp/ENn, polarity input POL, and bias voltages Vbp/Vbn applies increments of polarity sign(ε̂) π_i.]
Charge Pump Characteristics
[Measured charge-pump characteristics: (a) voltage decrement ΔVstored vs. gate voltage Vbn, and (b) voltage increment ΔVstored vs. gate voltage Vbp, on a log scale from 10⁻⁵ V to 1 V over gate voltages 0 to 0.6 V, for adaptation times Δt from 23 µsec to 40 msec. Inset: charge pump storing Vstored on capacitor C, with adaptation current Iadapt and charge Qadapt controlled by ENp, ENn, POL, Vbp, and Vbn.]
Supervised Learning of Recurrent Neural Dynamics
[Chip architecture: a six-neuron recurrent network with weight matrix W11…W66 and offset weights Woff, binary quantization Q(·) of the states x1…x6, multiplexed update/activation and probe circuitry, reference currents Iref+ and Iref−, and teacher forcing of target traces x1T(t), x2T(t).]

DYNAMICAL SYSTEM:

    dx/dt = F( p, x, y ) ;  z = G( x )

with state x(t), external input y(t), and output z(t).
The Credit Assignment Problem, or: How to Learn from Delayed Rewards

  – External, discontinuous reinforcement signal r(t).
  – Adaptive Critics:
      • Discrimination Learning (Grossberg, 1975)
      • Heuristic Dynamic Programming (Werbos, 1977)
      • Reinforcement Learning (Sutton and Barto, 1983)
      • TD(λ) (Sutton, 1988)
      • Q-Learning (Watkins, 1989)

[Diagram: a system with inputs, outputs, and parameters p_i; the adaptive critic converts the external reinforcement r(t) into an internal reinforcement r*(t).]
Reinforcement Learning (Barto and Sutton, 1983)

Locally tuned, address-encoded neurons:

    ξ(t) ∈ {0, …, 2ⁿ−1} : n-bit address encoding of the state space
    y(t) = y_ξ(t)(t) : classifier output
    q(t) = q_ξ(t)(t) : adaptive critic

Adaptation of classifier and adaptive critic:

    y_k(t+1) = y_k(t) + α r̂(t) e_k(t) y_k(t)
    q_k(t+1) = q_k(t) + β r̂(t) e_k(t)

  – eligibilities:
    e_k(t+1) = λ e_k(t) + (1 − λ) δ_kξ(t)
  – internal reinforcement:
    r̂(t) = r(t) + γ q(t) − q(t − 1)
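A toy sketch of the tabular actor-critic above (my own 1-D chain task, not the slides' hardware; learning rates, trace constant, and exploration noise are all assumptions): q_k is the adaptive critic, y_k the action preference, e_k the state eligibility, and r_hat the internal reinforcement.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8                                    # address-encoded states 0..7
y = np.zeros(n)                          # classifier (action network)
q = np.zeros(n)                          # adaptive critic
alpha, beta, gamma, lam = 0.2, 0.2, 0.9, 0.8

for episode in range(300):
    s, e = 0, np.zeros(n)                # start at left end, clear traces
    for t in range(50):
        a = 1 if y[s] + 0.3 * rng.normal() > 0 else -1   # noisy binary action
        s_next = min(max(s + a, 0), n - 1)
        r = 1.0 if s_next == n - 1 else 0.0              # reward at right end
        r_hat = r + gamma * q[s_next] - q[s]             # internal reinforcement
        e = lam * e                                      # decay eligibilities
        e[s] += 1 - lam                                  # mark visited state
        q += beta * r_hat * e                            # critic update
        y[s] += alpha * r_hat * a                        # actor update
        s = s_next
        if r > 0:
            break                                        # goal reached
```

The critic turns the single delayed reward into a dense internal reinforcement r̂, so states far from the goal still receive credit through the eligibility traces.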
Reinforcement Learning Classifier for Binary Control
[Chip: 64 reinforcement learning neurons, each combining an adaptive critic q_k, an action network y_k, and a state eligibility e_k. The quantized state (x1(t), x2(t)) selects one neuron through SELhor/SELvert address lines; the binary action y(t) = ±1 drives the control u(t). Cell signals include the internal reinforcement r̂, biases Vbp/Vbn/Vp/Vn, and UPD, LOCK, and HYST controls.]
A Biological Adaptive Optics System

[Diagram of the human eye: cornea, iris, lens, zonule fibers, retina, and optic nerve to the brain.]
Wavefront Distortion and Adaptive Optics

• Imaging: defocus, motion
• Laser beam: beam wander/spread, intensity fluctuations
Adaptive Optics: Conventional Approach

  – Performs phase conjugation
      • assumes intensity is unaffected
  – Complex
      • requires an accurate wavefront phase sensor (Shack-Hartmann; Zernike nonlinear filter; etc.)
      • computationally intensive control system
Adaptive Optics: Model-Free Integrated Approach

  – Optimizes a direct measure J of optical performance (“quality metric”).
  – No (explicit) model information is required:
      • any type of quality metric J and wavefront corrector (MEMS, LC, …)
      • no need for a wavefront phase sensor
  – Tolerates imprecision in the implementation of the updates:
      • system-level precision is limited by the accuracy of the measured J

[Diagram: incoming wavefront onto a wavefront corrector with N elements u1, …, un, …, uN.]
Adaptive Optics Controller Chip: Optimization by Parallel Perturbative Stochastic Gradient Descent

[Control loop: the AdOpt VLSI wavefront controller drives the wavefront corrector with u; a performance metric sensor measures J(u) from the image. The controller applies u + Δu and then u − Δu, measures J(u + Δu) and J(u − Δu), and forms the difference ΔJ.]

    u_j(k+1) = u_j(k) + γ ΔJ(k) Δu_j(k)
Parallel Perturbative Stochastic Gradient Descent: Architecture

    u_j(k+1) = u_j(k) + γ ΔJ(k) Δu_j(k)

    ΔJ(k) = J( u_1(k) + Δu_1(k), …, u_N(k) + Δu_N(k) ) − J( u_1(k) − Δu_1(k), …, u_N(k) − Δu_N(k) )

with uncorrelated perturbations:

    E( Δu_i Δu_j ) = σ² δ_ij

[Cell: each channel accumulates the increment γ ΔJ Δu_j ∈ {−1, 0, +1} through a delay element z⁻¹ to form u_j; J(u) is measured globally.]
Parallel Perturbative Stochastic Gradient Descent: Mixed-Signal Architecture

    u_j(k+1) = u_j(k) + γ |ΔJ(k)| sgn( ΔJ(k) ) Δu_j(k)

  – Δu_j = ±1, Bernoulli distributed (digital).
  – ΔJ is decomposed into an analog amplitude |ΔJ| and a digital polarity sgn(ΔJ); the product sgn(ΔJ) Δu_j selects the update polarity {−1, 0, +1}.
Wavefront Controller: VLSI Implementation

Edwards, Cohen, Cauwenberghs, Vorontsov & Carhart (1999)

AdOpt mixed-mode chip: 2.2 mm², 1.2 µm CMOS
• Controls 19 channels
• Interfaces with LC SLM or MEMS mirrors

Per iteration k:
  – Generate Bernoulli-distributed Δu_j(k), with sgn( Δu_j(k) ) = ±1 and | Δu_j(k) | = 1.
  – Decompose ΔJ(k) into | ΔJ(k) | sgn( ΔJ(k) ).
  – Update u_j(k+1) = u_j(k) + γ | ΔJ(k) | [ sgn( ΔJ(k) ) xor sgn( Δu_j(k) ) ].
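The mixed-signal decomposition can be sketched as follows (my own toy quadratic metric stands in for the optical J; the gain and perturbation size are assumptions): the perturbations are Bernoulli ±1, so the product sgn(ΔJ)·sgn(Δu_j) reduces to an XOR of two sign bits (digital), scaled by the analog amplitude |ΔJ|.

```python
import numpy as np

rng = np.random.default_rng(4)
u_opt = np.linspace(-1.0, 1.0, 19)              # 19 channels, as on the chip
J = lambda u: -float(np.sum((u - u_opt) ** 2))  # hypothetical quality metric

u = np.zeros(19)
gamma, sigma = 0.5, 0.05
for k in range(3000):
    b = rng.choice([-1.0, 1.0], size=19)        # Bernoulli sign bits sgn(du_j)
    dJ = J(u + sigma * b) - J(u - sigma * b)    # differential metric
    neg = (dJ < 0) ^ (b < 0)                    # XOR gives polarity of dJ*du_j
    u += gamma * abs(dJ) * np.where(neg, -1.0, 1.0) * sigma
```

Because the polarity is a single XOR per channel, only the common amplitude |ΔJ| needs analog precision, consistent with the polarity-over-amplitude argument earlier in the slides.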
Wavefront Controller: System-Level Characterization (LC SLM)

[Setup with HEX-127 LC SLM.]
Wavefront Controller: System-Level Characterization (SLM)

[Results: J(u) maximized and minimized 100 times; max and min trajectories shown.]
Wavefront Controller: System-Level Characterization (MEMS)

Weyrauch, Vorontsov, Bifano, Hammer, Cohen & Cauwenberghs

[Response function of the OKO MEMS mirror.]
Wavefront Controller: System-Level Characterization

[Convergence statistics over 500 trials.]
Quality Metrics

imaging:
    J_image = ∫ | ∇I(r, t) |² d²r        (Horn, 1968; Delbruck, 1999)

laser beam focusing:
    J_beam = ∫ F(r) I(r, t) d²r          (Muller et al., 1974; Vorontsov et al., 1996)

laser comm:
    J_comm = bit-error rate
Image Quality Metric Chip
Cohen, Cauwenberghs, Vorontsov & Carhart (2001)

    IQM = Σ_{i,j}^{N,M} | Ĩ_{i,j} | ,   with edge image Ĩ = I ⊗ K

    K =   0  -1   0
         -1   4  -1
          0  -1   0

2 mm chip, 22 × 22 pixel array, 120 µm pixels.
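The IQM can be computed off-chip for reference as follows (my own sketch; the test images are hypothetical): convolve the image with the edge kernel (center +4, four −1 neighbors) and sum the absolute response, so that a sharp, high-contrast image scores higher than a uniform one.

```python
import numpy as np

def iqm(image):
    """IQM = sum_ij |(I (x) K)_ij| with K the 3x3 Laplacian edge kernel."""
    K = np.array([[0, -1, 0],
                  [-1, 4, -1],
                  [0, -1, 0]], dtype=float)
    N, M = image.shape
    edge = np.zeros((N - 2, M - 2))
    for di in range(3):                 # explicit 'valid' 2-D convolution
        for dj in range(3):
            edge += K[di, dj] * image[di:di + N - 2, dj:dj + M - 2]
    return float(np.abs(edge).sum())

flat = np.ones((22, 22))                # uniform illumination: no edges
sharp = np.zeros((22, 22))              # high-contrast patch
sharp[8:14, 8:14] = 1.0
```

Since the kernel weights sum to zero, a uniformly illuminated frame yields IQM = 0, which matches the uniformly illuminated baseline in the measurements that follow.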
Architecture (3 × 3): Edge and Corner Kernels

[Kernel weights: center +4, four nearest neighbors −1.]
Image Quality Metric Chip Characterization: Experimental Setup
Image Quality Metric Chip Characterization: Experimental Results

[Response relative to a uniformly illuminated baseline; dynamic range ≈ 20.]
Image Quality Metric Chip Characterization: Experimental Results
Beam Variance Metric Chip
Cohen, Cauwenberghs, Vorontsov & Carhart (2001)

    BVM = N M  Σ_{i,j}^{N,M} I_{i,j}²  /  ( Σ_{i,j}^{N,M} I_{i,j} )²

22 × 22 pixel array: 20 × 20 active pixels (70 µm), surrounded by a perimeter of dummy pixels.
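A sketch of the beam variance metric (my reconstruction of the garbled formula above, with hypothetical test patterns): BVM = N·M Σ I² / (Σ I)², which evaluates to 1 for a perfectly uniform beam and to N·M when all intensity is focused into a single pixel.

```python
import numpy as np

def bvm(intensity):
    """Beam variance metric over an N x M intensity image."""
    N, M = intensity.shape
    return float(N * M * np.sum(intensity ** 2) / np.sum(intensity) ** 2)

uniform = np.ones((20, 20))             # flat beam over the active array
spot = np.zeros((20, 20))               # all power in one pixel
spot[10, 10] = 1.0
```

A tighter focus thus yields a larger metric, so maximizing BVM in the control loop sharpens the beam.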
Beam Variance Metric Chip Characterization: Experimental Setup
Beam Variance Metric Chip Characterization: Experimental Results

[Measured response; dynamic range ≈ 5.]
Beam Variance Metric Chip Characterization: Experimental Results
Beam Variance Metric Sensor in the Loop: Laser Receiver Setup
Conclusions
• Computational primitives of adaptation and learning are naturally implemented in analog VLSI, and make it possible to compensate for inaccuracies in the physical implementation of the system under adaptation.
• Care should still be taken to avoid inaccuracies in the implementation of the adaptive element itself. This can be achieved by ensuring the correct polarity, rather than amplitude, of the parameter update increments.
• Adaptation algorithms based on physical observation of the “performance” gradient in parameter space are better suited to analog VLSI implementation than algorithms based on a calculated gradient.
• Among the most generally applicable learning architectures are those that operate on reinforcement signals, and those that blindly extract and classify signals.
• Model-free adaptive optics leads to efficient and robust analog implementation of the control algorithm, using a performance criterion that can be freely chosen to accommodate different wavefront correctors and different imaging or laser communication applications.