G. Cauwenberghs 520.776 Learning on Silicon
Model-Free Stochastic Perturbative Adaptation and Optimization

Gert Cauwenberghs
Johns Hopkins University
520.776 Learning on Silicon
http://bach.ece.jhu.edu/gert/courses/776
Model-Free Stochastic Perturbative Adaptation and Optimization

OUTLINE
• Model-Free Learning
  – Model Complexity
  – Compensation of Analog VLSI Mismatch
• Stochastic Parallel Gradient Descent
  – Algorithmic Properties
  – Mixed-Signal Architecture
  – VLSI Implementation
• Extensions
  – Learning of Continuous-Time Dynamics
  – Reinforcement Learning
• Model-Free Adaptive Optics
  – AdOpt VLSI Controller
  – Adaptive Optics “Quality” Metrics
  – Applications to Laser Communication and Imaging
The Analog Computing Paradigm
• Local functions are efficiently implemented with minimal circuitry, exploiting the physics of the devices.
• Excessive global interconnects are avoided:
  – Currents or charges are accumulated along a single wire.
  – Voltage is distributed along a single wire.
• Pros:
  – Massive Parallelism
  – Low Power Dissipation
  – Real-Time, Real-World Interface
  – Continuous-Time Dynamics
• Cons:
  – Limited Dynamic Range
  – Mismatches and Nonlinearities (WYDINWYG: “what you design is not what you get”)
Effect of Implementation Mismatches
• Associative Element:
  – Mismatches can be properly compensated by adjusting the parameters p_i accordingly, provided sufficient degrees of freedom are available to do so.
• Adaptive Element:
  – Requires precise implementation.
  – The accuracy of the implemented polarity (rather than amplitude) of the parameter update increments Δp_i is the performance-limiting factor.

[Block diagram: system ε(p) with inputs, outputs, adjustable parameters p_i, and a reference defining the error.]
Example: LMS Rule
A linear perceptron under supervised learning:

    y_i(k) = Σ_j p_ij x_j(k)
    ε = ½ Σ_k Σ_i ( y_i_target(k) − y_i(k) )²

with gradient descent:

    Δp_ij(k) = −η ∂ε(k)/∂p_ij = η ( y_i_target(k) − y_i(k) ) x_j(k)

reduces to an incremental outer-product update rule, with scalable, modular implementation in analog VLSI.
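The LMS rule above can be sketched in a few lines; this is my own illustration (the target weights, data, and learning rate are hypothetical, not from the slides), showing the outer-product update Δp_ij = η e_i x_j driving the parameters to the supervised solution.

```python
import numpy as np

rng = np.random.default_rng(0)
P_true = rng.normal(size=(2, 3))        # hypothetical target weights
X = rng.normal(size=(100, 3))           # training inputs x(k)
Y_target = X @ P_true.T                 # supervised targets y_target(k)

P = np.zeros((2, 3))                    # learned parameters p_ij
eta = 0.05                              # learning rate
for _ in range(10):                     # a few epochs over the data
    for x, y_t in zip(X, Y_target):
        e = y_t - P @ x                 # output error y_target - y
        P += eta * np.outer(e, x)       # incremental outer-product update
```

Each parameter update uses only the locally available error e_i and input x_j, which is what makes the rule modular in analog VLSI.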
Incremental Outer-Product Learning in Neural Nets
Multi-Layer Perceptron:

    x_i = f( Σ_j p_ij x_j )

Outer-Product Learning Update:

    Δp_ij = η x_j e_i

  – Hebbian (Hebb, 1949):  e_i = x_i
  – LMS Rule (Widrow-Hoff, 1960):  e_i = f′_i ( x_i_target − x_i )
  – Backpropagation (Werbos; Rumelhart; LeCun):  e_j = f′_j Σ_i p_ij e_i
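A sketch of the same outer-product rule in a two-layer perceptron (my own toy example, not from the slides), with f = tanh so that f′ = 1 − f², and the error backpropagated as e_j = f′_j Σ_i p_ij e_i:

```python
import numpy as np

rng = np.random.default_rng(1)
P1 = 0.5 * rng.normal(size=(4, 3))      # input -> hidden weights
P2 = 0.5 * rng.normal(size=(2, 4))      # hidden -> output weights
eta = 0.2
x = np.array([1.0, -0.5, 0.3])          # a single training pattern
x_target = np.array([0.5, -0.2])

errs = []
for _ in range(1000):
    h = np.tanh(P1 @ x)                 # hidden activations
    y = np.tanh(P2 @ h)                 # output activations
    e_out = (1 - y**2) * (x_target - y) # e_i = f'_i (x_target_i - x_i)
    e_hid = (1 - h**2) * (P2.T @ e_out) # e_j = f'_j sum_i p_ij e_i
    P2 += eta * np.outer(e_out, h)      # outer-product updates dp = eta x e
    P1 += eta * np.outer(e_hid, x)
    errs.append(float(np.sum((x_target - y)**2)))
```

Both layers use the identical local update Δp_ij = η x_j e_i; only the origin of the error signal e differs.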
Gradient Descent Learning
Minimize ε(p) by iterating:

    p_i(k+1) = p_i(k) − η ∂ε/∂p_i (k)

from calculation of the gradient (chain rule through the network variables):

    ∂ε/∂p_i = Σ_l ( ∂ε/∂y_l ) Σ_m ( ∂y_l/∂x_m ) ( ∂x_m/∂p_i )

Implementation Problems:
  – Requires an explicit model of the internal network dynamics.
  – Sensitive to model mismatches and noise in the implemented network and learning system.
  – Amount of computation typically scales strongly with the number of parameters.
Gradient-Free Approach to Error-Descent Learning
Avoid the model sensitivity of gradient descent by observing the parameter dependence of the performance error on the network directly, rather than calculating gradient information from a pre-assumed model of the network.

Stochastic Approximation:
  – Multi-dimensional Kiefer-Wolfowitz (Kushner & Clark, 1978)
  – Function Smoothing Global Optimization (Styblinski & Tang, 1990)
  – Simultaneous Perturbation Stochastic Approximation (Spall, 1992)

Hardware-Related Variants:
  – Model-Free Distributed Learning (Dembo & Kailath, 1990)
  – Noise Injection and Correlation (Anderson & Kerns; Kirk et al., 1992-93)
  – Stochastic Error Descent (Cauwenberghs, 1993)
  – Constant Perturbation, Random Sign (Alspector et al., 1993)
  – Summed Weight Neuron Perturbation (Flower & Jabri, 1993)
Stochastic Error-Descent Learning
Minimize ε(p) by iterating:

    p(k+1) = p(k) − η ε̂(k) π(k)

from observation of the gradient in the direction of π(k):

    ε̂(k) = ½ [ ε( p(k) + π(k) ) − ε( p(k) − π(k) ) ]

with random uncorrelated binary components of the perturbation vector π(k):

    π_i(k) = ±σ ;  E( π_i(k) π_j(l) ) = σ² δ_ij δ_kl

Advantages:
  – No explicit model knowledge is required.
  – Robust in the presence of noise and model mismatches.
  – Computational load is significantly reduced.
  – Allows simple, modular, and scalable implementation.
  – Convergence properties similar to exact gradient descent.
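A minimal sketch of stochastic error descent (my own illustration; the quadratic cost and its optimum are hypothetical): the gradient is never computed, only the cost ε is observed at p + π and p − π for a random binary perturbation π, exactly as in the update rule above.

```python
import numpy as np

rng = np.random.default_rng(2)
p_opt = np.array([1.0, -2.0, 0.5])                 # hypothetical optimum
eps = lambda p: float(np.sum((p - p_opt) ** 2))    # only eps(p) is observable

p = np.zeros(3)
eta, sigma = 0.5, 0.1
for k in range(2000):
    pi = sigma * rng.choice([-1.0, 1.0], size=3)   # uncorrelated +/-sigma
    eps_hat = 0.5 * (eps(p + pi) - eps(p - pi))    # two-sided difference
    p = p - eta * eps_hat * pi                     # descend along pi
```

Note that a single scalar measurement ε̂ updates all parameters in parallel, which is why the computational load per iteration does not grow with the parameter count.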
Stochastic Perturbative Learning: Cell Architecture

    p(k+1) = p(k) − η ε̂(k) π(k) ;  ε̂(k) = ½ [ ε( p(k) + π(k) ) − ε( p(k) − π(k) ) ]

[Architecture: each local cell holds p_i(t), applies the perturbation π_i(t), and accumulates the increment −η ε̂(t) π_i(t) through a delay element z⁻¹; the perturbed error ε( p(t) + π(t) ) and the difference ε̂(t) are evaluated globally over the network.]
Stochastic Perturbative Learning: Circuit Cell

[Circuit: the parameter p_i(t) + σ π_i(t) is held on storage capacitor Cstore and perturbed through Cperturb; a charge pump with enable inputs ENp/ENn, polarity input POL, and bias voltages Vbp/Vbn applies increments of polarity sign(ε̂) π_i.]
Charge Pump Characteristics
[Measured charge-pump characteristics: (a) voltage decrement ΔVstored vs. gate voltage Vbn, and (b) voltage increment ΔVstored vs. gate voltage Vbp, on a log scale from 10⁻⁵ V to 1 V over gate voltages 0 to 0.6 V, for adaptation times Δt from 23 µsec to 40 msec. Inset: charge pump storing Vstored on capacitor C, with adaptation current Iadapt and charge Qadapt controlled by ENp, ENn, POL, Vbp, and Vbn.]
Supervised Learning of Recurrent Neural Dynamics
[Chip architecture: a six-neuron recurrent network with weight matrix W11…W66 and offset weights Woff, binary quantization Q(·) of the states x1…x6, multiplexed update/activation and probe circuitry, reference currents Iref+ and Iref−, and teacher forcing of target traces x1T(t), x2T(t).]

DYNAMICAL SYSTEM:

    dx/dt = F( p, x, y ) ;  z = G( x )

with state x(t), external input y(t), and output z(t).
The Credit Assignment Problem, or: How to Learn from Delayed Rewards

  – External, discontinuous reinforcement signal r(t).
  – Adaptive Critics:
      • Discrimination Learning (Grossberg, 1975)
      • Heuristic Dynamic Programming (Werbos, 1977)
      • Reinforcement Learning (Sutton and Barto, 1983)
      • TD(λ) (Sutton, 1988)
      • Q-Learning (Watkins, 1989)

[Diagram: a system with inputs, outputs, and parameters p_i; the adaptive critic converts the external reinforcement r(t) into an internal reinforcement r*(t).]
Reinforcement Learning (Barto and Sutton, 1983)

Locally tuned, address-encoded neurons:

    ξ(t) ∈ {0, …, 2ⁿ−1} : n-bit address encoding of the state space
    y(t) = y_ξ(t)(t) : classifier output
    q(t) = q_ξ(t)(t) : adaptive critic

Adaptation of classifier and adaptive critic:

    y_k(t+1) = y_k(t) + α r̂(t) e_k(t) y_k(t)
    q_k(t+1) = q_k(t) + β r̂(t) e_k(t)

  – eligibilities:
    e_k(t+1) = λ e_k(t) + (1 − λ) δ_kξ(t)
  – internal reinforcement:
    r̂(t) = r(t) + γ q(t) − q(t − 1)
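A toy sketch of the tabular actor-critic above (my own 1-D chain task, not the slides' hardware; learning rates, trace constant, and exploration noise are all assumptions): q_k is the adaptive critic, y_k the action preference, e_k the state eligibility, and r_hat the internal reinforcement.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8                                    # address-encoded states 0..7
y = np.zeros(n)                          # classifier (action network)
q = np.zeros(n)                          # adaptive critic
alpha, beta, gamma, lam = 0.2, 0.2, 0.9, 0.8

for episode in range(300):
    s, e = 0, np.zeros(n)                # start at left end, clear traces
    for t in range(50):
        a = 1 if y[s] + 0.3 * rng.normal() > 0 else -1   # noisy binary action
        s_next = min(max(s + a, 0), n - 1)
        r = 1.0 if s_next == n - 1 else 0.0              # reward at right end
        r_hat = r + gamma * q[s_next] - q[s]             # internal reinforcement
        e = lam * e                                      # decay eligibilities
        e[s] += 1 - lam                                  # mark visited state
        q += beta * r_hat * e                            # critic update
        y[s] += alpha * r_hat * a                        # actor update
        s = s_next
        if r > 0:
            break                                        # goal reached
```

The critic turns the single delayed reward into a dense internal reinforcement r̂, so states far from the goal still receive credit through the eligibility traces.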
Reinforcement Learning Classifier for Binary Control
[Chip: 64 reinforcement learning neurons, each combining an adaptive critic q_k, an action network y_k, and a state eligibility e_k. The quantized state (x1(t), x2(t)) selects one neuron through SELhor/SELvert address lines; the binary action y(t) = ±1 drives the control u(t). Cell signals include the internal reinforcement r̂, biases Vbp/Vbn/Vp/Vn, and UPD, LOCK, and HYST controls.]
A Biological Adaptive Optics System

[Diagram of the human eye: cornea, iris, lens, zonule fibers, retina, and optic nerve to the brain.]
Wavefront Distortion and Adaptive Optics

• Imaging: defocus, motion
• Laser beam: beam wander/spread, intensity fluctuations
Adaptive Optics: Conventional Approach

  – Performs phase conjugation
      • assumes intensity is unaffected
  – Complex
      • requires an accurate wavefront phase sensor (Shack-Hartmann; Zernike nonlinear filter; etc.)
      • computationally intensive control system
Adaptive Optics: Model-Free Integrated Approach

  – Optimizes a direct measure J of optical performance (“quality metric”).
  – No (explicit) model information is required:
      • any type of quality metric J and wavefront corrector (MEMS, LC, …)
      • no need for a wavefront phase sensor
  – Tolerates imprecision in the implementation of the updates:
      • system-level precision is limited by the accuracy of the measured J

[Diagram: incoming wavefront onto a wavefront corrector with N elements u1, …, un, …, uN.]
Adaptive Optics Controller Chip: Optimization by Parallel Perturbative Stochastic Gradient Descent

[Control loop: the AdOpt VLSI wavefront controller drives the wavefront corrector with u; a performance metric sensor measures J(u) from the image. The controller applies u + Δu and then u − Δu, measures J(u + Δu) and J(u − Δu), and forms the difference ΔJ.]

    u_j(k+1) = u_j(k) + γ ΔJ(k) Δu_j(k)
Parallel Perturbative Stochastic Gradient Descent: Architecture

    u_j(k+1) = u_j(k) + γ ΔJ(k) Δu_j(k)

    ΔJ(k) = J( u_1(k) + Δu_1(k), …, u_N(k) + Δu_N(k) ) − J( u_1(k) − Δu_1(k), …, u_N(k) − Δu_N(k) )

with uncorrelated perturbations:

    E( Δu_i Δu_j ) = σ² δ_ij

[Cell: each channel accumulates the increment γ ΔJ Δu_j ∈ {−1, 0, +1} through a delay element z⁻¹ to form u_j; J(u) is measured globally.]
Parallel Perturbative Stochastic Gradient Descent: Mixed-Signal Architecture

    u_j(k+1) = u_j(k) + γ |ΔJ(k)| sgn( ΔJ(k) ) Δu_j(k)

  – Δu_j = ±1, Bernoulli distributed (digital).
  – ΔJ is decomposed into an analog amplitude |ΔJ| and a digital polarity sgn(ΔJ); the product sgn(ΔJ) Δu_j selects the update polarity {−1, 0, +1}.
Wavefront Controller: VLSI Implementation

Edwards, Cohen, Cauwenberghs, Vorontsov & Carhart (1999)

AdOpt mixed-mode chip: 2.2 mm², 1.2 µm CMOS
• Controls 19 channels
• Interfaces with LC SLM or MEMS mirrors

Per iteration k:
  – Generate Bernoulli-distributed Δu_j(k), with sgn( Δu_j(k) ) = ±1 and | Δu_j(k) | = 1.
  – Decompose ΔJ(k) into | ΔJ(k) | sgn( ΔJ(k) ).
  – Update u_j(k+1) = u_j(k) + γ | ΔJ(k) | [ sgn( ΔJ(k) ) xor sgn( Δu_j(k) ) ].
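The mixed-signal decomposition can be sketched as follows (my own toy quadratic metric stands in for the optical J; the gain and perturbation size are assumptions): the perturbations are Bernoulli ±1, so the product sgn(ΔJ)·sgn(Δu_j) reduces to an XOR of two sign bits (digital), scaled by the analog amplitude |ΔJ|.

```python
import numpy as np

rng = np.random.default_rng(4)
u_opt = np.linspace(-1.0, 1.0, 19)              # 19 channels, as on the chip
J = lambda u: -float(np.sum((u - u_opt) ** 2))  # hypothetical quality metric

u = np.zeros(19)
gamma, sigma = 0.5, 0.05
for k in range(3000):
    b = rng.choice([-1.0, 1.0], size=19)        # Bernoulli sign bits sgn(du_j)
    dJ = J(u + sigma * b) - J(u - sigma * b)    # differential metric
    neg = (dJ < 0) ^ (b < 0)                    # XOR gives polarity of dJ*du_j
    u += gamma * abs(dJ) * np.where(neg, -1.0, 1.0) * sigma
```

Because the polarity is a single XOR per channel, only the common amplitude |ΔJ| needs analog precision, consistent with the polarity-over-amplitude argument earlier in the slides.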
Wavefront Controller: System-Level Characterization (LC SLM)

[Setup with HEX-127 LC SLM.]
Wavefront Controller: System-Level Characterization (SLM)

[Results: J(u) maximized and minimized 100 times; max and min trajectories shown.]
Wavefront Controller: System-Level Characterization (MEMS)

Weyrauch, Vorontsov, Bifano, Hammer, Cohen & Cauwenberghs

[Response function of the OKO MEMS mirror.]
Wavefront Controller: System-Level Characterization

[Convergence statistics over 500 trials.]
Quality Metrics

imaging:
    J_image = ∫ | ∇I(r, t) |² d²r        (Horn, 1968; Delbruck, 1999)

laser beam focusing:
    J_beam = ∫ F(r) I(r, t) d²r          (Muller et al., 1974; Vorontsov et al., 1996)

laser comm:
    J_comm = bit-error rate
Image Quality Metric Chip
Cohen, Cauwenberghs, Vorontsov & Carhart (2001)

    IQM = Σ_{i,j}^{N,M} | Ĩ_{i,j} | ,   with edge image Ĩ = I ⊗ K

    K =   0  -1   0
         -1   4  -1
          0  -1   0

2 mm chip, 22 × 22 pixel array, 120 µm pixels.
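The IQM can be computed off-chip for reference as follows (my own sketch; the test images are hypothetical): convolve the image with the edge kernel (center +4, four −1 neighbors) and sum the absolute response, so that a sharp, high-contrast image scores higher than a uniform one.

```python
import numpy as np

def iqm(image):
    """IQM = sum_ij |(I (x) K)_ij| with K the 3x3 Laplacian edge kernel."""
    K = np.array([[0, -1, 0],
                  [-1, 4, -1],
                  [0, -1, 0]], dtype=float)
    N, M = image.shape
    edge = np.zeros((N - 2, M - 2))
    for di in range(3):                 # explicit 'valid' 2-D convolution
        for dj in range(3):
            edge += K[di, dj] * image[di:di + N - 2, dj:dj + M - 2]
    return float(np.abs(edge).sum())

flat = np.ones((22, 22))                # uniform illumination: no edges
sharp = np.zeros((22, 22))              # high-contrast patch
sharp[8:14, 8:14] = 1.0
```

Since the kernel weights sum to zero, a uniformly illuminated frame yields IQM = 0, which matches the uniformly illuminated baseline in the measurements that follow.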
Architecture (3 × 3): Edge and Corner Kernels

[Kernel weights: center +4, four nearest neighbors −1.]
Image Quality Metric Chip Characterization: Experimental Setup
Image Quality Metric Chip Characterization: Experimental Results

[Response relative to a uniformly illuminated baseline; dynamic range ≈ 20.]
Image Quality Metric Chip Characterization: Experimental Results
Beam Variance Metric Chip
Cohen, Cauwenberghs, Vorontsov & Carhart (2001)

    BVM = N M  Σ_{i,j}^{N,M} I_{i,j}²  /  ( Σ_{i,j}^{N,M} I_{i,j} )²

22 × 22 pixel array: 20 × 20 active pixels (70 µm), surrounded by a perimeter of dummy pixels.
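A sketch of the beam variance metric (my reconstruction of the garbled formula above, with hypothetical test patterns): BVM = N·M Σ I² / (Σ I)², which evaluates to 1 for a perfectly uniform beam and to N·M when all intensity is focused into a single pixel.

```python
import numpy as np

def bvm(intensity):
    """Beam variance metric over an N x M intensity image."""
    N, M = intensity.shape
    return float(N * M * np.sum(intensity ** 2) / np.sum(intensity) ** 2)

uniform = np.ones((20, 20))             # flat beam over the active array
spot = np.zeros((20, 20))               # all power in one pixel
spot[10, 10] = 1.0
```

A tighter focus thus yields a larger metric, so maximizing BVM in the control loop sharpens the beam.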
Beam Variance Metric Chip Characterization: Experimental Setup
Beam Variance Metric Chip Characterization: Experimental Results

[Measured response; dynamic range ≈ 5.]
Beam Variance Metric Chip Characterization: Experimental Results
Beam Variance Metric Sensor in the Loop: Laser Receiver Setup
Conclusions
• Computational primitives of adaptation and learning are naturally implemented in analog VLSI, and make it possible to compensate for inaccuracies in the physical implementation of the system under adaptation.
• Care should still be taken to avoid inaccuracies in the implementation of the adaptive element itself. This can be achieved by ensuring the correct polarity, rather than amplitude, of the parameter update increments.
• Adaptation algorithms based on physical observation of the “performance” gradient in parameter space are better suited to analog VLSI implementation than algorithms based on a calculated gradient.
• Among the most generally applicable learning architectures are those that operate on reinforcement signals, and those that blindly extract and classify signals.
• Model-free adaptive optics leads to efficient and robust analog implementation of the control algorithm, using a performance criterion that can be freely chosen to accommodate different wavefront correctors and different imaging or laser communication applications.