8/3/2019 An All-mos Analog Feed Forward
http://slidepdf.com/reader/full/an-all-mos-analog-feed-forward 1/4
AN ALL-MOS ANALOG FEEDFORWARD NEURAL CIRCUIT WITH LEARNING
Fathi M.A. Salam and Myung-Ryul Choi
Systems and Circuits & Artificial Neural Nets Laboratories
Department of Electrical Engineering, Michigan State University
East Lansing, MI 48824
ABSTRACT
We describe an all-MOS circuit realization for a
feedforward artificial neural network. We also introduce an
all-MOS realization of a modified learning rule. In addition to analytical verification, the modified learning rule has been shown, via computer code as well as SPICE simulations, to successfully store into the network any given analog values (within the permissible range). An all-MOS architecture for a prototype two-layer artificial neural network has been specifically tested via SPICE simulations. The results demonstrate the learning capability of the all-MOS circuit realization and establish a VLSI modular architecture for composing a large-scale neural network system.
I. INTRODUCTION
Implementation in large scale is the vehicle through
which artificial neural networks (ANNs) would reveal their
computational powers. It is thus crucial that mutual accommodation is achieved between the medium of implementation and the proposed artificial neural network (ANN) models. Only through such an accommodation would the full advantage of the powers of ANNs be realized.
The main obstacle in the VLSI electronic
implementation of neural models has been the monolithic
implementation of the variable, changeable linear resistive element(s). In [1], a solution to this obstacle was proposed by introducing all-MOS four-quadrant vector multipliers, where a single all-MOS vector multiplier circuit executes the product of a vector of (output) signals and their corresponding weights. In [2], all-MOS feedforward and feedback ANN architectures employing the four-quadrant scalar multipliers have been introduced. SPICE simulation results of prototype ANN networks have demonstrated the
successful operation of these all-MOS ANN architectures.
In this work, we introduce a modified learning rule for feedforward ANNs that requires less computation and that is suitable for analog circuit implementation employing all-MOS four-quadrant multiplier modules as building blocks. We then present SPICE simulations of an all-MOS feedforward prototype network with an all-MOS implementation of the modified learning rule. We demonstrate the functionality of this prototype network in learning a given pattern via SPICE simulation. It is important to point out that the SPICE simulation converges on the order of nanoseconds in learning each pattern. This order of magnitude represents the time constant(s) of the overall dynamics of the circuit implementation.
In the following sections, we briefly describe the basic mathematical model for feedforward ANNs and the
Acknowledgment: This work is supported in part by ONR Grant N00014-89-5-1833, the Michigan Research Excellence Fund (REF), and NSF Grant ECS-8814027.
mathematical model for the so-called delta-rule or error
back-propagation. Then we describe the learning algorithm
based on the theory of continuous-time gradient dynamical
systems. Moreover, we introduce a modified version of the algorithm which reduces computation and eliminates the need for "computing" some derivatives of the sigmoidal neuron functions.
We emphasize the differential equations form of the
learning algorithm for basically two reasons: (i) the theory
and construction of gradient dynamical systems in the
differential equations form is well established; this feature simplifies the construction of mathematical models and it guarantees the global convergence (of all initial conditions) to only equilibrium points (for bounded-below energy functions) [5,6], and (ii) this form is naturally suitable for analog circuit (silicon VLSI) implementation, which is believed to be a
natural medium for ANN implementation [4].
These two features enhance the potential for large scale
implementation with guaranteed global convergence properties to equilibria. The (global) convergence speed is independent of the number of neuron units in the network and is only governed by the circuit time constants. That is,
scalability to large networks would not cause any instability
or convergence problems.
II. A MODIFIED LEARNING ALGORITHM FOR FEEDFORWARD ANNs
II.1 Feedforward Neural Networks
Consider the basic structure for multilayer feedforward
ANNs, see [7], where outputs of one layer are weighted and summed as an input to a neuron in the next layer. This process begins with the first layer, which is called the input layer, and ends with the last layer, which is called the output layer. Any layer between the input and the output layers is often referred to as a hidden layer. This simple, regular, repetitive structure constitutes the feedforward ANN. The input layer is optional and its purpose is to uniformly
squash each component of the real-world data via the sigmoidal neuron element. The governing static equation for each neuron unit in any layer may be described as

    y_j = S_j( u_j ),   u_j = Σ_i W_ji y_i + θ_j,        (2.1)

where y_i is the i-th output of a neuron unit in the previous layer (for the first layer, it is the i-th output, i.e. the squashed external input to the network), W_ji is the weight (or strength) of the connection from the i-th output of the previous layer to the j-th neuron input, θ_j is a bias at the input of the j-th neuron, and y_j is the output of the j-th neuron. S_j(.) is a nonlinear (sigmoid) function [7].
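As a concrete software illustration of the static map (2.1), one layer can be sketched as follows; the logistic sigmoid and all numerical values are our assumptions, since the paper leaves S_j(.) general:

```python
import math

def sigmoid(u):
    # a representative nondecreasing sigmoid nonlinearity S_j(.)
    return 1.0 / (1.0 + math.exp(-u))

def layer_output(W, theta, y_prev):
    # eqn (2.1): u_j = sum_i W_ji * y_i + theta_j,  y_j = S_j(u_j)
    return [sigmoid(sum(w_ji * y_i for w_ji, y_i in zip(row, y_prev)) + th)
            for row, th in zip(W, theta)]

# example: two previous-layer outputs feeding two neurons (illustrative numbers)
W = [[0.5, -0.5], [-0.5, 0.5]]
theta = [0.0, 0.0]
print(layer_output(W, theta, [1.0, 0.0]))
```

Stacking such layers, each consuming the previous layer's output vector, yields the feedforward map of the whole network.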
The Widrow-Hoff rule and its extension, the Delta rule [7],
recursively update the weights in the network so that a given
CH2868-8/90/0000-2508 $1.00 © 1990 IEEE
Authorized licensed use limited to: UNIVERSIDADE FEDERAL DA BAHIA. Downloaded on June 8, 2009 at 10:02 from IEEE Xplore. Restrictions apply.
set of desired input-output pairs (or patterns) is realized via
the mapping (2.1). The Delta rule has also been popularized
as the (Error) Back-Propagation learning rule [7].
II.2 The Error Back-Propagation Rule
The error back-propagation rule attempts to minimize the squared output error for training sample input-output pairs (or patterns). For every training sample, it modifies each weight in proportion to (the negative of) the partial derivative of the squared error with respect to that weight.
For each desired target pattern p, say t_p = (t_p1, ..., t_pn), define the squared error function as follows [7]:

    E_p = (1/2) Σ_{j=1}^{n} ( t_pj - y_pj )².        (2.2)
The error back-propagation or delta rule adjusts weights in the local direction of greatest error reduction. Let

    Δ_p W_ji(k) = W_ji(k+1) - W_ji(k),        (2.3)

where Δ_p W_ji denotes the change in the weight W_ji at the k-th iteration due to the input-output pattern p. The delta rule defines the change due to applying pattern p by

    Δ_p W_ji = -η ∂E_p / ∂W_ji,        (2.4)

where η > 0 is the (learning) rate. Observe that η is assumed
to be sufficiently small in order for eqn. (2.4) to be a truly
gradient system. When the gradient in (2.4) is computed, it results in the same form for all weights [7], namely,

    Δ_p W_ji = η δ_pj y_pi,        (2.5)

where the index p signifies that we are computing the effect of pattern p alone, y_pi is the output of the previous layer, and δ_pj is a function of the error. δ_pj is given as follows:

(a) if unit j is a member of the (last) output layer, then

    δ_pj = ( t_pj - y_pj ) S_j'(u_pj);        (2.6i)

(b) if unit j is not a member of the (last) output layer, then

    δ_pj = S_j'(u_pj) Σ_k δ_pk W_kj,        (2.6ii)

where k is the index for the units in the next layer to which unit j is connected.

Using the update eqn (2.5) and the appropriate equation for δ_pj as specified by eqn (2.6), the update law is recursively computed in discrete time.
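The discrete update (2.5) with (2.6i) can be sketched for a single output neuron as follows; the logistic sigmoid, the learning rate, and the pattern values are our illustrative choices, not the paper's:

```python
import math

def s(u):
    # a representative sigmoid for S_j(.)
    return 1.0 / (1.0 + math.exp(-u))

def s_prime(u):
    # dS/du = S(u) * (1 - S(u)) for the logistic sigmoid
    y = s(u)
    return y * (1.0 - y)

def delta_rule_step(W, y_prev, t, eta):
    # one discrete update, eqns (2.5)/(2.6i), for a single output neuron:
    #   delta_pj = (t_pj - y_pj) * S'(u_pj);  Delta_p W_ji = eta * delta_pj * y_pi
    u = sum(w * y for w, y in zip(W, y_prev))
    y_out = s(u)
    delta = (t - y_out) * s_prime(u)
    return [w + eta * delta * y for w, y in zip(W, y_prev)]

# repeated presentation of a single pattern (illustrative numbers)
W = [0.1, -0.2]
for _ in range(1000):
    W = delta_rule_step(W, [1.0, 1.0], t=0.9, eta=1.0)
print(s(W[0] + W[1]))   # the neuron output approaches the target 0.9
```

For hidden units, the same step would be repeated with δ_pj computed recursively from (2.6ii).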
Note that the total error function due to all patterns is
    E = Σ_p E_p.        (2.7)

Thus the total update for every weight is the sum of the updates due to each pattern. That is,

    Δ W_ji = Σ_p Δ_p W_ji = η Σ_p δ_pj y_pi,        (2.8)

where we used a common (learning) rate.
Remarks:
1. It should be emphasized (as stressed in [7], page 680) that the (learning) rate must be small for convergence and for the prevention of oscillations. However, the speed of convergence would suffer considerably as η is taken to be very small.
2. The discrete form of updating the weights, eqn (2.5), is influenced to some extent by the assumed implementation of the algorithm as a software digital computer program. It should be noted, however, that such implementation would result in exponential growth of computations for the weights of the hidden layers farthest from the output layer. This can be seen from the repetitive application of the recursive formula (2.6ii).
3. The computation of the update laws (2.5), as suggested in [7], is sequential. That is, first the computation for the update laws for the weights feeding into the last (output) layer is executed, then the computation for the update laws for the weights feeding into the layer before last, etc. This method of computation violates the gradient descent theory, and consequently the characteristics of the gradient dynamic system are not necessarily preserved.
4. In [7], the patterns are presented to the network sequentially. That is, one pattern is presented, then the network executes until learning or updating is completed for this pattern. Then, another pattern is presented and updating starts off from the weights attained previously until convergence of the weights is achieved, and so on. It should be remarked that this procedure would not necessarily retain memory of all patterns. Theoretically, the network is guaranteed to retain only the last pattern for which the weights have converged to an acceptable local minimum of E_p.
II.3 A Continuous-time Gradient Update Rule

The correct gradient dynamic update law should be defined as

    dW_ji/dt = η Σ_p e_pj S_j'(u_pj) y_pi,        (2.9i)

where dW_ji/dt is the time rate of change of the weight W_ji and where we used equations (2.6) with e_pj given by

(a) if unit j is a member of the (last) output layer, then

    e_pj = t_pj - y_pj;        (2.9ii)

(b) if unit j is not a member of the (last) output layer, then

    e_pj = Σ_k δ_pk W_kj,        (2.9iii)

where k is the index for neuron units in the next layer to which unit j is connected.
Remarks:
1. The differential equation update laws of (2.9) represent a true (continuous-time) gradient dynamic system. Consequently, there exist no oscillations or complicated dynamics. Equilibria are the only steady states.
2. The (learning) rate η may, in principle, take any arbitrary positive value without jeopardizing the gradient descent nature. If a digital computer integration routine is used, however, η should be sufficiently small as compared to, and is dictated by, the integration step size.
3. If the update law (2.9) is directly realized via an analog electronic circuit, then: (i) the (learning) rate can be as large as is physically possible, and (ii) there is no time-complexity
problem in "computing" the update laws. All of the update laws for all the weights will be governed by their time constants (i.e., the product of the equivalent capacitors and their corresponding equivalent resistors).
4. For analog implementations, a neuron is often modeled as an operational amplifier or as a double inverter. An explicit expression for the op-amp function is not readily available in closed form. Consequently, the function S_j, or more specifically, dS_j/du_j in the update eqn. (2.9), is not available (or computable) explicitly. This presents a potentially serious problem for implementing the differential equation update law via electronic or silicon analog circuits. One solution, which may require excessive computations, is to approximate this derivative as

    dS_j/du_j = dy_j/du_j ≈ Δy_j / Δu_j.

This entails a digital differentiation which must be executed with a fast enough sampling rate.
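The key convergence property claimed in this section, that the energy is nonincreasing along the gradient flow and equilibria are the only steady states, can be checked numerically with a one-weight sketch; the logistic sigmoid, rate, and Euler step size below are our assumptions, not the paper's circuit:

```python
import math

def s(u):
    # neuron nonlinearity (a representative sigmoid; the paper leaves S_j general)
    return 1.0 / (1.0 + math.exp(-u))

def energy(W, x, t):
    # per-pattern squared error E_p = (1/2)(t - y)^2 for one weight, one input
    return 0.5 * (t - s(W * x)) ** 2

def grad(W, x, t):
    # dE/dW = -(t - y) * S'(u) * x, with S'(u) = y(1 - y) for the logistic sigmoid
    y = s(W * x)
    return -(t - y) * y * (1.0 - y) * x

# forward-Euler integration of the gradient flow dW/dt = -eta * dE/dW, eqn (2.9)
W, x, t, eta, dt = -1.0, 1.0, 0.8, 5.0, 0.01
energies = []
for _ in range(5000):
    energies.append(energy(W, x, t))
    W -= dt * eta * grad(W, x, t)
print(energies[0], energies[-1])
```

The recorded energies decrease monotonically toward zero, mirroring the global convergence guarantee of bounded-below gradient systems [5,6].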
II.4 A Modified Differential Equation Update Law

We propose the following modified update law, with particular view to analog electronic circuit implementation:

    dW_ji/dt = η Σ_p e_pj y_pi,        (2.10)

where e_pj is as given in (2.9ii) and (2.9iii).
Remark:

We have also used a fourth-order Runge-Kutta integration routine to simulate (2.9) and (2.10), with very fast convergence properties as compared to the discrete update rule (2.5).
The update law (2.10) may be rewritten in the general form

    dW_ji/dt = -η (du_j/dy_j) (∂E_p/∂W_ji).        (2.11)

Equation (2.11), and consequently (2.10), is a gradient-like dynamic system; see [8].
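The remark above mentions Runge-Kutta simulation; a minimal sketch of (2.10) for a single output-layer weight, integrated with a classical fourth-order Runge-Kutta step, might look as follows (the sigmoid, rate, and pattern values are our illustrative choices):

```python
import math

def s(u):
    return 1.0 / (1.0 + math.exp(-u))

def w_dot(W, x, t, eta):
    # modified law (2.10) for an output-layer weight: dW/dt = eta * e * y_prev,
    # with e = t - y and no dS/du factor
    return eta * (t - s(W * x)) * x

def rk4_step(W, h, x, t, eta):
    # classical fourth-order Runge-Kutta step for dW/dt = w_dot(W)
    k1 = w_dot(W, x, t, eta)
    k2 = w_dot(W + 0.5 * h * k1, x, t, eta)
    k3 = w_dot(W + 0.5 * h * k2, x, t, eta)
    k4 = w_dot(W + h * k3, x, t, eta)
    return W + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

W, x, t, eta, h = -1.0, 1.0, 0.8, 5.0, 0.01
for _ in range(2000):
    W = rk4_step(W, h, x, t, eta)
print(s(W * x))   # the output settles at the target
```

Even without the S' factor, the sign of dW/dt still reduces the error e, which is the gradient-like property that (2.11) formalizes.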
III. The Feedforward ANN All-MOS Circuit with Learning
A prototype all-MOS continuous-time analog feedforward ANN circuit with learning is designed to demonstrate the implementation of the learning capability using the proposed learning algorithm (2.10). The block diagram of the circuit is depicted in Fig. 1. The feedforward ANN all-MOS circuit consists of three layers: the input layer, the hidden layer, and the output layer. The input layer is simply the voltage nodes of the external input voltages denoted by X1 and X2. The hidden layer has two neurons with outputs denoted by y1 and y2. W_ji denotes the interconnection between the i-th input of the input layer and the j-th neuron of the hidden layer. The output layer has one neuron which generates an output denoted by ŷ. W̄_kj is the interconnection between the j-th neuron of the hidden layer and the k-th neuron of the output layer (here k = 1). During learning, these weights are adjusted according to the learning rule (2.10), which now specializes to
    y_j = S_j( W_j1 X1 + W_j2 X2 ),  j = 1, 2,        (3.1i)
    ŷ = S̄( W̄_11 y_1 + W̄_12 y_2 ),        (3.1ii)

    dW̄_11/dt = ( t - ŷ ) y_1,        (3.2i)
    dW̄_12/dt = ( t - ŷ ) y_2,        (3.2ii)
    dW_11/dt = X1 ( t - ŷ ) W̄_11,        (3.2iii)
    dW_12/dt = X2 ( t - ŷ ) W̄_11,        (3.2iv)
    dW_21/dt = X1 ( t - ŷ ) W̄_12,        (3.2v)
    dW_22/dt = X2 ( t - ŷ ) W̄_12.        (3.2vi)
In eqn (3.1), S_j and S̄ are the nondecreasing, differentiable, monotone sigmoid functions of a neuron for the hidden layer and the output layer, respectively.
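A software sketch of the specialized learning ODEs of eqn (3.2), integrated by forward Euler for one pattern, is given below. The values are scaled to the unit interval rather than the circuit's voltage ranges, a learning-rate factor η is added explicitly, and all numbers are illustrative assumptions:

```python
import math

def s(u):
    return 1.0 / (1.0 + math.exp(-u))

def forward(W, Wb, x):
    # eqn (3.1): two hidden neurons y_j, one output neuron yhat
    y = [s(W[j][0] * x[0] + W[j][1] * x[1]) for j in range(2)]
    yhat = s(Wb[0] * y[0] + Wb[1] * y[1])
    return y, yhat

def learn_pattern(W, Wb, x, t, eta=5.0, h=0.01, steps=5000):
    # forward-Euler integration of the learning ODEs (3.2) for one pattern:
    #   dWb_j/dt = eta * (t - yhat) * y_j
    #   dW_ji/dt = eta * x_i * (t - yhat) * Wb_j
    for _ in range(steps):
        y, yhat = forward(W, Wb, x)
        e = t - yhat
        dWb = [eta * e * y[j] for j in range(2)]
        dW = [[eta * x[i] * e * Wb[j] for i in range(2)] for j in range(2)]
        for j in range(2):
            Wb[j] += h * dWb[j]
            for i in range(2):
                W[j][i] += h * dW[j][i]
    return W, Wb

W = [[-0.5, 0.6], [-0.5, 0.5]]   # initial weights (illustrative)
Wb = [0.7, 0.5]
W, Wb = learn_pattern(W, Wb, x=[0.5, 0.5], t=0.9)
print(forward(W, Wb, [0.5, 0.5])[1])   # output driven toward the target
```

As in the SPICE runs, all six weight derivatives are integrated simultaneously, and the network output is driven to the stored target for the presented pattern.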
Fig. 1 consists of two components, the feedforward ANN (map) subcircuit and the learning subcircuits. The feedforward ANN subcircuit executes the function of eqn (3.1). It consists of 2-dimensional vector multipliers to compute the 2-dimensional vector multiplication of eqn (3.1), double inverters as neurons, and voltage followers. The learning subcircuits execute the differential equations of eqn (3.2). They are composed of scalar multipliers to compute all the scalar multiplications, capacitors for the integration in eqn (3.2), voltage shifters, and voltage followers. In these subcircuits, voltage followers are used to increase the fanout capability of the outputs of multipliers when feeding into the driving inputs through the drain of MOS transistors of another multiplier [1]. In addition, we used a voltage follower to isolate the output of the multiplier from the capacitor implementing the derivative of each weight according to the learning update laws (3.2). The voltage shifters are used to achieve output-input compatibility at the cascading junctions of multipliers while satisfying the operation constraints of the four-quadrant multipliers [1-3]. For example, the output of the multiplier leading into the hidden layer with output y_j ranges from -2.5 V to 2.5 V. In order to satisfy the operating voltage range of the next multiplier, this value is shifted by 2.5 V to a range from 0 V to 5 V, passed to the sigmoidal nonlinearity of the neuron j in the hidden layer, and then connected to the compatible inputs of the multipliers in the next stage (with range from 0 V to 5 V). Fig. 2 depicts the schematic for the all-MOS analog four-quadrant 2-D vector multiplier which we have used in our implementation. See [1,2] and the references therein for more information on these vector multipliers.

Table 1 gives a set of fixed weights that (via SPICE
simulations) solves the well-known XOR problem for the
feedforward ANN (map) subcircuit. SPICE simulation has verified the performance of the feedforward ANN MOS circuit with the learning rule (see Fig. 1). For each pattern, all the weights of this circuit, namely, W11, W12, W21, W22, W̄11, and W̄12, are simultaneously initialized.

Table 2 summarizes the results of the SPICE
simulations for each given pattern. The learning process evolves until, on the order of 10-100 nanoseconds, the weights converge to their steady states. These steady-state values achieve the learning of the given pattern. Observe that all variables are given in volts.

We remark that the steady-state weights in each case do
not solve the XOR problem, however. To guarantee solving the XOR, we note that all patterns must be simultaneously presented, as the theory of continuous-time gradient dynamic systems dictates. This would require additional learning subcircuits and is presently being pursued. We finally remark that the building blocks of this prototype can be composed into regular structures to build large feedforward ANN circuits with analog learning capability.
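The point about simultaneous presentation can be illustrated in software: under the gradient flow on the total error (2.7), with all four XOR patterns contributing to every weight derivative at once, the total error is nonincreasing. This sketch uses standard back-propagation gradients with bias terms added; the biases, initialization, and step size are our assumptions and do not model the circuit:

```python
import math

def s(u):
    return 1.0 / (1.0 + math.exp(-u))

# the four XOR patterns (inputs/targets scaled to the unit interval;
# the circuit's 0.5 V / 4.5 V levels are not modeled here)
XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

def forward(W, th, Wb, thb, x):
    # 2-2-1 network; biases th, thb are our addition for this illustration
    y = [s(W[j][0] * x[0] + W[j][1] * x[1] + th[j]) for j in range(2)]
    yhat = s(Wb[0] * y[0] + Wb[1] * y[1] + thb)
    return y, yhat

def total_error(W, th, Wb, thb):
    # E = sum_p E_p, eqn (2.7)
    return sum(0.5 * (t - forward(W, th, Wb, thb, x)[1]) ** 2 for x, t in XOR)

def euler_step(W, th, Wb, thb, h):
    # one forward-Euler step of the gradient flow on the TOTAL error:
    # every pattern contributes to every weight derivative simultaneously
    gW = [[0.0, 0.0], [0.0, 0.0]]
    gth, gWb, gthb = [0.0, 0.0], [0.0, 0.0], 0.0
    for x, t in XOR:
        y, yhat = forward(W, th, Wb, thb, x)
        d_out = (t - yhat) * yhat * (1.0 - yhat)   # output-layer delta, eqn (2.6i)
        gthb += d_out
        for j in range(2):
            gWb[j] += d_out * y[j]
            d_j = d_out * Wb[j] * y[j] * (1.0 - y[j])   # hidden delta, eqn (2.6ii)
            gth[j] += d_j
            for i in range(2):
                gW[j][i] += d_j * x[i]
    W = [[W[j][i] + h * gW[j][i] for i in range(2)] for j in range(2)]
    th = [th[j] + h * gth[j] for j in range(2)]
    Wb = [Wb[j] + h * gWb[j] for j in range(2)]
    return W, th, Wb, thb + h * gthb

W, th, Wb, thb = [[0.5, -0.4], [-0.3, 0.6]], [0.1, -0.2], [0.6, -0.5], 0.05
errors = [total_error(W, th, Wb, thb)]
for _ in range(3000):
    W, th, Wb, thb = euler_step(W, th, Wb, thb, h=0.05)
    errors.append(total_error(W, th, Wb, thb))
print(errors[0], errors[-1])
```

The monotone decrease of the total error is exactly the property lost when patterns are presented one at a time, which is why the per-pattern steady states in Table 2 need not solve XOR.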
IV. CONCLUSIONS
An analog all-MOS implementation of an artificial feedforward neural network with a modified learning rule is described. The implementation employs all-MOS analog four-quadrant multipliers as building blocks. SPICE simulations have been conducted to demonstrate the functionality of the MOS circuit with the modified continuous-time dynamic learning rule. The SPICE simulations have shown that learning each given pattern is achieved on the order of nanoseconds. The results quantify the advantages and demonstrate the feasibility of direct, dedicated-hardware, continuous-time analog VLSI silicon implementation of feedforward ANNs.
REFERENCES

[1] F. M. A. Salam, N. I. Khachab, M. Ismail, and Y. Wang, "An analog MOS implementation of the synaptic weights for feedback neural nets," Proc. IEEE Int. Symp. on Circuits and Systems, May 1989, pp. 1223-1226.
[2] F. M. A. Salam, M. R. Choi, and Y. Wang, "An Analog MOS Implementation of Synaptic Weights for Feedforward/Feedback Neural Nets," Proc. 32nd Midwest Symposium on Circuits and Systems, Champaign, Illinois, August 1989.
[3] N. I. Khachab and M. Ismail, "Novel Continuous-Time All MOS Four Quadrant Multipliers," Proc. IEEE Int. Symp. on Circuits and Systems, May 1987, pp. 762-765.
[4] C. Mead, Analog VLSI and Neural Systems, Addison-Wesley, 1989.
[5] F. M. A. Salam and Y. Wang, "Some Properties of Dynamic Feedback Neural Nets," in the session on Neural Networks and Control Systems, the 27th IEEE Conference on Decision and Control, December 1988, pp. 337-342.
[6] M. Hirsch and S. Smale, Differential Equations, Dynamical Systems, and Linear Algebra, Academic Press, 1974.
[7] D. Rumelhart, G. Hinton, and R. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, D. Rumelhart and J. McClelland (Eds.), Cambridge, MA: MIT Press, 1986, pp. 318-362.
[8] F. M. A. Salam, "A Modified Learning Rule for Feedforward Artificial Neural Nets For Analog Implementation," Memorandum No. MSU/EE/S 90102, Department of Electrical Engineering, Michigan State University, East Lansing, MI 48824-1226, 26 January 1990.
Table 2. SPICE simulation results of learning each pattern (all values in volts):

              pattern 1     pattern 2     pattern 3     pattern 4
  X1            0.5           0.5           4.5           4.5
  X2            0.5           4.5           0.5           4.5
  t             1.0           4.5           4.5           1.0

  weights    init.  s.s.   init.  s.s.   init.  s.s.   init.  s.s.
  W11        -0.5   0.053   0.4   0.02    0.4  -0.019  -0.5  -0.052
  W12         0.6   0.053  -0.4  -0.019  -0.4   0.02    0.6  -0.052
  W21        -0.5   0.053  -0.5   0.02   -0.5  -0.019  -0.5  -0.052
  W22         0.5   0.053   0.5  -0.019   0.5   0.02    0.5  -0.052
  W̄11        0.7   0.276   0.7   0.168   0.7   0.168   0.7   0.276
  W̄12        0.5   0.276   0.5   0.168   0.5   0.168   0.5   0.276

  output ŷ    0.0   1.81    0.0   5.0     0.0   5.0     0.0   1.81
Fig. 1 The block diagram of the feedforward ANN all-MOS circuit with learning.
Fig. 2 The all-MOS analog four-quadrant 2-D vector multiplier.
Table 1. SPICE simulation of the feedforward MOS circuit with fixed weights solving the XOR problem (units are in volts): (W11 W12 W21 W22 W̄11 W̄12) = (0.5 -0.5 -0.5 0.5 0.5 0.5).