
An Introduction to Complex-Valued Recurrent Correlation Neural Networks

Marcos Eduardo Valle

Abstract—In this paper, we generalize the bipolar recurrent correlation neural networks (RCNNs) of Chiueh and Goodman to complex-valued vectors. A complex-valued RCNN (CV-RCNN) is characterized by a possibly non-linear function which is applied to the real part of the scalar product of the current state and the fundamental vectors. Computational experiments reveal that some CV-RCNNs can implement associative memories with high storage capacity. Furthermore, these CV-RCNNs exhibit an excellent noise tolerance.

I. INTRODUCTION

Neural networks that can be used to implement associative memories (AMs) have been investigated by numerous researchers since the middle of the last century [1]. In fact, research on neural associative memories dates back to the works of Steinbuch [2], [3]. In 1972, Anderson, Kohonen, and Nakano independently introduced the correlation matrix model, in which the outer product rule, also referred to as Hebbian learning, is used to synthesize the synaptic weight matrix [1], [4], [5]. The optimal linear associative memory (OLAM), which represents the best linear AM in the least squares sense, is obtained by replacing Hebbian learning with the projection rule, also called generalized-inverse learning [4], [6].

In 1982, Hopfield introduced a non-linear dynamic network that can be used for the storage of bipolar vectors [7]. The Hopfield network is implemented by a recursive single-layer neural network composed of the threshold neurons of McCulloch and Pitts [1]. It has many attractive features, including ease of implementation in hardware, characterization in terms of an energy function, and a variety of applications [4], [8], [9], [10], [11]. On the downside, the Hopfield network suffers from a low absolute storage capacity of approximately 0.15n items, where n is the length of the vectors [12]. As a consequence, it has limited application as an AM model.

To overcome the storage capacity limitation of the Hopfield network, several researchers have developed improved dynamic neural networks. In particular, we call the reader's attention to the following contributions:

1a) Kanter and Sompolinsky enhanced the storage capacity as well as the noise tolerance of the Hopfield network by replacing the outer product rule with generalized-inverse learning [13].

1b) Morita showed that the performance of the Hopfield model can be significantly improved by modifying the activation function of the neurons of the network [14].

Marcos Eduardo Valle is with the Department of Applied Mathematics, University of Campinas, Campinas, Brazil (email: [email protected]).

This work was supported in part by CNPq under grant no. 304240/2011-7, FAPESP under grant no. 2013/12310-4, and FAEPEX/Unicamp under grant no. 519.292.

1c) A simple but significant improvement in storage capacity is achieved by the recurrent correlation neural networks (RCNNs) introduced by Chiueh and Goodman in the early 1990s [15], [16].

Broadly speaking, the RCNNs generalize the Hopfield model by adding a layer of nodes that compute a correlation measure between the input vector and the fundamental vectors. The activation function of the neurons in this layer characterizes the RCNN. For example, the Hopfield network is obtained by considering a certain linear activation function. Moreover, for some activation functions, the storage capacity scales exponentially with the length of the vectors [15], [17]. Besides the very high storage capacity, some RCNNs also exhibit excellent error correction capabilities [15], [18], [19]. Furthermore, some RCNNs are closely related to certain types of Bayesian processes [20] and to kernel-based AMs such as the model of Garcia and Moreno [21], [22].

All neural networks discussed in the last two paragraphs can implement an AM for the storage of bipolar vectors. However, many applications of AMs, including the retrieval of gray-scale images in the presence of noise, require the storage of multistate or complex-valued patterns [23], [24], [25], [26]. Although research on complex-valued neural networks dates back to the early 1970s [27], in our opinion, the first significant contributions on complex-valued networks that realize an AM appeared only twenty years later.

In 1996, Jankowski et al. generalized the bipolar Hopfield network using complex numbers on the unit circle [28]. To this end, the complex-signum function was used in place of the sign activation function. However, like the Hopfield network, the complex-valued network of Jankowski et al. suffers from a low storage capacity [25], [29]. Afterwards, several researchers developed improved complex-valued dynamic models to overcome this limitation. In particular, we call the reader's attention to the following contributions:

2a) Lee showed that the performance of the complex-valued network of Jankowski et al. can be enhanced by using the projection rule instead of Hebbian learning [29].

2b) Tanaka and Aihara further improved the model of Lee by adopting activation functions different from the complex-signum function [26].

In some sense, the contributions listed in 2a) and 2b) are similar to the works pointed out previously in items 1a) and 1b). Extending this analogy to item 1c), one might ask whether there is a complex-valued version of the class of RCNNs. We are not aware of any contribution addressing this question besides this article.


Fig. 1. The network topology of a recurrent correlation neural network: a) bipolar and b) complex-valued. The matrix U is obtained by concatenating the fundamental column vectors. (Panel a comprises the stages U^T, f(·), U, and sgn(·); panel b comprises U*, f(ℜ{·}), U, and σ(·). Original diagram omitted.)

The paper is organized as follows: A brief review of the bipolar RCNNs of Chiueh and Goodman is given in Section II. Some complex-valued dynamic neural networks, including the model proposed by Tanaka and Aihara, are briefly discussed in Section III. The complex-valued RCNNs are introduced in Section IV. Section V provides computational experiments concerning the storage capacity and noise tolerance of the novel models. The paper finishes with concluding remarks in Section VI.

II. A BRIEF REVIEW ON THE BIPOLAR RECURRENT CORRELATION NEURAL NETWORKS

A bipolar recurrent correlation neural network (RCNN) is implemented by the fully connected two-layer recurrent neural network depicted in Fig. 1a) [4], [15]. The first layer computes a kind of correlation between the current state and the fundamental vectors, followed by a possibly non-linear function f which often emphasizes the induced local field. The second layer, composed of threshold neurons of McCulloch and Pitts, yields the sign of a weighted sum of the fundamental vectors as the next state of the RCNN.

Formally, let B denote the bipolar set {−1, +1} and consider a set U = {u^1, u^2, . . . , u^p}, where each u^ξ = [u_1^ξ, u_2^ξ, . . . , u_n^ξ]^T ∈ B^n is a bipolar n-bit fundamental vector. Given an input x(0) = [x_1(0), x_2(0), . . . , x_n(0)]^T ∈ B^n, the RCNN recursively defines the following sequence of n-bit bipolar vectors for t ≥ 0:

x_j(t+1) = sgn( ∑_{ξ=1}^{p} w_ξ(t) u_j^ξ ),   ∀ j = 1, . . . , n,   (1)

where the dynamic weights w_ξ(t) are given by the following equation for some continuous and monotone nondecreasing function f : [−n, n] → R:

w_ξ(t) = f( ⟨x(t), u^ξ⟩ ),   ∀ ξ = 1, . . . , p.   (2)

As in the Hopfield network, the sgn function in (1) is evaluated as

sgn(v_j(t)) = +1,       if v_j(t) > 0,
              x_j(t),   if v_j(t) = 0,
              −1,       if v_j(t) < 0,   (3)

where v_j(t) = ∑_{ξ=1}^{p} w_ξ(t) u_j^ξ is the activation potential of the jth output neuron at iteration t. We would like to recall that the sequence {x(t)}_{t≥0} given by (1)-(3) converges, in both synchronous and asynchronous update modes, for any initial state x(0) [15].
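To make the update rule (1)-(3) concrete, the following Python sketch performs one synchronous RCNN iteration. It assumes NumPy; the function name rcnn_step and the vectorized tie handling are illustrative choices, not taken from the original paper.

```python
import numpy as np

def rcnn_step(x, U, f):
    """One synchronous update of a bipolar RCNN, following (1)-(3).

    x : current state, shape (n,), entries in {-1, +1}
    U : fundamental vectors stacked as columns, shape (n, p)
    f : continuous, monotone nondecreasing weighting function on [-n, n]
    """
    w = f(U.T @ x)            # dynamic weights w_xi(t), cf. (2)
    v = U @ w                 # activation potentials v_j(t)
    x_next = np.sign(v)       # sign of the weighted sum of fundamental vectors, cf. (1)
    ties = (v == 0)
    x_next[ties] = x[ties]    # keep the previous component when v_j(t) = 0, cf. (3)
    return x_next
```

Iterating rcnn_step until the state stops changing yields the recalled vector; by the convergence result recalled above, such a fixed point is reached for any initial state.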

Example 1. The famous Hopfield network, with the slight difference that the diagonal of the weight matrix is not zeroed, is obtained by considering in (2) the linear function

f_c(x) = x/n.   (4)

Example 2. The exponential correlation neural network (ECNN) is obtained by setting f equal to an exponential function, i.e.,

f_e(x) = e^{αx/n},   α > 0.   (5)

The ECNN seems to be the RCNN best suited for implementation using very-large-scale integration (VLSI) technology [15]. Furthermore, the storage capacity and the noise tolerance of this recurrent network have been extensively explored in the literature [15], [18], [19].

In particular, Chiueh and Goodman showed that the storage capacity of the ECNN scales exponentially with n, the length of the vectors [15]. In other words, this model is able to implement perfect recall of approximately c^n n-bit bipolar fundamental vectors. In particular, the base c > 1 depends on the coefficient α of the exponential function. Moreover, the storage capacity of the ECNN approaches the ultimate upper bound for the capacity of a memory model for bipolar vectors as α tends to +∞. In practice, however, the exponential storage capacity is very difficult to attain due to the limitation on the dynamic range of the exponentiation [15], [16]. In fact, a typical VLSI implementation of the exponentiation circuit has a dynamic range of approximately 10^5 to 10^7. For large vectors, even the implementation of the ECNN on traditional von Neumann computers is limited by the floating point representation of real numbers.

Example 3. The high-order and potential-function RCNNs are obtained by considering, respectively, the functions [15]:

f_h(x) = (1 + x/n)^q,   where q > 1 is an integer,   (6)

and

f_p(x) = 1/(1 − x/n)^L,   L ≥ 1.   (7)

As with the ECNN, the storage capacity of the potential-function RCNN scales exponentially with n [17]. In contrast, the high-order RCNN has a polynomial storage capacity [30].
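For convenience, the four weighting functions (4)-(7) can be written compactly as follows. This is a small sketch assuming NumPy; the default parameter values are chosen only for illustration.

```python
import numpy as np

# Weighting functions of Examples 1-3; the argument x is the correlation
# <x(t), u^xi> (or its real part in the complex-valued case), which lies in [-n, n].
f_c = lambda x, n: x / n                             # correlation (Hopfield-like), (4)
f_e = lambda x, n, alpha=10: np.exp(alpha * x / n)   # exponential (ECNN), (5)
f_h = lambda x, n, q=10: (1 + x / n) ** q            # high-order, (6)
f_p = lambda x, n, L=10: 1 / (1 - x / n) ** L        # potential-function, (7)
# Note: f_p diverges when x = n (a perfect match), so that case must be
# treated separately in an actual implementation.
```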


III. SOME COMPLEX-VALUED DYNAMIC NEURAL NETWORKS

The complex-valued multistate neural network of Jankowski et al. is an efficient and elegant generalization of the bipolar Hopfield network with potential application to the storage of K-state vectors [28]. Before discussing some variations of this model, let us clarify the relationship between complex-valued and K-state patterns.

First, let 𝒦 = {0, 1, . . . , K−1} and S = {z ∈ C : |z| = 1} denote, respectively, the set of the first K non-negative integers and the unit circle in the complex plane. The number K of elements in the set 𝒦 is called the resolution factor [28]. The resolution factor is used to define the angle

θ_K = 2π/K,   (8)

referred to as the angular size.

A K-state vector x = [x_1, . . . , x_n]^T ∈ 𝒦^n can be converted into a complex-valued vector z = [z_1, . . . , z_n]^T ∈ S^n by applying, in a component-wise manner, the function r_K : [0, K) → S defined below, i.e., z_j = r_K(x_j) for j = 1, . . . , n:

r_K(x) = e^{i x θ_K}.   (9)

Conversely, let φ_K : [0, K) → 𝒦 denote the floor function on 𝒦, i.e., φ_K(x) = max{k ∈ 𝒦 : k ≤ x}, and let q_K : C → [0, K) be the function given by the following equation, where Arg(z) ∈ [0, 2π) is the principal argument of z:

q_K(z) = Arg(z)/θ_K.   (10)

Then, a complex-valued vector z ∈ C^n can be mapped into a K-state vector x ∈ 𝒦^n by setting x_j = φ_K ◦ q_K(z_j) for all j = 1, . . . , n.
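The conversions (8)-(10) between K-state and complex-valued components can be sketched as follows in Python with NumPy; the function names mirror r_K, q_K, and φ_K but are otherwise illustrative.

```python
import numpy as np

def r_K(x, K):
    """Map K-state values in {0, ..., K-1} onto the unit circle, cf. (8)-(9)."""
    theta_K = 2 * np.pi / K               # angular size (8)
    return np.exp(1j * x * theta_K)

def q_K(z, K):
    """Map nonzero complex numbers into [0, K) through the principal argument, cf. (10)."""
    theta_K = 2 * np.pi / K
    arg = np.mod(np.angle(z), 2 * np.pi)  # Arg(z) taken in [0, 2*pi)
    return arg / theta_K

def phi_K(x, K):
    """Multilevel step (floor) function onto {0, ..., K-1}."""
    return np.minimum(np.floor(x), K - 1).astype(int)
```

With these helpers, a K-state value k becomes the phasor r_K(k, K), and a recalled phasor z is mapped back to a K-state value via phi_K(q_K(z, K), K).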

The complex-valued multistate network of Jankowski et al. is obtained by replacing the McCulloch-Pitts neuron by a complex-valued neuron model [28]. In general terms, a complex-valued neuron computes a weighted sum of its inputs followed by a complex activation function ϕ : C → S that usually generalizes the sgn function.

Formally, many complex-valued dynamic neural networks (CV-DNNs) are described by the following recursive equation, where M = (m_jk) ∈ C^{n×n} denotes the synaptic weight matrix and z(0) = [z_1(0), . . . , z_n(0)]^T ∈ S^n is the complex-valued input vector:

z_j(t+1) = ϕ( ∑_{k=1}^{n} m_jk z_k(t) ),   ∀ j = 1, . . . , n.   (11)

As with the bipolar dynamic neural networks [7], [8], the outer product rule [28] and generalized-inverse learning [26], [29] are among the most widely used techniques to synthesize the synaptic weight matrix M of a CV-DNN given by (11). In particular, given a set of fundamental vectors {u^1, u^2, . . . , u^p} ⊆ S^n, the projection rule defines the matrix M ∈ C^{n×n} as

M = (1/n) ( UU^† − diag( (UU^†)_{11}, . . . , (UU^†)_{nn} ) ),   (12)

where U = [u^1, . . . , u^p] ∈ C^{n×p} is the matrix whose columns correspond to the fundamental vectors, U^† denotes its pseudo-inverse, also called the Moore-Penrose generalized inverse, and diag(d_1, . . . , d_n) ∈ C^{n×n} is a diagonal matrix whose entries are d_1, . . . , d_n.

In some sense, the projection rule makes optimal use of the synaptic weight matrix in the absence of the activation function ϕ. Also, computational experiments revealed that this learning rule usually yields a network with many desired characteristics, including high storage capacity and some noise tolerance [26], [29]. In this paper, only the projection rule is used to synthesize the CV-DNNs given by (11).
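A direct transcription of the projection rule (12) is given below; it is a minimal sketch assuming NumPy, and projection_rule is an illustrative name.

```python
import numpy as np

def projection_rule(U):
    """Synaptic weight matrix (12) of a CV-DNN from the fundamental vectors.

    U : complex matrix of shape (n, p) whose columns are the fundamental vectors.
    """
    n = U.shape[0]
    P = U @ np.linalg.pinv(U)             # U U^dagger, using the Moore-Penrose pseudo-inverse
    M = (P - np.diag(np.diag(P))) / n     # zero the diagonal and scale by 1/n
    return M
```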

Let us now turn our attention to some activation functions that can be used in (11). As with the sgn function given by (3), special attention should be given to the evaluation of ϕ at 0. In the following, we shall consider ϕ : C \ {0} → S and define ϕ(0) such that (11) yields

z_j(t+1) = z_j(t)   if   ∑_{k=1}^{n} m_jk z_k(t) = 0.   (13)

A. The Complex-Signum Activation Function

The complex-signum function csgn : C \ {0} → S, introduced by Aizenberg et al. at the beginning of the 1970s [27], is defined by the following equation, where θ_K is the angular size given by (8):

csgn_K(z) = 1,               if 0 ≤ Arg(z) < θ_K,
            e^{i θ_K},       if θ_K ≤ Arg(z) < 2θ_K,
            ⋮
            e^{i(K−1)θ_K},   if (K−1)θ_K ≤ Arg(z) < 2π.   (14)

Note that the domain of the complex-signum function can be divided into K sectors [(j−1)θ_K, jθ_K), for j = 1, . . . , K. In order to place the input states at the angular centers of these sectors, (11) is written as follows using the complex-valued signum function with a constant factor e^{i θ_K/2}:

z_j(t+1) = csgn_K( e^{i θ_K/2} ∑_{k=1}^{n} m_jk z_k(t) ).   (15)

Let us conclude this example by recalling that the complex-signum function can be expressed alternatively in terms of the following composition [26]:

csgn_K(z) = r_K ◦ φ_K ◦ q_K(z).   (16)
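Using the composition (16), the complex-signum function admits a very short implementation; the sketch below assumes NumPy and a nonzero argument z, as in the text.

```python
import numpy as np

def csgn(z, K):
    """Complex-signum function (14), computed through the composition (16)."""
    theta_K = 2 * np.pi / K
    q = np.mod(np.angle(z), 2 * np.pi) / theta_K   # q_K(z) in [0, K)
    k = np.floor(q)                                # phi_K: multilevel step
    return np.exp(1j * k * theta_K)                # r_K: back onto the unit circle
```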

B. The Continuous-Valued Activation Function

In [31], Aizenberg et al. proposed the continuous-valued activation function σ : C \ {0} → S given by

σ(z) = z / |z|,   (17)

where |z| denotes the modulus, or absolute value, of the complex number z.

The function σ can be expressed alternatively by

σ(z) = e^{i Arg(z)} = r_K ◦ ι ◦ q_K(z),   (18)

3389

Page 4: [IEEE 2014 International Joint Conference on Neural Networks (IJCNN) - Beijing, China (2014.7.6-2014.7.11)] 2014 International Joint Conference on Neural Networks (IJCNN) - An introduction

where ι denotes the identity function, i.e., ι(x) = x for all x. Therefore, σ can be obtained from (16) by replacing the multilevel step function φ_K by the identity function ι. Also, the continuous-valued function σ can be viewed as the limit of csgn for K → +∞ [31].

C. The Complex-Sigmoid Activation Function

Motivated by the successful applications of multilevel sigmoid functions in threshold multilevel neurons, Tanaka and Aihara proposed to replace the multilevel step function φ_K by the multilevel sigmoid function m_K given by

m_K(x) = mod( ∑_{κ=1}^{K} 1/(1 + e^{−(x−κ)/ε}) − 1/2, K ),   (19)

where mod denotes the modulo operation, i.e., for any real number u, mod(u, K) = u + jK ∈ [0, K) for some appropriate integer j [26]. Therefore, the complex-sigmoid activation function, denoted by csgm : C \ {0} → S, is defined by means of the following composition:

csgm_K(z) = r_K ◦ m_K ◦ q_K(z).   (20)

The multilevel sigmoid function m_K, and consequently the complex-sigmoid function csgm_K, depends on an additional parameter ε that controls the slope between neighboring states. As ε → 0, the complex-sigmoid function approaches the complex-signum function.
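The following sketch implements m_K and csgm_K; it assumes NumPy and, in particular, the usual logistic form 1/(1 + e^{−(x−κ)/ε}) for each sigmoid term, as written in (19).

```python
import numpy as np

def m_K(x, K, eps=0.2):
    """Multilevel sigmoid (19): a smooth staircase on [0, K) with slope parameter eps."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    kappa = np.arange(1, K + 1).reshape(-1, 1)              # thresholds kappa = 1, ..., K
    s = np.sum(1.0 / (1.0 + np.exp(-(x - kappa) / eps)), axis=0)
    return np.mod(s - 0.5, K)

def csgm(z, K, eps=0.2):
    """Complex-sigmoid activation (20): the composition r_K o m_K o q_K."""
    theta_K = 2 * np.pi / K
    q = np.mod(np.angle(z), 2 * np.pi) / theta_K            # q_K(z) in [0, K)
    return np.exp(1j * m_K(q, K, eps) * theta_K)            # r_K applied to m_K(q)
```

As ε → 0 each logistic term tends to a unit step, so m_K becomes a multilevel staircase and, as noted above, csgm_K approaches csgn_K.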

D. A Brief Comparison of the Three Activation Functions

Let us conclude this section by performing one of the computational experiments used by Tanaka and Aihara to investigate the dependence of the csgm function on the parameter ε [26]. Furthermore, this experiment is used to compare the CV-DNNs given by (11) with the activation functions csgn, σ, and csgm.

First, we generated p = 12 uniformly distributed multistate vectors of length n = 128 using a resolution factor K = 8. These twelve multistate vectors were converted into complex-valued vectors u^1, . . . , u^12 in S^128 using (9). The generalized projection rule was used to synthesize the synaptic weight matrix of the CV-DNNs.

Similarly, a complex-valued input vector z(0) ∈ S^128 was randomly generated M = 500 times. In each trial, we iterated (11) either until convergence or until t ≥ 500n. We labeled a trial of a certain CV-DNN as successful if

max_{ξ=1,...,p} { |y* u^ξ| } > 0.99n,   (21)

where y* denotes the conjugate transpose of the vector recalled after the presentation of z(0) ∈ S^n as input. Thus, we computed the probability that a CV-DNN recalls one of the fundamental vectors.
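The success criterion (21) amounts to a single line of code; the sketch below assumes NumPy and that the recalled vector y and the fundamental vectors (as columns of U) have length n.

```python
import numpy as np

def successful_recall(y, U, tol=0.99):
    """Criterion (21): |y* u^xi| > 0.99 n for at least one fundamental vector u^xi."""
    n = U.shape[0]
    overlaps = np.abs(np.conj(y) @ U)   # |y* u^xi| for every column of U
    return np.max(overlaps) > tol * n
```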

The probability of successful recall of the three CV-DNNs versus the parameter ε of the csgm function is shown in Fig. 2. The activation functions csgn and σ yielded constant lines because they do not depend on ε.

Fig. 2. Probability of successful recall of the CV-DNNs with the activation functions csgn, σ, and csgm by the parameter ε of the latter. (Plot of recall probability versus ε; original figure omitted.)

Note that the recall probability of the csgm-based CV-DNN visually approaches the recall probability of the csgn-based model as ε → 0. The larger recall probabilities of the former are attained for ε ∈ [0.2, 0.4]. In this interval, the csgm-based memory outperformed the other two CV-DNNs.

We would like to recall that, after performing similar experiments but storing different numbers of fundamental vectors, Tanaka and Aihara suggested the value ε = 0.2 for the csgm-based CV-DNN with the generalized projection rule [26]. For this parameter value, however, the activation functions csgm and σ yielded almost the same recall probabilities. Furthermore, the multilevel sigmoid function m_K with ε = 0.2 can be fairly approximated by the identity function ι.

In conclusion, we believe that the csgm function can be replaced by the simpler continuous-valued activation function σ in many practical applications. Besides the theoretical simplification, such a substitution reduces the number of free parameters of the memory. In light of this remark, we adopt the continuous-valued activation function σ in the following section.

IV. THE COMPLEX-VALUED RECURRENT CORRELATION NEURAL NETWORKS

Just as the CV-DNNs presented in the previous section generalize the bipolar Hopfield network, the complex-valued recurrent correlation neural networks (CV-RCNNs) extend the bipolar RCNNs to the complex unit circle. Precisely, a CV-RCNN is implemented by a two-layer recurrent network with the fully connected topology depicted in Fig. 1b). The first layer computes the real part of the correlation between the current state and the fundamental vectors, followed by the possibly non-linear function f. The second layer evaluates the continuous-valued activation function σ at a weighted sum of the fundamental vectors.

In mathematical terms, let U = {u^1, . . . , u^p} ⊆ S^n be the set of fundamental vectors and f : [−n, n] → R be a continuous and monotone nondecreasing function. Given a complex-valued input z(0) = [z_1(0), . . . , z_n(0)]^T ∈ S^n, a CV-RCNN recursively defines the sequence

z_j(t+1) = σ( ∑_{ξ=1}^{p} w_ξ(t) u_j^ξ ),   ∀ j = 1, . . . , n,   (22)

where the weights w_1(t), . . . , w_p(t) are given by

w_ξ(t) = f( ℜ{ ⟨u^ξ, z(t)⟩ } ),   ∀ ξ = 1, . . . , p.   (23)

Here, ℜ{z} denotes the real part of the complex number z. Recall that the function σ : C → S is evaluated as follows, where v_j(t) = ∑_{ξ=1}^{p} w_ξ(t) u_j^ξ is the activation potential of the jth output neuron at iteration t:

σ(v_j(t)) = v_j(t)/|v_j(t)|,   if v_j(t) ≠ 0,
            z_j(t),            if v_j(t) = 0.   (24)

Note that the synaptic weight matrices U* and U of a CV-RCNN are obtained by concatenating the fundamental vectors u^1, . . . , u^p, that is, U = [u^1, . . . , u^p] ∈ S^{n×p}.
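A compact sketch of the CV-RCNN dynamics (22)-(24) is given below. It assumes NumPy; the stopping test in cv_rcnn_recall (a numerical fixed-point check with a maximum number of iterations) is an illustrative choice rather than part of the model, and f is a one-argument weighting function, e.g. a version of (4)-(7) with n fixed.

```python
import numpy as np

def cv_rcnn_step(z, U, f):
    """One synchronous CV-RCNN update, following (22)-(24).

    z : current state, shape (n,), components on the complex unit circle
    U : fundamental vectors stacked as columns, shape (n, p)
    f : weighting function on [-n, n], e.g. lambda x: np.exp(alpha * x / n)
    """
    w = f(np.real(np.conj(U).T @ z))    # w_xi(t) = f(Re<u^xi, z(t)>), cf. (23)
    v = U @ w                           # activation potentials v_j(t)
    z_next = z.copy()
    nz = (v != 0)
    z_next[nz] = v[nz] / np.abs(v[nz])  # sigma(v) = v / |v|; keep z_j(t) when v_j(t) = 0, cf. (24)
    return z_next

def cv_rcnn_recall(z0, U, f, max_iter=1000, tol=1e-10):
    """Iterate (22) from z(0); Theorem 1 below guarantees convergence of the sequence."""
    z = z0.astype(complex)
    for _ in range(max_iter):
        z_new = cv_rcnn_step(z, U, f)
        if np.max(np.abs(z_new - z)) < tol:
            break
        z = z_new
    return z
```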

The following theorem shows that a CV-RCNN yields a convergent sequence {z(t)}_{t≥0} independently of the number of fundamental vectors and of the input vector z(0) ∈ S^n. Hence, this network can implement an associative memory that maps an input z(0) into the limit y = lim_{t→∞} z(t).

Theorem 1. Let f : [−n, n] → R be a continuous and monotone nondecreasing function. Given a set U = {u^1, . . . , u^p} ⊆ S^n of fundamental vectors and an input z(0) ∈ S^n, the sequence {z(t)}_{t≥0} given by (22)-(24) converges in both synchronous and asynchronous update modes.

The proof of Theorem 1, which we intend to provide in a journal paper, is very similar to the one provided by Chiueh and Goodman for the bipolar RCNNs [15]. Briefly, we show that the time evolution of (22) yields a minimum of the energy function

E(z) = − ∑_{ξ=1}^{p} F( ℜ⟨u^ξ, z⟩ ),   ∀ z ∈ S^n,   (25)

where F denotes a primitive of the non-linear function f. We would like to point out that, although there are other correlation measures in complex-valued statistics, it is the real part of the inner product in (23) that is used in the proof of this important theorem.

V. COMPUTATIONAL EXPERIMENTS

Let us now perform computational experiments concerning the storage of complex-valued vectors in some CV-RCNNs. Specifically, we considered the correlation, high-order, potential-function, and exponential CV-RCNNs, obtained by considering in (23) the functions f_c, f_h, f_p, and f_e given by (4), (6), (7), and (5), respectively.

A. A Comparison Between the Four CV-RCNNs

Let us first investigate the effect of the parameters q, L, and α on the error correction capability of the high-order, potential-function, and exponential CV-RCNNs, respectively. Let us also investigate the error correction capability of the correlation CV-RCNN and compare the four memories. To this end, we performed the following steps 1000 times:

1) We generated p = 12 uniformly distributed multistate vectors of length n = 128, using a resolution factor K = 8.

2) The twelve multistate vectors were converted into complex-valued vectors u^1, . . . , u^12 ∈ S^128 using (9).

3) We also randomly generated a multistate vector and converted it into a complex-valued vector s = [s_1, . . . , s_n]^T ∈ S^128.
4) For δ ∈ {0, 0.1, . . . , 0.9, 1}, we defined the input vector z(0) = [z_1(0), . . . , z_n(0)]^T ∈ S^n as

z_j(0) = ( (1−δ) u_j^1 + δ s_j ) / | (1−δ) u_j^1 + δ s_j |,   ∀ j = 1, . . . , n,   (26)

and determined the vector z(1) obtained after one iteration of a certain CV-RCNN (a sketch of this blending step is given right after this list).
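The input vector (26) blends the stored vector u^1 with a random vector s and renormalizes each component; a minimal sketch assuming NumPy is shown below, where blend_input is an illustrative name.

```python
import numpy as np

def blend_input(u1, s, delta):
    """Noisy input (26): component-wise mixture of u^1 and s, projected onto the unit circle."""
    z0 = (1 - delta) * u1 + delta * s
    # For randomly generated s the mixture is nonzero with probability one;
    # a zero component would require a separate convention.
    return z0 / np.abs(z0)
```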

The outcome of this experiment is visualized in Fig. 3. Precisely, for each value of δ, we defined z(0) by means of (26) and determined the vector z(1) by evaluating (22). The input error E_in(δ) and the output error E_out(δ) are given by the averages of ‖u^1 − z(0)‖_2 and ‖u^1 − z(1)‖_2 over the 1000 simulations for each value of δ. Fig. 3 shows the parametrized curves (E_in(δ), E_out(δ)) produced by the CV-RCNNs. Note that the inequality E_out < E_in holds true if the curve is below the identity line. Therefore, a network exhibits some error correction capability if its parametrized curve in Fig. 3 lies below the dotted line. Also, observe that E_in(0) = 0 implies E_out(0) = 0 if the network implemented a perfect recall of u^1. Conversely, the CV-RCNN failed to retrieve the fundamental vector u^1 in at least one trial if E_out(0) > 0.

The plots in Fig. 3 allow us to make the following general observations regarding the four CV-RCNNs:
• The high-order CV-RCNN, as well as the exponential CV-RCNN, failed to perfectly recall a fundamental vector for small values of the parameters q and α, i.e., for q, α ≤ 5.
• The error correction capability of the high-order and exponential CV-RCNNs has a similar dependence on the parameters q and α.
• The vector recalled by the single-step potential-function CV-RCNN is always very similar to the desired fundamental vector u^1.
• The correlation CV-RCNN failed to retrieve the fundamental vector u^1 in many trials.
• The high-order, potential-function, and exponential CV-RCNNs, besides apparently giving perfect recall of undistorted vectors, exhibited a similar error correction capability for large values of their parameters, i.e., for q, L, α ≥ 10.

In conclusion, except for the correlation CV-RCNN, we believe that the other three CV-RCNNs will exhibit a satisfactory storage capacity and error correction capability upon fine tuning of the parameters L, q, and α. In view of this remark, and considering the theoretical results available in the literature concerning the bipolar exponential RCNN, we shall focus on the exponential CV-RCNN in the following subsections.

Fig. 3. Comparison of the one-step error correction capability of the four CV-RCNNs: a) high-order CV-RCNN (q = 2, 3, 5, 10, 20, 30), b) potential-function CV-RCNN (L = 1, 3, 5, 10, 20, 30), c) exponential CV-RCNN (α = 1, 3, 5, 10, 20, 30), and d) comparison between the four CV-RCNNs with the parameters q, L, and α all equal to 10. Each panel plots the output error versus the input error. (Original plots omitted.)

B. A Brief Discussion on the Effect of the Activation Function

Let us perform an experiment similar to the one described in Section III-D to compare the effect of the three activation functions σ, csgn, and csgm in a CV-RCNN. Precisely, we considered the exponential CV-RCNN with α = 10. For p = 12, the three variations of the exponential CV-RCNN always recalled one of the fundamental vectors. Hence, for a better comparison, we increased considerably the number of fundamental vectors from 12 to 4096.

In analogy to Fig. 2, Fig. 4 shows the probability of successful recall of the exponential CV-RCNN equipped with the activation functions σ, csgn, and csgm versus the parameter ε of the complex-sigmoid activation function. Again, the activation functions csgn and σ yielded constant lines because they do not depend on ε.

Note that the σ-based model outperformed the other two variations of the exponential CV-RCNN. Also, as with the CV-DNN described by (11), the recall probability of the csgm-based exponential CV-RCNN visually approaches the recall probability of the csgn-based model as ε → 0. The larger recall probabilities of the former are attained for ε ∈ [0.16, 0.88]. In this interval, the csgm-based model yielded a recall probability similar to that of the σ-based exponential CV-RCNN.

C. Some Remarks on the Recall Phase and Storage Capacity

In the computational experiment performed in the previous subsection, the exponential CV-RCNN, equipped with the continuous-valued activation function σ, exhibited a high probability of recalling one of the fundamental vectors upon presentation of a randomly generated input vector. In order to obtain some insight into the nature of the recalled vector, we slightly adapted this experiment to count the number of times the network recalls the fundamental vector that is closest to the input vector with respect to the Euclidean distance. A visual interpretation of this experiment for the parameters α = 10, α = 20, and α = 30 can be appreciated in Fig. 5 for the cases in which the number of stored items is p = 12 or p = 4096. Visually, the probability of recalling the fundamental vector that is the most similar to the input vector, in the Euclidean distance sense, increases with the parameter α. Nevertheless, the fundamental vector recalled by the exponential CV-RCNN may be related to the input vector by means of a different similarity measure for small values of α.

Fig. 4. Probability of successful recall of the exponential CV-RCNN with the activation functions csgn, σ, and csgm by the parameter ε of the latter. (Plot of recall probability versus ε; original figure omitted.)

Fig. 5. Probability of recalling the fundamental vector that is closest to the input vector in the Euclidean distance sense, shown for p = 12 and p = 4096 stored items and α = 10, 20, 30. (Original plot omitted.)

Let us now turn our attention to the storage capacity of the exponential CV-RCNN. For each length n ∈ {2, 3, . . . , 20}, the following steps were performed 1000 times to estimate the largest number p of fundamental vectors for which the network implements perfect recall:

1) Initialize with an empty set U = ∅ of fundamental vectors and let p = 0.
2) While the associative memory implements perfect recall and p ≤ p_max, do:
   a) Increment p, i.e., p ← p + 1.
   b) Generate a uniformly distributed multistate vector of length n using a resolution factor K = 8.
   c) Convert the multistate vector into a complex-valued vector u^p using (9).
   d) Add the complex-valued vector to the fundamental vector set, i.e., U ← U ∪ {u^p}.
3) The storage capacity of the memory is estimated as p − 1.

Due to computational limitations, we limited the number of fundamental vectors to p ≤ p_max = 1500.
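The procedure above can be sketched as follows; this reuses the cv_rcnn_step function from Section IV, assumes NumPy, and adopts the similarity criterion (21) as the perfect-recall test, since the exact test is not spelled out in the text.

```python
import numpy as np

def estimate_capacity(n, f, K=8, p_max=1500, seed=None):
    """Estimate the storage capacity of a CV-RCNN with weighting function f."""
    rng = np.random.default_rng(seed)
    U = np.empty((n, 0), dtype=complex)
    p = 0
    while p < p_max:
        # Steps a)-d): add the phasor of a random K-state vector to the memory.
        u_new = np.exp(1j * 2 * np.pi * rng.integers(0, K, size=n) / K)
        U = np.column_stack([U, u_new])
        p += 1
        # Perfect recall check: each stored vector, presented as input, must be
        # recalled up to the 0.99 n similarity threshold of (21) after one update.
        for xi in range(p):
            y = cv_rcnn_step(U[:, xi], U, f)
            if np.abs(np.vdot(y, U[:, xi])) <= 0.99 * n:
                return p - 1              # step 3): capacity estimate
    return p_max
```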

The semi-log plot in Fig. 6 displays the average of the estimated storage capacity of the exponential CV-RCNN for the parameters α = 3, 4, 5, 6, and 7. We also included in this semi-log plot the straight lines corresponding to exponentials of the form Ac^n obtained by ordinary least squares. The coefficient A and the base c for each value of the parameter α can be found in Table I. As with the bipolar ECNN, the storage capacity of the exponential CV-RCNN visually scales exponentially with the length of the stored vectors.

Fig. 6. Estimation of the storage capacity of the exponential CV-RCNN. (Semi-log plot of the average estimated storage capacity versus n for α = 3, 4, 5, 6, and 7; original plot omitted.)

TABLE I. Coefficient and base of the exponentials Ac^n depicted as straight lines in Fig. 6.
α:  3.00  4.00  5.00  6.00  7.00
A:  3.43  4.51  5.39  4.18  3.50
c:  1.05  1.12  1.22  1.39  1.58

VI. CONCLUDING REMARKS

In this paper, we generalized the bipolar recurrent correlation neural networks (RCNNs) of Chiueh and Goodman to the complex domain. Before presenting the novel models, called complex-valued RCNNs (CV-RCNNs), we briefly reviewed the original RCNNs as well as some complex-valued dynamic associative memories. Particular attention was given to the complex-valued activation functions csgn, csgm, and σ in Section III.

Like the original bipolar RCNNs, the CV-RCNNs are implemented by the fully connected two-layer recurrent neural network depicted in Fig. 1b). The first layer evaluates a continuous non-decreasing function f at the real part of the inner product of the input vector and each fundamental vector. The second layer projects the components of a weighted sum of the fundamental vectors onto the complex unit circle by means of the continuous-valued activation function σ. Furthermore, we pointed out that the sequence of states produced by a CV-RCNN always converges to a stationary state in both synchronous and asynchronous update modes. Therefore, CV-RCNNs are able to realize associative memories.

Preliminary computational experiments revealed that the storage capacity of the exponential CV-RCNN scales exponentially with the length of the stored vectors. Also, given that this CV-RCNN implements perfect recall, the probability of recalling the fundamental vector which is the most similar to the input vector in the Euclidean distance sense increases with the value of the parameter α.

REFERENCES

[1] S. Haykin, Neural Networks and Learning Machines, 3rd ed. Upper Saddle River, NJ: Prentice-Hall, 2009.
[2] K. Steinbuch, "Die Lernmatrix," Kybernetik, vol. 1, pp. 36–45, 1961.
[3] R. Hecht-Nielsen, Neurocomputing. Reading, MA: Addison-Wesley, 1989.
[4] M. H. Hassoun, Fundamentals of Artificial Neural Networks. Cambridge, MA: MIT Press, 1995.
[5] J. A. Anderson, A. Pellionisz, and E. Rosenfeld, Eds., Neurocomputing: Directions for Research. Cambridge, MA: MIT Press, 1990, vol. 2.
[6] T. Kohonen, Self-Organization and Associative Memory, 2nd ed. New York, NY, USA: Springer-Verlag, 1987.
[7] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proceedings of the National Academy of Sciences, vol. 79, pp. 2554–2558, Apr. 1982.
[8] M. H. Hassoun, Ed., Associative Neural Memories: Theory and Implementation. Oxford, U.K.: Oxford University Press, 1993.
[9] J. Hopfield and D. Tank, "Neural computation of decisions in optimization problems," Biological Cybernetics, vol. 52, pp. 141–152, 1985.
[10] K. Smith, M. Palaniswami, and M. Krishnamoorthy, "Neural techniques for combinatorial optimization with applications," IEEE Transactions on Neural Networks, vol. 9, no. 6, pp. 1301–1318, Nov. 1998.
[11] Q. Liu and J. Wang, "A one-layer recurrent neural network for constrained nonsmooth optimization," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 41, no. 5, pp. 1323–1333, Oct. 2011.
[12] R. J. McEliece, E. C. Posner, E. R. Rodemich, and S. Venkatesh, "The capacity of the Hopfield associative memory," IEEE Transactions on Information Theory, vol. 1, pp. 33–45, 1987.
[13] I. Kanter and H. Sompolinsky, "Associative recall of memory without errors," Physical Review, vol. 35, pp. 380–392, 1987.
[14] M. Morita, "Associative memory with nonmonotone dynamics," Neural Networks, vol. 6, no. 1, pp. 115–126, 1993.
[15] T. Chiueh and R. Goodman, "Recurrent correlation associative memories," IEEE Transactions on Neural Networks, vol. 2, pp. 275–284, Feb. 1991.
[16] ——, Recurrent Correlation Associative Memories and their VLSI Implementation. Oxford, U.K.: Oxford University Press, 1993, ch. 16, pp. 276–287.
[17] A. Dembo and O. Zeitouni, "General potential surfaces and neural networks," Physical Review A, vol. 37, no. 6, pp. 2134–2143, 1988.
[18] R. C. Wilson and E. R. Hancock, "Storage capacity of the exponential correlation associative memory," Neural Processing Letters, vol. 13, no. 1, pp. 71–80, Feb. 2001.
[19] ——, "A study of pattern recovery in recurrent correlation associative memories," IEEE Transactions on Neural Networks, vol. 14, no. 3, pp. 506–519, May 2003.
[20] E. R. Hancock and M. Pelillo, "A Bayesian interpretation for the exponential correlation associative memory," Pattern Recognition Letters, vol. 19, no. 2, pp. 149–159, Feb. 1998.
[21] R. Perfetti and E. Ricci, "Recurrent correlation associative memories: A feature space perspective," IEEE Transactions on Neural Networks, vol. 19, no. 2, pp. 333–345, Feb. 2008.
[22] C. García and J. A. Moreno, "The Hopfield associative memory network: Improving performance with the kernel 'trick'," in Advances in Artificial Intelligence – IBERAMIA 2004, ser. Lecture Notes in Artificial Intelligence, vol. 3315. Springer-Verlag, 2004, pp. 871–880.
[23] C. Oh and S. H. Zak, "Image recall using a large scale generalized Brain-State-in-a-Box neural network," International Journal of Applied Mathematics and Computer Science, vol. 15, no. 1, pp. 99–114, Mar. 2005.
[24] J. Zurada, I. Cloete, and E. van der Poel, "Generalized Hopfield networks for associative memories with multi-valued stable states," Neurocomputing, vol. 13, pp. 135–149, 1996.
[25] M. Muezzinoglu, C. Guzelis, and J. Zurada, "A new design method for the complex-valued multistate Hopfield associative memory," IEEE Transactions on Neural Networks, vol. 14, no. 4, pp. 891–899, Jul. 2003.
[26] G. Tanaka and K. Aihara, "Complex-valued multistate associative memory with nonlinear multilevel functions for gray-level image reconstruction," IEEE Transactions on Neural Networks, vol. 20, no. 9, pp. 1463–1473, Sep. 2009.
[27] N. N. Aizenberg, Y. L. Ivaskiv, and D. A. Pospelov, "A certain generalization of threshold functions," Doklady Akademii Nauk SSSR, vol. 196, pp. 1287–1290, 1971.
[28] S. Jankowski, A. Lozowski, and J. Zurada, "Complex-valued multistate neural associative memory," IEEE Transactions on Neural Networks, vol. 7, pp. 1491–1496, 1996.
[29] D.-L. Lee, "Improvements of complex-valued Hopfield associative memory by using generalized projection rules," IEEE Transactions on Neural Networks, vol. 17, no. 5, pp. 1341–1347, Sep. 2006.
[30] D. Psaltis and C. H. Park, "Nonlinear discriminant functions and associative memories," Physica D, vol. 22, pp. 370–375, 1986.
[31] I. Aizenberg, C. Moraga, and D. Paliy, "A feedforward neural network based on multi-valued neurons," in Computational Intelligence, Theory and Applications, ser. Advances in Soft Computing, B. Reusch, Ed. Springer Berlin Heidelberg, 2005, vol. 33, pp. 599–612.
