Stephen J. Verzi et al - Universal Approximation With Fuzzy ART and Fuzzy ARTMAP


Universal Approximation With Fuzzy ART and Fuzzy ARTMAP

Stephen J. Verzi
Computer Science Department, University of New Mexico, Albuquerque, NM 87131, USA
verzi@eece.unm.edu

and

Gregory L. Heileman
Department of Electrical & Computer Engineering, University of New Mexico, Albuquerque, NM 87131, USA
heileman@eece.unm.edu

and

Michael Georgiopoulos
School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816, USA
michaelg@mail.ucf.edu

and

Georgios C. Anagnostopoulos
Department of Electrical & Computer Engineering, Florida Institute of Technology, Melbourne, FL 32901, USA
georgio@fit.edu

Abstract - A measure of success for any learning algorithm is how useful it is in a variety of learning situations. Those learning algorithms that support universal function approximation can theoretically be applied to a very large and interesting class of learning problems. Many kinds of neural network architectures have already been shown to support universal approximation. In this paper, we will provide a proof to show that Fuzzy ART augmented with a single layer of perceptrons is a universal approximator. Moreover, the Fuzzy ARTMAP neural network architecture, by itself, will be shown to be a universal approximator.

Keywords: Adaptive Resonance Theory, Machine Learning, Neural Networks, Universal Function Approximation.

I. INTRODUCTION

In the late 1980's and early 1990's, important theoretical results were proved that showed certain classes of learning algorithms capable of universal function approximation. Early on it was shown that combinations of sigmoid functions could be used to support universal function approximation [1]. This result was important since a standard neural network perceptron computes a sigmoid function. Then multi-layered feedforward neural networks with either sigmoid or Gaussian kernel functions were shown to be universal approximators [2], [3], [4]. Also, radial basis function neural networks were proved to be capable of universal function approximation [5]. Very recently a hybrid ART-based neural network has been shown to be a universal approximator [6].

In this paper we will show that Fuzzy ART with only an extra layer of perceptrons can support universal approximation. More importantly, the Fuzzy ARTMAP neural network by itself can perform universal approximation. The result showing Fuzzy ART to be a universal approximator is an important fact in establishing the utility of ART-based neural network architectures as viable learning techniques. A learning algorithm which is known to be a universal approximator can be applied to a large class of interesting problems with the confidence that a solution is at least theoretically available.

Before presenting our main results, we will describe the Fuzzy ART and Fuzzy ARTMAP neural network architectures. Then we will present a proof showing how a Fuzzy ART neural network extended with a layer of perceptrons can be used to support universal function approximation. Next we will show how Fuzzy ARTMAP by itself can support this same capability. Finally, we will discuss the utility of Fuzzy ART, the curse of dimensionality and practical learning algorithms.

II. FUZZY ART AND FUZZY ARTMAP

Fuzzy ARTMAP is a neural network architecture designed to learn a mapping between example instances and their associated labels [7]. These training examples are denoted (x, y), where x ∈ [0, 1]^m is an example data instance, and y ∈ [0, ∞)^d is its corresponding d-dimensional label. Fuzzy ARTMAP is composed of two Fuzzy ART neural network modules connected through a MAP field, as shown in Fig. 1.

During training, the pair (x, y) is preprocessed to form the pair ((x, x^c), (y, y^c)), which is then presented to the neural network. The instance x is presented to the A-side Fuzzy ART module (ART_A) and the label y is presented to the B-side Fuzzy ART module (ART_B) in Fig. 1. Fuzzy ARTMAP performs supervised learning by enforcing that the A-side F2 node which learns x will only be associated with a single B-side F2 node which learns y.

0-7803-7898-9/03/$17.00 ©2003 IEEE

    Fig. 1. The Fuzzy ARTMAP Architecture.

A. Fuzzy ART

The Fuzzy ART neural network architecture was designed to conduct unsupervised learning, or clustering, on real-valued data [8]. Clustering is the process of grouping similar data points into cluster groups, clusters or categories. The measure of similarity among data points for Fuzzy ART will be specified in detail below. In this section, we will describe the structure of the Fuzzy ART neural network architecture, followed by a detailed look at its clustering algorithm.

Fuzzy ART Input Data. The Fuzzy ART neural network architecture, with complement coding, assumes that the input data used to train it is normalized to fit within the unit hypercube. Thus, an input data point, x, is an m-dimensional vector of values, each of which lies between zero and one, inclusive (i.e., x_i ∈ [0, 1], i = 1, 2, ..., m). The dimension, m, is constant for a particular learning problem. The complement of an input vector, x, is now well defined as x^c_i = 1 − x_i, i = 1, 2, ..., m.
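As a concrete illustration (a minimal Python sketch of our own, not from the paper), complement coding doubles the input dimension and fixes the l1-norm of every coded input at m, a fact the vigilance computation later relies on:

```python
import numpy as np

def complement_code(x):
    """Map x in [0,1]^m to the 2m-dimensional coded vector I = (x, x^c)."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([x, 1.0 - x])

I = complement_code([0.2, 0.7])
# sum(x_i) + sum(1 - x_i) = m, so |I| (the l1-norm) is exactly m = 2 here.
```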

Fuzzy ART Structure. Fuzzy ART is structured into three layers of interacting nodes, labeled F0, F1 and F2, where the output of F0 is connected to F1, and F1 and F2 are mutually interconnected, as shown in Fig. 1. At F0, an m-length input vector from the environment is complement coded and passed on to F1. The process of complement coding a pattern vector, x, produces a new vector I = (x, x^c), where x^c is the complement of x defined previously.

There are 2m nodes in layer F1, and N ≥ 1 nodes in layer F2. Each node in the F2 layer is fully connected, by a weighted link, w_j in Fig. 1, to each node in the F1 layer. (Actually, there are bottom-up and top-down weights connecting the nodes in the F1 layer to the nodes in the F2 layer. In this paper, the top-down weights representing the cluster template are the only weights of interest, and so these weights will be referred to as w_j.) The number of nodes in the F2 layer is allowed to grow as necessary during learning. An F2 layer node that has learned at least one data point is called committed. A Fuzzy ART neural network module always has one uncommitted node in the F2 layer available for training, along with N − 1 committed nodes. When the uncommitted node learns its first data point, a new uncommitted node then becomes available, and N is increased by one. Each committed F2 node and its associated weights, w_j, represents a separate category of input data, also called a category template.

The output vector, y, from a Fuzzy ART network consists of boolean values signifying those F2 nodes which are active. Thus

y_j = 1, if F2 node j is active; y_j = 0, otherwise, (1)

where 1 ≤ j ≤ N. Note that the uncommitted node, N, is available in (1). The operation of Fuzzy ART ensures that only a single F2 node is active for a given pattern. We will see this in more detail in the algorithmic description presented next.

Fuzzy ART Algorithm. The Fuzzy ART algorithm described here is a combination of work presented by Carpenter and Moore; however, the symbols used will reflect those used throughout the rest of this paper for consistency [8], [9]. For a given input data point, the Fuzzy ART learning algorithm has three stages. First, the input is complement coded. Then the best matching F2 node is found for the complement coded input data. Note that the F2 node found might be the uncommitted node, and initially, a Fuzzy ART neural network architecture has only the single uncommitted node available for learning. Finally, the best matching F2 node found is allowed to learn the new data point. Given a complement coded input vector, I, the similarity measure at node j of the F2 layer, called T_j(I), is computed as a weighted sum of I and the weights w_j, shown in (2). Note that these weights connect the F1 layer nodes to node j in the F2 layer.

The mathematical formulas used by Fuzzy ART to find the best matching category template during cluster formation are

T_j(I) = |I ∧ w_j| / (α + |w_j|), (2)

J = arg max_{0 ≤ j ≤ N} T_j(I). (3)

The parameter α, called the choice parameter, is usually a small positive quantity, ∧ is the element-wise vector min operator, and |·| is the l1-norm of a vector. The best matching F2 node from the choice competition, J, must satisfy the vigilance criterion

|I ∧ w_J| / |I| ≥ ρ. (4)

The vigilance parameter, ρ, in (4) is a user-supplied input between zero and one. Note that at least one F2 node, the uncommitted node, will always satisfy the vigilance criterion. The maximum choice F2 template node satisfying the vigilance criterion is allowed to learn the input vector, a condition called resonance. Ties between F2 nodes with the same choice value are broken by assigning an index to all F2 nodes, and choosing the node with the lowest index value in a tie. The index values are assigned when F2 nodes are committed. Initially all template weights w_j are set to one, and learning proceeds as follows:

w_J^(new) = β(I ∧ w_J^(old)) + (1 − β) w_J^(old), (5)

where β is the learning parameter. In this paper we use β = 1, which is a special case called fast learning. Note that learning only occurs at the winning F2 node, J, during resonance. An important feature of Fuzzy ART is that the F2 layer grows as needed for a particular problem.

Fuzzy ART F2 Node Category Template. The Fuzzy ART neural network module accepts a vector of values as input, but it also produces a vector of values as output. A committed Fuzzy ART F2 node j has a weight vector defined as w_j = x_1 ∧ x_2 ∧ ... ∧ x_n, where F2 node j has learned all of the input data points in X = {x_1, x_2, ..., x_n}. Because of complement coding, w_j defines the minimum hyperbox containing the data points in X. The vigilance criterion ensures that |w_j| ≥ ρm.

Thus, w_j = (p, q^c) ∈ [0, 1]^{2m}, where p_k = min_{i ∈ {1,2,...,n}} x_{ik} and q_k = max_{i ∈ {1,2,...,n}} x_{ik}. The axis-parallel hyper-rectangle for w_j has a minimum point at p and a maximum point at q. The first m points from w_j are the "lower left" corner, and the second m points are the complement of the "upper right" corner of the hyperbox defined by the F2 node j. The vigilance parameter, ρ, can be used to control the granularity of clusters covering the problem space. A larger ρ value will force Fuzzy ART to create smaller clusters, necessitating more clusters to cover a larger problem space. A smaller ρ value will allow Fuzzy ART to create larger clusters, meaning fewer clusters are needed to cover a problem space.

B. Fuzzy ARTMAP

The Fuzzy ARTMAP architecture shown in Fig. 1 consists of two Fuzzy ART modules connected by a MAP field. The ART_A module is given pattern data and the ART_B module is given label data for a given supervised learning task. The MAP field links data cluster templates (A-side) with label cluster templates (B-side). Supervised learning is performed in Fuzzy ARTMAP by ensuring that each ART_A template is linked with only one ART_B template. Thus, a many-to-one association from pattern to label templates is formed in the Fuzzy ARTMAP MAP field.
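Both modules inside Fuzzy ARTMAP run the Fuzzy ART clustering loop of Section II-A. As a concrete illustration (our own minimal Python sketch, not the authors' code), one presentation with fast learning, together with the hyperbox read-out of a committed template, looks like this:

```python
import numpy as np

def fuzzy_art_step(I, weights, rho, alpha=0.001):
    """One presentation of a complement-coded input I, fast learning (beta = 1).

    `weights` is a list of 2m-dimensional templates whose last entry is the
    all-ones uncommitted node. Returns the index of the resonating node.
    """
    # Choice function T_j = |I ^ w_j| / (alpha + |w_j|), eq. (2); the stable
    # sort breaks choice ties in favour of the lowest node index.
    T = np.array([np.minimum(I, w).sum() / (alpha + w.sum()) for w in weights])
    for j in np.argsort(-T, kind="stable"):
        # Vigilance criterion, eq. (4): |I ^ w_j| / |I| >= rho.
        if np.minimum(I, weights[j]).sum() / I.sum() >= rho:
            if j == len(weights) - 1:               # uncommitted node resonated:
                weights.append(np.ones_like(I))     # keep one uncommitted node
            weights[j] = np.minimum(I, weights[j])  # fast learning, eq. (5)
            return j
    # Unreachable: the uncommitted node always satisfies vigilance.

def template_box(w_j, m):
    """Recover the hyperbox corners p (min) and q (max) from w_j = (p, q^c)."""
    return w_j[:m], 1.0 - w_j[m:]
```

Presenting x = (0.2, 0.7) and then x = (0.3, 0.6) to a fresh network with ρ = 0.5 commits a single template whose corners, via `template_box`, are p = (0.2, 0.6) and q = (0.3, 0.7), the minimum box containing both points.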


The Fuzzy ARTMAP MAP field weights, w^{ab}, are used to control associations between A-side F2 nodes and B-side F2 nodes. An uncommitted A-side F2 node, j, has the following initial weight values

w_{jk}^{ab} = 1, ∀k, 0 ≤ k ≤ N_B, (7)

meaning that j is not currently associated with any B-side F2 node (there are N_B B-side F2 nodes), and in fact it is available for future learning. An uncommitted A-side F2 node j becomes committed to B-side F2 node K through the following weight assignments

w_{jK}^{ab} = 1, and w_{jk}^{ab} = 0, ∀k ≠ K, (8)

thus A-side F2 node, j, is exclusively and permanently linked with B-side F2 node, K.

The Fuzzy ARTMAP architecture ensures the many-to-one mapping through the use of a match tracking lateral reset, as shown in Fig. 1. The lateral reset is used in Fuzzy ARTMAP to ensure that each training pattern resonates with an A-side F2 node associated with a B-side F2 node that is consistent with the pattern's label. After a bounded number of epochs, Fuzzy ARTMAP is guaranteed to reach a steady state [10]. Note that during testing it is possible for a test pattern, never seen before, to choose the uncommitted node. In this case no B-side label prediction is possible.
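The match-tracking search can be sketched as follows. This is a much-simplified illustration of our own, not the full architecture: the B-side module is collapsed to one discrete label per A-side template, and it assumes no two identical patterns carry different labels (so the search always terminates):

```python
import numpy as np

def artmap_present(x, label, weights, links, rho_base=0.0, alpha=0.001):
    """Present one (pattern, label) pair to a simplified Fuzzy ARTMAP.

    `weights` holds A-side templates (last entry uncommitted); `links` maps a
    committed A-side node to its single label (the many-to-one MAP-field
    association).
    """
    I = np.concatenate([x, 1.0 - x])     # complement coding
    rho = rho_base
    while True:                          # match-tracking search loop
        T = [np.minimum(I, w).sum() / (alpha + w.sum()) for w in weights]
        for j in np.argsort(np.negative(T), kind="stable"):
            match = np.minimum(I, weights[j]).sum() / I.sum()
            if match < rho:
                continue                 # fails vigilance: next candidate
            if j in links and links[j] != label:
                rho = match + 1e-9       # lateral reset: raise vigilance just
                break                    # past the failed match and re-search
            if j == len(weights) - 1:    # uncommitted node resonates:
                weights.append(np.ones_like(I))
            weights[j] = np.minimum(I, weights[j])   # fast learning
            links[j] = label
            return j

def artmap_predict(x, weights, links, alpha=0.001):
    """Label of the best committed A-side node, or None if none exists."""
    I = np.concatenate([x, 1.0 - x])
    T = [np.minimum(I, w).sum() / (alpha + w.sum()) for w in weights]
    for j in np.argsort(np.negative(T), kind="stable"):
        if j in links:
            return links[j]
```

Training on the 1-D points 0.1 and 0.15 with label 0 and 0.9 with label 1 commits two A-side templates, each linked to exactly one label, and nearby test points recover those labels.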

III. MAIN RESULTS

In this section, we will present a proof showing the universal approximation capabilities of the Fuzzy ART neural network. Actually, these results will apply to a modified Fuzzy ART network. It will also be shown that the Fuzzy ARTMAP network can be used without further modification to perform universal approximation. In order to show that the Fuzzy ART F2 node is capable of universal approximation, it will be necessary to show that the Fuzzy ART neural network architecture is capable of computing any member of a sequence of functions, S = ∪_n s_n(x), which are dense in L^p(ℝ^m). Actually, Fuzzy ART with complement coding operates in the unit hypercube, thus the space of interest is L^p([0,1]^m), where [0,1]^m ⊂ ℝ^m. Saying that s_n(x) is dense in L^p([0,1]^m), where 1 ≤ p < ∞, is equivalent to saying that for every f ∈ L^p([0,1]^m) and every ε > 0, there exists Q ∈ S such that ||f − Q||_p ≤ ε. If we have S ⊂ L^p([0,1]^m) and our modified Fuzzy ART architecture can be shown to compute any member of S, then we will have shown that this neural network is capable of universal function approximation in L^p([0,1]^m).

Fuzzy ART F2 nodes conduct data clustering similar to the internal layer nodes of a radial basis function (RBF) neural network [5]. By itself, then, the Fuzzy ART module cannot be expected to perform function approximation, but by adding an output layer of perceptron nodes, this approximation can be achieved. The internal layer of an

Fig. 2. Modified Fuzzy ART for Universal Approximation.

RBF neural network is connected to an output layer of nodes to perform approximation in a similar manner; see Figure 1 in [5]. In Fig. 2, the Fuzzy ART neural network module is connected to an output layer perceptron. Universal approximation for Fuzzy ART will apply to an m-dimensional input space, but for simplicity in this paper, only a single input dimension will be considered. Also, the approximation output will, in general, fit multi-dimensional output, but for simplicity, only a single output dimension will be considered, and thus, only a single output layer perceptron is used, as shown in Fig. 2.

There are several steps necessary to prove the universal approximation capabilities of the Fuzzy ART F2 node. Given a measurable function f ≥ 0, the first task will be to determine a sequence of functions which will be used to approach f from below. This sequence of functions will rely upon partitioning of the domain of f into disjoint sets. Next, the Fuzzy ART F2 node will be shown to be capable of computing the indicator function for an arbitrary member of these disjoint sets. It is this indicator function that will actually be used in the sequence of functions that we are interested in. Finally, these results will be pulled together with the construction and specification of the modified Fuzzy ART neural network architecture, shown in Fig. 2, for computing the sequence of functions for approximating f, and this sequence of functions will be shown to be dense in L^p([0,1]^m).

Given a measurable function f ∈ L^p([0,1]^m), f ≥ 0, and ε > 0, consider the following sequence of functions

s_n(x) = Σ_j c_{n,j} · χ_{D_{n,j}}(x), for n = 1, 2, 3, ..., (10)

where the coefficients c_{n,j} are determined using f and the dyadic sets D_{n,j} ⊂ [0,1]. Dyadic sets, also called dyadic intervals, boxes or cubes, are members of Ω = Ω_1 ∪ Ω_2 ∪ Ω_3 ∪ ..., where Ω_n is defined as the collection of all 2^{-n} boxes with corners at P_n, and P_n is the set of all x ∈ ℝ^m whose coordinates are integer multiples of 2^{-n} [11]. Thus, D_{n,j} ∈ Ω_n. Ω's density in ℝ^m and its capacity to cover (measurable) sets is exploited by the sequence of functions defined in (10) [11], [12]. Later it will be shown that s_n(x) → f(x) for all x except for a set of measure less than ε. Note that the s_n(x) are simple functions if the coefficients take on a finite number of values, and this will be shown below. Thus s_n(x) ∈ L^p([0,1]^m).
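For intuition, the approach-from-below behavior of (10) can be checked numerically. In this sketch of our own (not the paper's construction), the coefficient c_{n,j} is taken as an estimated infimum of f over each dyadic cell, one simple choice consistent with approximating f from below:

```python
import numpy as np

def s_n(x, f, n):
    """Value at x of the dyadic simple function s_n approximating f from below.

    Illustrates eq. (10) for m = 1: x falls in one dyadic interval
    D_{n,j} = [(j-1)/2^n, j/2^n), and the coefficient c_{n,j} is taken as the
    infimum of f over that cell (estimated here on a 65-point grid).
    """
    j = min(int(x * 2**n), 2**n - 1) + 1                 # cell index containing x
    lo, hi = (j - 1) / 2**n, j / 2**n
    return float(np.min(f(np.linspace(lo, hi, 65))))     # c_{n,j}

f = lambda t: np.asarray(t) ** 2
xs = np.linspace(0.0, 1.0, 101)
errs = [max(abs(float(f(x)) - s_n(x, f, n)) for x in xs) for n in (2, 4, 6)]
# the sup-norm gap shrinks roughly like 2^{-n} as the dyadic partition refines
```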

Next, it will be shown that the modified Fuzzy ART neural network architecture in Fig. 2 can be configured such that it computes s_n(x) in (10). The first step is to show that F2 nodes in Fig. 2 can compute the indicator functions of the D_{n,j} from (10) (with proper network quantities, including weights, indexes and vigilance values).

Lemma 1: The Fuzzy ART F2 node can compute the indicator function for D_{n,j} ∈ Ω_n.

Proof. Given D_{n,j}, the network quantities for a Fuzzy ART F2 node will be determined so that it computes χ_{D_{n,j}}(x). A Fuzzy ART F2 node has three quantities that need to be determined: the weights w_j^A, the vigilance ρ, and the node index V_j. The minimum point in D_{n,j} is (j−1)/2^n, and the minimum point not in D_{n,j} is (j−1)/2^n + 2^{-n} = j/2^n. Therefore, the weights become

w_j^A = ((j−1)/2^n, 1 − j/2^n).

The function computed by the Fuzzy ART F2 node j is

y_j(x) = 1 if j = arg max_{0 ≤ i ≤ N} T_i(I(x)); 0 otherwise, (11)

where I(x) = (x, x^c) is the complement-coded value of x and α is a small positive number. Note that because of complement coding |I(x)| = 1 for all x (in general this will be m, but here m = 1). The vigilance parameter will have the following value

ρ = 1 − 2^{-n}. (12)

Thus, y_j(x) will only be 1 if x ∈ [(j−1)/2^n, j/2^n]. And so the Fuzzy ART F2 node j computes the indicator function for the closed interval [(j−1)/2^n, j/2^n], and F2 node j+1 will compute the indicator function for [j/2^n, (j+1)/2^n]. Note that these two overlap at j/2^n. In Fuzzy ART, ties between competing F2 nodes are broken by assigning an index value to the competing F2 nodes and choosing the F2 node with the lowest index. The index values, V_j, for the modified Fuzzy ART F2 nodes become V_j = 2^n − j + 1. Thus, y_j(x) = χ_{[(j−1)/2^n, j/2^n)}(x) for 1 ≤ j < 2^n, and y_{2^n}(I(x)) = χ_{[(2^n−1)/2^n, 1]}(x). Therefore, the Fuzzy ART F2 nodes constructed as described above compute the indicator functions of the disjoint intervals D_{n,j} for 1 ≤ j ≤ 2^n.
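The Lemma 1 construction can be checked numerically. In this illustrative sketch of our own, the index rule V_j = 2^n − j + 1 is implemented as "the largest tied j wins":

```python
import numpy as np

def winning_node(x, n, alpha=1e-6):
    """Index j of the F2 node active for scalar x under the Lemma 1 setup.

    Node j (1 <= j <= 2^n) stores w_j = ((j-1)/2^n, 1 - j/2^n), i.e. the
    dyadic box [(j-1)/2^n, j/2^n]; choice ties at shared endpoints are won
    by the node with the lowest index value V_j = 2^n - j + 1, which is the
    node with the larger j.
    """
    I = np.array([x, 1.0 - x])                    # complement-coded input
    T = []
    for j in range(1, 2**n + 1):
        w = np.array([(j - 1) / 2**n, 1.0 - j / 2**n])
        T.append(np.minimum(I, w).sum() / (alpha + w.sum()))
    tied = [j + 1 for j, t in enumerate(T) if t >= max(T) - 1e-12]
    return max(tied)                              # lowest V_j wins the tie

# Node j is active exactly on [(j-1)/2^n, j/2^n), closed at 1 for j = 2^n.
```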

defined above, a separate Fuzzy ART network is needed since there is only a single vigilance parameter ρ. Fuzzy ART F2 nodes, in a single network, can represent any D_{n,j} ∈ Ω_n.

A. Universal Approximation With Fuzzy ARTMAP

It is a very simple extension to use Fuzzy ARTMAP, by itself, instead of the modified Fuzzy ART neural network to perform universal approximation. With Fuzzy ARTMAP, we need to construct the A-side Fuzzy ART module, the B-side Fuzzy ART module and the MAP field. The A-side Fuzzy ART module will be constructed precisely as described in the previous section; however, instead of using the extra layer of perceptron nodes with their association weights, we use the Fuzzy ARTMAP MAP field and the B-side category templates. There will be one B-side template for each of the Lebesgue intervals, E_{n,k}, in (14). The value of the B-side template weight will be exactly the same as the coefficients used in the modified Fuzzy ART network, C_{n,j} in (14). Note that there will be no complement coding in the B-side Fuzzy ART module. Next, the MAP field will be used to compute the minimum intersection between the dyadic cube that A-side node J computes and the pre-images of all B-side nodes 1 ≤ k ≤ N_B, A_{n,j} in (14). Therefore, w_{jk}^{ab} = 1 if A_{n,j} = k, otherwise w_{jk}^{ab} = 0. Because each A_{n,j} is unique, the constructed MAP field will conform to Fuzzy ARTMAP learning in that each A-side F2 node is associated with only a single B-side F2 node. Thus, given input x, the value output by this Fuzzy ARTMAP network will be C_{n,j} where x ∈ D_{n,j}.

IV. CONCLUDING REMARKS

An example of an ART-based neural network architecture that can perform the universal approximation described above is BARTMAP-SRM [13]. BARTMAP-SRM operates using the dyadic hyperboxes described previously. The network is initiated using the unit hyperbox, which is subsequently split in half across each dimension, creating 2^m new boxes for an m-dimensional input learning space. Note that BARTMAP-SRM, as well as the network constructions described in our main results above, suffer from the curse of dimensionality [14]. This means that for an m-dimensional input space, an exponentially large number of internal layer F2 nodes may be required to reach a final solution. BARTMAP-SRM was not designed as a practical solution for high-dimension input problems but rather as a theoretical construct for demonstrating universal approximation capabilities. Another modification to Fuzzy ART, called Boosted ART, has been shown to represent the same data space with exponentially fewer F2 nodes, with respect to the input dimension m [15]. It is hoped that Boosted ART can be used in high-dimension input spaces to help address the curse of dimensionality.

In this paper we have shown that Fuzzy ART augmented with a single layer of perceptron nodes can support universal function approximation. Furthermore, we have demonstrated that Fuzzy ARTMAP, by itself, is a universal approximator. These results establish both of these neural network architectures as viable learning techniques on a large class of learning problems. These results continue to suffer from the curse of dimensionality, as do other universal approximation results. Our future research continues to expand upon these results in designing practical ART-based neural network architectures for conducting learning.

ACKNOWLEDGMENT

The authors from the University of Central Florida acknowledge the partial support from NSF through a CRCD grant, number 0203446, entitled "Machine Learning Advances for Engineering Education."

REFERENCES

[1] G. Cybenko, "Approximation by superpositions of a sigmoidal function," Mathematics of Control, Signals, and Systems, vol. 2, pp. 303-314, 1989.
[2] K. Funahashi, "On the approximate realization of continuous mappings by neural networks," Neural Networks, vol. 2, pp. 183-192, 1989.
[3] K. M. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, pp. 359-366, 1989.
[4] E. J. Hartman, J. D. Keeler, and J. M. Kowalski, "Layered neural networks with Gaussian hidden units as universal approximators," Neural Computation, vol. 2, pp. 210-215, 1990.
[5] J. Park and I. W. Sandberg, "Universal approximation using radial-basis-function networks," Neural Computation, vol. 3, pp. 245-257, 1991.
[6] L. Marti, A. Policriti, and L. Garcia, "AppART: An ART hybrid stable learning neural network for universal function approximation," in Hybrid Information Systems, A. Abraham and M. Koeppen, Eds., Heidelberg: Physica-Verlag, Jul. 2002, pp. 92-120.
[7] G. A. Carpenter, S. Grossberg, N. Markuzon, J. H. Reynolds, and D. B. Rosen, "Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps," IEEE Transactions on Neural Networks, vol. 3, no. 5, pp. 698-713, 1992.
[8] G. A. Carpenter, S. Grossberg, and D. B. Rosen, "Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system," Neural Networks, vol. 4, no. 5, pp. 759-771, 1991.
[9] B. Moore, "ART 1 and pattern clustering," in Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, 1988, pp. 174-185.
[10] M. Georgiopoulos, J. Huang, and G. L. Heileman, "Properties of learning in ARTMAP," Neural Networks, vol. 7, no. 3, pp. 495-506, 1994.
[11] W. Rudin, Real and Complex Analysis, 2nd ed., McGraw-Hill, New York, 1974.
[12] E. DiBenedetto, Real Analysis, Birkhäuser, Boston, 2001.
[13] S. J. Verzi, G. L. Heileman, M. Georgiopoulos, and G. C. Anagnostopoulos, "Off-line structural risk minimization and BARTMAP-S," in Proceedings of the International Joint Conference on Neural Networks, 2002.
[14] H. White, Artificial Neural Networks: Approximation and Learning Theory, Blackwell, Cambridge, MA, 1992.
[15] S. J. Verzi, Boosted ART and Boosted ARTMAP: Extensions of Fuzzy ART and Fuzzy ARTMAP, Ph.D. thesis, University of New Mexico, Albuquerque, NM, 2002.
