Fukushima, K.
The Cognitron - First Multilayered Network (1975)
In 1975, inspired by the self-organizing ability of the brain, Kunihiko Fukushima of Japan introduced the Cognitron network as an extension of the original Perceptron. Like the original Perceptron, the Cognitron is a pattern regularity detector: it can learn patterns without a mechanism (a teacher) that indicates whether a pattern match succeeded. Unlike the original Perceptron, the Cognitron handles the pattern subset problem, in which one pattern is completely contained within another, better, though not perfectly.
It does this by using a special inhibitory input to the convergent subcircuit node, which tends to counteract the effect of larger patterns. Also unlike the original Perceptron, the Cognitron can discriminate to some degree between analog patterns, although binary patterns are usually presented to the first layer.
A basic unit (section) of the Cognitron with two convergent subcircuits is shown in figure 16. It has four input lines, labeled A through D; lines B and C are common to both convergent subcircuits. The Cognitron learns by increasing the weights on the active input lines of whichever convergent subcircuit the gate comparator selects as having the greatest output.
As in the original Perceptron, only the best match is strengthened. The rule for adjusting each positive line weight in the selected convergent subcircuit is:
Facilitatory Weight Increment = (Proportionality Constant) * (Positive Line Value) / (Number of Subcircuit Inputs).
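A minimal sketch of one learning step on a section like the one in figure 16 follows. The data layout (each subcircuit lists the indices of the input lines it reads plus one positive weight per line) and the names `learn_step`, `facilitatory_increments`, and `q` are hypothetical. For brevity the winner is picked with `max()` on the excitatory sum alone, which ignores the inhibitory input and stands in for the lateral-inhibition comparator; both are simplifications, not Fukushima's procedure.

```python
def facilitatory_increments(inputs, q):
    """Increment for each positive line of the selected subcircuit.

    inputs: values on the subcircuit's positive input lines
    q: proportionality constant (hypothetical name)
    Inactive lines (value 0) receive a zero increment.
    """
    n = len(inputs)  # number of subcircuit inputs
    return [q * x / n for x in inputs]


def learn_step(subcircuits, pattern, q):
    """Strengthen the best-matching subcircuit and return its index.

    subcircuits: list of dicts with "lines" (indices into pattern) and
    "weights" (one positive weight per line); a hypothetical layout.
    Simplification: the winner is the largest weighted excitatory sum.
    """
    def excitation(sc):
        return sum(w * pattern[i] for w, i in zip(sc["weights"], sc["lines"]))

    winner = max(range(len(subcircuits)), key=lambda k: excitation(subcircuits[k]))
    sc = subcircuits[winner]
    values = [pattern[i] for i in sc["lines"]]
    for j, dw in enumerate(facilitatory_increments(values, q)):
        sc["weights"][j] += dw  # only the winning subcircuit is updated
    return winner
```

With two subcircuits reading A, B, C and B, C, D, presenting the pattern A, B, C = 1 and D = 0 selects the first subcircuit and raises each of its three weights by q/3, leaving the second untouched.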
In the Cognitron the positive weights can increase without limit, but this is balanced by increasing the weight on the inhibitory input at the same time. The rule for adjusting the inhibitory weight is:
Inhibitory Weight Increment = (Pattern Generality Constant) * [(Sum of all positive inputs into the subcircuit node) / (Total Pattern Value)]
The Pattern Generality Constant was 1/2 in the original paper. It is needed to help define the dynamic equilibrium of the network between the positive and negative line values feeding into the subcircuit node, and this dynamic equilibrium in turn sets the balance between pattern discrimination and pattern generality.
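The inhibitory rule can be sketched as below, using the constant 1/2 from the original paper as the default. Reading "Total Pattern Value" as the sum of the values on the section's input lines is an assumption of this sketch, as is the function name `inhibitory_increment`.

```python
def inhibitory_increment(excitatory_inputs, total_pattern_value, generality=0.5):
    # generality: the Pattern Generality Constant (1/2 in the original paper)
    # excitatory_inputs: weighted positive inputs arriving at the subcircuit node
    # total_pattern_value: assumed here to be the sum of the section's input values
    if total_pattern_value == 0:
        return 0.0  # nothing was presented, so there is nothing to balance
    return generality * sum(excitatory_inputs) / total_pattern_value


# As the positive weights on the pattern [1, 1, 1] grow,
# the inhibitory increment grows in proportion to them.
pattern = [1, 1, 1]
for w in (1.0, 1.5, 2.0):
    e_inputs = [w * x for x in pattern]
    print(w, inhibitory_increment(e_inputs, sum(pattern)))
```

Because the increment scales with the summed excitatory input, every gain in the positive weights is matched by a proportional gain in inhibition, which is the balancing effect described above.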
Dynamic equilibrium effects are best seen in the example of figure 16, which shows a Cognitron section at a particular moment in time: the positive weights have a value of 1, the top inhibitory weight has a value of 0.75, and the bottom subcircuit's inhibitory weight has a value of 0.6. These weights allow subset pattern discrimination (such as 1,1,1 versus 0,1,1). Yet this discrimination is only possible if the bottom inhibitory weight lies between 0.45 and 0.75; any value outside that range forces the two patterns to be classified as belonging to the same general class.
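The effect can be reproduced qualitatively with the sketch below, which uses the node equation (e - h)/(1 + h) given later in this section. Figure 16 is not reproduced here, so the wiring is a guess: the top subcircuit reads A, B, C, the bottom reads B, C, D, the patterns 1,1,1 and 0,1,1 are taken to be the values of A, B, C with D off, and each inhibitory line is assumed to carry the number of active inputs. Clipping negative node outputs to zero is also an assumption. Under this wiring the exact boundaries of the working range need not match the 0.45 to 0.75 quoted above.

```python
def node_output(e, h):
    # Cognitron node (e - h) / (1 + h); clipping at zero is assumed here
    return max(0.0, (e - h) / (1.0 + h))


def winner(pattern, bottom_inhib, top_inhib=0.75):
    """Return 0 if the top subcircuit wins, 1 if the bottom one does.

    Hypothetical wiring: pattern = [A, B, C, D]; the top subcircuit reads
    A, B, C and the bottom reads B, C, D; every positive weight is 1; each
    inhibitory line carries the number of active inputs in the section.
    """
    active = sum(pattern)
    e_top, e_bottom = sum(pattern[0:3]), sum(pattern[1:4])
    outputs = [node_output(e_top, top_inhib * active),
               node_output(e_bottom, bottom_inhib * active)]
    return outputs.index(max(outputs))


full, subset = [1, 1, 1, 0], [0, 1, 1, 0]
for w in (0.3, 0.6, 0.9):
    print(w, winner(full, w), winner(subset, w))
```

With a bottom inhibitory weight of 0.6 the full pattern goes to the top subcircuit and the subset pattern to the bottom one. At 0.3 both patterns go to the bottom subcircuit and at 0.9 both go to the top, so the two patterns fall into the same class.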
The Pattern Generality Constant is simply set to whatever works, because no analytical derivation of its value for a given degree of generalization has yet been devised. One would also expect it to be a prime candidate for adaptive determination, but no method for that has been devised either.
With such a narrow range for subset pattern discrimination, the number of input lines to a convergent subcircuit must be kept small in order to preserve resolution: the relative difference between patterns with 20 and 21 active binary inputs (5%) is much smaller than between patterns with 3 and 4 (about 33%). Consequently, Fukushima divided the Cognitron into repeatable sections, and to connect the sections he was forced to use several layers. This use of multiple layers would inspire later multilayered, though quite different, networks (such as the hybrid network below).
To combat the ever-increasing line values caused by ever-increasing weights, Fukushima did not use simple summation and subtraction at the convergent subcircuit node. Instead, he combined the excitatory and inhibitory nodal inputs with a formula that slows the growth of the output value: (e - h)/(1 + h), where e is the excitatory (additive) input and h is the inhibitory (subtractive) input. The gate comparator circuit, like all those found in pre-multivalued-logic neural networks, is based on lateral inhibition.
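Both mechanisms are sketched below. `node_output` applies (e - h)/(1 + h); clipping negative results to zero is an assumption of this sketch. If e and h grow together by a factor k, the output k(e - h)/(1 + kh) approaches e/h - 1 instead of growing without limit. `gate_comparator` is a hypothetical iterative lateral-inhibition circuit, not Fukushima's exact one: each unit is reduced by `strength` times the total activity of the others. Keeping `strength` below 1/(n - 1) for n units keeps the largest unit positive, and given enough steps a strictly largest unit drives all the others to zero.

```python
def node_output(e, h):
    """Convergent subcircuit node: (e - h) / (1 + h), clipped at zero (assumed)."""
    return max(0.0, (e - h) / (1.0 + h))


def gate_comparator(outputs, strength=0.2, steps=50):
    """Winner-take-all by repeated lateral inhibition (hypothetical parameters).

    Every unit is inhibited by `strength` times the sum of the other units'
    activity; negative activity is clipped to zero.
    """
    acts = list(outputs)
    for _ in range(steps):
        total = sum(acts)
        acts = [max(0.0, a - strength * (total - a)) for a in acts]
    return acts


# Output stays bounded as excitation and inhibition grow together
for k in (1, 10, 100):
    print(k, node_output(10 * k, 5 * k))

print(gate_comparator([1.0, 0.9, 0.8]))  # only the first unit stays active
```

The comparator needs the iteration because each unit sees only the others' activity; a plain `max()` gives the same winner but hides the lateral-inhibition mechanism the text describes.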
The many layers and sections of the Cognitron allowed it to be modified so that it responds in the same way (with the same final output) to the same object moved around in the visual field. Fukushima called this modification the Neocognitron and published it in 1980. All that was needed was another set of summation nodes (effectively acting as logical OR operations) after each layer's gate comparator, each summing the outputs of the convergent subcircuits at the same location in every section (see figure 17).
If a feature pattern is moved, it will occupy the same location in some new section as it had in its old, previously learned section. It therefore activates the same OR-like summation node as before, which gives position independence. That independence is limited only by the degree of overlap between sections: if the sections do not overlap much, a moved pattern has a low probability of landing in exactly the same relative location in the new section.
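The OR-like summation layer can be sketched as follows. The layout `section_outputs[s][k]` (output of the convergent subcircuit at location k in section s, taken after the gate comparator) and the name `or_pool` are hypothetical; the summation itself follows the description above.

```python
def or_pool(section_outputs):
    """One OR-like summation node per location, summed across all sections.

    section_outputs[s][k]: gate-comparator output of the convergent
    subcircuit at location k in section s (hypothetical layout).
    """
    n_locations = len(section_outputs[0])
    return [sum(section[k] for section in section_outputs)
            for k in range(n_locations)]


# A feature learned at location 1 of section 0 ...
learned = [[0.0, 0.8], [0.0, 0.0], [0.0, 0.0]]
# ... then moved so that it lands at location 1 of section 2
moved = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.8]]
# ... or moved so that it lands at a different relative location
misaligned = [[0.0, 0.0], [0.8, 0.0], [0.0, 0.0]]

print(or_pool(learned), or_pool(moved), or_pool(misaligned))
```

The learned and moved cases drive the same summation node, while the misaligned case drives a different one, which is the overlap limitation noted in the text.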
Bibliography
Fukushima, K. (1975). Cognitron: A Self-organizing Multilayered Neural Network. Biological Cybernetics, 20, 121-136.
Fukushima, K. (1980). Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position. Biological Cybernetics, 36, 193-202.