Automatica, Vol. 6, pp. 207-219. Pergamon Press, 1970. Printed in Great Britain.

Heuristic Self-Organization in Problems of Engineering Cybernetics*


A. G. IVAKHNENKO†

An analysis of engineering cybernetics shows that the current deterministic approach can only solve comparatively simple problems. A new approach, called heuristic self-organization, is needed for solving complex problems.

Summary: The systems, or programs, of heuristic self-organization are defined as those which include generators of random hypotheses, or combinations, and several layers of threshold self-sampling of useful information. The complexity of the combinations increases from layer to layer. A known system, Rosenblatt's perceptron, may be taken as an example.

The Group Method of Data Handling (GMDH), based on the principles of heuristic self-organization, is developed to solve complex problems of large dimensionality when the data sequence is very short. Two examples are given to illustrate how this method applies to problems of predicting random processes and to identifying the characteristics of a multiextremum plant.

One: Heuristics are groundless decisions which have no mathematical proofs. They give us results which are only good enough for practice, but they are not the best ones.

The other: No! Heuristics are decisions in a field irrelevant to the subject and competence of mathematics. The results of heuristics are often much better than those which can be obtained from a formalized approach.

HEURISTIC SELF-ORGANIZATION

A DISPROPORTIONAL development of two basic parts of cybernetics may now be seen: a dominance of work using deterministic approaches and an almost complete lack of work concerning the practical use of heuristic self-organization. Although the ideas of self-organization have been discussed many times in the well known papers of N. Wiener, J. von Neumann, G. Pask, R. Ashby, S. BEER [1], A. Ya. Lerner, V. S. Fain and others, the papers and books of the sixties only repeat concepts which had already been stated 10 or 20 years ago. There has been almost no progress in this field.

* Received November 1968; revised 22 April and 1 October 1969. The original version of this paper was not presented at any IFAC meeting. It was recommended for publication in revised form by associate editor B. Gaines.

† Institute of Cybernetics, Kiev, USSR.

But it is clear that only self-organization and the ideas associated with it can justify the very existence of cybernetics as a science of a general approach to problems which are different in their nature. The present-day deterministic approach is associated with the analysis of system inputs and outputs. The specific features of each particular problem are of main importance, and this results in a situation where all problems related to computers are considered related to cybernetics. Such a viewpoint and the more universal original idea of cybernetics, given by N. Wiener, are at variance. Certain methods often associated with cybernetics, such as the "black box" idea, are now considered not to be constructive. Instead, self-organization concepts must re-establish the general ideas of cybernetics and show their constructiveness.

Moreover, too much confidence in the deterministic approach nonplusses us. It is now clear that it is impossible to solve many practical problems, such as the problem of automatic synchronous translation from one language into another, or the problem of classification when 200-300 classes are involved, and so on, by deterministic methods. Self-organization must be used to find a way out of this impasse. However, in order to do this, it is necessary to begin with practical problems, and having decided to make an attempt, we took some first steps by solving various problems of pattern recognition, of random process prediction, and of multiextremum plant identification [2-6].

For the present we cannot give an exact mathematical definition of "self-organization", but it is clear that self-organization is necessary when it is impossible to trace all input-output relations


throughout an entire system which is too complex for the purpose [1]. Therefore, we must use the notion of general "integral influences" which act upon a network of components, each having its own "elementary algorithm" of action.

An integral influence is defined to be one which is not found from an analysis of the complex system. It does not use information about the state of each particular component of the system, but it is chosen by the summary result of the active responses.

Automatic control theory, like mathematics itself, has been developed as a purely deductive theory for investigating causes and consequences, inputs and outputs. The arrows and squares of block diagrams are embedded in our consciousness so deeply that we may say for certain that control theory actually impedes the development of the control of spontaneous processes.

As previously indicated, self-organization is the art of controlling spontaneous processes through the use of integral influences.

An income tax is a good example of an integral influence, because market spontaneity can be controlled by changing a nonlinearity, the income tax. If the nonlinearity is high, the income tax may become an integral action of a threshold type: nothing from the poor and all from the rich.

The simplest realization of integral influences in cybernetics, for example in the perceptron [7], is a threshold unit permitting only some inputs to pass. In fact, we have used this simplest type of integral influence in solving the three interpolation problems mentioned above.

Finally, self-organization should be associated with heuristics, by which we mean conjectures made by man in evaluating a course of problem solution. In this respect, self-organization resembles a sandwich: after mathematical processing of information, a "layer" of heuristic evaluation of the results follows, and this process is repeated several times. Man controls the course of the solution by continuously directing its way to the desired results by means of integral influences. That is why heuristic self-organization ensures an accuracy which could not be reached by the use of routine mathematical methods. The influence of heuristics is so potent that it is possible to apply a mathematical tool of less sophistication than those which are usually used. Heuristics are creative thought processes of men, and their results are decisions. They are connected with the wishes of man, with factors associated with his motives. They pertain neither to the subject nor to the competence of mathematics; therefore no mathematical tool can be perfected to compensate for them, nor can one even be compared with them with regard to their effect on the accuracy of a solution.

The history of civilization is full of examples where various control problems have been solved by self-organization. For example, the problem of raising the yield of farms with a minimum of human labour has been solved so successfully that soon only 5 per cent of the world population will be involved in farming.

Some scientists tell us that the problem of "large" or "complex" systems is a new one, but it is only natural to advise the scientists interested in complex plant control to investigate the experience of mankind in this respect. It may seem strange, but mathematics has no tool capable of solving practical complex system problems. Mathematics is not prepared to meet the challenge of problems involving self-organization.

"HYPOTHESIS OF SELECTION" INCOMPLEX SYSTEMS THEORY

The threshold type of integral influence, which may be considered as "an examination", is widely used in the mass selection of plants and animals. To obtain plants which have certain desired characteristics, for example, a portion of seeds is selected from several generations of the plants in which these properties are more predominant than in others. In what follows we use a similar "hypothesis of selection" process to solve engineering cybernetics problems. This hypothesis states that methods of selection are the best for solving interpolation problems of prediction, pattern recognition or identification.

The "hypothesis of selection" has a probabilisticcharacter. The more the value of a given variableexceeds the threshold value, the more is the pro-bability that it is just this variable which providesus with information about the best, or optimal,decision. Therefore, each threshold has a singleoptimal setting corresponding to the maximum ofthe accuracy in the result.

For example, when selecting plants the following three questions are answered in a purely heuristic way:

(1) Which seeds must be used for the first crop? This is the first heuristic: choosing elementary algorithms for producing input signals.

(2) Which criteria should be used to select the best plants? This is the second heuristic: choosing criteria for self-samplings.

(3) According to which laws is the crossing of the plants to be determined? This is the third heuristic: choosing laws for generating combinations.

Having answered these questions we can alsoanswer the next two:


(4) What portion of seeds is to be selected in each generation? This illustrates that there is an optimum level for each threshold of self-sampling.

(5) After which generation is the selection to be stopped? This question must be answered because, after a certain number of generations, the desired plant characteristics begin to degenerate.

The systems of heuristic self-organization considered below are based on the same sort of heuristics as have been illustrated above.

THE PERCEPTRON AND OTHER EXAMPLES OF SYSTEMS HAVING HEURISTIC SELF-ORGANIZATION

Let us define the system, or a program, of heuristic self-organization as a system which has a multilayered or a hierarchical algorithm, i.e. a structure where self-sampling thresholds of useful data are used in each layer. To make these self-samplings more effective, one or several generators of random combinations of arguments are provided, so that the complexity of the variables increases with each successive layer. If the combinations are not numerous, all of them may be subject to exhaustive search. It is clear from the example above that the selection of seeds may be used as an algorithm for a heuristic self-organization system.

Figure 1 shows several other examples of self-organizing systems which are known in engineering. The first example, in Fig. 1a, is the well-known perceptron, the model of the brain perception function, designed by ROSENBLATT [7]. Random connections of links between perceptron layers are considered to be a kind of generation of new combinations, to complete the analogy previously made.

The second example, shown in Fig. 1b, is the structure of a system designed at Stanford University. The problem solved is to predict the structure of organic molecules [8]. Only one


"generator of hypotheses" and three threshold forself-samplings are used here each having a differentheuristic criterion.

The third example, in Fig. 1c, is the structure of algorithms of the Group Method of Data Handling (GMDH). Here the combination generators receive as their inputs only small groups of arguments. This algorithm will be thoroughly explained below.

Some other examples of self-organizing systems are also known. For instance, the method of the S-matrix in theoretical physics and the algorithms of so-called "evolutionary programming" may also be considered as examples of systems of heuristic self-organization [9, 10].

HEURISTICS IN THE GROUP METHOD OF DATA HANDLING (GMDH)

The GMDH is developed for solving various interpolation problems of engineering cybernetics. Here, in place of selecting seeds for growing plants with desired characteristics, we deal with some functions of inputs and intermediate variables. The selection heuristics, formulated above, may be interpreted for this system as follows:

The first heuristic. "Elementary algorithms" are to be chosen. In the GMDH case this term concerns the laws of non-linear transformation of the arguments and intermediate variables. For example, we shall use hereafter the covariations and the first, second, third, and fourth powers of the input arguments.

The second heuristic. Heuristic criteria are to be chosen for the threshold self-samplings. We use the mean square error criterion and the correlation criterion.

The third heuristic. Laws are chosen for constructing a complete description of the plant or process from several partial descriptions. In other words: it is necessary to choose the GMDH algorithm. Each GMDH algorithm has a complete description of the complex plant or process in some form, which is replaced by several partial descriptions. The complete description takes into account all the arguments, a partial description only a group of them, perhaps only two, for example.

Examples of GMDH Algorithms

The GMDH can be realized by many algorithms which differ with respect to the construction of the complete description of a complex plant. About twenty algorithms have been proposed up to the present time. Let us consider, for brevity, only three which seem to be most significant. The three will be considered for the case where only four input arguments are available, as discussed below.

1. A GMDH algorithm using probabilistic graphs. Four binary arguments x1, x2, x3, x4 can give us sixteen combinations which we call the input states. These states can be connected with the two binary responses of the automaton, Z = R1 and Z = R2, by the use of the transition probability graph shown in Fig. 2a. This graph is the complete description of some complex automaton.

According to the GMDH method, this complete graph is to be replaced by three partial graphs, each for two arguments only, and Fig. 2b shows an


example of graphs for the combination of arguments (x1, x2) and (x3, x4). Such graphs can be constructed for all combinations of arguments, particularly for (x1, x4) and (x2, x3), or (x1, x3) and (x2, x4), in this example. To learn the structure of the graphs, the probabilities connecting input states and responses are calculated. When calculating the probabilities, we assume y1 = y2 = Z, determine the variables y1, y2 as functions of time, and use them in the last graph. Note that the calculation of probabilities for two arguments in a partial graph requires a much shorter learning sequence of data than the calculation of probabilities for the four arguments, which is needed for the complete graph. The algorithm has a multilayered structure. Therefore it is possible to insert a threshold self-sampling after each layer to select the useful information.

2. A GMDH algorithm using the Bayes formulas. The complete description is the Bayes formula written for all four arguments at once; it is replaced by two partial Bayes formulas, each for a pair of arguments, and a third formula of the same form which combines their decisions.
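A plausible reconstruction of these formulas, carrying over the response labels R1, R2 from the probabilistic-graph algorithm above and writing P(.|.) for conditional probabilities, is:

    P(Z = R_j \mid x_1 x_2 x_3 x_4) = \frac{P(x_1 x_2 x_3 x_4 \mid R_j)\,P(R_j)}{\sum_{k=1}^{2} P(x_1 x_2 x_3 x_4 \mid R_k)\,P(R_k)}   (complete form)

    P(y_1 = R_j \mid x_1 x_2) = \frac{P(x_1 x_2 \mid R_j)\,P(R_j)}{\sum_{k} P(x_1 x_2 \mid R_k)\,P(R_k)}, \qquad
    P(y_2 = R_j \mid x_3 x_4) = \frac{P(x_3 x_4 \mid R_j)\,P(R_j)}{\sum_{k} P(x_3 x_4 \mid R_k)\,P(R_k)}   (partial forms)

    P(Z = R_j \mid y_1 y_2) = \frac{P(y_1 y_2 \mid R_j)\,P(R_j)}{\sum_{k} P(y_1 y_2 \mid R_k)\,P(R_k)}   (third formula)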

Here y1 and y2 are the decisions made from the first two formulas. Such formulas may also be written for the other combinations of arguments: (x1, x4) and (x2, x3), or (x1, x3) and (x2, x4). When calculating the probabilities we assume y1 = y2 = Z, then we determine the variables y1 and y2 as functions of time, and we use them in the third formula. Note that the calculation of probabilities for two arguments requires much shorter sequences of learning data than the calculation of probabilities for three or four arguments.

The algorithm is multilayered, and thus it is possible to use self-sampling thresholds.

3. A GMDH algorithm using polynomials of second order. The algorithm using polynomials of second order actually uses several short partial polynomials instead of one very long discrete Kolmogorov-Gabor polynomial, which is usually used to approximate an unknown decision function. For example, with four arguments x1, x2, x3, x4, the complete polynomial, including terms of all powers and all covariations of the arguments, has 70 terms.

The complete polynomial is:
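Written out in a general form (a reconstruction assuming all monomials of degree up to four in the four arguments; the successive sums contain 1 + 4 + 10 + 20 + 35 = 70 terms):

    Z = a_0 + \sum_{i} a_i x_i + \sum_{i \le j} a_{ij} x_i x_j + \sum_{i \le j \le k} a_{ijk} x_i x_j x_k + \sum_{i \le j \le k \le l} a_{ijkl} x_i x_j x_k x_l , \qquad i, j, k, l \in \{1, 2, 3, 4\}.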

Learning consists in determining the coefficients of this polynomial. To determine the coefficients by solving Gaussian normal equations, it would be necessary to invert matrices with dimensions of 70 x 70 components and to use learning sequences having no less than 70 interpolation data points. Such an extensive number of calculations generally exceeds the capabilities of the most modern computers. Moreover, if there are more than four inputs, the solution becomes completely impossible; for example, when there are ten inputs, the polynomial contains about 200,000 terms. This is the source of Bellman's "curse of multidimensionality", which explains why no actual complex problem has yet been solved.

In the example with four inputs, or arguments, the GMDH uses three partial second-order polynomials instead of one complete polynomial.

The partial polynomials for the combination of arguments (x1, x2) and (x3, x4) are:
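A plausible form of these three partial descriptions, each with the six coefficients determined later by 6 x 6 normal equations (the coefficient symbols a, b, c are introduced here for illustration):

    y_1 = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1 x_2 + a_4 x_1^2 + a_5 x_2^2
    y_2 = b_0 + b_1 x_3 + b_2 x_4 + b_3 x_3 x_4 + b_4 x_3^2 + b_5 x_4^2
    Z   = c_0 + c_1 y_1 + c_2 y_2 + c_3 y_1 y_2 + c_4 y_1^2 + c_5 y_2^2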


The other combinations are (x1, x4) and (x2, x3), or (x1, x3) and (x2, x4). We can choose any combination which gives us the better accuracy. When calculating the coefficients, we assume y1 = y2 = Z and then determine the variables y1 and y2 as functions of time, which are used in the third polynomial.

After substituting the first and the second polynomials into the third, we obtain a polynomial in which sixteen covariations of the 4th power are omitted. It is known that the omission of any term of the complete polynomial, or the superposition of any links upon its coefficients, can decrease the accuracy of approximation, although the decrease is small when these links are "optimal" or the polynomials are orthogonal. The basic result of our calculation is that the use of heuristic thresholds for the self-sampling of useful information provides an increase in accuracy which cannot even be compared with that which can be attained by perfecting a mathematical tool of approximation. Thus, the heuristics employed in the field of self-organization are more effective than the heuristics used to perfect a mathematical tool.

Let us consider the case when all four arguments are binary, thus taking on only two values: -1 and +1. Then the complete polynomial is:
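Since the square of a binary argument equals one, a reconstruction of the complete polynomial contains only the sixteen products of distinct arguments:

    Z = d_0 + d_1 x_1 + d_2 x_2 + d_3 x_3 + d_4 x_4
        + d_{12} x_1 x_2 + d_{13} x_1 x_3 + d_{14} x_1 x_4 + d_{23} x_2 x_3 + d_{24} x_2 x_4 + d_{34} x_3 x_4
        + d_{123} x_1 x_2 x_3 + d_{124} x_1 x_2 x_4 + d_{134} x_1 x_3 x_4 + d_{234} x_2 x_3 x_4 + d_{1234} x_1 x_2 x_3 x_4 .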

The partial polynomials, for the combination of (x1, x2) and (x3, x4), are:
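Under the same assumption, each partial polynomial keeps only four terms, which is consistent with the remark below that only four interpolation points are needed:

    y_1 = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1 x_2 ,
    y_2 = b_0 + b_1 x_3 + b_2 x_4 + b_3 x_3 x_4 ,
    Z   = c_0 + c_1 y_1 + c_2 y_2 + c_3 y_1 y_2 .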

Using this combination of equations, the coefficients of the complete polynomial can obviously be constructed by the following formulas:
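Under the reconstruction above, substituting y1 and y2 into the third polynomial gives relations of the following kind (a few representative coefficients are shown):

    d_0 = c_0 + c_1 a_0 + c_2 b_0 + c_3 a_0 b_0 ,
    d_1 = (c_1 + c_3 b_0)\, a_1 , \qquad d_3 = (c_2 + c_3 a_0)\, b_1 ,
    d_{12} = (c_1 + c_3 b_0)\, a_3 , \qquad d_{34} = (c_2 + c_3 a_0)\, b_3 ,
    d_{13} = c_3 a_1 b_1 , \qquad d_{123} = c_3 a_3 b_1 , \qquad d_{1234} = c_3 a_3 b_3 ,

and similarly for the remaining terms, so that every one of the sixteen terms of the complete polynomial receives a coefficient.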

It is easy to find similar formulas for the other two combinations of arguments if they prove to be more accurate than those shown above. However, note that not a single term of the complete polynomial is lost. This does not mean that there are no additional limitations on the choice of coefficients: using the complete polynomial does give more freedom when we attempt to minimize the mean square error.

Coefficients of the partial polynomials may also be found by solving the Gaussian normal equations. The minimum number of interpolation points is equal to the number of unknown coefficients; thus, in the last example, only four points instead of 16 are needed.

CRITERION OF OPTIMALITY FOR THE GMDH ALGORITHM WITH POLYNOMIALS OF SECOND DEGREE

The GMDH algorithm with polynomials of second order guarantees a choice of the coefficients of the partial polynomials whereby the minimum mean square error may be obtained. Then the coefficients of the complete polynomial may be calculated from the formulas determined by the chosen structure of the algorithm.

To raise the accuracy and to get well-conditioned matrices, all possible combinations of arguments are tried. Only the combinations producing the smallest error are allowed to pass through a threshold to the next layer. For accuracy, the number of equations averaged according to the Gaussian rule is increased as much as possible for stationary processes. If the number of data points N greatly exceeds the number of coefficients n, the complete polynomial and the partial polynomials give the same value of mean square error.
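The selection procedure just described can be sketched in a few lines. This is a minimal illustration only, not the author's program: the function names (design, fit_partial, gmdh_layer), the use of a general least-squares routine in place of explicitly written Gaussian normal equations, and the stopping rule are assumptions made for the sketch.

    import itertools
    import numpy as np

    def design(u, v):
        # Design matrix of one partial description:
        # y = a0 + a1*u + a2*v + a3*u*v + a4*u**2 + a5*v**2  (six coefficients)
        return np.column_stack([np.ones_like(u), u, v, u * v, u ** 2, v ** 2])

    def fit_partial(u, v, z):
        # Least-squares estimate of the six coefficients (the role played by
        # the 6 x 6 Gaussian normal equations in the text).
        coeffs, *_ = np.linalg.lstsq(design(u, v), z, rcond=None)
        return coeffs

    def gmdh_layer(train, test, z_train, z_test, keep):
        # One selection layer: every pair of current variables produces a
        # candidate partial polynomial; only the `keep` candidates with the
        # smallest error on the testing sequence pass the threshold.
        candidates = []
        for i, j in itertools.combinations(range(train.shape[1]), 2):
            c = fit_partial(train[:, i], train[:, j], z_train)
            y_tr = design(train[:, i], train[:, j]) @ c
            y_te = design(test[:, i], test[:, j]) @ c
            mse = float(np.mean((y_te - z_test) ** 2))
            candidates.append((mse, y_tr, y_te))
        candidates.sort(key=lambda t: t[0])
        best = candidates[:keep]
        new_train = np.column_stack([b[1] for b in best])
        new_test = np.column_stack([b[2] for b in best])
        return new_train, new_test, best[0][0]

    # Layers are stacked while the error on the testing sequence keeps
    # decreasing; the first increase signals the "degeneracy" at which
    # selection is stopped, e.g.:
    #   prev = np.inf
    #   while True:
    #       tr, te, err = gmdh_layer(tr, te, z_tr, z_te, keep=3)
    #       if err >= prev:
    #           break
    #       prev = err

With four input arguments the first layer tries the six pairs of inputs; in later layers the candidate variables are themselves outputs of earlier partial polynomials, so their order with respect to the original arguments grows from layer to layer, exactly as in the algorithm described above.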

DIFFERENCES BETWEEN THE PERCEPTRON AND GMDH ALGORITHMS WITH POLYNOMIALS OF SECOND DEGREE

The perceptron has a multilayer structure. Instead of accepting final decisions in the first layer of data processing, as is recommended by the modern theory of statistical decisions, the signals pass through several layers, each of which consists of links with variable gains, summators, and threshold units. The above integral influences, acting "without human influence", are realized only by means of the threshold units. The value of each threshold is high enough to permit the sampling of only about 40 per cent of the most probable decisions beyond each layer; the rest of the signals are not allowed to pass. This is just "the principle of nonfinal decisions", which is realized by the perceptron contrary to the conclusions reached through statistical decision theory. The idea of nonfinal decisions enables different heuristic criteria to act on the information flow several times, and this results in the exceptionally high accuracy of systems using heuristic self-organization.


Our modifications of the perceptron are as follows:

(1) Coefficients of the perceptron links are calculated by solving Gaussian normal equations, formulated for a small group of input signals, instead of by a random search or adaptation.

(2) Integral influences are realized by different heuristic criteria and by the use of threshold units, most often by the correlation of the signals with the teaching data, contrary to the scalar multiplication in Rosenblatt's perceptron [7]. Instead of the linear perceptron decision function
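In a plausible notation, with w_i denoting the gains of the links, this linear function is

    Z = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n ,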

a more developed non-linear polynomial is used:
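Its general form, sketched here in the same notation as the complete descriptions above, is

    Z = a_0 + \sum_{i} a_i x_i + \sum_{i, j} a_{ij} x_i x_j + \sum_{i, j, k} a_{ijk} x_i x_j x_k + \dots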

This polynomial is often called the Kolmogorov-Gabor polynomial [11]. It is stated in N. J. NILSSON's book [12] that non-linear decision functions were proposed by I. Koford. In fact they were introduced by D. Gabor around 1960. Note that for Gaussian random processes the optimum filter is linear, and the perceptron decision function is then the best function.

(3) The continuous optimization of threshold values is used to get the highest accuracy.

The distinctions between the perceptron and the GMDH are neither great nor fundamental; therefore we often call our systems perceptrons or "systems of the perceptron type".

Other features of our systems are the same as those of Rosenblatt's perceptron. The simplest perceptron is used for pattern recognition. It involves two thresholds only, both determined by correlation coefficients (Fig. 3a). The perceptron for random process prediction is more complicated. Here the self-sampling of the length of the current interval of prehistory is added (Fig. 3b). And finally, two preliminary thresholds are used in identification problems: first, for choosing the most active variables, and second, for choosing data which do not repeat previous information. Data which repeat are omitted, and then the two main correlation thresholds are used.

FOUR REASONS FOR USING MULTILAYERED ALGORITHMS OF GMDH

There are at least four reasons why the perceptron-like multilayered structures of the GMDH algorithms are much better than the usual single-layered structures:

(1) Only short learning sequences are ever available when we attempt to predict a process or try to find the mathematical model of a complex plant. Thus we must use one and the same interpolation points several times, since the number of points is less than the number of members of the complete polynomial. Only two methods are known which will work under such conditions: the methods of stochastic approximation and the GMDH. That is why we have called them rival methods in Ref. [2]. But the methods of stochastic approximation cannot solve the problem of identifying the global maximum of a multiextremum hill, and they do not permit us to organize self-sampling thresholds to omit "harmful information". Therefore, the GMDH is the superior method.

(2) The interval of data observation is always limited. Therefore the input data, or "features", can be not only useful or neutral, but even harmful. This statement contradicts Shannon's information theory, but it is true. We can give the following definition of a harmful feature: a given feature is harmful if its average value and other statistical characteristics in the learning sequence differ from those in the testing sequence. Such features are poorly correlated with the output, and they must be eliminated in order to increase the accuracy. The thresholds provide self-sampling of the useful information, but they do not allow "harmful" information to pass (see the sketch after this list).

(3) Even if we could obtain very long learning sequences, we would not be able to find computers large enough to solve the normal equations based on complete polynomials.


(4) The coefficient matrix of the equations for a complete polynomial is always ill-conditioned. However, among the many combinations of small partial equations, we can always find well-conditioned matrices of small dimensionality.
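The correlation screening mentioned in reason (2) above can be sketched as follows; the threshold value of 0.4 and the name screen_features are illustrative assumptions, not taken from the paper.

    import numpy as np

    def screen_features(X_learn, y_learn, threshold=0.4):
        # Keep only features whose correlation with the output on the learning
        # sequence exceeds the threshold; the rest are treated as neutral or
        # "harmful" and are not allowed to pass to the next layer.
        kept = []
        for k in range(X_learn.shape[1]):
            r = np.corrcoef(X_learn[:, k], y_learn)[0, 1]
            if abs(r) >= threshold:
                kept.append(k)
        return kept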

Let us draw attention to the main fact that the GMDH not only solves the problem of dimensionality, but also permits the use of very short learning sequences consisting of only six interpolation points as a minimum and, with linear operators, only three [6].

Many examples of solving various interpolation problems encountered in engineering cybernetics by the GMDH have already been published in the Ukrainian journal "Avtomatika" [2-6 and others]. This journal is now translated into English as the "Soviet Automatic Control" journal by the Institute of Electrical and Electronics Engineers, Inc., 345 East 47 Street, New York, N.Y. 10017. Therefore, we shall consider below only the main results of two examples to show the application of one GMDH algorithm, in particular the algorithm with second-order polynomials, to the solution of two rather different interpolation problems.

Example of random process prediction [2]

In Table 1, data on the size of the areas used to grow wheat and other produce for a period of 14 years in one district of the Ukraine are given. Here y is the area used to grow wheat, and h, z, w, v are the areas used to grow other produce. The problem is to predict the area which will be used to grow wheat, y(t), at least one year in advance.

The first heuristic, the choice of an "elementary algorithm" of input features. The preliminary data processing consists in calculating the deviations of the variable y(t) from the non-linear trend described by a 3rd-order equation in time.
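A plausible form of this trend, with b_0, ..., b_3 denoting the fitted coefficients and Δy(t) the deviation used as a feature, is

    \bar{y}(t) = b_0 + b_1 t + b_2 t^2 + b_3 t^3 , \qquad \Delta y(t) = y(t) - \bar{y}(t) .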

The deviations are shown in Fig. 4. They are included in the input data, or features, for our perceptron.

The second heuristic, the choice of criteria for the threshold self-samplings. Three thresholds for the self-sampling of useful information were used: the first by the length of the prehistory being considered, the second and the third by the correlation coefficients of the intermediate and predicted variables.

The third heuristic, the choice of the GMDH algorithm. The algorithm using polynomials of second order was chosen. The method of constructing the complete description from a series of partial descriptions was conditioned by this choice.

The results of the calculations for one definite value of the three thresholds, shown in Fig. 5, are as follows: when the prehistory length being considered equals 5 years, 35 input quantities pass through the first threshold. (It is easy to calculate that the six variables being taken into account, plus one variable of deviation shown in Fig. 4, give us 35 ordinates for 5 years.)

Each of the 35 features is considered to be a random function, and therefore we can calculate its correlation with the deviation y(t) from the average trend. The threshold value of the first correlation was taken as 0.443. The second self-sampling passed the following features:


TABLE 1. INPUT DATA ABOUT SOWING AREAS (LEARNING SEQUENCE)

Sowing year after year (in hectares)

Year   Wheat y   Other produce h      z      w      v   Total area   Deviation from trend
  1      2500          80           380    630    160       3750         -471.4
  2      5500         140           600   2740    540       9250          214.7
  3      7700         280          1180   4530    980      14,670         795.2
  4      8334         500          1100   3400    800      14,134         408.8
  5      7800         630          1020   1390    630      11,470        -641.8
  6      7400        1140           920   1280    780      11,520       -1150.0
  7      8647        1880           860    750    900      13,037         302.1
  8      8795        2430          1150    370    670      13,415         373.2
  9      7400        3300          1520    380    740      13,340          24.1
 10      6200        3040          1800    450   1090      12,580        -602.6
 11      6060        2990          1840    660   1050      12,600        -237.1
 12      6370        3500          1970   1170   1170      14,180         415.3
 13      6380        3800          2530   1690   1430      15,830         509.5
 14      5700        3500          2980   1900   1370      15,450        -740.0


7. Sowing area of produce w 3 years ago

8. Total area 2 years ago

9. Deviation from the trend of the sowing area occupied by wheat 2 years ago

10. Sowing area of produce w 5 years ago

Note that the features chosen by the threshold are quite unexpected, and they could not be found by any deductive reasoning. This is why the given method is said to be one of self-organization. Data on the fifth, eighth and ninth variables are obtained for the most recent past, 2 years ago. Therefore the optimum prediction should be for 2 years into the future.

The ten variables chosen by the correlation threshold enable us to construct 45 Kolmogorov-Gabor polynomials of second order, each having 2 arguments. Each polynomial can be written thirteen times, according to the length of the learning sequence, by substituting the input data. After the Gaussian averaging, we get 45 systems of normal equations, each having a small matrix of 6 x 6 elements. The solution of the normal equations determines 45 intermediate variables.

Then the correlation coefficients between the intermediate variables and the centred deviations of the predicted variable are calculated. The threshold value chosen for the second correlation allowed only 4 variables to pass the third self-sampling threshold, namely y34, y18, y12, y56. These four variables make it possible to find two variables at the next level of complexity, which, after combining, result in the output Kolmogorov-Gabor polynomial.

Having written this polynomial thirteen times, inserting the data and averaging by the Gaussian rule, we obtain the last system of normal equations, with a 6 x 6 matrix. Its solution gives us the prediction formula.

Using this formula we can predict the sowing area occupied by wheat for the fourteenth year, which is used for testing. For each successive year's prediction the formula is redeveloped from the very beginning, so that the formula coefficients evolve. The predictions for the sixth to the fourteenth years are shown in Fig. 4 by a dashed line. The accuracy of prediction proved to be unusually high, as measured by the root-mean-square error.

Optimization of the thresholds is indicated in Fig. 6. The above algorithm realizes a feedforward heuristic self-organization method according to the principle "by inputs". To realize the feedback principle "by outputs", a procedure of threshold optimization should be used. The portion of the signals passed by each threshold is to be chosen so that the accuracy of the results, for a given sequence, is maximal. This optimal portion is equal to about 40 per cent in the first layer and decreases very rapidly in the next layers of the perceptron. Papers have been written in which the solution of this problem is obtained using a probabilistic approach [7 and 13]. However, we prefer to solve it using the data of a given testing sequence by the simple calculation of several variants of the threshold values.

The accuracy decreases if the thresholds are too low or too high. So the problem is to find the single extremum value of accuracy in the space of the thresholds using, for example, the Fibonacci method. The variation of the thresholds mentioned above increases the accuracy even more.

Example of identification of the static characteristic of a multiextremum plant [4 and 6]

The value of the extremum index, the manipulating variable, and the main disturbance, measured


for the last 6 instants, are the input information in the second example. If memory devices permit the storage of data for more than 6 instants, the accuracy will be increased due to the reduction in measurement noise.

The first heuristic is the construction of an "elementary algorithm", i.e. simple non-linear functions of the inputs. When identifying the static characteristic we have used the first, second and third powers of the inputs.

When identifying the dynamic characteristic we have used the integrals of these functions [6].

The second heuristic is connected with the choice of criteria for the threshold self-samplings of useful information. As previously stated, we used the correlation coefficients between every intermediate variable and the extremum index.

The third heuristic is concerned with the choice of the GMDH algorithm. The algorithm using second-order polynomials was chosen again. To organize the process of self-sampling, we again used the four-layer perceptron shown in Fig. 7. The first layer gives us those polynomials of second order which are best suited to approximate the complex surface of the multiextremum hill. Only about 40 per cent of the total number of polynomials are used to construct the intermediate variables of the second layer. In the second layer, new polynomials of second order are selected again, but here they are constructed from the intermediate variables of the first layer, and because of that they are of fourth order with respect to the input arguments. Those polynomials ensuring the best approximation of the extremum hill surface pass the new threshold, new polynomials are constructed again, and so on until degeneracy begins, i.e. until the accuracy of approximation starts to diminish. Finally, only the single best decision is chosen in the last layer. As a result, the surface of the extremum hill will be described by several optimal polynomials of second order, chosen in all the layers. These comparatively simple polynomials


represent the static characteristic of the extremum plant, replacing a very long complete Kolmogorov-Gabor polynomial.

Let us point out some results: 60 points of the multiextremum hill, shown in Fig. 8, were used as the learning sequence of data and only 10 as the testing, or examining, sequence. The first threshold was passed by 11 of 15 variables, the second by 10, and the third by 2. The last threshold passed only one polynomial of the "fourth generation". The accuracy can be evaluated by the correlation coefficient of 0.9815. It is very high.

Optimization of the thresholds. This accuracy was reached with the optimum values of all the thresholds, which were found by the Fibonacci search for the maximum accuracy.

CONCLUSION

The examples of the application of the GMDH to the solution of different interpolation problems show the high accuracy of this method. In some cases, e.g. in the case of random process prediction, the accuracy is quite fantastic. The unusual accuracy of prediction of a process which seems to be quite unpredictable makes us change our estimate of the role of randomness in our environment. It seems now that perhaps Laplace was almost correct. The whole world around us is perhaps more deterministic than we usually think. Randomness exists, but it shows up in the 4th or 5th decimal place.

This high accuracy can be explained in a very simple way. Everybody who has used prediction theory knows that the accuracy is higher when the process itself is well correlated, i.e. when the autocorrelation of the process is high. In the GMDH, the thresholds select only the useful variables, i.e. those which are well correlated with the output. This is the first reason why the accuracy is so high.

The second reason is that the GMDH, having multilayered algorithms, enables us, despite the brevity of the data readings, to take into account the high-order covariations in the Kolmogorov-Gabor polynomials, or the dependent inputs in the Bayes formula. Present-day statistical decision theory has a single-layered structure of algorithms, and therefore it requires learning data sequences which are too long to be obtained and used in practice.

We can recommend the following method for verifying the accuracy: all data except those for the next predicted moment are always used as the learning sequence. The testing sequence consists of only one future point. So we are to repeat all the calculations from the beginning before each next prediction. We call this the method of predicting-formula evolution, because the formula continuously changes.

Note that it would not be correct to verify the accuracy by means of the learning sequence itself, because in this case we cannot reach "degeneracy" of the formulas: the more algorithm layers are taken, the more the accuracy will increase. We must use a separate testing sequence of data, and only then is it possible to find the optimum number of layers. The accuracy first increases with each subsequent layer, but then, after exceeding the optimal number of layers, the accuracy begins to decrease. There is an optimal number of generations, just as in the process of plant or animal selection.

This is correct not only for prediction but also for pattern recognition and for identification problems. Let me point out, finally, that only the GMDH enables us to solve the problem of identifying a multiextremum plant directly, because this method has been specially developed for solving high-dimensional problems when the data sequences are very short.

REFERENCES

[1] S. BEER: Cybernetics and Management. English University Press, London (1963).

[2] A. G. IVAKHNENKO: The group method of data handling, a rival of the method of stochastic approximation. Avtomatika, No. 3 (1968).

[3] A. G. IVAKHNENKO, V. B. KONOVALENKO, YU. M. TULUPCHUK and I. K. TIMCHENKO: The group method of data handling in the problem of pattern recognition and decision making. Avtomatika, No. 5 (1968).

[4] A. G. IVAKHNENKO, YU. S. KOPPA and N. A. IVAKHNENKO: The group method of data handling in the problem of identification of a multiextremum plant. Avtomatika, No. 2 (1969).

[5] A. G. IVAKHNENKO, V. D. DIMITROV and S. G. MGELADSE: The probability algorithms of the group method of data handling in the problem of the prediction of random events. Avtomatika, No. 3 (1969).
