Imitative learning based emotional controller for unknown systems with unstable equilibrium


Mehrsan Javan Roshtkhari, Arash Arami and Caro Lucas
Control and Intelligent Processing Center of Excellence, Faculty of Engineering, School of ECE, University of Tehran, Tehran, Iran

Abstract

Purpose – Intelligent control is not always a proper strategy for unidentified systems with unstable equilibria, and in many cases it results in inferior performance. Because the early stage of learning proceeds by trial and error, the exploration for appropriate control signals can lead to instability. Although the recently proposed emotional controllers are capable of learning swiftly, using them is not by itself an efficient solution to this instability problem. A solution is therefore needed to avoid instability in the preliminary phase of learning. The purpose of this paper is to propose a novel approach for controlling unstable systems, or systems with unstable equilibrium, by model free controllers.

Design/methodology/approach – In the first step, an existing model-based controller with limited performance is used as a mentor for the emotional learning controller. This learning phase prepares the controller to control the plant as well as the mentor does, while preventing any instability. When the emotional controller can properly imitate the behavior of the model-based one, control is gently switched from the model-based controller to the emotional controller using a fuzzy inference system (FIS). The emotional stress is likewise softly switched from the mentor-imitator output difference to a combination of the objectives. In this paper, the emotional stresses are generated in two ways: once by a nonlinear combination of objectives, and once by feeding different stresses to a FIS that attentionally modulates them and makes a subset of the objectives salient according to the current situation.

Findings – The proposed model free controller is employed to control an inverted pendulum system and an oscillator with unstable equilibrium. Notably, the proposed controller is model free and does not use any knowledge about the plant. The experimental results on the two benchmarks show the superiority of the proposed imitative emotional controller with the fuzzy stress generation mechanism over the originally supplied model-based controllers and, in control of the pendulum system, over the emotional controller with the nonlinear stress generation unit, in all operating conditions.

Practical implications – Two test beds for evaluating the performance of the proposed model free controller are discussed in this paper: a laboratory inverted pendulum system, which is a well-known system with unstable equilibrium, and Chua’s circuit, which is an oscillator with two stable and one unstable equilibrium point. The results show that the proposed controller with the mentioned strategy can control the systems with satisfactory performance.

Originality/value – In this paper, a novel approach for controlling unstable systems, or systems with unstable equilibrium, by model free controllers is proposed. This approach is based on imitative learning in the preliminary phase of learning and soft switching to interactive emotional learning. Moreover, FISs are used to model the linguistic knowledge of the ascendancy and situated importance of the objectives. These FISs are used to attentionally modulate the stress signals for the emotional controller. The results of the proposed strategy on two benchmarks reveal the efficacy of this strategy of model free control.

Keywords Learning, Fuzzy logic, Systems and control theory, Controllers

Paper type Research paper

The current issue and full text archive of this journal is available at

www.emeraldinsight.com/1756-378X.htm

IJICC3,2

334

Received 16 November 2008
Revised 11 August 2009
Accepted 23 August 2009

International Journal of Intelligent Computing and Cybernetics, Vol. 3 No. 2, 2010, pp. 334-359
© Emerald Group Publishing Limited, 1756-378X
DOI 10.1108/17563781011049232

1. Introduction
There are several ways of sharing knowledge in multi agent systems, and one of the most prominent is imitation. Particularly for intelligent agents, imitative learning is an approach to transfer knowledge from an expert agent to another agent without any brain-to-brain transfers (Chellaa et al., 2006). Imitation speeds up learning and improves the performance of the learner. Through imitative learning, the agent tries to obtain the same result that the mentor has achieved. It differs from mimicking because the imitator does not act exactly like the mentor; it only performs an action that leads to the same results in the environment. Imitative learning is employed in many tasks in which a novice agent has to learn the proper actions from an expert agent. It is widely used in robotics and human machine interfaces (Kuniyoshi and Inoue, 1994; Chellaa et al., 2007; Montesano et al., 2008; Lopes and Santos, 2005). For example, an imitation framework based on concept learning is presented by Mobahi et al. (2007). Moreover, the application of imitative learning to soccer playing robots is discussed in Behenke and Bennewitz (2005) and Latzke et al. (2006).

Nowadays, the development of new biologically inspired algorithms is an area of interest. One of the most important aspects of any intelligent system is its capability to learn and adapt. Although the learning process can be carried out in various ways, the main aim is to adapt parameters so as to improve the performance of the system and overcome the difficulties caused by changes in the environment (Shahmirzadi, 2005). As the emotional behavior of humans and other animals is an important part of their intelligence, modeling this process leads to an intelligent system with fast learning ability (Balkenius and Moren, 2001). Although evolutionary mechanisms code emotional reactions in animals, mammals can also learn them very fast. In biological systems, emotional reactions are utilized for fast decision making in complex environments or emergency situations. The main part of the mammalian brain that is responsible for emotional processes is called the limbic system. Several attempts have been made to model the limbic system (Balkenius and Moren, 1998; Moren, 2002). Computational models of the Amygdala and the Orbitofrontal cortex, which are the main parts of the limbic system in the brain, were first introduced in Balkenius and Moren (2000). Subsequently, based on the work of Balkenius and Moren (2000), the brain emotional learning based intelligent controller (BELBIC) was introduced in Lucas et al. (2004). The fast learning ability of BELBIC makes it a powerful model free controller for many tasks. A description of BELBIC in pattern format was first given in Jamali et al. (2006). A pattern describes a problem that occurs over and over again in our environment, and then describes the core of the solution to that problem (Alexander, 1979). Reusability, extendibility and implementation concerns on different platforms were described in this pattern (Lucas et al., 2004). BELBIC has been applied in several applications, such as control of intelligent washing machines (Milasi et al., 2006a, 2007) and speed control of an interior permanent magnet synchronous motor (IPMSM) (Milasi et al., 2004; Sheikholeslami et al., 2006), and a modified version of BELBIC has been employed for controlling heating, ventilating and air conditioning systems. Moreover, BELBIC is used in time series prediction (Gholipour et al., 2004) and sensor-data fusion (Shahmirzadi et al., 2003). The real-time implementation of BELBIC for IPMSM drives was first introduced in Milasi et al. (2006b). This implementation takes into account the industrial concerns of implementing a controller. The controller was


successfully implemented in real time using a digital signal processor board for a laboratory 1-hp IPMSM, and the results show fast response, simple implementation, robustness with respect to uncertainties such as manufacturing imperfections, and good disturbance rejection. Another real-time implementation of BELBIC, for position tracking and swing damping of a laboratory overhead crane under computer control via MATLAB external mode, is described in Jamali et al. (2008). In addition, BELBIC has been used in many robots (Sharbafi et al., 2006), where it shows high performance and is simple to implement in real time. Moreover, a flexible hardware implementation of BELBIC has been realized on a field programmable gate array (FPGA) board (Jamali et al., 2009). Benefits of this embedded BELBIC on FPGA include reusability, scalability, interpretability, flexibility, robustness, and computational stability. Also, the sampling frequency of this controller on the FPGA is 1 kHz, which is appropriate for many control engineering applications. In addition, nonlinear combinations of objectives have been used to design emotional stresses for BELBIC so as to include more objectives, and applied to control an overhead crane under uncertainties and disturbances (Arami et al., 2008).

The stability of the brain emotional learning (BEL) system, which is used in control as BELBIC, was discussed in Shahmirzadi and Langari (2005). They analyzed the stability of the BEL system by the cell-to-cell mapping method, which was initially developed as an efficient numerical technique for global analysis of nonlinear systems (Hsu, 1987; Hsu and Guttalu, 1980). To ensure this interpretation of the stability of the system, a general idea for choosing control parameters is described in Shahmirzadi and Langari (2005). A recent study on the stability analysis of BELBIC was carried out in Jafarzadeh et al. (2008), which guarantees stability for first and second order linear systems based on Lyapunov theory. This analysis provides some constraints on the learning rate, the structure of the input signals for BELBIC, and the state equations as well. Although a more complex Lyapunov function is needed to prove the stability of BELBIC in the control of higher order linear or nonlinear systems, such functions could impose more constraints on the state equations of the systems. These constraints imply that stability based on Lyapunov theory cannot be achieved in all systems.

In most applications, BELBIC is employed as a model free controller. Model free control is an approach to controlling systems with complex dynamics and various uncertainties that avoids the expense of system identification. To eliminate the identification of the system, or to reduce the cost of the identification process, designing controllers for systems that are not completely identified has become an area of interest. In other words, the essence of model free control is the ability to design a controller for a system by assuming that only a simple description of the system dynamics is available. Fuzzy control can be a solution, but it cannot deal properly with unrecognized dynamics of the system (Tong and Li, 2007). To improve the performance of the system, it is necessary to add compensators with learning ability that adapt themselves to system variations during control tasks. Another approach is to design an intelligent controller capable of learning the control signal. The learning approaches should be based on reinforcement learning methods, in which a controller optimizes a fitness function. It must be noted that in most control applications the designers have to satisfy more than one objective to achieve the desired behavior of the system. To deal with these multi-objective problems, different approaches are used, such as finding the optimal Pareto points (Jin and Sendhoff, 2008; Farina et al., 2004) or using a fixed linear combination of objectives as the cost or fitness function (Burl, 1999; Ogata, 1997).


In the general case, to improve the fitness value with respect to the different objectives in the designer’s mind, whose degrees of importance vary with the states and with the degrees of satisfaction, a nonlinear fusion of the objectives is necessary to generate appropriate rewards and punishments for the learning process (Arami et al., 2008). When the number of objectives increases (for example, beyond four objectives), some objectives are practically neglected in a linear combination with fixed weights, because of their weights.

The main drawback of model free controllers that learn without any prior knowledge of the system’s dynamics, such as reinforcement learning based controllers and BELBIC, is that in the early stages of the learning process they may perform poorly because they produce wrong control signals. This preliminary phase of learning can result in instability in some cases. After this first period of learning, if no instability occurs, the controller can learn the proper control signals and improve performance gradually. Although BELBIC shows fast learning ability, it has the same problem, only over a shorter period of time. If the system is inherently unstable, applying these controllers may cause the system to become unstable, and the process must be stopped in order to prevent damage. Thus, BELBIC cannot be applied to such systems on its own. To solve this problem, another approach is introduced.

The aim of this paper is to implement an appropriate model free controller for systems with unstable equilibrium, such as an inverted pendulum[1], and to introduce a new mechanism for the stress generation of BELBIC that modulates the attention control. Because the pendulum angle is sensitive to the control signal, any wrong change in the control signal makes the system oscillate and the pendulum fall down. Therefore, if we use BELBIC as a totally model free controller that learns from scratch, the learning will be useless and the pendulum will fall down many times. To solve this problem and accelerate the learning phase, a new approach is used. First, BELBIC imitatively learns from a simple classical controller, which is designed based on a model of the system. The classical controller can be a simple one that only stabilizes the system, regardless of good performance and robustness. Then the output of BELBIC is gradually applied to the system until it replaces the initial controller. The important part of switching between controllers is changing the emotional signal of BELBIC to reflect the change in objective. While BELBIC learns to imitate the behavior of the initial controller, the objective is to reduce the error between the controllers’ outputs; once BELBIC replaces the initial controller, the objective is to reduce the tracking and angle errors. To generate the emotional stresses in the second phase of learning, it is necessary to attend to the more important objectives at any time. To generate this coded attention mechanism, two methods are applied. In the first, a nonlinear combination of objectives is used to produce proper stresses for BELBIC. In the second, a set of linguistic rules, implemented in a fuzzy inference system (FIS), is used to generate the useful stresses.

This paper is organized as follows: Section 2 briefly introduces BELBIC; Section 3 covers the crucial aspects of the proposed controller and describes the control strategy and the structure of the controllers; Section 4 discusses the simulation results; and finally Section 5 concludes the paper.

2. BELBIC
The BELBIC structure is a simple computational model of the most important parts of the limbic system of the brain, the Amygdala and the Orbitofrontal cortex. Figure 1 shows the



schematic diagram of the BELBIC structure, and each part of it is described briefly below (Balkenius and Moren, 2000).

As shown in Figure 1, the system consists of four main parts. Sensory input signals first enter the Thalamus. The Thalamus block is a simple model of the real Thalamus in the brain, in which some simple pre-processing of the sensory input signals is done. After pre-processing in the Thalamus, the signal is sent to the Amygdala and the Sensory cortex. The Sensory cortex is responsible for subdividing and discriminating the coarse output from the Thalamus, and then sends it to the Amygdala and the Orbitofrontal cortex. The Amygdala is a small structure in the medial temporal lobe of the brain that is thought to be responsible for the emotional evaluation of stimuli. This evaluation is in turn used as a basis for emotional states and emotional reactions, and is used to signal attention and to lay down long-term memories. The last part, the Orbitofrontal cortex, is supposed

Figure 1. Structure of BELBIC (blocks: Thalamus, Sensory cortex, Orbitofrontal cortex, Amygdala; signals: sensory input S, primary reward Rew, outputs E and E'; connection types: inhibitory, excitatory, plastic, learning)

Source: Balkenius and Moren (2000)


to inhibit inappropriate responses from the Amygdala, based on the context given by the hippocampus (Balkenius and Moren, 2000). In this section, the functionality of these parts and the learning algorithm follow what is stated in Balkenius and Moren (2000).

As the Thalamus must provide a fast response to stimuli, in this model the maximum signal over all sensory inputs, S, is sent directly to the Amygdala as another input (equation (1)). Unlike the other inputs to the Amygdala, the Thalamic input is not projected into the Orbitofrontal cortex, and therefore cannot itself be inhibited:

$$S_{th} = \max(S_i) \tag{1}$$

In the Amygdala, each A node has a plastic connection weight V. The sensory input is multiplied by the weight and forms the output of the node:

$$A_i = S_i V_i \tag{2}$$

In the Orbitofrontal cortex, each O node is similar to the A nodes, and its output is calculated by applying the connection weight W to the input signal:

$$O_i = S_i W_i \tag{3}$$

The model output can be computed as follows:

$$E = \sum_i A_i - \sum_i O_i \tag{4}$$

where the A nodes produce their outputs in proportion to their contribution to predicting the stress, while the O nodes inhibit the output E if necessary.

As shown in Figure 1, except for the Thalamic signal going directly to the Amygdala, the Amygdala and the Orbitofrontal cortex receive the same input signals. The main difference between them is the learning rule.

The connection weights $V_i$ are adjusted in proportion to the difference between the reinforcement signal and the activation of the A nodes. The term α is a constant used to adjust the learning speed:

$$\Delta V_i = \alpha \left( \max\left( 0,\; S_i \left[ \text{stress} - \sum_j A_j \right] \right) \right) \tag{5}$$

As mentioned before, the task of the Amygdala is to learn the associations between the sensory and the emotional input in order to generate an output. Equation (5) differs from similar associative learning systems, however, because this weight adaptation rule is monotonic, i.e. the weights V cannot be decreased. At first this may seem a drawback of the learning rule, but the adaptation rule has a biological motivation. According to what occurs in the Amygdala, once an emotional reaction is learned, it should become permanent. It is the Orbitofrontal cortex that inhibits inappropriate reactions of the Amygdala.

The Orbitofrontal cortex learning rule is very similar to the Amygdala rule:

$$\Delta W_i = \beta \left( S_i \left( E - \text{stress} \right) \right) \tag{6}$$

The reinforcement signal for the O nodes is defined as the difference between the model output E and the stress signal. In other words, the O nodes compare the expected and the received reinforcement signal, and inhibit the output of the model if there is a mismatch.



The main difference between the adaptation rules of the Orbitofrontal cortex and the Amygdala is that the Orbitofrontal connection weights can be both increased and decreased as needed to track the required inhibition of the Amygdala. The parameter β is another learning rate constant.

As discussed, BELBIC learns from its emotional signal and produces its output based on the sensory inputs and connection weights. In Shahmirzadi and Langari (2005), the stability of BELBIC is demonstrated using the cell-to-cell mapping method.
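As an illustration only, equations (2)-(6) can be collected into a few lines of Python. This is a minimal sketch rather than the authors' implementation: the class name `BelSketch`, the zero initial weights and the default learning rates are assumptions made for the example, and the Thalamic input of equation (1) is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class BelSketch:
    """Illustrative sketch of equations (2)-(6); all names are hypothetical."""
    n_inputs: int
    alpha: float = 0.1  # Amygdala learning rate (assumed value)
    beta: float = 0.1   # Orbitofrontal learning rate (assumed value)
    V: List[float] = field(init=False)  # Amygdala weights
    W: List[float] = field(init=False)  # Orbitofrontal weights

    def __post_init__(self) -> None:
        # Zero initial weights are an assumption made for this example.
        self.V = [0.0] * self.n_inputs
        self.W = [0.0] * self.n_inputs

    def output(self, S: Sequence[float]) -> float:
        a_sum = sum(s * v for s, v in zip(S, self.V))  # sum of A_i, eq. (2)
        o_sum = sum(s * w for s, w in zip(S, self.W))  # sum of O_i, eq. (3)
        return a_sum - o_sum                           # E, eq. (4)

    def learn(self, S: Sequence[float], stress: float) -> float:
        """Apply one weight update and return the output E computed before it."""
        a_sum = sum(s * v for s, v in zip(S, self.V))
        E = self.output(S)
        for i, s in enumerate(S):
            # Eq. (5): max(0, .) means the Amygdala weights never decrease.
            self.V[i] += self.alpha * max(0.0, s * (stress - a_sum))
            # Eq. (6): the Orbitofrontal weights may increase or decrease.
            self.W[i] += self.beta * s * (E - stress)
        return E


bel = BelSketch(n_inputs=2)
bel.learn([1.0, 0.5], stress=1.0)
print(bel.V, bel.W, bel.output([1.0, 0.5]))
```

Calling `learn` with a stress smaller than the current Amygdala activation leaves `V` unchanged, which is the monotonic behavior discussed above; only `W` moves in that case.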

3. Model free controller design
In order to accelerate the learning process and avoid making the system unstable, we propose a new approach. Figure 2 presents a flow diagram of this approach for training model free controllers for systems with unstable equilibrium. The approach consists of two parts, imitative learning and performance enhancement. In the imitative learning phase, a simple stabilizing controller is used as the main control system and BELBIC learns to imitate the behavior of this controller. In other words, this controller acts as a mentor, and BELBIC tries to produce the same control force as the mentor according to the observed states. In the second phase, after BELBIC has imitatively learned from the initial controller to stabilize the system, the controller is replaced with BELBIC. In this phase, the control objectives change, and BELBIC tries to improve the performance of the control system instead of imitating the initial controller. Owing to its learning capability, BELBIC learns to enhance the system’s performance quickly.
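The hand-over between the two phases can be pictured with a simple blend. The sketch below is an assumption made for illustration: it replaces the fuzzy inference system that the paper uses for switching with a single hypothetical scalar weight `lam` in [0, 1], and mixes both the control signals and the stress signals linearly.

```python
from typing import Tuple


def clamp01(x: float) -> float:
    """Limit the switching weight to the interval [0, 1]."""
    return min(1.0, max(0.0, x))


def soft_switch(u_mentor: float, u_belbic: float,
                stress_imitation: float, stress_performance: float,
                lam: float) -> Tuple[float, float]:
    """Blend mentor and BELBIC signals (hypothetical linear blend).

    lam = 0 -> pure imitative learning phase (mentor drives the plant);
    lam = 1 -> pure performance enhancement phase (BELBIC drives the plant).
    """
    lam = clamp01(lam)
    u = (1.0 - lam) * u_mentor + lam * u_belbic
    stress = (1.0 - lam) * stress_imitation + lam * stress_performance
    return u, stress


# Halfway through the hand-over both sources contribute equally.
print(soft_switch(2.0, 4.0, 1.0, 3.0, lam=0.5))
```

Because only the applied control signal and the stress are blended, the controller structure itself stays unchanged during the transition.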

3.1 First benchmark: inverted pendulum system
As mentioned before, the first test bed for evaluating the controller performance is an inverted pendulum, which is a well-known SIMO system. Controlling the inverted pendulum is a challenging and interesting task. The control task is to track the reference signal while stabilizing the pendulum. The system used to evaluate the controller performance is a nonlinear model of a laboratory inverted pendulum system provided by Feedback Ltd. This pendulum system consists of a cart, a rope, and a load. The load is regarded as a material particle with mass m. The rope is considered an inflexible rod of length l, whose mass is negligible in comparison with the load mass. The cart, with mass M, moves on a straight rail. A schematic of the pendulum system is shown in Figure 3. The state equations of the system are (Feedback Instrument Ltd, 2002):

$$
\begin{aligned}
(M + m)\,\bigl(x - l\sin\theta\bigr)'' &= F - T_c \\
(M + m)\,\bigl(l\cos\theta\bigr)'' &= V - (M + m)\,g \\
J\ddot{\theta} &= (F - T_c)\,l\cos\theta + V\,l\sin\theta - D_p
\end{aligned}
\tag{7}
$$

where $T_c$ is the friction of the moving cart and $D_p$ is the friction moment of the angular movement of the pendulum load. The reaction force of the rail is denoted by V, and the moment of inertia of the cart and pendulum by J.

3.2 Second benchmark: Chua’s circuit
The second test bed for evaluating the emotional controller performance is a chaotic system (Chua’s circuit). Chua’s circuit was the first electronic dynamical system


Figure 2. Flowchart for model free controller design (imitative learning phase: use a model-based controller as mentor, observe the control signal of the mentor and the sensory input, generate the stress signal for imitative learning, and update the parameters of BELBIC according to the emotional stress until the conditions for the imitative learning part are satisfied; performance enhancement phase: switch the controller from mentor to BELBIC, observe the sensory input, generate the stress signal according to the performance measures, and update the parameters of BELBIC according to the emotional stress)



capable of generating chaotic phenomena in the laboratory. Chua’s circuit is shown in Figure 4, and the state equations of the system are (Jiang et al., 2002):

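$$
\begin{aligned}
\dot{X} &= AX + Bu + g(X) + Dw \\
Y &= CX
\end{aligned}
$$

$$
X = \begin{bmatrix} v_1 \\ v_2 \\ i \end{bmatrix}, \quad
A = \begin{bmatrix} -ac & a & 0 \\ 1 & -1 & 1 \\ 0 & -b & 0 \end{bmatrix}, \quad
B = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}, \quad
g(X) = \begin{bmatrix} -a v_1^3 \\ 0 \\ 0 \end{bmatrix}
\tag{8}
$$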

This system has three equilibrium points, and the equilibrium point $X = \begin{bmatrix} \sqrt{-c} & 0 & -\sqrt{-c} \end{bmatrix}^T$ is unstable.
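As a quick numerical check of the equilibrium quoted above, the unforced and undisturbed part of equation (8), AX + g(X), can be evaluated directly. The sketch below is illustrative: the function names and the parameter values a, b, c are hypothetical choices for the example, not values taken from the paper.

```python
import math
from typing import Tuple

State = Tuple[float, float, float]


def chua_drift(X: State, a: float, b: float, c: float) -> State:
    """Evaluate A X + g(X) from equation (8), with u = 0 and w = 0."""
    v1, v2, i = X
    return (
        -a * c * v1 + a * v2 - a * v1 ** 3,  # first row of A X plus g(X)
        v1 - v2 + i,                         # second row of A X
        -b * v2,                             # third row of A X
    )


def nonzero_equilibrium(c: float) -> State:
    """The point [sqrt(-c), 0, -sqrt(-c)]^T; it requires c < 0."""
    r = math.sqrt(-c)
    return (r, 0.0, -r)


# Hypothetical parameter values, chosen only to exercise the formulas.
a, b, c = 10.0, 16.0, -0.143
residual = chua_drift(nonzero_equilibrium(c), a, b, c)
print(max(abs(r) for r in residual))  # close to zero at an equilibrium
```

The check only confirms that the drift vanishes at this point; assessing stability would additionally require the eigenvalues of the Jacobian for the actual circuit parameters.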

3.3 Proposed controllers for the inverted pendulum system
Owing to the nonlinearity of the system’s state equations and the nonlinear properties of the driving motor and friction, designing a model-based controller is a hard task. We used BELBIC

Figure 3. Schematic of pendulum system (labels: trolley of mass M, load of weight mg, force F, cart position x, pendulum angle Θ)

Figure 4. Chua’s circuit (components: R1, R2, C1, C2, L, Chua’s diode; signals: V1, V2, i)


as a model free controller to control the inverted pendulum. The main challenge in using BELBIC as a model free controller in unstable systems, or in stable systems with an unstable equilibrium point such as our test bed, is the learning phase at the beginning. As BELBIC has no information about the system’s dynamics, the performance of the controlled system may be very poor at the beginning of the learning process, and the pendulum falls down. BELBIC has fast learning ability, and theoretically it should learn the proper control action from its sensory inputs and emotional stress within a short time. In this task, however, the pendulum angle is very sensitive to the control signal, so any wrong change in the control signal makes the system unstable or oscillating. We first used BELBIC as the only controller of the system. It is possible for BELBIC to learn the proper control strategy, but in our simulations we found that this process would take too long, or would probably be impossible, in real applications.

According to the idea of a hierarchical controller structure, to design a controller that satisfies various objectives, it is first assumed that the objectives can be decoupled, and a separate controller is then designed to satisfy each objective. After that, the outputs of these controllers must be fused together. Figure 5 shows the proposed BELBIC structure. As there are two major objectives, position tracking and pendulum angle regulation, two BELBICs are employed. The cart position error and its first derivative are defined as the sensory signals for one of the BELBICs, and the pendulum angle and its first derivative for the other. Most previously reported BELBIC structures have only one neuron, because the sensory input signal was one dimensional. In our structure, as each BELBIC has two sensory inputs, each must have more than one neuron, and in this task two neurons seem to be adequate.

The emotional stress signal, which is described below, couples the two separate controllers. By employing this kind of stress signal, there is also no need for a complex fusion block to combine the outputs of the controllers; a summation operator is adequate (Shahmirzadi et al., 2003). Meanwhile, the computational cost of output fusion is reduced to the cost of fusing some main and auxiliary objectives in

Figure 5. Diagram of the proposed BELBIC controller (blocks: model-based controller (mentor), two BELBICs, stress generators 1 and 2, stress generators 3 and 4 (FIS), FIS switching blocks, summation, pendulum system)



the stress generator block. Also, to change the control objectives and switch from imitative learning to normal learning, there is no need to change the controller structure; changing the emotional stress signal is enough.

Stress generation. As stated before, BELBIC can show various behaviors when different stress signals are applied to it. Therefore, in order to satisfy different control objectives, a proper stress signal must be defined for each objective. Additional objectives can be achieved by defining additional stress signals. The control objectives change at each learning phase:

(1) Imitative learning phase. Producing a similar control force and stabilizing the system.

(2) Performance enhancement. Reducing the position and angle errors of the pendulum and holding it at its equilibrium point.

As a result, it is necessary to define appropriate stress signals to satisfy each objective based on its importance.

Stress generation for imitative learning phase. In imitative learning, the objective is for BELBIC to produce a control signal similar to that of the initial controller. Thus, reducing the difference between these two control signals is the main goal in this part, and there are no other control objectives. Consequently, the emotional stress signal consists of two signals, the error of the control signal and the first derivative of this error:

$$\text{Stress} = w_1 |e_u| + w_2 |\dot{e}_u| + w_3\, e_u \dot{e}_u \tag{9}$$

The term $\dot{e}_u$ is added to the above sum because $e_u$ may sometimes become zero while the two control signals still behave completely differently. Also, by adding the last term, the designer can expect the errors to converge to zero faster. Based on the degree of importance of each of these measures ($|e_u|$, $|\dot{e}_u|$, $e_u \dot{e}_u$), a designer can tune the weights carefully. Some considerations must also be taken into account. For example, if $w_3$ is relatively large, the BELBIC output may oscillate undesirably, or in some situations the Stress may become negative, which weakens the learning phase. These weights can be tuned using learning algorithms such as reinforcement learning (Sutton and Barto, 1998), provided there is another source of feedback that evaluates the closeness of the behavior of the imitator and mentor controllers.
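Equation (9) is simple enough to state directly in code. The following is a minimal sketch; the function name and the numerical weights are hypothetical and serve only to show how a large $w_3$ can drive the stress negative when the error and its derivative have opposite signs.

```python
def imitation_stress(e_u: float, de_u: float,
                     w1: float, w2: float, w3: float) -> float:
    """Equation (9): stress from the mentor-imitator control error.

    e_u  -- difference between the mentor and BELBIC control signals
    de_u -- first derivative of that difference
    """
    return w1 * abs(e_u) + w2 * abs(de_u) + w3 * e_u * de_u


# Hypothetical weights: a small w3 keeps the stress positive here ...
print(imitation_stress(0.5, -2.0, w1=1.0, w2=0.1, w3=0.05))
# ... while a relatively large w3 makes it negative for the same errors.
print(imitation_stress(0.5, -2.0, w1=1.0, w2=0.1, w3=1.0))
```

A negative stress of this kind is the situation that the text above warns weakens the learning phase.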

Stress generation for performance enhancement phase. After BELBIC has imitatively learned the control action from the initial controller, it replaces the initial controller and becomes the main controller of the system. At this point the control objectives change: reducing the position tracking error and the angle error become the new control objectives. Therefore, the stress signal must be modified.

To satisfy more than two objectives, a more complicated combination of the stress signals associated with each objective is necessary. To generate the proper stress signal for all objectives, at any time the more important objective must be attended to more than the others. To generate this coded attention mechanism, we used two approaches. The first is a nonlinear combination of stresses, and the second uses linguistic rules.

Nonlinear combination of signals for stress generation. This stress signal is similar to the one used in our previous work on the control of an overhead crane (Arami et al., 2008), although it differs in some respects. To enhance the behavior of the system, the major objectives are defined first. Moreover, some extra objectives should be attended to


improve the performance. The tracking error of the cart position and the error of the pendulum angle are the main concerns, and they need to be decreased as much as possible. To achieve these objectives (reduction of the position and angle errors), the weighted square of the first and the absolute value of the second are summed to generate the first part of the stress signal.

One of the extra objectives is to avoid collision with the edges of the rail, which would interrupt the operation. To impose this behavior on the cart, closeness to the edges of the rail must be punished via the stress signal. Therefore, to generate the second part of the stress, a dead-zone function and a square function are employed, which generate extreme stress if the cart gets close to the edges. The dead-zone deactivates this part of the stress when the cart is far from the edges.

Another important index that must be considered in every control task is the energy of the control force and its variations. The amplitude of the control force and its derivative are squared, and their weighted sum is employed to generate a new part of the stress signal. This part of the stress is then multiplied by a monotonically decreasing function of the sum of the two previously mentioned parts of the stress. As a result of this multiplication, when the stresses of the previous parts are small, BELBIC tries to decrease the control forces. When these stresses are significant, the limit on the control force is relaxed to allow fast responses. The stress generator diagram is shown in Figure 6.

The first input is the error of the pendulum angle, the second is the error of the cart position, and the third is the position of the cart. After the third input passes through a dead-zone block, the resulting signal represents the closeness to the edges of the rail, and the stress associated with contact with the ends of the rail increases as the cart approaches them. The fourth input is the control signal, which is passed through a square function; its energy is treated as a stress that is multiplied by a monotonically decreasing function of the stress from the first three inputs. This means that when the sum of the first part of the stresses (inputs 1, 2, and 3) is high, the fourth one is suppressed by them. This nonlinear fusion of

Figure 6. Internal stress generator mechanism (block diagram with inputs In 1 to In 4, absolute-value, dead-zone, squaring and gain blocks, a function block 2/(1 + u(1)), and a product block). Note: Nonlinear combination of objectives.

signals to generate emotional stress can be regarded as coded attention to the most important parts of the stress with respect to the operating conditions and environmental effects.
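The nonlinear combination described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, all weights, and the dead-zone half width are hypothetical, and the decreasing factor 2/(1 + x) is taken from the function block label visible in Figure 6.

```python
def dead_zone(x, width):
    """Return 0 inside [-width, width], otherwise the distance beyond the band."""
    if x > width:
        return x - width
    if x < -width:
        return x + width
    return 0.0


def nonlinear_stress(angle_err, pos_err, cart_pos, u, du,
                     w_pos=1.0, w_ang=1.0, w_edge=1.0,
                     w_u=1.0, w_du=1.0, safe_half_width=0.3):
    """Combine the objectives into one stress value (all weights are assumed)."""
    # Part 1: weighted square of the position error plus the absolute angle error.
    s_track = w_pos * pos_err ** 2 + w_ang * abs(angle_err)
    # Part 2: squared dead-zone output, zero while the cart is far from the edges.
    s_edge = w_edge * dead_zone(cart_pos, safe_half_width) ** 2
    # Part 3: energy of the control force and of its rate of change.
    s_energy = w_u * u ** 2 + w_du * du ** 2
    # Monotonically decreasing factor in the sum of parts 1 and 2.
    relax = 2.0 / (1.0 + s_track + s_edge)
    return s_track + s_edge + relax * s_energy
```

With zero errors the energy term is weighted by 2, so the controller is pushed to save control effort; as the tracking or edge stress grows, that weight shrinks and large control forces are penalized less.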

Fuzzy stress generation. As mentioned before, some control objectives must be attended to according to their importance and degree of satisfaction. Linguistic rules can implement this attention mechanism, especially when linguistic knowledge is available about the behavior of the system and the precedence of the objectives. Thus, the major and extra objectives of the control task are defined first.

The objectives are:

(1) Major objectives. Tracking the desired position of the cart and fixing the pendulum vertically.

(2) Extra objectives. Avoiding the edges of the rail, and minimizing the energy of the control force and its variations.

The concerns are similar to those of the previous part, i.e. satisfying the major objectives is more important than satisfying the extra objectives. Among the major objectives, holding the pendulum at its equilibrium is more important than reducing the tracking error. Moreover, once BELBIC has learned to track the desired position while holding the pendulum at its equilibrium point, it must try to reduce the control force and its variations. The variations of the control force, and especially their frequency, must be limited out of consideration for the actuator.

To generate the stress signal and code the attention mechanism, linguistic rules are formulated and then imported into a Sugeno FIS (Takagi and Sugeno, 1983). Generating the stress signal in this way enables BELBIC to attend to the important parts of the stress at any time and in any situation.

As mentioned before, four variables are used to generate the stress signal. The FIS is designed with the following parameters:

(1) Inputs: error of the cart position, error of the pendulum angle, control force, and first derivative of the control force.

(2) Number of rules: 16.

(3) FIS type: Sugeno FIS.

(4) Output: emotional stress for the BELBICs.

Figure 7 shows the resulting fuzzy surfaces for stress generation. It can be seen that holding the pendulum at its equilibrium is slightly more important than reducing the tracking error (Figure 7(a)), and that reducing the control force variations is less important than decreasing the tracking error (Figure 7(c)).
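A zero-order Sugeno system with four inputs and 16 rules can be sketched as below. The paper does not list its rules or membership functions, so everything specific here is an assumption made for illustration: two fuzzy sets (small, large) per input, which gives 2^4 = 16 rules, product firing strengths, the consequent weights, and the input full-scale values.

```python
from itertools import product

# Hypothetical consequent weights: angle error slightly above position error,
# control force and its rate of change well below both.
W_POS, W_ANG, W_U, W_DU = 0.8, 1.0, 0.3, 0.2


def mu_large(x, full_scale):
    """Membership of |x| in 'large': linear ramp that saturates at full_scale."""
    return min(abs(x) / full_scale, 1.0)


def rule_output(pos_l, ang_l, u_l, du_l):
    """Constant consequent of the rule whose antecedent labels are given (1 = large)."""
    out = W_POS * pos_l + W_ANG * ang_l
    # Control effort is attended to only when both major objectives are satisfied.
    if not (pos_l or ang_l):
        out += W_U * u_l + W_DU * du_l
    return out


def fuzzy_stress(pos_err, ang_err, u, du, full_scales=(0.1, 0.05, 1.0, 1.0)):
    """Weighted-average (Sugeno) inference over all 16 small/large combinations."""
    large = [mu_large(x, s) for x, s in zip((pos_err, ang_err, u, du), full_scales)]
    num = den = 0.0
    for combo in product((0, 1), repeat=4):
        fire = 1.0
        for is_large, m in zip(combo, large):
            fire *= m if is_large else 1.0 - m
        num += fire * rule_output(*combo)
        den += fire
    return num / den
```

Because the small and large memberships of each input sum to one, the firing strengths always sum to one and the division is safe. The control-force terms contribute only through rules in which both errors are labelled small, which mirrors the stated priority of major over extra objectives.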

3.4 Proposed controllers for Chua’s circuit

Like the inverted pendulum system, Chua’s circuit has an unstable equilibrium point. The goal is to regulate the state variables around the equilibrium point. The BELBIC structure used in this task is similar to the one used for controlling the pendulum system (Figure 5). As there are two major objectives, regulating the capacitor voltage (v1) and the inductor current (i), two BELBICs are employed. The voltage error and its first derivative are defined as the sensory signals for one BELBIC, and the current and its first derivative for the other. As in the previous part, each BELBIC has two neurons.

IJICC3,2


For this control task, the control objectives at each learning phase are:

(1) Imitative learning phase. Producing a similar control force and stabilizing the system.

(2) Performance enhancement. Regulating the voltages and current and holding them at the equilibrium point.

The stress generation for the imitative learning phase is the same as the method used for controlling the inverted pendulum system, which was defined by equation (9).

For the second phase of learning, the stress signal is generated using fuzzy rules. First, the major and extra objectives of the control task are defined as follows:

(1) Major objectives. Regulating the capacitor voltage and the current and fixing them at the equilibrium point.

(2) Extra objectives. Minimizing the energy of the control force and its variations.

Figure 7. Fuzzy surfaces for stress generation. Each panel plots stress against two inputs: (a) position error and angle error; (b) position error and control force; (c) position error and derivation of control force; (d) angle error and control force; (e) angle error and derivation of control force; (f) control force and derivation of control force.

The concerns are similar to those of the previous part, i.e. satisfying the major objectives is more important than satisfying the extra objectives. Moreover, once BELBIC has learned to hold these values at the equilibrium point, it must try to reduce the control force and its variations. To generate the stress signal and code the attention mechanism, linguistic rules are formulated and then imported into a Sugeno FIS. The fuzzy rules are the same as those used to control the inverted pendulum system. The only difference lies in the inputs of the FIS, which are the error of the capacitor voltage (v1), the error of the current (i), the control force, and the first derivative of the control force.

3.5 Switching between stress signals at each learning phase and controllers

As we observed in the experimental results, hard switching between the controllers and between the stress signals makes the BELBICs unstable. Thus, instead of hard switching, a soft switching scheme must be employed: the BELBIC control system must gradually replace the initial controller while its emotional stress is gradually changed at the same time. To do this, we employed a FIS, which is a common solution for soft switching. Human linguistic rules can easily be imported into a Sugeno FIS (Takagi and Sugeno, 1983). We used 11 fuzzy rules for soft switching. Figure 8 shows the fuzzy surfaces for this task. The inputs of this fuzzy system are the two stresses mentioned above (for imitative learning and for improving performance) and the error in the imitative learning phase (the difference between the initial controller output and the BELBIC output). This fuzzy switch is used for switching both the controllers and the stresses.
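The idea of soft switching can be illustrated with a single blending factor applied to both the control signals and the stresses. The sketch below is a deliberately simplified stand-in for the 11-rule FIS: it depends on time only and ignores the stress and imitation-error inputs. The function names and the smoothstep shape are assumptions; the default 30 to 50 second window is the hand-over interval reported in the results.

```python
def blend_factor(t, t_start=30.0, t_end=50.0):
    """Share of authority given to BELBIC: 0 before t_start, 1 after t_end."""
    x = min(max((t - t_start) / (t_end - t_start), 0.0), 1.0)
    # Smoothstep: continuous and with zero slope at both ends of the hand-over.
    return x * x * (3.0 - 2.0 * x)


def soft_switch(alpha, u_mentor, u_belbic, s_imitative, s_performance):
    """Blend the controller outputs and the stress signals with the same factor."""
    u = (1.0 - alpha) * u_mentor + alpha * u_belbic
    stress = (1.0 - alpha) * s_imitative + alpha * s_performance
    return u, stress
```

For example, `soft_switch(blend_factor(40.0), u_mentor, u_belbic, s_imit, s_perf)` gives each controller and each stress equal weight halfway through the hand-over.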

4. Results

In this part, the results of using the emotional controller and the model-based controller on the two test beds mentioned above are presented.

Figure 8. Fuzzy surfaces for stress and controller switching. The panels plot stress against time and imitative stress, time and performance stress, and time and error.


4.1 First benchmark, inverted pendulum

To validate the proposed controller, its results are compared with those of the originally supplied controller, which consists of two proportional integral derivative (PID) controllers and a nonlinear compensator (Feedback Instrument Ltd, 2002). This original controller also serves as the initial controller (the mentor) for the imitative learning phase. As seen in Figure 9, without imitative learning BELBIC did not learn the proper control signal in more than 150 seconds of training, and the pendulum fell many times. In addition, comparing the proposed model-free controller with complicated model-based controllers designed from an exact mathematical model of the system would not be a fair comparison. Therefore, the proposed controllers are compared only with the originally supplied controller, which plays the role of mentor in the imitative phase of learning, in order to show the effect of the enhancement phase of learning. Also, to assess the effect of the different stress generation mechanisms, which are different coded attention mechanisms to the objectives, the nonlinear and fuzzy stress generations are compared.

Figure 10 shows the responses of the originally supplied PID controller. Figure 11 shows the BELBIC controller with the nonlinear stress function, and Figure 12 shows BELBIC with fuzzy stress. As can be seen, BELBIC imitates the behavior of the original controller completely within about ten seconds of the start. After that, under the fuzzy switch structure, both controllers control the system from 30 to 50 seconds, and afterwards BELBIC controls the system alone. It is clear that after imitative learning, BELBIC reduces the tracking and angle errors better than the original controller, regardless of the method of stress generation.

In order to evaluate the ability of the controller to reject disturbances, a random voltage drawn from a Gaussian distribution with zero mean and a variance of 0.1 is applied to the motor at some instants. The time at which this voltage is applied is a random variable drawn from a uniform distribution. The disturbance is applied eight times between the 55th second and the end of the simulation. The results of the original

Figure 9. Employing BELBIC without imitative learning (pendulum angle over 0 to 150 on the time axis).

Figure 10. Results of originally supplied controller. Panels: cart position (desired and actual), pendulum angle, and control force. Note: Double PID.

Figure 11. Results of BELBIC with nonlinear stress generation unit. Panels: cart position, pendulum angle, and control force.


PID controller and of BELBIC with the two stress generation functions are shown in Figures 13-15.

As can be seen, BELBIC clearly shows superior tracking and disturbance rejection, which results from its learning capability.
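The disturbance protocol described above (eight Gaussian voltage pulses at uniformly distributed times after the 55th second) can be reproduced with a short generator. This is only a sketch: the function name, the 150-second end time, and the seed handling are assumptions, and the duration and shape of each pulse are not specified in the text, so they are left out.

```python
import random


def disturbance_schedule(t_start=55.0, t_end=150.0, count=8, variance=0.1, seed=None):
    """Return a time-ordered list of (time, voltage) disturbance events."""
    rng = random.Random(seed)
    sigma = variance ** 0.5  # random.gauss expects a standard deviation
    times = sorted(rng.uniform(t_start, t_end) for _ in range(count))
    return [(t, rng.gauss(0.0, sigma)) for t in times]
```

Passing a fixed `seed` makes one experiment repeatable, while 20 different seeds give the 20 independent runs over which the statistics are computed.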

To compare these controllers meaningfully, four performance measures are defined as follows and calculated for all of the control systems mentioned: the originally supplied controller and BELBIC with the two methods of stress generation. As the disturbance is applied at randomly selected times, the experiments were carried out 20 times, and the statistical moments (mean and standard deviation) of the following measures are calculated:

(1) Integral absolute error (IAE) (for cart position and pendulum angle).

(2) Integral of absolute values of control force (IACF).

(3) Integral of absolute values of the derivative of the control force (IADCF), which shows the fluctuations of the control force.
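For uniformly sampled signals, the measures listed above can be approximated as in the following sketch. The trapezoidal rule, the forward-difference derivative, and the use of the sample standard deviation are assumptions, since the paper does not state how the integrals and moments were computed; the function names are illustrative.

```python
import statistics


def trapezoid(values, dt):
    """Trapezoidal integral of uniformly sampled values."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def iae(reference, actual, dt):
    """Integral absolute error between a reference and a measured signal."""
    return trapezoid([abs(r - a) for r, a in zip(reference, actual)], dt)


def iacf(u, dt):
    """Integral of the absolute value of the control force."""
    return trapezoid([abs(v) for v in u], dt)


def iadcf(u):
    """Integral of |du/dt|; with forward differences the step size cancels out."""
    return sum(abs(b - a) for a, b in zip(u, u[1:]))


def summarize(values_per_run):
    """Mean and sample standard deviation of one measure over repeated runs."""
    return statistics.mean(values_per_run), statistics.stdev(values_per_run)
```

Calling `iae` once with the cart position signals and once with the pendulum angle signals yields the two IAE columns, which together with IACF and IADCF give the four measures.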

These performance measures are calculated for the controllers in normal operation, without applying the disturbance, and the results are given in Table I.

From Table I, it is seen that BELBIC shows a fast learning ability for tracking. Also, its control force signal, which is penalized by the stress signal, is lower than the control force of the other controllers and oscillates less. Moreover, employing a FIS for stress generation leads to better results than the nonlinear stress generation function in BELBIC.

Figure 12. Results of BELBIC with fuzzy stress generation unit. Panels: cart position, pendulum angle, and control force.

In the presence of disturbance, the above-mentioned measures are calculated for the controllers and the results are presented in Table II. As can be seen, in the presence of disturbance BELBIC (regardless of the method of stress generation) again performs far better than the model-based controller, although its performance

Figure 13. Results of originally supplied controller (double PID) in presence of disturbance. Panels: cart position (desired and actual) and pendulum angle.

Figure 14. Results of BELBIC with nonlinear stress generation function in presence of disturbance. Panels: cart position (desired and actual) and pendulum angle.


decreases slightly in comparison with normal operation. Nevertheless, it shows better disturbance rejection and robustness. It can be seen that the fuzzy stress yielded more robust results and better performance.

4.2 Second benchmark, Chua’s circuit

In this part, BELBIC is used as a controller to stabilize a chaotic system (Chua’s circuit) at its unstable equilibrium point. Chua’s circuit controlled by the state PI feedback controller is given by Jiang et al. (2002):

$$\dot{X} = AX + g(X) + lB\left[KX + k\int_{0}^{t}\left(CX - CX_s\right)dt\right] \qquad (10)$$

where $X_s$ is the equilibrium point.

Figure 15. Results of BELBIC with fuzzy stress generation unit in presence of disturbance. Panels: cart position (desired and actual) and pendulum angle.

Table I. Performance measures for various controllers without disturbance

| Controller structure (no disturbance) | IAE (position) | IAE (angle) | IACF | IADCF |
| --- | --- | --- | --- | --- |
| BELBIC – fuzzy stress | 3.301 | 0.367 | 9.432 | 9.262 |
| BELBIC – nonlinear stress | 3.480 | 0.398 | 11.916 | 12.231 |
| Double PID | 3.925 | 0.457 | 12.219 | 14.353 |

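Table II. Performance measures for various controllers in presence of disturbance (E: mean, STD: standard deviation)

| Controller structure (in presence of disturbance) | IAE (position) E | IAE (position) STD | IAE (angle) E | IAE (angle) STD | IACF E | IACF STD | MADCF E | MADCF STD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BELBIC – fuzzy stress | 5.699 | 0.915 | 0.476 | 0.174 | 73.247 | 3.152 | 151.26 | 6.245 |
| BELBIC – nonlinear stress | 6.577 | 1.139 | 0.676 | 0.183 | 76.834 | 4.678 | 157.53 | 5.569 |
| Double PID | 11.725 | 2.141 | 1.692 | 0.371 | 82.705 | 3.247 | 160.58 | 7.831 |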

The system with the following parameters is used in the simulations:

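$$a = 10,\quad b = 16,\quad c = -0.143,\quad B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix},\quad D = \begin{bmatrix} d \\ 0 \\ 0 \end{bmatrix},\quad C = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix},$$

$$K = \begin{bmatrix} -1.7714 & -0.2296 & -4.640 \end{bmatrix},\quad k = -6.9785 \qquad (11)$$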
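The feedback term of equation (10) can be evaluated in discrete time as in the sketch below, using the gains of equation (11). The class name, the rectangular integration rule, and the treatment of the factor written l in equation (10) as a plain scalar gain with default value 1 are assumptions; the circuit dynamics A and g(X) are not reproduced here.

```python
K = (-1.7714, -0.2296, -4.640)   # state feedback gains from equation (11)
k = -6.9785                      # integral gain from equation (11)
B = (0.0, 0.0, 1.0)              # input vector
C = (0.0, 1.0, 0.0)              # output selector


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


class StatePI:
    """Discrete-time sketch of the state PI feedback term in equation (10)."""

    def __init__(self, x_s, gain=1.0):
        self.y_s = dot(C, x_s)   # output at the equilibrium point, C X_s
        self.integral = 0.0
        self.gain = gain         # assumed meaning of the factor l

    def step(self, x, dt):
        # Accumulate the integral of (C X - C X_s) with a rectangular rule.
        self.integral += (dot(C, x) - self.y_s) * dt
        v = dot(K, x) + k * self.integral
        return tuple(self.gain * b * v for b in B)
```

Because B has a single nonzero entry, only the third state equation receives the control input, and the integral action acts only on the second state selected by C.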

The same strategy is applied here for training BELBIC, with two learning phases: imitative learning from the PI controller mentioned above, and performance enhancement. An external disturbance, a step change at a randomly selected time, is applied to the equation. The magnitude of the disturbance (d) is drawn from a Gaussian distribution with zero mean and a variance of 0.1. Figure 16 shows the voltages and current in the presence of disturbance. As the disturbance is applied at randomly selected times, the experiments were carried out 20 times, and the statistical moments (mean and standard deviation) of the performance measures are calculated and presented in Table III. The results show that the proposed model-free controller can hold the system at its equilibrium point with less energy and reject the disturbance more quickly, with lower control force.

Figure 16. Regulation of Chua’s circuit in presence of disturbance. Panels over time (0 to 100): (a) v1; (b) v2; (c) i; (d) control force. Legend: state PI regulator; BELBIC with fuzzy stress generation.


5. Conclusions

In this paper, a new approach to stress generation for emotional controllers was presented. In addition, a novel approach was introduced for employing model-free controllers with learning ability to control systems with unstable equilibriums. This approach was based on imitative learning, in which the emotional controller first imitated a simple stabilizing controller. Although BELBIC has a rapid and powerful learning capability, it could not simply be used to control unstable systems or systems with unstable equilibriums. The experimental results showed that by employing imitative learning, BELBIC could rapidly learn to produce appropriate control signals for controlling a system with an unstable equilibrium point. After it had learned imitatively from a simple, classically designed controller, its learning ability allowed it to reduce the tracking and angle errors more effectively. Moreover, it was more robust in the face of disturbances. Another advantage of the proposed controller with a fuzzy combination of objectives as the stress generator is that, by considering extra situated objectives, it produces a smoother control force with lower energy.

The stress of BELBIC was generated by fuzzy rules, which made BELBIC more capable of attending to each objective properly. The results showed that this kind of stress generation led to superior performance in terms of tracking and angle errors compared with the alternative method of stress generation. Another interesting result was that BELBIC with either stress signal performed better in the presence of disturbance than the originally supplied controller and, in the case of Chua’s circuit, the PI feedback controller, both of which were model-based controllers well tuned specifically for these cases. This was the effect of the learning capability of BELBIC, which could produce a more appropriate control force under various working conditions, and of the fuzzy combination of different objectives, which results in stress signals that delicately guide the BELBICs to learn.

Owing to fast changes in some of the BELBIC parameters, it is clear that BELBIC does not learn the whole control policy for the system; rather, it learns the control policy temporally, which decreases computational costs. The learned control policy seems to depend on the operating condition, namely the cart velocity and position, the pendulum angle and angular velocity, and the satisfaction level of each objective.

Using a more complex fusion operator to combine the objectives for generating the stress can be the next step of this work. Such fusion mechanisms can model the attention to the objectives based on the states and the satisfaction degree of each objective. Furthermore, based on a defined expected level of satisfaction for each objective, the model of attending to each of the objectives (the combination of objectives) can be learned using neural networks or other parametric structures. Learning how to combine the objectives can be a big step toward automating decision making in unknown environments.

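Table III. Performance measures for various controllers in presence of disturbance (E: mean, STD: standard deviation)

| Controller structure (in presence of disturbance) | IAE (v1) E | IAE (v1) STD | IAE (v2) E | IAE (v2) STD | IAE (i) E | IAE (i) STD | IACF E | IACF STD | MADCF E | MADCF STD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BELBIC – fuzzy stress | 1.315 | 0.224 | 0.461 | 0.103 | 3.174 | 0.637 | 26.472 | 2.789 | 76.311 | 4.784 |
| State PI feedback | 1.482 | 0.249 | 0.372 | 0.0821 | 3.795 | 0.982 | 35.853 | 3.471 | 89.179 | 5.135 |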

Note

1. The digital pendulum control system, crane system, manufactured by Feedback Instruments Limited, England.

References

Alexander, C. (1979), The Timeless Way of Building, Oxford University Press, Oxford.

Arami, A., Javan Roshtkhari, M. and Lucas, C. (2008), “A fast model free intelligent controller based on fused emotions: a practical case implementation”, Proceeding of the 16th Mediterranean Conference on Control and Automation, Ajaccio, France, pp. 596-602.

Balkenius, C. and Moren, J. (1998), “A computational model of emotional conditioning in the brain”, Proceedings of the Workshop on Grounding Emotions in Adaptive Systems, Zurich, Switzerland.

Balkenius, C. and Moren, J. (2000), “A computational model of emotional learning in the Amygdala: from animals to animals”, Proceedings of 6th International Conference on the Simulation of Adaptive Behavior, MIT Press, Cambridge, MA, pp. 383-91.

Balkenius, C. and Moren, J. (2001), “Emotional learning: a computational model of the Amygdala”, Cybernetics and Systems, Vol. 32, pp. 611-36.

Behenke, S. and Bennewitz, M. (2005), “Learning to play soccer using imitative reinforcement”, Proceedings of International Conference on Robotics and Automation (ICRA), Workshop on Social Aspects of Robot Programming through Demonstration, Barcelona, Spain, pp. 18-22.

Burl, J.B. (1999), Linear Optimal Control: H2 and Hinfinity Methods, Addison-Wesley, Boston, MA.

Chellaa, A., Dindoa, H. and Infantinob, I. (2006), “A cognitive framework for imitation learning”, Robotics and Autonomous Systems, Vol. 54, pp. 403-8.

Chellaa, A., Dindoa, H. and Infantinob, I. (2007), “Imitation learning and anchoring through conceptual spaces”, Applied Artificial Intelligence, Vol. 21, pp. 343-59.

Farina, M., Deb, K. and Amato, P. (2004), “Dynamic multiobjective optimization problems: test cases, approximations, and applications”, IEEE Trans. on Evolutionary Computation, Vol. 8, pp. 425-42.

Feedback Instrument Ltd (2002), Digital Pendulum Control Experiments Manual, 33-935/936-1V60, Feedback Instrument Ltd, Crowborough.

Gholipour, A., Lucas, C. and Shahmirzadi, D. (2004), “Purposeful prediction of space weather phenomena by simulated emotional learning”, IASTED International Journal of Modelling and Simulation, Vol. 24, pp. 65-72.

Hsu, C.S. (1987), Cell to Cell Mapping: A Method of Global Analysis for Nonlinear Systems, Springer, New York, NY.

Hsu, C.S. and Guttalu, R.S. (1980), “An unraveling algorithm for global analysis of dynamical systems: an application of cell-to-cell mapping”, ASME Journal of Applied Mechanic, Vol. 47, pp. 940-8.

Jafarzadeh, S., Jahed Motlagh, M.R., Barkhordari, M. and Mirheidari, R. (2008), “A new Lyapunov based algorithm for tuning BELBIC controllers for a group of linear systems”, Proceedings of the 16th Mediterranean Conference on Control and Automation, Ajaccio, France, pp. 593-5.

Jamali, M.R., Arami, A., Dehyadegari, M. and Lucas, C. (2009), “Emotion on FPGA: model driven approach”, Expert Systems with Applications, Vol. 36, pp. 7369-78.


Jamali, M.R., Pedram, A., Milasi, M.R. and Lucas, C. (2006), “Design and implementation of BELBIC pattern”, Proceedings of 14th Iranian Conference on Electrical Engineering, Tehran, Iran, pp. 436-41.

Jamali, M.R., Arami, A., Hosseini, B., Moshiri, B. and Lucas, C. (2008), “Real time emotional control for anti-swing and positioning control of SIMO overhead travelling crane”, Int. Journal of Innovative Computing, Information and Control, Vol. 4, pp. 2333-44.

Jiang, G.P., Chen, G. and Tang, W.K. (2002), “Stabilizing unstable equilibrium points of a class of chaotic systems using a state PI regulator”, IEEE Trans. on Circuits and Systems – I: Fundamental Theory and Application, Vol. 49, pp. 1820-6.

Jin, Y. and Sendhoff, B. (2008), “Pareto-based multiobjective machine learning: an overview and case studies”, IEEE Trans. on Systems Man, and Cybernetics – Part C: Applications and Reviews, Vol. 38, pp. 397-415.

Kuniyoshi, M.I. and Inoue, I. (1994), “Learning by watching: extracting reusable task knowledge from visual observation of human performance”, IEEE Trans. on Robotics and Automation, Vol. 10, pp. 799-822.

Latzke, T., Behenke, S. and Bennewitz, M. (2006), “Imitative reinforcement learning for soccer playing robots”, Proceedings of the 10th RoboCup International Symposium, Bremen, Germany, pp. 47-58.

Lopes, M. and Santos, V.J. (2005), “Visual learning by imitation with motor representations”, IEEE Transactions on Systems, Man and Cybernetics, Part B, Vol. 35, pp. 438-49.

Lucas, C., Shahmirzadi, D. and Sheikholeslami, N. (2004), “Introducing BELBIC: brain emotional learning based intelligent controller”, International Journal of Intelligent Automation and Soft Computing, Vol. 10, pp. 11-21.

Milasi, R.M., Jamali, M.R. and Lucas, C. (2007), “Intelligent washing machine: a bioinspired and multiobjective approach”, International Journal of Control, Automation, and Systems, Vol. 5, pp. 436-43.

Milasi, R.M., Lucas, C. and Araabi, B.N. (2004), “Speed control of an interior permanent magnet synchronous motor using BELBIC (brain emotional learning based intelligent controller)”, Proceedings of 5th International Symposium on Intelligent Automation and Control, World Automation Congress, Sevilla, Spain, Vol. 16, pp. 280-6.

Milasi, R.M., Lucas, C. and Araabi, B.N. (2006a), “Intelligent modeling and control of washing machine using locally linear neuro-fuzzy (LLNF) modeling and modified brain emotional learning based intelligent controller (BELBIC)”, Asian Journal of Control, Vol. 8, pp. 393-400.

Milasi, R.M., Lucas, C., Araabi, B.N., Radwan, T.S. and Rahman, M.A. (2006b), “Implementation of emotional controller for interior permanent magnet synchronous motor drive”, Proceedings of IEEE/IAS 41st Annual Meeting: Industry Applications, Tampa, FL, USA, Vol. 4, pp. 1767-74.

Mobahi, H., Nili Ahmadabadi, M. and Nadjar Araabi, B. (2007), “A biologically inspired method for conceptual imitation using reinforcement learning”, Journal of Applied Artificial Intelligence, Vol. 21, pp. 155-83.

Montesano, L., Lopes, M., Bernardino, A. and Santos-Victor, J. (2008), “Learning object affordances: from sensory-motor coordination to imitation”, IEEE Transactions on Robotics, Vol. 24, pp. 15-26.

Moren, J. (2002), “Emotion and learning: a computational model of the amygdale”, PhD thesis, Lund University, Lund.

Ogata, K. (1997), Modern Control Engineering, 3rd ed., Pearson Education, Harlow.



Shahmirzadi, D. (2005), “Computational modeling of the brain limbic system and its application in control engineering”, MSc thesis, Texas A&M University, College Station, TX.

Shahmirzadi, D. and Langari, R. (2005), “Stability of Amygdala learning system using cell-to-cell mapping algorithm”, Journal of Intelligent System and Control, Vol. 4, pp. 97-119.

Shahmirzadi, D., Lucas, C. and Langari, R. (2003), “Intelligent signal fusion algorithm using BEL – brain emotional learning”, Proceedings of 7th Joint Conference on Information Sciences, JCIS’03, 1st Symposium on Brain-Like Computer Architecture, Cary, NC, USA.

Sharbafi, M.A., Lucas, C., Mohammadinejad, A. and Yaghobi, M. (2006), “Designing a football team of robots from beginning to end”, International Journal of Information Technology, Vol. 3 No. 2, pp. 101-8.

Sheikholeslami, N., Shahmirzadi, D., Semsar, E., Lucas, C. and Yazdanpanah, M.J. (2006), “Applying brain emotional learning algorithm for multivariable control of HVAC systems”, International Journal of Intelligent and Fuzzy Systems, Vol. 17, pp. 35-46.

Sutton, R.S. and Barto, A.G. (1998), Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.

Takagi, T. and Sugeno, M. (1983), “Derivation of fuzzy control rules from human operators’ control actions”, Proceedings of IFAC Symposium on Fuzzy Information, Knowledge Representation and Decision Analysis, Marseille, France, pp. 55-60.

Tong, S. and Li, Y. (2007), “Direct adaptive fuzzy backstepping control for a class of nonlinear systems”, International Journal of Innovative Computing, Information and Control, Vol. 3, pp. 877-96.

Further reading

Merabian, A.R. and Lucas, C. (2007), “Intelligent adaptive control of non-linear systems based on emotional learning approach”, International Journal on Artificial Intelligence Tools, Vol. 16, pp. 69-85.

About the authors

Mehrsan Javan Roshtkhari was born in 1984 in Mashhad, Iran. He received his BSc in Electrical Engineering from the University of Tehran (2006). He is currently an MSc student in Control Engineering in the Electrical and Computer Engineering Department, University of Tehran. He is also a Student Member of the Control and Intelligent Processing Center of Excellence. His research interests include pattern recognition, signal processing, emotional learning methods, and model free control. Mehrsan Javan Roshtkhari is the corresponding author and can be contacted at: [email protected]

Arash Arami was born in 1983 in Tehran. He received his BSc in Electrical Engineering from the University of Tabriz (2006). He is currently an MSc student in Control Engineering in the Electrical and Computer Engineering Department, University of Tehran. He is also a Student Member of the Control and Intelligent Processing Center of Excellence. His research interests include attention control, reinforcement learning and emotional learning, model free control, fuzzy clustering, swarm intelligence, and signal processing.


Caro Lucas received the MS degree from the University of Tehran, Iran, in 1973, and the PhD degree from the University of California, Berkeley, in 1976. He is a Professor at the Department of Electrical and Computer Engineering, University of Tehran, Iran, as well as a Researcher at the School of Cognitive Science, Institute for Studies in Theoretical Physics and Mathematics (IPM), Tehran, Iran. He has served as the Director of Research Faculty of Intelligent Systems (RFIS), IPM (1993-1997), Chairman of the ECE Department at the University of Tehran (1986-1988), Managing Editor of the Memories of the Engineering Faculty, University of Tehran (1979-1991), Associate Editor of Journal of Intelligent and Fuzzy Systems (1992-1999), and Chairman of the IEEE, Iran Section (1990-1992). His research interests include biological computing, computational intelligence, uncertain systems, intelligent control, neural networks, multi-agent systems, data mining, business intelligence, financial modeling, and knowledge management. Professor Lucas has served as the Chairman of several International Conferences. He was the Founder of the RFIS, Center of Excellence on Control and Intelligent Processing, and has assisted in founding several new research organizations and engineering disciplines in Iran.

