optimal control of inverted pendulum using ant colony system algorithm2

Optimal Control of Inverted Pendulum using

Ant Colony System Algorithm

Shekhar Yadav

Dept. of Electrical Engineering,

IT.BHU, Varanasi (UP), India.

Email: [email protected]

J.P.Tiwari

Dept. of Electrical Engineering,



S.K.Nagar

Dept. of Electrical Engineering,



Abstract- In this paper, a pole-placement technique is used to design a feedback controller for an inverted pendulum system. A state feedback method is proposed for the control and stabilization of an inverted pendulum-cart system. The parameters of the feedback gain matrix are optimized using the modified ant colony system (ACS) algorithm. The proposed control strategy has been derived by eliminating the internal signals of the feedback control system through row operations. The effectiveness of the proposed technique is validated through experimental results obtained on a simple digital inverted pendulum (33-936IC).

Keywords- Pole Placement design, Ant Colony System (ACS), LQR, ITAE, Inverted Pendulum.

I. INTRODUCTION

The inverted pendulum on a cart is a standard test-bed for the design of a wide range of classical and contemporary control techniques. Inverted pendulum systems are widely used in robotics, space rocket guidance systems, fast-moving ground vehicles and anti-seismic control of buildings. The inverted pendulum is a multivariable, nonlinear, fast-reacting, unstable and higher-order system [1,2]. A double rod inverted pendulum system, as shown in Fig.1, is mounted on a cart which is driven by a DC motor. The aim of the control strategy is to swing the inverted pendulum from its initial position until it reaches the upright equilibrium point [3,7]. Stabilization of the inverted pendulum system is proposed using the pole-placement technique, and the parameters of the state feedback gain matrix are tuned using the ACS algorithm. The ant colony algorithm was first introduced by M. Dorigo [5] and is inspired by the foraging behavior of real ants.

The basic idea of the ant colony algorithm is that a set of cooperating artificial ants searches the solution space in parallel, simulating real ants searching their environment for food [4,5]. In the pheromone updating rule, the main modification is that the pheromone increments of different routes are assigned different weight values. To keep a suitable balance between the two contradictory aims of exploring the search space and accelerating convergence, a modified ant colony algorithm for continuous optimization problems is proposed, based on adjusting the pheromone evaporation factor. The quadratic performance index is selected as the objective function, and all parameters of the state feedback gain matrix are tuned by the ACS algorithm.

Fig.1 Digital Inverted Pendulum (33-936IC)

A Linear Quadratic Regulator (LQR) is designed for optimal control of the inverted pendulum.

II. POLE-PLACEMENT DESIGN TECHNIQUE

Consider a linear dynamic system in state-space form

$$\dot{x} = Ax + Bu \qquad (1)$$

$$y = Cx \qquad (2)$$

where

$x$ = state vector of the plant ($n$-vector)

$u$ = control signal (scalar)

$y$ = output signal (scalar)

$A$ = $n \times n$ constant matrix

$B$ = $n \times 1$ constant matrix

$C$ = $1 \times n$ constant matrix

and the control signal is given by

$$u = -Kx \qquad (3)$$

The $1 \times n$ matrix $K$ is called the state feedback gain matrix. The closed-loop control system obtained when the state is fed back to the control signal is given by

$$\dot{x} = (A - BK)\,x \qquad (4)$$

Assuming that the pair $(A, B)$ is completely controllable, there exists a feedback matrix $K$ such that the closed-loop eigenvalues can be placed at arbitrary locations. The state feedback gain matrix can also be obtained through minimization of a quadratic cost function.
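The controllability condition and the pole-placement step can be illustrated with a short Python sketch. It uses Ackermann's formula, which is one standard way to compute $K$ for a single-input system and is not necessarily the procedure used in this paper. The function names `controllability_matrix` and `ackermann_gain`, and the double-integrator plant, are assumptions chosen only for illustration.

```python
import numpy as np

def controllability_matrix(A, B):
    """Stack [B, AB, ..., A^(n-1)B] column-wise."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def ackermann_gain(A, B, poles):
    """Single-input pole placement: K = [0 ... 0 1] Ctrb^{-1} phi(A)."""
    n = A.shape[0]
    ctrb = controllability_matrix(A, B)
    if np.linalg.matrix_rank(ctrb) < n:
        raise ValueError("(A, B) is not completely controllable")
    coeffs = np.real(np.poly(poles))      # desired characteristic polynomial
    phi = np.zeros((n, n))
    for c in coeffs:                      # Horner evaluation of phi(A)
        phi = phi @ A + c * np.eye(n)
    last_row = np.zeros((1, n))
    last_row[0, -1] = 1.0
    return last_row @ np.linalg.solve(ctrb, phi)

# Hypothetical toy plant (double integrator), not the pendulum model.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
K = ackermann_gain(A, B, poles=[-1.0, -2.0])
closed_loop_poles = np.linalg.eigvals(A - B @ K)
```

For this toy plant the desired polynomial is $s^2 + 3s + 2$, so the gain is $K = [2\ \ 3]$ and the eigenvalues of $A - BK$ are $-1$ and $-2$. Because the formula works from the characteristic polynomial, repeated poles are handled without special treatment.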

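$$J = \frac{1}{2}\int_{0}^{\infty} \left( x^{T} Q x + u^{T} R u \right) dt \qquad (5)$$

The errors are minimized by the control law

$$u = -Kx \qquad (6)$$

where $x$ is the state vector,

$$x = [x,\ \dot{x},\ \theta,\ \dot{\theta}]^{T}$$

The gain vector $K$ is expressed through the solution $P$ of the Riccati equation

$$A^{T}P + PA + Q - PBR^{-1}B^{T}P = 0 \qquad (7)$$

$$K = R^{-1}B^{T}P \qquad (8)$$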
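As a hedged illustration of Eqs. (7) and (8), the following Python sketch solves the algebraic Riccati equation with SciPy's `solve_continuous_are` and forms the gain. The helper name `lqr_gain` and the double-integrator plant with $Q = I$, $R = 1$ are assumptions made for this example; they are not the pendulum model or the weights used later in the paper.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_gain(A, B, Q, R):
    """Return (K, P): P solves the Riccati equation, K = R^{-1} B^T P."""
    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)
    return K, P

# Hypothetical toy plant (double integrator) with unit weights.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K, P = lqr_gain(A, B, Q, R)

# Substitute P back into the Riccati equation; the result should be ~0.
residual = A.T @ P + P @ A + Q - P @ B @ np.linalg.solve(R, B.T @ P)
```

For this toy problem the Riccati equation can be solved by hand, giving $K = [1\ \ \sqrt{3}]$, which makes the sketch easy to check.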

III. ANT COLONY SYSTEM

The ant colony system was developed in the early 1990s by Dorigo et al. [5]. The ACS technique is a metaheuristic optimization method inspired by the ability of real ants to establish the shortest path from a food source to their nest. Ants lay a chemical substance, the pheromone trail, on the ground as they move along paths. Each individual ant decides its direction of movement based on the strength of the pheromone trails. The better path is the one that carries the higher amount of pheromone. As more and more ants travel to the food source, the shorter path accumulates more pheromone. Thus, most of the ants are attracted to the shorter path, and this path-selection behavior produces a positive feedback effect. As a result, the ants eventually find the shortest path [6].

A. GENERATION OF NODES AND PATHS

Let the state feedback gain matrix parameters $K_1, K_2, K_3, K_4$ be the variables to be optimized, and assume that the value of each of them has four significant digits: two digits before the decimal point and two digits after it. The ACS algorithm needs a discrete solution space because the path selections of an ant at each step are limited. To use the ACS algorithm conveniently, the values of $K_1, K_2, K_3$ and $K_4$ are expressed on the X-Y plane. As shown in Fig.2, we first draw sixteen lines $L_1, L_2, \ldots, L_{16}$ of equal length and equal separation, perpendicular to the X axis. Lines $L_1$~$L_4$, $L_5$~$L_8$, $L_9$~$L_{12}$ and $L_{13}$~$L_{16}$ represent the first to fourth digits of $K_1, K_2, K_3$ and $K_4$ respectively. The X coordinates of these lines are the numbers 1~16 respectively. We then divide each of these lines into ten portions, so that eleven nodes are generated on each line. The eleven nodes on each line represent the numbers 0~10 respectively, which are the possible values of the digit corresponding to that line.

Let an ant depart from the origin $O$ of the X-Y plane. When it moves to any node of line $L_{16}$, it completes a tour. Its path can be represented by

$$\text{Path} = \{O,\ \text{Node}(x_1, y_{1,j}),\ \text{Node}(x_2, y_{2,j}),\ \ldots,\ \text{Node}(x_{16}, y_{16,j})\}$$

The values of $K_1, K_2, K_3$ and $K_4$ represented by the path can be computed by the following formulas:

$$K_1 = y_{1,j}\times 10^{1} + y_{2,j}\times 10^{0} + y_{3,j}\times 10^{-1} + y_{4,j}\times 10^{-2}$$

$$K_2 = y_{5,j}\times 10^{1} + y_{6,j}\times 10^{0} + y_{7,j}\times 10^{-1} + y_{8,j}\times 10^{-2}$$

$$K_3 = y_{9,j}\times 10^{1} + y_{10,j}\times 10^{0} + y_{11,j}\times 10^{-1} + y_{12,j}\times 10^{-2}$$

$$K_4 = y_{13,j}\times 10^{1} + y_{14,j}\times 10^{0} + y_{15,j}\times 10^{-1} + y_{16,j}\times 10^{-2} \qquad (9)$$
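The decoding of a tour into gain values can be sketched in Python as follows. The function name `decode_path` is an assumption made for illustration; the example tour is chosen so that it decodes to the ACS gains reported later in Eq. (21).

```python
def decode_path(digits):
    """Map the 16 node ordinates y_1..y_16 of a tour to (K1, K2, K3, K4)."""
    if len(digits) != 16:
        raise ValueError("a tour must visit exactly 16 lines")
    weights = (10.0, 1.0, 0.1, 0.01)   # 10^1, 10^0, 10^-1, 10^-2
    gains = []
    for g in range(4):
        block = digits[4 * g: 4 * g + 4]
        value = sum(w * y for w, y in zip(weights, block))
        gains.append(round(value, 2))  # remove floating-point noise
    return gains

# One ordinate per line L1..L16; decodes to the gains of Eq. (21).
tour = [4, 8, 7, 8, 2, 8, 6, 3, 5, 1, 6, 5, 7, 1, 7, 6]
gains = decode_path(tour)
```

Each group of four lines contributes one gain, so the search space seen by the ants is the set of all 16-node tours rather than a continuous range of gain values.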

B. TRANSITION RULE

When all ants move to one line, say line $L_i$, let $m_j$ ($j = 0$~$10$) be the number of ants at node $j$ of line $L_i$; the total number of ants is then $m = \sum_{j=0}^{10} m_j$. Let $\tau(x_i, y_{i,j})$ be the concentration of pheromone at Node$(x_i, y_{i,j})$, and assume that initially all nodes have the same amount of pheromone $\tau_0$. While moving, an ant $k$ ($k = 1$~$m$) on line $L_{i-1}$ ($i = 1$~$16$) selects a node $j$ from the eleven nodes of the next line $L_i$ according to the following transition rule:

$$j = \arg\max_{u \in A_i} \{\tau(x_i, y_{i,u}) \cdot \eta(x_i, y_{i,u})\}, \quad \text{if } q \le q_0 \qquad (10)$$

and $j = J$, if $q > q_0$,

where $q$ is a random variable uniformly distributed over $[0,1]$, $q_0$ is a tunable parameter, $A_i$ contains all of the nodes on line $L_i$, and $J$ is a node randomly selected according to the probability

$$p(x_i, y_{i,j}) = \frac{\tau(x_i, y_{i,j})\,\eta(x_i, y_{i,j})}{\sum_{u \in A_i} \tau(x_i, y_{i,u})\,\eta(x_i, y_{i,u})} \qquad (11)$$

In Eqns. (10) and (11), $\eta(x_i, y_{i,j})$ is the visibility of Node$(x_i, y_{i,j})$, computed as

$$\eta(x_i, y_{i,j}) = \frac{11 - |y_{i,j} - y_i^{*}|}{11} \qquad (12)$$

where $j = 0$~$10$ and the values of $y_i^{*}$ ($i = 1$~$16$) are set in the following way. In the first iteration of the ACS algorithm, the values of $y_i^{*}$ are set to the vertical coordinates of the sixteen nodes obtained by mapping the state feedback gain matrix parameters $K_1^{0}, K_2^{0}, K_3^{0}$ and $K_4^{0}$ onto Fig.2, where $K_1^{0}, K_2^{0}, K_3^{0}$ and $K_4^{0}$ are obtained by the pole-placement technique. In each of the following iterations, the values of $y_i^{*}$ are set to the vertical coordinates obtained by mapping $K_1^{*}, K_2^{*}, K_3^{*}$ and $K_4^{*}$ onto Fig.2, where $K_1^{*}, K_2^{*}, K_3^{*}$ and $K_4^{*}$ are the state feedback gain parameters corresponding to the best tour generated since the beginning of the trial.
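The node selection of Eqs. (10) to (12) can be sketched in Python for a single line as follows. This is a minimal illustration, assuming the visibility enters the rule without an additional exponent; the names `visibility` and `select_node`, the seed, and the pheromone value 0.01 are hypothetical choices, not values from the paper.

```python
import numpy as np

N_NODES = 11  # nodes 0..10 on each line, as described in the text

def visibility(y_star):
    """Eq. (12): nodes closer to the reference ordinate y_star score higher."""
    y = np.arange(N_NODES)
    return (N_NODES - np.abs(y - y_star)) / N_NODES

def select_node(tau_line, eta_line, q0, rng):
    """Transition rule of Eqs. (10)-(11) for one line."""
    score = tau_line * eta_line
    if rng.random() <= q0:              # exploitation: best-scoring node
        return int(np.argmax(score))
    prob = score / score.sum()          # exploration: roulette wheel
    return int(rng.choice(N_NODES, p=prob))

rng = np.random.default_rng(0)          # seed chosen arbitrarily
tau = np.full(N_NODES, 0.01)            # hypothetical initial pheromone tau_0
eta = visibility(y_star=5)

greedy_pick = select_node(tau, eta, q0=1.0, rng=rng)  # always exploits
random_pick = select_node(tau, eta, q0=0.0, rng=rng)  # explores
```

With uniform pheromone, the greedy branch returns the reference node itself, which shows how the visibility term biases the first iteration toward the pole-placement solution while $q_0$ controls how often ants deviate from it.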

Fig.2 Diagram of Generating Nodes and Paths

C. GLOBAL UPDATE OF PHEROMONE CONCENTRATION

When all of the ants in the colony have completed their tours once in the modified ant colony system algorithm, i.e. when they have arrived on line $L_{16}$, the pheromone concentration of each node belonging to the best tour since the beginning of the trial is updated by the following formulas:

$$\tau(x_i, y_{i,j}) \leftarrow (1 - \alpha)\cdot\tau(x_i, y_{i,j}) + \alpha\cdot\Delta\tau(x_i, y_{i,j}) \qquad (13)$$

$$\Delta\tau(x_i, y_{i,j}) = Q/\text{ITAE}^{*} \qquad (14)$$

where the Node$(x_i, y_{i,j})$s are the nodes belonging to the best tour since the beginning of the trial; $\alpha$ is the parameter which governs the pheromone decay; ITAE* is the value of the ITAE performance criterion corresponding to the best tour since the beginning of the trial; and $Q$ is a positive constant determined in the following way. For a given control system, we first obtain the state feedback gain matrix through the pole-placement technique, then compute the ITAE performance criterion of the system for these gain parameters, denote the obtained value by ITAE0, and let $Q$ be equal to ITAE0. As the value of ITAE* becomes smaller, the value of $Q$/ITAE* becomes greater, which increases the pheromone concentration of the nodes on the best tour since the beginning of the trial and helps in finding the best solution within the maximum number of iterations allowed.
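A minimal Python sketch of the global update of Eqs. (13) and (14) is given below. The parameter name `alpha` for the decay parameter, the function name `global_update`, and all numerical values are assumptions chosen for illustration only.

```python
import numpy as np

def global_update(tau, best_tour, itae_best, Q, alpha):
    """Reinforce only the nodes on the best tour so far, Eqs. (13)-(14)."""
    delta = Q / itae_best                      # Eq. (14)
    for line, node in enumerate(best_tour):
        tau[line, node] = (1.0 - alpha) * tau[line, node] + alpha * delta
    return tau

tau0 = 0.01                                    # hypothetical initial pheromone
tau = np.full((16, 11), tau0)                  # 16 lines x 11 nodes
best_tour = [4, 8, 7, 8, 2, 8, 6, 3, 5, 1, 6, 5, 7, 1, 7, 6]

# Hypothetical values: Q = ITAE0 = 1.0 and best ITAE so far = 0.5.
tau = global_update(tau, best_tour, itae_best=0.5, Q=1.0, alpha=0.1)
```

In this example each node on the best tour moves from 0.01 to 0.9 x 0.01 + 0.1 x 2.0 = 0.209, while all other nodes keep their previous pheromone level.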

D. LOCAL UPDATES OF PHEROMONE CONCENTRATION

The local update is performed as follows: while performing a tour, when ant $k$ is on line $L_{i-1}$ and selects node $j$ on line $L_i$, the pheromone concentration of Node$(x_i, y_{i,j})$ is updated by the following formula:

$$\tau(x_i, y_{i,j}) \leftarrow (1 - \rho)\cdot\tau(x_i, y_{i,j}) + \rho\,\tau_0 \qquad (15)$$

The value $\tau_0$ is the same as the initial value of the pheromone concentration. When an ant visits a node, the local update rule makes the pheromone level of that node diminish. This makes the visited nodes less and less attractive for other ants, thus indirectly favoring the exploration of nodes not yet visited.

To optimize the performance of an inverted pendulum-cart system, the gains of the state feedback system are adjusted to maximize or minimize a certain performance index. The purpose of the performance index is to express the quality of the system performance as a single number. Various objective functions have been formulated based on error performance criteria. The performance index is calculated over a time interval $T$, normally in the region $0 \le T \le t_s$, where $t_s$ is the settling time of the system. The ITAE performance criterion given below is adopted in this paper:

$$\text{ITAE} = \int_{0}^{\infty} t\,|e(t)|\,dt \qquad (16)$$
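The local update of Eq. (15) and the ITAE criterion of Eq. (16) can be sketched in Python as follows. The sketch assumes the standard ACS form of the local rule with a parameter named `rho`, and it approximates the integral by the trapezoidal rule on a finite, sampled horizon. The error signal $e(t) = e^{-t}$ is a hypothetical test signal, chosen because its ITAE over an infinite horizon is exactly 1.

```python
import numpy as np

def local_update(tau, line, node, rho, tau0):
    """Eq. (15): pull the visited node's pheromone back toward tau0."""
    tau[line, node] = (1.0 - rho) * tau[line, node] + rho * tau0
    return tau

def itae(t, error):
    """Eq. (16) by the trapezoidal rule on sampled data."""
    integrand = t * np.abs(error)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)))

# Hypothetical error signal e(t) = exp(-t) sampled on 0..30 s.
t = np.linspace(0.0, 30.0, 30001)
itae_value = itae(t, np.exp(-t))

# A node with pheromone above tau0 loses some of it when visited.
tau = np.full((16, 11), 0.2)
tau = local_update(tau, line=0, node=4, rho=0.1, tau0=0.01)
```

Because the integrand is weighted by $t$, errors that persist late in the response are penalized more heavily than the initial transient, which is why ITAE favors short settling times.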

IV. MODELING OF INVERTED PENDULUM

The inverted pendulum-cart system is usually presented as a pole-balancing task. The system to be controlled consists of a cart and a rigid pole hinged to the top of the cart. The cart is moved by a belt that is pulled in either direction by the DC motor attached at the end of the rail. Applying a voltage to the motor controls the force with which the cart is pulled; the value of the force depends on the value of the control voltage. The cart can move left or right on a one-dimensional bounded track, whereas the pole can swing in the vertical plane determined by the track. The linearized system equations around $\theta = \pi$ in state space are:

$$\begin{bmatrix} \dot{x} \\ \ddot{x} \\ \dot{\theta} \\ \ddot{\theta} \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & \dfrac{-(I+ml^2)b}{I(M+m)+Mml^2} & \dfrac{m^2gl^2}{I(M+m)+Mml^2} & 0 \\ 0 & 0 & 0 & 1 \\ 0 & \dfrac{-mlb}{I(M+m)+Mml^2} & \dfrac{mgl(M+m)}{I(M+m)+Mml^2} & 0 \end{bmatrix} \begin{bmatrix} x \\ \dot{x} \\ \theta \\ \dot{\theta} \end{bmatrix} + \begin{bmatrix} 0 \\ \dfrac{I+ml^2}{I(M+m)+Mml^2} \\ 0 \\ \dfrac{ml}{I(M+m)+Mml^2} \end{bmatrix} u \qquad (17)$$

$$y = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ \dot{x} \\ \theta \\ \dot{\theta} \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \end{bmatrix} u \qquad (18)$$

where,

M (mass of cart): 2.4 kg

m (mass of pendulum): 0.23 kg

b (friction of cart): 0.05 N/m/sec

I (moment of inertia of pendulum): 0.099 kg·m²

l (length of pendulum): 0.4 m

g (acceleration due to gravity): 9.8 m/sec²

The state of the system is defined by the values of four system variables, $x$, $\dot{x}$, $\theta$ and $\dot{\theta}$: the cart position, cart velocity, pendulum angle and angular velocity of the pendulum pole respectively. A control force is applied to the system to prevent the pole from falling while keeping the cart within the specified limits. The inverted pendulum-cart system used here is the Digital Pendulum (33-936IC).
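As a hedged numerical check of the model, the following Python sketch builds the matrices of Eqs. (17) and (18) from the listed parameter values, then inspects the open-loop poles and the rank of the controllability matrix. The variable names are assumptions made for illustration.

```python
import numpy as np

# Parameter values as listed above.
M, m, b, I, l, g = 2.4, 0.23, 0.05, 0.099, 0.4, 9.8

p = I * (M + m) + M * m * l**2   # common denominator in Eq. (17)

A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, -(I + m * l**2) * b / p, (m**2) * g * l**2 / p, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, -m * l * b / p, m * g * l * (M + m) / p, 0.0],
])
B = np.array([[0.0], [(I + m * l**2) / p], [0.0], [m * l / p]])
C = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]])

open_loop_poles = np.linalg.eigvals(A)

# Controllability matrix [B, AB, A^2 B, A^3 B] and its rank.
ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(4)])
rank = np.linalg.matrix_rank(ctrb)
```

With these values the open-loop system has a pole in the right half-plane, so the pendulum is unstable without feedback, and the controllability matrix has full rank 4, which is the condition required for pole placement in Section II.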

V. RESULTS AND DISCUSSION

The performance of the proposed controller is discussed in this section. The closed-loop poles of the system are located at $s = \mu_i$ ($i = 1,2,3,4$), where $\mu_1 = -2 + j2\sqrt{3}$, $\mu_2 = -2 - j2\sqrt{3}$, $\mu_3 = -10$, $\mu_4 = -10$. The closed-loop poles $\mu_1$ and $\mu_2$ are a pair of dominant closed-loop poles with $\zeta = 0.5$ and $\omega_n = 4$. The LQR method finds the optimal control matrix that results in some balance between system errors and control effort. The performance index matrix ($R$) and the state-cost matrix ($Q$) are initially set as $R = 1$ and $Q = [1\ 0\ 0\ 0;\ 0\ 0\ 0\ 0;\ 0\ 0\ 1\ 0;\ 0\ 0\ 0\ 0]$. The weighting factors are chosen by trial and error. The state feedback gain matrix found through MATLAB commands is:

$$K = [0.9701\ \ 3.0259\ \ 70.5683\ \ 27.1358] \qquad (19)$$

The overshoots of the pendulum and the cart appear acceptable, but their settling times need improvement and the rise time of the cart needs to be decreased. The cart also initially moves in the opposite direction. For now, we concentrate on improving the settling times and the rise times.

The settling time and rise time can be improved by updating the matrices $Q$ and $R$ by trial and error. The updated matrices are:

$$Q = \begin{bmatrix} 4500 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 100 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \qquad R = 1$$

The step response obtained with the updated $Q$ and $R$ is shown in Fig.4, and the state feedback gain matrix is:

$$K = [63.05\ \ 66.78\ \ 372.28\ \ 144.12] \qquad (20)$$
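The effect of the heavier state weighting on the gains can be explored with the following Python sketch, which solves the Riccati equation for both choices of $Q$ using the model parameters of Section IV. It is only an illustration under those assumptions: the helper name `lqr` is hypothetical, and the numbers it produces need not coincide with Eqs. (19) and (20), which were obtained with the authors' MATLAB commands.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Model of Eq. (17) with the parameter values of Section IV.
M, m, b, I, l, g = 2.4, 0.23, 0.05, 0.099, 0.4, 9.8
p = I * (M + m) + M * m * l**2
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, -(I + m * l**2) * b / p, (m**2) * g * l**2 / p, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, -m * l * b / p, m * g * l * (M + m) / p, 0.0],
])
B = np.array([[0.0], [(I + m * l**2) / p], [0.0], [m * l / p]])

def lqr(Q, R):
    """Gain K = R^{-1} B^T P, with P from the algebraic Riccati equation."""
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

R = np.array([[1.0]])
K_initial = lqr(np.diag([1.0, 0.0, 1.0, 0.0]), R)       # initial weights
K_updated = lqr(np.diag([4500.0, 0.0, 100.0, 0.0]), R)  # updated weights

poles_initial = np.linalg.eigvals(A - B @ K_initial)
poles_updated = np.linalg.eigvals(A - B @ K_updated)
```

Both gain vectors stabilize the linearized model. Because the cart position does not enter the dynamics of this model, the Riccati equation fixes the magnitude of the first gain at $\sqrt{Q_{11}/R}$, so raising $Q_{11}$ from 1 to 4500 raises that gain magnitude from 1 to about 67.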

From Fig.4, we see that all design requirements are satisfied except the steady-state error of the cart position ($x$). However, with the LQR method the system responds very slowly because the values of the state feedback gains become larger. Therefore, the state feedback gains are tuned through the modified ant colony system algorithm. The performance index (PI) is optimized for the position of the cart because, in the real system, the length of the track on which the cart moves is limited, so care has to be taken to restrict the motion of the cart within the limits. This is analyzed on the basis of ITAE so as to maintain the pendulum position at 0° for any disturbance given to the cart. The feedback gain matrix obtained using the ant colony system (ACS) is:

$$K = [48.78\ \ 28.63\ \ 51.65\ \ 71.76] \qquad (21)$$

Fig.3 Step response of inverted pendulum system

Fig.4 Step response using LQR method

VI. CONCLUSION

In this paper, an Ant Colony System (ACS) algorithm is used to stabilize an inverted pendulum-cart system. By using the modified ACS algorithm, the calculation time can be reduced and the accuracy can be increased in comparison with the pole-placement design technique. This concept provides a new alternative procedure in time-varying feedback control to improve stability performance. The technique is implemented on an inverted pendulum-cart system, which is a highly nonlinear system.

VII. REFERENCES

[1] C. C. Chung and J. Hauser, "Nonlinear control of a swinging pendulum," Automatica, vol. 31, no. 6, pp. 851-862, Jun. 1995.

[2] Q. Wei, W. P. Dayawansa, and W. S. Levine, "Nonlinear controller for an inverted pendulum having restricted travel," Automatica, vol. 31, no. 6, pp. 841-850, 1995.

[3] Hamid R. P., M. R. Jaheh-Motlagh, Ali-Akbar J., "Optimal feedback control design using genetic algorithm applied to inverted pendulum," IEEE International Symposium on Industrial Electronics, pp. 263-268, June 2007.

[4] C. Grosan and A. Abraham, "Hybrid Evolutionary Algorithms: Methodologies, Architectures, and Reviews," Studies in Computational Intelligence (SCI) 75, pp. 1-17, 2007, Springer-Verlag Berlin Heidelberg.

[5] M. Dorigo, M. Birattari, and T. Stützle, "Ant Colony Optimization: Artificial Ants as a Computational Intelligence Technique," IEEE Computational Intelligence Magazine, November 2006.

[6] M. Dorigo and L. M. Gambardella, "Ant colony system: a cooperative learning approach to the traveling salesman problem," IEEE Trans. on Evolutionary Computation, vol. 1, no. 1, pp. 53-66, 1997.

[7] Katsuhiko Ogata, Modern Control Engineering, Prentice Hall, New Jersey, 3rd edition, 1997.

Fig.5 Step response of ITAE using ACS algorithm
