Automatica 41 (2005) 779–791

www.elsevier.com/locate/automatica

Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach

Murad Abu-Khalaf*, Frank L. Lewis

Automation and Robotics Research Institute, The University of Texas at Arlington, TX 76118, USA

Received 25 November 2002; received in revised form 3 June 2004; accepted 23 November 2004

Abstract

The Hamilton–Jacobi–Bellman (HJB) equation corresponding to constrained control is formulated using a suitable nonquadratic functional. It is shown that the constrained optimal control law has the largest region of asymptotic stability (RAS). The value function of this HJB equation is solved for by solving for a sequence of cost functions satisfying a sequence of Lyapunov equations (LE). A neural network is used to approximate the cost function associated with each LE using the method of least-squares on a well-defined region of attraction of an initial stabilizing controller. As the order of the neural network is increased, the least-squares solution of the HJB equation converges uniformly to the exact solution of the inherently nonlinear HJB equation associated with the saturating control inputs. The result is a nearly optimal constrained state feedback controller that has been tuned a priori off-line.

© 2005 Elsevier Ltd. All rights reserved.

Keywords: Actuator saturation; Constrained input systems; Infinite horizon control; Hamilton–Jacobi–Bellman; Lyapunov equation; Least squares; Neural network

Supported by NSF ECS-0140490 and ARO DAAD19-02-1-0366. This paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor K.L. Teo under the direction of Editor I. Petersen.

* Corresponding author. Tel./fax: +1 817 272 5938. E-mail addresses: [email protected] (M. Abu-Khalaf), [email protected] (F.L. Lewis).

0005-1098/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.automatica.2004.11.034

1. Introduction

The control of systems with saturating actuators has been the focus of many researchers for many years. Several methods for deriving control laws considering the saturation phenomena are found in Saberi, Lin, and Teel (1996), Sussmann, Sontag, and Yang (1994) and Bernstein (1995). However, most of these methods do not consider optimal control laws for general nonlinear systems. In this paper, we study this problem through the framework of the Hamilton–Jacobi–Bellman (HJB) equation appearing in optimal control theory (Lewis & Syrmos, 1995). The solution of the HJB equation is challenging due to its inherently nonlinear nature. For linear systems, the HJB becomes the well-known Riccati equation which results in the linear quadratic regulator (LQR) controller. When the linear system is input constrained, then the closed loop dynamics become nonlinear and the LQR is not optimal anymore.

The HJB equation generally cannot be solved. There has been a great deal of effort to solve this equation. Approximate HJB solutions have been found using many techniques such as those developed by Saridis and Lee (1979), Beard, Saridis, and Wen (1997, 1998), Beard (1995), Murray, Cox, Lendaris, and Saeks (2002), Lee, Teo, Lee, and Wang (2001), Huang, Wang, and Teo (2000), Bertsekas and Tsitsiklis (1996), Munos, Baird, and Moore (1999), Kim, Lewis, and Dawson (2000), Han and Balakrishnan (2000), Liu and Balakrishnan (2000), Lyshevski and Meyer (1995), Lyshevski (1996, 1998, 2001a,b) and Huang and Lin (1995) for the HJI equation, which is closely related to the HJB equation.

In this paper, we focus on the HJB solution using a sequence of Lyapunov equations developed by Saridis and Lee (1979). Saridis and Lee (1979) successively improve a given initial stabilizing control. This method reduces to the


well-known Kleinman (1968) iterative method for solving the algebraic Riccati equation using Lyapunov matrix equations. For nonlinear systems, it is unclear how to solve the LE. Therefore, successful application of the successive approximation method was limited until the novel work of Beard et al. (1997, 1998), where a Galerkin spectral approximation method is used at each iteration to find approximate solutions to the LE; this approach is usually called GHJB. It requires, however, the computation of a large number of integrals, and it is not obvious how to handle explicit constraints on the controls, which is the interest of this paper.

Lyshevski (1998) proposed the use of nonquadratic functionals to confront constraints on inputs. Using nonquadratic functionals, the HJB equation was formulated and its solution results in a smooth saturated controller. It remains, however, difficult to actually solve for the value function of this HJB equation.

In this paper, we use neural networks to obtain an approximate solution to the value function of the HJB equation that uses nonquadratic functionals. This in turn results in a nearly optimal constrained input state feedback controller suitable for saturated actuators.

Neural network applications to optimal control via the HJB equation, Adaptive Critics, were first proposed by Werbos in Miller, Sutton, and Werbos (1990). Parisini and Zoppoli (1998) used neural networks to derive optimal control laws for discrete-time stochastic nonlinear systems. Successful neural network controllers have been reported in Chen and Liu (1994), Lewis, Jagannathan, and Yesildirek (1999), Polycarpou (1996), Rovithakis and Christodoulou (1994), Sadegh (1993) and Sanner and Slotine (1991). It has been shown that neural networks can effectively extend adaptive control techniques to nonlinearly parameterized systems. The status of neural network control as of 2001 appears in Narendra and Lewis (2001).

2. Background in optimal control and constrained input systems

Consider an affine-in-the-control nonlinear dynamical system of the form

$$\dot{x} = f(x) + g(x)u(x), \tag{1}$$

where $x \in \mathbb{R}^n$, $f(x) \in \mathbb{R}^n$, $g(x) \in \mathbb{R}^{n \times m}$, and the input $u \in U$, $U = \{u = (u_1, \ldots, u_m) \in \mathbb{R}^m : \alpha_i \le u_i \le \beta_i,\ i = 1, \ldots, m\}$, where $\alpha_i$, $\beta_i$ are constants. Assume that $f + gu$ is Lipschitz continuous on a set $\Omega \subseteq \mathbb{R}^n$ containing the origin, and that system (1) is stabilizable in the sense that there exists a continuous control on $\Omega$ that asymptotically stabilizes the system. It is desired to find $u$, which minimizes a generalized nonquadratic functional

$$V(x_0) = \int_0^\infty [Q(x) + W(u)]\,dt. \tag{2}$$

$Q(x)$, $W(u)$ are positive definite, i.e. $\forall x \neq 0$, $Q(x) > 0$ and $x = 0 \Rightarrow Q(x) = 0$. For unconstrained control inputs, a common choice for $W(u)$ is $W(u) = u^T R u$, where $R \in \mathbb{R}^{m \times m}$. Note that the control $u$ must not only stabilize the system on $\Omega$, but also guarantee that (2) is finite. Such controls are defined to be admissible (Beard et al., 1997).

Definition 1 (Admissible controls). A control $u$ is defined to be admissible with respect to (2) on $\Omega$, denoted by $u \in \Psi(\Omega)$, if $u$ is continuous on $\Omega$, $u(0) = 0$, $u$ stabilizes (1) on $\Omega$, and $\forall x_0 \in \Omega$, $V(x_0)$ is finite.

Eq. (2) can be expanded as follows:

$$V(x_0) = \int_0^T [Q(x) + W(u)]\,dt + \int_T^\infty [Q(x) + W(u)]\,dt = \int_0^T [Q(x) + W(u)]\,dt + V(x(T)). \tag{3}$$

If $V \in C^1$, then Eq. (3) becomes

$$\lim_{T \to 0} \frac{V(x_0) - V(x(T))}{T} = \lim_{T \to 0} \frac{1}{T} \int_0^T [Q(x) + W(u)]\,dt,$$

$$\dot{V} = V_x^T (f + gu) = -Q(x) - W(u). \tag{4}$$

Eq. (4) is the infinitesimal version of Eq. (2) and is a sort of Lyapunov equation for nonlinear systems that we refer to as LE in this paper.

$$\mathrm{LE}(V, u) \triangleq V_x^T (f + gu) + Q + W(u) = 0, \qquad V(0) = 0. \tag{5}$$

For unconstrained control inputs, $W(u) = u^T R u$, the LE becomes the well-known HJB equation (Lewis & Syrmos, 1995) on substitution of the optimal control

$$u^*(x) = -\tfrac{1}{2} R^{-1} g^T V_x^*, \tag{6}$$

where $V^*(x)$ is the value function of the optimal control problem that solves the HJB equation

$$\mathrm{HJB}(V^*) \triangleq V_x^{*T} f + Q - \tfrac{1}{4} V_x^{*T} g R^{-1} g^T V_x^* = 0, \qquad V^*(0) = 0. \tag{7}$$
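The link between the LE (5), the control update (6), and the HJB equation (7) can be checked on a scalar linear-quadratic problem. The sketch below is only an illustration under stated assumptions: the system $\dot{x} = ax + bu$, the costs $Q = qx^2$ and $W = ru^2$, the trial solution $V = px^2$, the numerical values, and all function names are hypothetical choices made here, not taken from the paper. Repeatedly solving the LE for a fixed gain and then updating the gain via (6) is the Kleinman-type iteration mentioned in the Introduction, and the costs decrease toward the positive root of the scalar version of (7).

```python
import math

def lyapunov_cost(a, b, q, r, k):
    """Solve the scalar LE (5) for V(x) = p*x**2 under the policy u = -k*x."""
    closed_loop = a - b * k
    if closed_loop >= 0.0:
        raise ValueError("gain k does not stabilize the system")
    # LE: 2*p*(a - b*k) + q + r*k**2 = 0
    return (q + r * k * k) / (-2.0 * closed_loop)

def improved_gain(b, r, p):
    """Eq. (6) with V_x = 2*p*x gives u = -(b*p/r)*x, i.e. the gain b*p/r."""
    return b * p / r

def riccati_solution(a, b, q, r):
    """Positive root of the scalar HJB (7): 2*a*p + q - (b*p)**2 / r = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)

def successive_approximation(a, b, q, r, k0, iterations=8):
    """Alternate between the LE solve and the control update, recording each cost p."""
    history = []
    k = k0
    for _ in range(iterations):
        p = lyapunov_cost(a, b, q, r, k)
        history.append(p)
        k = improved_gain(b, r, p)
    return history

# Illustrative numbers: an open-loop unstable plant with a stabilizing initial gain.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
costs = successive_approximation(a, b, q, r, k0=3.0)
p_star = riccati_solution(a, b, q, r)
print(costs, p_star)
```

In this linear-quadratic special case every step is available in closed form, which is exactly what is lost once the dynamics are nonlinear or the input is constrained.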

Lyshevski and Meyer (1995) showed that the value function obtained from (7) serves as a Lyapunov function on $\Omega$.

To confront bounded controls, Lyshevski (1998) introduced a generalized nonquadratic functional

$$W(u) = 2 \int_0^u (\boldsymbol{\phi}^{-1}(v))^T R\,dv, \qquad \boldsymbol{\phi}(v) = [\phi(v_1) \cdots \phi(v_m)]^T, \qquad \boldsymbol{\phi}^{-1}(u) = [\phi^{-1}(u_1) \cdots \phi^{-1}(u_m)], \tag{8}$$

where $v \in \mathbb{R}^m$, $\boldsymbol{\phi} \in \mathbb{R}^m$, and $\phi(\cdot)$ is a bounded one-to-one function that belongs to $C^p$ ($p \ge 1$) and $L_2(\Omega)$. Moreover, it is a monotonic odd function with its first derivative bounded by a constant $M$. An example is the hyperbolic tangent $\phi(\cdot) = \tanh(\cdot)$. $R$ is positive definite and assumed to be symmetric for simplicity of analysis. Note that $W(u)$ is positive definite because $\boldsymbol{\phi}^{-1}(u)$ is monotonic odd and $R$ is positive definite. The LE when (8) is used becomes

$$V_x^T (f + g \cdot u) + Q + 2 \int_0^u \boldsymbol{\phi}^{-T}(v) R\,dv = 0, \qquad V(0) = 0. \tag{9}$$
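For a single input with the hyperbolic tangent as the bounding function, the nonquadratic penalty in (8) can be evaluated in closed form, since $\int_0^u \tanh^{-1}(v)\,dv = u \tanh^{-1}(u) + \tfrac{1}{2}\ln(1 - u^2)$. The following sketch, with hypothetical function names and a scalar weight `r` standing in for $R$, compares that closed form with a simple midpoint-rule quadrature. It is meant only as an illustration of the shape of $W(u)$ under these assumptions.

```python
import math

def penalty_closed_form(u, r=1.0):
    """W(u) = 2*r*int_0^u atanh(v) dv for the bounding function tanh, valid for |u| < 1."""
    return 2.0 * r * (u * math.atanh(u) + 0.5 * math.log(1.0 - u * u))

def penalty_quadrature(u, r=1.0, steps=20000):
    """Midpoint-rule evaluation of the same integral, as an independent check."""
    h = u / steps
    total = 0.0
    for k in range(steps):
        total += math.atanh((k + 0.5) * h)
    return 2.0 * r * total * h

for u in (0.01, 0.5, 0.9, -0.9):
    print(u, penalty_closed_form(u), penalty_quadrature(u))
```

Near the origin the penalty behaves like the quadratic $r u^2$, while it grows steeply as $|u|$ approaches the saturation level, which is what discourages the optimal control from demanding more than the actuator can deliver.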

Note that the LE becomes the HJB equation upon substituting the constrained optimal feedback control

$$u^*(x) = -\boldsymbol{\phi}\left(\tfrac{1}{2} R^{-1} g^T V_x^*\right), \tag{10}$$

where $V^*(x)$ solves the following HJB equation:

$$V_x^{*T} \left(f - g \boldsymbol{\phi}\left(\tfrac{1}{2} R^{-1} g^T V_x^*\right)\right) + Q + 2 \int_0^{-\boldsymbol{\phi}\left(\frac{1}{2} R^{-1} g^T V_x^*\right)} \boldsymbol{\phi}^{-T}(v) R\,dv = 0, \qquad V^*(0) = 0. \tag{11}$$

Eq. (11) is a nonlinear differential equation for which there may be many solutions. Existence and uniqueness of the value function has been shown in Lyshevski (1996). This HJB equation cannot generally be solved. There is currently no method for rigorously solving for the value function of this constrained optimal control problem. Moreover, current solutions are not well defined over a specific region of the state space.

Remark 1. Optimal control problems do not necessarily have smooth or even continuous value functions (Huang et al., 2000; Bardi & Capuzzo-Dolcetta, 1997). Lio (2000) used the theory of viscosity solutions to show that for infinite horizon optimal control problems with unbounded cost functionals, under certain continuity assumptions on the dynamics, the value function is continuous on some set $\Omega$, $V^*(x) \in C(\Omega)$. Bardi and Capuzzo-Dolcetta (1997) showed that if the Hamiltonian is strictly convex and if the continuous viscosity solution is semiconcave, then $V^*(x) \in C^1(\Omega)$, satisfying the HJB equation everywhere. For affine input systems (1), the Hamiltonian is strictly convex if the system dynamics gain matrix $g(x)$ is constant, there are no bilinear terms of the state and the control, and the integrand of (2) does not involve cross terms of the states and the control. In this paper, all derivations are performed under the assumption of smooth solutions to (9) and (11), together with the necessary conditions this requires. See (Van Der Schaft, 1992; Saridis & Lee, 1979) for a similar framework of solutions. If the smoothness assumption is relaxed, then one needs to use the theory of viscosity solutions to show that the continuous cost solutions of (9) converge to the continuous value function of (11).

3. Successive approximation of HJB for saturated controls

The LE is linear in $V(x)$, while the HJB is nonlinear in $V^*(x)$. Solving the LE for $V(x)$ requires solving a linear differential equation, while the HJB solution involves a nonlinear differential equation that may be impossible to solve. This is the reason for introducing the successive approximation theory developed by Saridis and Lee (1979).

Successive approximation using the LE has not yet been rigorously applied to bounded controls. In this section, we show that the successive approximation theory can be extended to constrained input systems when nonquadratic performance functionals are used.

The successive approximation technique is now applied to (9), (10). The following lemma shows how Eq. (10) can be used to improve the control law.

Lemma 1. If $u^{(i)} \in \Psi(\Omega)$, and $V^{(i)} \in C^1(\Omega)$ satisfies the equation $\mathrm{LE}(V^{(i)}, u^{(i)}) = 0$ with the boundary condition $V^{(i)}(0) = 0$, then the new control derived as

$$u^{(i+1)}(x) = -\boldsymbol{\phi}\left(\tfrac{1}{2} R^{-1} g^T V_x^{(i)}\right) \tag{12}$$

is an admissible control for (1) on $\Omega$. Moreover, if the bounding function $\phi(\cdot)$ is monotone odd, and $V^{(i+1)}$ is the unique positive-definite function satisfying the equation $\mathrm{LE}(V^{(i+1)}, u^{(i+1)}) = 0$, with the boundary condition $V^{(i+1)}(0) = 0$, then $V^*(x) \le V^{(i+1)}(x) \le V^{(i)}(x)$ $\forall x \in \Omega$.

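Proof. To show the admissibility part, since $V^{(i)} \in C^1(\Omega)$, the continuity assumption on $g$ implies that $u^{(i+1)}$ is continuous. Since $V^{(i)}$ is positive definite it attains a minimum at the origin, and thus $dV^{(i)}/dx$ must vanish there. This implies that $u^{(i+1)}(0) = 0$. Taking the derivative of $V^{(i)}$ along the system $f + g u^{(i+1)}$ trajectory we have

$$\dot{V}^{(i)}(x, u^{(i+1)}) = V_x^{(i)T} f + V_x^{(i)T} g u^{(i+1)}. \tag{13}$$

Since $\mathrm{LE}(V^{(i)}, u^{(i)}) = 0$, we have

$$V_x^{(i)T} f = -V_x^{(i)T} g u^{(i)} - Q - 2 \int_0^{u^{(i)}} \boldsymbol{\phi}^{-T}(v) R\,dv. \tag{14}$$

Therefore Eq. (13) becomes

$$\dot{V}^{(i)}(x, u^{(i+1)}) = -V_x^{(i)T} g u^{(i)} + V_x^{(i)T} g u^{(i+1)} - Q - 2 \int_0^{u^{(i)}} \boldsymbol{\phi}^{-T}(v) R\,dv. \tag{15}$$

Since $V_x^{(i)T} g(x) = -2 \boldsymbol{\phi}^{-T}(u^{(i+1)}) R$, we get

$$\dot{V}^{(i)}(x, u^{(i+1)}) = -Q + 2 \left[ \boldsymbol{\phi}^{-T}(u^{(i+1)}) R (u^{(i)} - u^{(i+1)}) - \int_0^{u^{(i)}} \boldsymbol{\phi}^{-T}(v) R\,dv \right]. \tag{16}$$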

The second term in (16) is negative because $\phi$, and hence $\phi^{-1}$, is monotone odd. To see this, note that $R$ is symmetric positive definite; this means we can rewrite it as $R = \Lambda \Sigma \Lambda$, where $\Sigma$ is a diagonal matrix with its values being the singular values of $R$ and $\Lambda$ is an orthogonal symmetric matrix. Substituting for $R$ in (16) we get

$$\dot{V}^{(i)}(x, u^{(i+1)}) = -Q + 2 \left[ \boldsymbol{\phi}^{-T}(u^{(i+1)}) \Lambda \Sigma \Lambda (u^{(i)} - u^{(i+1)}) - \int_0^{u^{(i)}} \boldsymbol{\phi}^{-T}(v) \Lambda \Sigma \Lambda\,dv \right]. \tag{17}$$

Applying the coordinate change $u = \Lambda^{-1} z$ to (17):

$$\begin{aligned}
\dot{V}^{(i)}(x, u^{(i+1)}) &= -Q + 2 \boldsymbol{\phi}^{-T}(\Lambda^{-1} z^{(i+1)}) \Lambda \Sigma \Lambda (\Lambda^{-1} z^{(i)} - \Lambda^{-1} z^{(i+1)}) - 2 \int_0^{z^{(i)}} \boldsymbol{\phi}^{-T}(\Lambda^{-1} \zeta) \Lambda \Sigma \Lambda \Lambda^{-1}\,d\zeta \\
&= -Q + 2 \boldsymbol{\phi}^{-T}(\Lambda^{-1} z^{(i+1)}) \Lambda \Sigma (z^{(i)} - z^{(i+1)}) - 2 \int_0^{z^{(i)}} \boldsymbol{\phi}^{-T}(\Lambda^{-1} \zeta) \Lambda \Sigma\,d\zeta \\
&= -Q + 2 \pi^T(z^{(i+1)}) \Sigma (z^{(i)} - z^{(i+1)}) - 2 \int_0^{z^{(i)}} \pi^T(\zeta) \Sigma\,d\zeta,
\end{aligned} \tag{18}$$

where $\pi^T(z^{(i)}) = \boldsymbol{\phi}^{-1}(\Lambda^{-1} z^{(i)})^T \Lambda$.

Since $\Sigma$ is a diagonal matrix, we can now decouple the transformed input vector such that

$$\begin{aligned}
\dot{V}^{(i)}(x, u^{(i+1)}) &= -Q + 2 \pi^T(z^{(i+1)}) \Sigma (z^{(i)} - z^{(i+1)}) - 2 \int_0^{z^{(i)}} \pi^T(\zeta) \Sigma\,d\zeta \\
&= -Q + 2 \sum_{k=1}^m \Sigma_{kk} \left[ \pi(z_k^{(i+1)}) (z_k^{(i)} - z_k^{(i+1)}) - \int_0^{z_k^{(i)}} \pi(\zeta_k)\,d\zeta_k \right].
\end{aligned} \tag{19}$$

Since $R > 0$, the singular values $\Sigma_{kk}$ are all positive. Also, from the geometrical meaning of

$$\pi(z_k^{(i+1)}) (z_k^{(i)} - z_k^{(i+1)}) - \int_0^{z_k^{(i)}} \pi(\zeta_k)\,d\zeta_k,$$

this term is always negative if $\pi(z_k)$ is monotone and odd. Because $\phi^{-1}(\cdot)$ is a monotone and odd one-to-one function, and since $\pi^T(z^{(i)}) = \boldsymbol{\phi}^{-1}(\Lambda^{-1} z^{(i)})^T \Lambda$, it follows that $\pi(z_k)$ is monotone and odd. This implies that $\dot{V}^{(i)}(x, u^{(i+1)}) < 0$ and that $V^{(i)}(x)$ is a Lyapunov function for $u^{(i+1)}$ on $\Omega$. From Definition 1, $u^{(i+1)}$ is admissible on $\Omega$.
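The sign of the decoupled scalar term can also be obtained from a short convexity argument instead of a geometric picture. The following is a sketch under the assumption that the scalar monotone odd function in (19), written $\pi$ here, is continuous and increasing; the antiderivative $\Pi$ is auxiliary notation introduced only for this illustration, with $a$ standing for $z_k^{(i)}$ and $b$ for $z_k^{(i+1)}$.

```latex
% Auxiliary antiderivative (notation introduced here, not in the paper)
\Pi(a) \triangleq \int_0^a \pi(\zeta)\,d\zeta, \qquad \Pi'(a) = \pi(a), \qquad \Pi(a) \ge 0 \ \ \forall a
\quad \text{(because $\pi$ is odd and increasing)}.

% \Pi is convex since its derivative \pi is increasing, so it lies above its tangent at b:
\Pi(a) \ge \Pi(b) + \pi(b)\,(a - b).

% Rearranging gives the sign of the bracketed term in (19):
\pi(b)\,(a - b) - \int_0^a \pi(\zeta)\,d\zeta \;\le\; -\Pi(b) \;\le\; 0 .
```

Equality requires $a = b = 0$, so away from the origin the strict negativity of the derivative follows from the $-Q$ term together with these nonpositive contributions.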

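The second part of the lemma follows by considering that along the trajectories of $f + g u^{(i+1)}$, $\forall x_0$ we have

$$\begin{aligned}
V^{(i+1)}(x_0) - V^{(i)}(x_0) &= \int_0^\infty \left\{ Q(x(\tau, x_0, u^{(i+1)})) + 2 \int_0^{u^{(i+1)}(x(\tau, x_0, u^{(i+1)}))} \boldsymbol{\phi}^{-T}(v) R\,dv \right\} d\tau \\
&\quad - \int_0^\infty \left\{ Q(x(\tau, x_0, u^{(i+1)})) + 2 \int_0^{u^{(i)}(x(\tau, x_0, u^{(i+1)}))} \boldsymbol{\phi}^{-T}(v) R\,dv \right\} d\tau \\
&= -\int_0^\infty \frac{d(V^{(i+1)} - V^{(i)})^T}{dx} [f + g u^{(i+1)}]\,d\tau.
\end{aligned} \tag{20}$$

Because $\mathrm{LE}(V^{(i+1)}, u^{(i+1)}) = 0$ and $\mathrm{LE}(V^{(i)}, u^{(i)}) = 0$,

$$V_x^{(i)T} f = -V_x^{(i)T} g u^{(i)} - Q - 2 \int_0^{u^{(i)}} \boldsymbol{\phi}^{-T}(v) R\,dv, \tag{21}$$

$$V_x^{(i+1)T} f = -V_x^{(i+1)T} g u^{(i+1)} - Q - 2 \int_0^{u^{(i+1)}} \boldsymbol{\phi}^{-T}(v) R\,dv. \tag{22}$$

Substituting (21), (22) in (20) we get

$$V^{(i+1)}(x_0) - V^{(i)}(x_0) = -2 \int_0^\infty \left\{ \boldsymbol{\phi}^{-T}(u^{(i+1)}) R (u^{(i+1)} - u^{(i)}) - \int_{u^{(i)}}^{u^{(i+1)}} \boldsymbol{\phi}^{-T}(v) R\,dv \right\} d\tau. \tag{23}$$

By decoupling Eq. (23) using $R = \Lambda \Sigma \Lambda$, it can be shown that $V^{(i+1)}(x_0) - V^{(i)}(x_0) \le 0$ when $\phi(\cdot)$ is a monotone odd function. Moreover, it can be shown by contradiction that $V^*(x_0) \le V^{(i+1)}(x_0)$. □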

The next theorem is a key result which justifies applying the successive approximation theory to constrained input systems.

Definition 2 (Uniform convergence). A sequence of functions $\{f_n\}$ converges uniformly to $f$ on a set $\Omega$ if $\forall \varepsilon > 0$, $\exists N(\varepsilon) : n > N \Rightarrow |f_n(x) - f(x)| < \varepsilon$ $\forall x \in \Omega$, or equivalently $\sup_{x \in \Omega} |f_n(x) - f(x)| < \varepsilon$.

Theorem 1. If $u^{(0)} \in \Psi(\Omega)$, then $u^{(i)} \in \Psi(\Omega)$, $\forall i \ge 0$. Moreover, $V^{(i)} \to V^*$, $u^{(i)} \to u^*$ uniformly on $\Omega$.

Proof. From Lemma 1, it can be shown by induction that $u^{(i)} \in \Psi(\Omega)$, $\forall i \ge 0$. Furthermore, Lemma 1 shows that $V^{(i)}$ is a monotonically decreasing sequence bounded below by $V^*(x)$. Hence, $V^{(i)}$ converges pointwise to $V^{(\infty)}$. Because $\Omega$ is compact, uniform convergence follows immediately from Dini's theorem (Apostol, 1974). Due to the uniqueness of the value function (Lewis & Syrmos, 1995; Lyshevski, 1996), it follows that $V^{(\infty)} = V^*$. Controllers $u^{(i)}$ are admissible; therefore they are continuous and have unique trajectories due to the locally Lipschitz continuity assumption on the dynamics. Since (2) converges uniformly to $V^*$, this implies that the system trajectories converge $\forall x_0 \in \Omega$. Therefore $u^{(i)} \to u^{(\infty)}$ uniformly on $\Omega$. If $dV^{(i)}/dx \to dV^*/dx$ uniformly, we conclude that $u^{(\infty)} = u^*$. To prove that $dV^{(i)}/dx \to dV^*/dx$ uniformly on $\Omega$, note that $dV^{(i)}/dx$ converges uniformly to some continuous function $J$. Since $V^{(i)} \to V^*$ uniformly and $dV^{(i)}/dx$ exists $\forall i$, it follows that the sequence $dV^{(i)}/dx$ is term-by-term differentiable (Apostol, 1974), and $J = dV^*/dx$. □
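The successive approximation of Lemma 1 and Theorem 1 can be traced by hand on a scalar example. The sketch below assumes the hypothetical system $\dot{x} = -x + u$ with $Q = x^2$, $R = 1$, the bounding function $\tanh$, and the initial admissible control $u^{(0)} = 0$; none of these choices comes from the paper. For a scalar state the LE (9) determines $V_x$ pointwise, so each sweep solves the LE for $V_x^{(i)}$ at a given state and then applies the update (12).

```python
import math

def penalty(u):
    """W(u) from Eq. (8) with the bounding function tanh and R = 1 (|u| < 1)."""
    return 2.0 * (u * math.atanh(u) + 0.5 * math.log(1.0 - u * u))

def value_gradient(x, u):
    """Solve the scalar LE (9) for V_x at a nonzero state x under the control value u."""
    # V_x * (f + g*u) + Q + W(u) = 0 with f = -x, g = 1, Q = x**2
    return -(x * x + penalty(u)) / (-x + u)

def improve(v_x):
    """Control update (12) with R = 1 and g = 1."""
    return -math.tanh(0.5 * v_x)

def iterate(x, sweeps=10):
    """Return V_x^(0), V_x^(1), ... at the state x, starting from u^(0) = 0."""
    u = 0.0
    gradients = []
    for _ in range(sweeps):
        v_x = value_gradient(x, u)
        gradients.append(v_x)
        u = improve(v_x)
    return gradients

def hjb_residual(x, v_x):
    """Left-hand side of the HJB equation (11) for this scalar example."""
    u = improve(v_x)
    return v_x * (-x + u) + x * x + penalty(u)

gradients = iterate(1.0)
print(gradients)
print(hjb_residual(1.0, gradients[-1]))
```

In this example the gradients at $x = 1$ decrease monotonically from $V_x^{(0)} = 1$ (the cost $x^2/2$ of the zero control) and the HJB residual of the last iterate is at rounding level, while every intermediate control stays strictly inside the saturation limits.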

Beard (1995) has shown that improving the control law does not reduce the RAS for the case of unconstrained controls. Similarly, we show that this is the case for systems with constrained controls.

Corollary 1. If $\Omega^*$ denotes the RAS of the constrained optimal control $u^*$, then $\Omega^*$ is the largest RAS of any other admissible control law.

Proof. The proof is by contradiction. Lemma 1 shows that the saturated control $u^*$ is asymptotically stable on $\Omega^{(0)}$, where $\Omega^{(0)}$ is the stability region of the saturated control $u^{(0)}$. Assume that $u_{\mathrm{Largest}}$ is an admissible controller with the largest RAS $\Omega_{\mathrm{Largest}}$. Then, there is $x_0 \in \Omega_{\mathrm{Largest}}$, $x_0 \notin \Omega^*$. From Theorem 1, $x_0 \in \Omega^*$, which completes the proof. □

Note that there may be stabilizing saturated controls that have larger stability regions than $u^*$, but are not admissible with respect to $Q(x)$ and the system $(f, g)$.

4. Neural network least-squares approximate HJB solution

Although Eq. (9) is a linear differential equation, when substituting (10) into (9), it is still difficult to solve for the cost function $V^{(i)}(x)$. In this section, neural networks are used along with the theory of successive approximation to solve for the value function of (11) over $\Omega$, by approximating the solution for the cost function $V^{(i)}(x)$ at each successive iteration $i$ in a least-squares sense. Moreover, to approximate integration, a mesh is introduced on $\Omega$. This yields an efficient, practical, and computationally tractable solution algorithm to find nearly optimal state feedback controllers of constrained input nonlinear systems.

4.1. Neural network approximation of $V(x)$

It is well known that neural networks can be used to approximate smooth functions on prescribed compact sets (Lewis et al., 1999). Since the analysis is restricted to the RAS of some initial stabilizing controller, neural networks are natural for this application. Therefore, to successively solve (9) and (10), $V^{(i)}$ is approximated by

$$V_L^{(i)}(x) = \sum_{j=1}^L w_j^{(i)} \sigma_j(x) = w_L^{(i)T} \sigma_L(x), \tag{24}$$

which is a neural network with the activation functions $\sigma_j(x) \in C^1(\Omega)$, $\sigma_j(0) = 0$. The neural network weights are $w_j^{(i)}$. $L$ is the number of hidden-layer neurons. $\sigma_L(x) \equiv [\sigma_1(x)\ \sigma_2(x)\ \cdots\ \sigma_L(x)]^T$ is the vector activation function, and $w_L^{(i)} \equiv [w_1^{(i)}\ w_2^{(i)}\ \cdots\ w_L^{(i)}]^T$ is the vector weight. The weights are tuned to minimize the residual error in a least-squares sense over a set of points sampled from a compact set $\Omega$ inside the RAS of the initial stabilizing control.

For $\mathrm{LE}(V, u) = 0$, the solution $V$ is replaced with $V_L$, having a residual error

$$\mathrm{LE}\left( V_L(x) = \sum_{j=1}^L w_j \sigma_j(x),\ u \right) = e_L(x). \tag{25}$$

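To find the least-squares solution, the method of weighted residuals is used (Finlayson, 1972). The weights, $w_L$, are determined by projecting the residual error onto $de_L(x)/dw_L$ and setting the result to zero $\forall x \in \Omega$ using the inner product, i.e.

$$\left\langle \frac{de_L(x)}{dw_L},\ e_L(x) \right\rangle = 0, \tag{26}$$

where $\langle f, g \rangle = \int_\Omega f g\,dx$ is a Lebesgue integral. One has

$$\langle \nabla\sigma_L (f + gu),\ \nabla\sigma_L (f + gu) \rangle\, w_L + \left\langle Q + 2 \int_0^u (\boldsymbol{\phi}^{-1}(v))^T R\,dv,\ \nabla\sigma_L (f + gu) \right\rangle = 0. \tag{27}$$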

The following technical results are needed.

Lemma 2. If the set $\{\sigma_j\}_1^L$ is linearly independent and $u \in \Psi(\Omega)$, then the set

$$\{\nabla\sigma_j^T (f + gu)\}_1^L \tag{28}$$

is also linearly independent.

Proof. See (Beard et al., 1997). □

Because of Lemma 2, $\langle \nabla\sigma_L (f + gu),\ \nabla\sigma_L (f + gu) \rangle$ is of full rank, and thus is invertible. Therefore, a unique solution for $w_L$ exists and is computed as

$$w_L = -\langle \nabla\sigma_L (f + gu),\ \nabla\sigma_L (f + gu) \rangle^{-1} \left\langle Q + 2 \int_0^u (\boldsymbol{\phi}^{-1}(v))^T R\,dv,\ \nabla\sigma_L (f + gu) \right\rangle. \tag{29}$$

Having solved for $w_L$, the improved control is given by

$$u = -\boldsymbol{\phi}\left(\tfrac{1}{2} R^{-1} g^T(x) \nabla\sigma_L^T w_L\right). \tag{30}$$

Eqs. (29) and (30) are successively solved at each iteration $(i)$ until convergence.
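One pass of the weight computation (29) followed by the control update (30) can be sketched as follows, with the inner products replaced by sums over a uniform mesh. Everything specific here is an assumption made for illustration: the scalar system $\dot{x} = -x + u$, $Q = x^2$, $R = 1$, the bounding function $\tanh$, the two polynomial activation functions $x^2$ and $x^4$, the mesh on $[-1, 1]$, and all function names. For the zero initial control the LE has the exact solution $x^2/2$, so the weights should come out as $(0.5, 0)$.

```python
import math

def tanh_penalty(u, r=1.0):
    """W(u) of Eq. (8) for a scalar input with the bounding function tanh."""
    return 2.0 * r * (u * math.atanh(u) + 0.5 * math.log(1.0 - u * u))

def least_squares_weights(f, g, u, q, grad_sigma, mesh):
    """Two-term version of Eq. (29), with inner products replaced by mesh sums."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for x in mesh:
        drift = f(x) + g(x) * u(x)
        s1, s2 = grad_sigma(x)
        t1, t2 = s1 * drift, s2 * drift    # entries of grad(sigma_L) (f + g u)
        cost = q(x) + tanh_penalty(u(x))   # Q + W(u)
        a11 += t1 * t1
        a12 += t1 * t2
        a22 += t2 * t2
        b1 += cost * t1
        b2 += cost * t2
    det = a11 * a22 - a12 * a12
    # w = -A^{-1} b by Cramer's rule; the uniform mesh spacing cancels out.
    w1 = -(b1 * a22 - a12 * b2) / det
    w2 = -(a11 * b2 - a12 * b1) / det
    return w1, w2

def improved_control(x, w, g, grad_sigma, r=1.0):
    """Eq. (30) for a scalar input."""
    s1, s2 = grad_sigma(x)
    return -math.tanh(0.5 * g(x) * (s1 * w[0] + s2 * w[1]) / r)

# Hypothetical example data.
f = lambda x: -x
g = lambda x: 1.0
u0 = lambda x: 0.0
q = lambda x: x * x
grad_sigma = lambda x: (2.0 * x, 4.0 * x ** 3)   # gradients of x**2 and x**4
mesh = [-1.0 + 0.01 * k for k in range(201)]

weights = least_squares_weights(f, g, u0, q, grad_sigma, mesh)
print(weights, improved_control(0.4, weights, g, grad_sigma))
```

A full implementation would wrap these two steps in a loop, feeding the improved control back into the weight computation, and would use a general linear solver once more than two activation functions are used.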

4.2. Convergence of the method of least squares

In what follows, we show convergence results of the method of least squares when neural networks are used to solve for the cost function of the LE. Note that (24) is a Fourier series expansion. The following definitions are required.

Definition 3 (Convergence in the mean). A sequence of functions $\{f_n\}$ that is Lebesgue integrable on a set $\Omega$, $L_2(\Omega)$, is said to converge in the mean to $f$ on $\Omega$ if $\forall \varepsilon > 0$, $\exists N(\varepsilon) : n > N \Rightarrow \|f_n(x) - f(x)\|_{L_2(\Omega)} < \varepsilon$, where $\|f\|_{L_2(\Omega)}^2 = \langle f, f \rangle$.

The convergence proofs of the least-squares method are done in the Sobolev function space setting (Adams & Fournier, 2003). This allows defining functions that are $L_2(\Omega)$ with their partial derivatives.

Definition 4. Sobolev space $H^{m,p}(\Omega)$: Let $\Omega$ be an open set in $\mathbb{R}^n$ and let $u \in C^m(\Omega)$. Define a norm on $u$ by

$$\|u\|_{m,p} = \sum_{0 \le |\alpha| \le m} \left( \int_\Omega |D^\alpha u(x)|^p\,dx \right)^{1/p}, \qquad 1 \le p < \infty.$$

This is the Sobolev norm in which the integration is Lebesgue. The completion of $\{u \in C^m(\Omega) : \|u\|_{m,p} < \infty\}$ with respect to $\|\cdot\|_{m,p}$ is the Sobolev space $H^{m,p}(\Omega)$. For $p = 2$, the Sobolev space is a Hilbert space.

The LE can be written using the linear operator $A$ defined on the Hilbert space $H^{1,2}(\Omega)$:

$$\overbrace{V_x^T (f + gu)}^{AV} = \overbrace{-Q - W(u)}^{P}.$$

In Mikhlin (1964), it is shown that if the set $\{\sigma_j\}_1^L$ is complete, and the operator $A$ and its inverse are bounded, then $\|AV_L - AV\|_{L_2(\Omega)} \to 0$ and $\|V_L - V\|_{L_2(\Omega)} \to 0$. For the LE, however, it can be shown that these sufficiency conditions are violated.

Neural networks based on power series have an important property that they are differentiable. This means that they can approximate uniformly a continuous function with all its partial derivatives up to order $m$ using the same polynomial, by differentiating the series termwise. This type of series is $m$-uniformly dense as shown in Lemma 3. Other $m$-uniformly dense neural networks, not necessarily based on power series, are studied in Hornik et al. (1990).

Lemma 3. High-order Weierstrass approximation theorem. Let $f(x) \in C^m(\Omega)$; then there exists a polynomial, $P(x)$, such that it converges uniformly to $f(x) \in C^m(\Omega)$, and such that all its partial derivatives up to order $m$ converge uniformly (Finlayson, 1972; Hornik et al., 1990).

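Lemma 4. Given a set of $N$ linearly independent functions $\{f_n\}$. Then for the Fourier series $\alpha_N^T f_N$, it follows that $\|\alpha_N^T f_N\|_{L_2(\Omega)}^2 \to 0 \Leftrightarrow \|\alpha_N\|_{l_2}^2 \to 0$.

Proof. To show the sufficiency part, note that the Gram matrix, $G_N = \langle f_N, f_N \rangle$, is positive definite. Hence, $\alpha_N^T G_N \alpha_N \ge \lambda_{\min}(G_N) \|\alpha_N\|_{l_2}^2$, with $\lambda_{\min}(G_N) > 0$ $\forall N$. If $\alpha_N^T G_N \alpha_N \to 0$, then $\|\alpha_N\|_{l_2}^2 \le \alpha_N^T G_N \alpha_N / \lambda_{\min}(G_N) \to 0$ because $\lambda_{\min}(G_N) > 0$ $\forall N$.

To show the necessity part, note that

$$\|\alpha_N\|_{L_2(\Omega)}^2 - 2 \|\alpha_N^T f_N\|_{L_2(\Omega)}^2 + \|f_N\|_{L_2(\Omega)}^2 = \|\alpha_N - f_N\|_{L_2(\Omega)}^2,$$

$$2 \|\alpha_N^T f_N\|_{L_2(\Omega)}^2 = \|\alpha_N\|_{L_2(\Omega)}^2 + \|f_N\|_{L_2(\Omega)}^2 - \|\alpha_N - f_N\|_{L_2(\Omega)}^2.$$

Using the Parallelogram Law

$$\|\alpha_N - f_N\|_{L_2(\Omega)}^2 + \|\alpha_N + f_N\|_{L_2(\Omega)}^2 = 2 \|\alpha_N\|_{L_2(\Omega)}^2 + 2 \|f_N\|_{L_2(\Omega)}^2,$$

as $N \to \infty$

$$\|\alpha_N - f_N\|_{L_2(\Omega)}^2 + \|\alpha_N + f_N\|_{L_2(\Omega)}^2 = \overbrace{2 \|\alpha_N\|_{L_2(\Omega)}^2}^{\to 0} + 2 \|f_N\|_{L_2(\Omega)}^2,$$

$$\Rightarrow \|\alpha_N - f_N\|_{L_2(\Omega)}^2 \to \|f_N\|_{L_2(\Omega)}^2, \qquad \|\alpha_N + f_N\|_{L_2(\Omega)}^2 \to \|f_N\|_{L_2(\Omega)}^2,$$

$$2 \|\alpha_N^T f_N\|_{L_2(\Omega)}^2 = \overbrace{\|\alpha_N\|_{L_2(\Omega)}^2}^{\to 0} + \|f_N\|_{L_2(\Omega)}^2 - \overbrace{\|\alpha_N - f_N\|_{L_2(\Omega)}^2}^{\to \|f_N\|_{L_2(\Omega)}^2} \to 0.$$

Therefore, $\|\alpha_N\|_{l_2}^2 \to 0 \Rightarrow \|\alpha_N^T f_N\|_{L_2(\Omega)}^2 \to 0$. □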
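The Gram-matrix bound used in the first part of the proof of Lemma 4 can be checked numerically on a small case. The sketch below assumes two hypothetical basis functions, $f_1(x) = x$ and $f_2(x) = x^2$ on the interval $[0, 1]$, whose Gram matrix entries are known exactly. It verifies that the squared $L_2$ norm of a linear combination is bounded below by the smallest eigenvalue of the Gram matrix times the squared Euclidean norm of the coefficients.

```python
import math

# Gram matrix of f1(x) = x and f2(x) = x**2 on [0, 1]: G[j][k] = int_0^1 f_j * f_k dx
G = [[1.0 / 3.0, 1.0 / 4.0],
     [1.0 / 4.0, 1.0 / 5.0]]

def quadratic_form(alpha):
    """alpha^T G alpha, which equals the squared L2(0, 1) norm of alpha^T f."""
    a1, a2 = alpha
    return G[0][0] * a1 * a1 + 2.0 * G[0][1] * a1 * a2 + G[1][1] * a2 * a2

def min_eigenvalue_2x2(m):
    """Smallest eigenvalue of a symmetric 2x2 matrix from its trace and determinant."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return 0.5 * (trace - math.sqrt(trace * trace - 4.0 * det))

lam_min = min_eigenvalue_2x2(G)
samples = [(1.0, 0.0), (0.0, 1.0), (1.0, -1.0), (3.0, -4.0), (-0.2, 0.5)]
bounds_hold = all(
    quadratic_form(a) >= lam_min * (a[0] ** 2 + a[1] ** 2) - 1e-12 for a in samples
)
print(lam_min, bounds_hold)
```

The smallest eigenvalue is positive but small here, which reflects how close to linearly dependent low-order monomials are on a bounded interval; the bound still forces the coefficients to vanish whenever the combination vanishes in the mean.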

The following assumptions are required.

Assumption 1. The LE solution is positive definite. This is guaranteed for stabilizable dynamics and when the performance functional satisfies zero-state observability as defined in (Van Der Schaft, 1992).

Assumption 2. The system dynamics and the performance integrands $Q(x) + W(u(x))$ are such that the solution of the LE is continuous and differentiable. Therefore, the solution belongs to the Sobolev space, $V \in H^{1,2}(\Omega)$.

Assumption 3. We can choose complete coordinate elements $\{\sigma_j\}_1^\infty \in H^{1,2}(\Omega)$ such that the solution $V \in H^{1,2}(\Omega)$ and $\{\partial V/\partial x_1, \ldots, \partial V/\partial x_n\}$ can be uniformly approximated by the infinite series built from $\{\sigma_j\}_1^\infty$.

Assumption 4. The sequence $\{\psi_j = A\sigma_j\}$ is linearly independent and complete.

The first three assumptions are standard in the optimal control and neural network control literature. Lemma 2 assures the linear independence required in the fourth assumption. Completeness follows from Lemma 3 and (Hornik et al., 1990):

$$\forall V, \varepsilon\ \ \exists L, w_L \ \text{such that}\ \ |V_L - V| < \varepsilon, \quad \forall i\ \ |dV_L/dx_i - dV/dx_i| < \varepsilon.$$

This implies that as $L \to \infty$

$$\sup_{x \in \Omega} |AV_L - AV| \to 0 \ \Rightarrow\ \|AV_L - AV\|_{L_2(\Omega)} \to 0,$$

and therefore completeness of the set $\{\psi_j\}$ is established.

The next theorem uses these assumptions to conclude convergence results of the least-squares method, which is placed in the Sobolev space $H^{1,2}(\Omega)$.

Theorem 2. If Assumptions 1–4 hold, then approximate solutions exist for the LE using the method of least squares and are unique for each $L$. In addition

(R1) $\|\mathrm{LE}(V_L(x)) - \mathrm{LE}(V(x))\|_{L_2(\Omega)} \to 0$,
(R2) $\|V_{xL} - V_x\|_{L_2(\Omega)} \to 0$,
(R3) $\|u_L(x) - u(x)\|_{L_2(\Omega)} \to 0$.

Proof. Existence of a least-squares solution for the LE can be easily shown. The least-squares solution $V_L$ is nothing but the solution of the minimization problem

$$\|AV_L - P\|^2 = \min_{V \in S_L} \|AV - P\|^2 = \min_w \|w_L^T \psi_L - P\|^2,$$

where $S_L$ is the span of $\{\sigma_1, \ldots, \sigma_L\}$.

Uniqueness follows from the linear independence of $\{\psi_1, \ldots, \psi_L\}$. (R1) follows from the completeness of $\{\psi_j\}$.

To show the second result (R2), write the LE in terms of its series expansion on $\Omega$ with coefficients $c_j$:

$$\mathrm{LE}\left( V_L = \sum_{i=1}^L w_i \sigma_i \right) - \underbrace{\mathrm{LE}\left( V = \sum_{i=1}^\infty c_i \sigma_i \right)}_{=0} = \varepsilon_L(x),$$

$$(w_L - c_L)^T \nabla\sigma_L (f + gu) = \varepsilon_L(x) + \overbrace{\sum_{i=L+1}^\infty c_i \frac{d\sigma_i}{dx} (f + gu)}^{e_L(x)}.$$

Note that $e_L(x)$ converges uniformly to zero due to Lemma 3, and hence converges in the mean. On the other hand, $\varepsilon_L(x)$ converges in the mean due to (R1). Therefore,

$$\|(w_L - c_L)^T \nabla\sigma_L (f + gu)\|_{L_2(\Omega)}^2 = \|\varepsilon_L(x) + e_L(x)\|_{L_2(\Omega)}^2 \le 2 \|\varepsilon_L(x)\|_{L_2(\Omega)}^2 + 2 \|e_L(x)\|_{L_2(\Omega)}^2 \to 0.$$

Because $\nabla\sigma_L (f + gu)$ is linearly independent, using Lemma 4, we conclude that $\|w_L - c_L\|_{l_2}^2 \to 0$. Furthermore, the set $\{d\sigma_i/dx\}$ is linearly independent, and hence from Lemma 4 it follows that $\|(w_L - c_L)^T \nabla\sigma_L\|_{L_2(\Omega)}^2 \to 0$. Because the infinite series with $c_j$ converges uniformly, it follows that as $L \to \infty$, $\|dV_L/dx - dV/dx\|_{L_2(\Omega)} \to 0$.

Finally, (R3) follows from (R2) and from the fact that $g(x)$ is continuous and therefore bounded on $\Omega$. Hence

$$\|-\tfrac{1}{2} R^{-1} g^T (V_{xL} - V_x)\|_{L_2(\Omega)}^2 \le \|-\tfrac{1}{2} R^{-1} g^T\|_{L_2(\Omega)}^2 \|(V_{xL} - V_x)\|_{L_2(\Omega)}^2 \to 0.$$

Denote $\alpha_{j,L}(x) = -\tfrac{1}{2} g_j^T V_{xL}$ and $\alpha_j(x) = -\tfrac{1}{2} g_j^T V_x$. Then

$$u_L - u = \boldsymbol{\phi}(-\tfrac{1}{2} g^T V_{xL}) - \boldsymbol{\phi}(-\tfrac{1}{2} g^T V_x) = [\phi(\alpha_{1,L}(x)) - \phi(\alpha_1(x))\ \cdots\ \phi(\alpha_{m,L}(x)) - \phi(\alpha_m(x))].$$

Because $\phi(\cdot)$ is smooth, and under the assumption that its first derivative is bounded by a constant $M$, we have $|\phi(\alpha_{j,L}) - \phi(\alpha_j)| \le M |\alpha_{j,L}(x) - \alpha_j(x)|$. Therefore

$$\|\alpha_{j,L}(x) - \alpha_j(x)\|_{L_2(\Omega)} \to 0 \ \Rightarrow\ \|\phi(\alpha_{j,L}) - \phi(\alpha_j)\|_{L_2(\Omega)} \to 0,$$

and (R3) follows. □

Corollary 2. If the results of Theorem 2 hold, then

$$\sup_{x \in \Omega} |V_{xL} - V_x| \to 0, \qquad \sup_{x \in \Omega} |V_L - V| \to 0, \qquad \sup_{x \in \Omega} |u_L - u| \to 0.$$

Proof. This follows by noticing that $\|w_L - c_L\|_{l_2}^2 \to 0$, that the series with $c_j$ is uniformly convergent, and from (Hornik et al., 1990). □

Next, the admissibility of $u_L(x)$ is shown.

Corollary 3. Admissibility of $u_L(x)$:

$$\exists L_0 : L \ge L_0, \; u_L \in \Psi(\Omega).$$

Proof. Consider the following LE

$$\dot V^{(i)}(x, u_L^{(i+1)}) = -Q \underbrace{- 2\int_{u^{(i+1)}}^{u^{(i)}} \phi^{-T}(v) R \, dv + 2\phi^{-T}(u^{(i+1)}) R (u^{(i)} - u^{(i+1)})}_{\le 0} - 2\phi^{-T}(u^{(i+1)}) R (u_L^{(i+1)} - u^{(i+1)}) - 2\int_0^{u^{(i+1)}} \phi^{-T}(v) R \, dv.$$

Since $u_L^{(i+1)}$ is guaranteed to be within a tube around $u^{(i+1)}$, because $u_L^{(i+1)} \to u^{(i+1)}$ uniformly, one can easily see that (31) with $\alpha > 0$ is satisfied $\forall x \in \Omega \setminus \Omega_1(\varepsilon(L))$, where $\Omega_1(\varepsilon(L)) \subseteq \Omega$ contains the origin:

$$\phi^{-T}(u^{(i+1)}) R u_L^{(i+1)} \ge \tfrac{1}{2} \phi^{-T}(u^{(i+1)}) R u^{(i+1)} + \alpha \int_0^{u^{(i+1)}} \phi^{-T} R \, dv. \tag{31}$$

Hence, $\dot V^{(i)}(x, u_L^{(i+1)}) < 0$ $\forall x \in \Omega \setminus \Omega_1(\varepsilon(L))$. Given that $u_L^{(i+1)}(0) = 0$, from the continuity of $u_L^{(i+1)}$, there exists $\Omega_2(\varepsilon(L)) \subseteq \Omega_1(\varepsilon(L))$ containing the origin for which $\dot V^{(i)}(x, u_L^{(i+1)}) < 0$. As $L$ increases, $\Omega_1(\varepsilon(L))$ gets smaller while $\Omega_2(\varepsilon(L))$ gets larger, and (31) is satisfied $\forall x \in \Omega$. Therefore, $\exists L_0 : L \ge L_0$, $\dot V^{(i)}(x, u_L^{(i+1)}) < 0$ $\forall x \in \Omega$, and hence $u_L \in \Psi(\Omega)$. □

Corollary 4. It can be shown that $\sup_{x \in \Omega} |u_L(x) - u(x)| \to 0$ implies that $\sup_{x \in \Omega} |J(x) - V(x)| \to 0$, where $\mathrm{LE}(J, u_L) = 0$ and $\mathrm{LE}(V, u) = 0$.

4.3. Convergence of the method of least-squares to the solution of the HJB equation

In this section, we would like to have a theorem analogous to Theorem 1 that guarantees that the successive least-squares solutions converge to the value function of the HJB equation (11).

Theorem 3. Under the assumptions of Theorem 2, the following is satisfied $\forall i \ge 0$:

(i) $\sup_{x \in \Omega} |V_L^{(i)} - V^{(i)}| \to 0$,

(ii) $\sup_{x \in \Omega} |u_L^{(i+1)} - u^{(i+1)}| \to 0$,

(iii) $\exists L_0 : L \ge L_0, \; u_L^{(i+1)} \in \Psi(\Omega)$.

Proof. The proof is by induction.

Basis step:

Using Corollaries 2 and 3, it follows that for any $u^{(0)} \in \Psi(\Omega)$, one has

(I) $\sup_{x \in \Omega} |V_L^{(0)} - V^{(0)}| \to 0$,

(II) $\sup_{x \in \Omega} |u_L^{(1)} - u^{(1)}| \to 0$,

(III) $\exists L_0 : L \ge L_0, \; u_L^{(1)} \in \Psi(\Omega)$.

Inductive step:

Assume that

(a) $\sup_{x \in \Omega} |V_L^{(i-1)} - V^{(i-1)}| \to 0$,

(b) $\sup_{x \in \Omega} |u_L^{(i)} - u^{(i)}| \to 0$,

(c) $\exists L_0 : L \ge L_0, \; u_L^{(i)} \in \Psi(\Omega)$.

If $J^{(i)}$ is such that $\mathrm{LE}(J^{(i)}, u_L^{(i)}) = 0$, then from Corollary 2, $J^{(i)}$ can be uniformly approximated by $V_L^{(i)}$. Moreover, from assumption (b) and Corollary 4, it follows that as $u_L^{(i)} \to u^{(i)}$ uniformly, $J^{(i)} \to V^{(i)}$ uniformly. Therefore $V_L^{(i)} \to V^{(i)}$ uniformly.

Because $V_L^{(i)} \to V^{(i)}$ uniformly, $u_L^{(i+1)} \to u^{(i+1)}$ uniformly by Corollary 2. From Corollary 3, $\exists L_0 : L \ge L_0 \Rightarrow u_L^{(i+1)} \in \Psi(\Omega)$. Hence the proof by induction is complete. □

The next theorem is an important result that justifies the algorithm proposed in Section 4.4 of this paper.

Theorem 4. $\forall \varepsilon > 0$, $\exists i_0, L_0 : i \ge i_0, L \ge L_0$ one has

(A) $\sup_{x \in \Omega} |V_L^{(i)} - V^*| < \varepsilon$,

(B) $\sup_{x \in \Omega} |u_L^{(i)} - u^*| < \varepsilon$,

(C) $u_L^{(i)} \in \Psi(\Omega)$.

Proof. This follows directly from Theorems 1 and 3. □

4.4. Algorithm for nearly optimal neurocontrol design with saturated controls: introducing a mesh in $\mathbb{R}^n$

Evaluating the integrals in (29) is computationally expensive. The integrals can be well approximated by discretization. A mesh of points of size $\Delta x$ can be introduced over the integration region $\Omega$. The terms of (29) can be rewritten as follows:

$$X = \lfloor \nabla\sigma_L (f + gu)|_{x_1} \;\cdots\; \nabla\sigma_L (f + gu)|_{x_p} \rfloor^T, \tag{32}$$

$$Y = \Big\lfloor Q + 2\int_0^u \phi^{-T}(v) R \, dv \Big|_{x_1} \;\cdots\; Q + 2\int_0^u \phi^{-T}(v) R \, dv \Big|_{x_p} \Big\rfloor^T, \tag{33}$$

where $p$ in $x_p$ represents the number of points of the mesh. Reducing the mesh size, we have

$$\langle \nabla\sigma_L (f + gu), \nabla\sigma_L (f + gu) \rangle = \lim_{\|\Delta x\| \to 0} (X^T X) \cdot \Delta x,$$

$$\Big\langle Q + 2\int_0^u \phi^{-T}(v) R \, dv, \; \nabla\sigma_L (f + gu) \Big\rangle = \lim_{\|\Delta x\| \to 0} (X^T Y) \cdot \Delta x. \tag{34}$$

This implies that we can calculate $w_L$ as

$$w_{L,p} = -(X^T X)^{-1} (X^T Y). \tag{35}$$

Various ways to efficiently approximate integrals exist. Monte Carlo integration techniques can be used. Here the mesh points are sampled stochastically instead of being selected in a deterministic fashion (Evans & Swartz, 2000). In any case, however, the numerical algorithm at the end requires solving (35), which is a least-squares computation of the neural network weights.

Numerically stable routines that compute equations like (35) exist in several software packages such as MATLAB, which is used in the next section.
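As a minimal sketch of (32)-(35), the following Python code builds $X$ and $Y$ on a deterministic mesh and solves the normal equations. The helper names (`solve_linear`, `mesh_weights`) and the scalar test problem ($\dot x = -x$ with $u = 0$, $Q = x^2$, basis $\{x^2, x^4\}$) are illustrative assumptions and are not taken from the paper. For that test problem the LE solution is $V = x^2/2$, so the computed weights should be close to $(0.5, 0)$.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def mesh_weights(grad_sigma, closed_loop, running_cost, mesh):
    """Return w = -(X^T X)^{-1} X^T Y, with X and Y sampled on the mesh."""
    X, Y = [], []
    for x in mesh:
        xdot = closed_loop(x)  # f(x) + g(x) u(x)
        # Row of X: gradient of each activation function dotted with xdot.
        X.append([sum(gi * di for gi, di in zip(g, xdot)) for g in grad_sigma(x)])
        Y.append(running_cost(x))  # Q(x) + W(u(x))
    L = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(L)] for i in range(L)]
    minus_xty = [-sum(r[i] * y for r, y in zip(X, Y)) for i in range(L)]
    return solve_linear(xtx, minus_xty)


# Hypothetical scalar test problem (an assumption, not one of the paper's examples).
mesh = [[-1.0 + 0.1 * k] for k in range(21)]
w = mesh_weights(
    grad_sigma=lambda x: [[2.0 * x[0]], [4.0 * x[0] ** 3]],  # gradients of x^2, x^4
    closed_loop=lambda x: [-x[0]],                            # xdot = -x, u = 0
    running_cost=lambda x: x[0] ** 2,                         # Q = x^2, W(0) = 0
    mesh=mesh,
)
print(w)
```

Replacing `mesh` with randomly sampled points gives the Monte Carlo variant mentioned above; the least-squares step itself is unchanged.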


Fig. 1. Neural-network-based nearly optimal saturated control law.

The neurocontrol law structure is shown in Fig. 1. It is a neural network with activation functions given by $\sigma$, multiplied by a function of the state variables.

A flowchart of the computational algorithm presented in this paper is shown in Fig. 2. This is an offline algorithm run a priori to obtain a neural network constrained state feedback controller that is nearly optimal.
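The control law structure can be sketched in a few lines of Python. The sketch assumes a diagonal $R$ and the saturation function $\phi(\cdot) = A\tanh((1/A)\,\cdot)$ used in Section 5, so that each input is $u_j = -A_j \tanh\big((1/A_j)\,\tfrac{1}{2} R_{jj}^{-1} g_j^T V_x\big)$. The function name `saturated_control` and the quadratic value function in the usage lines are hypothetical and only illustrate the structure.

```python
import math


def saturated_control(grad_v, g_cols, r_diag, bounds):
    """Evaluate u_j = -A_j * tanh((1/A_j) * 0.5 * g_j^T grad_v / R_jj).

    grad_v : gradient of the approximate value function at the current state
    g_cols : list of input vectors g_j(x), one per control input
    r_diag : diagonal entries R_jj of the control weight (assumed diagonal)
    bounds : saturation limits A_j, so that |u_j| <= A_j
    """
    u = []
    for g_j, r_jj, a_j in zip(g_cols, r_diag, bounds):
        inner = 0.5 / r_jj * sum(gi * vi for gi, vi in zip(g_j, grad_v))
        u.append(-a_j * math.tanh(inner / a_j))
    return u


# Hypothetical example: V = x1^2 + x2^2 at x = (0, 1), input entering x2 only.
x = (0.0, 1.0)
grad_v = [2.0 * x[0], 2.0 * x[1]]
u = saturated_control(grad_v, g_cols=[[0.0, 1.0]], r_diag=[1.0], bounds=[1.0])
print(u)
```

Because the hyperbolic tangent is bounded, the returned input respects the actuator limit for any state, which is the property the network structure is designed to enforce.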

5. Numerical examples

We now show the power of our neural network control technique for finding nearly optimal controllers for affine-in-input dynamical systems. Two examples are presented.

5.1. Multi input canonical form linear system with constrained inputs

We start by applying the algorithm obtained above to the linear system

$$\dot x_1 = 2x_1 + x_2 + x_3, \quad \dot x_2 = x_1 - x_2 + u_2, \quad \dot x_3 = x_3 + u_1.$$

It is desired to control the system with input constraints $|u_1| \le 3$, $|u_2| \le 20$. The open-loop dynamics have eigenvalues with positive real parts, so this system is not asymptotically null controllable with bounded inputs. Therefore global asymptotic stabilization cannot be achieved (Sussmann et al., 1994).

To find a nearly optimal constrained state feedback controller, the following smooth function is used to approximate the value function of the system:

$$\begin{aligned} V_{21}(x_1, x_2, x_3) ={}& w_1 x_1^2 + w_2 x_2^2 + w_3 x_3^2 + w_4 x_1 x_2 + w_5 x_1 x_3 + w_6 x_2 x_3 \\ &+ w_7 x_1^4 + w_8 x_2^4 + w_9 x_3^4 + w_{10} x_1^2 x_2^2 + w_{11} x_1^2 x_3^2 + w_{12} x_2^2 x_3^2 \\ &+ w_{13} x_1^2 x_2 x_3 + w_{14} x_1 x_2^2 x_3 + w_{15} x_1 x_2 x_3^2 + w_{16} x_1^3 x_2 + w_{17} x_1^3 x_3 \\ &+ w_{18} x_1 x_2^3 + w_{19} x_1 x_3^3 + w_{20} x_2 x_3^3 + w_{21} x_2^3 x_3. \end{aligned}$$

The selection of the neural network is usually a natural choice guided by engineering experience and intuition. This is a neural network with polynomial activation functions, and hence $V_{21}(0) = 0$. It is a power series neural network with 21 activation functions containing powers of the state variables of the system up to the fourth order. Convergence was not observed for networks containing only second-order powers of the states. The number of neurons required is chosen to guarantee the uniform convergence of the algorithm. The activation functions selected in this example satisfy the properties of activation functions discussed in Section 4 of this paper, and in Lewis et al. (1999).

Fig. 2. Successive approximation algorithm for nearly optimal saturated neurocontrol.

To initialize the algorithm, an LQR controller is derived assuming no input constraints. Then the control signal is passed through a saturation block. Note that the closed-loop dynamics are no longer optimal. The following controller is then found and its performance is shown in Fig. 3.

$$u_1 = -8.31x_1 - 2.28x_2 - 4.66x_3, \quad |u_1| \le 3,$$

$$u_2 = -8.57x_1 - 2.27x_2 - 2.28x_3, \quad |u_2| \le 20.$$

Fig. 3. LQR control with actuator saturation. Panels: "State Trajectory for LQR Control Law with Bounds" (system states x1, x2, x3 versus time in s) and "The Initial Stabilizing Control: LQR Control Signal with Bounds" (control inputs u1, u2 versus time in s).

In order to model the saturation of actuators, a nonquadratic cost performance term (8) is used. To show how to do this for the general case of $|u| \le A$, we use $A \tanh((1/A)\,\cdot)$ for $\phi(\cdot)$. Hence the nonquadratic cost associated with controls in the optimization functional is given as

$$W(u) = 2\int_0^u \big(A \tanh^{-1}(v/A)\big)^T R \, dv = 2ARu \tanh^{-1}(u/A) + A^2 R \ln(1 - u^2/A^2).$$
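The closed form of $W(u)$ can be checked numerically for a scalar input. The sketch below compares the closed-form expression with a midpoint-rule approximation of the integral. The function names, the choice of quadrature rule and the sample values ($A = 3$, $R = 1$, $u = 2.5$) are assumptions made for illustration only.

```python
import math


def w_closed_form(u, a, r):
    """Closed form 2*A*R*u*atanh(u/A) + A^2*R*ln(1 - u^2/A^2), scalar case."""
    return 2.0 * a * r * u * math.atanh(u / a) + a * a * r * math.log(1.0 - (u / a) ** 2)


def w_numeric(u, a, r, n=20000):
    """Midpoint-rule approximation of 2 * integral_0^u A*atanh(v/A)*R dv."""
    h = u / n
    total = 0.0
    for k in range(n):
        v = (k + 0.5) * h
        total += a * math.atanh(v / a) * r
    return 2.0 * total * h


closed = w_closed_form(2.5, 3.0, 1.0)
numeric = w_numeric(2.5, 3.0, 1.0)
print(closed, numeric)
```

The two values agree to within the quadrature error, and $W(0) = 0$, so the penalty vanishes when no control is applied and grows without bound as $|u|$ approaches $A$.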

This nonquadratic cost performance is then used in the algorithm to calculate the optimal constrained controller. The algorithm of Fig. 2 is executed for the region $-1.2 \le x_1 \le 1.2$, $-1.2 \le x_2 \le 1.2$, $-1.2 \le x_3 \le 1.2$ with the design parameters $R = I_{2 \times 2}$, $Q = I_{3 \times 3}$. This region falls within the RAS of the initial stabilizing controller. Methods to estimate the RAS are discussed in Khalil (2003).

After 20 successive iterations, the algorithm converges to the following nearly optimal saturated control,

$$u_1 = -3 \tanh\!\Big( \tfrac{1}{3} \big( 7.7x_1 + 2.44x_2 + 4.8x_3 + 2.45x_1^3 + 2.27x_1^2 x_2 + 3.7x_1 x_2 x_3 + 0.71x_1 x_2^2 + 5.8x_1^2 x_3 + 4.8x_1 x_3^2 + 0.08x_2^3 + 0.6x_2^2 x_3 + 1.6x_2 x_3^2 + 1.4x_3^3 \big) \Big),$$

$$u_2 = -20 \tanh\!\Big( \tfrac{1}{20} \big( 9.8x_1 + 2.94x_2 + 2.44x_3 - 0.2x_1^3 - 0.02x_1^2 x_2 + 1.42x_1 x_2 x_3 + 0.12x_1 x_2^2 + 2.3x_1^2 x_3 + 1.9x_1 x_3^2 + 0.02x_2^3 + 0.23x_2^2 x_3 + 0.57x_2 x_3^2 + 0.52x_3^3 \big) \Big).$$

This is a state feedback control law based on neural networks as shown in Fig. 1. The performance of this saturated control law is shown in Fig. 4. Note that the controller works well even when the state trajectory exits the state space region on which training was performed. In general, however, the algorithm is guaranteed to work only for those initial conditions for which the state trajectory remains within the training set.

5.2. Nonlinear oscillator with constrained input

We consider next the following nonlinear oscillator:

$$\dot x_1 = x_1 + x_2 - x_1 (x_1^2 + x_2^2),$$

$$\dot x_2 = -x_1 + x_2 - x_2 (x_1^2 + x_2^2) + u.$$

It is desired to control the system with control limits of $|u| \le 1$. The following neural network is used:

$$\begin{aligned} V_{24}(x_1, x_2) ={}& w_1 x_1^2 + w_2 x_2^2 + w_3 x_1 x_2 + w_4 x_1^4 + w_5 x_2^4 + w_6 x_1^3 x_2 + w_7 x_1^2 x_2^2 + w_8 x_1 x_2^3 \\ &+ w_9 x_1^6 + w_{10} x_2^6 + w_{11} x_1^5 x_2 + w_{12} x_1^4 x_2^2 + w_{13} x_1^3 x_2^3 + w_{14} x_1^2 x_2^4 + w_{15} x_1 x_2^5 \\ &+ w_{16} x_1^8 + w_{17} x_2^8 + w_{18} x_1^7 x_2 + w_{19} x_1^6 x_2^2 + w_{20} x_1^5 x_2^3 + w_{21} x_1^4 x_2^4 + w_{22} x_1^3 x_2^5 + w_{23} x_1^2 x_2^6 + w_{24} x_1 x_2^7. \end{aligned}$$

This is a power series neural network with 24 activation functions containing powers of the state variables of the system up to the 8th power. Note that the order of the neurons, 8th, required to guarantee convergence was higher than the one in the previous example.
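The sizes of the two power series networks can be reproduced by enumerating all monomials of even total degree. The following sketch, with hypothetical helper names `even_power_basis` and `evaluate`, lists the exponent tuples for the three-state example (degrees 2 and 4) and for the oscillator (degrees 2, 4, 6 and 8). The ordering of the generated monomials is an artifact of the enumeration and does not match the weight numbering used in the text.

```python
from itertools import combinations_with_replacement


def even_power_basis(n_states, degrees):
    """Exponent tuples of all monomials in n_states variables with the given total degrees."""
    basis = []
    for d in degrees:
        for combo in combinations_with_replacement(range(n_states), d):
            basis.append(tuple(combo.count(i) for i in range(n_states)))
    return basis


def evaluate(basis, weights, x):
    """Evaluate V(x) = sum_j w_j * prod_i x_i^{e_ji} for the given basis."""
    total = 0.0
    for exponents, w in zip(basis, weights):
        term = w
        for xi, e in zip(x, exponents):
            term *= xi ** e
        total += term
    return total


basis21 = even_power_basis(3, [2, 4])        # example 5.1: 6 + 15 monomials
basis24 = even_power_basis(2, [2, 4, 6, 8])  # example 5.2: 3 + 5 + 7 + 9 monomials
print(len(basis21), len(basis24))
```

Since every monomial has degree at least two, any network built this way satisfies $V(0) = 0$ regardless of the weights.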

Fig. 4. Nearly optimal nonlinear neural control law considering actuator saturation. Panels: "State Trajectory for the Nearly Optimal Control Law" (system states x1, x2, x3 versus time in s) and "Nearly Optimal Control Signal with Input Constraints" (control inputs u1, u2 versus time in s).

Fig. 5. Performance of the initial stabilizing control when saturated. Panels: "State Trajectory for Initial Stabilizing Control" (system states x1, x2 versus time in s) and "Control Signal for The Initial Stabilizing Control" (control input u(x) versus time in s).

Fig. 6. Nearly optimal nonlinear control law considering actuator saturation. Panels: "State Trajectory for the Nearly Optimal Control Law" (system states x1, x2 versus time in s) and "Nearly Optimal Control Signal with Input Constraints" (control input u(x) versus time in s).

Fig. 5 shows the performance of the bounded controller $u = \mathrm{sat}_{-1}^{+1}(-5x_1 - 3x_2)$ for $x_1(0) = 0$, $x_2(0) = 1$. The algorithm is run over the region $-1 \le x_1 \le 1$, $-1 \le x_2 \le 1$, with $R = 1$, $Q = I_{2 \times 2}$. After 20 successive iterations, one has

$$\begin{aligned} u = -\tanh\big( & 2.6x_1 + 4.2x_2 + 0.4x_2^3 - 4.0x_1^3 - 8.7x_1^2 x_2 - 8.9x_1 x_2^2 \\ & - 5.5x_2^5 + 2.26x_1^5 + 5.8x_1^4 x_2 + 11x_1^3 x_2^2 + 2.6x_1^2 x_2^3 + 2.00x_1 x_2^4 + 2.1x_2^7 \\ & - 0.5x_1^7 - 1.7x_1^6 x_2 - 2.71x_1^5 x_2^2 - 2.19x_1^4 x_2^3 - 0.8x_1^3 x_2^4 + 1.8x_1^2 x_2^5 + 0.9x_1 x_2^6 \big). \end{aligned}$$

This is the control law in terms of a neural network following the structure shown in Fig. 1. The performance of this saturated control law is shown in Fig. 6. Note that the states and the saturated input in Fig. 6 have fewer oscillations than those of Fig. 5.
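A comparison of the two controllers can be sketched with a simple simulation. The code below integrates the oscillator from $x(0) = (0, 1)$ under the saturated initial controller and under the neural network controller, using a fixed-step fourth-order Runge-Kutta scheme. The integrator, the step size and the function names are assumptions made for illustration, and the polynomial coefficients are transcribed from the control law printed above. No particular trajectory shape is claimed here.

```python
import math


def sat_lqr(x1, x2):
    """Initial controller sat(-5*x1 - 3*x2) with limits -1 and +1."""
    return max(-1.0, min(1.0, -5.0 * x1 - 3.0 * x2))


def neural(x1, x2):
    """Nearly optimal law u = -tanh(p(x)), coefficients transcribed from the text."""
    p = (2.6 * x1 + 4.2 * x2 + 0.4 * x2**3 - 4.0 * x1**3 - 8.7 * x1**2 * x2
         - 8.9 * x1 * x2**2 - 5.5 * x2**5 + 2.26 * x1**5 + 5.8 * x1**4 * x2
         + 11.0 * x1**3 * x2**2 + 2.6 * x1**2 * x2**3 + 2.00 * x1 * x2**4
         + 2.1 * x2**7 - 0.5 * x1**7 - 1.7 * x1**6 * x2 - 2.71 * x1**5 * x2**2
         - 2.19 * x1**4 * x2**3 - 0.8 * x1**3 * x2**4 + 1.8 * x1**2 * x2**5
         + 0.9 * x1 * x2**6)
    return -math.tanh(p)


def dynamics(x, control):
    """Nonlinear oscillator of Section 5.2 in closed loop with the given controller."""
    x1, x2 = x
    r2 = x1 * x1 + x2 * x2
    return (x1 + x2 - x1 * r2, -x1 + x2 - x2 * r2 + control(x1, x2))


def rk4_step(x, control, dt):
    """One classical Runge-Kutta step of size dt."""
    k1 = dynamics(x, control)
    k2 = dynamics((x[0] + 0.5 * dt * k1[0], x[1] + 0.5 * dt * k1[1]), control)
    k3 = dynamics((x[0] + 0.5 * dt * k2[0], x[1] + 0.5 * dt * k2[1]), control)
    k4 = dynamics((x[0] + dt * k3[0], x[1] + dt * k3[1]), control)
    return (x[0] + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            x[1] + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)


def simulate(control, x0=(0.0, 1.0), t_end=40.0, dt=0.01):
    """Return the list of states visited from x0 over [0, t_end]."""
    traj = [x0]
    for _ in range(int(round(t_end / dt))):
        traj.append(rk4_step(traj[-1], control, dt))
    return traj


traj_lqr = simulate(sat_lqr)
traj_nn = simulate(neural)
print(traj_lqr[-1], traj_nn[-1])
```

Plotting the two trajectories and the corresponding inputs against time gives curves that can be compared with Figs. 5 and 6.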

6. Conclusion

A rigorous, computationally effective algorithm to find nearly optimal constrained state feedback control laws for general nonlinear systems with saturating actuators is presented. The control is given as the output of a neural network. This is an extension of the novel work in Saridis and Lee (1979), Beard (1995) and Lyshevski (2001). Conditions under which the theory of successive approximation applies were shown. Two numerical examples were presented.

Although it was not the focus of this paper, we believe that the results of this paper can be extended to handle constraints on states. Moreover, an issue of interest is how to increase the RAS of an initial stabilizing controller. Finally, it would be interesting to show how one can solve the HJB equation when the system dynamics $f$, $g$ are unknown, as discussed in Murray et al. (2002).

References

Adams, R., & Fournier, J. (2003). Sobolev spaces. (2nd ed.), New York: Academic Press.

Apostol, T. (1974). Mathematical analysis. Reading, MA: Addison-Wesley.

Bardi, M., & Capuzzo-Dolcetta, I. (1997). Optimal control and viscosity solutions of Hamilton–Jacobi–Bellman equations. Boston, MA: Birkhauser.

Beard, R. (1995). Improving the closed-loop performance of nonlinear systems. Ph.D. Thesis, Rensselaer Polytechnic Institute, Troy, NY.

Beard, R., Saridis, G., & Wen, J. (1997). Galerkin approximations of the generalized Hamilton–Jacobi–Bellman equation. Automatica, 33(12), 2159–2177.

Beard, R., Saridis, G., & Wen, J. (1998). Approximate solutions to the time-invariant Hamilton–Jacobi–Bellman equation. Journal of Optimization Theory and Application, 96(3), 589–626.

Bernstein, D. S. (1995). Optimal nonlinear, but continuous, feedback control of systems with saturating actuators. International Journal of Control, 62(5), 1209–1216.

Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-dynamic programming. Belmont, MA: Athena Scientific.

Chen, F. C., & Liu, C. C. (1994). Adaptively controlling nonlinear continuous-time systems using multilayer neural networks. IEEE Transactions on Automatic Control, 39(6), 1306–1310.

Evans, M., & Swartz, T. (2000). Approximating integrals via Monte Carlo and deterministic methods. Oxford: Oxford University Press.

Finlayson, B. A. (1972). The method of weighted residuals and variational principles. New York: Academic Press.

Han, D., Balakrishnan, S. N. (2000). State-constrained agile missile control with adaptive-critic-based neural networks. Proceedings of the American control conference (pp. 1929–1933).

Hornik, K., Stinchcombe, M., & White, H. (1990). Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks, 3, 551–560.

Huang, C.-S., Wang, S., & Teo, K. L. (2000). Solving Hamilton–Jacobi–Bellman equations by a modified method of characteristics. Nonlinear Analysis, 40, 279–293.

Huang, J., & Lin, C. F. (1995). Numerical approach to computing nonlinear H∞ control laws. Journal of Guidance, Control, and Dynamics, 18(5), 989–994.

Khalil, H. (2003). Nonlinear systems. (3rd ed.), Upper Saddle River, NJ: Prentice-Hall.

Kim, Y. H., Lewis, F. L., & Dawson, D. (2000). Intelligent optimal control of robotic manipulators using neural networks. Automatica, 36, 1355–1364.

Kleinman, D. (1968). On an iterative technique for Riccati equation computations. IEEE Transactions on Automatic Control, 114–115.

Lee, H. W. J., Teo, K. L., Lee, W. R., & Wang, S. (2001). Construction of suboptimal feedback control for chaotic systems using B-splines with optimally chosen knot points. International Journal of Bifurcation and Chaos, 11(9), 2375–2387.

Lewis, F. L., & Syrmos, V. L. (1995). Optimal control. New York: Wiley.

Lewis, F. L., Jagannathan, S., & Yesildirek, A. (1999). Neural network control of robot manipulators and nonlinear systems. London: Taylor & Francis.

Lio, F. D. (2000). On the Bellman equation for infinite horizon problems with unbounded cost functional. Applied Mathematics and Optimization, 41, 171–197.

Liu, X., Balakrishnan, S. N. (2000). Convergence analysis of adaptive critic based optimal control. Proceedings of American control conference (pp. 1929–1933).

Lyshevski, S. E. (1996). Constrained optimization and control of nonlinear systems: new results in optimal control. Proceedings of the IEEE conference on decision and control (pp. 541–546).

Lyshevski, S. E. (1998). Optimal control of nonlinear continuous-time systems: design of bounded controllers via generalized nonquadratic functionals. Proceedings of American control conference, June 1998 (pp. 205–209).

Lyshevski, S. E. (2001a). Control systems theory with engineering applications. Boston, MA: Birkhauser.

Lyshevski, S. E. (2001b). Role of performance functionals in control laws design. Proceedings of American control conference (pp. 2400–2405).

Lyshevski, S. E., Meyer, A. U. (1995). Control system analysis and design upon the Lyapunov method. Proceedings of American control conference (pp. 3219–3223).

Mikhlin, S. G. (1964). Variational methods in mathematical physics. Oxford: Pergamon.

Miller, W. T., Sutton, R., & Werbos, P. (1990). Neural networks for control. Cambridge, Massachusetts: The MIT Press.

Munos, R., Baird, L. C., Moore, A. (1999). Gradient descent approaches to neural-net-based solutions of the Hamilton–Jacobi–Bellman equation. International joint conference on neural networks IJCNN, 3 (pp. 2152–2157).

Murray, J., Cox, C., Lendaris, G., & Saeks, R. (2002). Adaptive dynamic programming. IEEE Transactions on Systems, Man, and Cybernetics-Part C: Applications and Reviews, 32(2), 140–153.

Narendra, K. S., & Lewis, F. L. (2001). Special issue on neural network feedback control. Automatica, 37(8), 1147–1148.

Parisini, T., & Zoppoli, R. (1998). Neural approximations for infinite-horizon optimal control of nonlinear stochastic systems. IEEE Transactions on Neural Networks, 9(6), 1388–1408.


Polycarpou, M. M. (1996). Stable adaptive neural control scheme for nonlinear systems. IEEE Transactions on Automatic Control, 41(3), 447–451.

Rovithakis, G. A., & Christodoulou, M. A. (1994). Adaptive control of unknown plants using dynamical neural networks. IEEE Transactions on Systems, Man, and Cybernetics, 24(3), 400–412.

Saberi, A., Lin, Z., & Teel, A. (1996). Control of linear systems with saturating actuators. IEEE Transactions on Automatic Control, 41(3), 368–378.

Sadegh, N. (1993). A perceptron network for functional identification and control of nonlinear systems. IEEE Transactions on Neural Networks, 4(6), 982–988.

Sanner, R. M., Slotine, J. J. E. (1991). Stable adaptive control and recursive identification using radial gaussian networks. Proceedings of IEEE conference on decision and control (pp. 2116–2123) Brighton.

Saridis, G., & Lee, C. S. (1979). An approximation theory of optimal control for trainable manipulators. IEEE Transactions on Systems, Man, Cybernetics, 9(3), 152–159.

Sussmann, H., Sontag, E. D., & Yang, Y. (1994). A general result on the stabilization of linear systems using bounded controls. IEEE Transactions on Automatic Control, 39(12), 2411–2425.

Van Der Schaft, A. J. (1992). L2-gain analysis of nonlinear systems and nonlinear state feedback H∞ control. IEEE Transactions on Automatic Control, 37(6), 770–784.

Murad Abu-Khalaf was born in Jerusalem, Palestine, in 1977. He did his high school studies at Ibrahimiah College in Jerusalem. He subsequently studied in Cyprus and received his Bachelor's Degree in Electronics and Electrical Engineering from Boğaziçi University in Istanbul, Turkey, in 1998. He then joined The University of Texas at Arlington, from which he received the Master's of Science in Electrical Engineering in 2000. Currently he is working on his Ph.D. degree at The University of Texas at Arlington and is working as a research assistant at the Automation and Robotics Research Institute. He is an IEEE member, and a member of the Eta Kappa Nu honor society. His interest is in the areas of nonlinear control, optimal control and neural network control.

Frank L. Lewis was born in Würzburg, Germany, subsequently studying in Chile and Gordonstoun School in Scotland. He obtained the Bachelor's Degree in Physics/Electrical Engineering and the Master's of Electrical Engineering Degree at Rice University in 1971. He spent six years in the U.S. Navy, serving as Navigator aboard the frigate USS Trippe (FF-1075), and Executive Officer and Acting Commanding Officer aboard USS Salinan (ATF-161). In 1977 he received the Master's of Science in Aeronautical Engineering from the University of West Florida. In 1981 he obtained the Ph.D. degree at The Georgia Institute of Technology in Atlanta, where he was employed as a Professor from 1981 to 1990 and is currently an Adjunct Professor. He is a Professor of Electrical Engineering at The University of Texas at Arlington, where he was awarded the Moncrief-O'Donnell Endowed Chair in 1990 at the Automation and Robotics Research Institute. He is a Fellow of the IEEE, a member of the New York Academy of Sciences, and a registered Professional Engineer in the State of Texas. He is a Charter Member (2004) of the UTA Academy of Distinguished Scholars. He has served as Visiting Professor at Democritus University in Greece, Hong Kong University of Science and Technology, Chinese University of Hong Kong, National University of Singapore. He is an elected Guest Consulting Professor at both Shanghai Jiao Tong University and South China University of Technology. Dr. Lewis' current interests include intelligent control, neural and fuzzy systems, microelectromechanical systems (MEMS), wireless sensor networks, nonlinear systems, robotics, condition-based maintenance, and manufacturing process control. He is the author/co-author of 3 US patents, 157 journal papers, 23 chapters and encyclopedia articles, 239 refereed conference papers, nine books, including Optimal Control, Optimal Estimation, Applied Optimal Control and Estimation, Aircraft Control and Simulation, Control of Robot Manipulators, Neural Network Control, High-Level Feedback Control with Neural Networks and the IEEE reprint volume Robot Control. He was selected to the Editorial Boards of International Journal of Control, Neural Computing and Applications, and Int. J. Intelligent Control Systems. He served as an Editor for the flagship journal Automatica. He is the recipient of an NSF Research Initiation Grant and has been continuously funded by NSF since 1982. Since 1991 he has received $4.8 million in funding from NSF and other government agencies, including significant DoD SBIR and industry funding. His SBIR program was instrumental in ARRI's receipt of the SBA Tibbets Award in 1996. He has received a Fulbright Research Award, the American Society of Engineering Education F.E. Terman Award, three Sigma Xi Research Awards, the UTA Halliburton Engineering Research Award, the UTA University-Wide Distinguished Research Award, the ARRI Patent Award, various Best Paper Awards, the IEEE Control Systems Society Best Chapter Award (as Founding Chairman), and the National Sigma Xi Award for Outstanding Chapter (as President). He was selected as Engineer of the year in 1994 by the Ft. Worth IEEE Section. He was appointed to the NAE Committee on Space Station in 1995 and to the IEEE Control Systems Society Board of Governors in 1996. In 1998 he was selected as an IEEE Control Systems Society Distinguished Lecturer. He is a Founding Member of the Board of Governors of the Mediterranean Control Association.