Neural networks enhanced adaptive admittance control of optimized robot-environment interaction

Article (Accepted Version)

http://sro.sussex.ac.uk

Yang, Chenguang, Peng, Guangzhu, Li, Yanan, Cui, Rongxin, Cheng, Long and Li, Zhijun (2018) Neural networks enhanced adaptive admittance control of optimized robot-environment interaction. IEEE Transactions on Cybernetics, 49 (7). pp. 2568-2579. ISSN 2168-2267

This version is available from Sussex Research Online: http://sro.sussex.ac.uk/id/eprint/74843/

This document is made available in accordance with publisher policies and may differ from the published version or from the version of record. If you wish to cite this item you are advised to consult the publisher's version. Please see the URL above for details on accessing the published version.

Copyright and reuse: Sussex Research Online is a digital repository of the research output of the University. Copyright and all moral rights to the version of the paper presented here belong to the individual author(s) and/or other copyright owners. To the extent reasonable and practicable, the material made available in SRO has been checked for eligibility before being made available. Copies of full text items generally can be reproduced, displayed or performed and given to third parties in any format or medium for personal research or study, educational, or not-for-profit purposes without prior permission or charge, provided that the authors, title and full bibliographic details are credited, a hyperlink and/or URL is given for the original metadata page and the content is not changed in any way.




Neural Networks Enhanced Adaptive Admittance Control of Optimized Robot-Environment Interaction

Chenguang Yang, Guangzhu Peng, Yanan Li, Rongxin Cui, Long Cheng, and Zhijun Li

Abstract—In this paper, an admittance adaptation method for robots to interact with unknown environments is developed. The environment to be interacted with is modelled as a linear system in the state-space form. In the presence of unknown environment dynamics, an observer in the robot joint space is employed to estimate the interaction torque. Admittance control is adopted to regulate the dynamic behavior at the interaction point when the robot interacts with the unknown environment. An adaptive neural controller using radial basis functions is employed to guarantee trajectory tracking. A cost function is defined to capture the interaction performance in terms of torque regulation and trajectory tracking, and it is minimised by adaptation of the admittance model. Simulation studies on a robot manipulator are carried out to verify the effectiveness of the proposed method.

Index Terms—optimal adaptive control; robot-environment interaction; observer; neural networks (NNs); admittance control

I. INTRODUCTION

WITH the development of robot technology, robots have been widely used in various fields such as education, industry and entertainment. In these applications, robots are required to interact with the external environment [1]–[3]. Therefore, robot-environment interaction has received great attention and much effort has been devoted to this topic. Although it has been investigated for decades, many open problems remain unsolved, due to the high expectations placed on robots in increasingly general scenarios and the complexity of the environments in which robots work. To achieve a compliant behavior, three approaches are widely applied: admittance control, hybrid position/force control and impedance control.

The concept of impedance control introduced by Hogan has become a classical control method in robotics [4]. The aim of impedance control is to develop a relationship between the interaction force and the position of the robot. Its core idea is that the controller should modulate the mechanical impedance, which is a mapping from generalized velocities to generalized forces. This control approach

This work was partially supported by the National Natural Science Foundation of China (NSFC) under Grant 61473120, the Science and Technology Planning Project of Guangzhou under Grant 201607010006, and the Fundamental Research Funds for the Central Universities under Grant 2015ZM065.

C. Yang, G. Peng and Z. Li are with the Key Laboratory of Autonomous Systems and Networked Control, College of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, China. Emails: [email protected]; [email protected]; [email protected].

Yanan Li is with the Department of Engineering and Design, University of Sussex, Brighton BN1 9RH, UK. Email: [email protected].

Rongxin Cui is with the School of Marine Science and Technology, Northwestern Polytechnical University, China. Email: [email protected].

Long Cheng is with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China. Email: [email protected].

appears to be feasible and robust [5]. Another approach is admittance control, which was introduced by Mason [6]. In a generalized admittance control system, with measurements of the environment force and a desired admittance model, a virtual desired trajectory is obtained and tracked. The compliant behavior is then realized by trajectory adaptation. The traditional control method for a robot manipulator is model-based control, which usually achieves good control performance [7]. However, this method depends heavily on the accuracy of the robot model, which cannot be guaranteed in many cases. Therefore, adaptive control methods have been widely studied and applied to practical systems [8]. These methods can approximate the uncertainties of a system by using tools such as neural networks (NNs), wavelet networks, and fuzzy logic systems [9]–[11]. Another key element in an admittance control system is the force sensor. Force sensors serve as a medium for communication between a robot and the environment. However, force sensors mounted on manipulators may cause inconvenience and are usually costly. For these reasons, sensorless control schemes have received great attention. There are two main methods for estimating the external force: the disturbance observer approach and the force observer approach based on knowledge of motor torques. In [12], the disturbance observer approach with knowledge of the joint angles has been analyzed. In [13], a force observer for collision detection based on the generalized momentum has been introduced. In [14], a collision detection method is developed for both rigid robot arms and robots with elastic joints.

Under impedance/admittance control, robots are governed to be compliant to the interaction force exerted by the environment. If the environment is passive, a passive impedance/admittance model is imposed on the robot for safety. However, obtaining a desired impedance/admittance model is not an easy task due to the complexity of the environmental dynamics. Moreover, a fixed impedance/admittance model cannot be applied to all situations. To solve this problem, iterative learning has been widely studied for robots to adapt to unknown environments. This approach aims to transfer human learning skills to robots. It has been generally acknowledged that the ability to improve performance by repeating a task is an important control strategy, and it has been widely studied. In [15], an associative search network (ASN) learning scheme is presented for learning control parameters for a robot to complete a wall-following task. In [16], a neural-network-based method is applied to regulate the impedance parameters of the end-effector of a robot. However, the disadvantage of the learning method is that it requires a robot to repeat operations to learn the desired impedance parameters, which may be inconvenient in many situations. Therefore, the impedance adaptation method has been widely studied [17]. In [18], strategies of switching


among different values of impedance parameters are developed for dissipating the energy of the system. In [19], impedance adaptation is investigated for robots to interact with unknown environments.

The control objective of interaction control is to achieve force regulation and trajectory tracking. Thus, optimisation should be taken into consideration, since the result is a compromise between these two objectives. There has been much research effort in the literature. The well-known linear quadratic regulator (LQR) is widely acknowledged as an important solution to optimal control, which is concerned with operating a dynamic system at minimal cost. In [20], the LQR is used to determine the desired impedance parameters with the environmental dynamics known. In [21], a target impedance is adjusted by online solutions of the defined LQR problem based on the environment stiffness and damping. Compared with the fixed impedance parameters obtained from the LQR technique, the algorithm shows greater adaptability for a wide range of environments. However, the dynamics of the environment is also assumed to be known in that work. As presented in [22], the solution of a Riccati equation can be difficult to find when the dynamics of the environment is unknown. Therefore, when the dynamics of the environment is unknown, the approaches above may not be applicable. To cope with this problem, adaptive dynamic programming (ADP) has received much attention and been widely studied [23]–[27]. ADP is a very useful tool for solving optimization and optimal control problems. Based on the idea of ADP, a control action is modified based on the feedback information of the environment. There are many ADP approaches, such as heuristic dynamic programming (HDP), Q-learning and dual-heuristic programming (DHP). The advantage of ADP is that only partial information of the system under control needs to be known. In [28], optimal impedance parameters are updated by employing a recursive least-squares filter-based episodic natural actor-critic algorithm. In [29], a reinforcement learning (RL) algorithm is adopted to accomplish variable impedance control. However, in many situations, a learning process is still needed to obtain the parameters of an impedance/admittance model [30]. In this work, the optimal control method with an unknown environment proposed in [31] is applied and developed. The environment model is considered to be a damping-stiffness model, which is a linear system with unknown dynamics.

Based on the above discussion, we propose a method to adjust admittance parameters subject to an unknown environment. First, a cost function is defined to describe the interaction performance, including the trajectory tracking error and the interaction torque. Second, an environment with unknown dynamics is taken into consideration and its model is described as a linear system in the state-space form. An observer based on the generalized momentum approach is used to estimate the interaction torque in joint space. As the environmental dynamics is unknown, admittance adaptation is proposed to adjust the admittance parameters. As a result, the target admittance model which guarantees the optimal interaction behavior is obtained.

The rest of this paper is organized as follows. Section II describes the dynamics of the robot and the control objective, as well as the unknown environmental dynamics. Section III introduces the methodology, including the estimation of the external torque and the admittance adaptation. In Section IV, the proposed method is verified through simulations, and the conclusion is drawn in Section V.

Fig. 1. The model of damping-stiffness environment

II. PRELIMINARIES

A. System Description

In this system, we consider a robot arm interacting with an environment. The kinematics of the robot can be expressed by

x(t) = κ(q) (1)

where κ(·) denotes the forward kinematics function, x(t) the position/orientation in the Cartesian space, q ∈ Rn the vector of joint coordinates, and n the number of degrees of freedom (DOF). Differentiating (1) with respect to time results in

\dot{x}(t) = J(q)\dot{q} \quad (2)

where J(q) ∈ Rn×n is the Jacobian matrix, which is assumed to be non-singular in a finite workspace. Further differentiating (2) with respect to time results in

\ddot{x}(t) = \dot{J}(q)\dot{q} + J(q)\ddot{q} \quad (3)
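As a quick numerical check of (1)-(3), the sketch below (not part of the paper; the helper names are my own) implements the forward kinematics and analytic Jacobian of a planar 2-link arm with the link lengths from Table I, and verifies the relation \dot{x} = J(q)\dot{q} against a central finite difference:

```python
import numpy as np

# Numerical check of (1)-(3) for a planar 2-link arm; link lengths match
# Table I, while the joint state below is an arbitrary test point.
L1, L2 = 0.20, 0.20

def kappa(q):
    """Forward kinematics x = kappa(q): end-effector position."""
    return np.array([L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
                     L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])])

def jacobian(q):
    """Analytic Jacobian J(q) such that x_dot = J(q) q_dot."""
    s1, s12 = np.sin(q[0]), np.sin(q[0] + q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

q, q_dot = np.array([0.3, 0.5]), np.array([0.1, -0.2])
h = 1e-6
fd = (kappa(q + h * q_dot) - kappa(q - h * q_dot)) / (2 * h)  # central difference
print(np.allclose(jacobian(q) @ q_dot, fd, atol=1e-8))  # True
```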

The dynamics of a robot arm in joint space can be given by

M(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) + \tau_{ext} = \tau \quad (4)

where M(q) ∈ Rn×n is the inertia matrix in joint space; C(q, \dot{q}) ∈ Rn×n is the Coriolis and centripetal coupling matrix; G(q) ∈ Rn is the gravity loading; \dot{q} and \ddot{q} are the vectors of generalized joint velocities and accelerations, respectively; τ ∈ Rn is the control input; and τext ∈ Rn denotes the vector of joint torques exerted by the environment.

Property 1: The matrix M(q) is symmetric and positive definite [32].

Property 2: The matrix 2C(q, \dot{q}) - \dot{M}(q) is skew-symmetric [32].

Now, let us consider the modelling of the environment. In this paper, a damping-stiffness environment model is considered, as shown in Fig. 1:

C_E\dot{q} + G_E q = -\tau_{ext} \quad (5)

where CE and GE are the unknown damping and stiffness matrices of the environment.

Remark 1: Without loss of generality, a spring-damper system is usually used to describe the model of an environment. Compared to the model (5), a simple second-order equation can also be used as the environment model, in which mass, damping and stiffness are all considered: M_E\ddot{q} + C_E\dot{q} + G_E q = -\tau_{ext}. In general, a large range of


Fig. 2. The control diagram

environments can be represented by these two models. For convenience of analysis, the spring-damper system is considered to be time-invariant, i.e., the coefficient matrices are constant.

B. Control Strategy

The system diagram is given in Fig. 2. The outer loop of the system obtains the admittance parameters subject to an unknown environment based on the adaptive optimal control method. With the interaction torque estimated by an observer, the virtual desired trajectory qr in the joint space is generated. The inner loop of the system guarantees trajectory tracking with an adaptive control scheme.

In general, the desired admittance model in the Cartesian space is

fext = f(xr, xd) (6)

where xr ∈ Rn is the virtual desired trajectory in the Cartesian space and f(·) is the function of the admittance model. To be specific, a target admittance model in joint space is described as below:

M_d J(q)(\ddot{q}_r - \ddot{q}_d) + (M_d\dot{J}(q) + C_d J(q))(\dot{q}_r - \dot{q}_d) + G_d(\kappa(q_r) - \kappa(q_d)) = -J(q)^{T}\tau_{ext} \quad (7)

where Md, Cd and Gd are the desired inertia, damping and stiffness matrices, respectively.

Remark 2: Model (7) is a general admittance model which defines the relationship between the interaction torque and the joint angles. In certain situations, a more simplified stiffness model may be adopted:

Gd(qr − qd) = −τext (8)

Besides, Md, Cd and Gd are constant matrices, which implies identical characteristics in all directions. Obviously, in order to obtain different characteristics in different directions, these three matrices should be defined as positive definite matrices with different diagonal elements. When τext = 0, qd is the desired trajectory to be tracked in the absence of interaction. However, when τext is not null, the virtual desired trajectory qr will be generated.

The adaptive control scheme lets the robot follow the desired trajectory and drives the tracking error eq = q − qr into a small neighborhood of zero. The design of the adaptive control scheme will be discussed in the next section.
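To illustrate how a virtual desired trajectory is generated, the snippet below applies the simplified stiffness model (8); the stiffness values, trajectory sample and torque are illustrative choices, not values from the paper:

```python
import numpy as np

# Virtual desired trajectory from the simplified stiffness model (8):
# Gd (qr - qd) = -tau_ext  =>  qr = qd - inv(Gd) tau_ext.
# Stiffness, trajectory sample and torque values are illustrative.
Gd = np.diag([50.0, 50.0])       # desired joint stiffness
qd = np.array([0.5, 1.0])        # desired joint position sample
tau_ext = np.array([2.0, -1.0])  # estimated interaction torque

qr = qd - np.linalg.solve(Gd, tau_ext)
print(qr)  # [0.46 1.02]
```

With τext = 0 this reduces to qr = qd, matching the remark above: the robot deviates from qd only under contact.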

C. Control Objective

The control objective is to achieve an optimal interaction performance, and the following cost function is defined to quantify the interaction performance:

V(t) = \int_0^\infty \left((q - q_d)^{T} Q (q - q_d) + \tau_{ext}^{T} R\, \tau_{ext}\right) dt \quad (9)

where Q = QT ∈ Rn×n is positive definite, describing the weight of the tracking errors, and R ∈ Rn×n is the weight of the interaction torque. By minimizing V(t), a desired interaction performance can be achieved.
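For logged signals, the integral (9) can be approximated numerically over a finite horizon; the 1-DOF signals and weights below are toy values for illustration only:

```python
import numpy as np

# Trapezoidal approximation of the cost (9) over a finite horizon, assuming
# the signals q(t), qd(t) and tau_ext(t) were logged at a fixed sample time.
# The 1-DOF signals and weights are toy values, not from the paper.
dt = 0.01
t = np.arange(0.0, 5.0, dt)
q = 1.0 - np.exp(-t)               # toy joint trajectory
qd = np.ones_like(t)               # constant desired trajectory
tau_ext = 0.5 * np.exp(-t)         # decaying interaction torque
Q, R = 10.0, 1.0                   # scalar weights

integrand = Q * (q - qd) ** 2 + R * tau_ext ** 2
V = dt * (0.5 * integrand[0] + integrand[1:-1].sum() + 0.5 * integrand[-1])
print(V)  # close to the analytic value 5.125 * (1 - exp(-10))
```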

Remark 3: Cost functions similar to (9) have been discussed in related works [21]. In a traditional LQR problem, the cost function includes the control input and the trajectory tracking errors, and the optimal control performance can be achieved by specifying feedback gains. In this paper, both the robot and the environment systems are taken into consideration.

III. METHODOLOGY

The objective of the admittance adaptation is to obtain the target admittance model according to the changing environment. The inner loop of the system guarantees the tracking performance. The interaction torque from the environment is estimated by the force observer.

A. Torque Estimation

In this section, an observer based on the generalized momentum approach is used to estimate the external torque in joint space. Compared with traditional methods requiring computation of the joint accelerations or the inversion of the inertia matrix [33], the generalized momentum approach assumes that only the motor torque τ, the joint angle q and the joint velocity \dot{q} are available. In [33] the generalized momentum is defined as

p = M(q)\dot{q} \quad (10)

Its differential form with respect to time is

\dot{p} = \dot{M}\dot{q} + M\ddot{q} \quad (11)

Substituting (11) into (4), we have

\dot{p} = \dot{M}(q)\dot{q} + \tau - C(q,\dot{q})\dot{q} - G(q) - \tau_{ext} \quad (12)

Considering that the matrix M is symmetric and positive definite and that the Coriolis matrix is expressed using Christoffel


symbols [32], the time derivative of the inertia matrix M can be written as

\dot{M} = C + C^{T} \quad (13)

Substituting (13) into (12), we have

\dot{p} = C^{T}(q,\dot{q})\dot{q} + \tau - G(q) - \tau_{ext} \quad (14)

It is obvious that equation (14), based on the generalized momentum, does not involve the joint acceleration \ddot{q}. Finally, the external torque can be modelled as

\dot{\tau}_{ext} = A_\tau \tau_{ext} + w_\tau \quad (15)

where wτ is the uncertainty, wτ ∼ N(0, Qτ). Usually, the matrix Aτ is defined as Aτ = 0n×n; however, a negative diagonal matrix can reduce the offset of the disturbance estimate. Then, equation (14) can be rewritten as

\dot{p} = u - \tau_{ext} \quad (16)

where u is defined as

u = \tau + C^{T}(q,\dot{q})\dot{q} - G(q) \quad (17)

The above equations can be combined and reformulated in the state-space form

\underbrace{\begin{bmatrix} \dot{p} \\ \dot{\tau}_{ext} \end{bmatrix}}_{\dot{x}}
= \underbrace{\begin{bmatrix} 0_n & -I_n \\ 0_n & A_\tau \end{bmatrix}}_{A_c}
\underbrace{\begin{bmatrix} p \\ \tau_{ext} \end{bmatrix}}_{x}
+ \underbrace{\begin{bmatrix} I_n \\ 0_n \end{bmatrix}}_{B_c} u
+ \underbrace{\begin{bmatrix} 0 \\ w_\tau \end{bmatrix}}_{w}

y = \underbrace{\begin{bmatrix} I_n & 0_n \end{bmatrix}}_{C_c} \begin{bmatrix} p \\ \tau_{ext} \end{bmatrix} + v \quad (18)

where v is the measurement noise, v ∼ N(0, Rc). It can be easily proved that this system is observable. Since q and \dot{q} can be measured, the generalized momentum p = M(q)\dot{q} can be regarded as a measured variable. Then, a state observer is designed as

\dot{\hat{x}} = A_c\hat{x} + B_c u + L(y - \hat{y}), \qquad \hat{y} = C_c\hat{x} \quad (19)

The gain matrix L in the system can be calculated by

L = P C_c^{T} R_c^{-1} \quad (20)

where the matrix P can be calculated from the algebraic Riccati equation (ARE) [34]

A_c P + P A_c^{T} - P C_c^{T} R_c^{-1} C_c P + Q_c = 0 \quad (21)

where Qc is the uncertainty of the state, written as

Qc = diag([0, Qτ ]) (22)

A schematic overview of the observer is shown in Fig. 3. As shown in equation (19), the output y = Cc x(t) is compared with C_c\hat{x}(t). If the gain matrix L is properly designed, the difference, passing through the gain matrix, will drive the estimated state to the actual state. From the above analysis, the estimates can be obtained from the observer state \hat{x} as

\hat{\tau}_{ext} = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\hat{x}, \qquad
\hat{p} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}\hat{x} \quad (23)

Fig. 3. The diagram of the observer based on the generalized momentum approach.
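The observer (18)-(19) can be exercised on a minimal example. The sketch below (not from the paper) assumes a 1-DOF system with constant unit inertia, so that p = \dot{q}, together with a constant external torque; the gain L is picked by pole placement, and all numbers are illustrative:

```python
import numpy as np

# Minimal 1-DOF momentum-based torque observer, cf. (16)-(19).
# Assumptions (illustrative): M = 1 so p = q_dot, constant tau_ext,
# A_tau = 0 as in (15), observer poles of (Ac - L Cc) placed at -10.
Ac = np.array([[0.0, -1.0],
               [0.0,  0.0]])
Bc = np.array([1.0, 0.0])
Cc = np.array([1.0, 0.0])
L  = np.array([20.0, -100.0])     # yields char. poly (s + 10)^2

dt, T = 1e-3, 3.0
tau_true = 1.5                    # constant external torque to estimate
p = 0.0                           # true generalized momentum
xh = np.zeros(2)                  # observer state [p_hat, tau_hat]

for k in range(int(T / dt)):
    u = np.sin(k * dt)            # known input u from (17)
    p += dt * (u - tau_true)      # true dynamics (16), Euler step
    y = p                         # momentum treated as a measurement
    xh += dt * (Ac @ xh + Bc * u + L * (y - Cc @ xh))   # observer (19)

print(round(xh[1], 2))  # 1.5: the estimate converges to the true torque
```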

B. Adaptive Optimal Control

The adaptive optimal control strategy is proposed in [31] and is outlined in the following. Consider a continuous-time linear system:

\dot{\xi}(t) = A\xi(t) + Bu(t) \quad (24)

where ξ ∈ Rm is the system state, u ∈ Rr is the system input, and A ∈ Rm×m and B ∈ Rm×r are the system matrix and input matrix, assumed to be constant and unknown. We seek the optimal control input

u = −Kξ (25)

which can minimize the cost function as follows

V = \int_0^\infty \left(\xi^{T} Q \xi + u^{T} R\, u\right) dt \quad (26)

The solution to this problem is similar to that of the LQR problem. The LQR provides a systematic way to find feedback gains that guarantee the optimal control performance. In optimal control theory [35], when A and B are known, there exists a symmetric positive definite matrix P*, which is the solution of the ARE

PA + A^{T}P + Q - PBR^{-1}B^{T}P = 0 \quad (27)

Then, we can obtain the optimal feedback gain matrix

K^* = R^{-1}B^{T}P^* \quad (28)

Therefore, we can obtain the optimal control input from equations (25) and (28). Next, we give the online learning algorithm that obtains the optimal control input subject to the unknown dynamics of the environment.
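When A and B are known, (27)-(28) can be solved directly. The sketch below uses SciPy's ARE solver on toy matrices of my own choosing, with the convention u = -Kξ and K* = R^{-1}B^{T}P*:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# LQR via the ARE (27)-(28) for a toy system with known A, B.
A = np.array([[0.0, 1.0],
              [0.0, -0.5]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)  # solves PA + A^T P + Q - P B R^-1 B^T P = 0
K = np.linalg.solve(R, B.T @ P)       # K* = R^-1 B^T P*
resid = P @ A + A.T @ P + Q - P @ B @ K
print(np.allclose(resid, 0), np.all(np.linalg.eigvals(A - B @ K).real < 0))  # True True
```

The second check confirms the defining property of the LQR gain: the closed-loop matrix A - BK is Hurwitz.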

For convenience, let us introduce the following definitions: P_k \in \mathbb{R}^{m\times m} \to \bar{P}_k \in \mathbb{R}^{\frac{1}{2}m(m+1)} and \xi \in \mathbb{R}^{m} \to \bar{\xi} \in \mathbb{R}^{\frac{1}{2}m(m+1)}, where P_k is a symmetric matrix and

\bar{P}_k = [P_{11}, 2P_{12}, ..., 2P_{1m}, P_{22}, 2P_{23}, ..., P_{mm}]^{T}

\bar{\xi} = [\xi_1^2, \xi_1\xi_2, ..., \xi_1\xi_m, \xi_2^2, \xi_2\xi_3, ..., \xi_m^2]^{T}

\delta_{\xi\xi} = [\bar{\xi}(t_1) - \bar{\xi}(t_0), \bar{\xi}(t_2) - \bar{\xi}(t_1), ..., \bar{\xi}(t_l) - \bar{\xi}(t_{l-1})]^{T}

I_{\xi\xi} = \left[\int_{t_0}^{t_1} \xi \otimes \xi\, dt, \int_{t_1}^{t_2} \xi \otimes \xi\, dt, ..., \int_{t_{l-1}}^{t_l} \xi \otimes \xi\, dt\right]^{T}

I_{\xi u} = \left[\int_{t_0}^{t_1} \xi \otimes u\, dt, \int_{t_1}^{t_2} \xi \otimes u\, dt, ..., \int_{t_{l-1}}^{t_l} \xi \otimes u\, dt\right]^{T} \quad (29)


where l is a positive integer and ⊗ is the Kronecker product. Now, let u = K0ξ + φ be the initial input for t ∈ [t0, tl], where φ is the exploration noise and K0 is an initial feedback gain which stabilizes the system. Then, compute Iξξ and Iξu until the following rank condition is satisfied:

\mathrm{rank}([I_{\xi\xi}, I_{\xi u}]) = \frac{m(m+1)}{2} + mr \quad (30)

After the rank condition is satisfied, we can solve \bar{P}_k and K_{k+1} from the following equation:

\begin{bmatrix} \bar{P}_k \\ \mathrm{vec}(K_{k+1}) \end{bmatrix} = (\Theta_k^{T}\Theta_k)^{-1}\Theta_k^{T}\Xi_k \quad (31)

where \Theta_k and \Xi_k are defined as

\Theta_k = [\delta_{\xi\xi}, -2I_{\xi\xi}(I_m \otimes K_k^{T}R) - 2I_{\xi u}(I_m \otimes R)]

\Xi_k = -I_{\xi\xi}\,\mathrm{vec}(Q_k)

Q_k = Q + K_k^{T}RK_k \quad (32)

where Im is the m-dimensional identity matrix and vec(·) is the function that transfers a matrix into a vector. Then, we repeat the calculation until ||Pk − Pk−1|| < ε, where ε is a small constant defined by the designer. Finally, the optimal feedback gain Kk is obtained. This learning algorithm is summarized in Algorithm 1.
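The recursion (31)-(32) is the data-driven counterpart of model-based Kleinman policy iteration, which alternates a Lyapunov-equation policy evaluation with the update K_{k+1} = R^{-1}B^{T}P_k. The sketch below (my own, with illustrative matrices) runs the model-based iteration on a toy system where A and B happen to be known, whereas Algorithm 1 recovers the same iterates from trajectory data:

```python
import numpy as np

# Model-based Kleinman policy iteration, the fixed point that the
# data-driven recursion (31)-(32) approximates. Matrices are illustrative.
def lyap(Acl, Qk):
    """Solve Acl^T P + P Acl + Qk = 0 via Kronecker vectorization."""
    n = Acl.shape[0]
    M = np.kron(np.eye(n), Acl.T) + np.kron(Acl.T, np.eye(n))
    P = np.linalg.solve(M, -Qk.flatten(order="F")).reshape((n, n), order="F")
    return 0.5 * (P + P.T)        # symmetrize against round-off

A = np.array([[0.0, 1.0], [2.0, -1.0]])   # open loop is unstable
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
K = np.array([[3.0, 1.0]])                # initial stabilizing gain K0

for _ in range(20):
    Acl = A - B @ K
    P = lyap(Acl, Q + K.T @ R @ K)        # policy evaluation
    K = np.linalg.solve(R, B.T @ P)       # policy improvement

resid = P @ A + A.T @ P + Q - P @ B @ K   # ARE (27) residual
print(np.allclose(resid, 0))  # True: the iteration converged to P*
```

As in Algorithm 1, a stabilizing initial gain is required; each iteration then produces a stabilizing gain and the sequence P_k converges to the ARE solution.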

C. Admittance Adaptation

This section obtains a target admittance model according to an unknown environment. The environment model is assumed to be the damping-stiffness model described in equation (5), and the cost function defined in equation (9) is to be minimised. Comparing the cost function in this paper with the general linear system (24), we need to make them identical. Define the state variable

\xi = [q^{T}, q_d^{T}]^{T} \quad (33)

Then, equation (9) becomes

V = \int_0^\infty \left([q^{T}\; q_d^{T}]\, Q' \begin{bmatrix} q \\ q_d \end{bmatrix} + \tau_{ext}^{T} R\, \tau_{ext}\right) dt
= \int_0^\infty \left(\xi^{T} Q' \xi + \tau_{ext}^{T} R\, \tau_{ext}\right) dt \quad (34)

where

Q' = \begin{bmatrix} Q & -Q \\ -Q & Q \end{bmatrix} \quad (35)

Combined with the defined state variable, we can rewrite the environment model in state-space form:

\dot{\xi} = A\xi + B\tau_{ext} \quad (36)

where

A = \begin{bmatrix} -C_E^{-1}G_E & 0 \\ 0 & I_n \end{bmatrix}, \qquad B = \begin{bmatrix} -C_E^{-1} \\ 0 \end{bmatrix} \quad (37)

It is obvious that the matrices A and B contain the unknown dynamics of the environment. If τext is taken as the input of the system (36), we can use the adaptive optimal control method discussed above to obtain the control input that minimises the cost function:

τext = −Kkξ (38)

Algorithm 1 Admittance Adaptation Algorithm

1: Choose u = K0ξ + φ as the initial admittance model, where K0 is the initial feedback gain and φ is the exploration noise. Compute δξξ, Iξξ, Iξu until the rank condition in equation (30) is satisfied.
2: Solve Pk and Kk+1 from equations (31) and (32).
3: Let k + 1 → k and repeat Step 2 until ||Pk − Pk−1|| < ε, where ε is a small constant.
4: Use u = Kkξ as the approximated optimal control input.

where Kk is obtained by the online learning algorithm discussed in the previous section.

To understand equation (38) in the sense of the LQR, recall that the optimal control input is given by equation (28). According to the solution of the ARE, we can obtain the optimal matrix

P^* = \begin{bmatrix} P_1 & P_2 \\ * & * \end{bmatrix} \quad (39)

where P1 ∈ Rn×n, P2 ∈ Rn×n, and * denotes blocks that are not used. Therefore, we have

\tau_{ext} = R^{-1}P_1 q - R^{-1}P_2 q_d \quad (40)

Comparing equation (40) with the desired admittance model (6), the expected admittance model that ensures the optimal interaction performance is obtained. With the desired trajectory qd and \hat{\tau}_{ext} estimated by the observer, we can obtain the virtual desired trajectory qr in joint space, and the inner loop guarantees the trajectory tracking.

D. RBFNN

A radial basis function neural network (RBFNN) is capable of approximating any continuous function [36] f(θ): Rm → R as follows:

f(θ) = WTZ(θ) + ε(θ) (41)

where the input vector θ ∈ Ωθ ⊂ Rm; W = [w1, w2, ..., wd]T ∈ Rd is the weight vector, with d > 1 the number of NN nodes; and Z(θ) = [Z1(θ), Z2(θ), ..., Zd(θ)]T, with Zi(θ) the basis functions, usually chosen as Gaussian functions:

Z_i(\theta) = \exp\left[\frac{-(\theta - u_i)^{T}(\theta - u_i)}{\eta_i^2}\right], \quad i = 1, ..., d \quad (42)

where ui = [ui1, ui2, ..., uim]T ∈ Rm is the center of the receptive field and ηi the standard deviation. A continuous function can be approximated by

f(\theta) = W^{*T}Z(\theta) + \varepsilon^*(\theta) \quad (43)

The ideal weight vector W* is defined as the value of W that minimizes ε(θ) for all θ ∈ Ωθ ⊂ Rm:

W^* = \arg\min_{W \in \mathbb{R}^d} \left\{ \sup_{\theta \in \Omega_\theta} |f(\theta) - W^{T}Z(\theta)| \right\} \quad (44)

In general, the ideal weights W* are unknown and need to be estimated by \hat{W}. The weight estimation error is defined as

\tilde{W} = W^* - \hat{W} \quad (45)


E. Controller Design

The inner loop of the system guarantees the tracking performance. An adaptive neural controller is designed to achieve this objective. Considering the dynamics of the robot manipulator (4), we define

s = \dot{e}_q + \Lambda e_q, \qquad v = \dot{q}_r - \Lambda e_q, \qquad \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, ..., \lambda_n) \quad (46)

where eq = q − qr and λi > 0. Substituting (46) into (4), we have

M(q)\dot{s} + C(q,\dot{q})s + G(q) + M(q)\dot{v} + C(q,\dot{q})v = \tau - \tau_{ext} \quad (47)

Design the control torque input

\tau = \hat{G} + \hat{M}\dot{v} + \hat{C}v + \hat{\tau}_{ext} - Ks \quad (48)

where \hat{G}(q), \hat{M}(q) and \hat{C}(q,\dot{q}) are the estimates of G(q), M(q) and C(q,\dot{q}), and K = diag(k1, k2, ..., kn) with ki > 1/2. The closed-loop dynamics can be written as

M(q)\dot{s} + C(q,\dot{q})s + Ks = -(M - \hat{M})\dot{v} - (C - \hat{C})v - (G - \hat{G}) + (\hat{\tau}_{ext} - \tau_{ext}) \quad (49)

Using the approximation method, we have

M(q) = W_M^{T}Z_M(q) + \varepsilon_M(q)

C(q,\dot{q}) = W_C^{T}Z_C(q,\dot{q}) + \varepsilon_C(q)

G(q) = W_G^{T}Z_G(q) + \varepsilon_G(q) \quad (50)

where WM, WC, WG are the ideal weight matrices and ZM(q), ZC(q, \dot{q}), ZG(q) are the basis function matrices, which can be defined as

Z_M(q) = \mathrm{diag}(Z_q, ..., Z_q)

Z_C(q,\dot{q}) = \mathrm{diag}([Z_q^{T}, Z_{\dot{q}}^{T}]^{T}, ..., [Z_q^{T}, Z_{\dot{q}}^{T}]^{T})

Z_G(q) = \mathrm{diag}(Z_q, ..., Z_q) \quad (51)

where

Z_q = [\psi(\|q - q_1\|), ..., \psi(\|q - q_n\|)]^{T}

Z_{\dot{q}} = [\psi(\|\dot{q} - \dot{q}_1\|), ..., \psi(\|\dot{q} - \dot{q}_n\|)]^{T} \quad (52)

and ψ(·) is the Gaussian function. The estimates of M(q), C(q, \dot{q}) and G(q) can be written as

\hat{M}(q) = \hat{W}_M^{T}Z_M(q)

\hat{C}(q,\dot{q}) = \hat{W}_C^{T}Z_C(q,\dot{q})

\hat{G}(q) = \hat{W}_G^{T}Z_G(q) \quad (53)

Substituting (50) and (53) into (49), we have

M(q)\dot{s} + C(q,\dot{q})s + Ks = -\tilde{W}_M^{T}Z_M\dot{v} - \tilde{W}_C^{T}Z_Cv - \tilde{W}_G^{T}Z_G - e_\tau \quad (54)

where \tilde{W}_M = W_M^* - \hat{W}_M, \tilde{W}_C = W_C^* - \hat{W}_C, \tilde{W}_G = W_G^* - \hat{W}_G and e_\tau = \tau_{ext} - \hat{\tau}_{ext}. Consider the Lyapunov function

V = \frac{1}{2}s^{T}Ms + \frac{1}{2}\mathrm{tr}(\tilde{W}_M^{T}Q_M\tilde{W}_M + \tilde{W}_C^{T}Q_C\tilde{W}_C + \tilde{W}_G^{T}Q_G\tilde{W}_G) \quad (55)

where QM, QC, QG are positive definite matrices to be set by the designer. The derivative of V can be written as

\dot{V} = s^{T}M\dot{s} + \frac{1}{2}s^{T}\dot{M}s + \mathrm{tr}(\tilde{W}_M^{T}Q_M\dot{\tilde{W}}_M + \tilde{W}_C^{T}Q_C\dot{\tilde{W}}_C + \tilde{W}_G^{T}Q_G\dot{\tilde{W}}_G) \quad (56)

By using the estimated weight matrices \hat{W}_M, \hat{W}_C, \hat{W}_G to approximate W_M^*, W_C^*, W_G^*, the errors between the actual and the ideal RBFNN can be expressed as

W_M^{*T}Z_M(q) - \hat{W}_M^{T}Z_M(q) = \tilde{W}_M^{T}Z_M(q)

W_C^{*T}Z_C(q,\dot{q}) - \hat{W}_C^{T}Z_C(q,\dot{q}) = \tilde{W}_C^{T}Z_C(q,\dot{q})

W_G^{*T}Z_G(q) - \hat{W}_G^{T}Z_G(q) = \tilde{W}_G^{T}Z_G(q) \quad (57)

As the ideal weight matrices W* are constant, we know that

\dot{\tilde{W}}_M = -\dot{\hat{W}}_M, \qquad \dot{\tilde{W}}_C = -\dot{\hat{W}}_C, \qquad \dot{\tilde{W}}_G = -\dot{\hat{W}}_G \quad (58)

Considering that 2C(q,\dot{q}) - \dot{M}(q) is skew-symmetric [32] and using (54) and (58), we have

\dot{V} = s^{T}M\dot{s} + s^{T}Cs + \mathrm{tr}(\tilde{W}_M^{T}Q_M\dot{\tilde{W}}_M + \tilde{W}_C^{T}Q_C\dot{\tilde{W}}_C + \tilde{W}_G^{T}Q_G\dot{\tilde{W}}_G)

= -s^{T}(Ks + \tilde{W}_M^{T}Z_M\dot{v} + \tilde{W}_C^{T}Z_Cv + \tilde{W}_G^{T}Z_G + e_\tau) + \mathrm{tr}(\tilde{W}_M^{T}Q_M\dot{\tilde{W}}_M + \tilde{W}_C^{T}Q_C\dot{\tilde{W}}_C + \tilde{W}_G^{T}Q_G\dot{\tilde{W}}_G)

= -s^{T}Ks - s^{T}e_\tau - \mathrm{tr}\left[\tilde{W}_M^{T}(Z_M\dot{v}s^{T} + Q_M\dot{\hat{W}}_M)\right] - \mathrm{tr}\left[\tilde{W}_C^{T}(Z_Cvs^{T} + Q_C\dot{\hat{W}}_C)\right] - \mathrm{tr}\left[\tilde{W}_G^{T}(Z_Gs^{T} + Q_G\dot{\hat{W}}_G)\right] \quad (59)

The update laws are designed as

\dot{\hat{W}}_M = -Q_M^{-1}(Z_M\dot{v}s^{T} + \sigma_M\hat{W}_M)

\dot{\hat{W}}_C = -Q_C^{-1}(Z_Cvs^{T} + \sigma_C\hat{W}_C)

\dot{\hat{W}}_G = -Q_G^{-1}(Z_Gs^{T} + \sigma_G\hat{W}_G) \quad (60)

where σM, σC and σG are constants to be specified by the designer. Substituting (60) into (59), the derivative of V becomes

\dot{V} = -s^{T}Ks - s^{T}e_\tau + \mathrm{tr}[\sigma_M\tilde{W}_M^{T}\hat{W}_M] + \mathrm{tr}[\sigma_C\tilde{W}_C^{T}\hat{W}_C] + \mathrm{tr}[\sigma_G\tilde{W}_G^{T}\hat{W}_G] \quad (61)

Using Young's inequality [37],

\mathrm{tr}[\tilde{W}_{(\cdot)}^{T}\hat{W}_{(\cdot)}] \le -\frac{1}{2}\|\tilde{W}_{(\cdot)}\|_F^2 + \frac{1}{2}\|W_{(\cdot)}^*\|_F^2

-s^{T}e_\tau \le \frac{1}{2}s^{T}s + \frac{1}{2}e_\tau^{T}e_\tau \quad (62)

Then (61) can be written as

\dot{V} \le -s^{T}Ks + \frac{1}{2}\|s\|^2 + \frac{1}{2}\|e_\tau\|^2 + \alpha - \frac{\sigma_M}{2}\|\tilde{W}_M\|^2 - \frac{\sigma_C}{2}\|\tilde{W}_C\|^2 - \frac{\sigma_G}{2}\|\tilde{W}_G\|^2 \quad (63)


where \alpha = \frac{\sigma_M}{2}\|W_M^*\|^2 + \frac{\sigma_C}{2}\|W_C^*\|^2 + \frac{\sigma_G}{2}\|W_G^*\|^2. A sufficient condition for \dot{V} \le 0 is that

\alpha \le s^{T}\left(K - \frac{1}{2}I\right)s + \frac{1}{2}\|e_\tau\|^2 + \frac{\sigma_M}{2}\|\tilde{W}_M\|^2 + \frac{\sigma_C}{2}\|\tilde{W}_C\|^2 + \frac{\sigma_G}{2}\|\tilde{W}_G\|^2 \quad (64)

where I is the identity matrix. Let χ denote the state variable comprised of eτ, s, \tilde{W}_M, \tilde{W}_C, \tilde{W}_G defined in the Lyapunov function candidate. It follows from (64) that

\dot{V}(\chi) < 0, \quad \forall \|\chi\| > \varrho \quad (65)

where ϱ is a positive constant. In other words, the time derivative of V(χ) is negative outside the set Ωs = {‖χ‖ ≤ ϱ}, which is defined in Theorem 1 below; equivalently, all trajectories χ(t) that start outside Ωs will enter the set within finite time and will remain inside it afterwards. Choose 0 < ε < c, and define the sets Ωε = {V(χ) ≤ ε} and Ωc = {V(χ) ≤ c}. Let

\Upsilon = \{\varepsilon \le V(\chi) \le c\} = \Omega_c - \Omega_\varepsilon \quad (66)

The time derivative of V(χ) is negative inside Υ, that is,

\dot{V}(\chi(t)) < 0, \quad \forall \chi \in \Upsilon, \; \forall t \ge t_0 \quad (67)

Since \dot{V} is negative in Υ, V(χ(t)) decreases monotonically in time until the solution enters the set {V(χ) ≤ ε}. From that time on, χ(t) cannot leave this set, because \dot{V} is negative on its boundary {V(χ) = ε}. A sketch of the sets is shown in Fig. 4.

We can conclude the convergence and stability results in Theorem 1.

Theorem 1: By the Uniformly Ultimately Bounded (UUB) theorem, the tracking error s, the estimation error e_τ and the weight errors W̃_M, W̃_C, W̃_G will fall into a set Ω_s, where the bounding set Ω_s is defined in (68) and shown in Fig. 4.

$$\Omega_s = \Big\{(\|\tilde W_M\|, \|\tilde W_C\|, \|\tilde W_G\|, \|e_\tau\|, \|s\|)\ \Big|\ \frac{\sigma_M\|\tilde W_M\|^2}{2\alpha} + \frac{\sigma_C\|\tilde W_C\|^2}{2\alpha} + \frac{\sigma_G\|\tilde W_G\|^2}{2\alpha} + \frac{s^{\mathrm T}(K - \frac{1}{2}I)s}{\alpha} + \frac{\|e_\tau\|^2}{2\alpha} \le 1\Big\} \tag{68}$$

Fig. 4. Representation of UUB and the set Ω_s defined in (68).

As shown in Fig. 4, the bounding set Ω_s is the area in the first quadrant, passing through the points ((σ_(·)/2)||W̃_(·)||² = α, ||e_τ||² = 0, ||s||² = 0), (sᵀ(K − ½I)s = α, ||W̃_(·)||² = 0, ||e_τ||² = 0) and (½||e_τ||² = α, ||W̃_(·)||² = 0, ||s||² = 0). In Fig. 4, we define

$$\begin{aligned} &\text{when } \tfrac{\sigma_{(\cdot)}}{2}\|\tilde W_{(\cdot)}\|^2 = \alpha, \quad \tilde W = \varpi\\ &\text{when } s^{\mathrm T}\big(K - \tfrac{1}{2}I\big)s = \alpha, \quad s = \beta\\ &\text{when } \tfrac{1}{2}\|e_\tau\|^2 = \alpha, \quad e_\tau = \mu \end{aligned} \tag{69}$$

Since ||W*_(·)|| is a bounded constant, ||Ŵ_(·)|| = ||W*_(·) − W̃_(·)|| is bounded. Since ||e_τ|| is bounded, ||τ̂_ext|| = ||e_τ + τ_ext|| is bounded. With the bounded signals q_r and q̇_r, according to (46), ||v|| is bounded and ||q|| = ||e_q + q_r|| is bounded as well. Therefore, the norm of the state variable χ is bounded.

IV. SIMULATION STUDIES

In this section, we consider a 2-link robot arm in physical interaction with an unknown environment. The simulation environment is shown in Fig. 5. The environment torque is applied at the end-effector of the robot arm in the Y direction. The desired trajectory will be modified to adjust the interaction torque exerted by the environment so that a compliant behavior is achieved. The controller guarantees that the robot arm tracks a virtual desired trajectory q_r. The parameters of the robot arm are listed in Table I.

Fig. 5. An overview of the scenario.

TABLE I
PARAMETERS OF THE ROBOT ARM

Parameter   Description                 Value
m1          Mass of link 1              2.0 kg
m2          Mass of link 2              2.0 kg
l1          Length of link 1            0.20 m
l2          Length of link 2            0.20 m
I1          Inertia moment of link 1    0.027 kg·m²
I2          Inertia moment of link 2    0.027 kg·m²
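For reference, the rigid-body matrices M(q), C(q, q̇) and G(q) of a 2-link planar arm with the Table I parameters can be sketched using the standard formulas (e.g., [32]). The centers of mass are assumed at the link midpoints, which the paper does not state, so this is an illustrative model rather than the exact simulated one:

```python
import numpy as np

# Table I parameters; link centers of mass assumed at the midpoints.
m1 = m2 = 2.0
l1 = l2 = 0.20
I1 = I2 = 0.027
lc1, lc2 = l1 / 2, l2 / 2

def M(q):
    """Inertia matrix of a 2-link planar arm."""
    c2 = np.cos(q[1])
    m11 = m1 * lc1**2 + I1 + m2 * (l1**2 + lc2**2 + 2 * l1 * lc2 * c2) + I2
    m12 = m2 * (lc2**2 + l1 * lc2 * c2) + I2
    m22 = m2 * lc2**2 + I2
    return np.array([[m11, m12], [m12, m22]])

def C(q, dq):
    """Coriolis/centrifugal matrix (satisfies the skew-symmetry property)."""
    h = -m2 * l1 * lc2 * np.sin(q[1])
    return np.array([[h * dq[1], h * (dq[0] + dq[1])],
                     [-h * dq[0], 0.0]])

def G(q, g=9.81):
    """Gravity torque vector."""
    g1 = (m1 * lc1 + m2 * l1) * g * np.cos(q[0]) + m2 * lc2 * g * np.cos(q[0] + q[1])
    g2 = m2 * lc2 * g * np.cos(q[0] + q[1])
    return np.array([g1, g2])
```

These are the nonlinear functions that the RBF networks in Section III approximate.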

A. Interaction Performance

It is assumed that the dynamics of the environment can be defined as 0.01q̇ + (q − 0.3) = −τ_ext. The LQR method is used to verify the effectiveness of the proposed method. If the matrices A and B are known, we can use the optimal solution of the ARE to obtain the optimal parameters of the admittance


Fig. 6. The desired trajectory qd and virtual desired trajectory qr with Q = 1 and R = 1.


Fig. 7. The convergence of Pk and Kk to their optimal values with Q = 1 and R = 1.


Fig. 8. Interaction torque with Q = 1 and R = 1.

model. The proposed method is adopted when the environment model is not available.
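For intuition, the leading entries of the optimal solution can be reproduced from a scalar reading of the environment dynamics above: with state x = q − 0.3 and input τ_ext, the model gives ẋ = −100x − 100τ_ext. Solving the ARE with Q = R = 1 on this scalar system is only an illustrative sketch (the paper uses its full state-space matrices A and B), but it recovers the gain magnitude 0.4142 and the entry 0.0041 of P* reported in (71):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Scalar model of 0.01*q_dot + (q - 0.3) = -tau_ext,
# written as xdot = A x + B u with x = q - 0.3 and u = tau_ext.
A = np.array([[-100.0]])
B = np.array([[-100.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])

# solve_continuous_are solves A'P + PA - PBR^{-1}B'P + Q = 0.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)   # optimal feedback gain, u = -Kx
```

Here K ≈ −0.4142 and P ≈ 0.0041, matching the first admittance gain and the (1,1) entry of P* obtained with Q = 1 and R = 1.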

In the first step, the initial values of variables of the system


Fig. 9. The value of cost function with Q = 1 and R = 1.


Fig. 10. The desired trajectory qd and virtual desired trajectory qr with Q = 5 and R = 1.


Fig. 11. The convergence of Pk and Kk to their optimal values with Q = 5 and R = 1.

should be properly set. The selection of the exploration noise, which is used to make the estimated parameters converge to their real values, is not a trivial task. In this paper, the exploration noise


Fig. 12. Interaction torque with Q = 5 and R = 1.


Fig. 13. The value of cost function with Q = 5 and R = 1.

is selected as

$$\phi = \sum_{w=1}^{8} \frac{0.04}{w}\sin(wt) \tag{70}$$
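A direct transcription of this sum-of-sinusoids noise signal:

```python
import numpy as np

def exploration_noise(t):
    """Exploration noise phi(t) from (70): sum over w = 1..8 of (0.04/w) sin(wt)."""
    w = np.arange(1, 9)
    return float(np.sum(0.04 / w * np.sin(w * t)))
```

The signal is zero at t = 0 and its amplitude is bounded by the sum of the coefficients 0.04/w (about 0.109), so it perturbs the system only mildly while exciting eight frequencies.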

The initial feedback gain K0 should ensure the stability of the system and is set as K0 = [−1, 0.1]. The initial Pk is set as P0 = 10I_p, where I_p represents the p-dimensional unit matrix. Then, the second step is conducted and stops when ||Pk|| < 0.02, where || · || denotes the Euclidean norm.
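When A and B are known, the iteration on (Pk, Kk) that this stopping rule monitors can be illustrated with the classical model-based policy iteration (Kleinman's algorithm); the paper's contribution is the data-driven counterpart that avoids using A and B, so the following scalar sketch (Q = R = 1, initial gain −1, i.e. the first entry of K0) is only illustrative:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Scalar model (x = q - 0.3): xdot = A x + B u, with Q = R = 1.
A = np.array([[-100.0]])
B = np.array([[-100.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])

K = np.array([[-1.0]])        # stabilizing initial gain
for _ in range(10):
    Ac = A - B @ K
    # Policy evaluation: solve Ac' P + P Ac + Q + K' R K = 0.
    P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
    # Policy improvement.
    K = np.linalg.solve(R, B.T @ P)
```

Each iteration is a Newton step on the ARE, so Kk converges quadratically to the optimal gain (here ≈ −0.4142), mirroring the fast convergence of ||Kk − K*|| seen in Fig. 7.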

The weights of the cost function are given by Q = 1 and R = 1. The desired admittance model is obtained as τ_ext = −0.4142q + 0.0702q_r based on the known A and B. Simulation results are shown in Figs. 6-9. In Fig. 6, the desired trajectories of the robot arm in joint space are shown. At the beginning, there is a large error between the LQR and the proposed method, due to the initial admittance model τ_ext = −q + 0.1(q − 0.3) and the exploration noise. After that, the error becomes smaller and the trajectory of the robot arm with the proposed method approaches that with the LQR method. The error of the admittance parameters between the LQR and the proposed method is shown in Fig. 7, where the convergence of Pk and Kk to the optimal values based on LQR is illustrated. The error is defined as ||Kk − K*||. It can be seen that the error


Fig. 14. The tracking performance of each joint.


Fig. 15. Tracking error of each joint.

decreases to around 0.01 after 12 iterations and the Euclidean norm of Pk decreases to around 0.02. After three steps, the admittance model with the proposed method is obtained as: τ_ext = −0.4173q + 0.00904q_r. The symmetric positive definite matrices with Q = 1 and R = 1, Pk from the proposed algorithm and P* from LQR, are shown below:

$$P_k = \begin{bmatrix} 0.0033 & 0.0027 \\ 0.0027 & 0.0077 \end{bmatrix}, \quad P^{*} = \begin{bmatrix} 0.0041 & -0.0007 \\ -0.0007 & 0.0025 \end{bmatrix} \tag{71}$$

To further verify the correctness of the proposed method, different weights of the cost function are given by Q = 5 and R = 1. Simulation results are shown in Figs. 10-13. Similarly, the desired admittance model is obtained as τ_ext = −1.4495q + 0.2033q_r based on the known A and B. After three steps of iteration, the admittance model is obtained as τ_ext = −1.5374q + 0.2063q_r with the proposed method. The desired trajectory q_d and virtual desired trajectory q_r with Q = 5 and R = 1 are illustrated in Fig. 10. As shown in Fig. 11, the error ||Kk − K*|| between the LQR and the proposed method converges to 0.08 after 12 iterations. The interaction torque converges to around −0.35 Nm at 7 s, as shown in Fig. 12. The values of the cost function are 0.561 and 0.602 with the LQR and the proposed method, respectively. Similarly, the symmetric positive definite matrices with Q = 5 and R = 1, Pk from the proposed


Fig. 16. Function approximation performance of the M(q) matrix by RBF neural network.


Fig. 17. Function approximation performance of the C(q, q̇) matrix by RBF neural network.


Fig. 18. Function approximation performance of the G(q) matrix by RBF neural network.

algorithm and P* from LQR are shown below:

$$P_k = \begin{bmatrix} 0.0106 & 0.0068 \\ 0.0068 & 0.0107 \end{bmatrix}, \quad P^{*} = \begin{bmatrix} 0.0145 & -0.0020 \\ -0.0020 & 0.0043 \end{bmatrix} \tag{72}$$

Comparing the admittance model obtained by the proposed method with that obtained by the LQR method, the difference is small and acceptable. The overall results are satisfactory.

B. Trajectory Tracking

The initial joint angle of the robot arm is q(0) = [2.6, −2.6]ᵀ and the desired trajectory of the robot arm is given by q_d = [0.3 + 0.2e^(−t), 0.3e^(−t)]ᵀ. The weight matrices are initialized as W_M(0) = 0, W_C(0) = 0, W_G(0) = 0. The results of tracking performance are shown in Figs. 14-20. In Fig. 14, the desired joint angle trajectory and the actual trajectory are shown, and the tracking errors are illustrated in Fig. 15. It can be found that the proposed control algorithm makes the tracking errors converge to a small neighborhood of zero. The


Fig. 19. The norm of weights of the neural network for each joint.


Fig. 20. The control inputs τ1 and τ2.

function approximation performance of our implemented neural network is depicted in Figs. 16-18. As seen from the figures, the output of the implemented neural network follows the dynamics of the approximated nonlinear functions (M(q), C(q, q̇), G(q)), which implies that the implemented neural networks have the ability to approximate the nonlinear functions with satisfactory performance. As shown in Fig. 19, the weight matrices of the system show a trend of convergence, and the control input is shown in Fig. 20. With the tracking errors and torque regulation, as shown in Fig. 8, the cost function is minimised, as shown in Fig. 9.
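The RBF approximators take the form Ŵᵀ Z(x) with Gaussian basis functions. A minimal one-dimensional illustration follows; the centers, width and target function are illustrative choices, not the paper's settings, and the weights are fitted in batch by least squares rather than by the online update law (60):

```python
import numpy as np

centers = np.linspace(-1.0, 1.0, 15)   # illustrative RBF centers
width = 0.3                            # illustrative Gaussian width

def Z(x):
    """Gaussian basis vector Z(x)."""
    return np.exp(-(x - centers) ** 2 / (2.0 * width ** 2))

# Batch least-squares fit of W in f(x) ~ W^T Z(x) to a sample nonlinearity.
xs = np.linspace(-1.0, 1.0, 200)
Phi = np.stack([Z(x) for x in xs])     # 200 x 15 regressor matrix
target = np.sin(3.0 * xs)
W, *_ = np.linalg.lstsq(Phi, target, rcond=None)
max_err = np.max(np.abs(Phi @ W - target))
```

With enough well-placed centers the residual is small, which is the universal-approximation property the controller design relies on.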

C. Torque Observer

The interaction torque is assumed to be applied at the end-effector of the robot arm. In Fig. 21, the actual interaction torque and its estimation are shown, and the estimated results of the generalized momentum of each joint are shown in Fig. 22. As seen from the figures, the output of the implemented torque observer follows the real torque, which implies that the implemented observer has a satisfactory estimation performance.
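The generalized-momentum residual observer referenced here ([13], [33]) can be illustrated on a single joint: with p = Mq̇, the residual r = K_O(p − ∫(τ + r)dt − p(0)) obeys ṙ = K_O(τ_ext − r), so r converges to the external torque without acceleration measurements. A minimal sketch with assumed unit inertia and an illustrative damping control law (for a full arm the integrand also contains C(q, q̇)ᵀq̇ − G(q)):

```python
# Scalar generalized-momentum torque observer (illustrative values).
m = 1.0           # joint inertia, assumed constant here
Ko = 50.0         # observer gain
dt = 1e-3         # integration step
tau_ext = 0.3     # constant external torque to be estimated

v = 0.0           # joint velocity
integ = 0.0       # running integral of (tau + r)
r = 0.0           # observer residual (external-torque estimate)
for _ in range(5000):             # 5 s of simulation
    tau = -2.0 * v                # illustrative damping control input
    v += dt * (tau + tau_ext) / m # plant: m*dv/dt = tau + tau_ext
    integ += dt * (tau + r)
    r = Ko * (m * v - integ)      # p = m*v, with p(0) = 0
```

The residual dynamics are first order with time constant 1/K_O, so after 5 s the estimate r has settled at the applied external torque.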


Fig. 21. The estimated interaction torque obtained from the observer based on the generalized momentum approach and the actual interaction torque.


Fig. 22. The estimated generalized momentum of the joints obtained from the torque observer and the actual generalized momentum.

V. CONCLUSION

In this paper, a method of admittance adaptation is proposed for robot-environment interaction. The desired admittance model is obtained by an optimal adaptive control approach, and admittance control is used to regulate the interaction behavior. A neural-network-based controller is developed to guarantee trajectory tracking, and the interaction torque is estimated by an observer approach. A cost function that includes the tracking errors and the interaction torque is minimised. Simulation studies have verified the effectiveness of the proposed method.

REFERENCES

[1] S. Katsura and K. Ohnishi, "Human cooperative wheelchair for haptic interaction based on dual compliance control," IEEE Transactions on Industrial Electronics, vol. 51, no. 1, pp. 221-228, 2004.

[2] H. Wang, B. Yang, Y. Liu, W. Chen, X. Liang, and R. Pfeifer, "Visual servoing of soft robot manipulator in constrained environments with an adaptive controller," IEEE/ASME Transactions on Mechatronics, vol. 22, no. 1, pp. 41-50, Feb 2017.

[3] J. Huang, W. Huo, W. Xu, S. Mohammed, and Y. Amirat, "Control of upper-limb power-assist exoskeleton using a human-robot interface based on motion intention recognition," IEEE Transactions on Automation Science and Engineering, vol. 12, no. 4, pp. 1257-1270, Oct 2015.

[4] N. Hogan, "Impedance control: An approach to manipulation: Part II: Implementation," Journal of Dynamic Systems, Measurement, and Control, vol. 107, no. 1, pp. 8-16, 1985.

[5] W. He, Y. Dong, and C. Sun, "Adaptive neural impedance control of a robotic manipulator with input saturation," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 46, no. 3, pp. 334-344, March 2016.

[6] M. T. Mason, "Compliance and force control for computer controlled manipulators," IEEE Transactions on Systems, Man, and Cybernetics, vol. 11, no. 6, pp. 418-432, June 1981.

[7] A. M. Smith, C. Yang, H. Ma, P. Culverhouse, A. Cangelosi, and E. Burdet, "Novel hybrid adaptive controller for manipulation in complex perturbation environments," PLoS ONE, vol. 10, no. 6, p. e0129281, 2015.

[8] H. Wang, D. Guo, H. Xu, W. Chen, T. Liu, and K. K. Leang, "Eye-in-hand tracking control of a free-floating space manipulator," IEEE Transactions on Aerospace and Electronic Systems, vol. 53, no. 4, pp. 1855-1865, Aug 2017.

[9] W. He and S. Zhang, "Control design for nonlinear flexible wings of a robotic aircraft," IEEE Transactions on Control Systems Technology, vol. 25, no. 1, pp. 351-357, Jan 2017.

[10] W. He, Z. Yan, C. Sun, and Y. Chen, "Adaptive neural network control of a flapping wing micro aerial vehicle with disturbance observer," IEEE Transactions on Cybernetics, vol. 47, no. 10, pp. 3452-3465, Oct 2017.

[11] J. Huang, P. Di, T. Fukuda, and T. Matsuno, "Robust model-based online fault detection for mating process of electric connectors in robotic wiring harness assembly systems," IEEE Transactions on Control Systems Technology, vol. 18, no. 5, pp. 1207-1215, Sept 2010.

[12] A. Alcocer, A. Robertsson, A. Valera, and R. Johansson, "Force estimation and control in robot manipulators," in Robot Control 2003 (SYROCO'03): A Proceedings Volume from the 7th IFAC Symposium, Wrocław, Poland, 1-3 September 2003, vol. 1. International Federation of Automatic Control, 2004, p. 55.

[13] A. De Luca and R. Mattone, "Sensorless robot collision detection and hybrid force/motion control," in Proceedings of the 2005 IEEE International Conference on Robotics and Automation. IEEE, 2005, pp. 999-1004.

[14] A. De Luca, A. Albu-Schaffer, S. Haddadin, and G. Hirzinger, "Collision detection and safe reaction with the DLR-III lightweight manipulator arm," in 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2006, pp. 1623-1630.

[15] M. Cohen and T. Flash, "Learning impedance parameters for robot control using an associative search network," IEEE Transactions on Robotics and Automation, vol. 7, no. 3, pp. 382-390, 1991.

[16] T. Tsuji, K. Ito, and P. G. Morasso, "Neural network learning of robot arm impedance in operational space," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 26, no. 2, pp. 290-298, 1996.

[17] C. Yang, C. Zeng, P. Liang, Z. Li, R. Li, and C. Y. Su, "Interface design of a physical human-robot interaction system for human impedance adaptive skill transfer," IEEE Transactions on Automation Science and Engineering, vol. PP, no. 99, pp. 1-12, 2017.

[18] R. Z. Stanisic and A. V. Fernandez, "Adjusting the parameters of the mechanical impedance for velocity, impact and force control," Robotica, vol. 30, no. 4, pp. 583-597, 2012.

[19] S. S. Ge, Y. Li, and C. Wang, "Impedance adaptation for optimal robot-environment interaction," International Journal of Control, vol. 87, no. 2, pp. 249-263, 2014.

[20] R. Johansson and M. W. Spong, "Quadratic optimization of impedance control," in Proceedings of the 1994 IEEE International Conference on Robotics and Automation. IEEE, 1994, pp. 616-621.

[21] M. Matinfar and K. Hashtrudi-Zaad, "Optimization-based robot compliance control: Geometric and linear quadratic approaches," The International Journal of Robotics Research, vol. 24, no. 8, pp. 645-656, 2005.

[22] D. E. Kirk, Optimal Control Theory: An Introduction. Courier Corporation, 2012.

[23] D. P. Bertsekas, Dynamic Programming and Optimal Control. Athena Scientific, Belmont, MA, 1995, vol. 1, no. 2.

[24] F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32-50, 2009.

[25] F.-Y. Wang, H. Zhang, and D. Liu, "Adaptive dynamic programming: An introduction," IEEE Computational Intelligence Magazine, vol. 4, no. 2, pp. 39-47, 2009.

[26] P. J. Werbos, "A menu of designs for reinforcement learning over time," Neural Networks for Control, pp. 67-95, 1990.

[27] D. White, D. Sofge, and L. Feldkamp, "Handbook of intelligent control: Neural, fuzzy, and adaptive approaches," IEEE Transactions on Neural Networks, vol. 5, no. 5, pp. 852-852, 1994.

[28] B. Kim, J. Park, S. Park, and S. Kang, "Impedance learning for robotic contact tasks using natural actor-critic algorithm," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 40, no. 2, pp. 433-443, 2010.

[29] J. Buchli, F. Stulp, E. Theodorou, and S. Schaal, "Learning variable impedance control," The International Journal of Robotics Research, vol. 30, no. 7, pp. 820-833, 2011.

[30] S. Arimoto, S. Kawamura, and F. Miyazaki, "Bettering operation of dynamic systems by learning: A new control theory for servomechanism or mechatronics systems," in The 23rd IEEE Conference on Decision and Control. IEEE, 1984, pp. 1064-1069.

[31] Y. Jiang and Z.-P. Jiang, "Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics," Automatica, vol. 48, no. 10, pp. 2699-2704, 2012.

[32] B. Siciliano and O. Khatib, Springer Handbook of Robotics. Springer Science & Business Media, 2008.

[33] A. Wahrburg, E. Morara, G. Cesari, B. Matthias, and H. Ding, "Cartesian contact force estimation for robotic manipulators using Kalman filters and the generalized momentum," in 2015 IEEE International Conference on Automation Science and Engineering (CASE). IEEE, 2015, pp. 1230-1235.

[34] H. Kwakernaak and R. Sivan, Linear Optimal Control Systems. Wiley-Interscience, New York, 1972, vol. 1.

[35] F. L. Lewis and V. L. Syrmos, Optimal Control. John Wiley & Sons, 1995.

[36] T. H. Lee and C. J. Harris, Adaptive Neural Network Control of Robotic Manipulators. World Scientific, 1998, vol. 19.

[37] W. H. Young, "On classes of summable functions and their Fourier series," Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 87, no. 594, pp. 225-229, 1912.