
Page 1: Iterative Identification and Control || Open-loop Iterative Learning Control

1 Modelling, Identification and Control

Michel Gevers

Abstract. In this chapter, we first review the changing role of the model in control system design over the last fifty years. We then focus on the development over the last ten years of the intense research activity and on the important progress that has taken place in the interplay between modelling, identification and robust control design. The major players of this interplay are presented; some key technical difficulties are highlighted, as well as the solutions that have been obtained to conquer them. We end the chapter by presenting the main insights that have been gained by a decade of research on this challenging topic.

1.1 A not-so-brief Historical Perspective

There are many ways of describing the evolution of a field of science and engineering over a period of half a century, and each such description is necessarily biased, oversimplified and sketchy. But I have always learned some new insight from such sketchy descriptions, whoever the author. Thus, let me attempt to start with my own modest perspective on the evolution of modelling, identification and control from the post-war period until the present day.

Until about 1960, most of control design was based on model-free methods. This was the golden era of Bode and Nyquist plots, of Ziegler-Nichols charts and lead/lag compensators, of root-locus techniques and other graphical design methods.

From model-free to model-based control design The introduction of the parametric state-space models by Kalman in 1960, together with the solution of optimal control and optimal filtering problems in a Linear Quadratic Gaussian (LQG) framework [90,91] gave birth to a tremendous development of model-based control design methods. Successful applications abounded, particularly in aerospace, where accurate models were readily available.

From modelling to identification The year 1965 can be seen as the founding year for parametric identification with the publication of two milestone papers. The paper [80] set the stage for state-space realisation theory which, twenty-five years later, became the major stepping stone towards what is now called subspace identification. The paper [12] proposed a Maximum Likelihood (ML) framework for the identification of input-output (i.e., ARMAX) models that gave rise to the celebrated

P. Albertos et al. (eds.), Iterative Identification and Control, © Springer-Verlag London Limited 2002


100 A. Sala and P. Albertos

approach is a black box one, in the sense that no prior knowledge of the physics of the plant is used to synthesise the control actions.

The plant evolution is assumed to start always at the same initial conditions, and the refining of the control action is a batch process: the control update is carried out off-line, not while the control is being applied. This is a main difference with respect to the older concept of repetitive control¹ [157]. In this way, in the iterative learning control approach, there are no stability problems (in fact, the control system acts in an open-loop way). Nevertheless, convergence to a steady control signal is not guaranteed and some cautions should be taken, as analysed later on.

Also, this setting assumes perfect measurement conditions. The errors in the measurements, as well as the changes in the plant or disturbance behaviours, are left to the feedback controller.

For linear plants, the sequence of control actions can be scaled with respect to the magnitude of the reference or the disturbance, if it has been computed for unitary signals. In this sense, some kind of generalisation is possible, but, in general, the control sequence is only valid for the trained conditions.

The aim of this chapter is to show how control structures with some open-loop components can improve the reference tracking and deterministic disturbance rejection and simultaneously decrease the feedback effort needed, hence improving the robustness of the overall control system. Feedback will then tackle only the task of compensating random disturbances and model mismatch and variations, if any.

Another possibility to be presented (in the repetitive task case) is to learn the sequence of control actions needed to follow a prescribed trajectory. Although that sequence could be derived from the same linear model used in the feedback control design, incorporating the learning mechanism could enable the system to compensate some unmodelled non-linearities in open-loop, perhaps avoiding the need for non-linear identification and non-linear controllers. Non-linear controllers in the feedback loop would only be needed in cases where significant non-deterministic disturbances arise (assuming that linear identification gives useless models) and/or strong operating condition changes (plant variations) occur.

For example, a typical case for applying these techniques would be a robot arm executing a trajectory-following task requiring quick and precise movements. The linearised models are time-varying, so achieving robustness against unmodelled dynamics needs low feedback gains. If the task includes grabbing and leaving an object, then the parameters also vary. Furthermore, there might be unmodelled irregularities in a bearing producing friction spikes always at the same arm position, etc. Iterative learning control could generate feedforward control actions so that the feedback regulator (either linear

1 In the repetitive control framework, the reference is periodically time-varying and the control updating takes place in closed-loop. Thus, only one task is considered, although its behaviour is repetitive or periodic.


5 Open-loop Iterative Learning Control 101

or using some sort of linearisation) could have reduced gains and hence be less sensitive to inaccurate modelling of inertias, friction, etc.

In this case, the main advantages of this approach are the use of a lower feedback gain and a faster response. A high feedback gain to counteract errors increases the sensitivity to noise and easily leads to actuator saturation. On the other hand, the feedforward control avoids the plant's delays and allows for a faster response and a reduction in the error magnitude.

The main drawback is that the generated control action is only valid for the learned operating conditions.

5.2 Control with Two Degrees of Freedom

In this section, some of the ideas in section 4.2.7 will be reviewed to set up a framework motivating the iterative learning control laws discussed in later sections.

Let us have a process control loop such as the one depicted in Figure 5.1.

Fig. 5.1. Closed-loop control with set-point changes, process and measurement noise

In that set-up, the different transfer functions are

y(s)/v(s) = H/(1 + KG) ≜ Tv ,   y(s)/(−w(s)) = y(s)/r(s) = KG/(1 + KG) ≜ Tr   (5.1)

On the other hand, in open-loop, the plant equation is:

y(s) = G u(s) + H v(s)   (5.2)

so measurement noise does not affect the output. The classical conclusion for those equations is that high-gain regulators produce Tv ≈ 0 (process disturbance rejection) and Tr ≈ 1 (reference tracking). As high gain cannot be achieved over the whole frequency range due to stability and robustness constraints (see Chapter 4 or [140,142]), the loop designer usually shapes one of


the two closed-loop functions (Tv or Tr), and the other one is left to be whatever results from that shaping. In many cases where combined tracking and disturbance rejection is sought, the open-loop transfer function KG is shaped to have the highest possible gain that does not destabilise the closed loop (with reasonable robustness margins).
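The gain trade-off above can be checked numerically. The sketch below evaluates Tv = H/(1+KG) and Tr = KG/(1+KG) for a hypothetical first-order plant; all names and values are illustrative assumptions, not taken from the text.

```python
# Hypothetical first-order plant G(s) = 1/(s+1), noise model H(s) = 1,
# and a proportional regulator K (all assumed for illustration).

def G(s):
    return 1.0 / (s + 1.0)

def closed_loop(K, s, H=1.0):
    """Return (Tv, Tr) at the complex frequency s for a proportional K."""
    L = K * G(s)                    # open-loop transfer function KG
    return H / (1.0 + L), L / (1.0 + L)

s = 0.5j                            # evaluate at omega = 0.5 rad/s
Tv_lo, Tr_lo = closed_loop(1.0, s)
Tv_hi, Tr_hi = closed_loop(100.0, s)

# Raising the gain shrinks |Tv| (disturbance rejection) and pushes Tr
# towards 1 (tracking); note also that Tv = H*(1 - Tr) holds identically.
print(abs(Tv_lo), abs(Tv_hi), abs(1.0 - Tr_hi))
```

With K = 100 the disturbance transfer |Tv| drops by roughly the loop gain, while |1 − Tr| shrinks accordingly, which is the "high gain" conclusion drawn in the text.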

The referred shaping of closed-loop transfer functions can be made using pole placement or cancellation controllers (defining either Tr or Tv so that some internal stability requirements are met and then obtaining K from the appropriate expression in (5.1)), or via H∞ techniques such as the stacked sensitivity S/KS, etc. [114,142,164]. If one of the functions is shaped, then the other one must verify:

Tv = H (1 − Tr)   (5.3)

In many cases, an inaccurate model hinders gain increase, so tracking performance is unacceptable, but disturbance rejection may still be satisfactory if the disturbances are small or concentrated in a frequency range where robust stability constraints are not stringent.

5.2.1 Feedforward Control

To achieve some prefixed behaviour in both Tr and Tv, it is necessary to include a so-called two-degrees-of-freedom (2DOF) structure in the control loop, such as the ones depicted in Figure 5.2 or any other obtained from them by valid block diagram transformations. Other than the inner feedback loop already mentioned, the input is fed forward to the control variable without waiting for an output error to occur. In both of these, the internal regulator K is designed to take into account non-deterministic disturbance rejection and robustness, given a certain model quality. Of course, many of the techniques in this book could be used to design appropriate experiments so as to improve the model quality and the closed-loop performance.

The remaining blocks are designed as follows, in each of the two ap­proaches:

Set-point pre-filtering

The block diagram is depicted in Figure 5.2(a). If the output (let us call it y1(s)) with a starting default pre-filter Q0 = 1 is unsatisfactory, then a pre-filter Q(s) is designed so that Q(s)y1(s) exhibits good behaviour. Although the pre-filter would involve, in a general case, inversion of the achieved closed-loop transfer functions, in many cases the pre-filter is manually tuned, by trial and error, once the closed-loop design has been finished. The manual tuning is usually based on a pre-filter such as:

Q(s) = Ke (αs + 1)/(βs + 1)   (5.4)

or an equivalent discretised version.


Fig. 5.2. Two-degrees-of-freedom configurations: (a) set-point pre-filtering; (b) feedforward open-loop

• Ke is used to compensate the position error ep of the closed-loop regulator, i.e., Ke = 1/(1 − ep);

• α > β is used to "accelerate" the output rise phase, usually in non-oscillating transients;

• α < β delays the rise phase in the output and alleviates some overshoot effects. In many practical applications, non-linearities advise against the use of high-amplitude inputs and, to achieve that, rate-limited set-point changes are usually enforced, with a similar effect to this last pre-filtering. This type of pre-filtering is implemented in many commercial PID regulators.
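A discretised version of the lead/lag pre-filter Q(s) = Ke(αs + 1)/(βs + 1) can be sketched with Tustin's (bilinear) substitution; the parameter values and sampling period below are illustrative assumptions.

```python
# A minimal sketch (assumed values) of discretising the pre-filter (5.4)
# via Tustin's method: s -> (2/T)(1 - z^-1)/(1 + z^-1).

def prefilter_coeffs(Ke, alpha, beta, T):
    """Difference-equation coefficients of the discretised Q(s)."""
    b0 = Ke * (2.0 * alpha / T + 1.0)
    b1 = Ke * (1.0 - 2.0 * alpha / T)
    a0 = 2.0 * beta / T + 1.0
    a1 = 1.0 - 2.0 * beta / T
    return b0, b1, a0, a1

def run_prefilter(coeffs, u):
    """Apply the first-order filter to a set-point sequence u."""
    b0, b1, a0, a1 = coeffs
    y, u_prev, y_prev = [], 0.0, 0.0
    for un in u:
        yn = (b0 * un + b1 * u_prev - a1 * y_prev) / a0
        y.append(yn)
        u_prev, y_prev = un, yn
    return y

# Lag configuration (alpha < beta) softens a unit step, easing overshoot.
coeffs = prefilter_coeffs(Ke=1.0, alpha=0.1, beta=1.0, T=0.1)
y = run_prefilter(coeffs, [1.0] * 100)
```

The DC gain of the discretised filter is Ke, so the softened set-point still settles at the commanded value.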

Feedforward open-loop

The scheme can be analysed in Figure 5.2(b). In this case, it is easy to check that the transfer function from r to y is Tr(s) whatever the internal regulator K is. On the other hand, the Tr block does not modify the stability or


performance of the closed-loop regarding v. In this way, the reference tracking design is fully decoupled from the disturbance rejection and stabilisation ones. The tracking design is only based on open-loop calculations². In this way, tracking performance and stability robustness can be balanced in a design. This will be the basic configuration for the iterative learning approach to be presented in the next sections, and it is similar to the set-up depicted in Figures 4.8 and 4.9.

This scheme can also be applied to disturbance rejection, if the disturbances are deterministic and known in advance. In this case, Tr(s) is replaced by Gv(s) in the forward path from v(s).

So, to summarise, 2-DOF configurations allow for separate design of, on one hand, disturbance and modelling error rejection and closed-loop stability robustness (limited by model quality) and, on the other hand, reference tracking behaviour (limited by actuator power³ and bandwidth).

Iterative improvement can be carried out by identifying better and better models, so a better estimate of G will lead to a better calculation of uff and ufb in Figure 5.2(b). Notwithstanding, as pointed out in other chapters, some reduced-order models may be useful to design feedback controllers even if their open-loop step response is very different from that of the plant. In that case, a different model of G should be used for the design of K and for the design of the inverse plant G⁻¹ in the referred figure.

In fact, the point of this chapter is to realise that the "model" needed for disturbance rejection in closed-loop control could be simpler, and the regulator more cautious, if the tracking of references were carried out in open-loop and learnt (identified) separately: the closed-loop model needs accuracy only at stability-risk frequencies. There might be no need for accuracy over a wide range of frequencies to follow abrupt, quick reference changes.

In the case of repetitive tasks, the control actions uff can be learnt from repetition of experiments, progressively improving tracking performance. Of course, if the reference changes arbitrarily in an unpredictable way, then a general model G is needed. The first case (repetitive task) will be presented in more detail in the next sections. For the general case, the reader is referred to the open-loop and closed-loop identification techniques described in Chapter 2 and [108].

2 Perfect tracking is not possible in non-minimum-phase plants. Notwithstanding, the 2-DOF approach is applicable to unstable plants as long as the feedback controller internally stabilises the loop.

3 Actuator power is also an issue in stabilising unstable plants, where unrecoverable states can be reached [85].


5.3 The Iterative Learning Paradigm

As implementations of complex control laws are usually coded into a computer, we will consider the discrete-time case, in which a sequence of control actions has to be learnt.

In the sequel, an analysis of a particular set-up for repetitive tasks will be presented. However, in a general case, direct use of the most updated system model in the schemes in Figure 5.2 can also be considered. That model can be obtained from previous identification experiments. An important point is that the model in the feedforward path can be more "aggressive" and complex, in the sense that its modelling errors will not compromise closed-loop stability, but only tracking performance, so less caution is required.

In the following, a direct learning of control actions will be undertaken, without resorting to a model. That will ease convergence, but at the expense of obtaining not a "general" plant inverse but one specialised for the particular repetitive task under execution. Although the learnt action is applied in open-loop, the learning algorithm obviously needs feedback from sensors to evaluate its performance; however, the loop is only closed once the repetitive task has ended, and the dynamics will be simpler, as shown in later sections.

The assumptions

The iterative learning methodology can be understood as follows. Let us have a repetitive system

yk(t) = fH(uk(t), t)   (5.5)

where fH : U → Y is a non-linear operator and uk(t) is the input sequence applied to the system in the repetition (iteration) k = 0, 1, ... at time t. It will be assumed that

• every trial lasts a fixed time T;

• for the desired output yd(t), a reasonable control action sequence ud(t) exists that achieves the desired output. Reasonable in this case means with acceptable bounds in amplitude and frequency. To accomplish this requirement, the desired signal should not exhibit abrupt changes, and restrictions on the smoothness and minimum "gain" of fH also apply. The desired output yd is kept the same during all the iterations (repetitive task);

• the initial state of the plant is the same in all iterations. In 2DOF configurations, the initial condition error will exhibit the closed-loop dynamics;

• the system is time-invariant through the iterations, that is, fH does not depend on k. Time variation inside each iteration is allowed (for example, grabbing or dropping a certain amount of mass, leading to a change in mechanical parameters) as long as it occurs at the same time instant in each iteration.


Note that non-minimum-phase systems can be "inverted" to obtain ud only if the duration of the requested task is short enough, or if the desired trajectory does not excite the unstable inverse dynamics (generating the desired output in that case for long-lasting tasks is impossible in practice because of errors in the estimation of the zero dynamics).

The objectives

The objective of iterative learning is to track references and reject repetitive disturbances, with progressively lower tracking error, by improving the open-loop control action estimates.

A learning operator fL will provide the input uk+1(t) to be applied to the plant in the next iteration, taking into account the current input-output data pair, (uk(t), yk(t)), and the desired output, yd(t):

uk+1(t) = fL(uk(t), yk(t), yd(t))   (5.6)

This setting is represented in Figure 5.3.

Fig. 5.3. Iterative learning

Thus, the iterative learning problem is to design an iterative algorithm (5.6) that converges to an input u*(t) that minimises the system output tracking error yd(t) − fH(u*(t), t) in some norm, ∀t ∈ [t0, tf] [115].

5.4 The Learning Algorithms

In order to avoid the closed-loop interaction between the current control action and tracking error, the following strategies are usually adopted:

• use the next error (in the current iteration) to update the control action at the current time for the next iteration:

uk+1(t) = uk(t) + η ek(t + 1)   (5.7)

where η is a learning factor and ek(t) = yd(t) − yk(t). In this setting, the mismatch between the current control action and the one required to obtain the desired output


is assumed to produce a tracking error at the next time instant. If the plant presents a pure time delay d, the learning algorithm would instead be:

uk+1(t) = uk(t) + η ek(t + d)

• add the previous iteration's feedback control actions to the current feedforward one:

uff,k+1(t) = uff,k(t) + ufb,k(t)   (5.8)

• the previous approaches are particular cases of the use of a general filtered measure of the error in the last iteration to update the current control actions. In the next iteration, all time instants are known, so the filter can be "non-causal" if so wished.
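The two named strategies can be sketched as plain update functions; the learning factor η, the delay d, and the signal layout are illustrative assumptions.

```python
# A sketch of the two update strategies for a sampled signal of length N.

def update_next_error(u_k, e_k, eta, d=1):
    """Next-error rule: u_{k+1}(t) = u_k(t) + eta * e_k(t + d), shifting the
    current-iteration error back by the (assumed) plant delay d."""
    N = len(u_k)
    return [u_k[t] + eta * (e_k[t + d] if t + d < N else 0.0)
            for t in range(N)]

def update_add_feedback(u_ff_k, u_fb_k):
    """Feedback-folding rule: u_ff_{k+1}(t) = u_ff_k(t) + u_fb_k(t), absorbing
    the corrections the feedback regulator made during the last iteration."""
    return [uf + ub for uf, ub in zip(u_ff_k, u_fb_k)]

print(update_next_error([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], eta=0.5, d=1))
print(update_add_feedback([1.0, 1.0], [0.5, -0.5]))
```

Both are one-shot maps from one iteration's data to the next iteration's feedforward action; no model of the plant appears in either.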

5.4.1 Convergence (Linear Case)

In this section, the convergence of some learning strategies will be analysed. The feedback loop is not considered and so the control actions refer to the feedforward part.

SISO case

For simplicity, let us firstly consider a single-input single-output (SISO) linear plant.

Denote by

{hk} = {0, ..., 0, hm, hm+1, ...}   (5.9)

the pulse response of a sampled linear time-invariant system, where m is the relative degree of the system (if m > 1, that means a pure delay is present). The system output is given by the usual convolution formula

y(i) = Σ_{j=0}^{i−m} h_{i−j} u(j)   (5.10)

If a finite-length response is analysed (i = m, ..., N), then (5.10) can be written in matrix form. By defining the vectors u, y ∈ R^{N−m+1}

u = [u(0) u(1) ... u(N − m)]ᵀ   (5.11)

y = [y(m) y(m + 1) ... y(N)]ᵀ   (5.12)

Page 10: Iterative Identification and Control || Open-loop Iterative Learning Control

108 A. Sala and P. Albertos

then

y = H u   (5.13)

where

H = | hm     0      ...  0  |
    | hm+1   hm     ...  0  |
    | ...           ...     |
    | hN     hN−1   ...  hm |   (5.14)
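The finite-time matrix form y = Hu of (5.13)-(5.14) can be sketched directly; the pulse response values and relative degree below are illustrative.

```python
# A sketch of the lower-triangular Toeplitz map y = H u for a sampled LTI
# system with pulse response {h_k} and relative degree m (assumed values).

def build_H(h, m, N):
    """H maps u = [u(0)..u(N-m)] to y = [y(m)..y(N)]; H[i][j] = h_{m+i-j}."""
    n = N - m + 1
    return [[h[m + i - j] if i >= j else 0.0 for j in range(n)]
            for i in range(n)]

def apply_H(H, u):
    return [sum(Hij * uj for Hij, uj in zip(row, u)) for row in H]

h = [0.0, 0.5, 0.3, 0.1, 0.0, 0.0]      # assumed pulse response, m = 1
H = build_H(h, m=1, N=4)                # maps u(0..3) to y(1..4)
y = apply_H(H, [1.0, 0.0, 0.0, 0.0])    # a unit pulse recovers h_1..h_4
print(y)
```

Feeding a unit pulse through H returns the pulse response samples, which confirms the Toeplitz layout of (5.14).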

The inverse system can also be represented in finite time by a matrix relation u = My, where M = H⁻¹, and another convolution:

u(i − m) = Σ_{j=m}^{i} aj y(m + i − j)   (5.15)

The structure of the matrix M is also lower triangular:

M = | am     0      ...  0  |
    | am+1   am     ...  0  |
    | ...           ...     |
    | aN     aN−1   ...  am |   (5.16)

so from any desired finite-time output trajectory sequence y = yd, the input sequence u*(t) can be derived given the feedforward matrix M. The output sequence must satisfy trivial conditions regarding causality (delay). For unstable systems, although the reasoning so far is valid, the H and M matrices can be badly conditioned, so it is recommended to learn the control actions in a reference pre-filtering approach (Figure 5.2(a)), after the feedback loop has stabilised the system, instead of feedforward open-loop injection (Figure 5.2(b)). For example, the condition number of H for a system with impulse response hk = 0.4e^(3.5·(0.2k)) and a task duration of 11 samples is 4.35 × 10³. The problem of iterative learning in reference pre-filtering is that if the feedback controller is changed then the learned actions are no longer valid, so the process must start again, whereas the second approach totally decouples tracking and disturbance rejection, as previously presented.
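The conditioning claim can be checked numerically; the sketch below assumes the impulse response reads hk = 0.4·exp(3.5·(0.2k)), i.e., 0.4·exp(0.7k), an unstable growth.

```python
# A sketch of the conditioning example: H built from an exponentially
# growing pulse response over an 11-sample task (assumed reading).
import numpy as np

N = 11
h = 0.4 * np.exp(3.5 * 0.2 * np.arange(N))
H = np.array([[h[i - j] if i >= j else 0.0 for j in range(N)]
              for i in range(N)])     # lower-triangular Toeplitz, y = H u
cond = np.linalg.cond(H)
# A condition number in the thousands: computing u = H^{-1} y_d amplifies
# errors in y_d or in the model by roughly that factor.
print(f"{cond:.3g}")
```

The computed value lands in the low thousands, consistent with the order of magnitude quoted in the text.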

If a system model (state-space or transfer function) is known, M can be easily obtained. If a model is not available, the feedforward matrix elements ai (i = m, ..., N) can be estimated from any previously recorded experimental input-output pairs (u, y), as can an estimate of the total delay m, but the approach is very sensitive to model uncertainty, noise and disturbances. Also, it does not apply to non-linear systems. The iterative learning approach will compute the control actions directly, without resorting to calculation of M, even for non-linear systems where it cannot be constructed. The delay estimate is still needed, and a wrong one can hinder convergence.


Let us consider uk and yk, the vectors containing the input and output samples in repetition (iteration) k, and ek = yd − yk, the vector of errors. The size of these vectors is N − m + 1.

Let us apply an update of the control action given by:

uk+1 = uk + A ek   (5.17)

where A is a learning matrix of dimensions (N − m + 1) × (N − m + 1). Then,

ek+1 = yd − H uk+1 = ek − H A ek = (I − H A) ek   (5.18)

so A must be designed so that the eigenvalues of (I − HA) have modulus less than 1. As eig(I − HA) = 1 − eig(HA), that condition can be shown to be equivalent to the matrix HA having its eigenvalues in the unit circle centred at +1; so, if oscillatory complex eigenvalues are ruled out, the desired values are real, between 0 and 2. For full oscillation avoidance, values in [0, 1] are advised.

Due to the triangular structure of H, if A is a diagonal matrix with all diagonal elements η such that

0 < η < 2/hm   (5.19)

the stability of the error dynamics along the iteration axis is guaranteed. This applies to the SISO case, and corresponds to the learning rule

uk+1(t) = uk(t) + η ek(t + m)   (5.20)

From this perspective, the "optimal" learning factor would be

η = 1/hm   (5.21)

assigning the eigenvalues of (I − HA) to the origin. Of course, depending on the approximate plant model, a non-diagonal A (meaning some kind of filtering of the error measures) can also yield stable error dynamics with less sensitivity to random disturbances (see later). That is the philosophy behind learning rules such as (5.8) or the "non-causal" filterings previously mentioned.

If an approximate model is known, then the bound on the learning rate (5.19) can be used to design the iterative learning set-up. If the plant is unknown, an estimate of it could be obtained via impulse or step experiments. Note that the learning rate in (5.19) is limited by the maximum possible value of hm when only an uncertain model description is available. Uncertainty in models will be addressed later on.

Oscillations in the final control u* may appear, as the poles of the ideal control action cancel the zeros of the plant, unless those effects are accounted for in generating yd. Usually, as learning control is applied to poorly known plants, it is not possible to know the location of the plant zeros a priori.


MIMO case

In a multiple-input multiple-output (MIMO) case, the input and output vectors at each sampling instant, uk(t) and yk(t), should be concatenated to form bigger vectors uk, ek. In that case, H is block lower triangular, but its determinant and eigenvalues are still given by those of the diagonal blocks. Let us assume that the plant incorporates m actuators and m outputs (square plant), so that H is square and there exists a unique u* achieving yd.

In this case, the impulse response hm is a matrix of dimensions m × m, so the requisite for the learning rate matrix η (of dimensions m × m, η being the building block of a block-diagonal A matrix) would be that the eigenvalues of hm η lie between 0 and 2. That can be achieved with straightforward pole placement techniques. The "optimal" learning rate matrix is η = hm⁻¹, in the sense that the poles are placed at the origin.

Note, however, that the design of learning rates requires at least an approximate knowledge of hm and its directionality effects, which in the MIMO case can be harder to obtain than in a single-variable experiment. If an approximate ĥm is known such that the relative error

Δ = hm ĥm⁻¹ − I   (5.22)

has a known bound on its maximum singular value, σ̄(Δ) < σm, strictly less than 1, then, using a learning rate matrix η = ĥm⁻¹(I − F), the closed-loop dynamics is determined by the eigenvalues of

I − hm η = I − hm ĥm⁻¹(I − F) = I − (I + Δ)(I − F) = F − Δ(I − F)   (5.23)

so the maximum eigenvalue modulus of F − Δ(I − F) must be less than one, F being the target dynamics that would be achieved in the case of exact matching between model and real plant. For the particular case of symmetric F (so its singular values are the absolute values of its eigenvalues λi), it can be shown that robust stability will hold if:

max_i |λi| + σm max_i (1 − λi) < 1

and, after some operations:

σm < (1 − max_i |λi|) / max_i (1 − λi)   (5.24)

For example, to safely use learning rates near 2ĥm⁻¹ (i.e., F = −I) the modelling error must be very small. For positive definite F, the result is σm < 1. As with most singular-value stability results, the uncertainty bound is conservative.


Deterministic perturbations

When subject to deterministic repetitive disturbances, the output of the system is y = Hu + D, where D is a vector of disturbances invariant under iterations. In that case, the error equation still converges to zero, because ek = yd − (H uk + D), so:

ek+1 = yd − H(uk + A ek) − D = ek − H A ek = (I − H A) ek   (5.25)

Note that the error converges to zero irrespective of D being directly measurable or known, as long as it is repetitive.
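This rejection of an unmeasured but repetitive disturbance can be sketched directly; the plant and disturbance values below are illustrative.

```python
# A sketch checking that an iteration-invariant disturbance D is rejected
# even though it is never measured (all values assumed for illustration).
import numpy as np

h = [0.0, 0.4, 0.2, 0.1]             # assumed pulse response, m = 1
m, N = 1, 3
n = N - m + 1
H = np.array([[h[m + i - j] if i >= j else 0.0 for j in range(n)]
              for i in range(n)])
A = (1.0 / h[m]) * np.eye(n)         # diagonal learning matrix

y_d = np.array([1.0, 0.5, 0.0])
D = np.array([0.3, -0.2, 0.1])       # unknown, but repeats every iteration
u = np.zeros(n)
for k in range(20):
    y = H @ u + D                    # plant output includes the disturbance
    u = u + A @ (y_d - y)            # update (5.17) sees only the error

print(np.abs(y_d - (H @ u + D)).max())
```

The learnt action absorbs D implicitly: the update only ever sees the tracking error, and (5.25) shows D cancels out of the iteration dynamics.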

Non-deterministic perturbations

In the presence of non-deterministic perturbations, D is not constant over the iterations, so it cannot be cancelled as in the previous formula. Hence, the dynamics are:

ek+1 = yd − yk+1 = yd − H(uk + A ek) − Dk+1 = (ek + Dk) − Dk+1 − H A ek = (I − H A) ek + (Dk − Dk+1)   (5.26)

If the disturbance Dk is stationary, the ILC law converges to a zero-mean error, as E(Dk) = E(Dk+1), where E(·) denotes mathematical expectation. Thus, the algorithm converges successfully for slowly-varying disturbances (Dk − Dk+1 small), but general stochastic disturbances produce variance effects on ek, in many cases amplifying their effect, with amplification values depending on the learning rate (note that the ILC law feeds back, in the next iteration, the disturbances from the current one, adding in this way a second source of them). That is why iterative learning control is advised in problems where the tracking or deterministic disturbance component is predominant over the stochastic one (although a valid approach is the 2DOF framework, where a feedback regulator diminishes the effect of random disturbances to an acceptable level).
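The variance-amplification effect can be illustrated with a Monte-Carlo sketch on a scalar plant; all values (plant gain, noise level, rates) are assumptions for illustration only.

```python
# A Monte-Carlo sketch: with white (non-repetitive) disturbances D_k, the
# ILC error no longer dies out, and its stationary spread grows with the
# learning rate (scalar case for clarity; all values illustrative).
import numpy as np

rng = np.random.default_rng(0)
h_m, sigma, y_d = 0.5, 0.1, 1.0

def stationary_error_std(eta, iters=20000):
    e_hist, u = [], 0.0
    for k in range(iters):
        d = sigma * rng.normal()     # fresh disturbance each iteration
        e = y_d - (h_m * u + d)
        if k > iters // 2:           # discard the deterministic transient
            e_hist.append(e)
        u = u + eta * e              # scalar version of update (5.17)
    return float(np.std(e_hist))

low = stationary_error_std(0.2 / h_m)    # cautious rate, well inside (5.19)
high = stationary_error_std(1.8 / h_m)   # aggressive rate, still inside (5.19)
print(low, high)
```

Both rates satisfy the stability bound (5.19), yet the aggressive one feeds the previous iteration's noise back harder and settles at a markedly larger error spread, which is the trade-off the text describes.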

To evaluate the effect of the disturbances, in the case that Dk is independent of Dk+1⁴, the stationary error covariance matrix, after convergence to zero mean, would be given by the solution Ξe to the discrete Lyapunov equation:

Ξe = (I − H A) Ξe (I − H A)ᵀ + 2 ΞD   (5.27)

where ΞD is the covariance matrix of Dk. For instance, a white noise disturbance would have a diagonal ΞD.

4 Note that coloured noise on the system output would qualify as such an independent disturbance if the time between repetitions (iterations) is long enough for significant correlations to fade away.


Delay estimation errors

If the delay m is underestimated, then the diagonal of H contains zeros, so the learning eigenvalues are always 1, i.e., the control actions grow forever and the learning behaves in an unstable (ramp-like) way. If it is overestimated, the learning algorithm can converge and, in some cases, even perform better than in the original case, but the calculation of the admissible range of learning rates is not straightforward.

Let us consider, as an example, a system with impulse response given by:

{h_d} = {0, 0.01, 0.06, 0.5, 0.7, 0.8, 0.8, ...}

and a task length of 11 samples. In that case, h_1 and h_2 are almost nil. It appears that, although a pure delay of d = 2 could be considered, measurement noise masks it. Assuming a delay m = 1, the optimal learning rate (5.21) is 100, but then non-deterministic perturbations such as measurement noise are amplified by that gain, disturbing the algorithm in such a way that, even if it converges in average, the performance could be very poor. If the more sensible choice m̂ = 3 is made, the matrix H is no longer triangular, but with the choice η = 1.51 the eigenvalues of I - HΛ have a maximum modulus of 0.66, so convergence is reasonably quick and the amplification of the non-deterministic components is much smaller than in the previous case. Note that in this case, adopting the "optimal" learning rate from (5.21), i.e., η = 1/0.5 = 2, would have led to an unstable iteration scheme.
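These numbers can be probed numerically under the assumption — ours, for illustration — that the lifted matrix is Toeplitz with H[i, j] = h_{m̂+i−j} and Λ = ηI. This reproduces the "optimal" rate 1/h_1 = 100 for m̂ = 1, but the exact eigenvalue figures depend on how the chapter builds H from the delay estimate:

```python
import numpy as np

# Impulse response from the example; the "..." is assumed to settle at 0.8.
h = np.concatenate(([0.0, 0.01, 0.06, 0.5, 0.7], np.full(10, 0.8)))
N = 11                                   # task length

def lifted_H(m_hat):
    """One plausible lifting (an assumption): H[i, j] = h[m_hat + i - j]."""
    H = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            k = m_hat + i - j
            if 0 <= k < len(h):
                H[i, j] = h[k]
    return H

# m_hat = 1: H is lower triangular with diagonal h_1 = 0.01, so eta = 100
# makes all learning eigenvalues zero (deadbeat) in exact arithmetic,
# at the price of a 100-fold amplification of measurement noise.
rho1 = np.abs(np.linalg.eigvals(np.eye(N) - 100.0 * lifted_H(1))).max()

# m_hat = 3, eta = 1.51: H is no longer triangular, so the spectral
# radius of I - eta*H has to be checked numerically.
rho3 = np.abs(np.linalg.eigvals(np.eye(N) - 1.51 * lifted_H(3))).max()
print(f"rho(m_hat=1, eta=100) ~ {rho1:.2f}   rho(m_hat=3, eta=1.51) = {rho3:.2f}")
```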

5.4.2 Non-linear Case

In a non-linear plant, the finite-time response has the expression y = f_H(u), where y and u are the output and input sequences stacked as vectors. The plant being causal, the operator f_H is also triangular.

Convergence in the non-linear case can be proved, for small learning rates, in a local sense, as follows. This kind of iteration is, in a certain sense, a reformulation of gradient methods to solve y_d = f_H(u*) for u*.

If the increments of the control actions between iterations are small, the output approximately verifies:

y_{k+1} ≈ y_k + J(u_k)(u_{k+1} - u_k)

where J is the Jacobian matrix of first derivatives of f_H (local linearised model at u_k):

J_{ij} = ∂(f_H)_i / ∂u_j

The Jacobian also depends on the initial state, but that is assumed fixed. Hence, assuming the non-linear model is smooth enough so that J(u)


is continuous and bounded, then if the learning rule (5.17) is applied with a small ‖Λ‖ (maximum singular value norm), it follows

e_{k+1} ≈ (I - J(u_k)Λ) e_k   (5.28)

hence, for a small η in (5.20), the validity of the linearised model and a condition similar to (5.19) can both be fulfilled, provided the diagonal elements of J do not change sign (monotonic instantaneous gain). Note that in the non-linear case J is lower triangular, but its diagonal elements need not all be identical; at least, they should not change sign at any operating point.

However, J depends on u_k, so in the next iteration the Jacobian will be evaluated at a different point; hence the dynamics in (5.28) are time-varying. Convergence is therefore unclear and, anyway, the idea would only work for infinitesimal learning rates. These expressions, however, provide some hints for more powerful results.

In the following, a proof of global convergence in the non-linear SISO case will be outlined by induction, assuming the learning rate is below a certain bound.

Let us divide the operator H(u) into an off-diagonal part H_o(·) and a diagonal one H_d(·), i.e., each output y_{ki} (reflecting iteration k, time i + m) will be expressed as

y_{ki} = H_{oi}(u_{k1}, ..., u_{k(i-1)}) + H_{di}(u_{ki})

where the second subscript represents actual time in the input and output variables.

We will assume that some knowledge is available on the diagonal, instantaneous-gain-like term H_{di}. In particular, we assume that H_{di} is positive and verifies, for any two feasible inputs (at time i) a and b, with a ≥ b, the Lipschitz condition:

γ_l (a - b) ≤ H_{di}(a) - H_{di}(b) ≤ γ_h (a - b)

where 0 < γ_l ≤ γ_h and γ_h is approximately known. That will be the case, for example, if H_{di} has a derivative with respect to u_{ki} bounded by 0 < γ_l ≤ ∂H_{di}/∂u_{ki} ≤ γ_h, but in fact H_o and H_d need not even be differentiable for the learning algorithm to work.

Let us assume that the learning rate η is less than 1/γ_h, and let us consider each element of e_{k+1}.

Starting with e_{(k+1)1},

e_{(k+1)1} = y_{d1} - H_{d1}(u_{k1} + η e_{k1}) ≤ y_{d1} - H_{d1}(u_{k1}) - γ_l η e_{k1} = (1 - γ_l η) e_{k1}

and

e_{(k+1)1} ≥ y_{d1} - H_{d1}(u_{k1}) - γ_h η e_{k1} = (1 - γ_h η) e_{k1}

hence:

|e_{(k+1)1}| ≤ max(|1 - γ_l η|, |1 - γ_h η|) |e_{k1}|

so, if η < 2/γ_h, e_{k1} converges, as it is bounded from below and above by stable first-order linear dynamics.

By considering e_{(k+1)j}, and assuming all e_{ki} converge to zero for i < j, we have, when enough iterations have elapsed, that, defining

ψ_j = H_{oj}(u_{k1} + η e_{k1}, ..., u_{k(j-1)} + η e_{k(j-1)}) - H_{oj}(u_{k1}, ..., u_{k(j-1)})

it follows that

e_{(k+1)j} = y_{dj} - H_{oj}(u_{k1} + η e_{k1}, ..., u_{k(j-1)} + η e_{k(j-1)}) - H_{dj}(u_{kj} + η e_{kj})

≤ y_{dj} - H_{oj}(u_{k1}, ..., u_{k(j-1)}) - H_{dj}(u_{kj}) - γ_l η e_{kj} - ψ_j = (1 - γ_l η) e_{kj} - ψ_j   (5.29)

and

e_{(k+1)j} = y_{dj} - H_{oj}(u_{k1} + η e_{k1}, ..., u_{k(j-1)} + η e_{k(j-1)}) - H_{dj}(u_{kj} + η e_{kj})

≥ y_{dj} - H_{oj}(u_{k1}, ..., u_{k(j-1)}) - H_{dj}(u_{kj}) - γ_h η e_{kj} - ψ_j = (1 - γ_h η) e_{kj} - ψ_j   (5.30)

where ψ_j tends to zero for large k if H_o is assumed to be continuous. Hence, the error is bounded by two convergent linear dynamics driven by an input approaching zero, and thus e_{kj} also converges to zero.

In that way, by induction, as e_{k1} converges, e_{k2}, e_{k3}, ... also converge in the long run.

A similar proof would show convergence in the non-linear case in the presence of deterministic repetitive perturbations, as in (5.25).
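The induction argument can be checked on a toy triangular system. In the Python sketch below, the plant, the sector bounds γ_l = 0.7 and γ_h = 1.3 and the desired trajectory are all invented for illustration; the learning rate satisfies η < 1/γ_h, as the proof assumes:

```python
import numpy as np

N = 10                               # task length
rng = np.random.default_rng(1)

def H_d(u):
    """Diagonal (instantaneous) term; derivative in [0.7, 1.3],
    i.e. gamma_l = 0.7, gamma_h = 1.3."""
    return u + 0.3 * np.sin(u)

def plant(u):
    """Triangular non-linear map: y_i = H_oi(u_1..u_{i-1}) + H_di(u_i)."""
    y = np.empty(N)
    for i in range(N):
        off = 0.4 * np.tanh(u[:i]).sum()   # off-diagonal (past-input) part
        y[i] = off + H_d(u[i])
    return y

y_d = rng.uniform(-1.0, 1.0, N)      # arbitrary desired trajectory
eta = 0.5                            # eta < 1/gamma_h = 0.77

u = np.zeros(N)
for k in range(300):
    e = y_d - plant(u)
    u = u + eta * e                  # learning law u_{k+1} = u_k + eta*e_k

print("final max error:", np.max(np.abs(y_d - plant(u))))
```

The first error component contracts geometrically, and each later component follows once its predecessors have settled, exactly as in the induction above.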

Non-linear MIMO case

Convergence in the MIMO non-linear case needs additional considerations. Based on the previous considerations, an accurate enough estimate of the Jacobian must be available, in the sense of (5.22). With that estimate, convergence will be achieved by using its inverse.

Let us outline the full generalisation of the approach, defining d(δ, u), with components d_i(δ, u), such that:

H(u + δ) = H(u) + d(δ, u)

so that if u_{k+1} = u_k + δ, then

e_{k+1} = e_k - d(δ, u_k)   (5.31)


If an invertible approximator d̂ of d (linear or non-linear) is known (i.e., m = d̂(n, u) can be solved for n with known m and u, denoting the solution as d̂^{-1}(m, u)) such that the uncertainty of the approximation, defined as:

Δ(e, u) = d(d̂^{-1}(e, u), u) - e   (5.32)

verifies, for any desired output increment e and input u,

‖Δ(e, u)‖ ≤ κ ‖e‖,   κ < 1   (5.33)

In that case, it can be shown that incrementing the control actions by

u_{k+1} = u_k + d̂^{-1}(β e_k, u_k)   (5.34)

will yield, for the first control action

e_{(k+1)1} = e_{k1} - d_1(d̂_1^{-1}(β e_{k1}, u_{k1}), u_{k1})

= e_{k1} - β e_{k1} - Δ(β e_{k1}, u_{k1}) = (1 - β) e_{k1} - Δ(β e_{k1}, u_{k1})   (5.35)

so

|e_{(k+1)1}| ≤ (|1 - β| + κ β) |e_{k1}|   (5.36)

hence, for 0 < β < 1, e_k converges. For reduced uncertainty levels, the convergence bound for β approaches 2 and, letting β = (I - F), an expression similar to (5.24) can be obtained. An induction argument similar to the SISO case can be put forward for the convergence of the full scheme.

An alternative approach to convergence proofs for iterative laws based on non-linear state-space models can be found in [37].
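A minimal sketch of the inverse-approximator update, taking d̂(δ, u) = Ĵδ for a constant Jacobian estimate Ĵ (so that d̂^{-1}(e, u) = Ĵ^{-1}e); the 2×2 non-linear map, Ĵ and β below are illustrative assumptions:

```python
import numpy as np

def F(u):
    """Hypothetical 2-input, 2-output non-linear lifted map."""
    return np.array([1.0 * u[0] + 0.3 * np.sin(u[1]),
                     0.4 * u[0] + 1.2 * u[1] + 0.1 * u[1] ** 3])

# Rough Jacobian estimate (assumed close enough to the true Jacobian,
# in the sense of the uncertainty bound (5.22)/(5.33)).
J_hat = np.array([[1.0, 0.3],
                  [0.4, 1.2]])

y_d = np.array([0.5, -0.7])          # desired output
beta = 0.8                           # 0 < beta < 1
u = np.zeros(2)
for k in range(100):
    e = y_d - F(u)
    u = u + np.linalg.solve(J_hat, beta * e)   # u_{k+1} = u_k + J_hat^{-1} beta e_k

res = np.max(np.abs(y_d - F(u)))
print("residual:", res)
```

Because Ĵ is only approximate, each iteration removes roughly a fraction β of the error rather than all of it; the residual mismatch plays the role of Δ in (5.35).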

5.4.3 Reparameterisation

The previous approach directly learns the control actions that are applied at each stage of the repetitive task.

There is the option of generating the control actions with a parameterised function approximator. Of course, if the number of parameters is small, only an approximation to the control time history can be achieved but, in a 2DOF structure, the feedback part would handle those approximation errors. Learning rates may need to be further reduced, especially if the proposed approximator is non-linear in the parameters.

These parameterisations have two possible advantages:

• for some references (such as steps), the complexity of the control actions decreases with time (near the settling time they are all equal to the same constant);


• the task to be carried out may need to be executed for different values of a certain measured variable (a mass, a reference scaling, etc.). In that case, including that variable among the approximator inputs could lead to useful interpolation for non-linear systems, in a sense similar to gain scheduling of the feedforward blocks, learning them from experience.

For linear plants, with the model reference control structure in mind, the parameterised function approximator could be implemented, in some cases, as a filter. The desired output is defined as the signal generated by a model under some excitation, and the sequence of control actions is generated as the output of a filter with the same excitation. In this setting, adaptation to changes in the magnitude of the desired output is naturally achieved.

The heuristic procedure will be:

• define the desired output y_d(t) as the signal generated by the reference model G_d(s) under the input u;

• consider a suitable filter G(s, a), where a is a vector of parameters to be learned. This filter should contain, at least, the poles of G_d(s). Let us call u_a(t) the filter output for the input u;

• apply a learning algorithm to the parameters a, taking as error the difference between y_d(t) and the output of the plant excited by u_a(t);

• if the error does not converge, try a new filter G(s, a) with a different structure or set of parameters.

Of course, this heuristic approach does not have a convergence proof but, in some cases, it can lead to good results, as illustrated later in some examples.

The main advantages are the reduced number of parameters, the easy implementation and the implicit generalisation to different reference magnitudes. Although some generalisation over the excitation input u can be expected, the results are only valid for the kind of inputs used in training.

5.5 Examples

Example 5.1. In this example, the set of control actions to control a linear plant will be iteratively learnt. The plant is:

G(s) = 1.8 / ((s + 1)(s + 2)(s + 9.5))

discretised with a 1.5 s sampling time. The instantaneous gain h_1 of the discretised model is 0.0535, so the "minimum-time" learning rate is η = 1/0.0535 = 18.7. A feedback PD regulator is in charge of trying to achieve the reference behaviour while the iterative approach for the repetitive task has not yet learned the feedforward actions.
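The figures h_1 = 0.0535 and η = 18.7 can be reproduced with SciPy (a Python sketch of the discretisation step; the example itself uses MATLAB):

```python
import numpy as np
from scipy import signal

num = [1.8]
den = np.poly([-1.0, -2.0, -9.5])        # (s+1)(s+2)(s+9.5)
T = 1.5

numd, dend, _ = signal.cont2discrete((num, den), T, method='zoh')
_, yout = signal.dimpulse((np.ravel(numd), np.ravel(dend), T), n=12)
h = np.squeeze(yout[0])                  # discrete impulse response h_0, h_1, ...

h1 = h[1]                                # instantaneous gain of the ZOH model
eta = 1.0 / h1                           # "minimum-time" learning rate
print(f"h1 = {h1:.4f}, eta = {eta:.1f}")
```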


The closed-loop equations in Figure 5.2(b) are:

y = G/(1 + GK) u_ff + GK/(1 + GK) r + 1/(1 + GK) v   (5.37)

u = 1/(1 + GK) u_ff + K/(1 + GK) (r - v)   (5.38)

u_fb = -GK/(1 + GK) u_ff + K/(1 + GK) (r - v)   (5.39)

The learning rate used in the simulation was 9.2, half of the previously calculated optimal one. Note that, in practice, that optimal gain may be a very rough estimate, so it is better to stay on the safe side, especially when significant non-linearities could be present. The MATLAB® code that carries out the simulation is:

ng=1.8; dg=poly([-1 -2 -9.5]); T=1.5; qe=9.2;
[ngz,dgz]=c2dm(ng,dg,T,'zoh');
nkz=[9.5500 -3.5500]; dkz=[1.0000 0.0000];
nsamples=60; range=1:nsamples; uff=0*range';
r=sin(0.16*(range'-1)*T)+square(0.2*(range'-1)*T); r(1)=0;
v=0.4*sin(0.96*(range'-1)*T);            %deterministic perturbation
%simulate loop:
[cng,cdg]=feedback(ngz,dgz,nkz,dkz,-1);  %g/(1+gk)
[t1,t2]=series(nkz,dkz,ngz,dgz);         %gk
[cngk,cdgk]=cloop(t1,t2,-1);             %gk/(1+gk)
[cnk,cdk]=feedback(nkz,dkz,ngz,dgz,-1);  %k/(1+gk)
[cn1,cd1]=feedback(1,1,t1,t2,-1);        %1/(1+gk)
for iters=1:20
  %calculate output:
  [yy1,x]=dlsim(cng,cdg,uff);
  [yy2,x]=dlsim(cngk,cdgk,r);
  [yy3,x]=dlsim(cn1,cd1,v);
  y=yy1+yy2+yy3;
  %calculate control action (feedback):
  [yy1,x]=dlsim(cngk,cdgk,uff);
  [yy2,x]=dlsim(cnk,cdk,r-v);
  ufb=-yy1+yy2;
  e=r-y;
  %prepare next iteration:
  uff=uff+qe*[e(2:nsamples); 0];
end

and the results are plotted in Figure 5.4. When the trials end, all the reference tracking and disturbance cancellation are handled by the feedforward control action, and the feedback controller receives a zero input. Of course, it would still be acting in the case of random disturbances. In that case, a reduction of the learning rate would also be advised, to avoid excessive variation of u_ff between trials due to those random process disturbances.

Example 5.2. A second example, on the same system, will approximate the desired control action to track a desired trajectory given by r(t) = 1 - e^{-0.3t}.

The approximator will use a control action given by:

u_a(t) = θ_1 e^{-0.3t} + θ_2 + θ_3 t e^{-0.6t}


Fig. 5.4. Example results: (a) outputs; (b) final control actions u_ff and u_fb (u_fb ≈ 0)

everything being discretised at T = 1.5 s. The adjustment calculates the parameter increments from the control increments via least-squares fitting. The MATLAB® code is the same as in the previous example, except for changes such as:

time=(range'-1)*T; r=1-exp(-0.3*time);
v=0*time;                                %deterministic perturbation
psi2=exp(-.3*time); psi3=ones(nsamples,1);
psi4=time.*exp(-.6*time);
psi=[psi2 psi3 psi4]; theta=zeros(3,1);

in the initialisation code, and

uffdes=uff+qe*[e(2:nsamples); 0];
theta=inv(psi'*psi)*psi'*uffdes;
uff=psi*theta;


in the iteration loop. Figure 5.5 shows the output over several iterations. The final quadratic error is 0.004. If the t e^{-0.6t} regressor were not used, the error would be 0.02. Note that very acceptable tracking is achieved with just two or three parameters. If other deterministic disturbances were present, additional regressors should be included, acting at the time points where those disturbances take place.


Fig. 5.5. Reparameterisation of open-loop actions: Plant output
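The least-squares fitting step can be sketched in Python. The "desired" feedforward sequence below is fabricated to lie exactly in the span of the three regressors (in the example it would come from the learning update), so the fit recovers the coefficients exactly:

```python
import numpy as np

T, nsamples = 1.5, 60
t = np.arange(nsamples) * T

# Regressors matching the reference-model dynamics, as in the example.
psi = np.column_stack([np.exp(-0.3 * t),
                       np.ones(nsamples),
                       t * np.exp(-0.6 * t)])

# Hypothetical "desired" feedforward sequence to be compressed
# (2, 0.5 and 1.5 are arbitrary illustrative coefficients).
uff_des = 2.0 * np.exp(-0.3 * t) + 0.5 + 1.5 * t * np.exp(-0.6 * t)

theta, *_ = np.linalg.lstsq(psi, uff_des, rcond=None)   # LS fit
uff = psi @ theta                                       # reparameterised action

print("theta =", theta)
print("max fit error:", np.max(np.abs(uff - uff_des)))
```

In the real iteration, uff_des contains components outside the regressor span, and the projection error is exactly what the feedback part of the 2DOF structure has to absorb.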

5.6 Conclusions

In this chapter, the possibilities of using two-degrees-of-freedom structures in the context of iterative performance improvement have been analysed, with particular attention to iterative learning for repetitive tasks. Iterative learning control is able to achieve good tracking performance even for some non-linear processes with non-smooth non-linearities, so that, at least in the repetitive case, there is no need to increase feedback gains (thus reducing robustness) to compensate for non-linearities.

Of course, apart from the approach presented above, direct use of the most up-to-date system model (from standard open- or closed-loop estimation techniques) can also be considered for the feedforward control blocks.

Acknowledgements

This work benefited from discussions with M. Olivares from Universidad Santa María, Valparaíso, Chile. The authors thank the Generalitat Valenciana and UPV for their support under grant GV-00-100-14.