
Contents lists available at ScienceDirect

Signal Processing

Signal Processing 105 (2014) 137–147

http://dx.doi.org/10.1016/j.sigpro.2014.05.030
0165-1684/© 2014 Elsevier B.V. All rights reserved.

* Corresponding author.
E-mail address: [email protected] (Z. Mao).

journal homepage: www.elsevier.com/locate/sigpro

Recursive parameter identification of Hammerstein–Wiener systems with measurement noise

Feng Yu, Zhizhong Mao *, Mingxing Jia, Ping Yuan
College of Information Science and Engineering, Northeastern University, Shenyang, China

Article info

Article history:
Received 25 October 2013
Received in revised form 8 May 2014
Accepted 16 May 2014
Available online 2 June 2014

Keywords: Hammerstein–Wiener system; Recursive identification; Convergence; Heteroscedastic measurement noise



Abstract

A recursive algorithm is proposed in this paper to identify Hammerstein–Wiener systems with heteroscedastic measurement noise. Based on the parameterization model of Hammerstein–Wiener systems, the algorithm is derived by minimizing the expectation of the sum of squared parameter estimation errors. By replacing the immeasurable internal variables with their estimates, the need for the commonly used invertibility assumption on the output block can be eliminated. The convergence of the proposed algorithm is also studied and conditions for achieving the uniform convergence of the parameter estimation are determined. The validity of this algorithm is demonstrated with three simulation examples, including a practical electric arc furnace system case.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Many nonlinear systems can be classified as nonlinear block-oriented systems, which consist of combinations of linear dynamic blocks and nonlinear static blocks. Two typical nonlinear block-oriented systems are Hammerstein (H) systems, which are composed of a static nonlinear block followed by a linear dynamic one, and Wiener (W) systems, which have the inverse configuration of H systems. Many efforts have been devoted to identifying these two types of nonlinear block-oriented systems [1–8]. In general, these algorithms can be divided into two categories: iterative methods and recursive methods [9]. Usually, iterative methods are used for off-line algorithms and recursive methods for on-line ones [10]. The recursive identification method is important not only because it can be computed in real time, e.g. in the applications of evolving fuzzy system identification, neural network training and machine learning [11–13], but also


because it can be combined with on-line control strategies to achieve adaptive control algorithms [14,15].

Hammerstein–Wiener (H–W) systems are more complex and are composed of a linear dynamic block between two static nonlinear blocks. As the identification algorithms for H and W systems have approached maturity, the identification of this type of nonlinear system has attracted increasing attention, and some iterative methods have been published in previous studies [16–20]. The first study focused on the identification of H–W systems may be [16], in which a two-stage algorithm is proposed to identify an approximate model of H–W systems. Based on this method, many other algorithms have been proposed to identify H–W systems with measurement noise recursively [21–25], such as the blind approach [21], the two-stage recursive identification algorithm [22] and the hierarchical least squares estimation algorithm [24]. However, the application of these algorithms is usually limited by a restrictive condition, namely that the output nonlinearity is invertible; the variance of the noise is usually assumed to be constant; furthermore, some methods lack a strict convergence analysis for the parameter estimation, which, as we know, is an important


part of an identification algorithm. Therefore, much work remains to be done on the identification of H–W systems.

The objective of this paper is to propose a recursive identification algorithm to identify the parameters in the parameterized H–W system model by using available input and output data, and to study the properties of the algorithm involved. With the parameterization model of H–W systems, the proposed algorithm is derived by minimizing the expectation of the sum of squared parameter estimation errors. This idea derives from the extended Kalman filter (EKF) algorithm [26]. In this study, unlike previous contributions, the noise is no longer assumed to be homoscedastic, but a generalized autoregressive conditional heteroscedasticity (GARCH) process; and the requirement for the commonly used invertibility assumption on the output nonlinearity can be eliminated by replacing immeasurable internal variables with their estimates. In addition, we study the convergence properties of the proposed algorithm intensively and obtain the conditions for the uniform convergence of the parameter estimation, which back up the validity of the proposed algorithm theoretically.

This paper is organized as follows: Section 2 describes the parameterization model of H–W systems; Section 3 presents the derivation of the proposed recursive algorithm; Section 4 gives an analysis of the convergence property of the proposed algorithm; Section 5 discusses several problems related to the algorithm; Section 6 shows three simulation examples. Finally, we conclude the paper in Section 7.

2. Model parameterization

Fig. 1 shows the general scheme of H–W systems with measurement noise, where $u_t$ is the system input; $x_t$ is the true system output (the noise-free system output); $y_t$ is the measurement of $x_t$; $\varepsilon_t$ is the measurement noise; $v_t$ and $w_t$ are the internal variables; $v_t$, $w_t$ and $x_t$ are immeasurable; $G(z^{-1})$ is the linear block transfer function; $F_I(u_t)$ and $F_O(w_t)$ are the functions of the input and output nonlinear blocks, respectively.

Without loss of generality, the input and output nonlinear blocks are approximated by linear combinations of known basis functions as follows:

$$v_t = F_I(u_t) = \sum_{i=1}^{p} c_i f_{I,i}(u_t) = C^T f_I(u_t) \quad (1)$$

$$x_t = F_O(w_t) = \sum_{i=1}^{q} d_i f_{O,i}(w_t) = D^T f_O(w_t) \quad (2)$$

where $p$ and $q$ are the numbers of basis functions; $f_{I,i}$ and $f_{O,i}$ are the basis functions of the input and output nonlinear blocks, respectively; $C = [c_1, c_2, \ldots, c_p]^T$ and $D = [d_1, d_2, \ldots, d_q]^T$ are the coefficient vectors of the basis functions; $f_I(\cdot) = [f_{I,1}(\cdot), f_{I,2}(\cdot), \ldots, f_{I,p}(\cdot)]^T$; $f_O(\cdot) = [f_{O,1}(\cdot), f_{O,2}(\cdot), \ldots, f_{O,q}(\cdot)]^T$.

Fig. 1. H–W systems with measurement noise.

The linear block is described by a discrete transfer function:

$$G(z^{-1}) = \frac{b_0 + b_1 z^{-1} + \cdots + b_m z^{-m}}{1 + a_1 z^{-1} + \cdots + a_r z^{-r}} \quad (3)$$

where $A = [1, a_1, \ldots, a_r]^T$ and $B = [b_0, b_1, \ldots, b_m]^T$ are the coefficient vectors of the linear block; $z^{-1}$ is the delay operator; $r$ and $m$ are the orders of the discrete transfer function.

We adopt the following assumptions for the H–W systems.

Assumption 1. The structure of the parameterization model of H–W systems is known. That is, the orders of the linear transfer function $r$ and $m$, the basis functions $f_{I,i}$ and $f_{O,i}$, and their numbers $p$ and $q$ are known.

Assumption 2. The chosen set of basis functions for the output nonlinearity is twice differentiable with respect to the parameters to be estimated.

Assumption 3. The noise term considered here is modeled as a GARCH process with the following properties:
$$E[\varepsilon_t \mid \mathcal{F}_{t-1}] = 0, \quad a.s. \quad (4)$$

and

$$E[\varepsilon_t^2 \mid \mathcal{F}_{t-1}] = \delta_t(\lambda) \le \delta < \infty, \quad a.s. \quad (5)$$

where $\{\mathcal{F}_{t-1}\}$ is the $\sigma$-field sequence generated by the measurable $\{y_{t-1}\}$; $\delta_t$ is the variance of the noise, with upper bound $\delta$; $\delta_t$ is a function of the coefficient vector $\lambda$, as below [27]:

$$\delta_t(\lambda) = \lambda_0 + \sum_{i=1}^{n_1} \lambda_i \varepsilon_{t-i}^2 + \sum_{j=n_1+1}^{n_1+n_2} \lambda_j \delta_{t-(j-n_1)} \quad (6)$$
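As an illustration, the variance recursion (6) can be sketched in a few lines. The helper below is our own (not from the paper); the coefficients used in the driver loop are those of the noise model in Example 1 later in the paper.

```python
import numpy as np

def garch_variance(eps_hist, delta_hist, lam0, lam_eps, lam_delta):
    """One step of the GARCH variance recursion (6):
    delta_t = lam0 + sum_i lam_eps[i] * eps_{t-i}^2
                   + sum_j lam_delta[j] * delta_{t-j},
    with eps_hist[0] = eps_{t-1} and delta_hist[0] = delta_{t-1}."""
    delta_t = lam0
    for i, li in enumerate(lam_eps):
        delta_t += li * eps_hist[i] ** 2
    for j, lj in enumerate(lam_delta):
        delta_t += lj * delta_hist[j]
    return delta_t

# Noise model of Example 1:
# delta_t = 1e-5 + 0.25 eps_{t-1}^2 + 0.2 eps_{t-2}^2 + 0.2 delta_{t-1} + 0.15 delta_{t-2}
rng = np.random.default_rng(0)
eps, delta = [0.0, 0.0], [1e-5, 1e-5]
for _ in range(1000):
    d = garch_variance(eps, delta, 1e-5, [0.25, 0.2], [0.2, 0.15])
    e = rng.normal(0.0, np.sqrt(d))          # eps_t | F_{t-1} has variance delta_t
    eps, delta = [e, eps[0]], [d, delta[0]]
```

Since the coefficients sum to 0.8 < 1, the recursion stays bounded, matching the stability remark on the upper bound $\delta$ below.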

Remark. Assumption 1 implies that the system structure does not need to be estimated. Otherwise, the identification becomes a problem of structural estimation [21]. Assumption 2 is made for the derivation of the algorithm. It is worth noting that the commonly used basis functions, such as polynomial functions, cubic spline functions, etc., all satisfy this assumption. Furthermore, only the differentiability of the output block basis functions is required. Thus, the input block can be represented by a linear combination of non-differentiable basis functions; that is, the proposed algorithm can be used to identify H–W systems whose input block contains non-differentiable nonlinearities, such as dead-zone, preload and piecewise-linear nonlinearities. Assumption 3 means {$\varepsilon_t$} is stochastic noise with zero mean and a bounded time-varying variance [28], which is modeled as a GARCH process. This condition is much looser than the commonly used one in which the noise is assumed to be homoscedastic [10,29]. The upper bound on the variance means the GARCH process is stable [27].

Based on (1)–(3), the parameterized model of H–W systems can be represented as

$$y_t = x_t + \varepsilon_t = h(U_t, W_{t-1}, \theta) + \varepsilon_t \quad (7)$$

where $U_t = [u_t, u_{t-1}, \ldots, u_{t-m}]^T$; $W_{t-1} = [w_{t-1}, w_{t-2}, \ldots, w_{t-r}]^T$; $\theta = [A^T, B^T, C^T, D^T]^T$.


Eq. (3) implies the following formula:

$$w_t = B^T V_t - A^T W_{t-1} \quad (8)$$

where $A = [a_1, a_2, \ldots, a_r]^T$; $V_t = [v_t, v_{t-1}, \ldots, v_{t-m}]^T$.

Note that $w_t$ is computed by (8) recursively and depends only on the input signal $V_t$ and the initial values of $w_t$. Therefore, $W_{t-1}$ can be omitted from (7) to yield the following expression:

$$y_t = x_t + \varepsilon_t = h(U_t, \theta) + \varepsilon_t \quad (9)$$

Therefore, the proposed recursive identification algorithm is to identify the parameter vector $\theta$ in the parameterized H–W system model (9) by using the available input and output data $U_t$ and $y_t$.
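To make the parameterization concrete, the sketch below simulates the noise-free output $x_t$ of model (9) under the assumption of polynomial basis functions ($f_{I,i}(u) = u^i$, $f_{O,i}(w) = w^i$, as in Example 1). The function name and the zero initial conditions are our own illustrative choices, not the authors' code.

```python
import numpy as np

def hw_output(u_seq, A, B, C, D):
    """Noise-free output x_t of the parameterized H-W model, eqs. (1), (2), (8).
    A = [a_1, ..., a_r] and B = [b_0, ..., b_m] parameterize the linear block (3);
    C, D are polynomial-basis coefficients. Zero initial conditions are used."""
    r, m = len(A), len(B) - 1
    v = [sum(c * u ** (i + 1) for i, c in enumerate(C)) for u in u_seq]   # (1)
    w, x = [], []
    for t in range(len(u_seq)):
        Vt = [v[t - k] if t - k >= 0 else 0.0 for k in range(m + 1)]
        Wt = [w[t - k] if t - k >= 0 else 0.0 for k in range(1, r + 1)]
        wt = float(np.dot(B, Vt) - np.dot(A, Wt))                         # (8)
        w.append(wt)
        x.append(sum(d * wt ** (i + 1) for i, d in enumerate(D)))         # (2)
    return np.array(x)
```

With the coefficients of Example 1 (A = [-0.8, 0.5, -0.3], B = [0.25, 0.17, 0.1], C = [1, 0.5, -0.7], D = [1, -0.5, 0.2]) and $y_t = x_t + \varepsilon_t$, this produces the kind of data the identification algorithm consumes.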

3. Parameter estimation

In this section, the recursive algorithm for identification of the H–W system with measurement noise will be derived.

First, the problem of parameter estimation uniqueness will be discussed. Specifically, the vectors B, C and D are not unique, because the intermediate signals $v_t$ and $w_t$ are immeasurable, so that arbitrary gains can be distributed among the linear block and the nonlinear blocks. To obtain a unique parameterization model, two of the three blocks need to be normalized. A simple but effective solution to this problem is to set $c_1$ and $d_1$ to the constant value 1 and exclude them from parameter updating [21,24]. Therefore, the parameter vectors C and D can be rewritten as follows:

$$C = [1, c_2, \ldots, c_p]^T = [1, \bar{C}^T]^T$$

and

$$D = [1, d_2, \ldots, d_q]^T = [1, \bar{D}^T]^T$$

Obviously, only the updating of the parameter vector $\theta = [A^T, B^T, \bar{C}^T, \bar{D}^T]^T$ is required.

The parameter estimation errors are defined as below

$$\tilde{\theta}_t = \hat{\theta}_t - \theta \quad (10)$$

where $\hat{\theta}_t$ is an estimate of $\theta$.

Inspired by the EKF algorithm, minimizing the expectation of the sum of squared parameter estimation errors serves as the criterion function, defined as
$$V_t = \operatorname{tr}(E[\tilde{\theta}_t \tilde{\theta}_t^T \mid \mathcal{F}_{t-1}]) \quad (11)$$

The parameter estimation vector is updated according to
$$\hat{\theta}_t = \hat{\theta}_{t-1} + K_t e_t \quad (12)$$
where $e_t$ is the model estimation error and $K_t$ is the update gain.

Based on (10) and (12), the parameter estimation error vector can be rewritten as
$$\tilde{\theta}_t = \hat{\theta}_t - \theta = \tilde{\theta}_{t-1} + K_t e_t \quad (13)$$

The analytical expression of $e_t$ is required to obtain the analytical expression of $\tilde{\theta}_t$. We define $\tilde{x}_t$ as
$$\tilde{x}_t = \hat{x}_t - x_t = h(U_t, \hat{W}_{t-1}, \hat{\theta}_{t-1}) - h(U_t, \theta) \quad (14)$$
where $\hat{x}_t$ is the estimate of $x_t$ and $\hat{\theta}_t$ is the estimate of $\theta$.

Notice that the internal variable vector $W_{t-1}$ is immeasurable. We use the obtained parameter estimates and the input signals to estimate this internal variable. Therefore, $\hat{w}_t$ is recursively computed by the following formula:

$$\hat{w}_t = \hat{B}_t^T \hat{V}_t - \hat{A}_t^T \hat{W}_{t-1} \quad (15)$$

with

$$\hat{v}_t = \hat{C}^T f_I(u_t) \quad (16)$$

Therefore, (14) can be rewritten as
$$\tilde{x}_t = h(U_t, \hat{\theta}_{t-1}) - h(U_t, \theta) \quad (17)$$
Based on (17), we can obtain the approximate form of $\tilde{x}_t$ by taking the first-order Taylor expansion with respect to $\theta$ around $[U_t, \hat{\theta}_{t-1}]$:
$$\tilde{x}_t = h(U_t, \hat{\theta}_{t-1}) - h[U_t, (\hat{\theta}_{t-1} - \tilde{\theta}_{t-1})] = H_{t-1}^T \tilde{\theta}_{t-1} + O_{t-1} = \bar{H}_{t-1}^T \tilde{\bar{\theta}}_{t-1} + O_{t-1} \quad (18)$$
where $H_{t-1}^T = \partial h(U_t, \hat{\theta}_{t-1}) / \partial \hat{\theta}_{t-1}$ and $O_{t-1}$ is the Lagrange remainder term.

As the first elements of $\tilde{A}$, $\tilde{C}$ and $\tilde{D}$ are constant and equal to zero, we can obtain the vector $\bar{H}_{t-1}$ by deleting from $H_{t-1}$ the elements in the same positions as the zero constant elements of $\tilde{\theta}_{t-1}$. Then we have $H_{t-1}^T \tilde{\theta}_{t-1} = \bar{H}_{t-1}^T \tilde{\bar{\theta}}_{t-1}$; for brevity, $H_{t-1}$ and $\tilde{\theta}_{t-1}$ denote these reduced vectors hereafter.

Therefore, neglecting the Lagrange remainder term $O_{t-1}$, we have
$$h(U_t, \theta) \approx h(U_t, \hat{\theta}_{t-1}) - H_{t-1}^T \tilde{\theta}_{t-1} \quad (19)$$
Thus, $\tilde{x}_t$ is approximated from (17) and (19) as
$$\tilde{x}_t \approx H_{t-1}^T \tilde{\theta}_{t-1} \quad (20)$$
Define the model estimation error $e_t$ as
$$e_t = \hat{x}_t - y_t = \hat{x}_t - x_t - \varepsilon_t = \tilde{x}_t - \varepsilon_t \quad (21)$$
According to (20), (21) is approximated as
$$e_t \approx H_{t-1}^T \tilde{\theta}_{t-1} - \varepsilon_t \quad (22)$$
and the parameter estimation error (13) can be represented by
$$\tilde{\theta}_t = \tilde{\theta}_{t-1} + K_t (H_{t-1}^T \tilde{\theta}_{t-1} - \varepsilon_t) \quad (23)$$
The expectation of the covariance matrix of the parameter estimation errors is required to obtain the optimal gain that minimizes (11). From (23), we get the following recursive formula:

$$P_t = P_{t-1} + K_t \beta_t^T + \beta_t K_t^T + K_t \alpha_t K_t^T \quad (24)$$

where $P_t = E[\tilde{\theta}_t \tilde{\theta}_t^T \mid \mathcal{F}_{t-1}]$. The scalar $\alpha_t$ and the vector $\beta_t$ are defined as
$$\alpha_t = H_{t-1}^T P_{t-1} H_{t-1} + R^2 \quad (25)$$
$$\beta_t = P_{t-1} H_{t-1} \quad (26)$$


where the positive number $R^2$ is defined as the estimate of $\delta_t$.

Notice that $\alpha_t > 0$ if $P_{t-1}$, $H_{t-1}$ and $R^2$ are nonzero, and $V_t$ in (11) is a real number. According to (24), the first derivative of $V_t$ with respect to $K_t$ is
$$\frac{\partial V_t}{\partial K_t} = 2\beta_t + 2\alpha_t K_t \quad (27)$$
Furthermore, the second derivative $2\alpha_t I$ is positive definite because $\alpha_t > 0$, where $I$ denotes an identity matrix. Thus, setting (27) equal to zero, the optimal gain that minimizes (11) is obtained as
$$K_t = -\alpha_t^{-1} \beta_t \quad (28)$$

Therefore, the update of $P_t$ can be rewritten as
$$P_t = P_{t-1} + K_t \beta_t^T \quad (29)$$

Remark 3.1. The optimality mentioned above holds provided that the parameter estimates are sufficiently close to the true values. This is because the derivation of (19), (20) and (22) is based on the assumption that the contribution of the Lagrange remainder $O_{t-1}$ to the true model estimation error $\tilde{x}_t$ can be neglected. This assumption is reasonable only if $\tilde{\theta}_t$ is sufficiently small; otherwise, formula (24) does not give an approximate expression of $P_t$, and $K_t$ is suboptimal.

Remark 3.2. The commonly used assumption of invertibility of the output nonlinearity is no longer necessary because the estimates of the internal variable $w_t$ are calculated by (15) recursively; otherwise, the inverse function of the output nonlinearity would be required to estimate the internal variable $w_t$.

The recursive steps of the proposed algorithm aresummarized as follows:

$$K_t = -P_{t-1} H_{t-1} (H_{t-1}^T P_{t-1} H_{t-1} + R^2)^{-1} \quad (30)$$

$$\hat{\theta}_t = \hat{\theta}_{t-1} + K_t e_t \quad (31)$$

$$P_t = P_{t-1} - (H_{t-1}^T P_{t-1} H_{t-1} + R^2)^{-1} P_{t-1} H_{t-1} H_{t-1}^T P_{t-1} \quad (32)$$
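The recursion (30)–(32) can be sketched in a few lines. This is a minimal illustration of our own, not the authors' code; computing the gradient vector $H_{t-1}$ and the internal-variable estimates (15)–(16) is assumed to be done elsewhere.

```python
import numpy as np

def rhw_step(theta, P, H, e, R2):
    """One recursion of the proposed algorithm, eqs. (30)-(32).
    theta : current parameter estimate (1-D array)
    P     : covariance matrix P_{t-1}
    H     : gradient vector H_{t-1} (1-D array)
    e     : model estimation error e_t = xhat_t - y_t
    R2    : estimate of the noise variance (positive scalar)"""
    alpha = float(H @ P @ H) + R2            # scalar alpha_t, eq. (25)
    K = -(P @ H) / alpha                     # gain K_t, eq. (30)
    theta = theta + K * e                    # parameter update, eq. (31)
    P = P - np.outer(P @ H, H @ P) / alpha   # covariance update, eq. (32)
    return theta, P, K

# initialization as suggested in Remark 4.1 below: P_0 = l1 * I with a large l1
n = 3
theta, P = np.zeros(n), 1e8 * np.eye(n)
```

In a full run, $R^2$ could follow the schedule of (63) below, $R^2 = l_2 H_{t-1}^T P_{t-1} H_{t-1} + l_3$.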

Remark 3.3. From (30)–(32), the main difference between the EKF algorithm and the proposed algorithm is the recursive step of $P_t$. The proposed algorithm looks more like the recursive least squares (RLS) method in form. However, the definitions of the symbols, especially the calculation of the vector $H_t$, are quite different. These differences make the proposed algorithm more suitable for identifying time-invariant H–W systems.

4. Convergence analysis

In this section, we study the convergence properties of the proposed algorithm and obtain the uniform convergence condition for the parameter estimation.

Assumption 4. There exists a positive number $\eta$ such that
$$\lim_{t \to \infty} \frac{1}{t} \sum_{i=0}^{t-1} H_i H_i^T = \eta I \quad (33)$$

Lemma. When applying algorithm (30)–(32) to system (9) under Assumptions 1–4, the following inequality holds:
$$\sum_{t=1}^{\infty} \frac{R^{-4} \delta_t H_{t-1}^T P_t H_{t-1}}{|P_t^{-1}|^{1/2}} < \infty \quad (34)$$

Proof. Based on (32), we have
$$P_t^{-1} = P_{t-1}^{-1} + R^{-2} H_{t-1} H_{t-1}^T \quad (35)$$
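As a quick numerical sanity check (our own, not part of the paper), the information-form update (35) is the matrix-inversion-lemma counterpart of the covariance update (32), and the two agree:

```python
import numpy as np

rng = np.random.default_rng(1)
P_prev = 2.0 * np.eye(4)   # an arbitrary P_{t-1}
H = rng.normal(size=4)     # an arbitrary gradient vector H_{t-1}
R2 = 0.5

# covariance form, eq. (32)
alpha = float(H @ P_prev @ H) + R2
P_cov = P_prev - np.outer(P_prev @ H, H @ P_prev) / alpha

# information form, eq. (35): P_t^{-1} = P_{t-1}^{-1} + R^{-2} H_{t-1} H_{t-1}^T
P_inf = np.linalg.inv(np.linalg.inv(P_prev) + np.outer(H, H) / R2)

assert np.allclose(P_cov, P_inf)
```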

Therefore, the following formulas are obtained by considering Assumption 4:
$$\lim_{t \to \infty} P_t^{-1} = P_0^{-1} + R^{-2} \lim_{t \to \infty} \sum_{i=1}^{t} H_i H_i^T = P_0^{-1} + \lim_{t \to \infty} R^{-2} t \eta I = \infty \quad (36)$$
and
$$|P_{t-1}^{-1}| = |P_t^{-1}| (1 - R^{-2} H_{t-1}^T P_t H_{t-1}) \quad (37)$$

Then we obtain
$$R^{-2} H_{t-1}^T P_t H_{t-1} = \frac{|P_t^{-1}| - |P_{t-1}^{-1}|}{|P_t^{-1}|} \quad (38)$$

Therefore,
$$\sum_{t=1}^{\infty} \frac{R^{-4} \delta_t H_{t-1}^T P_t H_{t-1}}{|P_t^{-1}|^{1/2}} = \sum_{t=1}^{\infty} \frac{R^{-2} \delta_t (|P_t^{-1}| - |P_{t-1}^{-1}|)}{|P_t^{-1}|^{3/2}} \le R^{-2} \delta \sum_{t=1}^{\infty} \left(2 |P_{t-1}^{-1}|^{-1/2} - 2 |P_t^{-1}|^{-1/2}\right) = 2 R^{-2} \delta \left(|P_0^{-1}|^{-1/2} - |P_\infty^{-1}|^{-1/2}\right) < \infty \quad (39)$$
End of Lemma proof. □

We introduce a coefficient $\mu_{t-1}$ [30]:
$$\mu_{t-1} = \frac{O_{t-1}}{\tilde{x}_t} \quad (40)$$

Theorem. If
$$-\sqrt{R^2 (H_{t-1}^T P_{t-1} H_{t-1} + R^2)^{-1}} < \mu_{t-1} < \sqrt{R^2 (H_{t-1}^T P_{t-1} H_{t-1} + R^2)^{-1}} \quad (41)$$
holds, then applying algorithm (30)–(32) to system (9) under Assumptions 1–4 ensures that
$$\tilde{\theta}_t \to 0, \quad a.s. \quad (42)$$

Proof. Consider the Lyapunov function
$$L_t = \tilde{\theta}_t^T P_t^{-1} \tilde{\theta}_t \quad (43)$$
From (35), we have
$$P_t = R^2 P_{t-1} (H_{t-1}^T P_{t-1} H_{t-1} + R^2)^{-1} \quad (44)$$


Based on (23), (30) and (43), and considering (35) and (44), the Lyapunov function is transformed to
$$L_t = L_{t-1} + R^{-2} \tilde{\theta}_{t-1}^T H_{t-1} H_{t-1}^T \tilde{\theta}_{t-1} - 2 R^{-2} \tilde{\theta}_{t-1}^T H_{t-1} e_t + R^{-4} H_{t-1}^T P_t H_{t-1} e_t^2 \quad (45)$$
As the Lagrange remainder term $O_{t-1}$ was neglected in (19), $H_{t-1}^T \tilde{\theta}_{t-1}$ is only an approximation of the true model estimation error $\tilde{x}_t$. Therefore, the unknown coefficient $\mu_{t-1}$ defined in (40) is introduced to account for these residues and correct this approximation. The following formula is obtained by substituting (40) and (18) into (21):
$$e_t = (1 - \mu_{t-1})^{-1} H_{t-1}^T \tilde{\theta}_{t-1} - \varepsilon_t \quad (46)$$
Substituting (46) into (45), we obtain
$$L_t = L_{t-1} + R^{-2} \tilde{\theta}_{t-1}^T H_{t-1} H_{t-1}^T \tilde{\theta}_{t-1} - 2 R^{-2} \tilde{\theta}_{t-1}^T H_{t-1} \left[(1 - \mu_{t-1})^{-1} H_{t-1}^T \tilde{\theta}_{t-1} - \varepsilon_t\right] + R^{-4} H_{t-1}^T P_t H_{t-1} \left[(1 - \mu_{t-1})^{-1} H_{t-1}^T \tilde{\theta}_{t-1} - \varepsilon_t\right]^2 \quad (47)$$

Thus,
$$E(L_t \mid \mathcal{F}_{t-1}) = L_{t-1} + R^{-2} \tilde{\theta}_{t-1}^T H_{t-1} H_{t-1}^T \tilde{\theta}_{t-1} - 2 (1 - \mu_{t-1})^{-1} R^{-2} \tilde{\theta}_{t-1}^T H_{t-1} H_{t-1}^T \tilde{\theta}_{t-1} + R^{-4} H_{t-1}^T P_t H_{t-1} \left[(1 - \mu_{t-1})^{-2} H_{t-1}^T \tilde{\theta}_{t-1} \tilde{\theta}_{t-1}^T H_{t-1} + \delta_t\right]$$
$$= L_{t-1} - R^{-2} H_{t-1}^T \tilde{\theta}_{t-1} \tilde{\theta}_{t-1}^T H_{t-1} \left[2 (1 - \mu_{t-1})^{-1} - 1 - (1 - \mu_{t-1})^{-2} R^{-2} H_{t-1}^T P_t H_{t-1}\right] + R^{-4} \delta_t H_{t-1}^T P_t H_{t-1}, \quad a.s. \quad (48)$$

Moreover,
$$E\!\left(\frac{L_t}{|P_t^{-1}|^{1/2}} \,\Big|\, \mathcal{F}_{t-1}\right) = \frac{L_{t-1}}{|P_t^{-1}|^{1/2}} + \frac{R^{-4} \delta_t H_{t-1}^T P_t H_{t-1}}{|P_t^{-1}|^{1/2}} - \frac{R^{-2} H_{t-1}^T \tilde{\theta}_{t-1} \tilde{\theta}_{t-1}^T H_{t-1}}{|P_t^{-1}|^{1/2}} \left[2 (1 - \mu_{t-1})^{-1} - 1 - (1 - \mu_{t-1})^{-2} R^{-2} H_{t-1}^T P_t H_{t-1}\right]$$
$$= \frac{L_{t-1}}{|P_{t-1}^{-1}|^{1/2}} - L_{t-1} \left(\frac{1}{|P_{t-1}^{-1}|^{1/2}} - \frac{1}{|P_t^{-1}|^{1/2}}\right) + \frac{R^{-4} \delta_t H_{t-1}^T P_t H_{t-1}}{|P_t^{-1}|^{1/2}} - \frac{R^{-2} H_{t-1}^T \tilde{\theta}_{t-1} \tilde{\theta}_{t-1}^T H_{t-1}}{|P_t^{-1}|^{1/2}} \left[2 (1 - \mu_{t-1})^{-1} - 1 - (1 - \mu_{t-1})^{-2} R^{-2} H_{t-1}^T P_t H_{t-1}\right], \quad a.s. \quad (49)$$
It is obvious that
$$\frac{R^{-2} H_{t-1}^T \tilde{\theta}_{t-1} \tilde{\theta}_{t-1}^T H_{t-1}}{|P_t^{-1}|^{1/2}} \ge 0 \quad (50)$$

If
$$2 (1 - \mu_{t-1})^{-1} - 1 - (1 - \mu_{t-1})^{-2} R^{-2} H_{t-1}^T P_t H_{t-1} \ge 0 \quad (51)$$
holds, the following inequality is true:
$$E\!\left(\frac{L_t}{|P_t^{-1}|^{1/2}} \,\Big|\, \mathcal{F}_{t-1}\right) \le \frac{L_{t-1}}{|P_{t-1}^{-1}|^{1/2}} - L_{t-1} \left(\frac{1}{|P_{t-1}^{-1}|^{1/2}} - \frac{1}{|P_t^{-1}|^{1/2}}\right) + \frac{R^{-4} \delta_t H_{t-1}^T P_t H_{t-1}}{|P_t^{-1}|^{1/2}}, \quad a.s. \quad (52)$$
Eq. (53) is obtained by solving inequality (51):
$$-\sqrt{1 - R^{-2} H_{t-1}^T P_t H_{t-1}} \le \mu_{t-1} \le \sqrt{1 - R^{-2} H_{t-1}^T P_t H_{t-1}} \quad (53)$$

And (41) is obtained by substituting (44) into (53).

Considering the conclusion (34) of the Lemma and applying the martingale convergence theorem to (52), $L_t / |P_t^{-1}|^{1/2}$ converges a.s. to a finite random variable $L$:
$$\frac{L_t}{|P_t^{-1}|^{1/2}} \to L, \quad a.s. \quad (54)$$
Thus,
$$L_t = O(|P_t^{-1}|^{1/2}), \quad a.s. \quad (55)$$
where $g_1(t) = O[g_2(t)]$ if there exists a positive constant $\tau$ such that $|g_1(t)| \le \tau g_2(t)$.

According to the definition of $L_t$,
$$\|\tilde{\theta}_t\|^2 \le \frac{L_t}{\lambda_{\min}(P_t^{-1})} \quad (56)$$
From (55) and (56), we obtain
$$\|\tilde{\theta}_t\|^2 = O\!\left[\frac{|P_t^{-1}|^{1/2}}{\lambda_{\min}(P_t^{-1})}\right], \quad a.s. \quad (57)$$
where $\lambda(\cdot)$ denotes the eigenvalue of the matrix.

From Assumption 4, it follows that

$$\lim_{t \to \infty} \frac{|P_t^{-1}|^{1/2}}{\lambda_{\min}(P_t^{-1})} = \lim_{t \to \infty} \frac{\left|P_0^{-1} + R^{-2} \sum_{i=0}^{t-1} H_i H_i^T\right|^{1/2}}{\lambda_{\min}\left(P_0^{-1} + R^{-2} \sum_{i=0}^{t-1} H_i H_i^T\right)} = \lim_{t \to \infty} \frac{\left|P_0^{-1} + R^{-2} t \eta I\right|^{1/2}}{\lambda_{\min}\left(P_0^{-1} + R^{-2} t \eta I\right)} = \lim_{t \to \infty} \frac{|R^{-2} t \eta I|^{1/2}}{\lambda_{\min}(R^{-2} t \eta I)} = 0 \quad (58)$$
Therefore,
$$\|\tilde{\theta}_t\|^2 \to 0, \quad a.s. \quad (59)$$
Thus,
$$\tilde{\theta}_t \to 0, \quad a.s. \quad (60)$$
End of Theorem proof. □

Remark 4.1. A symmetric positive definite initialization of $P_t$ is necessary based on its definition. Therefore, the initialization of $P_t$ can be chosen as
$$P_0 = l_1 \cdot I \quad (61)$$
where $l_1$ is a large positive real scalar. The symmetry is preserved by (32) during the whole identification process. This characteristic and the initialization method of $P_0$ are similar to those of the RLS and EKF algorithms.

Remark 4.2. As mentioned in Remark 3.1, this algorithm is valid only if the previous parameter estimates are close to the true values. The value of $\mu$ represents, to a certain extent, the "distance" between the parameters and their estimates. According to its definition, $\mu$ is close to zero if the estimates are close to the true parameters; otherwise, $\mu$ may be far from zero. Therefore, condition (41) can be considered a sufficient condition for


parameter convergence. Together with this condition and Remark 3.1, initial parameter estimates are suggested to be close to the true values for the convergence of the parameter estimation. Such a good initial condition is easy to obtain for systems with a priori knowledge, but difficult for unfamiliar ones. In that case, (41) may not be satisfied and the parameter estimation may be unstable. A universal approach to this problem may not be available, owing to the many unexpected characteristics that the nonlinear blocks can introduce.

However, some algorithms can be used to provide a good initial condition in the practical case. One possible method is to rewrite model (9) as a linear parameter representation:
$$y_t = \Phi_t \Theta \quad (62)$$
where $\Phi_t$ is a data vector that contains only measurable signals and $\Theta$ is an over-parameterized vector. With the over-parameterized vector estimated by the recursive least squares scheme, we may obtain nearly consistent initial parameter estimates by combining a parameter separation technique, such as the singular value decomposition method [16,21] or the average method [10,23].
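A standard RLS recursion for the over-parameterized model (62), as suggested here for initialization, can be sketched as follows. This is our own illustrative code; the construction of $\Phi_t$ and the parameter-separation step are not shown.

```python
import numpy as np

def rls_step(Theta, P, phi, y):
    """One standard recursive least-squares update for the over-parameterized
    linear model y_t = phi_t^T Theta of (62). Used only to produce initial
    parameter estimates; the subsequent parameter-separation step
    (e.g. SVD [16,21]) is omitted."""
    denom = 1.0 + float(phi @ P @ phi)
    K = (P @ phi) / denom                # RLS gain
    e = y - float(phi @ Theta)           # prediction error
    Theta = Theta + K * e
    P = P - np.outer(K, phi @ P)
    return Theta, P
```

Running this to near convergence and then separating the product parameters yields the good initial condition that Remark 4.2 calls for.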

Remark 4.3. According to its definition, $H_t$ is related to the system input data. Therefore, Assumption 4 can be considered the persistent excitation condition for uniform convergence of the parameter estimation. In addition, Assumption 4 is similar to a variant of the uniform observability condition in the EKF algorithm, where the Jacobian matrix of the system state function in the EKF can be considered an identity matrix in this algorithm. For details, refer to Definition 4.2 and Lemma 4.2 in [31].

Remark 4.4. $R^2$ is the estimate of the noise variance. However, an unbiased estimate of this value is not necessary, because we do not assume that the estimate of $\delta_t$ is unbiased in the proof of the Theorem. Therefore, $R^2$ can be chosen as any other positive number. However, choosing an appropriate value of $R^2$ is difficult because $R^2$ acts as a scaling factor for all elements of the gain vector. A sufficiently large value of $R^2$ may enhance the convergence property of the algorithm but decrease the convergence rate; conversely, a small value of $R^2$ may increase the convergence rate but degrade the stability. A tradeoff between the convergence rate and the stability of the proposed algorithm is to set
$$R^2 = l_2 H_{t-1}^T P_{t-1} H_{t-1} + l_3 \quad (63)$$
where $l_2$ and $l_3$ are positive constants.

Generally speaking, $l_3$ is set to a "smaller" value, since a larger $l_3$ decreases the convergence rate according to formula (30). Setting $l_2$ is still an open issue. In practice, we can run identification processes with different values of $l_2$ in parallel and choose the best result as the estimate.

Moreover, $R^2$ cannot equal zero even if the system runs without the interference of noise. Here, a positive $R^2$ can be considered a weighting factor that prevents a large $K_t$ and maintains the stability of the algorithm.

Remark 4.5. The conclusion of the Theorem, that the estimation error converges to zero under certain assumptions, is quite different from the convergence property of the EKF algorithm, for which only a bounded state estimation error can be obtained under similar assumptions.

5. Discussions

5.1. Identification of continuous H–W systems

The proposed algorithm is derived for discrete H–W systems. For continuous H–W systems, sampling is required. If the sampling period satisfies the Whittaker–Shannon sampling theorem for linear continuous systems, the output can be recovered accurately from the sampled values. Considering that there is no dynamic characteristic in the input and output blocks of H–W systems, the system output can also be recovered accurately provided that the continuous linear block of the H–W system satisfies the Whittaker–Shannon sampling theorem. Practically, the selection of the sampling period is also related to the dynamic behavior of the system, the computational capability of the computer, practical experience, etc. Therefore, it is better to determine the sampling period on the basis of experiment.

5.2. Mismatch of system structure

As the system structure is assumed to be known in Assumption 1, a priori knowledge of the original system is needed. However, model mismatch may occur for an unfamiliar system. Therefore, it is necessary to study the convergence property in case there is any model mismatch. Model mismatch occurs when the order of the linear block model is selected incorrectly or unsuitable basis functions are used to fit the input and output nonlinearities, so that the model fitting error is too large to be neglected. In this case, we have

$$x_t = h(U_t, \theta_{best}) + \Delta(U_t) \quad (64)$$
where $\Delta(U_t)$ is the fitting error and $\theta_{best}$ denotes the parameter vector that, over the input space $U$, gives the "best" fit to the original system. Here, we redefine $\mu_{t-1}$ and $\tilde{x}_t$ as
$$\mu_{t-1} = \frac{O_{t-1} + \Delta(U_t)}{\tilde{x}_t} \quad (65)$$
$$\tilde{x}_t = H_{t-1}^T \tilde{\theta}_{t-1} + O_{t-1} + \Delta(U_t) \quad (66)$$
It is noted from (65) and (66) that if $O_{t-1}$ and $\Delta(U_t)$ are sufficiently small, condition (41) can still be satisfied. According to Remark 4.2, the algorithm then ensures that $\|\tilde{\theta}_{best,t}\|^2$ is bounded. However, condition (41) cannot be satisfied forever, because $\Delta(U_t)$ cannot be eliminated by adjusting $\theta_{best}$. As a result, $\|\tilde{\theta}_{best,t}\|^2$ cannot converge to zero. Therefore, the proposed algorithm can only ensure that the parameter estimates are bounded, provided that $\theta_{best,0}$ is close to $\theta_{best}$ and that $\Delta(U_t)$ and the noise are sufficiently small.

5.3. Process noise

We have studied the convergence property of the algorithm when the system is disturbed by measurement

Fig. 2. Parameter estimation error.


noise. If noise acts on the internal variables $v_t$ or $w_t$, it is called process noise. For measurement noise, according to (4), we have
$$E[y_t \mid \mathcal{F}_{t-1}] = x_t, \quad a.s. \quad (67)$$
For process noise, however, due to the nonlinear mapping of the output nonlinearity, we do not have the same property; instead,
$$E[y_t \mid \mathcal{F}_{t-1}] \ne x_t, \quad a.s. \quad (68)$$
Therefore, the parameter estimate is biased.

6. Numerical examples

In this section, we use three simulation examples to show the validity of the proposed algorithm.

Example 1. We consider the H–W system in which the input and output nonlinear blocks are represented as polynomial functions:
$$v = u + 0.5 u^2 - 0.7 u^3$$
$$x = w - 0.5 w^2 + 0.2 w^3$$
and the discrete transfer function of the linear block is
$$G(z^{-1}) = \frac{0.25 + 0.17 z^{-1} + 0.1 z^{-2}}{1 - 0.8 z^{-1} + 0.5 z^{-2} - 0.3 z^{-3}}$$

$P_0$ and $\hat{\theta}_0$ are initialized as $10^8 I$ and $(0, \ldots, 0)^T$, respectively. The system input is a white Gaussian noise (WGN) sequence with zero mean and unit variance. The noise is generated by a GARCH process whose variance is determined by the following formula:
$$\delta_t(\lambda) = 0.00001 + 0.25 \varepsilon_{t-1}^2 + 0.2 \varepsilon_{t-2}^2 + 0.2 \delta_{t-1} + 0.15 \delta_{t-2}$$

The simulation comprises 5000 samples. In order to study the effect of $R^2$ on the convergence property of the proposed algorithm, identification is performed with three different settings:
$$R_1^2 = 6 H_{t-1}^T P_{t-1} H_{t-1} + 0.01$$
$$R_2^2 = 3 H_{t-1}^T P_{t-1} H_{t-1} + 0.01$$
and
$$R_3^2 = 0.01$$

Fig. 2 shows the convergence rate of the sum of the squared parameter estimation errors. The parameter estimation error converges to zero gradually when $R_1^2$ is used in the identification, which confirms the validity of the proposed algorithm. The fluctuation of the parameter estimation error during the convergence process is caused by the interference of the noise with time-varying variance. An imperfectly convergent result is obtained when $R_2^2$ is used. This may be because the convergence domain defined by (41) is not sufficiently large with $l_2 = 3$ throughout the identification process, so the estimates sometimes deviate from the convergent direction. A divergent result is obtained when $R_3^2$ is used: such a small $R^2$ cannot support a sufficiently large convergence domain defined by (41). Comparison of the three results demonstrates the effectiveness of the suggested form of $R^2$.
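To see how $R^2$ influences the step size, note that in a Kalman-type recursive estimator the correction gain has the form $K_t = P_{t-1}H_{t-1}/(H_{t-1}^T P_{t-1} H_{t-1} + R^2)$: a larger $R^2$ shrinks the gain, which enlarges the convergence domain at the price of slower convergence. The following toy sketch applies such an update — the standard recursive least-squares/Kalman structure, used here only as a stand-in for the paper's full algorithm — to a plain linear regression:

```python
import numpy as np

def recursive_estimate(H, y, l2=6.0, c=0.01):
    """Kalman-type recursive update with R^2 = l2 * H'PH + c, as suggested above."""
    n, d = H.shape
    theta = np.zeros(d)
    P = 1e8 * np.eye(d)                  # P0 as in Example 1
    for t in range(n):
        h = H[t]
        hPh = h @ P @ h
        R2 = l2 * hPh + c                # adaptive weighting term
        K = P @ h / (hPh + R2)           # correction gain; larger R2 -> smaller step
        theta = theta + K * (y[t] - h @ theta)
        P = P - np.outer(K, h @ P)
    return theta

# Toy regression (hypothetical data, not the paper's H-W system):
rng = np.random.default_rng(1)
theta_true = np.array([0.5, -0.3, 1.2])
H = rng.standard_normal((2000, 3))
y = H @ theta_true + 0.01 * rng.standard_normal(2000)
theta_hat = recursive_estimate(H, y)
```

With $l_2 = 6$ the estimate still converges to the true parameters; setting the additive constant $c$ to zero in early steps would correspond to the divergent $R_3^2$ case, where the gain is too aggressive.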

Example 2. In this example, we study the convergence property when the order of the linear block is mismatched, and introduce the output-error test method to estimate the linear block order. The H–W system is the same as in Example 1.

In the simulation, we use models with the following pairs of orders (r, m) to identify the original system: model 1: (1, 2); model 2: (2, 2); model 3: (2, 3); model 4: (3, 3); and model 5: (3, 4). Note that model 3 matches the order of the original system. The numbers of nonlinear basis functions p and q are equal to those of the original system. The noise is generated by a WGN sequence in this example and the signal-to-noise ratio (SNR) is 20 dB. $R^2$ is set to

$R^2 = 6H_{t-1}^T P_{t-1} H_{t-1} + 0.01$

The initialization of $P_0$ and $\theta_0$ is the same as in Example 1. All the identification processes with different models are performed with 5000 samples.

Fig. 3 shows the model estimation error of each model and Fig. 4 shows the sum of the squared parameter estimation errors of the nonlinear blocks of each model from the 3000th to the 5000th sampling point. As shown in these two figures, all the errors are bounded. This may be because formula (41) is satisfied, so that the parameter estimates converge to a bounded domain. Compared with model 3 (the model with the same order as the original system), the estimation errors and the parameter estimation errors of model 1 and model 2 (reduced-order models) are larger, while those of model 4 and model 5 (higher-order models) are nearly the same. This is because a higher-order model can reproduce the input–output characteristic of the original system, whereas a reduced-order model cannot; a mismatched input–output characteristic of the linear block leads to larger model estimation errors and oscillating nonlinear parameter estimates. Therefore, three conclusions can be drawn: (1) provided that formula (41) is satisfied, the parameter estimates of mismatched models remain bounded; (2) a reduced-order model degrades the estimation result; (3) a higher-order model obtains an estimation result similar to that of the model with the original system order.

Fig. 3. Output estimation errors of each model.

Fig. 4. Nonlinear block parameter estimation errors of each model.

Fig. 5. Structure of EAF system.

In practice, if no prior knowledge of the order of the system's linear block is available, different pairs of orders can be tried concurrently during the identification process, and the best pair chosen by jointly assessing the model estimation error and the smoothness of the parameter estimation process.

Choosing the number of nonlinear basis functions is similar to choosing the linear block order; details can be found in [21].
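The order test above can be sketched as follows: fit candidate orders by least squares on an estimation set, then compare errors on a validation set (a toy linear ARX example using one-step prediction error as a simple proxy for the output error; the paper's H–W setting adds the nonlinear blocks on top of this, and the "true" system below is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3000
u = rng.standard_normal(n)
y = np.zeros(n)
# Hypothetical 2nd-order ARX "true" system (not the paper's H-W plant):
for t in range(2, n):
    y[t] = (1.0 * y[t - 1] - 0.5 * y[t - 2] + u[t - 1] + 0.3 * u[t - 2]
            + 0.05 * rng.standard_normal())

def regressor_matrix(order, lo, hi):
    """Stack [y_{t-1..t-order}, u_{t-1..t-order}] rows for t in [lo, hi)."""
    rows = [np.r_[[y[t - i] for i in range(1, order + 1)],
                  [u[t - i] for i in range(1, order + 1)]]
            for t in range(lo, hi)]
    return np.array(rows), y[lo:hi]

def validation_error(order):
    Phi, Y = regressor_matrix(order, 10, 2000)        # estimation set
    theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    Phi_v, Y_v = regressor_matrix(order, 2000, n)     # validation set
    return np.sqrt(np.mean((Y_v - Phi_v @ theta) ** 2))

errs = {k: validation_error(k) for k in (1, 2, 3)}
```

The reduced order fits markedly worse, while the matched and higher orders give similar errors — mirroring conclusions (2) and (3) above.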

Example 3. A simulation of the identification of an electric arc furnace (EAF) system is given in this example. An EAF is a device that melts scrap and direct reduced iron for steel production. The EAF system optimizes the output power by controlling the length of the arc. The structure of the EAF system is shown in Fig. 5. The proportional valve output v drives the hydraulic cylinder to adjust the length of the arc L and regulate the system output $R_{arc}$, namely the arc resistance, at the set points. Due to static friction, the proportional valve contains a dead-zone nonlinearity, which can be expressed as

$v = \begin{cases} m_1(u - p_1), & \bar{u} \geq u > p_1 \\ 0, & p_2 \leq u \leq p_1 \\ m_2(u - p_2), & \underline{u} \leq u < p_2 \end{cases}$
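In code, the dead-zone characteristic is a three-branch piecewise-linear map (a direct transcription of the expression above; the test values below use the proportional-valve parameters given later in this example):

```python
def dead_zone(u, m1, m2, p1, p2):
    """Dead-zone map: zero output for u in [p2, p1], linear branches outside
    (input assumed to stay within the valve limits [u_lower, u_upper])."""
    if u > p1:
        return m1 * (u - p1)
    if u < p2:
        return m2 * (u - p2)
    return 0.0
```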

The dynamic characteristic of the hydraulic cylinder can be expressed as a continuous third-order linear system:

$G(s) = \dfrac{L(s)}{v(s)} = \dfrac{b_0 s + b_1}{a_0 s^3 + a_1 s^2 + a_2 s}$

The model of the power supply system can be expressed as an equivalent R–L circuit:

$U_p^2 = (R_d I_{arc} + U_{arc})^2 + (X_d I_{arc})^2$

where $U_p$ is the transformer secondary voltage, V; $R_d$ is the circuit resistance, mΩ; $X_d$ is the circuit reactance, mΩ; $I_{arc}$ is the arc current, A; $U_{arc}$ is the arc voltage, V; and we have

$U_{arc} = a_{arc} + b_{arc} L = I_{arc} R_{arc}$

where $a_{arc}$ is the voltage drop of the arc anode and cathode, V, and $b_{arc}$ is the arc voltage drop gradient, V/cm. It is obvious that the relationship between $R_{arc}$ and L is nonlinear. In the simulation, the parameters of the EAF system are set as follows [32]:

1) Proportional valve: $m_1 = 1$, $m_2 = -1$, $p_1 = 1$, $p_2 = -1$, $\bar{u} = 10$, $\underline{u} = -10$.

2) Hydraulic cylinder: $b_0 = 7.2$, $b_1 = 104$, $a_0 = 1$, $a_1 = 9$, $a_2 = 130$.

3) Power supply system: $U_p = 320$, $R_d = 0.4$, $X_d = 0.7$, $a_{arc} = 30$, $b_{arc} = 10$.
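The arc-resistance nonlinearity implied by these equations can be computed numerically: given L, $U_{arc} = a_{arc} + b_{arc}L$, the arc current follows from the quadratic $(R_d^2 + X_d^2)I_{arc}^2 + 2R_d U_{arc} I_{arc} + (U_{arc}^2 - U_p^2) = 0$, and $R_{arc} = U_{arc}/I_{arc}$. A sketch with the parameter values above, treating all quantities in consistent units (the function name is ours):

```python
import math

def arc_resistance(L, Up=320.0, Rd=0.4, Xd=0.7, a_arc=30.0, b_arc=10.0):
    """Solve the R-L circuit equation for I_arc, then R_arc = U_arc / I_arc."""
    U_arc = a_arc + b_arc * L
    a = Rd ** 2 + Xd ** 2
    b = 2 * Rd * U_arc
    c = U_arc ** 2 - Up ** 2       # negative while U_arc < Up
    I_arc = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)   # positive root
    return U_arc / I_arc
```

Evaluating this map over a grid of L values shows clearly that $R_{arc}$ is a nonlinear, increasing function of the arc length.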

The EAF system can be modeled as an H–W system. In practice, $R_{arc}$ is available because $U_p$ and $I_{arc}$ are measurable and $R_d$ and $X_d$ are known. u is measurable, but v and L are not. Therefore, only the input and output are available, which meets the requirement of the proposed algorithm, namely using only the available input and output data to identify the system parameters.

In order to identify the EAF system, the system model needs to be parameterized. We express the characteristic of the proportional valve approximately as a piecewise-linear function:

$v_t = F_I(u_t) = \sum_{i=1}^{p} c_i f_{I,i}(u_t) = C^T f_I(u_t)$

where $f_{I,i}(u_t)$ is a "tent function" defined as

$f_{I,1}(u_t) = \begin{cases} \dfrac{u_2 - u_t}{u_2 - u_1}, & u_1 \leq u_t < u_2 \\ 0, & u_2 \leq u_t < u_p \end{cases}$

$f_{I,j}(u_t) = \begin{cases} 0, & u_1 \leq u_t < u_{j-1} \\ \dfrac{u_t - u_{j-1}}{u_j - u_{j-1}}, & u_{j-1} \leq u_t < u_j \\ \dfrac{u_{j+1} - u_t}{u_{j+1} - u_j}, & u_j \leq u_t < u_{j+1} \\ 0, & u_{j+1} \leq u_t < u_p \end{cases} \qquad j = 2, \ldots, (p-1)$

$f_{I,p}(u_t) = \begin{cases} 0, & u_1 \leq u_t < u_{p-1} \\ \dfrac{u_t - u_{p-1}}{u_p - u_{p-1}}, & u_{p-1} \leq u_t \leq u_p \end{cases}$

where $u_i$ ($i = 1, \ldots, p$) are the knots of the piecewise-linear function. According to the upper and lower bounds of the input signal, we define the knots $[u_1, u_2, \ldots, u_p]$ as $[-10, -6, -2, -1, 0, 1, 2, 6, 10]$.

Fig. 6. Estimation of dead-zone nonlinearity.

Fig. 7. Estimation of arc nonlinearity.

Fig. 8. Estimation of parameter vector A.
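These tent (hat) functions form a partition of unity on $[u_1, u_p]$, so $C^T f_I(u_t)$ interpolates the values $c_i$ at the knots. A direct implementation (function and variable names are our own):

```python
import numpy as np

KNOTS = np.array([-10.0, -6.0, -2.0, -1.0, 0.0, 1.0, 2.0, 6.0, 10.0])

def tent_basis(u, knots=KNOTS):
    """Evaluate all p tent functions f_{I,i}(u) at a scalar input u in [u_1, u_p]."""
    p = len(knots)
    f = np.zeros(p)
    for i in range(p):
        if i > 0 and knots[i - 1] <= u <= knots[i]:
            f[i] = (u - knots[i - 1]) / (knots[i] - knots[i - 1])   # rising edge
        elif i < p - 1 and knots[i] <= u <= knots[i + 1]:
            f[i] = (knots[i + 1] - u) / (knots[i + 1] - knots[i])   # falling edge
    return f
```

Choosing $c_i$ equal to the knots themselves reproduces the identity map, which is a quick way to check that the basis interpolates correctly.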

By selecting the sampling period as 0.1 s, the discrete transfer function of the hydraulic cylinder becomes

$G(z^{-1}) = \dfrac{B(z^{-1})}{A(z^{-1})} = \dfrac{0.0378z^{-1} + 0.0333z^{-2} + 0.0097z^{-3}}{1 - 1.637z^{-1} + 1.044z^{-2} - 0.4066z^{-3}}$
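In difference-equation form, this discrete model is simulated by moving the $A(z^{-1})$ terms to the right-hand side (a direct transcription; $w_t$ here denotes the noise-free linear-block output):

```python
import numpy as np

A_COEF = [1.637, -1.044, 0.4066]   # A(z^-1) terms with signs flipped
B_COEF = [0.0378, 0.0333, 0.0097]

def simulate_linear(v):
    """w_t = 1.637 w_{t-1} - 1.044 w_{t-2} + 0.4066 w_{t-3}
           + 0.0378 v_{t-1} + 0.0333 v_{t-2} + 0.0097 v_{t-3}."""
    w = np.zeros(len(v))
    for t in range(len(v)):
        for i, a in enumerate(A_COEF, 1):
            if t >= i:
                w[t] += a * w[t - i]
        for i, b in enumerate(B_COEF, 1):
            if t >= i:
                w[t] += b * v[t - i]
    return w
```

The first impulse-response sample equals $b_1 = 0.0378$, and since the continuous model contains an integrator ($a_2 s$ in the denominator), the discrete step response keeps rising over hundreds of samples.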

The nonlinear characteristic of the arc is approximately expressed as a fifth-order polynomial function:

$R_{arc} = \sum_{i=1}^{5} d_i L^i$

$P_0$ and $\theta_0$ are initialized as $10^8 I$ and $(0, \ldots, 0)^T$, respectively. The system input signal is generated by a uniformly distributed sequence on [−10, 10]. The noise is generated by a WGN sequence and the SNR is 20 dB. $R^2$ is set as

$R^2 = 20H_{t-1}^T P_{t-1} H_{t-1} + 1$

The continuous EAF system is simulated for 400 s (4000 sample points). Figs. 6 and 7 show the estimation results of the dead-zone nonlinearity and the arc nonlinearity at different sampling points, respectively. The differences in coordinates are caused by moving the gains of the input and output nonlinearities into the linear block. The estimates of the nonlinearities gradually match the true ones in both figures. Fig. 8 shows the convergence curves of parameter vector A. All the estimates converge to constant values that are very close to the elements of A. Therefore, identification via the proposed algorithm is successful for this practical system.

7. Conclusions

In this paper, a recursive identification method is proposed to estimate the parameters of H–W systems. The proposed algorithm is based mainly on minimizing the expectation of the sum of squared parameter errors. As the unmeasurable internal variables are replaced by their estimates, the commonly used invertibility assumption on the output block is no longer needed. The measurement noise we considered is heteroscedastic, a more general setting than the homoscedastic noise that is usually assumed. The convergence property of the proposed algorithm is studied in depth, and conditions for the uniform convergence of the parameter estimates are obtained. Furthermore, practical issues related to the proposed algorithm are discussed, and it is shown that the parameter estimate remains bounded in the presence of model-order mismatch or process noise acting on the system. The validity of the proposed algorithm is demonstrated with three simulation examples, including a practical EAF system case.

In practice, it frequently occurs that the system parameters, and even the structure, change with the system state, operating environment, and operating modes during the identification process. In that case, the system should be treated as time-varying. As the algorithm proposed in this paper identifies time-invariant H–W systems, further research should focus on testing and modifying it to track time-varying H–W systems.

Acknowledgments

This work was supported by the National Natural Science Foundation of China, People's Republic of China (Nos. 61074098, 61203103, and 61333006) and the Fundamental Research Funds for the Central Universities Grant, People's Republic of China (No. 110404025).

References

[1] I.J. Umoh, T. Ogunfunmi, An affine projection-based algorithm for identification of nonlinear Hammerstein systems, Signal Process. 90 (2010) 2020–2030.

[2] E.W. Bai, M. Fu, A blind approach to Hammerstein model identification, IEEE Trans. Signal Process. 50 (2002) 1610–1619.

[3] F. Ding, Y. Shi, T. Chen, Auxiliary model-based least-squares identification methods for Hammerstein output-error systems, Syst. Control Lett. 56 (2007) 373–380.

[4] E.W. Bai, K. Li, Convergence of the iterative algorithm for a general Hammerstein system identification, Automatica 46 (2010) 1891–1896.

[5] P. Koukoulas, N. Kalouptsidis, Blind identification of second order Hammerstein series, Signal Process. 83 (2003) 213–234.

[6] F. Ding, X.P. Liu, G. Liu, Identification methods for Hammerstein nonlinear systems, Digit. Signal Process. 21 (2011) 215–238.

[7] A.E. Nordsjo, L.H. Zetterberg, Identification of certain time-varying nonlinear Wiener and Hammerstein systems, IEEE Trans. Signal Process. 49 (2001) 577–592.

[8] D.Q. Wang, F. Ding, Least squares based and gradient based iterative identification for Wiener nonlinear systems, Signal Process. 91 (2011) 1182–1189.

[9] K.J. Åström, B. Wittenmark, Adaptive Control, 2nd ed., Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1994.

[10] F. Ding, T. Chen, Identification of Hammerstein nonlinear ARMAX systems, Automatica 41 (2005) 1479–1489.

[11] N. Kasabov, Evolving Connectionist Systems: The Knowledge Engineering Approach, 2nd ed., Springer-Verlag, London, 2007.

[12] E. Lughofer, Evolving Fuzzy Systems: Methodologies, Advanced Concepts and Applications, Springer, Berlin Heidelberg, 2011.

[13] M. Sayed-Mouchaweh, E. Lughofer, Learning in Non-stationary Environments: Methods and Applications, Springer, New York, 2012.

[14] O. Nelles, Nonlinear System Identification, Springer, Berlin, 2001.

[15] J. Voros, Parameter identification of Wiener systems with multisegment piecewise-linear nonlinearities, Syst. Control Lett. 56 (2007) 99–105.

[16] E.W. Bai, An optimal two-stage identification algorithm for Hammerstein–Wiener nonlinear systems, Automatica 34 (1998) 333–338.

[17] P. Crama, J. Schoukens, Hammerstein–Wiener system estimator initialization, Automatica 40 (2004) 1543–1550.

[18] A. Wills, T.B. Schon, L. Ljung, B. Ninness, Identification of Hammerstein–Wiener models, Automatica 49 (2013) 70–81.

[19] Y.C. Zhu, Estimation of an N–L–N Hammerstein–Wiener model, Automatica 38 (2002) 1607–1614.

[20] J.W. MacArthur, A new approach for nonlinear process identification using orthonormal bases and ordinal splines, J. Process Control 22 (2012) 375–389.

[21] E.W. Bai, A blind approach to the Hammerstein–Wiener model identification, Automatica 38 (2002) 967–979.

[22] G.R. Bolkvadze, The Hammerstein–Wiener model for identification of stochastic systems, Autom. Remote Control 64 (2003) 1418–1431.

[23] D. Wang, F. Ding, Extended stochastic gradient identification algorithms for Hammerstein–Wiener ARMAX systems, Comput. Math. Appl. 56 (2008) 3157–3164.

[24] D.Q. Wang, F. Ding, Hierarchical least squares estimation algorithm for Hammerstein–Wiener systems, IEEE Signal Process. Lett. 19 (2012) 825–828.

[25] F. Yu, Z. Mao, M. Jia, Recursive identification for Hammerstein–Wiener systems with dead-zone input nonlinearity, J. Process Control 23 (2013) 1108–1115.

[26] L. Ljung, System Identification: Theory for the User, 2nd ed., Prentice Hall PTR, Upper Saddle River, New Jersey, 1999.

[27] S. Mousazadeh, I. Cohen, Simultaneous parameter estimation and state smoothing of complex GARCH process in the presence of additive noise, Signal Process. 90 (2010) 2947–2953.

[28] L. Ljung, Analysis of recursive stochastic algorithms, IEEE Trans. Autom. Control 22 (1977) 551–575.

[29] F. Ding, H. Yang, F. Liu, Performance analysis of stochastic gradient algorithms under weak conditions, Sci. China Ser. F: Inf. Sci. 51 (2008) 1269–1280.

[30] M. Boutayeb, D. Aubry, A strong tracking extended Kalman observer for nonlinear discrete-time systems, IEEE Trans. Autom. Control 44 (1999) 1550–1556.

[31] K. Reif, S. Gunther, E. Yaz, R. Unbehauen, Stochastic stability of the discrete-time extended Kalman filter, IEEE Trans. Autom. Control 44 (1999) 714–728.

[32] L. Li, Z. Mao, A direct adaptive controller for EAF electrode regulator system using neural networks, Neurocomputing 82 (2012) 91–98.