
244 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. AC-23, NO. 2, APRIL 1978

Leader-Follower Strategies for Multilevel Systems

JOSE B. CRUZ, JR., FELLOW, IEEE

Abstract-Sequential strategies for dynamic systems with multiple decision makers and multiple performance indices are surveyed and reviewed. These strategies are generalizations of Stackelberg or leader-follower strategies for two-person games. The review includes structures with one coordinator and several second-level decision makers, and linear hierarchical structures with only one decision maker at each level. Several information structures are considered.

I. INTRODUCTION

THE purpose of this paper is to survey and interrelate recent results on the utilization of leader-follower or Stackelberg strategy concepts in the control structuring of interconnected systems. These control methodologies are appropriate for classes of system problems where there are multiple criteria, multiple decision makers, decentralized information, and a natural hierarchy of decision-making levels. The basic leader-follower strategy was originally suggested for static duopoly by von Stackelberg [1]. This concept has been generalized to dynamic nonzero-sum two-person games by Chen and Cruz [2] and Simaan and Cruz [3], [4], to two groups of players by Simaan and Cruz [6], and to stochastic games by Castanon and Athans [8] and Castanon [14].

Static Two-Person Games

The basic idea of a leader-follower strategy for a static two-person game is rather simple. Consider two players. Player 1 chooses control u_1 ∈ R and Player 2 chooses control u_2 ∈ R. The scalar cost function associated with Player 1 is J_1(u_1, u_2) and the scalar cost function associated with Player 2 is J_2(u_1, u_2). Designate Player 1 as leader and Player 2 as follower. For each control u_1 chosen by Player 1, Player 2 chooses u_2 = T_2(u_1), where T_2 is a mapping from u_1 to u_2 such that

J_2(u_1, T_2(u_1)) ≤ J_2(u_1, u_2)   (1)

for all u_2. For simplicity, we assume that for each u_1, T_2(u_1) yields a unique u_2. The leader chooses u_1^* such that

Manuscript received August 22, 1977; revised November 8, 1977. This work was supported in part by the National Science Foundation under Grant ENG-74-20091, in part by the U.S. Energy Research and Development Administration, Electric Energy Systems Division, under Contract EX-76-C-01-2088, and in part by the Joint Services Electronics Program under Contract DAAB-07-72-C-0259.

The author is with the Decision and Control Laboratory of the Coordinated Science Laboratory and the Department of Electrical Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801.

J_1(u_1^*, T_2(u_1^*)) ≤ J_1(u_1, T_2(u_1))   (2)

for all u_1. The strategy u_1^* is the Stackelberg strategy for Player 1 and u_2^* = T_2(u_1^*) is the Stackelberg strategy for Player 2 when the leader is Player 1. Similarly, when Player 1 is follower and Player 2 is leader,

J_1(T_1(u_2), u_2) ≤ J_1(u_1, u_2) for each u_2 and for all u_1,   (3)

and

J_2(T_1(u_2^{**}), u_2^{**}) ≤ J_2(T_1(u_2), u_2) for all u_2   (4)

where T_1 is a mapping from u_2 to u_1, u_2^{**} is the leader Stackelberg strategy, and u_1^{**} = T_1(u_2^{**}) is the follower Stackelberg strategy.

In comparison, a Nash strategy pair (u_{1N}, u_{2N}), which may not be unique, is defined by

J_1(u_{1N}, u_{2N}) ≤ J_1(u_1, u_{2N}) for all u_1   (5)

and

J_2(u_{1N}, u_{2N}) ≤ J_2(u_{1N}, u_2) for all u_2.   (6)

Clearly, from (6)

J_2(u_{1N}, u_{2N}) = J_2(u_{1N}, T_2(u_{1N}))   (7)

and from (2) and (7)

J_1(u_1^*, T_2(u_1^*)) ≤ J_1(u_{1N}, u_{2N}).   (8)

Similarly, from (5)

J_1(u_{1N}, u_{2N}) = J_1(T_1(u_{2N}), u_{2N})   (9)

and from (4) and (9)

J_2(T_1(u_2^{**}), u_2^{**}) ≤ J_2(u_{1N}, u_{2N}).   (10)

Thus, for the leader a Stackelberg strategy is at least as good as any Nash strategy. For the follower, the Stackelberg strategy may or may not be preferable compared to a Nash strategy.
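The inequalities above can be checked numerically. The sketch below uses hypothetical quadratic costs J_1 = (u_1 - 2)^2 + u_2^2 and J_2 = (u_2 - u_1)^2 + u_2^2 (illustrative choices, not from the paper), computes the follower reaction T_2 per (1), the leader's Stackelberg choice per (2), the Nash pair per (5)-(6), and verifies (8):

```python
# Hypothetical quadratic two-person static game (illustrative costs only).
def J1(u1, u2):
    return (u1 - 2.0) ** 2 + u2 ** 2

def J2(u1, u2):
    return (u2 - u1) ** 2 + u2 ** 2

def T2(u1):
    # follower reaction, eq. (1): dJ2/du2 = 2(u2 - u1) + 2 u2 = 0  =>  u2 = u1/2
    return u1 / 2.0

grid = [i / 1000.0 for i in range(-4000, 4001)]

# leader's Stackelberg choice, eq. (2): minimize J1(u1, T2(u1)) over u1
u1_s = min(grid, key=lambda u1: J1(u1, T2(u1)))
u2_s = T2(u1_s)

# Nash pair, eqs. (5)-(6): dJ1/du1 = 0 gives u1N = 2; then u2N = T2(2) = 1
u1_n, u2_n = 2.0, 1.0

# eq. (8): the leader does at least as well as under the Nash pair
assert J1(u1_s, u2_s) <= J1(u1_n, u2_n) + 1e-9
```

With these costs the Stackelberg pair is (1.6, 0.8) with leader cost 0.8, versus leader cost 1.0 at the Nash pair (2, 1); in this particular example the follower also prefers the Stackelberg pair (cost 1.28 versus 2).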

0018-9286/78/0400-0244$00.75 © 1978 IEEE

It is assumed that the leader knows the cost function mapping of the follower, but the follower might not know the cost function mapping of the leader. However, the follower knows the control strategy of the leader and takes this into account in computing his strategy. This

reaction behavior of the follower is known to the leader, who optimizes his choice of control u_1.

Open-Loop Stackelberg Strategies for Dynamic Games

Consider a dynamic system

ẋ = f(x, u_1, u_2)   (11)

where x ∈ R^n is the state, u_1 ∈ R^{m_1} and u_2 ∈ R^{m_2} are the controls, and f is a piecewise continuous function from R^n × R^{m_1} × R^{m_2} to R^n. In the dynamic system case, it is necessary to specify what type of information is available to each player. Suppose no state measurements are available. In this case we consider open-loop strategies. Associated with each player is a scalar cost function

J_i = K_i[x(T)] + ∫_0^T L_i(x, u_1, u_2) dt,   i = 1, 2.   (12)

With Player 1 as leader, the necessary conditions for (u_1, u_2) to be an open-loop Stackelberg strategy pair are given in [2], [4].

Explicit solutions in terms of matrix Riccati equations are given for the linear-quadratic problem in [2], [3]. Necessary conditions for the closed-loop Stackelberg strategy are extremely difficult to characterize [4]. Simplifications are possible when the structure of the control law is constrained, e.g., restricting the control law to be linear, and the effects of random initial conditions are averaged [12].

The open-loop strategy for the leader for the entire duration of the game is declared in advance. If the follower minimizes his cost function, he obtains his follower Stackelberg strategy which is the optimal reaction to the declared leader strategy. By declaring his strategy in advance, the leader influences the follower to react in a manner which, of course, minimizes the follower's cost function, but more importantly, in a manner which is favorable to the leader. This is a direct interpretation of the definition of the leader's strategy in (2). Similarly, for closed-loop strategies where the state is available for measurement, the leader has to declare his control law for the entire duration of the game.

In situations where either player might be a leader, both cases should be examined because both players may insist on leader strategies, in which case there may be disequilibrium, or both may play follower strategies and a stalemate may occur [5]. The stability of these disequilibrium strategies has been examined by Okuguchi [17].

One of the disadvantages of using Stackelberg strategies is that for the leader the principle of optimality does not hold, and hence dynamic programming cannot be applied. For example, if the open-loop Stackelberg strategies for a discrete-time game in the interval [t_0, t_f] are applied in the interval [t_0, t_1], where t_0 < t_1 < t_f, and if the open-loop Stackelberg strategies for the same game but for the interval [t_1, t_f] are computed, the new strategies will generally not coincide with the continuation of the old strategies for [t_1, t_f]. Similarly, the principle of optimality does not generally hold for closed-loop Stackelberg strategies. The leader-follower solution concept assumes a commitment by the leader to implement his announced strategy. This commitment is for a game over the interval [t_0, t_f]. If the actual interval were different, the committed strategy generally would not coincide with the Stackelberg strategy for the new interval, but the leader would be obliged to use the nonoptimal strategy.


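This time inconsistency can be seen in a small numerical experiment. The sketch below uses a hypothetical two-stage scalar game (dynamics x(k+1) = x(k) + u_1(k) + u_2(k), x(0) = 1, costs J_i = x(2)^2 + Σ_k u_i(k)^2, all illustrative choices, not from the paper): the leader's committed open-loop Stackelberg value for stage 1 differs from the open-loop Stackelberg strategy recomputed at stage 1 from the realized state.

```python
# Hypothetical two-stage scalar game illustrating the time inconsistency
# of open-loop Stackelberg strategies (all numbers illustrative).
x0 = 1.0

def follower(u10, u11):
    # Follower best response to a committed leader pair (u1(0), u1(1)):
    # minimize (c + u2(0) + u2(1))^2 + u2(0)^2 + u2(1)^2, c = x0 + u1(0) + u1(1).
    # Stationarity gives u2(0) = u2(1) = -c/3.
    c = x0 + u10 + u11
    return -c / 3.0, -c / 3.0

def leader_cost(u10, u11):
    u20, u21 = follower(u10, u11)
    x2 = x0 + u10 + u11 + u20 + u21
    return x2 ** 2 + u10 ** 2 + u11 ** 2

# leader commits at time 0 (coarse 2-D grid search; analytic optimum -1/11 each)
grid = [i / 500.0 for i in range(-150, 151)]
u10, u11 = min(((a, b) for a in grid for b in grid),
               key=lambda p: leader_cost(*p))

# play stage 0 and observe the realized state x(1)
u20, _ = follower(u10, u11)
x1 = x0 + u10 + u20

# recompute the open-loop Stackelberg problem for the remaining interval:
# the follower now reacts with u2 = -(x1 + u1)/2, so the leader minimizes
# ((x1 + u1)/2)^2 + u1^2; the analytic optimum is u1 = -x1/5.
fine = [i / 10000.0 for i in range(-10000, 10001)]
u11_new = min(fine, key=lambda u1: ((x1 + u1) / 2.0) ** 2 + u1 ** 2)

# the committed stage-1 value and the recomputed one disagree
assert abs(u11_new - u11) > 0.02
```

With these numbers the committed values are u_1(0) ≈ u_1(1) ≈ -1/11, while re-solving from the realized x(1) ≈ 0.64 gives u_1(1) ≈ -0.13: the continuation of the committed strategy is not the Stackelberg strategy for the shortened interval.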

Feedback Stackelberg Strategy

A modification of the Stackelberg strategy concept, which requires that the strategies for the remaining time-to-go after each stage be optimal in an extended Stackelberg sense, is presented in [4]. We briefly review this extended strategy here. We consider a multistage discrete-time game where state measurements are available to both players. To distinguish the extended strategy from the closed-loop Stackelberg strategy, it is called the feedback Stackelberg strategy. Other information structures may be considered, and to distinguish the extended Stackelberg strategy from the basic one, it is called the equilibrium Stackelberg strategy [7].

Consider a discrete-time system

x(j+1) = f^j(x(j), u_1(j), u_2(j)),   j = 0, ..., N-1   (22)

where x(j) ∈ R^n, u_1(j) ∈ R^{m_1}, and u_2(j) ∈ R^{m_2}. A cost functional

J_i[x(k), u_1, u_2] = K_i[x(N)] + Σ_{j=k}^{N-1} L_i[x(j), u_1(j), u_2(j)]   (23)

is associated with each player i, for i = 1, 2, where u_i = {u_i(k), ..., u_i(N-1)}. Suppose that Player 1 is the leader. Denote the feedback Stackelberg strategies, with Player 1 as leader, for a game starting at time k, by u_{k1}^s and u_{k2}^s. These are sequences of functions of the state at each stage from time k to time N-1. Denote the resulting cost functions using these feedback Stackelberg controls by V_i[x(k), k]. A key defining property of feedback Stackelberg strategies is that if u_{ki}^s is a feedback Stackelberg strategy for a game starting at time k and ending at time N, then its continuation starting from time k+1 is a feedback Stackelberg strategy u_{k+1,i}^s for a game starting at time k+1 and ending at time N. Thus, for a game starting at time k, we consider only those control sequences whose continuations are u_{k+1,i}^s. The resulting cost functions are

J_i = L_i[x(k), u_1(k), u_2(k)] + V_i[x(k+1), k+1].   (24)

If there are no constraints on the controls, necessary conditions (25) and (26) are obtained by setting to zero the gradient of (24) with i = 2 with respect to u_2(k), and the gradient of (24) with i = 1 with respect to u_1(k), the latter accounting for the follower's stationarity condition through a multiplier λ(k); each gradient involves ∂f^k/∂u_i(k) and ∂V_i/∂x(k+1), for i = 1, 2. The boundary conditions for (25) and (26) are

V_i[x(N), N] = K_i[x(N)],   i = 1, 2.   (27)

From the definition, the optimality of the feedback Stackelberg strategy does not depend on the number of stages in the game. Continuations of feedback Stackelberg strategies are optimal in the feedback Stackelberg sense for any number of remaining stages. On the other hand, the Stackelberg strategy, open-loop or closed-loop, is tuned to a specific number of stages and to a specific starting time. For such a fixed interval and fixed starting time, the leader's cost corresponding to a feedback Stackelberg strategy may not be as low as that corresponding to the closed-loop Stackelberg strategy. However, the leader's cost associated with the remaining stages-to-go corresponding to the closed-loop Stackelberg strategy may be higher than that corresponding to the feedback Stackelberg strategy. This is because the continuation of the original closed-loop strategy is generally not a closed-loop Stackelberg strategy for the remaining stages-to-go. In contrast, for the last stage, the feedback Stackelberg strategy is also the optimal closed-loop strategy for a one-stage game. The feedback Stackelberg strategy for the next-to-last stage is chosen under the constraint that the strategy for the last stage is a feedback Stackelberg strategy. The feedback control law is computed backward in time in this fashion, as indicated in (25), (26), and (27).
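For a scalar linear-quadratic instance, the stage-by-stage construction just described can be carried out in closed form. The sketch below (hypothetical dynamics x(k+1) = x(k) + u_1(k) + u_2(k) and cost weights, not from the paper; the 1/2 factors are dropped for brevity) propagates quadratic values V_i(x, k) = s_i(k) x^2 backward, solving at each stage a one-stage Stackelberg game in which the follower reacts to u_1(k) and the leader anticipates that reaction; a brute-force grid check confirms the leader's stage optimization.

```python
# Type 1 feedback Stackelberg backward recursion, scalar illustrative
# instance (hypothetical dynamics and weights, not from the paper).
# Dynamics x(k+1) = x(k) + u1(k) + u2(k); stage costs q_i x^2 + r_i u_i^2;
# terminal costs K_i x(N)^2; values V_i(x, k) = s_i(k) x^2.
def stage(s1n, s2n, q1, q2, r1, r2):
    """One backward step: solve the one-stage Stackelberg game at stage k,
    given the next-stage value coefficients s1n, s2n."""
    a = s2n / (r2 + s2n)                  # follower reaction: u2 = -a (x + u1)
    c = s1n * (1.0 - a) ** 2
    b = c / (r1 + c)                      # leader, anticipating it: u1 = -b x
    m = (1.0 - a) * (1.0 - b)             # closed loop: x(k+1) = m x(k)
    s1 = q1 + r1 * b ** 2 + c * (1.0 - b) ** 2
    s2 = q2 + r2 * (a * (1.0 - b)) ** 2 + s2n * m ** 2
    return a, b, s1, s2

q1 = q2 = r1 = r2 = 1.0
K1 = K2 = 1.0
N = 4
s1, s2 = K1, K2
for k in reversed(range(N)):              # computed backward in time
    a, b, s1, s2 = stage(s1, s2, q1, q2, r1, r2)

# brute-force check of the leader's stage optimization for the last stage
# (k = N-1, where the values-to-go are the terminal weights K1, K2)
x = 1.3
a, b, _, _ = stage(K1, K2, q1, q2, r1, r2)
def leader_stage_cost(u1):
    u2 = -a * (x + u1)                    # follower's anticipated reaction
    return q1 * x ** 2 + r1 * u1 ** 2 + K1 * (x + u1 + u2) ** 2
fine = [i / 10000.0 for i in range(-30000, 30001)]
u1_star = min(fine, key=leader_stage_cost)
assert abs(u1_star - (-b * x)) < 1e-3
```

Each call to `stage` is one backward step; by construction, the continuation of the resulting strategy from any stage onward is again feedback Stackelberg, which is exactly the defining property above.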

The application of the Stackelberg concept to the cost function in (24) is not the only way we can define a feedback Stackelberg strategy. Suppose that the number of stages is even. Then we might consider that the continuation of any strategy two stages later has the same optimality property in the sense defined by the feedback Stackelberg strategy. That is, if u_{ki}^s is a feedback Stackelberg strategy, we might want to constrain the admissible control strategies so that the continuation two stages later is equal to u_{k+2,i}^s, which is the feedback Stackelberg strategy for a game starting at k+2. The resulting cost function to be optimized is

J_i = L_i[x(k), u_1(k), u_2(k)] + L_i[x(k+1), u_1(k+1), u_2(k+1)] + V_i[x(k+2), k+2].   (28)

Thus, J_2 is to be minimized with respect to u_2(k) and u_2(k+1), and J_1 is to be minimized with respect to u_1(k) and u_1(k+1) subject to the constraint that J_2 is minimized with respect to u_2(k) and u_2(k+1). The resulting control law would be different from the previously defined feedback Stackelberg strategy. To differentiate these different feedback concepts, we call the previous one Type 1 feedback Stackelberg and the second one Type 2 feedback Stackelberg. Type n feedback Stackelberg strategies may be similarly considered. For a conventional optimal control problem, all these types yield the same control, and they are obtainable by dynamic programming and the principle of optimality. It does not matter whether we minimize a cost function such as (24) or (28). But in a Stackelberg game situation, each choice yields a different feedback Stackelberg strategy. If n is taken as N - k, the Type n feedback Stackelberg strategy becomes the closed-loop Stackelberg strategy.

The leader-follower strategy for two-person games may be extended to multilevel control of large-scale systems. The basic approach has been outlined in [9]. A class of two-level systems has been considered in [13]. In the following sections we will review recent results pertaining to specific classes of linear systems in two types of hierarchy. One hierarchy consists of two levels of decision makers, where the first level is for coordination carried out by a leader, and the second level is occupied by M decision makers behaving as followers who use a Nash



strategy with respect to each other. Another hierarchy is a linear M-level structure where each decision maker, except the first and last ones, is a leader with respect to succeeding decision makers, but a follower with respect to preceding ones.

II. COORDINATION OF INTERCONNECTED SYSTEMS

Basic Coordination Concept

Let us consider the basic concept of coordination in a static system with two decision makers, each with a scalar performance index, and each controlling a separate variable, u_1 and u_2, respectively. Let us suppose that each of the two scalar performance indices is also affected by a third variable u_0, which is chosen by a third decision maker called the coordinator. Denote these scalar performance indices by J_1(u_0, u_1, u_2) and J_2(u_0, u_1, u_2). For each value of u_0, the controls u_1 and u_2 are chosen according to a game solution concept appropriate for a particular situation. For example, if u_1 and u_2 are chosen as Nash equilibrium solutions,

J_1(u_0, T_1(u_0), T_2(u_0)) ≤ J_1(u_0, u_1, T_2(u_0)) for all u_1

J_2(u_0, T_1(u_0), T_2(u_0)) ≤ J_2(u_0, T_1(u_0), u_2) for all u_2

where u_1 = T_1(u_0) and u_2 = T_2(u_0) are Nash solutions for the given u_0. The coordinator chooses a value for u_0 such that a scalar performance index J_0(u_0, u_1, u_2) is minimized subject to the condition that u_1 = T_1(u_0) and u_2 = T_2(u_0). Thus, the coordinator acts as the leader and the two other decision makers act as followers in the Stackelberg sense. The coordinator chooses u_0^s such that

J_0[u_0^s, T_1(u_0^s), T_2(u_0^s)] ≤ J_0[u_0, T_1(u_0), T_2(u_0)]

for all u_0 in the admissible set.

The coordinator performance index could represent a composite function reflecting the welfare of the entire system. For example, the index J_0 might be a convex linear combination of J_1 and J_2:

J_0(u_0, u_1, u_2) = α_1 J_1(u_0, u_1, u_2) + α_2 J_2(u_0, u_1, u_2)

where

α_1 > 0, α_2 > 0, and α_1 + α_2 = 1.

In this case, u_1 and u_2 might be chosen as Nash equilibrium solutions when the two decision makers cannot be guaranteed to cooperate. However, the introduction of a coordinator which chooses a third control variable enforces a restricted Pareto optimality in the sense that

α_1 J_1(u_0^s, u_1^s, u_2^s) + α_2 J_2(u_0^s, u_1^s, u_2^s) ≤ α_1 J_1(u_0, u_1, u_2) + α_2 J_2(u_0, u_1, u_2)

for all admissible u_0, u_1, u_2. However, in the case of the Stackelberg coordination of the Nash decision makers, the allowed controls for u_1 and u_2 are T_1(u_0) and T_2(u_0) for all u_0. Without coordination, the variable u_0 is assumed to take a nominal value ū_0. With coordination, u_0 is chosen as u_0^s. Thus, in cases where the controls u_1 and u_2 have to be chosen without cooperation, a limited type of Pareto optimality can still be achieved by introducing a coordinator.
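A small numeric sketch of this coordination idea, with hypothetical costs (not from the paper): for each u_0 the followers settle at a Nash pair T_1(u_0), T_2(u_0), and the coordinator then picks u_0 to minimize J_0 = ½J_1 + ½J_2 over those reaction curves, improving on an uncoordinated nominal value ū_0 = 0.

```python
# Hypothetical static coordination example (illustrative costs only).
def J1(u0, u1, u2):
    return (u1 - u2 - u0) ** 2 + (u1 - 1.0) ** 2

def J2(u0, u1, u2):
    return (u2 - u1 + u0) ** 2 + (u2 + 1.0) ** 2

# Nash reactions for fixed u0, from the two stationarity conditions
# (u1 - u2 - u0) + (u1 - 1) = 0 and (u2 - u1 + u0) + (u2 + 1) = 0,
# which solve to u1 = (u0 + 1)/3 and u2 = -(u0 + 1)/3.
def T1(u0):
    return (u0 + 1.0) / 3.0

def T2(u0):
    return -(u0 + 1.0) / 3.0

def J0(u0):
    # convex combination, alpha1 = alpha2 = 1/2, on the Nash reaction curves
    return 0.5 * J1(u0, T1(u0), T2(u0)) + 0.5 * J2(u0, T1(u0), T2(u0))

grid = [i / 1000.0 for i in range(-4000, 4001)]
u0_s = min(grid, key=J0)        # coordinator's Stackelberg choice

# spot-check that T1, T2 really form a Nash pair at u0_s
u1n, u2n = T1(u0_s), T2(u0_s)
assert all(J1(u0_s, u1n, u2n) <= J1(u0_s, u1n + d, u2n) + 1e-12 for d in (-0.01, 0.01))
assert all(J2(u0_s, u1n, u2n) <= J2(u0_s, u1n, u2n + d) + 1e-12 for d in (-0.01, 0.01))

# coordination helps relative to the nominal value u0 = 0
assert J0(u0_s) <= J0(0.0)
```

Here J_0(u_0) = 2(u_0 - 2)^2/9 along the reaction curves, so u_0^s = 2 drives both follower costs to zero, while the nominal ū_0 = 0 leaves J_0 = 8/9: a restricted Pareto improvement obtained purely through coordination.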

We consider a linear stochastic discrete-time system with one coordinator at the first level and M decision makers at the second level. For simplicity, we take M = 2. The system is represented by

x(k+1) = A(k)x(k) + B^0(k)u^0(k) + B^1(k)u^1(k) + B^2(k)u^2(k) + v(k)   (29)

where x(k) ∈ R^n is the state, u^0(k) ∈ R^{m_0} is the control of the coordinator, u^1(k) ∈ R^{m_1} and u^2(k) ∈ R^{m_2} are the controls of the two decision makers at the second level, and v(k) is a vector noise disturbance. The quantities x(0) and v(k) are Gaussian random vectors with zero mean and covariances P(0) and Λ(k), and the measurement of each decision maker is

z^i(k) = H^i(k)x(k) + ξ^i(k),   i = 0, 1, 2,   (30)

where ξ^i(k) is a Gaussian random vector with zero mean and covariance Z^i(k). It is assumed that x(0), v(k), and ξ^i(k) are mutually independent. The cost function for each i is

J_i(u^i) = ½ x'(N)K^i(N)x(N) + ½ Σ_{k=0}^{N-1} [x'(k)Q^i(k)x(k) + (u^i)'(k)R^i(k)u^i(k)],   i = 0, 1, 2.   (31)

The matrices A(k), B^i(k), H^i(k), K^i(N), Q^i(k), and R^i(k) are known to all decision makers. It is assumed that R^i and K^i are positive definite and Q^i is positive semidefinite. The problem of finding the feedback Stackelberg strategy for the two-level hierarchy where Decision Makers 1 and 2 play Nash between themselves was recently considered for several information structures by Glankwamdee and Cruz [15]. In this section we summarize some of these results.


Perfect Information

Here it is assumed that all decision makers have perfect knowledge of the state through their measurements

z^0(k) = z^1(k) = z^2(k) = x(k).   (32)

We seek coordinator strategies which are functions of the


state, and follower strategies which are functions of the state and the coordinator control strategy. Denote the resulting expected cost-to-go at stage k by

V^i(k) = ½ x'(k)S^i(k)x(k) + ½ y^i(k),   i = 0, 1, 2   (33)

for some deterministic matrix S^i(k) and scalar function y^i(k) when feedback Stackelberg strategies are applied. Using the solution concept of the Type 1 feedback Stackelberg strategy discussed in the previous section, we have

V^i(k) = min_{u^i(k)} [½ x'(k)Q^i(k)x(k) + ½ (u^i)'(k)R^i(k)u^i(k) + E{V^i(k+1)}],   i = 1, 2.   (34)

For a given feedback control law for the coordinator, the two minimizations in (34) define the Nash game between Decision Makers 1 and 2. Substituting the expression from (33), with k replaced by k+1, into (34) and using the state equation (29), the minimizations yield expressions for u^1(k) and u^2(k) in terms of A(k), B^i(k), S^i(k+1), Q^i(k), R^i(k), u^0(k), and x(k) in the form [15]

u^i(k) = -Λ^i(k)[A(k)x(k) + B^0(k)u^0(k)],   i = 1, 2.   (35)

For the coordinator we have

V^0(k) = min_{u^0(k)} [½ x'(k)Q^0(k)x(k) + ½ (u^0)'(k)R^0(k)u^0(k) + E{V^0(k+1)}].   (36)

Before performing the minimization in (36), we express V^0(k+1) in terms of S^0(k+1) and y^0(k+1) from (33), the state equation (29), and the follower control laws (35). The resulting minimization yields a coordinator control law of the form [15]

u^0(k) = -L^0(k)x(k).   (37)

The matrix gains L^0(k) and Λ^i(k) are computed from a set of recursive equations backward in time, starting with k = N-1. The coordinator's control law, i.e., L^0(k), is known in advance to all the M second-level decision makers. The recursive equations are

L^i(k) = [R^i(k) + (B^i)'(k)S^i(k+1)B^i(k)]^{-1} (B^i)'(k)S^i(k+1),   i = 1, 2   (38)

Λ^i(k) = [I - L^i(k)B^j(k)L^j(k)B^i(k)]^{-1} [L^i(k) - L^i(k)B^j(k)L^j(k)],   i = 1, 2, j = 1, 2, i ≠ j   (39)

Â(k) = A(k) - B^1(k)Λ^1(k)A(k) - B^2(k)Λ^2(k)A(k)   (40)

B̂(k) = B^0(k) - B^1(k)Λ^1(k)B^0(k) - B^2(k)Λ^2(k)B^0(k)   (41)

L^0(k) = [R^0(k) + B̂'(k)S^0(k+1)B̂(k)]^{-1} B̂'(k)S^0(k+1)Â(k)   (42)

S^0(k) = Q^0(k) + Â'(k)S^0(k+1)Â(k) - (L^0)'(k)[R^0(k) + B̂'(k)S^0(k+1)B̂(k)]L^0(k)   (43)

S^i(k) = Q^i(k) + [A(k) - B^0(k)L^0(k)]'(Λ^i)'(k)R^i(k)Λ^i(k)[A(k) - B^0(k)L^0(k)] + [Â(k) - B̂(k)L^0(k)]'S^i(k+1)[Â(k) - B̂(k)L^0(k)],   i = 1, 2.   (44)

Equations (38)-(44) are solved in the sequence presented, starting with k = N-1, with boundary conditions

S^i(N) = K^i(N),   i = 0, 1, 2.   (45)

The calculations are repeated for k = N-2, k = N-3, and so forth, until the specified initial time is reached. The y^i(k) in the cost functions are obtained from the following:

y^i(k) = y^i(k+1) + tr S^i(k+1)Λ(k),   i = 0, 1, 2   (46)

y^i(N) = 0,   i = 0, 1, 2.   (47)

As in the two-person game discussed in the previous section, the feedback Stackelberg strategy for coordination is an equilibrium strategy in the sense that the continuation strategy after one stage is an optimal feedback Stackelberg strategy for the remainder of the game.
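The backward recursion (38)-(45) can be exercised on a scalar instance (n = m_0 = m_1 = m_2 = 1, noise-free for the value check). The sketch below uses the scalar forms of (35), (37), and (38)-(45) as cleaned up above, with hypothetical numerical data (not from the paper); two sanity checks verify the followers' Nash stationarity and the quadratic cost-to-go.

```python
# Scalar (n = 1) sketch of the backward recursion (38)-(45) and control
# laws (35), (37). All numerical data below are hypothetical, and the
# matrix equations are specialized to scalars (transposes drop out).
A, B0, B1, B2 = 1.0, 0.5, 1.0, 0.7
Q = {0: 1.0, 1: 1.0, 2: 1.5}
R = {0: 1.0, 1: 1.0, 2: 0.8}
K = {0: 1.0, 1: 1.0, 2: 1.0}
N = 5

S = {i: K[i] for i in (0, 1, 2)}            # boundary condition (45)
gains = [None] * N                          # (L0, Lam1, Lam2) per stage
for k in reversed(range(N)):
    L = {i: B * S[i] / (R[i] + B * S[i] * B)
         for i, B in ((1, B1), (2, B2))}                                # (38)
    Lam = {i: (L[i] - L[i] * Bj * L[j]) / (1.0 - L[i] * Bj * L[j] * Bi)
           for i, j, Bi, Bj in ((1, 2, B1, B2), (2, 1, B2, B1))}        # (39)
    Ahat = A - B1 * Lam[1] * A - B2 * Lam[2] * A                        # (40)
    Bhat = B0 - B1 * Lam[1] * B0 - B2 * Lam[2] * B0                     # (41)
    L0 = Bhat * S[0] * Ahat / (R[0] + Bhat * S[0] * Bhat)               # (42)
    S0 = Q[0] + Ahat * S[0] * Ahat - L0 * (R[0] + Bhat * S[0] * Bhat) * L0  # (43)
    for i in (1, 2):                                                    # (44)
        S[i] = (Q[i] + (A - B0 * L0) ** 2 * Lam[i] ** 2 * R[i]
                + (Ahat - Bhat * L0) ** 2 * S[i])
    S[0] = S0
    gains[k] = (L0, Lam[1], Lam[2])

# sanity check 1: at k = N-1 (where S_i(N) = K_i), the controls (35), (37)
# satisfy the followers' Nash stationarity R_i u_i + B_i S_i(N) x(N) = 0
x = 2.0
L0, Lam1, Lam2 = gains[N - 1]
u0 = -L0 * x                                 # (37)
s = A * x + B0 * u0
u1, u2 = -Lam1 * s, -Lam2 * s                # (35)
xn = s + B1 * u1 + B2 * u2
for ui, Bi, ri, Ki in ((u1, B1, R[1], K[1]), (u2, B2, R[2], K[2])):
    assert abs(ri * ui + Bi * Ki * xn) < 1e-9

# sanity check 2: noise-free simulated costs match the cost-to-go
# quadratic forms (1/2) S_i(0) x(0)^2, cf. (33) with y_i = 0
x0, xk = 1.0, 1.0
tot = {0: 0.0, 1: 0.0, 2: 0.0}
for k in range(N):
    L0, Lam1, Lam2 = gains[k]
    u0 = -L0 * xk
    s = A * xk + B0 * u0
    u1, u2 = -Lam1 * s, -Lam2 * s
    for i, ui in ((0, u0), (1, u1), (2, u2)):
        tot[i] += 0.5 * (Q[i] * xk ** 2 + R[i] * ui ** 2)
    xk = s + B1 * u1 + B2 * u2
for i in (0, 1, 2):
    tot[i] += 0.5 * K[i] * xk ** 2
    assert abs(tot[i] - 0.5 * S[i] * x0 ** 2) < 1e-9
```

The two checks exercise the recursion end to end: the second-level controls are mutual best responses given the coordinator's announced gain L^0(k), and the accumulated costs reproduce the quadratic value functions, consistent with the equilibrium property just stated.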

Similar solutions are obtainable for another special information structure, namely, when z^1(k) = z^2(k) and the coordinator's measurement consists of at least z^1(k). This nested information structure includes three subcases: 1) when the lower level decision makers have identical noisy measurements and the coordinator has perfect knowledge of the state; 2) when the lower level decision makers have no measurements and the coordinator has some measurement; and 3) when all measurements are identical. For this nested information structure case, and using the feedback Stackelberg concept, the optimal cost functions for the lower level subsystems can be expressed as a quadratic form in the conditional expectation of the state, given the measurement of the lower level subsystem, plus a term analogous to y^i(k) in the perfect measurement case. For the coordinator, the optimal cost function is expressible as a quadratic form in the conditional expectation of the state, given the coordinator's measurement, and in the difference between the two conditional expectations, given the two different measurements, plus a term analogous to y^0(k). The lower level controls are linear in the conditional expectation of the state given their measurement, and linear in the given coordinator control. The gain matrices are identical to those for perfect measurement, so that a separation principle applies to the lower level decision makers. The coordinator control is linear in the two conditional expectations of the state. These conditional expectations are obtained from Kalman



filters. The procedure is analogous to that presented in [8]. Details of the recursive equations and their derivation are given in [15].

Nonnested Information Structure

When the leader does not know all the measurements of the second-level decision makers, and/or when the second-level decision makers do not have the same measurement, it is extremely difficult to formulate an optimum feedback Stackelberg problem. However, when the structure of the individual control laws is specified, e.g., when it is constrained to be linear, necessary conditions can be derived. We consider control laws of the form

u^i(k) = F^i(k)z^i(k),   i = 0, 1, 2,   (48)

where F^i(k) is to be determined so that the control law is feedback Stackelberg. Necessary conditions for the determination of F^i(k) have been derived in [15]. Using the control (48), where z^i(k) is given in (30), and defining

P(k) = E{x(k)x'(k)}   (49)

we obtain

P(k+1) = [A(k) + Σ_{i=0}^{2} B^i(k)F^i(k)H^i(k)] P(k) [A(k) + Σ_{i=0}^{2} B^i(k)F^i(k)H^i(k)]' + Σ_{i=0}^{2} B^i(k)F^i(k)Z^i(k)(F^i)'(k)(B^i)'(k) + Λ(k).   (50)

It is assumed that P(0) is given. Given a set of feedback matrix sequences {F^i(j)}, the second-level cost-to-go expressions may be written as

E[J^i(k)] = ½ tr[S^i(k)P(k)] + ½ Σ_{l=k+1}^{N} tr S^i(l)Λ(l-1) + ½ Σ_{l=k+1}^{N} tr (F^i)'(l-1)[R^i(l-1) + (B^i)'(l-1)S^i(l)B^i(l-1)]F^i(l-1)Z^i(l-1)   (51)

where

S^i(k) = Q^i(k) + (H^i)'(k)(F^i)'(k)R^i(k)F^i(k)H^i(k) + [A(k) + Σ_{j=0}^{2} B^j(k)F^j(k)H^j(k)]' S^i(k+1) [A(k) + Σ_{j=0}^{2} B^j(k)F^j(k)H^j(k)],   i = 1, 2   (52)

S^i(N) = K^i(N),   i = 1, 2.   (53)

For the coordinator,

E[J^0(k)] = ½ tr[S^0(k)P(k)] + ½ Σ_{l=k+1}^{N} tr S^0(l)Λ(l-1) + ½ Σ_{l=k+1}^{N} tr (F^0)'(l-1)[R^0(l-1) + (B^0)'(l-1)S^0(l)B^0(l-1)]F^0(l-1)Z^0(l-1)   (54)

where

S^0(k) = Q^0(k) + (H^0)'(k)(F^0)'(k)R^0(k)F^0(k)H^0(k) + [A(k) + B^0(k)F^0(k)H^0(k) + B^1(k)F^1(k)H^1(k) + B^2(k)F^2(k)H^2(k)]' S^0(k+1) [A(k) + B^0(k)F^0(k)H^0(k) + B^1(k)F^1(k)H^1(k) + B^2(k)F^2(k)H^2(k)]   (55)

S^0(N) = K^0(N).   (56)

In accordance with the feedback Stackelberg concept, we consider strategies whose continuations after one stage are feedback Stackelberg strategies for the remaining stages. Thus, we write

E{J^i(k)} = E{½ [x'(k)Q^i(k)x(k) + (u^i(k))'R^i(k)u^i(k)]} + E{J^i(k+1)},   i = 0, 1, 2   (57)

where E{J^i(k+1)} is obtained from (51) with k replaced by k+1, and {F^i(j)} from j = k+1 to j = N-1 are the feedback Stackelberg matrices. Then E{J^1(k)} is minimized with respect to F^1(k), and E{J^2(k)} is minimized with respect to F^2(k). These minimizations yield expressions for F^1(k) and F^2(k) in terms of F^0(k) and the other matrices that appear in (51). These matrices F^1(k) and F^2(k) are substituted in (57) for i = 0, and the resulting expression for E{J^0(k)} is minimized with respect to F^0(k). This yields an expression for F^0(k) in terms of the matrices appearing in (57) and (51) except F^1(k) and


F^2(k). Combining these equations, we can obtain coupled difference equations in S^i(k), for i = 0, 1, 2, and P(k), with boundary conditions P(0) and S^i(N). Thus, the feedback Stackelberg matrices F^i(k) are expressed in terms of solutions of a two-point boundary value problem. We note that even in the case of a standard stochastic optimal control problem where the control law is constrained to be a linear function of the measurement, as in (48), a two-point boundary value problem arises [18]. The game problem cannot be expected to be simpler. Details of the two-point boundary value problem are in [15].

The resulting feedback Stackelberg matrices {F^i(k), k = 0, 1, ..., N-1; i = 0, 1, 2} are functions of P(0) = E{x(0)x'(0)} = m_0 m_0' + cov[x(0)], where m_0 is the mean of x(0) and cov[x(0)] is the covariance of x(0). Thus, these feedback matrix sequences are based on data at the start of the game, k = 0. Since measurements are obtained at each sampling instant, updated estimates of P(k) might be available. For example, suppose that at time k = r there is a new estimate of P(r). A new set of feedback Stackelberg matrices {F^i(k), k = r, r+1, ..., N-1; i = 0, 1, 2} could be computed. These new sequences are functions of P(r). In principle, an updated set of feedback Stackelberg sequences for the remaining stages-to-go could be considered at each stage r when a new estimate of P(r) is available.

The method described above may be extended to dynamic output feedback controllers of specified order. Represent the ith subsystem controller by

w^i(k+1) = D^i(k)w^i(k) + M^i(k)z^i(k),   i = 0, 1, 2   (58)

u^i(k) = N^i(k)w^i(k) + F^i(k)z^i(k),   i = 0, 1, 2   (59)

where w^i ∈ R^{s_i} is the state vector of the ith controller and z^i(k) is the measurement (30). For a given s_i (0 ≤ s_i ≤ n), the matrices D^i(k), M^i(k), N^i(k), and F^i(k) are to be found so that the controls are optimal in the feedback Stackelberg sense. For s_i = 0, the problem is identical to the one considered previously. By augmenting the state space and by augmenting the measurement z^i with w^i, the problem may be transformed to the same type considered previously. Details are given in [15].

III. SAMPLED DATA COORDINATED CONTROL OF INTERCONNECTED CONTINUOUS-TIME SYSTEMS

In this section we consider the two-level control of interconnected continuous-time systems, where the first-level decision maker is a coordinator and the second-level decision makers are followers in the Stackelberg sense, and where the decision makers have sampled data state measurements. This problem has been examined in [16], where necessary conditions have been derived, and where efficient solution algorithms have been derived for the linear-quadratic case. We briefly review the results of [16] in this section. We model the interconnected system by

ẋ = f(x, u_i; i = 0, 1, ..., m),   x(t_0) = x_0   (60)

where x ∈ R^n is the state, u_i ∈ R^{n_i} is the control of the ith decision maker, and the cost function for each decision maker is

J_i = K_i[x(t_f)] + ∫_{t_0}^{t_f} L_i(x, u_k; k = 0, 1, ..., m) dt,   i = 0, 1, ..., m.   (61)

The index i = 0 corresponds to the coordinator. The time instants t_0 and t_f are fixed. State measurements are made at r discrete instants of time {t_j ∈ [t_0, t_f), j = 0, 1, ..., r-1}. The controls are allowed to be functions of time t and the latest state measurement. Thus, for all i, u_i = u_i(t, x(t_j)), for t_j ≤ t < t_{j+1}. Before time t_0, the coordinator announces his control law u_0(t, x(t_j)) for t ∈ [t_j, t_f], for j = 0, 1, ..., r-1. The second-level decision makers take this given coordinator strategy into account in computing their individual sampled data strategies, based on a Nash solution concept among themselves. The leader, taking into account the reaction strategies of all the second-level decision makers, determines a sampled data strategy to minimize his own cost function, subject to the constraint that the remaining strategy starting from the next sampling instant is also optimal in the feedback Stackelberg sense. This permits us to relate the optimum cost-to-go, in the feedback Stackelberg sense, at any sampling time t_j to the optimal cost-to-go at sampling time t_{j+1}. Let the sampled data feedback Stackelberg costs-to-go at time t_j be denoted by V_i(x(t_j), t_j), i = 0, 1, ..., m. By definition, for the interval [t_j, t_{j+1}),

V_i(x_j, t_j) = min_{u_i} { V_i(x_{j+1}, t_{j+1}) + ∫_{t_j}^{t_{j+1}} L_i(x, u_k; k = 0, 1, ..., m) dt }   (62)

where x_j denotes x(t_j), and where the u_k, for k ≠ i, are at their optimal values. For i = 1, ..., m, and for each u_0, (62) is the usual condition for a Nash equilibrium solution. For i = 0, the minimization of (62) is carried out under the constraint that the other controls u_k, k ≠ 0, are chosen to satisfy (62) for all i ≠ 0, for each u_0. For each V_i(x_{j+1}, t_{j+1}), the problem posed above is an open-loop Stackelberg problem with one leader and several followers for a game in the interval [t_j, t_{j+1}), where the followers play a Nash game among themselves [6]. The problem is much more complex, however, because V_i(x_{j+1}, t_{j+1}) is also to be determined. The sampled data feedback Stackelberg concept is similar to the feedback Stackelberg concept for discrete systems in the sense that V_i(x_j, t_j) is related to V_i(x_{j+1}, t_{j+1}) as in (62) or (28). However, for the sampled data case, an open-loop control time function between sampling times is required. The usual dynamic programming approach is not applica-
where the followers play a Nash game among themselves [6]. The problem is much more complex, however, because V_i(x_{j+1}, t_{j+1}) is also to be determined. The sampled-data feedback Stackelberg concept is similar to the feedback Stackelberg concept for discrete systems in the sense that V_i(x_j, t_j) is related to V_i(x_{j+1}, t_{j+1}) as in (62) or (28). However, for the sampled-data case, an open-loop control time function between sampling times is required. The usual dynamic programming approach is not applicable to such problems, but a variational method can be used.
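For a single decision maker and scalar dynamics, the structure of the recursion (62) can be sketched numerically. The quadratic form of the cost-to-go and the restriction of controls to held-sample gains on a grid are simplifying assumptions made purely for illustration, and the model data are invented:

```python
import numpy as np

# Scalar, single-controller sketch of the recursion (62), assuming a
# quadratic cost-to-go V(x, t_j) = s_j x^2 and controls u = g x(t_j)
# with g picked from a grid.
a, b, q, r_w = -0.2, 1.0, 1.0, 0.1       # x' = a x + b u, L = q x^2 + r_w u^2
dt, n_sub, n_samples = 0.01, 25, 4       # 25 integration steps per interval
gains = np.linspace(-3.0, 1.0, 81)

def interval(g):
    """Propagate one interval from x(t_j) = 1: return (alpha, beta) with
    x(t_{j+1}) = alpha * x(t_j) and running cost = beta * x(t_j)**2."""
    x, cost = 1.0, 0.0
    for _ in range(n_sub):
        u = g * 1.0                       # held sample, cf. (73)
        cost += dt * (q * x * x + r_w * u * u)
        x += dt * (a * x + b * u)
    return x, cost

pairs = [interval(g) for g in gains]
s = 0.0                                   # cost-to-go is zero at the final sample
for _ in range(n_samples):                # backward pass over samples, cf. (62)
    s = min(beta + s * alpha ** 2 for alpha, beta in pairs)
print(s > 0)   # True
```

The backward pass mirrors (62): the cost-to-go at one sample is the minimized running cost plus the cost-to-go at the next sample.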

For each second-level decision maker define a Hamiltonian

H_i(x, p_i, u_k; k = 0, 1, ..., m) = L_i(x, u_k; k = 0, 1, ..., m) + p_i' f(x, u_k; k = 0, 1, ..., m).   (64)

For any given u_0, the necessary conditions for optimality for i = 1, ..., m are

ẋ = f(x, u_i; i = 0, 1, ..., m),   x(t_j) = x_j   (65)

and

0 = ∂H_i/∂u_i.   (66)
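For quadratic L_i and linear dynamics, the stationarity condition (66) is solved by u_i = -R_ii^{-1} B_i' p_i, the form that recurs throughout the linear-quadratic results below. A finite-difference check on hypothetical data:

```python
import numpy as np

# Check that u_i = -R_ii^{-1} B_i' p_i makes the u-dependent part of the
# Hamiltonian stationary, cf. (66).  All numbers are invented.
B_i = np.array([[1.0], [0.5]])
R_ii = np.array([[2.0]])
p_i = np.array([[0.3], [-0.4]])

def H_u(u):
    """u-dependent part of H_i = L_i + p_i' f for scalar u."""
    u = np.atleast_2d(u)
    return (0.5 * u.T @ R_ii @ u + p_i.T @ (B_i @ u)).item()

u_star = -np.linalg.solve(R_ii, B_i.T @ p_i)    # candidate stationary point
eps = 1e-6
grad = (H_u(u_star + eps) - H_u(u_star - eps)) / (2 * eps)
print(abs(grad) < 1e-6)   # True
```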

The complete set of necessary conditions, for t ∈ [t_j, t_{j+1}), j = 0, ..., r, i = 1, ..., m, involves multiplier functions γ_i with boundary and jump conditions at the sampling instants (67)-(72), where γ_i(t_j^-) = lim_{t→t_j^-} γ_i(t) for γ_i defined on the (j-1)st interval [t_{j-1}, t_j), and γ_i(t_j^+) = γ_i(t_j) defined on the jth interval [t_j, t_{j+1}). An efficient procedure for solving this complicated (r+1)-point boundary value problem is given in [16] for the linear-quadratic case. The controls are expressible in the form

u_i(t, x_j) = K_i(t) x(t_j),   j = 0, ..., r,   t ∈ [t_j, t_{j+1}].   (73)

The gain matrices K_i(t) are obtained by integration of a set of linear differential equations over one sampling interval. A matrix inversion of dimension n is needed at each sampling instant. For details see [16].
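The hold-the-sample structure of (73) can be simulated directly. The scalar model, the gain profile K(t), and the horizon below are all invented; the point is that the control is an open-loop time function over each interval, scaled by the latest state sample:

```python
import numpy as np

# Scalar sketch of the sampled-data control structure (73): over each
# interval [t_j, t_{j+1}) the control is K(t) times the held sample x(t_j).
a, b = -0.5, 1.0                     # x' = a x + b u
T, r = 2.0, 4                        # horizon, number of sampling intervals
dt = 0.001
t_samples = np.linspace(0.0, T, r + 1)

def K(t):                            # hypothetical gain profile
    return -0.8 * np.exp(-t)

x = 1.0
for j in range(r):
    x_sample = x                     # state measurement at t_j
    t = t_samples[j]
    while t < t_samples[j + 1]:
        u = K(t) * x_sample          # u(t, x_j) = K(t) x(t_j), cf. (73)
        x += dt * (a * x + b * u)    # Euler integration
        t += dt
print(0.0 < x < 1.0)                 # True
```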

For each interval, the solution procedure is the same as that for open-loop Stackelberg strategies, except that the boundary conditions are in terms of optimal cost-to-go functions, which is reminiscent of feedback Stackelberg strategies for discrete-time games. The sampled-data Stackelberg strategy thus has features of open-loop Stackelberg strategies for continuous-time games and of feedback Stackelberg strategies for discrete-time games.

IV. M-LEVEL LINEAR HIERARCHIES

The basic Stackelberg strategy for two-level sequential decision making can be generalized to three or more levels, as outlined in [9]. For simplicity, we consider only one decision maker at each level, thus yielding a linear hierarchy. Three cases have been treated recently: 1) open-loop multilevel Stackelberg strategies for continuous-time systems [10], 2) closed-loop multilevel Stackelberg strategies for continuous-time systems [12], and 3) feedback multilevel Stackelberg strategies for discrete-time systems [19]. These results are briefly summarized here.
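Before turning to the dynamic cases, the underlying three-level reaction structure can be illustrated on a static example: the bottom player best-responds to both declared controls, the middle player best-responds anticipating the bottom player, and the leader anticipates both. The quadratic costs and the brute-force grid below are entirely hypothetical.

```python
# Static three-level Stackelberg hierarchy by exhaustive search
# (hypothetical quadratic costs, common discrete control grid).
U = [x / 10 for x in range(-20, 21)]

def J1(u1, u2, u3): return (u1 - u2 - u3) ** 2 + u1 ** 2
def J2(u1, u2, u3): return (u2 - 1.0) ** 2 + (u2 - u1) ** 2
def J3(u1, u2, u3): return u3 ** 2 + (u3 - u1) ** 2

def react1(u2, u3):                 # bottom player's reaction to (u2, u3)
    return min(U, key=lambda u1: J1(u1, u2, u3))

def react2(u3):                     # middle player, anticipating react1
    return min(U, key=lambda u2: J2(react1(u2, u3), u2, u3))

# Leader anticipates both reactions:
u3s = min(U, key=lambda u3: J3(react1(react2(u3), u3), react2(u3), u3))
u2s = react2(u3s)
u1s = react1(u2s, u3s)
print(u1s, u2s, u3s)
```

On the grid, the leader's cost at (u1s, u2s, u3s) is by construction no worse than at any other declared u3, including u3 = 0.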

Open-Loop Multilevel Stackelberg Strategies

Consider a three-level Stackelberg problem for a linear system

ẋ = A x + B_1 u_1 + B_2 u_2 + B_3 u_3   (74)

with associated cost function

J_i(u_1, u_2, u_3, x_0) = ½ ∫_0^T ( x'Q_i x + Σ_{j=1}^3 u_j' R_ij u_j ) dt + ½ x(T)' F_i x(T)   (75)

for each decision maker P_i. P_1 is the follower at the bottom of the linear hierarchy. He knows the controls u_2 and u_3 of the other decision makers. P_2 is the middle decision maker who knows u_3, and who knows that P_1 reacts according to the declared controls u_2 and u_3. P_3 is the leader who knows that P_2 reacts according to his declared control u_3, and who takes into account the reaction of P_1 to the declared controls u_2 and u_3. Necessary conditions for this problem are derived in [10]. For P_1 the necessary conditions are (74),

ṗ_1 = -Q_1 x - A' p_1,   p_1(T) = F_1 x(T)   (76)

and

0 = R_11 u_1 + B_1' p_1.   (77)

Assuming that R_11 is positive definite, control u_1 may be expressed as


u_1 = -R_11^{-1} B_1' p_1.   (78)

Notice that the controls u_2 and u_3 influence the costate vector p_1 and, hence, u_1 depends on u_2 and u_3. Substituting (78) in (74), we have

ẋ = A x - S_1 p_1 + B_2 u_2 + B_3 u_3,   S_1 = B_1 R_11^{-1} B_1',   x(t_0) = x_0.   (79)

Substituting (78) in (75) for i = 2, we have

J_2 = ½ ∫_0^T ( x'Q_2 x + p_1' S_21 p_1 + u_2' R_22 u_2 + u_3' R_23 u_3 ) dt + ½ x(T)' F_2 x(T)   (80)

where S_21 = B_1 R_11^{-1} R_21 R_11^{-1} B_1'. The necessary conditions that characterize u_2 minimizing (80) under the constraints (79) and (77) are (79), (77), and

ṗ_2 = -Q_2 x - A' p_2 + Q_1 n_1,   p_2(T) = F_2 x(T) - F_1 n_1(T)   (81)

ṅ_1 = -S_21 p_1 + S_1 p_2 + A n_1,   n_1(t_0) = 0   (82)

and, assuming that R_22 is positive definite,

u_2 = -R_22^{-1} B_2' p_2.   (83)

Substituting u_2 from (83) in (79), we have

ẋ = A x - S_1 p_1 - S_2 p_2 + B_3 u_3,   S_2 = B_2 R_22^{-1} B_2',   x(t_0) = x_0.   (84)

Equation (77) represents the reaction of P_1 to a given u_2 and u_3, and (81) and (82) represent the reaction of P_2 to a given u_3. Equation (84) is the state equation for a given u_3 using the reactions of P_1 and P_2. Substituting the controls u_1 from (78) and u_2 from (83) in (75) for i = 3, we have

J_3 = ½ ∫_0^T ( x'Q_3 x + p_1' S_31 p_1 + p_2' S_32 p_2 + u_3' R_33 u_3 ) dt + ½ x(T)' F_3 x(T)   (85)

where S_31 = B_1 R_11^{-1} R_31 R_11^{-1} B_1' and S_32 = B_2 R_22^{-1} R_32 R_22^{-1} B_2'. The necessary conditions for u_3 are

ẋ = A x - S_1 p_1 - S_2 p_2 - S_3 p_3,   S_3 = B_3 R_33^{-1} B_3',   x(t_0) = x_0,   (86)

ṗ_3 = -Q_3 x - A' p_3 + Q_1 n_2 + Q_2 n_3   (87)

p_3(T) = F_3 x(T) - F_1 n_2(T) - F_2 n_3(T),

ṅ_2 = -S_21 p_1 + S_1 p_3 + A n_2 + S_1 w,   n_2(t_0) = 0,   (88)

ṅ_3 = -S_32 p_2 + S_2 p_3 + A n_3 - S_1 w,   n_3(t_0) = 0,   (89)

ẇ = -Q_1 n_3 - A' w,   w(T) = F_1 n_3(T),   (90)

and, assuming that R_33 is positive definite,

u_3 = -R_33^{-1} B_3' p_3.   (91)

By using the relations

p_i = K_i x,   i = 1, 2, 3   (92)

n_i = P_i x,   i = 1, 2, 3   (93)

w = W x   (94)

one obtains coupled quadratic matrix differential equations in K_i, P_i, and W with boundary conditions at t_0 and T [10]. The open-loop Stackelberg strategies are

u_i = -R_ii^{-1} B_i' K_i φ(t, t_0) x_0,   i = 1, 2, 3   (95)

where φ(t, t_0) is the fundamental matrix of the system

ẋ = (A - S_1 K_1 - S_2 K_2 - S_3 K_3) x,   x(t_0) = x_0.   (96)

In [10] it is shown that the two-point boundary value problem can be converted to a higher order matrix Riccati differential equation with a given terminal condition. The coefficient matrices of this higher order Riccati equation do not possess the symmetry and positive semidefiniteness of usual optimal control problems. However, in [10] it is shown that if the 4n×4n solution of the Riccati equation is partitioned into four 2n×2n matrices, the block off-diagonal matrices are symmetric and the block diagonal matrices are transposes of each other. Furthermore, if a solution exists, then the block off-diagonal matrices are positive semidefinite.

Closed-Loop Multilevel Stackelberg Strategies

The determination of necessary conditions for optimality in the closed-loop Stackelberg sense is very difficult, even for linear-quadratic problems. In [12], linear closed-loop Stackelberg strategies are considered for linear systems with quadratic cost functions, where it is shown that the optimal closed-loop Stackelberg strategies for such problems are nonlinear functions of the initial state and the state. By assuming that the initial state is random and taking the expectation of the original cost function as a new cost function, linear closed-loop strategies may be optimal closed-loop Stackelberg strategies provided certain matrix differential equations have bounded solutions. For simplicity, it is assumed that the controls are of the form

u_i = -L_i(t) x,   i = 1, 2, 3   (97)

where the feedback matrices L_i(t) are bounded. When the linear controls in (97) are substituted in (74) and (75), it is clear that J_i can be expressed as

J_i = ½ x_0' M_i(t_0) x_0   (98)

where M_i(t) satisfies the Lyapunov equation

Ṁ_i + A_c' M_i + M_i A_c + Σ_{j=1}^3 L_j' R_ij L_j + Q_i = 0,   M_i(T) = F_i   (99)

where

A_c = A - Σ_{j=1}^3 B_j L_j.   (100)

The initial state x_0 is assumed to be random with zero mean and unit covariance matrix. The new cost function is

J̄_i = E{J_i} = ½ tr[ M_i(t_0) E{x_0 x_0'} ] = ½ tr M_i(t_0).   (101)
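A cost evaluation along the lines of (98)-(101) can be sketched numerically for fixed (not optimized) feedback matrices: integrate the Lyapunov equation (99) backward from the terminal condition and read off the trace. All matrices below are invented for illustration, and a plain Euler step is used.

```python
import numpy as np

# Evaluate J-bar = (1/2) tr M(t_0) by integrating the Lyapunov
# equation (99) backward from M(T) = F for fixed feedback matrices.
n = 2
A  = np.array([[0.0, 1.0], [-1.0, -1.0]])
B  = [np.eye(n)] * 3                    # B_1 = B_2 = B_3 = I (assumed)
L  = [0.3 * np.eye(n)] * 3              # fixed feedback matrices L_j
R1 = [0.5 * np.eye(n)] * 3              # R_1j = 0.5 I for all j (assumed)
Q1, F1 = np.eye(n), np.eye(n)
Ac = A - sum(Bj @ Lj for Bj, Lj in zip(B, L))   # closed-loop matrix, cf. (100)

T, dt = 1.0, 1e-3
M = F1.copy()
for _ in range(int(T / dt)):            # backward sweep of (99)
    dM = Ac.T @ M + M @ Ac + sum(Lj.T @ Rj @ Lj for Lj, Rj in zip(L, R1)) + Q1
    M = M + dt * dM                     # Euler step toward t_0
J_bar = 0.5 * np.trace(M)               # cf. (101), cov(x_0) = I assumed
print(J_bar > 0)                        # True
```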

The linear closed-loop Stackelberg problem can now be restated as an open-loop Stackelberg problem where the matrices L_i are the controls, the cost functions are J̄_i in (101), and the matrix differential equations in (99) are the constraints. The lowest level decision maker P_1 chooses L_1 and the highest level decision maker P_3 chooses L_3. For a given L_2 and L_3, J̄_1 can be minimized with respect to L_1 subject to (99) using the matrix minimum principle or a standard variational approach, yielding

Ṁ_1 + A_c' M_1 + M_1 A_c + M_1 S_11 M_1 + L_2' R_12 L_2 + L_3' R_13 L_3 + Q_1 = 0   (102)

M_1(T) = F_1   (103)

S_11 = B_1 R_11^{-1} B_1'   (104)

L_1 = R_11^{-1} B_1' M_1.   (105)

With L_1 chosen as in (105), minimization of J̄_2 with respect to L_2 with the constraints (102) and (99) for i = 2 yields a set of matrix Riccati-type differential equations and one algebraic matrix equation which is linear in L_2. Finally, we minimize J̄_3 with respect to L_3 subject to the constraints (99) for i = 3, (102), (103), (105), and the additional Riccati-type equations from the minimization of J̄_2 with respect to L_2. This yields more Riccati-type differential equations and one algebraic matrix equation which is linear in L_3. Thus, a large set of matrix Riccati-type differential equations must be solved with boundary conditions at t = t_0 and t = T. These necessary conditions have been derived in [12]. When the matrices in (74) and (75) are time-invariant and when T → ∞, these differential equations are replaced by algebraic equations which are obtained by deleting the time-derivative terms. An algorithm for this problem is suggested in [12].

Feedback Multilevel Stackelberg Strategies

Consider the linear discrete-time system

x(k+1) = A(k) x(k) + Σ_{i=1}^M B_i(k) u_i(k),   x(0) = x_0.   (106)

The cost function for each decision maker is


J_i = ½ x'(N) F_i x(N) + ½ Σ_{k=0}^{N-1} ( x'(k) Q_i(k) x(k) + Σ_{j=1}^M u_j'(k) R_ij(k) u_j(k) ).   (107)

We consider the decision makers in a linear hierarchy where the top decision maker is P_1, who chooses u_1, and the last decision maker is P_M, who chooses u_M. We examine feedback Stackelberg strategies where the controls are functions of the state, and where we require that continuations of the strategies starting at time j, for j > 0, are also feedback Stackelberg strategies for games starting at time j+1. The solution for this problem is given in [19], where Riccati-type equations for the feedback gain matrices are derived. The ith control can be expressed as

u_i(k) = -Λ_i(k) x(k),   i = 1, ..., M   (108)

where the gain matrices Λ_i(k) are constructed from R_i(k), B_i(k), Riccati-type matrices K_i(k), and matrix products of the form Π_{j=i+1}^M [I - B_j(k) Λ_j(k)], through (109)-(113); the detailed recursions are given in [19]. The matrix product notation used above is defined by

Π_{i=1}^M G_i = G_M G_{M-1} ··· G_1.   (114)

Sufficient conditions for the existence of the inverses and the minima of the cost functions are F_i > 0, Q_i(j) ≥ 0, and R_ii(j) > 0, i = 1, ..., M; j = 0, ..., N-1. After all the feedback controls are substituted in (106), the state equation becomes

x(k+1) = φ(k) x(k).   (115)

From the form of φ(k) in (113), it is seen that each decision maker, starting with the top, shifts the eigenvalues of A(k), and the final shifted A matrix after all decision makers have acted is φ, as noted in [19].
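The eigenvalue-shifting remark can be illustrated with invented gains (the actual gain matrices come from the Riccati-type equations of [19], which are not reproduced here); assuming, for this sketch, a closed-loop matrix of the product form suggested by (109), each decision maker's factor acts on A in turn:

```python
import numpy as np

# Each level's factor (I - B_i L_i) shifts the eigenvalues of the
# open-loop matrix A; gains here are hypothetical.
A = np.array([[1.2, 0.1],
              [0.0, 1.1]])                       # unstable open loop
B = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
L = [np.array([[0.5, 0.0]]), np.array([[0.0, 0.6]])]   # assumed gains

Phi = A.copy()
for Bi, Li in zip(B, L):                         # top decision maker first
    Phi = (np.eye(2) - Bi @ Li) @ Phi            # each level shifts eigenvalues
eig_open = np.abs(np.linalg.eigvals(A))
eig_closed = np.abs(np.linalg.eigvals(Phi))
print(eig_open.max() > 1.0 > eig_closed.max())   # True
```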

V. CONCLUDING REMARKS

In large-scale systems where there is a multiplicity of decision makers and where each decision maker has a different performance goal, it is natural to consider the control problem as a differential game problem. In this paper we have reviewed some of the recent work on Stackelberg strategies, which are relevant when sequential decision making is appropriate and desirable. In general, the leader or top decision maker has the most complicated optimization problem because he has to consider the optimal reactions of all decision makers who act after him. The decision maker who acts last in a linear hierarchy has an ordinary optimal control problem. Although the followers optimize their own performance indices given the controls of previous decision makers, the leader chooses a control which optimizes his own performance index considering that the followers will react optimally. In a sense, the leader influences the followers to choose controls which are beneficial to the leader.

A more general decision structure is one where there are several decision makers who act simultaneously at a given level, there is more than one level, and level actions are sequential. In the paper we considered the case when there are only two levels. The first level has only one decision maker, called the coordinator, who acts first, and the second level, as a group, reacts to the action of the coordinator. The second-level decision makers act simultaneously according to some game solution concept, such as the Nash equilibrium treated in the paper. The leader-follower strategy concept treated in the paper could provide a basis for the study of coordination in a large-scale system. Although the decision makers who act simultaneously, and in a decentralized manner, adopt noncooperative strategies, the introduction of a coordinator who chooses additional control variables could alter the framework in which the noncooperating decision makers act. This influence could be exploited to improve overall system performance while allowing the other decision makers to continue pursuing their individual original objectives.

REFERENCES

[1] H. von Stackelberg, The Theory of the Market Economy. Oxford, England: Oxford Univ. Press, 1952.
[2] C. I. Chen and J. B. Cruz, Jr., "Stackelberg solution for two-person games with biased information patterns," IEEE Trans. Automat. Contr., vol. AC-17, pp. 791-798, 1972.
[3] M. Simaan and J. B. Cruz, Jr., "On the Stackelberg strategy in nonzero-sum games," J. Opt. Theory Appl., vol. 11, no. 5, pp. 533-555, 1973.
[4] M. Simaan and J. B. Cruz, Jr., "Additional aspects of the Stackelberg strategy in nonzero-sum games," J. Opt. Theory Appl., vol. 11, no. 6, pp. 613-626, 1973.
[5] T. Basar, "On the relative leadership property of Stackelberg strategies," J. Opt. Theory Appl., vol. 11, pp. 655-661, June 1973.
[6] M. Simaan and J. B. Cruz, Jr., "A Stackelberg strategy for games with many players," IEEE Trans. Automat. Contr., vol. AC-18, no. 3, pp. 322-324, 1973.
[7] J. B. Cruz, Jr., "Survey of Nash and Stackelberg equilibrium strategies in dynamic games," Annals of Economic and Social Measurement, vol. 4, no. 2, pp. 339-344, 1975.
[8] D. Castanon and M. Athans, "On stochastic dynamic Stackelberg strategies," Automatica, vol. 12, pp. 177-183, 1976.
[9] J. B. Cruz, Jr., "Stackelberg strategies for multilevel systems," in Directions in Large Scale Systems, Y. C. Ho and S. K. Mitter, Eds. New York: Plenum, 1976, pp. 139-147.
[10] J. Medanic and D. Radojevic, "On the multilevel Stackelberg strategies in linear quadratic systems," J. Opt. Theory Appl., vol. 24, 1978.
[11] B. F. Gardner, Jr. and J. B. Cruz, Jr., "Feedback Stackelberg strategy for a two player game," IEEE Trans. Automat. Contr., vol. AC-22, pp. 270-271, Apr. 1977.
[12] J. Medanic, "Closed-loop Stackelberg strategies in linear-quadratic problems," in Proc. 1977 JACC, San Francisco, CA, June 1977, pp. 1324-1329.
[13] M. Simaan, "Stackelberg optimization of two-level systems," IEEE Trans. Syst., Man, Cybern., vol. SMC-7, pp. 554-557, July 1977.
[14] D. Castanon, "Equilibria in stochastic dynamic games of Stackelberg type," Electronic Syst. Lab., M.I.T., Rep. ESL-R-662, May 1976.
[15] S. Glankwamdee and J. B. Cruz, Jr., "Decentralized Stackelberg strategies for interconnected stochastic dynamic systems," to be presented at the 7th Triennial World Congr. of IFAC, Helsinki, Finland, June 1978; also UIUC, Decision and Contr. Lab., Rep. DC-1, Mar. 1977.
[16] P. M. Walsh and J. B. Cruz, Jr., "A sampled data Stackelberg coordination scheme for the multicontroller problem," in Proc. 1977 IEEE Conf. on Decision and Control, New Orleans, LA, pp. 108-114; also UIUC, Decision and Contr. Lab., Rep. DC-3, Apr. 1977.
[17] K. Okuguchi, Expectations and Stability in Oligopoly Models (Lecture Notes in Economics and Mathematical Systems, vol. 138). New York: Springer-Verlag, 1976.
[18] C. M. Ermer and V. D. VandeLinde, "Output feedback gains for a linear-discrete stochastic control problem," IEEE Trans. Automat. Contr., vol. AC-18, pp. 154-157, Apr. 1973.
[19] B. F. Gardner, Jr. and J. B. Cruz, Jr., "Feedback Stackelberg strategy for M-level hierarchical games," IEEE Trans. Automat. Contr., vol. AC-23, June 1978, to be published.

Jose B. Cruz, Jr. (S'56-M'57-SM'61-F'68) received the B.S.E.E. degree (summa cum laude) from the University of the Philippines, Diliman, in 1953, the S.M. degree from the Massachusetts Institute of Technology, Cambridge, in 1956, and the Ph.D. degree from the University of Illinois, Urbana, in 1959, all in electrical engineering.

From 1953 to 1954 he taught at the University of the Philippines. He was a Research Assistant in the M.I.T. Research Laboratory of Electronics, Cambridge, from 1954 to 1956. Since 1956 he has been with the Department of Electrical Engineering, University of Illinois at Urbana-Champaign, where he was an Instructor until 1959, an Assistant Professor from 1959 to 1961, an Associate Professor from 1961 to 1965, and Professor since 1965. He is currently also a Research Professor at the Coordinated Science Laboratory, University of Illinois, where he is Director of the Decision and Control Laboratory. In 1964 he was a Visiting Associate Professor at the University of California, Berkeley, and in 1967 he was an Associate of the Center for Advanced Study, University of Illinois. In the Fall of 1973 he was a Visiting Professor at M.I.T. and at Harvard University.


His areas of research are hierarchical control of multiple goal systems, decentralized control of large-scale systems, sensitivity analysis, and stochastic control of systems with uncertain parameters. He has written more than 90 papers in technical journals, coauthored three textbooks, and has served as Editor for two books.

Within the IEEE Control Systems Society, Dr. Cruz served as a member of the Administrative Committee, Chairman of the Linear Systems Committee, Chairman of the Awards Committee, Editor of the IEEE TRANSACTIONS ON AUTOMATIC CONTROL, a member of the Information Dissemination Committee, Chairman of the Finance Committee, General Chairman of the 1975 IEEE Conference on Decision and Control, and Vice President for Financial and Administrative Activities. He is President-Elect for 1978 of the IEEE Control Systems Society. At the Institute level, he served as a member of the IEEE Fellow Committee, a member of the IEEE Education Activities Board, and Chairman of a committee for the revision of the IEEE Guidelines for ECPD Accreditation of Electrical Engineering Curricula in the United States. Presently he is a member of the Meetings Committee of the IEEE Technical Activities Board and a member of the IEEE Education Medal Committee. In 1972 he received the Curtis W. McGraw Research Award of the American Society for Engineering Education. He is a member of Phi Kappa Phi, Sigma Xi, and Eta Kappa Nu. He is listed in American Men and Women of Science, Who's Who in America, and Who's Who in Engineering. Dr. Cruz is a Registered Professional Engineer in the State of Illinois.

Specific Structures for Large-Scale State Estimation Algorithms Having Information Exchange

CHARLES W. SANDERS, MEMBER, IEEE, EDGAR C. TACKER, SENIOR MEMBER, IEEE, THOMAS D. LINTON, AND ROBERT Y.-S. LING

Abstract—This paper considers the design and evaluation of large-scale state estimation algorithms having specific structures which allow the subsystems to exchange information over noisy channels. The specific structures which are presented are first motivated by considering the relative performance between the surely locally unbiased filter and a global dynamics filter. The role of the surely locally unbiased filter in evaluating the tradeoffs between the cost of information transfer and filter performance is examined, and a theorem is presented which forms the basis for an algorithm for calculating channel noise crossover levels. The theoretical results are illustrated via an application to a power system model.

Manuscript received March 15, 1977; revised September 7, 1977. This work was supported in part by the National Science Foundation under Grant ENG 75-13399.

The authors are with the Department of Electrical Engineering and the Systems Engineering Program, University of Houston, Houston, TX 77004.

I. INTRODUCTION

INCREASINGLY complex processes together with an increasingly broad spectrum of available hardware have necessitated a reexamination of the tradeoffs between the various information structures on which system monitoring and control are based [1]-[3]. In addition to the fundamental information-theoretic aspects of decentralized structures examined in [4]-[6], the problem of system stabilizability for these structures has been treated in [7]-[9]. State estimation techniques which are compatible with completely decentralized information structures have been considered in [12]-[14], and it has been shown [12], [13] that effective algorithms from this class can be developed. The primary objective of the present paper is to motivate and explore some specific state estimation algorithms for the case in which information exchange is permitted over noisy communication channels.

After identifying the class of systems under consideration and the relevant assumptions being employed, two singular information patterns which can be used to bound the performance improvement attainable through information exchange are discussed in Section II. It is shown that the information exchange residing in the subsystem interaction measurements can be used to obtain effective local filters. Interaction measurement noise crossover levels, which provide a measure of effectiveness of the interaction measurements, are introduced, and a theorem is presented which suggests an algorithm for computing these levels. The results of Section II are then used in Section III to motivate two specific structures having information exchange, and a numerical example illustrating the performance capability of the resulting algorithms is presented.

Consider a system which can be modeled as a given collection {S_i : i = 1, 2, ..., N} of N interconnected dynamical subsystems S_i. On the time interval [0, ∞) each S_i

0018-9286/78/0400-0255$00.75 © 1978 IEEE