lecture 15: least squares, state estimation

ECEN 615Methods of Electric Power

Systems Analysis

Lecture 15: Least Squares, State Estimation

Prof. Tom Overbye

Dept. of Electrical and Computer Engineering

Texas A&M University

[email protected]

mailto:[email protected]

ECEN 615Methods of Electric Power

Systems Analysis

Lecture 18: State Estimation

Prof. Tom Overbye

Dept. of Electrical and Computer Engineering

Texas A&M University

[email protected]

mailto:[email protected]

Announcements

• Starting reading Chapter 9

• Homework 4 is due on Thursday October 15.

2

UTC Revisited

• We can now revisit the uncommitted transfer

capability (UTC) calculation using PTDFs and LODFs

• Recall trying to determine maximum transfer between

two areas (or buses in our example)

• For base case maximums are quickly determined with

PTDFs

( )

( )

( )

( ), w

0max

0

m n w0

f fu min

− =

Note we are ignoring zero (or small) PTDFs; would also need

to consider flow reversal3

UTC Revisited

• For the contingencies we use

• Then as before

( )

( ) ( )( )

( ) ( )

1

,( )k

w

max 0 k 0

k

m n kw

0

f f d fu min

− −

=

( ) (1)

, , ,,

0

m n m n m nu min u u=

We would need to check all contingencies! Also,

this is just a linear estimate and is not considering

voltage violations.

4

Five Bus Example

2, 3,w t= ( ) f 42 , 34 , 67 , 118 , 33 , 100T0

=

f 150 , 400 , 150 , 150 , 150 , 1,000Tmax =

Line 1

Line 2

Line 3

Line 6

Line 5

Line 4slack

1.050 pu

42 MW

67 MW

100 MW

118 MW

1.040 pu

1.042 pu

A

MVA

A

MVA

A

MVA

1.042 pu

A

MVA

1.044 pu

MW200

258 MW

MW118

260 MW

100 MW

MW100

A

MVA

One Two

Three

Four

Five

34 MW

33 MW

5

Five Bus Example

( )

( )

( )

( )2,2

150 42 400 34 150 67 150 118 150 33, , , ,

0.2727 0.1818 0.0909 0.7273 0.0909

44.0

w

0max

0

w0

f fu min

min

− =

− − − − − =

=

Therefore, for the base case

6

Five Bus Example

• For the contingency case corresponding to the outage

of the line 2

The limiting value is line 4

Hence the UTC is limited by the contingency to 23.0

( ) ( )2

( )

( ) 2 ( )

2(1)

2,3 2( )w

max 0 0

w0

f f d fu min

− −

=

( )

( ) 2 ( )

2

2( )

150 118 0.4 34

0.8

max 0 0

w

f f d f

− − − − =

7

Additional Comments

• Distribution factors are defined as small signal

sensitivities, but in practice, they are also used for

simulating large signal cases

• Distribution factors are widely used in the operation of

the electricity markets where the rapid evaluation of the

impacts of each transaction on the line flows is required

• Applications to actual system show that the distribution

factors provide satisfactory results in terms of accuracy

• For multiple applications that require fast turn around

time, distribution factors are used very widely,

particularly, in the market environment

• They do not work well with reactive power!8

Least Squares

• So far we have considered the solution of Ax = b in

which A is a square matrix; as long as A is

nonsingular there is a single solution

– That is, we have the same number of equations (m) as

unknowns (n)

• Many problems are overdetermined in which there

more equations than unknowns (m > n)

– Overdetermined systems are usually inconsistent, in which no

value of x exactly solves all the equations

• Underdetermined systems have more unknowns than

equations (m < n); they never have a unique solution

but are usually consistent9

Method of Least Squares

• The least squares method is a solution approach for

determining an approximate solution for an

overdetermined system

• If the system is inconsistent, then not all of the

equations can be exactly satisfied

• The difference for each equation between its exact

solution and the estimated solution is known as the error

• Least squares seeks to minimize the sum of the squares

of the errors

• Weighted least squares allows differ weights for the

equations 10

Least Squares Solution History

• The method of least squares developed from trying to

estimate actual values from a number of measurements

• Several persons in the 1700's, starting with Roger

Cotes in 1722, presented methods for trying to decrease

model errors from using multiple measurements

• Legendre presented a formal description of the method

in 1805; evidently Gauss claimed he did it in 1795

• Method is widely used in power systems, with state

estimation the best known application, dating from

Fred Schweppe's work in 1970

11

Least Squares and Sparsity

• In many contexts least squares is applied to problems

that are not sparse. For example, using a number of

measurements to optimally determine a few values

– Regression analysis is a common example, in which a line or

other curve is fit to potentially many points)

– Each measurement impacts each model value

• In the classic power system application of state

estimation the system is sparse, with measurements

only directly influencing a few states

– Power system analysis classes have tended to focus on

solution methods aimed at sparse systems; we'll consider both

sparse and nonsparse solution methods

12

Least Squares Problem

• Consider

or

)

)

)

=

111 12 13 1 1 1

2

21 22 22 2 2 2

1 2 3

(a

(ax

(a

Tn

T

n

m T

n mm m m mn

a a a a x b

a a a a x b=

x ba a a a

, ,m n n m= Ax b A x b¡ ¡ ¡

13

Least Squares Solution

• We write (ai)T for the row i of A and ai is a column

vector

• Here, m ≥ n and the solution we are seeking is that

which minimizes Ax - b p, where p denotes some

norm

• Since usually an overdetermined system has no exact

solution, the best we can do is determine an x that

minimizes the desired norm.

14

Choice of p

• We discuss the choice of p in terms of a specific

example

• Consider the equation Ax = b with

(hence three equations and one unknown)

• We consider three possible choices for p:

1

2 1 2 3

3

1

A 1 b

1

b

= = b b b b 0

b

with

15

Choice of p

−

−

−

1 2

1 2 3

2

1 3

Ax b

Ax b3

Ax b2

*

*

*

x = b

b + b + bx =

b + bx =

is minimized by

is minimized by

is minimized by

(i) p = 1

(ii) p = 2

(iii) p =

16

The Least Squares Problem

• In general, is not differentiable for p = 1

or p = ∞

• The choice of p = 2 (Euclidean norm) has become well

established given its least-squares fit interpretation

• The problem is tractable for 2 major

reasons

– First, the function is differentiable

−Ax bp

17

−2

x

Ax bn

min ¡

( ) ( )- −

22

2

1 1x Ax b a x

2 2

mT

i

ii =1

= = b

The Least Squares Problem, cont.

– Second, the Euclidean norm is preserved under orthogonal

transformations:

with Q an arbitrary orthogonal matrix; that is, Q

satisfies

( ) − −22

Q A x Q b Ax bT T =

QQ Q Q I QT T n × n= = ¡

18

The Least Squares Problem, cont.

• We introduce next the basic underlying assumption:

A is full rank, i.e., the columns of A constitute a set

of linearly independent vectors

• This assumption implies that the rank of A is n

because n ≤ m since we are dealing with an

overdetermined system

• Fact: The least squares solution x* satisfies

A A x A bT T

=

19

( ) ( )

( )( )

( )

-

− −

= = − −

= −

−

2

2

xx

x

1 1x Ax b x A Ax x A b b A x b b

2 2

x 10 x A Ax x A b b A x b b

x x 2

1x A A x 2x A b b b

x 2

A A x A b

T T T T T T

T T T T T T

T T T T T

T T

= = +

+

+

=

Proof of Fact

• Since by definition the least squares solution x*

minimizes at the optimum, the derivative of

this function zero:

( )•

20

Implications

• This underlying assumption implies that

• Therefore, the fact that ATA is positive definite (p.d.)

follows from considering any x ≠ 0 and evaluating

which is the definition of a p.d. matrix

• We use the shorthand ATA > 0 for ATA being a

symmetric, positive definite matrix

is full rank x 0A Ax 0

2

2x A A x A x

T T= > 0 ,

21

Implications

• The underlying assumption that A is full rank and

therefore ATA is p.d. implies that there exists a unique

least squares solution

• Note: we use the inverse in a conceptual, rather than a

computational, sense

• The below formulation is known as the normal

equations, with the solution conceptually

straightforward

( )−

1

x A A A bT T=

( )A A x A bT T=

22

Example: Curve Fitting

• Say we wish to fit five points to a polynomial

curve of the form

• This can be written as

2

1 2 3f( , )t x x t x t= + +x

211 1

21 22 2

22 33 3

23 44 4

255 5

1

1

1

1

1

yt t

x yt t

x yt t

x yt t

yt t

= = =

Ax y

Example: Curve Fitting

• Say the points are t =[0,1,2,3,4] and y = [0,2,4,5,4].

Then

( )

1

2

3

0.886 0.257 0.086 0.143 0.086

0.771 0.186 0.571 0.386 0.371

0.143 0.071 0.143 0.071 0.14

1 0 0 0

1 1 1 2

1 2 4 4

1 3 9 5

1 4 16 4

0

2

4

5

4

3

x

x

x

−

− −

= − − − − −

= = =

1

Ax y

x A A A bT T

=

0.2

3.1

0.5

− −

x =

Implications

• An important implication of positive definiteness is

that we can factor ATA since ATA > 0

• The expression ATA = GTG is called the Cholesky

factorization of the symmetric positive definite

matrix ATA

1/2 1/2 =Α Α U D U U D D U G GT T T T

= =

25

A Least Squares Solution Algorithm

Step 1: Compute the lower triangular part of ATA

Step 2: Obtain the Cholesky Factorization

Step 3: Compute

Step 4: Solve for y using forward substitution in

and for x using backward substitution in

ˆG y b T

=

=Α Α G GT T

ˆΑ b bT

=

G x y

=

26

Note, our standard LU factorization approach would work;

we can just solve it twice as fast by taking advantage of

it being a symmetric matrix

Practical Considerations

• The two key problems that arise in practice with the

triangularization procedure are:

– First, while A maybe sparse, ATA is much less sparse and

consequently requires more computing resources for the

solution

• In particular, with ATA second neighbors are now connected! Large

networks are still sparse, just not as sparse

– Second, ATA may actually be numerically less well-

conditioned than A

27

1 1 0 0

1 2 1 0

0 1 2 1

0 0 1 1

− −

−

−

B =

Loss of Sparsity Example

• Assume the B matrix for a network is

• Then BTB is

• Second neighbors are now connected!

2 3 1 0

3 6 4 1

1 4 6 3

0 1 3 2

T

− − −

− −

−

B B =

28

Numerical Conditioning

• To understand the point on numerical ill-

conditioning, we need to introduce terminology

• We define the norm of a matrix to be

• This is the maximum singular value of B

m nB ¡

=

x 0

B xB

x

B

max

= maximum stretching of the matrix

29

Numerical Conditioning Example

• Say we have the matrix

• What value of x with a norm of 1 that maximizes ?

• What value of x with a norm of 1 that minimizes ?

30

10 0

0 0.1

=

B

=

x 0

B xB

x

B

max

= maximum stretching of the matrix

Bx

Bx


i.e., li is a root of the polynomial

• In other words, the 2 norm of

B is the square root of the

largest eigenvalue of BTB

,l l Ti

ii

= max , is an eigenvalue of B B

− ( ) B B IT

p λ = det λ

31

Keep in mind the

eigenvalues of a

p.d. matrix are

positive


• The conditioning number of a matrix B is defined as

• A well–conditioned matrix has a small value of

, close to 1; the larger the value of , the

more pronounced is the ill-conditioning

( )( )

( )

−

= =

1B

B B BB B

max

min

the max / min stretching

ratio of the matrix

( )B ( )B

32

Power System State Estimation (SE)

• The need is because in power system operations there is

a desire to do “what if” studies based upon the actual

“state” of the electric grid

– An example is an online power flow or contingency analysis

• Overall goal of SE is to come up with a power flow

model for the present "state" of the power system based

on the actual system measurements

• SE assumes the topology and parameters of the

transmission network are mostly known

• Measurements from SCADA and increasingly PMUs

• Overview is given in ECEN 615; more details in 61433

Power System State Estimation

• Problem can be formulated in a nonlinear, weighted

least squares form as

where J(x) is the scalar cost function, x are the state

variables (primarily bus voltage magnitudes and

angles), zi are the m measurements, f(x) relates the

states to the measurements and i is the assumed

standard deviation for each measurement

2

21

( )min ( )

mi i

ii

z fJ

=

−

xx =

34

Assumed Error

• Hence the goal is to decrease the error between the

measurements and the assumed model states x

• The i term weighs the various measurements,

recognizing that they can have vastly different

assumed errors

• Measurement error is assumed Gaussian (whether it is

or not is another question); outliers (bad

measurements) are often removed

2

21

( )min ( )

mi i

ii

z fJ

=

−

xx =

35

State Estimation for Linear Functions

• First we’ll consider the linear problem. That is

where

• Let R be defined as the diagonal matrix of the

variances (square of the standard deviations) for

each of the measurements

meas meas− = −z f(x) z Hx

2

1

2

2

2

0 0

0

0

0 0 m

=

R

36

State Estimation for Linear Functions

• We then differentiate J(x) w.r.t. x to determine the

value of x that minimizes this function

1

1 1

11 1

( )

( ) 2 2

At the minimum we have ( ) . So solving for gives

Tmeas meas

T meas T

T T meas

J

J

J

−

− −

−− −

= − −

= − +

=

=

x z Hx R z Hx

x H R z H R Hx

x 0 x

x H R H H R z

37

Simple DC System Example

• Say we have a two bus power system that we are

solving using the dc approximation. Say the line’s per

unit reactance is j0.1. Say we have power

measurements at both ends of the line. For simplicity

assume R=I. We would then like to estimate the bus

angles. Then

1 2 2 11 12 2 21

1

2

2.2, 2.00.1 0.1

10 10 200 200, ,

10 10 200 200

T

z P z P

− −= = = = − = =

− − = = =

− − x H H H

We have a problem since HTH is singular. This is because

of lack of an angle reference.38

Simple DC System Example, cont.

• Say we directly measure 1 (with a PMU) to be zero;

set this as the third measurement. Then1 2 2 1

1 12 2 21 3

1

1

1 1

2

1

2.2, 2.0 , 00.1 0.1

2.2 10 10201 200

, 2 , 10 10 ,200 200

0 1 0

2.2201 200 10 10 1

2200 200 10 10 0

0

T T mea

T

s

z P z P z

−

−− −

− −= = = = − = = =

− −

= = − = − = −

=

− −

− − −

=

x z H H H

x H R H H R z

x0

0.21

= −

39

Note that

the angles

are in

radians

Nonlinear Formulation

• A regular ac power system is nonlinear, so we need to

use an iterative solution approach. This is similar to

the Newton power flow. Here assume m

measurements and n state variables (usually bus

voltage magnitudes and angles) Then the Jacobian is

the H matrix1 1

1

1

( )( )

n

m m

n

f f

x x

f f

x x

= =

f xH x

x

Measurement Example

• Assume we measure the real and reactive power

flowing into one end of a transmission line; then the

zi-fi(x) functions for these two are

– Two measurements for four unknowns

• Other measurements, such as the flow at the other end,

and voltage magnitudes, add redundancy

( ) ( )( )

( ) ( )( )

2

2

cos sin

sin cos2

meas

ij i ij i j ij i j ij i j

capmeas

ij i ij i j ij i j ij i j

P V G V V G B

BQ V B V V G B

− − + − + −

− + + − − −

41

SE Iterative Solution Algorithm

• We then make an initial guess of x, x(0) and iterate,

calculating x each iteration

1 11

1 1

( 1) ( )

( )

( )

T T

m m

k k

z f

z f

−− −

+

−

= −

= +

x

x H R H H R

x

x x x

This is exactly the least

squares form developed

earlier with HTR-1H an n

by n matrix. This could be

solved with

Gaussian elimination, but

this isn't preferred

because the problem is

often ill-conditioned

42

Keep in mind that H is no

longer constant, but varies

as x changes. often ill-

conditioned

Nonlinear SE Solution Algorithm, Book Figure 9.11

Example: Two Bus Case

• Assume a two bus case with a generator supplying a

load through a single line with x=0.1 pu. Assume

measurements of the p/q flow on both ends of the line

(into line positive), and the voltage magnitude at both

the generator and the load end. So B12 = B21=10.0

( )( )

( )( )2

sin

cos

0

meas

ij i j ij i j

meas

ij i ij i j ij i j

meas

i i

P V V B

Q V B V V B

V V

− −

− + − −

− =We need to assume a reference angle

unless we directly measuring phase

44


• Let .

.

., .

.

.

12

12

1

21meas 0

2 i

21

2

1

2

P 2 02

Q 1 5V 1

P 1 98x 0 0 01

Q 1V 1

V 1 01

V 0 87

−

= = = = = −

Z

sin( ) cos( ) sin( )

cos( ) sin( ) cos( )

sin( ) cos( ) sin( )( )

cos( ) sin( ) cos( )

2 2 1 2 2 1 2

1 2 2 1 2 2 1 2

2 2 1 2 2 1 2

2 2 1 2 2 2 1 2

V 10 V V 10 V 10

20V V 10 V V 10 V 10

V 10 V V 10 V 10H

V 10 V V 10 20V V 10

1 0 0

0 0 1

− − − −

− − − − − −

= − −

x

We assume an

angle reference

of 1=0

45


• With a flat start guess we get.

.

.( ) , ( )

.

.

.

.

.

.

.

.

0 0

0 10 0 2 02

10 0 10 1 5

0 10 0 1 98H

10 0 10 1

1 0 0 0 01

0 0 1 0 13

0 0001 0 0 0 0 0

0 0 0001 0 0 0 0

0 0 0 0001 0 0 0

0 0 0 0 0001 0 0

0 0 0 0 0 0001 0

0 0 0 0 0 0 0001

−

− −

= − = − −

−

=

x z f x

R

46


1 6

11 0 1 1

2.01 0 2

1 0 2 0

2 0 2.01

2.02

1.51.003

1.980.2

10.8775

0.01

0.13

T

T T

e−

−− −

−

= −

− = + = − − −

H R H

x x H R H H R

47

Assumed SE Measurement Accuracy

• The assumed measurement standard deviations can

have a significant impact on the resultant solution, or

even whether the SE converges

• The assumption is a Gaussian (normal) distribution of

the error with no bias

48

SE Observability

• In order to estimate all n states we need at least n

measurements. However, where the measurements are

located is also important, a topic known as observability

– In order for a power system to be fully observable usually we

need to have a measurement available no more than one bus

away

– At buses we need to have at least measurements on all the

injections into the bus except one (including loads and gens)

– Loads are usually flows on feeders, or the flow into a

transmission to distribution transformer

– Generators are usually just injections from the GSU

49

Pseudo Measurements

• Pseudo measurements are used at buses in which there

is no load or generation; that is, the net injection into

the bus is know with high accuracy to be zero

– In order to enforce the net power balance at a bus we need to

include an explicit net injection measurement

• To increase observability sometimes estimated values

are used for loads, shunts and generator outputs

– These “measurements” are represented as having a higher

much standard deviation

50

SE Observability Example

51

lecture 15: least squares, state estimation

Documents