lecture 15: least squares, state estimation
TRANSCRIPT
ECEN 615Methods of Electric Power
Systems Analysis
Lecture 15: Least Squares, State Estimation
Prof. Tom Overbye
Dept. of Electrical and Computer Engineering
Texas A&M University
ECEN 615Methods of Electric Power
Systems Analysis
Lecture 18: State Estimation
Prof. Tom Overbye
Dept. of Electrical and Computer Engineering
Texas A&M University
Announcements
• Starting reading Chapter 9
• Homework 4 is due on Thursday October 15.
2
UTC Revisited
• We can now revisit the uncommitted transfer
capability (UTC) calculation using PTDFs and LODFs
• Recall trying to determine maximum transfer between
two areas (or buses in our example)
• For base case maximums are quickly determined with
PTDFs
( )
( )
( )
( ), w
0max
0
m n w0
f fu min
− =
Note we are ignoring zero (or small) PTDFs; would also need
to consider flow reversal3
UTC Revisited
• For the contingencies we use
• Then as before
( )
( ) ( )( )
( ) ( )
1
,( )k
w
max 0 k 0
k
m n kw
0
f f d fu min
− −
=
( ) (1)
, , ,,
0
m n m n m nu min u u=
We would need to check all contingencies! Also,
this is just a linear estimate and is not considering
voltage violations.
4
Five Bus Example
2, 3,w t= ( ) f 42 , 34 , 67 , 118 , 33 , 100T0
=
f 150 , 400 , 150 , 150 , 150 , 1,000Tmax =
Line 1
Line 2
Line 3
Line 6
Line 5
Line 4slack
1.050 pu
42 MW
67 MW
100 MW
118 MW
1.040 pu
1.042 pu
A
MVA
A
MVA
A
MVA
1.042 pu
A
MVA
1.044 pu
MW200
258 MW
MW118
260 MW
100 MW
MW100
A
MVA
One Two
Three
Four
Five
34 MW
33 MW
5
Five Bus Example
( )
( )
( )
( )2,2
150 42 400 34 150 67 150 118 150 33, , , ,
0.2727 0.1818 0.0909 0.7273 0.0909
44.0
w
0max
0
w0
f fu min
min
− =
− − − − − =
=
Therefore, for the base case
6
Five Bus Example
• For the contingency case corresponding to the outage
of the line 2
The limiting value is line 4
Hence the UTC is limited by the contingency to 23.0
( ) ( )2
( )
( ) 2 ( )
2(1)
2,3 2( )w
max 0 0
w0
f f d fu min
− −
=
( )
( ) 2 ( )
2
2( )
150 118 0.4 34
0.8
max 0 0
w
f f d f
− − − − =
7
Additional Comments
• Distribution factors are defined as small signal
sensitivities, but in practice, they are also used for
simulating large signal cases
• Distribution factors are widely used in the operation of
the electricity markets where the rapid evaluation of the
impacts of each transaction on the line flows is required
• Applications to actual system show that the distribution
factors provide satisfactory results in terms of accuracy
• For multiple applications that require fast turn around
time, distribution factors are used very widely,
particularly, in the market environment
• They do not work well with reactive power!8
Least Squares
• So far we have considered the solution of Ax = b in
which A is a square matrix; as long as A is
nonsingular there is a single solution
– That is, we have the same number of equations (m) as
unknowns (n)
• Many problems are overdetermined in which there
more equations than unknowns (m > n)
– Overdetermined systems are usually inconsistent, in which no
value of x exactly solves all the equations
• Underdetermined systems have more unknowns than
equations (m < n); they never have a unique solution
but are usually consistent9
Method of Least Squares
• The least squares method is a solution approach for
determining an approximate solution for an
overdetermined system
• If the system is inconsistent, then not all of the
equations can be exactly satisfied
• The difference for each equation between its exact
solution and the estimated solution is known as the error
• Least squares seeks to minimize the sum of the squares
of the errors
• Weighted least squares allows differ weights for the
equations 10
Least Squares Solution History
• The method of least squares developed from trying to
estimate actual values from a number of measurements
• Several persons in the 1700's, starting with Roger
Cotes in 1722, presented methods for trying to decrease
model errors from using multiple measurements
• Legendre presented a formal description of the method
in 1805; evidently Gauss claimed he did it in 1795
• Method is widely used in power systems, with state
estimation the best known application, dating from
Fred Schweppe's work in 1970
11
Least Squares and Sparsity
• In many contexts least squares is applied to problems
that are not sparse. For example, using a number of
measurements to optimally determine a few values
– Regression analysis is a common example, in which a line or
other curve is fit to potentially many points)
– Each measurement impacts each model value
• In the classic power system application of state
estimation the system is sparse, with measurements
only directly influencing a few states
– Power system analysis classes have tended to focus on
solution methods aimed at sparse systems; we'll consider both
sparse and nonsparse solution methods
12
Least Squares Problem
• Consider
or
)
)
)
=
111 12 13 1 1 1
2
21 22 22 2 2 2
1 2 3
(a
(ax
(a
Tn
T
n
m T
n mm m m mn
a a a a x b
a a a a x b=
x ba a a a
, ,m n n m= Ax b A x b¡ ¡ ¡
13
Least Squares Solution
• We write (ai)T for the row i of A and ai is a column
vector
• Here, m ≥ n and the solution we are seeking is that
which minimizes Ax - b p, where p denotes some
norm
• Since usually an overdetermined system has no exact
solution, the best we can do is determine an x that
minimizes the desired norm.
14
Choice of p
• We discuss the choice of p in terms of a specific
example
• Consider the equation Ax = b with
(hence three equations and one unknown)
• We consider three possible choices for p:
1
2 1 2 3
3
1
A 1 b
1
b
= = b b b b 0
b
with
15
Choice of p
−
−
−
1 2
1 2 3
2
1 3
Ax b
Ax b3
Ax b2
*
*
*
x = b
b + b + bx =
b + bx =
is minimized by
is minimized by
is minimized by
(i) p = 1
(ii) p = 2
(iii) p =
16
The Least Squares Problem
• In general, is not differentiable for p = 1
or p = ∞
• The choice of p = 2 (Euclidean norm) has become well
established given its least-squares fit interpretation
• The problem is tractable for 2 major
reasons
– First, the function is differentiable
−Ax bp
17
−2
x
Ax bn
min ¡
( ) ( )- −
22
2
1 1x Ax b a x
2 2
mT
i
ii =1
= = b
The Least Squares Problem, cont.
– Second, the Euclidean norm is preserved under orthogonal
transformations:
with Q an arbitrary orthogonal matrix; that is, Q
satisfies
( ) − −22
Q A x Q b Ax bT T =
QQ Q Q I QT T n × n= = ¡
18
The Least Squares Problem, cont.
• We introduce next the basic underlying assumption:
A is full rank, i.e., the columns of A constitute a set
of linearly independent vectors
• This assumption implies that the rank of A is n
because n ≤ m since we are dealing with an
overdetermined system
• Fact: The least squares solution x* satisfies
A A x A bT T
=
19
( ) ( )
( )( )
( )
-
− −
= = − −
= −
−
2
2
xx
x
1 1x Ax b x A Ax x A b b A x b b
2 2
x 10 x A Ax x A b b A x b b
x x 2
1x A A x 2x A b b b
x 2
A A x A b
T T T T T T
T T T T T T
T T T T T
T T
= = +
+
+
=
Proof of Fact
• Since by definition the least squares solution x*
minimizes at the optimum, the derivative of
this function zero:
( )•
20
Implications
• This underlying assumption implies that
• Therefore, the fact that ATA is positive definite (p.d.)
follows from considering any x ≠ 0 and evaluating
which is the definition of a p.d. matrix
• We use the shorthand ATA > 0 for ATA being a
symmetric, positive definite matrix
is full rank x 0A Ax 0
2
2x A A x A x
T T= > 0 ,
21
Implications
• The underlying assumption that A is full rank and
therefore ATA is p.d. implies that there exists a unique
least squares solution
• Note: we use the inverse in a conceptual, rather than a
computational, sense
• The below formulation is known as the normal
equations, with the solution conceptually
straightforward
( )−
1
x A A A bT T=
( )A A x A bT T=
22
Example: Curve Fitting
• Say we wish to fit five points to a polynomial
curve of the form
• This can be written as
2
1 2 3f( , )t x x t x t= + +x
211 1
21 22 2
22 33 3
23 44 4
255 5
1
1
1
1
1
yt t
x yt t
x yt t
x yt t
yt t
= = =
Ax y
Example: Curve Fitting
• Say the points are t =[0,1,2,3,4] and y = [0,2,4,5,4].
Then
( )
1
2
3
0.886 0.257 0.086 0.143 0.086
0.771 0.186 0.571 0.386 0.371
0.143 0.071 0.143 0.071 0.14
1 0 0 0
1 1 1 2
1 2 4 4
1 3 9 5
1 4 16 4
0
2
4
5
4
3
x
x
x
−
− −
= − − − − −
= = =
1
Ax y
x A A A bT T
=
0.2
3.1
0.5
− −
x =
Implications
• An important implication of positive definiteness is
that we can factor ATA since ATA > 0
• The expression ATA = GTG is called the Cholesky
factorization of the symmetric positive definite
matrix ATA
1/2 1/2 =Α Α U D U U D D U G GT T T T
= =
25
A Least Squares Solution Algorithm
Step 1: Compute the lower triangular part of ATA
Step 2: Obtain the Cholesky Factorization
Step 3: Compute
Step 4: Solve for y using forward substitution in
and for x using backward substitution in
ˆG y b T
=
=Α Α G GT T
ˆΑ b bT
=
G x y
=
26
Note, our standard LU factorization approach would work;
we can just solve it twice as fast by taking advantage of
it being a symmetric matrix
Practical Considerations
• The two key problems that arise in practice with the
triangularization procedure are:
– First, while A maybe sparse, ATA is much less sparse and
consequently requires more computing resources for the
solution
• In particular, with ATA second neighbors are now connected! Large
networks are still sparse, just not as sparse
– Second, ATA may actually be numerically less well-
conditioned than A
27
1 1 0 0
1 2 1 0
0 1 2 1
0 0 1 1
− −
−
−
B =
Loss of Sparsity Example
• Assume the B matrix for a network is
• Then BTB is
• Second neighbors are now connected!
2 3 1 0
3 6 4 1
1 4 6 3
0 1 3 2
T
− − −
− −
−
B B =
28
Numerical Conditioning
• To understand the point on numerical ill-
conditioning, we need to introduce terminology
• We define the norm of a matrix to be
• This is the maximum singular value of B
m nB ¡
=
x 0
B xB
x
B
max
= maximum stretching of the matrix
29
Numerical Conditioning Example
• Say we have the matrix
• What value of x with a norm of 1 that maximizes ?
• What value of x with a norm of 1 that minimizes ?
30
10 0
0 0.1
=
B
=
x 0
B xB
x
B
max
= maximum stretching of the matrix
Bx
Bx
Numerical Conditioning
i.e., li is a root of the polynomial
• In other words, the 2 norm of
B is the square root of the
largest eigenvalue of BTB
,l l Ti
ii
= max , is an eigenvalue of B B
− ( ) B B IT
p λ = det λ
31
Keep in mind the
eigenvalues of a
p.d. matrix are
positive
Numerical Conditioning
• The conditioning number of a matrix B is defined as
• A well–conditioned matrix has a small value of
, close to 1; the larger the value of , the
more pronounced is the ill-conditioning
( )( )
( )
−
= =
1B
B B BB B
max
min
the max / min stretching
ratio of the matrix
( )B ( )B
32
Power System State Estimation (SE)
• The need is because in power system operations there is
a desire to do “what if” studies based upon the actual
“state” of the electric grid
– An example is an online power flow or contingency analysis
• Overall goal of SE is to come up with a power flow
model for the present "state" of the power system based
on the actual system measurements
• SE assumes the topology and parameters of the
transmission network are mostly known
• Measurements from SCADA and increasingly PMUs
• Overview is given in ECEN 615; more details in 61433
Power System State Estimation
• Problem can be formulated in a nonlinear, weighted
least squares form as
where J(x) is the scalar cost function, x are the state
variables (primarily bus voltage magnitudes and
angles), zi are the m measurements, f(x) relates the
states to the measurements and i is the assumed
standard deviation for each measurement
2
21
( )min ( )
mi i
ii
z fJ
=
−
xx =
34
Assumed Error
• Hence the goal is to decrease the error between the
measurements and the assumed model states x
• The i term weighs the various measurements,
recognizing that they can have vastly different
assumed errors
• Measurement error is assumed Gaussian (whether it is
or not is another question); outliers (bad
measurements) are often removed
2
21
( )min ( )
mi i
ii
z fJ
=
−
xx =
35
State Estimation for Linear Functions
• First we’ll consider the linear problem. That is
where
• Let R be defined as the diagonal matrix of the
variances (square of the standard deviations) for
each of the measurements
meas meas− = −z f(x) z Hx
2
1
2
2
2
0 0
0
0
0 0 m
=
R
36
State Estimation for Linear Functions
• We then differentiate J(x) w.r.t. x to determine the
value of x that minimizes this function
1
1 1
11 1
( )
( ) 2 2
At the minimum we have ( ) . So solving for gives
Tmeas meas
T meas T
T T meas
J
J
J
−
− −
−− −
= − −
= − +
=
=
x z Hx R z Hx
x H R z H R Hx
x 0 x
x H R H H R z
37
Simple DC System Example
• Say we have a two bus power system that we are
solving using the dc approximation. Say the line’s per
unit reactance is j0.1. Say we have power
measurements at both ends of the line. For simplicity
assume R=I. We would then like to estimate the bus
angles. Then
1 2 2 11 12 2 21
1
2
2.2, 2.00.1 0.1
10 10 200 200, ,
10 10 200 200
T
z P z P
− −= = = = − = =
− − = = =
− − x H H H
We have a problem since HTH is singular. This is because
of lack of an angle reference.38
Simple DC System Example, cont.
• Say we directly measure 1 (with a PMU) to be zero;
set this as the third measurement. Then1 2 2 1
1 12 2 21 3
1
1
1 1
2
1
2.2, 2.0 , 00.1 0.1
2.2 10 10201 200
, 2 , 10 10 ,200 200
0 1 0
2.2201 200 10 10 1
2200 200 10 10 0
0
T T mea
T
s
z P z P z
−
−− −
− −= = = = − = = =
− −
= = − = − = −
=
− −
− − −
=
x z H H H
x H R H H R z
x0
0.21
= −
39
Note that
the angles
are in
radians
Nonlinear Formulation
• A regular ac power system is nonlinear, so we need to
use an iterative solution approach. This is similar to
the Newton power flow. Here assume m
measurements and n state variables (usually bus
voltage magnitudes and angles) Then the Jacobian is
the H matrix1 1
1
1
( )( )
n
m m
n
f f
x x
f f
x x
= =
f xH x
x
Measurement Example
• Assume we measure the real and reactive power
flowing into one end of a transmission line; then the
zi-fi(x) functions for these two are
– Two measurements for four unknowns
• Other measurements, such as the flow at the other end,
and voltage magnitudes, add redundancy
( ) ( )( )
( ) ( )( )
2
2
cos sin
sin cos2
meas
ij i ij i j ij i j ij i j
capmeas
ij i ij i j ij i j ij i j
P V G V V G B
BQ V B V V G B
− − + − + −
− + + − − −
41
SE Iterative Solution Algorithm
• We then make an initial guess of x, x(0) and iterate,
calculating x each iteration
1 11
1 1
( 1) ( )
( )
( )
T T
m m
k k
z f
z f
−− −
+
−
= −
= +
x
x H R H H R
x
x x x
This is exactly the least
squares form developed
earlier with HTR-1H an n
by n matrix. This could be
solved with
Gaussian elimination, but
this isn't preferred
because the problem is
often ill-conditioned
42
Keep in mind that H is no
longer constant, but varies
as x changes. often ill-
conditioned
Nonlinear SE Solution Algorithm, Book Figure 9.11
Example: Two Bus Case
• Assume a two bus case with a generator supplying a
load through a single line with x=0.1 pu. Assume
measurements of the p/q flow on both ends of the line
(into line positive), and the voltage magnitude at both
the generator and the load end. So B12 = B21=10.0
( )( )
( )( )2
sin
cos
0
meas
ij i j ij i j
meas
ij i ij i j ij i j
meas
i i
P V V B
Q V B V V B
V V
− −
− + − −
− =We need to assume a reference angle
unless we directly measuring phase
44
Example: Two Bus Case
• Let .
.
., .
.
.
12
12
1
21meas 0
2 i
21
2
1
2
P 2 02
Q 1 5V 1
P 1 98x 0 0 01
Q 1V 1
V 1 01
V 0 87
−
= = = = = −
Z
sin( ) cos( ) sin( )
cos( ) sin( ) cos( )
sin( ) cos( ) sin( )( )
cos( ) sin( ) cos( )
2 2 1 2 2 1 2
1 2 2 1 2 2 1 2
2 2 1 2 2 1 2
2 2 1 2 2 2 1 2
V 10 V V 10 V 10
20V V 10 V V 10 V 10
V 10 V V 10 V 10H
V 10 V V 10 20V V 10
1 0 0
0 0 1
− − − −
− − − − − −
= − −
x
We assume an
angle reference
of 1=0
45
Example: Two Bus Case
• With a flat start guess we get.
.
.( ) , ( )
.
.
.
.
.
.
.
.
0 0
0 10 0 2 02
10 0 10 1 5
0 10 0 1 98H
10 0 10 1
1 0 0 0 01
0 0 1 0 13
0 0001 0 0 0 0 0
0 0 0001 0 0 0 0
0 0 0 0001 0 0 0
0 0 0 0 0001 0 0
0 0 0 0 0 0001 0
0 0 0 0 0 0 0001
−
− −
= − = − −
−
=
x z f x
R
46
Example: Two Bus Case
1 6
11 0 1 1
2.01 0 2
1 0 2 0
2 0 2.01
2.02
1.51.003
1.980.2
10.8775
0.01
0.13
T
T T
e−
−− −
−
= −
− = + = − − −
H R H
x x H R H H R
47
Assumed SE Measurement Accuracy
• The assumed measurement standard deviations can
have a significant impact on the resultant solution, or
even whether the SE converges
• The assumption is a Gaussian (normal) distribution of
the error with no bias
48
SE Observability
• In order to estimate all n states we need at least n
measurements. However, where the measurements are
located is also important, a topic known as observability
– In order for a power system to be fully observable usually we
need to have a measurement available no more than one bus
away
– At buses we need to have at least measurements on all the
injections into the bus except one (including loads and gens)
– Loads are usually flows on feeders, or the flow into a
transmission to distribution transformer
– Generators are usually just injections from the GSU
49
Pseudo Measurements
• Pseudo measurements are used at buses in which there
is no load or generation; that is, the net injection into
the bus is know with high accuracy to be zero
– In order to enforce the net power balance at a bus we need to
include an explicit net injection measurement
• To increase observability sometimes estimated values
are used for loads, shunts and generator outputs
– These “measurements” are represented as having a higher
much standard deviation
50
SE Observability Example
51