Unit 3: Understanding Matrices (web.stanford.edu)
Understanding Matrices
Eigensystems
⢠Eigenvectors - special directions ð£ð in which a matrix only applies scaling
⢠Eigenvalues - the amount ðð of that scaling
⢠Right Eigenvectors (or simply eigenvectors) satisfy ðŽð£ð = ððð£ð⢠Eigenvectors represent directions, so ðŽ(ðŒð£ð) = ðð ðŒð£ð is also true for all ðŒ
⢠Left Eigenvectors satisfy ð¢ðððŽ = ððð¢ð
ð (or ðŽðð¢ð = ððð¢ð)
⢠Diagonal matrices have eigenvalues on the diagonal, and eigenvectors Æžðð2 00 3
10
= 210
and 2 00 3
01
= 301
⢠Upper/lower triangular matrices also have eigenvalues on the diagonal
2 10 3
10
= 210
and 2 10 3
11
= 311
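As a quick numerical check of the eigenvalue claims above, here is a minimal sketch assuming NumPy is available; numpy.linalg.eig returns the eigenvalues and right eigenvectors.

```python
import numpy as np

# Upper triangular matrix: the eigenvalues should be the diagonal entries 2 and 3.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

vals, vecs = np.linalg.eig(A)           # right eigenvectors are the columns of vecs
print(vals)                             # [2. 3.]

# Verify A v = lambda v for each eigenpair.
for lam, v in zip(vals, vecs.T):
    print(np.allclose(A @ v, lam * v))  # True, True
```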
Complex Numbers
⢠Complex numbers may appear in both eigenvalues and eigenvectors0 1â1 0
1ð
= ð1ð
⢠Recall: complex conjugate: ð + ðð â = ð â ðð
⢠Hermitian Matrix: ðŽâð = ðŽ (often, ðŽâð is written as ðŽð»)
⢠ðŽð£ = ðð£ implies ðŽð£ âð = ðð£ âð or ð£âððŽ = ðâð£âð
⢠Using this, ðŽð£ = ðð£ implies ð£âððŽð£ = ð£âððð£ or ðâð£âðð£ = ðð£âðð£ or ðâ = ð
⢠Thus, Hermitian matrices have ð â ð (no complex eigenvalues)
⢠Symmetric real-valued matrices have real-valued eigenvalues/eigenvectors
⢠However, complex eigenvectors work too, e.g. ðŽ(ðŒð£ð) = ðð ðŒð£ð with ðŒcomplex
Spatial Deformation
⢠Let ð = Ïð ðŒðð£ð, so that ðŽð = Ïð ðŒððŽð£ð = Ïð ðŒððð ð£ð⢠Then, ðŽ tilts ð away from directions with smaller eigenvalues and towards
directions with larger eigenvalues
⢠Large ðð stretch components in their associated ð£ð directions
⢠Small ðð squish components in their associated ð£ð directions
⢠Negative ðð flip the sign of components in their associated ð£ð directions
Spatial Deformation
⢠Consider every point on the unit circle (green) as a vector ð = Ïð ðŒðð£ð , and remap each point via ðŽð = Ïð ðŒððð ð£ð
⢠The remapped shape (blue) is more elliptical then the original circle (green)
⢠The circle is stretched along the axis with the larger eigenvalue, and compressed along the axis with the smaller eigenvalue
⢠The larger the ratio of eigenvalues, the more elliptical the circle becomes
⢠This is true for all circles (representing all points in the plane)
Solving Linear Systems
⢠Perturb the right hand side from ð to ð, and solve ðŽ Æžð = ð to find Æžð
⢠Note how ð and Æžð are more separated than ð and ð, i.e. the solution is more perturbed than the right hand side is
⢠Small changes in the right hand side lead to larger changes in the solution
⢠Small algorithmic errors are also amplified,
since they change ðŽâ1ð to áðŽâ1ð which is
similar to changing ðŽâ1ð to ðŽâ1 ð⢠The amount of amplification is proportional
to the ratio of the eigenvalues
Preconditioning
⢠Suppose ðŽ has large eigenvalue ratios that make ðŽð = ð difficult to solve⢠Suppose one had an approximate guess for the inverse, i.e an áðŽâ1 â ðŽâ1
⢠Then, transform ðŽð = ð into áðŽâ1ðŽð = áðŽâ1ð or áðŒð = à·šð⢠Typically, a bit more involved than this, but conceptually the same
⢠áðŒ is not the identity, so there is still more work to do in order to find ð⢠However, áðŒ has eigenvalues with similar magnitudes (clusters work too), making áðŒð = à·šð far easier to solve than a poorly conditioned ðŽð = ð
Preconditioning works GREAT!
⢠It is best to re-scale ellipsoids along eigenvector axes, but scaling along the coordinate axes (diagonal/Jacobi preconditioning) can work well too
Rectangular Matrices (Rank)
⢠An ðð¥ð rectangular matrix has ð rows and ð columns
⢠(Note: these comments also hold for square matrices with ð = ð)
⢠The columns span a space, and the unknowns are weights on each column (recall ðŽð = Ïð ðððð )
⢠A matrix with ð columns has maximum rank ð
⢠The actual rank depends on how many of the columns are linearly independent from one another
⢠Each column has length ð (which is the number of rows)
⢠Thus, the columns live in an ð dimensional space, and at best can span that whole space
⢠That is, there is a maximum of ð independent columns (that could exist)
⢠Overall, a matrix at most has rank equal to the minimum of ð and ð
⢠Both considerations are based on looking at the columns (which are scaled by the unknowns)
Rows vs. Columns
⢠One can find discussions on rows, row spaces, etc. that are used for various purposes
⢠Although these are fine discussions in regards to matrices/mathematics, they are unnecessary for an intuitive understanding of high dimensional vector spaces (and as such can be ignored)
⢠The number of columns is identical to number of variables, which depends on the parameters of the problem⢠E.g. the unknown parameters that govern a neural network architecture
⢠The number of rows depends on the amount of data used, and adding/removing data does not intrinsically effect the nature of the problem⢠E. g. it does not change the network architecture, but merely perturbs the ascertained
values of the unknown parameters
Singular Value Decomposition (SVD)
⢠Factorization of any size ðð¥ð matrix: ðŽ = ððŽðð
⢠ðŽ is ðð¥ð diagonal with non-negative diagonal entries (called singular values)
⢠ð is ðð¥ð orthogonal, ð is ðð¥ð orthogonal (their columns are called singular vectors)⢠Orthogonal matrices have orthonormal columns (an orthonormal basis), so their transpose
is their inverse. They preserve inner products, and thus are rotations, reflections, and combinations thereof
⢠If A has complex entries, then ð and ð are unitary (conjugate transpose is their inverse)
⢠Introduced and rediscovered many times: Beltrami 1873, Jordan 1875, Sylvester 1889, Autonne1913, Eckart and Young 1936. Pearson introduced principle component analysis (PCA) in 1901, which uses SVD. Numerical methods by Chan, Businger, Golub, Kahan, etc.
(Rectangular) Diagonal Matrices
⢠All off-diagonal entries are 0 ⢠Diagonal entries are ððð , and off diagonal entries are ððð with ð â ð
⢠E.g. 5 00 20 0
ð1ð2
=10â1ðŒ
has 5ð1 = 10 and 2ð2 = â1, so ð1 = 2 and ð2 = â.5
⢠Note that ðŒ â 0 imposes a âno solutionâ condition (even though ð1 and ð2 are well-specified)
⢠E.g. 5 0 00 2 0
ð1ð2ð3
=10â1
has 5ð1 = 10 and 2ð2 = â1, so ð1 = 2 and ð2 =
â .5⢠Note that there are âinfinite solutionsâ for ð3 (even though ð1 and ð2 are well-specified)
⢠A zero on the diagonal indicates a singular system, which has either no solution (e.g. 0ð1 = 10) or infinite solutions (e.g. 0ð1 = 0)
Singular Value Decomposition (SVD)
⢠ðŽððŽ = ððŽðððððŽðð = ð ðŽððŽ ðð, so (ðŽððŽ)ð£ = ðð£ gives ðŽððŽ (ððð£) = ð(ððð£)
⢠ðŽððŽ is nð¥ð diagonal with eigenvectors Æžðð, so Æžðð = ððð£ and ð£ = ð Æžðð⢠That is, the columns of ð are the eigenvectors of ðŽððŽ
⢠ðŽðŽð = ððŽððððŽððð = ð ðŽðŽð ðð, so (ðŽðŽð)ð£ = ðð£ gives ðŽðŽð (ððð£) = ð(ððð£)
⢠ðŽðŽð is mð¥ð diagonal with eigenvectors Æžðð, so Æžðð = ððð£ and ð£ = ð Æžðð⢠That is, the columns of ð are the eigenvectors of ðŽðŽð
⢠When ð â ð, either ðŽððŽ or ðŽðŽð is larger and contains extra zeros on the diagonal
⢠Their other diagonal entries are the squares of the singular values
⢠That is, the singular values are the (non-negative) square roots of the non-extra eigenvalues of ðŽððŽ and ðŽðŽð
⢠Note that both ðŽððŽ and ðŽðŽð are symmetric positive semi-definite, and thus easy to work with
⢠E.g. symmetry means their eigensystem (and thus the SVD) has no complex numbers when ðŽ doesnât
Example (Tall Matrix)
⢠Consider size 4ð¥3 matrix ðŽ =
1 2 347
58
69
10 11 12
⢠Label the columns ð1 =
14710
, ð2 =
25811
, ð3 =
36912
⢠Since ð1 and ð2 point in different directions, ðŽ is at least rank 2
⢠ð3 = 2ð2 â ð1, so the third column is in the span of the first two columns
⢠Thus, ðŽ is only rank 2 (not rank 3)
Example (SVD)
A =
  [ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]
  [10 11 12]
= UΣVᵀ, with
U =
  [.141  .825 -.420 -.351]
  [.344  .426  .298  .782]
  [.547  .028  .664 -.509]
  [.750 -.371 -.542  .079]
Σ =
  [25.5  0     0]
  [ 0    1.29  0]
  [ 0    0     0]
  [ 0    0     0]
Vᵀ =
  [ .504  .574  .644]
  [-.761 -.057  .646]
  [ .408 -.816  .408]
⢠Singular values are 25.5, 1.29, and 0
⢠Singular value of 0 indicates that the matrix is rank deficient
⢠The rank of a matrix is equal to its number of nonzero singular values
Derivation from AᵀA and AAᵀ
A = UΣVᵀ (the same numerical factorization shown on the Example (SVD) slide above)
⢠ðŽððŽ is size 3ð¥3 and has 3 eigenvectors (seen in ð)
⢠The square roots of the 3 eigenvalues of ðŽððŽ are seen in ðŽ (color coded to the eigenvectors)
⢠ðŽðŽð is size 4ð¥4 and has 4 eigenvectors (seen in ð)
⢠The square roots of 3 of the eigenvalues of ðŽðŽð are seen in ðŽâ¢ The 4th eigenvalue of ðŽðŽð is an extra eigenvalue of 0
Understanding Ax
A = UΣVᵀ (the same numerical factorization shown on the Example (SVD) slide above)
⢠ðŽ maps from ð 3 to ð 4
⢠ðŽð first projects ð â ð 3 onto the 3 basis vectors in ð⢠Then, the associated singular values (diagonally) scale the results⢠Lastly, those scaled results are used as weights on the basis vectors in ð
Understanding Ax
Ax = UΣVᵀ[x₁; x₂; x₃] = UΣ[v₁ᵀx; v₂ᵀx; v₃ᵀx] = U[σ₁v₁ᵀx; σ₂v₂ᵀx; σ₃v₃ᵀx; 0] = u₁σ₁v₁ᵀx + u₂σ₂v₂ᵀx + u₃σ₃v₃ᵀx + u₄·0
(U, Σ, and Vᵀ are the numerical factors from the Example (SVD) slide above)
⢠ðŽð projects ð onto the basis vectors in ð, scales by the associated singular values, and uses those results as weights on the basis vectors in ð
Extra Dimensions
A = UΣVᵀ (the same numerical factorization shown on the Example (SVD) slide above)
⢠The 3D space of vector inputs can only span a 3D subspace of ð 4
⢠The last (green) column of ð represents the unreachable dimension, orthogonal to the range of ðŽ, and is always multiplied by 0
⢠One can delete this column and the associated portion of Σ (and still obtain a valid factorization)
Zero Singular Values
A = UΣVᵀ (the same numerical factorization shown on the Example (SVD) slide above)
⢠The 3rd singular value is 0, so ðŽ has a 1D null space that reduces the 3D input vectors to only 2 dimensions
⢠The associated (pink) terms make no contribution to the final result, and can also be deleted (still obtaining a valid factorization)
⢠The first 2 columns of ð span the 2D subset of ð 4 that comprises the range of ðŽ
Approximating A
A ≈ UΣVᵀ (the same numbers as the Example (SVD) slide above, keeping only the terms associated with the largest singular value)
⢠The first singular value is much bigger than the second, and so represents the vast majority of what ðŽ does (note, the vectors in ð and ð are unit length)
⢠Thus, one could approximate ðŽ quite well by only using the terms associated with the largest singular value
⢠This is not a valid factorization, but an approximation (and the idea behind PCA)
Summary
⢠The columns of ð that do not correspond to ânonzeroâ singular values form an orthonormal basis for the null space of ðŽ
⢠The remaining columns of ð form an orthonormal basis for the space perpendicular to the null space of ðŽ (parameterizing meaningful inputs)
⢠The columns of ð corresponding to ânonzeroâ singular values form an orthonormal basis for the range of ðŽ
⢠The remaining columns of ð form an orthonormal basis for the (unattainable) space perpendicular to the range of ðŽ
⢠One can drop the columns of ð and ð that do not correspond to ânonzeroâ singular values and still obtain a valid factorization of ðŽ
⢠One can drop the columns of ð and ð that correspond to âsmall/smallerâ singular values and still obtain a reasonable approximation of ðŽ
Example (Wide Matrix)
A =
  [1 4 7 10]
  [2 5 8 11]
  [3 6 9 12]
= UΣVᵀ, with
U =
  [.504 -.761  .408]
  [.574 -.057 -.816]
  [.644  .646  .408]
Σ =
  [25.5  0     0  0]
  [ 0    1.29  0  0]
  [ 0    0     0  0]
Vᵀ =
  [ .141  .344  .547  .750]
  [ .825  .426  .028 -.371]
  [-.420  .298  .664 -.542]
  [-.351  .782 -.509  .079]
• A maps from ℝ⁴ to ℝ³ and so has at least a 1D null space (green)
• The 3rd singular value is 0, and the associated (pink) terms make no contribution to the final result
Example (Wide Matrix)
A = UΣVᵀ (the same wide-matrix factorization shown on the previous slide)
⢠Only a 2D subspace of ð 4 matters, with the rest of ð 4 in the null space of ðŽ
⢠Only a 2D subspace of ð 3 is in the range of ðŽ
Notes
⢠The SVD is often unwieldy for computational purposes
⢠However, replacing matrices by their SVD can be quite useful/enlightening for theoretical pursuits
⢠Moreover, its theoretical underpinnings are often used to devise computational algorithms
⢠The SVD is unique under certain assumptions, such as all ðð ⥠0 and in descending order
⢠However, one can make both a ðð and its associated column in ð negative for an âalternate SVDâ (see e.g. âInvertible Finite Elements For Robust Simulation of Large Deformationâ, Irving et al. 2004)
SVD Construction (Important Detail)
⢠Let A =1 00 â1
so that ðŽððŽ = ðŽðŽð = ðŒ, and thus ð = ð = Σ = ðŒ
⢠But ðŽ â ðΣðð = ðŒ Whatâs wrong?
⢠Given a column vector ð£ð of ð, ðŽð£ð = ððŽððð£ð = ððŽ Æžðð = ððð Æžðð = ððð¢ð where ð¢ð is the corresponding column of ð
⢠ðŽð£1 =1 00 â1
10
=10
= ð¢1 but ðŽð£2 =1 00 â1
01
=0â1
â 01
= ð¢2
⢠Since ð and ð are orthonormal, their columns are unit length
⢠However, there are still two choices for the direction of each column
⢠Multiplying ð¢2 by â1 to get ð¢2 =0â1
makes ð = ðŽ, and thus ðŽ = ðΣðð as
desired
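A minimal sketch of this pitfall, assuming NumPy; it builds U, Σ, and V naively (the identity, as the eigenvector analysis above suggests), observes that UΣVᵀ ≠ A, and then flips the columns of U so that Avᵢ = σᵢuᵢ. Library routines such as numpy.linalg.svd make these sign choices consistently on their own.

```python
import numpy as np

A = np.array([[1.0,  0.0],
              [0.0, -1.0]])

# Naive construction: A^T A = A A^T = I, so the eigenvectors suggest U = V = Sigma = I.
U = np.eye(2)
V = np.eye(2)
sigma = np.array([1.0, 1.0])

print(np.allclose(U @ np.diag(sigma) @ V.T, A))   # False: the column signs are inconsistent

# Fix: for each i, flip u_i whenever A v_i != sigma_i u_i.
for i in range(2):
    if not np.allclose(A @ V[:, i], sigma[i] * U[:, i]):
        U[:, i] *= -1.0

print(np.allclose(U @ np.diag(sigma) @ V.T, A))   # True after flipping u_2
```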
SVD Construction (Important Detail)
⢠An orthogonal matrix has determinant equal to ±1, where â1 indicates a reflection of the coordinate system
⢠If detð = â1, flip the direction of any column to make detð = 1 (so ð does not contain a reflection)
⢠Then, for each ð£ð, compare ðŽð£ð to ððð¢ð and flip the direction of ð¢ð when necessary in order to make ðŽð£ð = ððð¢ð
⢠detð = ± 1 and may contain a reflection
⢠When detð = â1, one can flip the sign of the smallest singular value in ðŽ to be negative, whilst also flipping the direction of the corresponding column in ð so that detð = 1
⢠This embeds the reflection into ðŽ and is called the polar-SVD (Irving et al. 2004)
Solving Linear Systems
⢠ðŽð = ð becomes ððŽððð = ð or ðŽ(ððð) = (ððð) or ðŽ Æžð = ð
⢠The unknowns ð are remapped into the space spanned by ð, and the right hand side ð is remapped into the space spanned by ð
⢠Every matrix is a diagonal matrix, when viewed in the right space
⢠Solve the diagonal system ðŽ Æžð = ð by dividing the entries of ð by the singular values ðð; then, ð = ð Æžð
⢠The SVD transforms the problem into an inherently diagonal space with eigenvectors along the coordinate axes
⢠Circles becoming ellipses (discussed earlier) is still problematic⢠Eccentricity is caused by ratios of singular values (since ð and ð are orthogonal matrices)
Condition Number
⢠The condition number of ðŽ is ðððð¥
ððððand measures closeness to being singular
⢠For a square matrix, it measures the difficulty in solving ðŽð = ð
⢠For a rectangular (and square) matrix, it measures how close the columns are to being linearly dependent⢠For a wide (rectangular) matrix, it ignores the extra columns that are guaranteed to be
linearly dependent (which is fine, because the associated variables lack any data)
⢠The condition number does not depend on the right hand side
⢠The condition number is always bigger than 1, and approaches â for nearly singular matrices
⢠Singular matrices have condition number equal to â, since ðððð = 0
Singular Matrices
⢠Diagonalize ðŽð = ð to ðŽ(ððð) = (ððð), e.g. ð1 00 ð2
Æžð1Æžð2
=ð1ð2
with Æžð1 =ð1
ð1, Æžð2 =
ð2
ð2
⢠Suppose ð1 â 0 and ð2 = 0; then, there is no unique solution:
⢠When ð2 = 0, there are infinite solutions for Æžð2 (but Æžð1 is still uniquely determined)
⢠When ð2 â 0, there is no solution for Æžð2, and ð is not in the range of ðŽ (but Æžð1 is still uniquely determined)
⢠Consider: ð1 00 ð20 0
Æžð1Æžð2
=
ð1ð2ð3
with Æžð1 =ð1
ð1, Æžð2 =
ð2
ð2
⢠When ð3 = 0, the last row adds no new information (one has extra redundant data)
⢠When ð3 â 0, the last row is false and there is no solution (but Æžð1and Æžð2 are still uniquely determined)
⢠Consider: ð1 0 00 ð2 0
Æžð1Æžð2Æžð3
=ð1ð2
with Æžð1 =ð1
ð1, Æžð2 =
ð2
ð2
⢠Infinite solutions work for Æžð3 (but Æžð1and Æžð2 are still uniquely determined)
Understanding Variables
⢠Consider any column ð of ðŽ
⢠When ðð â 0, one can state a value for Æžðð⢠When ðð = 0 or there is no ðð, then there is no information in the data for Æžðð
⢠This does not mean that other parameters cannot be adequately determined!
⢠Consider a row ð of ðŽ that is identically zero
⢠When ðð = 0, this row indicates that there is extra redundant data
⢠When ðð â 0, this row indicates that there is conflicting information in the data
⢠Conflicting information doesnât necessarily imply that all is lost, i.e. âno solutionâ; rather, it might merely mean that the data contains a bit of noise
⢠Regardless, in spite of any conflicting information, the determinable Æžðð represent the âbestâ that one can do
Norms
⢠Common norms: ð 1 = Ïð ðð , ð 2 = Ïð ðð2, ð â = max
ððð
⢠âAll norms are interchangeableâ is a theoretically valid statement (only)
⢠In practice, the âworst case scenarioâ (ð¿â) and the âaverageâ (ð¿1, ð¿2, etc.) are not interchangeable
⢠E.g. (100 people * 98.6ð + 1 person * 105ð)/(101 people) = 98.66ð
⢠Their average temperature is 98.66ð, but everyone is not âokâ
Matrix Norms
⢠Define the norm of a matrix ðŽ = maxðâ 0
ðŽð
ð, so:
⢠ðŽ 1 is the maximum absolute value column sum
⢠ðŽ â is the maximum absolute value row sum
⢠ðŽ 2 is the square root of the maximum eigenvalue of ðŽððŽ , i.e. the maximum singular value of ðŽ
⢠The condition number for solving (square matrix) ðŽð = ð is ðŽ 2 ðŽâ1 2
⢠Since ðŽâ1 = ðΣâ1ðð where Σâ1 has diagonal entries 1
ðð, ðŽâ1 2 =
1
ðððð
⢠Thus, ðŽ 2 ðŽâ1 2 =ðððð¥
ðððð
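A short numerical check of the last identity, assuming NumPy; the matrix is made up for illustration.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

s = np.linalg.svd(A, compute_uv=False)
lhs = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
rhs = s.max() / s.min()

print(lhs, rhs, np.isclose(lhs, rhs))   # both equal the 2-norm condition number
```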