
  • U Kang 1

    Large Scale Data Analysis Using Deep Learning

    Linear Algebra

    U Kang, Seoul National University

  • U Kang 2

    In This Lecture

    Overview of linear algebra (but not a comprehensive survey)

    Focused on the subset most relevant to deep learning

    For details on linear algebra, refer to Linear Algebra and Its Applications by Gilbert Strang

  • U Kang 3

    Scalar

    A scalar is a single number

    Integers, real numbers, rational numbers, etc.

    We denote it with an italic font: a, n, x

  • U Kang 4

    Vectors

    A vector is a 1-D array of numbers

    Can be real, binary, integer, etc.

    Notation for type and size: a vector of n real numbers is written x ∈ R^n

  • U Kang 5

    Matrices

    A matrix is a 2-D array of numbers

    Example notation for type and shape: a matrix with m rows and n columns is written A ∈ R^(m x n)

  • U Kang 6

    Tensors

    A tensor is an array of numbers that may have:
    Zero dimensions, and be a scalar
    One dimension, and be a vector
    Two dimensions, and be a matrix
    Three dimensions or more, e.g., a matrix over time, or a knowledge base of (subject, verb, object) triples, ...

  • U Kang 7

    Matrix Transpose

    (A^T)_{i,j} = A_{j,i}

    (AB)^T = B^T A^T
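    A quick NumPy check of both identities; a minimal sketch with arbitrary example matrices (not from the slides):

      import numpy as np

      A = np.array([[1., 2.], [3., 4.], [5., 6.]])  # 3 x 2
      B = np.array([[1., 0., 2.], [0., 1., 3.]])    # 2 x 3

      # (A^T)_{i,j} = A_{j,i}: transposing swaps row and column indices
      assert A.T[0, 1] == A[1, 0]

      # (AB)^T = B^T A^T: the transpose of a product reverses the order
      assert np.allclose((A @ B).T, B.T @ A.T)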

  • U Kang 8

    Matrix Product

    C = AB, where C_{i,j} = Σ_k A_{i,k} B_{k,j}

  • U Kang 9

    Matrix Product

    Matrix product as a sum of outer products

    Product with a diagonal matrix
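    A minimal NumPy sketch of both facts (the matrices are my own examples):

      import numpy as np

      A = np.random.rand(4, 3)
      B = np.random.rand(3, 5)

      # C = AB equals the sum of outer products of A's columns with B's rows
      C = sum(np.outer(A[:, k], B[k, :]) for k in range(3))
      assert np.allclose(C, A @ B)

      # Multiplying by a diagonal matrix scales rows (from the left)
      # or columns (from the right)
      d = np.array([2., 3., 5.])
      assert np.allclose(np.diag(d) @ B, d[:, None] * B)  # scales rows of B
      assert np.allclose(A @ np.diag(d), A * d)           # scales columns of A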

  • U Kang 10

    Identity Matrix

    Example identity matrix: I_3 = [1 0 0; 0 1 0; 0 0 1]

  • U Kang 11

    Systems of Equations

    Ax = b expands to:

    A_{1,1} x_1 + ... + A_{1,n} x_n = b_1
    ...
    A_{m,1} x_1 + ... + A_{m,n} x_n = b_m

  • U Kang 12

    Solving Systems of Equations

    A linear system of equations can have:

    No solution: 3x + 2y = 6, 3x + 2y = 12

    Many solutions: 3x + 2y = 6, 6x + 4y = 12

    Exactly one solution: this means multiplication by the matrix is an invertible function: Ax = b ⇒ x = A^-1 b
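    A minimal NumPy sketch of the one-solution case (the system is my own example); in practice np.linalg.solve is preferred over forming A^-1 explicitly:

      import numpy as np

      A = np.array([[3., 2.], [1., -1.]])
      b = np.array([6., 1.])

      x = np.linalg.solve(A, b)                    # solves Ax = b directly
      assert np.allclose(A @ x, b)
      assert np.allclose(x, np.linalg.inv(A) @ b)  # same result as x = A^-1 b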

  • U Kang 13

    Matrix Inversion

    Matrix inverse: an inverse A^-1 of an n x n matrix A satisfies

    A A^-1 = A^-1 A = I_n

    Solving a system using an inverse: Ax = b ⇒ x = A^-1 b

  • U Kang 14

    Invertibility

    Matrix A cannot be inverted if:
    More rows than columns
    More columns than rows
    Redundant rows/columns ("linearly dependent", "low rank", i.e., less than full rank)
    The number 0 is an eigenvalue of A

    A non-invertible matrix is called a singular matrix. An invertible matrix is called non-singular.

  • U Kang 15

    Linear Dependence and Span

    Linear combination of vectors {v^(1), ..., v^(n)}: Σ_i c_i v^(i)

    The span of a set of vectors is the set of all points obtainable by linear combination of the original vectors

    The matrix-vector product Ax can be viewed as a linear combination of the column vectors of A: Ax = Σ_i x_i A_{:,i}

    The span of the columns of A is called the column space or range of A

    A set of vectors is linearly independent if no vector in the set is a linear combination of the other vectors; otherwise, the set is linearly dependent

  • U Kang 16

    Rank of a Matrix

    Rank of a matrix A: the number of linearly independent columns (or rows) of A

    For example: what is the rank of the example matrix A [matrix not preserved in this transcript]?

    Why?
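    Since the slide's example matrix was lost, here is a hypothetical stand-in whose third column is the sum of the first two; a minimal NumPy sketch:

      import numpy as np

      # Hypothetical example: column 3 = column 1 + column 2, so only two
      # columns are linearly independent and the rank is 2
      A = np.array([[1., 0., 1.],
                    [0., 1., 1.],
                    [2., 3., 5.]])
      print(np.linalg.matrix_rank(A))  # 2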

  • U Kang 17

    Norms

    Functions that measure how “large” a vector is

    Similar to a distance between zero and the point represented by the vector

    Formally, a norm is any function f that satisfies the following:
    f(x) = 0 ⇒ x = 0
    f(x + y) ≤ f(x) + f(y) (the triangle inequality)
    f(αx) = |α| f(x) for every scalar α

  • U Kang 18

    Norms

    Lp norm: ||x||_p = (Σ_i |x_i|^p)^(1/p)

    Most popular norm: L2 norm (p = 2), the Euclidean distance
    For a matrix/tensor, the Frobenius norm (L_F) does the same thing

    L1 norm (p = 1): called the 'Manhattan' distance

    Max norm (infinite p): ||x||_∞ = max_i |x_i|

    Dot product in terms of norms: x^T y = ||x||_2 ||y||_2 cos θ, where θ is the angle between x and y
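    A minimal NumPy sketch of these norms (the vectors are my own examples):

      import numpy as np

      x = np.array([3., -4.])

      print(np.linalg.norm(x, 1))       # L1: |3| + |-4| = 7
      print(np.linalg.norm(x, 2))       # L2: sqrt(9 + 16) = 5
      print(np.linalg.norm(x, np.inf))  # max norm: max(|3|, |-4|) = 4

      A = np.array([[1., 2.], [3., 4.]])
      print(np.linalg.norm(A, 'fro'))   # Frobenius norm for a matrix

      # Dot product in terms of norms: x^T y = ||x||_2 ||y||_2 cos(theta)
      y = np.array([4., 3.])
      cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
      print(cos_theta)                  # (12 - 12) / 25 = 0: x and y are orthogonal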

  • U Kang 19

    Special Matrices and Vectors

    Diagonal matrix

    2 by 2 example: [a 0; 0 b]

    We use diag(v) to denote the diagonal matrix where v is the vector containing the diagonal elements

    The inverse of a diagonal matrix is computed easily: diag(v)^-1 = diag([1/v_1, ..., 1/v_n])
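    A minimal NumPy sketch (the vector is my own example): inverting diag(v) only takes elementwise reciprocals:

      import numpy as np

      v = np.array([2., 4., 5.])
      D = np.diag(v)

      # diag(v)^-1 = diag(1/v): no general matrix inversion needed
      assert np.allclose(np.linalg.inv(D), np.diag(1.0 / v))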

  • U Kang 20

    Special Matrices and Vectors

    Unit vector: a vector with unit norm, ||x||_2 = 1

    Symmetric matrix: A = A^T

    Orthogonal matrix: a square matrix whose rows and columns are mutually orthonormal, so A^T A = A A^T = I and A^-1 = A^T

  • U Kang 21

    Eigenvector and Eigenvalue

    Eigenvector and eigenvalue of A: a non-zero vector v and a scalar λ such that Av = λv

    Eigenvalue/eigenvector pairs are defined only for a square matrix A ∈ R^(n x n)

    # of eigenvalues: n (there may be duplicates)

    Eigenvalues and eigenvectors can contain real or complex numbers

  • U Kang 22

    Intuition

    A as vector transformation

    Ax = x':
    [2 1; 1 3] [1; 0] = [2; 1]

  • U Kang 23

    Intuition

    By definition, eigenvectors remain parallel to themselves ('fixed points'):

    A v1 = λ1 v1:
    [2 1; 1 3] [0.52; 0.85] = 3.62 × [0.52; 0.85]

  • U Kang 24

    Eigendecomposition

    Eigendecomposition of a matrix: let A be a square (n x n) matrix with n linearly independent eigenvectors (= diagonalizable). Then A can be factorized as

    A = V diag(λ) V^-1

    where V is an (n x n) matrix whose i-th column is the i-th eigenvector of A, and λ is the vector of the corresponding eigenvalues

  • U Kang 25

    Eigendecomposition

    Every real symmetric matrix has a real, orthogonal eigendecomposition

    A real orthogonal matrix is a square matrix with real entries whose columns and rows are orthogonal unit vectors (orthonormal vectors): Q^T Q = Q Q^T = I, i.e., Q^T = Q^-1. For a real symmetric A, the decomposition is A = Q Λ Q^T with Q orthogonal and Λ diagonal.
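    A minimal NumPy sketch of this orthogonal eigendecomposition, reusing the slides' example A = [2 1; 1 3]:

      import numpy as np

      A = np.array([[2., 1.], [1., 3.]])      # real symmetric

      lam, Q = np.linalg.eigh(A)              # eigh is for symmetric matrices
      assert np.allclose(Q.T @ Q, np.eye(2))  # Q is orthogonal: Q^T Q = I
      assert np.allclose(Q @ np.diag(lam) @ Q.T, A)  # A = Q Lambda Q^T
      print(lam)                              # ~ [1.38, 3.62]; 3.62 matches the λ1 above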

  • U Kang 26

    Eigendecomposition

    Interpreting matrix-vector multiplication Ax using eigendecomposition

  • U Kang 27

    Eigendecomposition

    Understanding optimization of f(x) = x^T A x, subject to ||x||_2 = 1, using eigenvalues and eigenvectors: for a symmetric A, the maximum of f is the largest eigenvalue of A, attained at the corresponding eigenvector, and the minimum is the smallest eigenvalue

  • U Kang 28

    Eigendecomposition

    Positive definite matrix: a real symmetric matrix whose eigenvalues are all positive
    Positive semidefinite matrix: all eigenvalues are zero or positive
    Negative definite matrix: all eigenvalues are negative
    Negative semidefinite matrix: all eigenvalues are zero or negative

    For a positive semidefinite matrix A, ∀x, x^T A x ≥ 0. For a positive definite matrix A, ∀x ≠ 0, x^T A x > 0

  • U Kang 29

    Singular Value Decomposition (SVD)

    Similar to eigendecomposition

    More general; matrix need not be square

    A = U Λ V^T

  • U Kang 30

    SVD - Definition

    A [n x m] = U [n x r] Λ [r x r] (V [m x r])^T

    A: n x m matrix (e.g., n documents, m terms)

    U: n x r matrix (n documents, r concepts); left singular vectors

    Λ: r x r diagonal matrix (strength of each 'concept'; r: rank of the matrix)

    V: m x r matrix (m terms, r concepts); right singular vectors

  • U Kang 31

    SVD - Definition

    A [n x m] = U [n x r] Λ [r x r] (V [m x r])^T

    [Figure: the n x m matrix A written as the product U (n x r) × Λ (r x r) × V^T (r x m)]

  • U Kang 32

    SVD - Properties

    THEOREM [Press+92]: it is always possible to decompose a matrix A into A = U Λ V^T, where

    U, Λ, V: unique (*)

    U, V: column orthonormal (i.e., columns are unit vectors, orthogonal to each other): U^T U = I; V^T V = I (I: identity matrix)

    Λ: diagonal; the singular values are positive and sorted in decreasing order

  • U Kang 33

    SVD - Properties

    A ≈ U Λ V^T = Σ_i σ_i u_i v_i^T

    Best rank-k approximation in L_F: keeping the k largest singular values (and the corresponding singular vectors) gives the best rank-k approximation of A in the Frobenius norm

    [Figure: the n x m matrix A approximated by a truncated U Λ V^T]

  • U Kang 34

    SVD and eigendecomposition

    The left-singular vectors of A are the eigenvectors of A A^T

    The right-singular vectors of A are the eigenvectors of A^T A

    The non-zero singular values of A are the square roots of the eigenvalues of A^T A
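    A minimal NumPy sketch verifying these relationships on a random matrix:

      import numpy as np

      A = np.random.rand(5, 3)
      U, s, Vt = np.linalg.svd(A, full_matrices=False)

      # Singular values squared are the eigenvalues of A^T A
      eigvals = np.linalg.eigvalsh(A.T @ A)   # returned in ascending order
      assert np.allclose(np.sort(s**2), eigvals)

      # U, s, Vt reconstruct A
      assert np.allclose(A, U @ np.diag(s) @ Vt)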

  • U Kang 35

    Moore-Penrose Pseudoinverse

    Assume we want to solve Ax = y for x
    What if A is not invertible? What if A is not square?

    Still, we can find the 'best' x by using the pseudoinverse

    For an (n x m) matrix A, the pseudoinverse A^+ is an (m x n) matrix

  • U Kang 36

    Moore-Penrose Pseudoinverse

    If the equation has:

    Exactly one solution: this is the same as the inverse

    No solution: this gives us the solution with the smallest error ||Ax − y||_2 (the over-specified case)

    Many solutions: this gives us the solution with the smallest norm of x (the under-specified case)

  • U Kang 37

    Pseudoinverse: Over-specified case

    No solution: this gives us the solution with the smallest error ||Ax − y||_2

    [3 2]^T [x] = [1 2]^T (i.e., 3x = 1, 2x = 2)

    ([3 2]^T)^+ [1 2]^T = … = 7/13
    This method is called 'least squares'

    [Figure: the line of reachable points (3x, 2x) and the desired point y = (1, 2)]
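    A minimal NumPy check of this worked example:

      import numpy as np

      A = np.array([[3.], [2.]])   # the over-specified system 3x = 1, 2x = 2
      y = np.array([1., 2.])

      x = np.linalg.pinv(A) @ y    # least-squares solution via the pseudoinverse
      print(x)                     # [0.5385...] = 7/13, minimizing ||Ax - y||_2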

  • U Kang 38

    Pseudoinverse: Under-specified case

    Many solutions: this gives us the solution with the smallest norm of x

    [1 2] [w z]^T = 4 (i.e., w + 2z = 4)

    [1 2]^+ · 4 = [1/5 2/5]^T · 4 = [4/5 8/5]^T

    [Figure: the line of all possible solutions of w + 2z = 4 in the (w, z) plane, and the shortest-length solution (4/5, 8/5)]
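    And a minimal NumPy check of the under-specified example:

      import numpy as np

      A = np.array([[1., 2.]])     # the under-specified equation w + 2z = 4
      y = np.array([4.])

      x = np.linalg.pinv(A) @ y    # minimum-norm solution via the pseudoinverse
      print(x)                     # [0.8, 1.6] = [4/5, 8/5]
      assert np.allclose(A @ x, y)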

  • U Kang 39

    Computing the Pseudoinverse

    The SVD A = U Λ V^T allows the computation of the pseudoinverse: A^+ = V Λ^+ U^T, where Λ^+ is obtained by taking the reciprocal of each non-zero element of Λ and transposing the result
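    A minimal NumPy sketch of this recipe, checked against np.linalg.pinv:

      import numpy as np

      A = np.random.rand(4, 3)
      U, s, Vt = np.linalg.svd(A, full_matrices=False)

      # Lambda^+ : reciprocate the non-zero singular values, leave zeros as zeros
      s_plus = np.zeros_like(s)
      nz = s > 1e-12
      s_plus[nz] = 1.0 / s[nz]

      A_plus = Vt.T @ np.diag(s_plus) @ U.T   # A^+ = V Lambda^+ U^T
      assert np.allclose(A_plus, np.linalg.pinv(A))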

  • U Kang 40

    Trace

    Tr(A) = Σ_i A_{i,i}
    Tr(ABC) = Tr(CAB) = Tr(BCA) (the trace is invariant under cyclic permutation)
    Tr(A) = Tr(A^T)

    ||A||_F = sqrt(Tr(A A^T))
    For a scalar a, Tr(a) = a

    The trace of a matrix also equals the sum of its eigenvalues
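    A minimal NumPy sketch of these identities on random matrices:

      import numpy as np

      A, B, C = (np.random.rand(3, 3) for _ in range(3))

      assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))  # cyclic
      assert np.isclose(np.trace(A), np.trace(A.T))
      assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.trace(A @ A.T)))
      assert np.isclose(np.trace(A), np.sum(np.linalg.eigvals(A)).real)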

  • U Kang 41

    What you need to know

    Linear algebra is crucial for understanding notations and mechanisms of many ML/deep learning methods

    Important concepts: matrix product, identity, invertibility, norm, eigendecomposition, singular value decomposition, pseudoinverse, and trace

  • U Kang 42

    Questions?
