
Page 1: CS231A PS1 ProblemSection

CS 231A Section 2: Math background for PS1

Jiahui Shi

2011.10.7

Page 2: CS231A PS1 ProblemSection

Announcement

• Updated deadline of PS1:

10/14, 12 noon

• Extra office hour for PS1:

10/13, 7‐9pm, Gates 104


Page 3: CS231A PS1 ProblemSection

Overview

• Maximizing and Minimizing quadratic forms

• Least squares

• Linear filtering

• Maximum likelihood estimation


Page 4: CS231A PS1 ProblemSection

Quadratic Forms

• Quadratic forms look like $x^T A x$ (A is symmetric),

or if we have a 2x2 matrix,

$x^T A x = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = a_{11} x_1^2 + 2 a_{12} x_1 x_2 + a_{22} x_2^2$
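As a quick numerical sanity check (a minimal numpy sketch with arbitrary illustrative values, not part of the original slides), the matrix-product form and the expanded 2x2 formula give the same value:

import numpy as np

# Symmetric 2x2 matrix and a test vector (values chosen arbitrarily)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = np.array([0.5, -1.0])

# Quadratic form evaluated as a matrix product
quad = x @ A @ x

# Same value from the expanded 2x2 formula
expanded = A[0, 0] * x[0]**2 + 2 * A[0, 1] * x[0] * x[1] + A[1, 1] * x[1]**2

print(quad, expanded)  # both print 2.5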


Page 5: CS231A PS1 ProblemSection

Quadratic Forms in Optimization

• We will look at the optimization problem $\min_{x} x^T A x$ subject to $\|x\|_2 = 1$ for a symmetric matrix $A$; this will be useful for PS1.

• This can be solved using the eigenvector corresponding to the smallest eigenvalue.

• Similarly, the eigenvector of the largest eigenvalue will give the max value.


Page 6: CS231A PS1 ProblemSection

Optimizing Quadratic Forms

• Formulate the Lagrangian: $\mathcal{L}(x, \lambda) = x^T A x - \lambda (x^T x - 1)$

• Now we take the partial with respect to x: $\frac{\partial \mathcal{L}}{\partial x} = 2 A x - 2 \lambda x = 0$


Page 7: CS231A PS1 ProblemSection

Optimizing Quadratic Forms

• Using this condition, we know that our optimal x's must satisfy $A x = \lambda x$

• Our possible optimizing vectors are just the eigenvectors of $A$

• Plugging in: $x^T A x = x^T (\lambda x) = \lambda\, x^T x = \lambda$

• The constraint on the matrix is pretty stringent: the matrix must be square and symmetric.
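A minimal numpy sketch of this result (with a small symmetric matrix chosen only for illustration): the unit eigenvector for the smallest eigenvalue minimizes $x^T A x$, and the one for the largest eigenvalue maximizes it.

import numpy as np

# Symmetric test matrix (illustrative values)
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 2.0],
              [0.0, 2.0, 5.0]])

# eigh returns eigenvalues in ascending order with orthonormal eigenvectors
eigvals, eigvecs = np.linalg.eigh(A)
x_min = eigvecs[:, 0]   # eigenvector of the smallest eigenvalue
x_max = eigvecs[:, -1]  # eigenvector of the largest eigenvalue

print(x_min @ A @ x_min, eigvals[0])   # equal: the minimum of x^T A x over unit x
print(x_max @ A @ x_max, eigvals[-1])  # equal: the maximum

# Random unit vectors never beat the eigenvector extremes
for _ in range(1000):
    x = np.random.randn(3)
    x /= np.linalg.norm(x)
    v = x @ A @ x
    assert eigvals[0] - 1e-9 <= v <= eigvals[-1] + 1e-9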


Page 8: CS231A PS1 ProblemSection

Linear Regression

• Fit a linear model to data

• PS1 - use a least squares approach to fit a model to our data with minimum total error.


Page 9: CS231A PS1 ProblemSection

Least Squares

• Given A and y, find the optimal x, s.t. Ax = y.
• The residual is defined as $r = A x - y$

• Assume A is skinny (#rows > #columns, i.e., more equations than unknowns) and full rank.

• We want to minimize the squared 2-norm of the residual: $\|r\|_2^2 = \|A x - y\|_2^2$


Page 10: CS231A PS1 ProblemSection

Least Squares

• To minimize $\|A x - y\|_2^2$,

take the gradient with respect to x and set it equal to zero:

$\nabla_x \|A x - y\|_2^2 = 2 A^T (A x - y) = 0$

• Solving:

$A^T A x = A^T y \;\Rightarrow\; x = (A^T A)^{-1} A^T y$
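A small numpy sketch (illustrative data, not from the problem set) comparing the normal-equations solution with np.linalg.lstsq:

import numpy as np

rng = np.random.default_rng(0)

# Skinny system: 50 equations, 3 unknowns (illustrative data)
A = rng.standard_normal((50, 3))
x_true = np.array([1.0, -2.0, 0.5])
y = A @ x_true + 0.1 * rng.standard_normal(50)

# Normal equations: x = (A^T A)^{-1} A^T y
x_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Library solver (more numerically robust than forming A^T A explicitly)
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(x_normal, x_lstsq))  # True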


Page 11: CS231A PS1 ProblemSection

Geometric Approach

• Pick $x_{ls}$ by requiring the residual $A x_{ls} - y$ to be orthogonal to the range of A: $A^T (A x_{ls} - y) = 0$

thus we have $A^T A\, x_{ls} = A^T y$

since $A$ is skinny (#rows > #columns) and full rank, $A^T A$ is invertible and we can write $x_{ls} = (A^T A)^{-1} A^T y$
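A brief numpy check of this geometric picture (with illustrative data): the least-squares residual is orthogonal to every column of A.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3))   # illustrative skinny, full-rank matrix
y = rng.standard_normal(50)

x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
residual = A @ x_ls - y

# A^T r is (numerically) zero: the residual is orthogonal to range(A)
print(np.allclose(A.T @ residual, 0, atol=1e-10))  # True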


Page 12: CS231A PS1 ProblemSection

Least Squares Solution

• When there is an optimization problem of the form $\min_x \|A x - y\|_2^2$,

we can immediately write down the solution: $x^{*} = (A^T A)^{-1} A^T y$

• The trick is to get the problem into that form; to better illustrate this, let's see an example.
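One common instance of this reduction (a hypothetical illustration, not necessarily the example on the next slides): fitting a line $y = m t + b$ to points $(t_i, y_i)$ becomes least squares by stacking the points into A and y.

import numpy as np

# Illustrative data points roughly on the line y = 2t + 1
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Stack into A x = y with x = [m, b]: each row is [t_i, 1]
A = np.column_stack([t, np.ones_like(t)])

m, b = np.linalg.lstsq(A, y, rcond=None)[0]
print(m, b)  # close to 2 and 1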


Page 13: CS231A PS1 ProblemSection

Example Problem


Page 14: CS231A PS1 ProblemSection

Example Problem

• Solution:


Page 15: CS231A PS1 ProblemSection

Linear Filtering

• Remember that convolution is linear

• Convolution properties: commutative, associative, distributive, shift-invariance

• As a result:
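A quick numpy sketch (illustrative 1-D signals) verifying the commutative, associative, and distributive properties numerically:

import numpy as np

rng = np.random.default_rng(2)
f = rng.standard_normal(5)   # illustrative 1-D signals/filters
g = rng.standard_normal(7)
h = rng.standard_normal(7)

conv = lambda a, b: np.convolve(a, b, mode="full")

print(np.allclose(conv(f, g), conv(g, f)))                     # commutative
print(np.allclose(conv(conv(f, g), h), conv(f, conv(g, h))))   # associative
print(np.allclose(conv(f, g + h), conv(f, g) + conv(f, h)))    # distributive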


Page 16: CS231A PS1 ProblemSection

Linear Filtering

• Trig identities are very useful for the problem set and for analyzing linear filters in general
– Useful for the problem set:
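For reference, a few standard identities that commonly come up when analyzing sinusoids through linear filters (listed here as an assumption of which identities are meant, not taken from the original slide):

$\cos(a \pm b) = \cos a \cos b \mp \sin a \sin b$
$\sin(a \pm b) = \sin a \cos b \pm \cos a \sin b$
$\cos a \cos b = \tfrac{1}{2}\left[\cos(a - b) + \cos(a + b)\right]$
$\sin a \cos b = \tfrac{1}{2}\left[\sin(a + b) + \sin(a - b)\right]$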


Page 17: CS231A PS1 ProblemSection

Likelihood function

• You have a model which is parameterized by $\theta$, which characterizes the probability of each of the outputs $y^{(i)}$

• This model is also a function of the observed input data $x^{(i)}$

• The model is given by $p(y^{(i)} \mid x^{(i)}; \theta)$, which reads: the probability of $y^{(i)}$ given $x^{(i)}$ and parameterized by $\theta$

• Assuming all our data is i.i.d., we can write the probability of all our data as $p(y \mid X; \theta) = \prod_i p(y^{(i)} \mid x^{(i)}; \theta)$


Page 18: CS231A PS1 ProblemSection

Likelihood function

• This distribution describes the probability of the output data given our input data and the model

• We want to find the model $\theta$, to make predictions in the future

• To find our model we use a likelihood function

• The likelihood is defined as $L(\theta) = p(y \mid X; \theta) = \prod_i p(y^{(i)} \mid x^{(i)}; \theta)$,

which is a function of $\theta$


Page 19: CS231A PS1 ProblemSection

Maximum likelihood estimation

• Once we have $L(\theta)$, we would like to maximize it over $\theta$

• This is known as maximum likelihood (ML) estimation

• To make the maximization easier, while still solving an equivalent optimization problem, we will maximize $\log L(\theta)$ instead of trying to directly maximize $L(\theta)$
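A minimal numpy illustration (Gaussian data assumed only for this sketch) of why the log is used: the product of many small densities underflows, while the sum of log-densities stays well behaved, and both are maximized by the same $\theta$.

import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(loc=2.0, scale=1.0, size=10_000)  # illustrative i.i.d. data

mu = 2.0  # candidate parameter
densities = np.exp(-(y - mu) ** 2 / 2) / np.sqrt(2 * np.pi)

print(np.prod(densities))         # underflows to 0.0 -- useless for optimization
print(np.sum(np.log(densities)))  # finite log-likelihood, same maximizer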


Page 20: CS231A PS1 ProblemSection

Maximum a posteriori (MAP)

• Very similar to maximum likelihood, but now we view our model $\theta$ as a random variable

• This allows us to put a prior distribution $p(\theta)$ over what values $\theta$ can take on

• With this prior we can write the probability of our data as $p(y, \theta \mid X) = p(y \mid X; \theta)\, p(\theta)$

• We can maximize the same way as maximum likelihood; in fact, maximum likelihood is a MAP problem with a uniform prior
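As a small illustration of the difference (a hypothetical coin-flip example with a Beta prior, not from the original slides): the ML estimate of a coin's heads probability is the raw frequency, while the MAP estimate is pulled toward the prior.

import numpy as np

rng = np.random.default_rng(4)
flips = rng.random(20) < 0.7   # illustrative data: 20 flips of a biased coin
heads, n = flips.sum(), flips.size

# ML estimate: maximize the likelihood theta^heads * (1 - theta)^(n - heads)
theta_ml = heads / n

# MAP estimate with a Beta(a, b) prior (closed form for the Beta-Bernoulli model)
a, b = 2.0, 2.0
theta_map = (heads + a - 1) / (n + a + b - 2)

print(theta_ml, theta_map)  # MAP is shrunk toward the prior mean of 0.5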


Page 21: CS231A PS1 ProblemSection

Solving MAP and ML estimate

• Write down the likelihood $p(y \mid X; \theta)$ (times the prior $p(\theta)$ for MAP)

• Now take the log

• Now maximize by taking the gradient and setting equal to zero
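A compact numpy sketch of this recipe (assuming a Gaussian model with known unit variance, chosen only for illustration): write down the log-likelihood, and check that its gradient vanishes at the closed-form maximizer, the sample mean.

import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(loc=3.0, scale=1.0, size=500)  # illustrative i.i.d. data

def log_likelihood(mu):
    # log p(y | mu) for a unit-variance Gaussian, dropping the constant term
    return -0.5 * np.sum((y - mu) ** 2)

def grad_log_likelihood(mu):
    # d/dmu of the log-likelihood: sum_i (y_i - mu)
    return np.sum(y - mu)

# Setting the gradient to zero gives mu_ml = mean(y)
mu_ml = y.mean()
print(grad_log_likelihood(mu_ml))                            # ~0 (up to floating point)
print(log_likelihood(mu_ml) >= log_likelihood(mu_ml + 0.1))  # True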
