
Introduction to Machine Learning: Linear Regression

Bhaskar Mukhoty, Shivam Bansal

Indian Institute of Technology Kanpur, Summer School 2019

May 29, 2019

Introduction

Machine learning is a tool to extend human intelligence.

Problems involving large amounts of real-time data need the assistance of computers to process, e.g. astronomical or sub-particle-level data, fraud detection.

It is used for automating tasks, e.g. self-driving cars.


Future Applications

Potential future uses in law.

It is fun to see top players defeated by algorithms, e.g. in Go.

Health care.


Course Goals

We will focus on a theoretical understanding of common machine learning algorithms.

The mathematical tools needed will be introduced.

Students are encouraged to ask questions, to clarify even the simplest doubts.


Course Policy

Attendance will be taken in class.

Passing a one-hour written exam is required for successful completion of the course.

Course website: https://www.cse.iitk.ac.in/users/bhaskarm/IntroToML.html


Lecture Outline

Systems of linear equations.

Over-determined systems with one or no solution.

Addressing the case of no solution.

Ordinary least squares regression.

The maximum likelihood estimate and consistency.

Under-determined systems and rank deficiency.

Ridge regression.


System of linear equations

Given $X \in \mathbb{R}^{n \times d}$, $y \in \mathbb{R}^n$ and $n \geq d$:

\[
\underbrace{\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1d} \\
x_{21} & x_{22} & \cdots & x_{2d} \\
\vdots & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nd}
\end{pmatrix}}_{X}
\underbrace{\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_d \end{pmatrix}}_{w}
=
\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}}_{y}
\qquad \{y_i = x_i^\top w\}_{i=1}^{n}
\]

Question: find $w \in \mathbb{R}^d$, e.g. by Gaussian elimination.

Figure: Over-determined system


Example: An Over Determined System

w2 + w3 = y1

w2 + w4 = y2

w1 + w2 = y3

w3 + w4 = y4

w1 + w3 = y5

w1 + w4 = y6

\[
\begin{pmatrix}
0 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{pmatrix}
\]


Example: An Over Determined System

Gaussian elimination on the augmented matrix:

\[
\left(\begin{array}{cccc|c}
0 & 1 & 1 & 0 & y_1 \\
0 & 1 & 0 & 1 & y_2 \\
1 & 1 & 0 & 0 & y_3 \\
0 & 0 & 1 & 1 & y_4 \\
1 & 0 & 1 & 0 & y_5 \\
1 & 0 & 0 & 1 & y_6
\end{array}\right)
\longrightarrow
\left(\begin{array}{cccc|c}
1 & 0 & 0 & 0 & -\tfrac{1}{2} y_1 - \tfrac{1}{2} y_2 + y_3 + \tfrac{1}{2} y_4 \\
0 & 1 & 0 & 0 & \tfrac{1}{2} y_1 + \tfrac{1}{2} y_2 - \tfrac{1}{2} y_4 \\
0 & 0 & 1 & 0 & \tfrac{1}{2} y_1 - \tfrac{1}{2} y_2 + \tfrac{1}{2} y_4 \\
0 & 0 & 0 & 1 & -\tfrac{1}{2} y_1 + \tfrac{1}{2} y_2 + \tfrac{1}{2} y_4 \\
0 & 0 & 0 & 0 & y_2 + y_5 - y_3 - y_4 \\
0 & 0 & 0 & 0 & y_1 + y_6 - y_3 - y_4
\end{array}\right)
\]

One solution vs. no solution:

The system of equations has a solution only if:

\[
y_2 + y_5 - y_3 - y_4 = 0 \qquad \text{and} \qquad y_1 + y_6 - y_3 - y_4 = 0
\]

https://math.stackexchange.com/questions/1860348/solve-an-overdetermined-system-of-linear-equations
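The elimination above can be checked numerically. A minimal sketch in NumPy, using the coefficient matrix from the example and a right-hand side built to satisfy the two consistency conditions (the ground-truth vector `w_true` is illustrative):

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [0, 1, 0, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0],
              [1, 0, 0, 1]], dtype=float)

# Build a consistent y by fixing a ground-truth w
w_true = np.array([1.0, 2.0, 3.0, 4.0])
y = A @ w_true

# rank(A) = 4, so the six equations leave two consistency conditions on y
print(np.linalg.matrix_rank(A))       # 4
print(y[1] + y[4] - y[2] - y[3])      # first condition: 0 for a consistent y
print(y[0] + y[5] - y[2] - y[3])      # second condition: 0 for a consistent y

# Least squares recovers w exactly when the conditions hold
w_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(w_hat, w_true))     # True
```

If the conditions fail (e.g. perturb one entry of `y`), no exact solution exists, which motivates the least-squares treatment below.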


The No Solution Regime

There exists no model $w$ such that $Xw = y$ exactly, where

\[
X = \begin{pmatrix} -2 \\ -1 \\ 1 \\ 3 \end{pmatrix}, \qquad
y = X w^* + e = \begin{pmatrix} -2 \\ -1 \\ 1 \\ 3 \end{pmatrix}
+ \begin{pmatrix} 0 \\ 0.1 \\ 0 \\ -0.2 \end{pmatrix}
= \begin{pmatrix} -2 \\ -0.9 \\ 1 \\ 2.8 \end{pmatrix}
\qquad (w^* = 1)
\]

What to do now?

Minimize the squared loss.

Figure: Linear system with no solution


The squared loss

\[
L(w) = \sum_{i=1}^{n} (y_i - x_i^\top w)^2 = \|y - Xw\|^2
\]

Penalizes higher residuals more.

Symmetric loss function.

Can be derived from Gaussian error.

For the example, the residual is

\[
y - Xw = \begin{pmatrix} -2 \\ -0.9 \\ 1 \\ 2.8 \end{pmatrix}
- \begin{pmatrix} -2 \\ -1 \\ 1 \\ 3 \end{pmatrix} w
\]

Figure: Squared Loss


The squared loss: continued

\[
L(w) = (-2 + 2w)^2 + (-0.9 + w)^2 + (1 - w)^2 + (2.8 - 3w)^2
\]

\[
\frac{dL}{dw} = 4(-2 + 2w) + 2(-0.9 + w) - 2(1 - w) - 6(2.8 - 3w) = 30w - 28.6
\]

Equating $\frac{dL}{dw} = 0$, we have $w = \frac{28.6}{30} \approx 0.953$.

\[
\underbrace{\begin{pmatrix} -2 \\ -1 \\ 1 \\ 3 \end{pmatrix}}_{X} (0.953)
= \underbrace{\begin{pmatrix} -1.91 \\ -0.95 \\ 0.95 \\ 2.86 \end{pmatrix}}_{y_{\text{pred}}}
\approx \underbrace{\begin{pmatrix} -2 \\ -0.9 \\ 1 \\ 2.8 \end{pmatrix}}_{y}
\]
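The one-dimensional minimizer above has a closed form and is easy to verify numerically. A minimal sketch, assuming the data from the example:

```python
import numpy as np

X = np.array([-2.0, -1.0, 1.0, 3.0])
y = np.array([-2.0, -0.9, 1.0, 2.8])

# Closed-form 1-D least squares: w = <X, y> / <X, X>
w = (X @ y) / (X @ X)
print(round(w, 4))          # 0.9533

# Predictions are close to the observed y
print(np.round(X * w, 2))   # [-1.91 -0.95  0.95  2.86]
```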


Ordinary Least Squares Regression

\[
L(w) = \sum_{i=1}^{n} (y_i - x_i^\top w)^2 = \|y - Xw\|^2
\]

\[
L(w) = (y - Xw)^\top (y - Xw) = y^\top y - 2 w^\top X^\top y + w^\top X^\top X w
\]

\[
\frac{dL}{dw} = -2 X^\top y + 2 X^\top X w = 0
\]

The Least Squares Estimate:

\[
w_{\mathrm{OLS}} = \arg\min_{w \in \mathbb{R}^d} L(w) = (X^\top X)^{-1} X^\top y
\]
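The closed form translates directly into code. A minimal sketch in NumPy on synthetic data (the data and `w_true` are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

# Normal equations: w_ols = (X^T X)^{-1} X^T y
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent solution, numerically preferable in practice
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_ols, w_lstsq))   # True
```

Solving the normal equations with `np.linalg.solve` avoids forming an explicit inverse, and `lstsq` (based on an SVD) is the standard robust route.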


The Gaussian Distribution

\[
p(y_i; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( \frac{-(y_i - \mu)^2}{2\sigma^2} \right)
\]

The Assumption of Independent Errors

Assuming $e_i \sim \mathcal{N}(0, \sigma^2)$,

\[
y_i = x_i^\top w^* + e_i
\]

\[
p(y_i \mid x_i^\top w, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( \frac{-(y_i - x_i^\top w)^2}{2\sigma^2} \right)
\]

Our objective is to maximize:

\[
p(y \mid Xw, \sigma^2 I_n) = \prod_{i=1}^{n} p(y_i \mid x_i^\top w, \sigma^2)
\]


The maximum likelihood estimate

\[
\begin{aligned}
w_{\mathrm{MLE}} &= \arg\max_{w} \prod_{i=1}^{n} p(y_i \mid x_i^\top w, \sigma^2) \\
&= \arg\max_{w} \log \prod_{i=1}^{n} p(y_i \mid x_i^\top w, \sigma^2) \\
&= \arg\min_{w} -\log \prod_{i=1}^{n} p(y_i \mid x_i^\top w, \sigma^2) \\
&= \arg\min_{w} \mathrm{NLL}(w) \qquad \text{NLL: Negative Log Likelihood}
\end{aligned}
\]


The equivalence of MLE and OLS

\[
\begin{aligned}
\mathrm{NLL}(w) &= -\log \prod_{i=1}^{n} p(y_i \mid x_i^\top w, \sigma^2)
= -\sum_{i=1}^{n} \log p(y_i \mid x_i^\top w, \sigma^2) \\
&= -\sum_{i=1}^{n} \log\left( \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( \frac{-(y_i - x_i^\top w)^2}{2\sigma^2} \right) \right) \\
&= \sum_{i=1}^{n} \log\left( \sqrt{2\pi\sigma^2} \right) + \sum_{i=1}^{n} \frac{(y_i - x_i^\top w)^2}{2\sigma^2}
\end{aligned}
\]

\[
w_{\mathrm{MLE}} = \arg\min_{w} \mathrm{NLL}(w) = \arg\min_{w} \sum_{i=1}^{n} (y_i - x_i^\top w)^2 = w_{\mathrm{OLS}}
\]
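The equivalence can also be seen numerically: the negative log likelihood, evaluated over a grid of candidate weights, bottoms out at the OLS estimate. A minimal sketch on illustrative synthetic 1-D data with $\sigma = 1$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 1))
y = X @ np.array([2.0]) + rng.normal(size=n)

def nll(w, sigma2=1.0):
    # Negative log likelihood under i.i.d. Gaussian errors
    resid = y - X @ np.array([w])
    return n * np.log(np.sqrt(2 * np.pi * sigma2)) + np.sum(resid**2) / (2 * sigma2)

# OLS estimate in closed form
w_ols = float(np.linalg.lstsq(X, y, rcond=None)[0])

# The NLL over a grid is minimized at (approximately) w_ols
grid = np.linspace(w_ols - 1, w_ols + 1, 2001)
w_min = grid[np.argmin([nll(w) for w in grid])]
print(abs(w_min - w_ols) < 1e-3)   # True
```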


Consistency of MLE

$y = Xw^* + e$ for unknown $w^* \in \mathbb{R}^d$.

\[
\begin{aligned}
w_{\mathrm{MLE}} &= (X^\top X)^{-1} X^\top y \\
&= (X^\top X)^{-1} X^\top (X w^* + e) \\
&= w^* + (X^\top X)^{-1} X^\top e
\end{aligned}
\]

\[
\|w_{\mathrm{MLE}} - w^*\|_2 = \left\| (X^\top X)^{-1} X^\top e \right\|_2 = O\!\left( \sigma \sqrt{\frac{d}{n}} \right)
\]

Consistency

With more data points, we get closer to $w^*$.


Consistency of MLE - Implementation
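The implementation shown on this slide is not preserved in the transcript. A minimal simulation sketching the consistency claim, with all names and constants illustrative: the estimation error shrinks as $n$ grows, roughly like $\sigma\sqrt{d/n}$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, sigma = 5, 1.0
w_star = rng.normal(size=d)

def mle_error(n):
    # Draw a dataset of size n and return ||w_MLE - w*||
    X = rng.normal(size=(n, d))
    y = X @ w_star + sigma * rng.normal(size=n)
    w_mle = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.linalg.norm(w_mle - w_star)

# Average error over repeated trials for small and large n
errors = {n: np.mean([mle_error(n) for _ in range(20)]) for n in (100, 10000)}
print(errors[10000] < errors[100])   # True
```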


Drawback of OLS

We know that:

\[
w_{\mathrm{OLS}} = \arg\min_{w \in \mathbb{R}^d} L(w) = (X^\top X)^{-1} X^\top y
\]

Fact

If $\mathrm{rank}(A) = p$ and $\mathrm{rank}(B) = q$, then $\mathrm{rank}(AB) \leq \min(p, q)$.

If $n < d$, then $\mathrm{rank}(X) \leq n < d$, so $X^\top X$ has rank less than $d$.

That is, $X^\top X$ is not invertible.

Solution

Regularization.


Ridge Regression

\[
w_{\mathrm{ridge}} = \arg\min_{w \in \mathbb{R}^d} \left\{ \|y - Xw\|^2 + \lambda \|w\|^2 \right\}
\]

Exercise:

Show that
\[
w_{\mathrm{ridge}} = (X^\top X + \lambda I_d)^{-1} X^\top y
\]
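The closed form in the exercise can be sketched in NumPy. A minimal example with $n < d$, the rank-deficient regime where plain OLS fails (the data and $\lambda$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, lam = 10, 20, 1.0          # under-determined: n < d
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# X^T X is singular here, but X^T X + lambda * I_d is invertible
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print(np.linalg.matrix_rank(X.T @ X) < d)   # True: OLS normal equations fail
print(w_ridge.shape)                        # (20,)
```

Adding $\lambda I_d$ shifts every eigenvalue of $X^\top X$ up by $\lambda > 0$, which is exactly why the regularized matrix is always invertible.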


Questions?
