dm week01 linreg.handout
TRANSCRIPT
Christof Monz, Informatics Institute
University of Amsterdam
Data Mining - Week 1: Linear Regression
Outline
• Plotting real-valued predictions
• Linear regression
• Error function
Linear Regression
• Predict real values (as opposed to discrete classes)
• A simple machine learning prediction task
• Assumes a linear correlation between the data and the target values
Scatter Plots
[Scatter plot of the training data: x on the horizontal axis (roughly 10 to 45), y on the vertical axis (roughly 10 to 40)]
Linear Regression
• Find the line that approximates the data as closely as possible
• y = a + b · x, where b is the slope and a is the y-intercept (see the small sketch below)
• a and b should be chosen such that they minimize the difference between the predicted values and the values in the training data
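As a tiny illustration of the roles of a and b, the following sketch predicts y for a few x values; the parameter values are made up:

```python
# Hypothetical y-intercept a and slope b (values are made up for illustration).
a, b = 2.0, 0.95

# Predict y = a + b * x for a few example inputs.
for x in [10, 20, 30]:
    print(x, a + b * x)
```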
Error Functions
• There are a number of ways to define an error function
• Sum of absolute errors = $\sum_{i \in D} |y_i - (a + bx_i)|$
• Sum of squared errors = $\sum_{i \in D} (y_i - (a + bx_i))^2$
  where $y_i$ is the true value
• Squared error is the most commonly used
• Task: find the parameters a and b that minimize the squared error over the training data (both error functions are computed in the sketch below)
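A minimal sketch of both error functions, assuming a made-up toy training set D of (x, y) pairs and a made-up candidate line:

```python
# Toy training set D and a candidate line y = a + b*x (all values are made up).
D = [(10, 12), (20, 21), (30, 33), (40, 38)]
a, b = 2.0, 0.95

# Sum of absolute errors: sum over i in D of |y_i - (a + b*x_i)|
sae = sum(abs(y - (a + b * x)) for x, y in D)

# Sum of squared errors: sum over i in D of (y_i - (a + b*x_i))^2
sse = sum((y - (a + b * x)) ** 2 for x, y in D)

print(sae, sse)
```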
Error Functions
• Normalized error functions (computed in the sketch below):
• Mean squared error = $\frac{\sum_{i \in D} (y_i - (a + bx_i))^2}{|D|}$
• Relative squared error = $\frac{\sum_{i \in D} (y_i - (a + bx_i))^2}{\sum_{i \in D} (y_i - \bar{y})^2}$
  where $\bar{y} = \frac{1}{|D|} \sum_{i \in D} y_i$
• Root relative squared error = $\sqrt{\frac{\sum_{i \in D} (y_i - (a + bx_i))^2}{\sum_{i \in D} (y_i - \bar{y})^2}}$
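The same kind of toy training set and candidate line as above (made up), plugged into the three normalized error functions:

```python
import math

# Toy training set D and a candidate line y = a + b*x (all values are made up).
D = [(10, 12), (20, 21), (30, 33), (40, 38)]
a, b = 2.0, 0.95

n = len(D)                                       # |D|
y_bar = sum(y for _, y in D) / n                 # mean of the true target values
sse = sum((y - (a + b * x)) ** 2 for x, y in D)  # sum of squared errors

mse = sse / n                                          # mean squared error
rse = sse / sum((y - y_bar) ** 2 for _, y in D)        # relative squared error
rrse = math.sqrt(rse)                                  # root relative squared error

print(mse, rse, rrse)
```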
Minimizing Error Functions
• There are roughly two ways:
  • Try different parameter instantiations and see which ones lead to the lowest error (search; a naive grid-search sketch is given below)
  • Solve mathematically (closed form)
• Most parameter estimation problems in machine learning can only be solved by searching
• For linear regression, we can solve it mathematically
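A naive sketch of the search approach, using a made-up grid of candidate parameters; the closed-form solution on the following slides makes this unnecessary for linear regression:

```python
# Toy training set (values are made up).
D = [(10, 12), (20, 21), (30, 33), (40, 38)]

def sse(a, b):
    """Sum of squared errors of the line y = a + b*x over D."""
    return sum((y - (a + b * x)) ** 2 for x, y in D)

# Try every (a, b) on a coarse grid and keep the pair with the lowest error.
best = None
for a_step in range(-50, 51):      # a in [-5.0, 5.0], step 0.1
    for b_step in range(0, 21):    # b in [0.0, 2.0], step 0.1
        a, b = a_step / 10, b_step / 10
        err = sse(a, b)
        if best is None or err < best[0]:
            best = (err, a, b)

print(best)   # (lowest error found, best a, best b)
```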
Minimizing SSE
• SSE = $\sum_{i \in D} (y_i - (a + bx_i))^2$
• Take the partial derivatives with respect to a and b
• Set each partial derivative equal to zero and solve for a and b respectively (the algebra is written out below)
• The resulting values for a and b minimize the error and can be used to predict unseen data instances
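One way to write out the algebra these steps refer to; the result is the formula given on the next slide:

```latex
\begin{align*}
\mathrm{SSE}(a,b) &= \sum_{i \in D} (y_i - a - b x_i)^2 \\
\frac{\partial\,\mathrm{SSE}}{\partial a} &= -2 \sum_{i \in D} (y_i - a - b x_i) = 0
  \;\Longrightarrow\; a = \bar{y} - b\,\bar{x} \\
\frac{\partial\,\mathrm{SSE}}{\partial b} &= -2 \sum_{i \in D} x_i (y_i - a - b x_i) = 0
  \;\Longrightarrow\; \sum_{i \in D} x_i y_i = a \sum_{i \in D} x_i + b \sum_{i \in D} x_i^2 \\
\text{Substituting } a = \bar{y} - b\,\bar{x}: \quad
  b &= \frac{|D| \sum_{i \in D} x_i y_i - \sum_{i \in D} x_i \sum_{i \in D} y_i}
            {|D| \sum_{i \in D} x_i^2 - \bigl(\sum_{i \in D} x_i\bigr)^2}
\end{align*}
```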
Applying Linear Regression
• For a given training set we first compute b:
  $b = \frac{|D| \sum_{i \in D} x_i y_i - \sum_{i \in D} x_i \sum_{i \in D} y_i}{|D| \sum_{i \in D} x_i^2 - (\sum_{i \in D} x_i)^2}$
• and then a, using the value computed for b: $a = \bar{y} - b\bar{x}$
• For any new instance $x'$ (i.e. an instance that was not in the training set), the predicted value is $a + bx'$ (see the sketch below)
• Extendible to multi-valued functions
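A minimal sketch of this recipe, assuming a made-up training set of (x, y) pairs:

```python
# Made-up training set D of (x, y) pairs.
D = [(10, 12), (20, 21), (30, 33), (40, 38)]
n = len(D)  # |D|

sum_x = sum(x for x, _ in D)
sum_y = sum(y for _, y in D)
sum_xy = sum(x * y for x, y in D)
sum_x2 = sum(x * x for x, _ in D)

# b = (|D|*sum(x_i*y_i) - sum(x_i)*sum(y_i)) / (|D|*sum(x_i^2) - (sum(x_i))^2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# a = y_bar - b * x_bar
a = sum_y / n - b * (sum_x / n)

# Predicted value for a new instance x' that was not in the training set.
x_new = 25
print(a, b, a + b * x_new)
```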
Linear Regression
• Used to predict real-number values, given numerical input variables
• Parameters can be estimated analytically (i.e. by applying some mathematics), which won't be the case for most parameter estimation algorithms we'll see later on
• Extendible to non-linear functions, e.g. log-linear regression
Correlation
• So far we have used linear regression to predict target values (prediction)
• Linear regression can also be used to determine how closely two variables are correlated (description)
• The smaller the error, the stronger the correlation between the variables (see the sketch below)
• Correlation does mean that there is some (interesting) relation between the variables, though not necessarily a causal one
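A sketch of that point on two made-up datasets: the more tightly the two variables are related, the smaller the relative squared error of the fitted line:

```python
def fit(D):
    """Least-squares a and b for the line y = a + b*x over dataset D."""
    n = len(D)
    sx, sy = sum(x for x, _ in D), sum(y for _, y in D)
    sxy = sum(x * y for x, y in D)
    sx2 = sum(x * x for x, _ in D)
    b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    return sy / n - b * (sx / n), b

def relative_squared_error(D):
    """Relative squared error of the least-squares line over D."""
    a, b = fit(D)
    y_bar = sum(y for _, y in D) / len(D)
    return (sum((y - (a + b * x)) ** 2 for x, y in D)
            / sum((y - y_bar) ** 2 for _, y in D))

tight = [(1, 2.1), (2, 3.9), (3, 6.0), (4, 8.1)]   # nearly linear relation
noisy = [(1, 4.0), (2, 1.0), (3, 7.0), (4, 3.0)]   # weak relation

print(relative_squared_error(tight))   # close to 0: strong correlation
print(relative_squared_error(noisy))   # close to 1: weak correlation
```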
Recap
• Linear regression
• Error rates
• Analytical parameter estimation