dm week01 linreg.handout
TRANSCRIPT
Christof Monz, Informatics Institute
University of Amsterdam
Data Mining - Week 1: Linear Regression
Outline
• Plotting real-valued predictions
• Linear regression
• Error function
Linear Regression
• Predict real values (as opposed to discrete classes)
• A simple machine learning prediction task
• Assumes a linear correlation between the data and the target values
Scatter Plots
[Scatter plot of the training data: x on the horizontal axis (roughly 10 to 45), y on the vertical axis (roughly 10 to 40)]
Linear Regression
• Find the line that approximates the data as closely as possible
• y = a + b · x, where b is the slope and a is the y-intercept (see the small sketch below)
• a and b should be chosen such that they minimize the difference between the predicted values and the values in the training data
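As a tiny illustration of the roles of a and b, the following sketch predicts y for a few x values; the parameter values are made up:

```python
# Hypothetical y-intercept a and slope b (values are made up for illustration).
a, b = 2.0, 0.95

# Predict y = a + b * x for a few example inputs.
for x in [10, 20, 30]:
    print(x, a + b * x)
```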
Error Functions
• There are a number of ways to define an error function
• Sum of absolute errors = $\sum_{i \in D} |y_i - (a + bx_i)|$
• Sum of squared errors = $\sum_{i \in D} (y_i - (a + bx_i))^2$
  where $y_i$ is the true value
• Squared error is the most commonly used
• Task: find the parameters a and b that minimize the squared error over the training data (both error functions are computed in the sketch below)
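A minimal sketch of both error functions, assuming a made-up toy training set D of (x, y) pairs and a made-up candidate line:

```python
# Toy training set D and a candidate line y = a + b*x (all values are made up).
D = [(10, 12), (20, 21), (30, 33), (40, 38)]
a, b = 2.0, 0.95

# Sum of absolute errors: sum over i in D of |y_i - (a + b*x_i)|
sae = sum(abs(y - (a + b * x)) for x, y in D)

# Sum of squared errors: sum over i in D of (y_i - (a + b*x_i))^2
sse = sum((y - (a + b * x)) ** 2 for x, y in D)

print(sae, sse)
```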
Error Functions
• Normalized error functions (computed in the sketch below):
• Mean squared error = $\frac{\sum_{i \in D} (y_i - (a + bx_i))^2}{|D|}$
• Relative squared error = $\frac{\sum_{i \in D} (y_i - (a + bx_i))^2}{\sum_{i \in D} (y_i - \bar{y})^2}$
  where $\bar{y} = \frac{1}{|D|} \sum_{i \in D} y_i$
• Root relative squared error = $\sqrt{\frac{\sum_{i \in D} (y_i - (a + bx_i))^2}{\sum_{i \in D} (y_i - \bar{y})^2}}$
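The same kind of toy training set and candidate line as above (made up), plugged into the three normalized error functions:

```python
import math

# Toy training set D and a candidate line y = a + b*x (all values are made up).
D = [(10, 12), (20, 21), (30, 33), (40, 38)]
a, b = 2.0, 0.95

n = len(D)                                       # |D|
y_bar = sum(y for _, y in D) / n                 # mean of the true target values
sse = sum((y - (a + b * x)) ** 2 for x, y in D)  # sum of squared errors

mse = sse / n                                          # mean squared error
rse = sse / sum((y - y_bar) ** 2 for _, y in D)        # relative squared error
rrse = math.sqrt(rse)                                  # root relative squared error

print(mse, rse, rrse)
```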
Minimizing Error Functions
• There are roughly two ways:
  • Try different parameter instantiations and see which ones lead to the lowest error (search; a naive grid-search sketch is given below)
  • Solve mathematically (closed form)
• Most parameter estimation problems in machine learning can only be solved by searching
• For linear regression, we can solve it mathematically
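A naive sketch of the search approach, using a made-up grid of candidate parameters; the closed-form solution on the following slides makes this unnecessary for linear regression:

```python
# Toy training set (values are made up).
D = [(10, 12), (20, 21), (30, 33), (40, 38)]

def sse(a, b):
    """Sum of squared errors of the line y = a + b*x over D."""
    return sum((y - (a + b * x)) ** 2 for x, y in D)

# Try every (a, b) on a coarse grid and keep the pair with the lowest error.
best = None
for a_step in range(-50, 51):      # a in [-5.0, 5.0], step 0.1
    for b_step in range(0, 21):    # b in [0.0, 2.0], step 0.1
        a, b = a_step / 10, b_step / 10
        err = sse(a, b)
        if best is None or err < best[0]:
            best = (err, a, b)

print(best)   # (lowest error found, best a, best b)
```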
Minimizing SSE
• SSE = $\sum_{i \in D} (y_i - (a + bx_i))^2$
• Take the partial derivatives with respect to a and b
• Set each partial derivative equal to zero and solve for a and b respectively (the algebra is written out below)
• The resulting values for a and b minimize the error and can be used to predict unseen data instances
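One way to write out the algebra these steps refer to; the result is the formula given on the next slide:

```latex
\begin{align*}
\mathrm{SSE}(a,b) &= \sum_{i \in D} (y_i - a - b x_i)^2 \\
\frac{\partial\,\mathrm{SSE}}{\partial a} &= -2 \sum_{i \in D} (y_i - a - b x_i) = 0
  \;\Longrightarrow\; a = \bar{y} - b\,\bar{x} \\
\frac{\partial\,\mathrm{SSE}}{\partial b} &= -2 \sum_{i \in D} x_i (y_i - a - b x_i) = 0
  \;\Longrightarrow\; \sum_{i \in D} x_i y_i = a \sum_{i \in D} x_i + b \sum_{i \in D} x_i^2 \\
\text{Substituting } a = \bar{y} - b\,\bar{x}: \quad
  b &= \frac{|D| \sum_{i \in D} x_i y_i - \sum_{i \in D} x_i \sum_{i \in D} y_i}
            {|D| \sum_{i \in D} x_i^2 - \bigl(\sum_{i \in D} x_i\bigr)^2}
\end{align*}
```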
Applying Linear Regression
• For a given training set we first compute b:
  $b = \frac{|D| \sum_{i \in D} x_i y_i - \sum_{i \in D} x_i \sum_{i \in D} y_i}{|D| \sum_{i \in D} x_i^2 - (\sum_{i \in D} x_i)^2}$
• and then a, using the value computed for b: $a = \bar{y} - b\bar{x}$
• For any new instance $x'$ (i.e. an instance that was not in the training set), the predicted value is $a + bx'$ (see the sketch below)
• Extendible to multi-valued functions
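A minimal sketch of this recipe, assuming a made-up training set of (x, y) pairs:

```python
# Made-up training set D of (x, y) pairs.
D = [(10, 12), (20, 21), (30, 33), (40, 38)]
n = len(D)  # |D|

sum_x = sum(x for x, _ in D)
sum_y = sum(y for _, y in D)
sum_xy = sum(x * y for x, y in D)
sum_x2 = sum(x * x for x, _ in D)

# b = (|D|*sum(x_i*y_i) - sum(x_i)*sum(y_i)) / (|D|*sum(x_i^2) - (sum(x_i))^2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# a = y_bar - b * x_bar
a = sum_y / n - b * (sum_x / n)

# Predicted value for a new instance x' that was not in the training set.
x_new = 25
print(a, b, a + b * x_new)
```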
Linear Regression
• Used to predict real-number values, given numerical input variables
• Parameters can be estimated analytically (i.e. by applying some mathematics), which won't be the case for most parameter estimation algorithms we'll see later on
• Extendible to non-linear functions, e.g. log-linear regression
Correlation
• So far we have used linear regression to predict target values (prediction)
• Linear regression can also be used to determine how closely two variables are correlated (description)
• The smaller the error, the stronger the correlation between the variables (see the sketch below)
• Correlation does mean that there is some (interesting) relation between the variables, though not necessarily a causal one
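A sketch of that point on two made-up datasets: the more tightly the two variables are related, the smaller the relative squared error of the fitted line:

```python
def fit(D):
    """Least-squares a and b for the line y = a + b*x over dataset D."""
    n = len(D)
    sx, sy = sum(x for x, _ in D), sum(y for _, y in D)
    sxy = sum(x * y for x, y in D)
    sx2 = sum(x * x for x, _ in D)
    b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    return sy / n - b * (sx / n), b

def relative_squared_error(D):
    """Relative squared error of the least-squares line over D."""
    a, b = fit(D)
    y_bar = sum(y for _, y in D) / len(D)
    return (sum((y - (a + b * x)) ** 2 for x, y in D)
            / sum((y - y_bar) ** 2 for _, y in D))

tight = [(1, 2.1), (2, 3.9), (3, 6.0), (4, 8.1)]   # nearly linear relation
noisy = [(1, 4.0), (2, 1.0), (3, 7.0), (4, 3.0)]   # weak relation

print(relative_squared_error(tight))   # close to 0: strong correlation
print(relative_squared_error(noisy))   # close to 1: weak correlation
```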
Recap
• Linear regression
• Error rates
• Analytical parameter estimation