![Page 1: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/1.jpg)
COMPUTATIONAL INTELLIGENCE SEW (INTRODUCTION TO MACHINE LEARNING) SS18
Lecture 2:
• Linear Regression
• Gradient Descent
• Non-linear basis functions
![Page 2: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/2.jpg)
Practical of Friday rescheduled to Monday, 11:00 to 12:00 and 12:00 to 13:00
News group: tu-graz.lv.ew
![Page 3: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/3.jpg)
LINEAR REGRESSION
MOTIVATION
![Page 4: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/4.jpg)
Why Linear Regression?
• Simplest machine learning algorithm for regression
• Widely used in biological, behavioural and social sciences to describe and to extract relationships between variables from data
• Prediction of real-valued outputs
• Easy to implement, fast to execute
• Benchmark algorithm for comparison with more complex algorithms
• Introduction to notation and concepts that we will need again later in the course
  • Data format, vector & matrix notation
  • Learning from data by minimizing a cost function
  • Gradient descent
  • Non-linear features and basis functions
• Preparation for neural networks
![Page 5: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/5.jpg)
Applications of (linear) regression
• Brain computer interfaces
• https://www.youtube.com/watch?v=Ae6En8-eaww
• Neuroprosthetic control
• https://www.youtube.com/watch?v=X_AI4MiY6L4
![Page 6: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/6.jpg)
LINEAR REGRESSION
WITH ONE INPUT
![Page 7: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/7.jpg)
A regression problem
• We want to learn to predict a person's height based on his/her knee height and/or arm span
• This is useful for patients who are bed bound or in a wheelchair and cannot stand to take an accurate measurement of their height

| Knee height [cm] | Arm span [cm] | Height [cm] |
|---|---|---|
| 50 | 166 | 171 |
| 56 | 172 | 175 |
| 52 | 174 | 168 |
| … | … | … |
![Page 8: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/8.jpg)
Linear regression with one input
• Training set → learning algorithm → "hypothesis" $h$ (with its parameters)
• Test input $x$ → hypothesis $h$ → prediction
[Figure: scatter plot of body height against knee height, with question marks at inputs whose body height is to be predicted]
![Page 9: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/9.jpg)
Example Data
| Knee height [cm] | Arm span [cm] | Height [cm] |
|---|---|---|
| 50 | 166 | 171 |
| 56 | 172 | 175 |
| 52 | 174 | 168 |
| … | … | … |

[Figure: two scatter plots, body height against knee height and body height against arm span]
m = 30 data points
![Page 10: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/10.jpg)
Example Data
[Figure: 3D scatter plot of body height against knee height and arm span, for the same data table (knee height, arm span and height in cm)]
![Page 11: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/11.jpg)
Linear regression with one input
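| Knee height [cm] | Height [cm] |
|---|---|
| 50 | 171 |
| 56 | 175 |
| 52 | 168 |
| … | … |

[Figure: scatter plot of body height against knee height]
• Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
• Parameters: $\theta_0, \theta_1$ = ?
• Which hypothesis is better? In what sense is it better?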
![Page 12: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/12.jpg)
Formalization of problem
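• Given $m$ training examples $(x^{(i)}, y^{(i)})$, $i = 1, \dots, m$
• Goal: learn parameters $\theta_0, \theta_1$ such that $h_\theta(x^{(i)}) \approx y^{(i)}$ for all training examples i = 1…30.

| Knee height [cm] | Height [cm] |
|---|---|
| 50 | 171 |
| 56 | 175 |
| 52 | 168 |
| … | … |

m = 30 data points
[Figure: scatter plot of body height against knee height]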
![Page 13: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/13.jpg)
Least Squares Objective
• Minimize error
[Figure: scatter plot of body height against knee height with the line for $\theta_1 = 0.6$, $\theta_0 = 150$]
![Page 14: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/14.jpg)
Least Squares Objective
• Minimize error
[Figure: scatter plot of body height against knee height with the line for $\theta_1 = 0.6$, $\theta_0 = 150$]
• Cost function (mean squared error): $J(\theta_0, \theta_1) = 10.77$
![Page 15: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/15.jpg)
Least Squares Objective
• Minimize error
[Figure: scatter plot of body height against knee height with the line for $\theta_1 = 0.75$, $\theta_0 = 140$]
• Cost function (mean squared error): $J(\theta_0, \theta_1) = 5.94$
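The comparison of two hypotheses by their cost can be sketched in a few lines of Python. This is a minimal illustration, assuming the cost is the plain mean of the squared errors (a 1/m normalization; the slide's exact constant is not visible in this transcript). It uses only the three rows visible in the table, not the full 30-point dataset, so the printed values will not match the 10.77 and 5.94 shown on the slides. The function name `mse_cost` is made up for this example.

```python
import numpy as np

# The three rows visible in the slide's table (knee height -> body height, in cm).
knee = np.array([50.0, 56.0, 52.0])
height = np.array([171.0, 175.0, 168.0])

def mse_cost(theta0, theta1, x, y):
    """Mean squared error of the line h(x) = theta0 + theta1 * x.

    Assumption: the cost is normalized by 1/m (plain mean of squared errors).
    """
    predictions = theta0 + theta1 * x
    errors = predictions - y
    return np.mean(errors ** 2)

cost_a = mse_cost(150.0, 0.6, knee, height)   # first hypothesis from the slides
cost_b = mse_cost(140.0, 0.75, knee, height)  # second hypothesis from the slides
print(cost_a, cost_b)
```

On these three rows the second hypothesis also has the lower cost, which is the sense in which it is "better".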
![Page 16: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/16.jpg)
Cost function illustrated
Properties of cost function:
• Quadratic function
• "Bowl"-shaped
• Unique local and global minimum (under "regular" conditions)
[Figure: illustration of the cost function, alongside the two example fits of body height against knee height with costs 10.77 and 5.94]
![Page 17: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/17.jpg)
Minimizing the cost
• Two ways to find the parameters $\theta_0, \theta_1$ minimizing the cost $J(\theta_0, \theta_1)$:
  • Gradient descent
  • Direct analytical solution (setting derivatives = 0)
![Page 18: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/18.jpg)
Recall: Functions of multiple variables
• Example:
• Partial derivatives
• Gradient: the vector formed from the partial derivatives (fundamental in lecture 2)
• Chain rule (fundamental for neural networks in lecture 4)
• Functions of multiple variables with vector-valued (high-dimensional) outputs
• Jacobian matrix: the matrix formed from the partial derivatives
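As a short worked illustration of these concepts, here is a hypothetical function chosen for this note (it is not the example from the original slide, which is not recoverable from the transcript):

```latex
% Hypothetical example function of two variables
f(x_1, x_2) = x_1^2 + 3 x_1 x_2

% Partial derivatives and gradient
\frac{\partial f}{\partial x_1} = 2 x_1 + 3 x_2, \qquad
\frac{\partial f}{\partial x_2} = 3 x_1, \qquad
\nabla f(x_1, x_2) = \begin{pmatrix} 2 x_1 + 3 x_2 \\ 3 x_1 \end{pmatrix}

% Chain rule along the curve x_1(t) = t, x_2(t) = t^2
\frac{d f}{d t}
  = \frac{\partial f}{\partial x_1} \frac{d x_1}{d t}
  + \frac{\partial f}{\partial x_2} \frac{d x_2}{d t}
  = (2t + 3t^2) \cdot 1 + 3t \cdot 2t
  = 2t + 9t^2

% Jacobian of the vector-valued function g(x_1, x_2) = (x_1 x_2, \; x_1 + x_2)
J_g(x_1, x_2) = \begin{pmatrix} x_2 & x_1 \\ 1 & 1 \end{pmatrix}
```

The chain-rule result can be checked directly: substituting the curve gives $f(t) = t^2 + 3t^3$, whose derivative is $2t + 9t^2$.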
![Page 19: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/19.jpg)
GRADIENT DESCENT
![Page 20: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/20.jpg)
Descending in the steepest direction
Gradient descent on some arbitrary cost function …
![Page 21: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/21.jpg)
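Gradient descent algorithm
• Repeat until convergence:
  $\theta_j := \theta_j - \eta \, \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ (simultaneously updating $\theta_0$ and $\theta_1$)
• $\eta$: learning rate ("eta")
• $\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$: partial derivative of $J$ with respect to $\theta_j$
• Negative gradient = descent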
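The update rule can be sketched in Python as follows. This is a minimal illustration on a made-up bowl-shaped cost (the functions `cost` and `gradient` are hypothetical, not from the lecture), and it runs a fixed number of steps instead of testing for convergence.

```python
import numpy as np

def cost(theta):
    """Hypothetical bowl-shaped cost with its minimum at (1, -2)."""
    return (theta[0] - 1.0) ** 2 + 2.0 * (theta[1] + 2.0) ** 2

def gradient(theta):
    """Vector of partial derivatives of the cost above."""
    return np.array([2.0 * (theta[0] - 1.0), 4.0 * (theta[1] + 2.0)])

def gradient_descent(grad_fn, theta_init, eta=0.1, n_steps=200):
    """Repeatedly step against the gradient with learning rate eta."""
    theta = np.asarray(theta_init, dtype=float)
    for _ in range(n_steps):
        # The whole gradient is evaluated at the old theta before any component
        # changes, which is what "simultaneous update" means.
        theta = theta - eta * grad_fn(theta)
    return theta

theta_final = gradient_descent(gradient, [0.0, 0.0])
print(theta_final, cost(theta_final))
```

Computing the full gradient vector first and then subtracting it in one step avoids the common bug of updating $\theta_0$ and then using the new $\theta_0$ when computing the update for $\theta_1$.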
![Page 22: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/22.jpg)
Gradient is orthogonal to contour lines
[Figure: surface plot and contour plot of a cost function over two parameters]
A contour line is a line along which $J$ = const
![Page 23: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/23.jpg)
Potential issues with gradient descent
• May get stuck in local minima
• Learning rate too small: slow convergence
• Learning rate too large: oscillations, divergence
[Figure: two panels showing gradient descent with a learning rate that is too small and one that is too large]
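The effect of the learning rate can be reproduced on a one-dimensional toy cost. The sketch below assumes $J(\theta) = \theta^2$ (chosen for this note, not taken from the slides), so each step multiplies $\theta$ by $1 - 2\eta$; the three learning rates are illustrative values.

```python
def run_gradient_descent(eta, n_steps=20, theta_start=1.0):
    """Gradient descent on J(theta) = theta**2, whose gradient is 2 * theta."""
    theta = theta_start
    trajectory = [theta]
    for _ in range(n_steps):
        theta = theta - eta * 2.0 * theta
        trajectory.append(theta)
    return trajectory

too_small = run_gradient_descent(eta=0.01)  # creeps towards the minimum at 0
moderate = run_gradient_descent(eta=0.4)    # converges quickly
too_large = run_gradient_descent(eta=1.1)   # overshoots, changes sign each step, grows

for name, traj in [("too small", too_small), ("moderate", moderate), ("too large", too_large)]:
    print(f"{name:>9}: final theta = {traj[-1]:.4g}")
```

With $\eta = 1.1$ the factor $1 - 2\eta = -1.2$ has magnitude above 1, so the iterate flips sign and grows on every step; this is the oscillation and divergence named on the slide.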
![Page 24: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/24.jpg)
LINEAR REGRESSION
WITH GRADIENT
DESCENT (ONE INPUT)
![Page 25: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/25.jpg)
Application of gradient descent
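• Linear regression cost: $J(\theta_0, \theta_1)$, the mean squared error of $h_\theta(x) = \theta_0 + \theta_1 x$ on the training set
• Gradient descent (simultaneous update of $\theta_0$ and $\theta_1$):
  $\theta_0 := \theta_0 - \eta \, \frac{\partial J}{\partial \theta_0}$, with $\frac{\partial J}{\partial \theta_0} \propto \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)$
  $\theta_1 := \theta_1 - \eta \, \frac{\partial J}{\partial \theta_1}$, with $\frac{\partial J}{\partial \theta_1} \propto \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) \, x^{(i)}$
• Each update combines the "learning rate" $\eta$, the "error" $h_\theta(x^{(i)}) - y^{(i)}$ and the "input" $x^{(i)}$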
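For reference, the updates can be derived explicitly once a normalization of the cost is fixed. The derivation below assumes the cost is the mean of the squared errors (factor 1/m); this constant is an assumption of this note, since the slide's exact formula is not recoverable from the transcript.

```latex
% Assumed normalization: J is the mean of the squared errors (factor 1/m).
J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2

% Partial derivatives (chain rule applied to each squared term)
\frac{\partial J}{\partial \theta_0} = \frac{2}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)
\qquad
\frac{\partial J}{\partial \theta_1} = \frac{2}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right) x^{(i)}

% Resulting simultaneous updates
\theta_0 := \theta_0 - \eta \, \frac{2}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)
\qquad
\theta_1 := \theta_1 - \eta \, \frac{2}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
```

If the cost is defined with a factor 1/(2m) instead, the factor 2 cancels and the updates carry 1/m; the difference can be absorbed into the learning rate $\eta$.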
![Page 26: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/26.jpg)
Predicting height from knee height
• Optimal fit to training data: $\theta_1 = 0.8$, $\theta_0 = 137.4$
[Figure: scatter plot of body height against knee height with the fitted line]
![Page 27: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/27.jpg)
LINEAR REGRESSION
MORE GENERAL FORMULATION: MULTIPLE FEATURES
![Page 28: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/28.jpg)
Multiple inputs (features)
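• Notation:
  • $m$ … number of training examples
  • $n$ … number of features
  • $x^{(i)}$ … input features of i-th training example (vector-valued)
  • $x_j^{(i)}$ … value of feature j in i-th training example

| Knee height $x_1$ | Arm span $x_2$ | Age $x_3$ | Height $y$ |
|---|---|---|---|
| 50 | 166 | 32 | 171 |
| 56 | 172 | 17 | 175 |
| 52 | 174 | 62 | 168 |
| … | … | … | … |

• Example: $n = 3$, $x^{(2)} = (56, 172, 17)^T$, $x_3^{(2)} = 17$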
![Page 29: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/29.jpg)
Linear hypothesis
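• Hypothesis (one input): $h_\theta(x) = \theta_0 + \theta_1 x$
• Hypothesis (multiple input features): $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$
  Example: h(x) = 50 + 0.5*kneeheight + 0.3*armspan + 0.1*age
• More compact notation: introduce $x_0 = 1$, so that $h_\theta(x) = \sum_{j=0}^{n} \theta_j x_j = \theta^T x$
  Why? Notation convenience!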
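The notational convenience of $x_0 = 1$ shows up directly in code: the intercept no longer needs special treatment. A minimal sketch, using the slide's example parameters and the second row of the feature table (variable names are chosen for this note):

```python
import numpy as np

# Parameters of the example hypothesis: [theta_0, theta_1, theta_2, theta_3].
theta = np.array([50.0, 0.5, 0.3, 0.1])

# Features of the second row of the table: knee height, arm span, age.
features = np.array([56.0, 172.0, 17.0])

# Prepend x_0 = 1 so that the intercept theta_0 is handled by the same dot product.
x = np.concatenate(([1.0], features))

prediction = theta @ x
explicit = 50 + 0.5 * 56 + 0.3 * 172 + 0.1 * 17
print(prediction, explicit)
```

Both expressions compute the same value; the dot product form carries over unchanged to any number of features.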
![Page 30: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/30.jpg)
Multiple inputs (features) revisited
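• Notation:
  • $m$ … number of training examples
  • $n$ … number of features
  • $x^{(i)}$ … input features of i-th training example (vector-valued)
  • $x_j^{(i)}$ … value of feature j in i-th training example

| $x_0$ | Knee height $x_1$ | Arm span $x_2$ | Age $x_3$ | Height $y$ |
|---|---|---|---|---|
| 1 | 50 | 166 | 32 | 171 |
| 1 | 56 | 172 | 17 | 175 |
| 1 | 52 | 174 | 62 | 168 |
| 1 | … | … | … | … |

• Example: $n = 3$, $x^{(2)} = (1, 56, 172, 17)^T$, $x_3^{(2)} = 17$, $x_0^{(2)} = 1$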
![Page 31: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/31.jpg)
Matrix and vector notation
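| $x_0$ | Knee height $x_1$ | Arm span $x_2$ | Age $x_3$ | Height $y$ |
|---|---|---|---|---|
| 1 | 50 | 166 | 32 | 171 |
| 1 | 56 | 172 | 17 | 175 |
| 1 | 52 | 174 | 62 | 168 |

• $x^{(i)}$: features of i-th training example, $(n+1) \times 1$
• $X$: design matrix, $m \times (n+1)$, whose i-th row is $(x^{(i)})^T$
• $y$: output/target vector, $m \times 1$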
![Page 32: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/32.jpg)
Matrix and vector notation
For the same data table ($x_0$, knee height $x_1$, arm span $x_2$, age $x_3$, height $y$), the hypothesis values for all training examples are obtained at once:
$H(\theta) = X\theta$
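The matrix form can be sketched with NumPy using the three table rows shown on the slides. The parameter vector is borrowed from the earlier example hypothesis (h(x) = 50 + 0.5*kneeheight + 0.3*armspan + 0.1*age) purely for illustration; the variable names are chosen for this note.

```python
import numpy as np

# Raw features from the table: knee height, arm span, age (one row per training example).
features = np.array([
    [50.0, 166.0, 32.0],
    [56.0, 172.0, 17.0],
    [52.0, 174.0, 62.0],
])
y = np.array([171.0, 175.0, 168.0])  # output/target vector, shape (m,)

m, n = features.shape
# Design matrix: a leading column of ones (x_0 = 1) followed by the n features.
X = np.hstack([np.ones((m, 1)), features])  # shape (m, n + 1)

# Example parameter vector [theta_0, ..., theta_3], used here only for illustration.
theta = np.array([50.0, 0.5, 0.3, 0.1])

predictions = X @ theta  # one hypothesis value per training example
print(X.shape, predictions)
```

A single matrix-vector product replaces the loop over training examples, which is why the design matrix is used throughout the remainder of the lecture.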
![Page 33: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/33.jpg)
LINEAR REGRESSION
WITH GRADIENT
DESCENT (GENERAL FORMULATION)
![Page 34: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/34.jpg)
Linear regression problem statement
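• Hypothesis: $h_\theta(x) = \theta^T x$
• Cost function: $J(\theta)$, the mean squared error between the predictions $X\theta$ and the targets $y$; a high-dimensional quadratic ("bowl"-shaped) function
Goal is to find parameters $\theta$ which minimize the cost $J(\theta)$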
![Page 35: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/35.jpg)
Gradient descent (multiple features)
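• With one input feature (simultaneous update):
  $\theta_0 := \theta_0 - \eta \, \frac{\partial J}{\partial \theta_0}$, with $\frac{\partial J}{\partial \theta_0} \propto \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)$
  $\theta_1 := \theta_1 - \eta \, \frac{\partial J}{\partial \theta_1}$, with $\frac{\partial J}{\partial \theta_1} \propto \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) \, x^{(i)}$
• With n input features (simultaneous update for j = 0…n):
  $\theta_j := \theta_j - \eta \, \frac{\partial J}{\partial \theta_j}$, with $\frac{\partial J}{\partial \theta_j} \propto \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) \, x_j^{(i)}$
• For j = 0: define $x_0^{(i)} = 1$ for convenience
• In both cases the update combines the "learning rate" $\eta$, the "error" $h_\theta(x^{(i)}) - y^{(i)}$ and the "input" $x_j^{(i)}$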
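A vectorized version of this update can be sketched as follows. It assumes the cost $J(\theta) = \frac{1}{m} \lVert X\theta - y \rVert^2$ (the 1/m normalization is an assumption of this note), so the gradient is $\frac{2}{m} X^T (X\theta - y)$. The data are synthetic and noise-free, and the function name `gradient_descent_linreg` is made up for the example.

```python
import numpy as np

def gradient_descent_linreg(X, y, eta=0.1, n_steps=2000):
    """Minimize J(theta) = (1/m) * ||X @ theta - y||^2 by gradient descent."""
    m, n_params = X.shape
    theta = np.zeros(n_params)
    for _ in range(n_steps):
        errors = X @ theta - y                 # "error" for every training example
        gradient = (2.0 / m) * (X.T @ errors)  # sum over examples of error * input
        theta = theta - eta * gradient         # simultaneous update of all theta_j
    return theta

# Synthetic, noise-free data (an assumption for illustration): y = 2 + 3 * x.
x = np.linspace(-1.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])  # the column of ones is x_0 = 1
y = 2.0 + 3.0 * x

theta = gradient_descent_linreg(X, y)
print(theta)
```

The inputs here lie in [-1, 1]. With raw inputs such as knee heights around 50 cm, the same learning rate would diverge, so in practice the features are usually rescaled or a much smaller $\eta$ is chosen.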
![Page 36: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/36.jpg)
LINEAR REGRESSION
ANALYTICAL SOLUTION
![Page 37: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/37.jpg)
Analytical solution
• $X$ … design matrix
• $y$ … output/target vector
• Set all partial derivatives of cost function = 0
• Solving system of linear equations yields: $\theta^* = (X^T X)^{-1} X^T y$
• $(X^T X)^{-1} X^T$ is the Moore-Penrose pseudoinverse of $X$
• Note: This analytical solution requires that columns of $X$ are linearly independent ("regular" conditions)
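The analytical solution can be sketched with NumPy in two equivalent ways: solving the linear system $(X^T X)\theta = X^T y$, and applying the pseudoinverse of $X$. The data below are hypothetical toy data generated from known parameters, not the lecture's dataset.

```python
import numpy as np

# Hypothetical toy data (not the lecture's dataset): targets generated from
# y = 1 + 2 * x_1 - 0.5 * x_2, so the exact parameters are known.
rng = np.random.default_rng(0)
features = rng.normal(size=(30, 2))
X = np.hstack([np.ones((30, 1)), features])  # design matrix, shape (30, 3)
true_theta = np.array([1.0, 2.0, -0.5])
y = X @ true_theta

# Normal equations (X^T X) theta = X^T y, solved without forming the inverse.
theta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Same result through the Moore-Penrose pseudoinverse of X.
theta_pinv = np.linalg.pinv(X) @ y

print(theta_solve, theta_pinv)
```

Solving the system with `np.linalg.solve` is generally preferable to computing the inverse explicitly; `np.linalg.pinv` additionally returns a (minimum-norm) solution when the columns of $X$ are not linearly independent.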
![Page 38: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/38.jpg)
Example: analytical solution applied
to problem with one input
| Knee height [cm] | Height [cm] |
|---|---|
| 50 | 171 |
| 56 | 175 |
| 52 | 168 |
| … | … |

[Figure: scatter plot of body height against knee height]
![Page 39: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/39.jpg)
Example: analytical solution applied
to problem with one input
| Knee height [cm] | Height [cm] |
|---|---|
| 50 | 171 |
| 56 | 175 |
| 52 | 168 |
| … | … |

• $X$: 30 × 2, $y$: 30 × 1
• $X^T X$: 2 × 2, $(X^T X)^{-1}$: 2 × 2
• $\theta^*$: 2 × 1
![Page 40: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/40.jpg)
Predicting height from knee height
[Figure: scatter plot of body height against knee height with the fitted line]
$\theta_1 = 0.8$, $\theta_0 = 137.4$
![Page 41: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/41.jpg)
Gradient descent
• Need to choose learning rate $\eta$
• Iterative algorithm (needs many iterations to converge)
• Works well even when number of input features $n$ is large

Analytical solution
• No need to choose $\eta$
• Direct solution (no iteration)
• Slow if $n$ is too large (inverting n x n matrix)
![Page 42: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/42.jpg)
NON-LINEAR FEATURES
(NON-LINEAR BASIS FUNCTIONS)
![Page 43: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/43.jpg)
Non-linear trends in data
| x | y |
|---|---|
| 0.01 | -0.27 |
| -1.22 | 2.63 |
| 0.17 | -0.13 |
| … | … |

[Figure: scatter plots of y against x showing a non-linear trend, with question marks at inputs whose output is to be predicted]
• How can we learn non-linear hypotheses?
![Page 44: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/44.jpg)
Linear fit to this “non-linear” data
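| x | y |
|---|---|
| 0.01 | -0.27 |
| -1.22 | 2.63 |
| 0.17 | -0.13 |
| … | … |

• Standard design matrix: rows $(1, x^{(i)})$
• Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
• Optimal parameters: $\theta^* = (X^T X)^{-1} X^T y$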
![Page 45: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/45.jpg)
Linear fit to this “non-linear” data
[Figure: the data (y against x) with the linear fit]
![Page 46: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/46.jpg)
Non-linear (quadratic) fit
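| x | y |
|---|---|
| 0.01 | -0.27 |
| -1.22 | 2.63 |
| 0.17 | -0.13 |
| … | … |

• Design matrix with non-linear features: rows $(1, x^{(i)}, (x^{(i)})^2)$
• Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$
• Optimal parameters: $\theta^* = (X^T X)^{-1} X^T y$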
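The only change between a linear and a quadratic fit is the design matrix; the solver stays the same. A minimal sketch on synthetic quadratic data (the data and the helper `fit_and_cost` are assumptions of this note, not the lecture's dataset), using NumPy's least-squares routine in place of the explicit formula:

```python
import numpy as np

# Synthetic data with a quadratic trend (an assumption, not the lecture's data).
x = np.linspace(-3.0, 3.0, 25)
y = 0.5 - 1.0 * x + 1.5 * x ** 2

# Standard design matrix [1, x] versus one with the extra non-linear feature x^2.
X_linear = np.column_stack([np.ones_like(x), x])
X_quadratic = np.column_stack([np.ones_like(x), x, x ** 2])

def fit_and_cost(X, y):
    """Least-squares parameters and the resulting mean squared error."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta, np.mean((X @ theta - y) ** 2)

theta_lin, cost_lin = fit_and_cost(X_linear, y)
theta_quad, cost_quad = fit_and_cost(X_quadratic, y)
print(cost_lin, cost_quad, theta_quad)
```

The hypothesis is non-linear in x but still linear in the parameters, which is why linear regression machinery applies unchanged.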
![Page 47: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/47.jpg)
Non-linear (quadratic) fit
[Figure: the data (y against x) with the quadratic fit]
![Page 48: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/48.jpg)
Non-linear (sinusoid) fit
| x | y |
|---|---|
| 0.01 | -0.27 |
| -1.22 | 2.63 |
| 0.17 | -0.13 |
| … | … |
design matrix with
non-linear features
Hypothesis:
Optimal parameters:
![Page 49: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/49.jpg)
Non-linear (sinusoidal) fit
[Figure: the data (y against x) with the sinusoidal fit]
![Page 50: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/50.jpg)
Non-linear input features (in general)
• Feature 2 for each training example i is computed by applying a non-linear basis function to its input
• In the design matrix, each row holds all features of one training example (e.g. the 1st) and each column holds one feature (e.g. feature 2) for all training examples
• This makes it possible to learn a variety of non-linear functions with the same technique(s): analytical solution or gradient descent
![Page 51: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/51.jpg)
Polynomial regression
• Features are powers of x
• n = degree of the polynomial to be learned
[Figure: four panels showing polynomial fits with n=0, n=1, n=3 and n=9]
What happened here? Next lecture…
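Polynomial regression can be sketched by building the design matrix from powers of x. The example below uses `np.vander` for the powers and a least-squares solve for the parameters; the cubic data and the helper names are assumptions of this note.

```python
import numpy as np

def polynomial_design_matrix(x, degree):
    """Columns are x**0, x**1, ..., x**degree (the first column is x_0 = 1)."""
    return np.vander(x, N=degree + 1, increasing=True)

def fit_polynomial(x, y, degree):
    """Least-squares parameters of a polynomial of the given degree."""
    X = polynomial_design_matrix(x, degree)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

# Synthetic cubic data (an assumption for illustration).
x = np.linspace(-1.0, 1.0, 15)
y = 1.0 - 2.0 * x + 0.5 * x ** 3

theta_0 = fit_polynomial(x, y, degree=0)  # constant fit: the mean of y
theta_3 = fit_polynomial(x, y, degree=3)  # recovers the generating coefficients
print(theta_0, theta_3)
```

With degree 0 the fit is just the mean of the targets; raising the degree adds columns to the design matrix and therefore flexibility to the hypothesis.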
![Page 52: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/52.jpg)
Radial basis functions
• "Gaussian"-shaped RBFs
  • Each basis function j has a center in the input space
  • The width of the basis functions is determined by a separate width parameter
[Figure: Gaussian-shaped basis functions plotted over x]
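A design matrix of Gaussian-shaped RBF features can be sketched as below. The exact formula on the slide is not recoverable from this transcript, so the form $\exp\!\big(-(x - c_j)^2 / (2\,\text{width}^2)\big)$, the symbol names, the choice of centers and the width are all assumptions of this note; the data are synthetic.

```python
import numpy as np

def rbf_design_matrix(x, centers, width):
    """Bias column plus one Gaussian-shaped feature per center.

    Assumed form: phi_j(x) = exp(-(x - c_j)**2 / (2 * width**2)).
    """
    x = np.asarray(x, dtype=float)[:, None]               # shape (m, 1)
    centers = np.asarray(centers, dtype=float)[None, :]   # shape (1, k)
    phi = np.exp(-((x - centers) ** 2) / (2.0 * width ** 2))
    return np.hstack([np.ones((x.shape[0], 1)), phi])     # shape (m, k + 1)

# Synthetic non-linear data (an assumption for illustration).
x = np.linspace(-4.0, 6.0, 50)
y = np.sin(x)

centers = np.linspace(-4.0, 6.0, 8)  # hypothetical choice of centers
X_rbf = rbf_design_matrix(x, centers, width=1.5)
theta, *_ = np.linalg.lstsq(X_rbf, y, rcond=None)

# Plain linear fit on the same data, for comparison.
X_lin = np.column_stack([np.ones_like(x), x])
theta_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)

mse_rbf = np.mean((X_rbf @ theta - y) ** 2)
mse_lin = np.mean((X_lin @ theta_lin - y) ** 2)
print(mse_rbf, mse_lin)
```

Each feature equals 1 at its own center and decays with distance from it, so the fitted hypothesis is a weighted sum of localized bumps plus a constant.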
![Page 53: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/53.jpg)
![Page 54: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/54.jpg)
![Page 55: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/55.jpg)
Fitting a single RBF to data
[Figure: two panels plotting the data (y against x), illustrating the fit with a single RBF]
![Page 56: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/56.jpg)
Fitting RBFs to data
[Figure: panels over x showing the RBF features and the data with the fit obtained from several RBFs]
![Page 57: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/57.jpg)
SUMMARY (QUESTIONS)
![Page 58: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/58.jpg)
Some questions…
• Hypothesis for linear regression = ?
• Cost function for linear regression = ?
• How many local minima may the cost function for linear regression have (under regular conditions)?
• Name two ways to minimize the cost function.
• General gradient descent formula?
• How is linear regression solved with gradient descent?
• What issues can arise during gradient descent?
• What is the design matrix? What are its dimensions?
• Analytical solution for linear regression = ?
  • What are the components of the solution?
• Pros and Cons of gradient descent vs. analytical solution?
• How can one learn non-linear hypotheses with linear regression?
• What is polynomial regression?
• What are radial basis functions?
![Page 59: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe](https://reader034.vdocuments.us/reader034/viewer/2022050206/5f58ef4f7d59e7384c22f82b/html5/thumbnails/59.jpg)
What is next?
• Classification with Logistic Regression
• Gradient descent tricks & more advanced optimization techniques
• Underfitting & Overfitting
• Model selection (training, validation and test sets)