All you wanted to know about Regression…
COSC 526 Class 9
Arvind Ramanathan
Computational Science & Engineering Division
Oak Ridge National Laboratory, Oak Ridge
Ph: 865-576-7266
E-mail: [email protected]
Slides inspired by: Andrew Moore (CMU), Andrew Ng (Stanford)
Introducing your guest instructor (Feb 10-12)
• Dr. Sreenivas (Rangan) Sukumar
• Staff member at ORNL:
  – Leader in graph analytics approaches
  – UTK grad…
  – “Healthcare Guru” at ORNL

Side bar:
• The class website will shortly move to a new location at EECS; the original links will still work but will be redirected there.
• Approved for space on the EECS website!
• The Hadoop server is working (finally) and your accounts (UTK id) are also ready. More information on log-in procedures, as well as access to data, is forthcoming…
Last class: Classification with SVMs
• We had a class variable y:
  – Categorical in nature
  – {x1, x2, …, xn} could be anything
• Formulated a quadratic programming problem that would eventually allow us to classify:
  – Stochastic gradient descent (SGD)
• Alterations for big datasets: – Minimum enclosing ball (MEB)
– Shrinking the optimization problem
– Incremental and decremental SVM learning
This class: predicting a real-valued y
• Instead of a categorical class value y, we are going to see how to predict a real-valued y
• Various regression algorithms:
  – Linear regression
– Regression with varying noise
– Non-linear regression
• Adapting regression for big data
Part I: Linear Regression
Regression
As a recent home buyer (or a buyer interested in the market):

| Living area (sq. ft.) | Price ($1000s) |
| --- | --- |
| 2104 | 400 |
| 1600 | 300 |
| 2400 | 370 |
| 1416 | 200 |
| 3000 | 540 |

[Figure: scatter plot of price vs. living area]

• Can we predict the prices of other houses as a function of their living area?

Linear regression helps us with this analysis…
Linear regression
• Linear regression assumes that the expected value of the output, given some input, is linear
• Simplest way to think about this: y = wx for some unknown w

| Living area (sq. ft.) | Price ($1000s) |
| --- | --- |
| 2104 | 400 |
| 1600 | 300 |
| 2400 | 370 |
| 1416 | 200 |
| 3000 | 540 |

[Figure: the same scatter plot with a fitted line of slope w]

Given the data, how do we estimate w…
Some formalism…
• Assume that our data is formed by: yᵢ = w xᵢ + εᵢ (noise)
  – The noise signals εᵢ are independent
  – Drawn from a Normal distribution: εᵢ ~ N(0, σ²)
• So p(y | w, x) has a normal distribution with:
  – mean wx
  – variance σ²
Linear Regression (1)
• We have a bunch of data {(x1, y1), (x2, y2), …, (xn, yn)}, which is all evidence about w
• How do we infer w from the data?
• Bayes rule to our rescue:
  – Maximum likelihood estimate (MLE) of w
  – Because you can do it on a computer!
MLE of w
• For which value of w is the data most likely to have this behavior?
  – i.e., for what w is p(y1, …, yn | x1, …, xn, w) maximized?
  – i.e., for what w is ∏ᵢ p(yᵢ | xᵢ, w) maximized?
Since we know the distribution (we assumed the data came from a normal distribution), each factor is
  p(yᵢ | xᵢ, w) = (1/√(2πσ²)) · exp(−(yᵢ − wxᵢ)² / (2σ²))
MLE of w
• Now do the log-likelihood trick:
  log ∏ᵢ p(yᵢ | xᵢ, w) = −(n/2) log(2πσ²) − (1/(2σ²)) Σᵢ (yᵢ − wxᵢ)²
• Equivalently, maximizing the likelihood means minimizing:
  E(w) = Σᵢ (yᵢ − wxᵢ)²
now we are in familiar territory… (least squares)
All we have to do is …
• Take the derivative of E(w) w.r.t. w and set it to 0:
  dE/dw = −2 Σᵢ xᵢ (yᵢ − wxᵢ) = 0  ⟹  w = (Σᵢ xᵢyᵢ) / (Σᵢ xᵢ²)
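Plugging the housing table from the earlier slide into this closed form takes a few lines; the helper name `mle_slope` and the 2000 sq. ft. query are illustrative, not from the slides:

```python
# Closed-form MLE for the no-intercept model y = w x:
#   w = (sum_i x_i * y_i) / (sum_i x_i^2)

def mle_slope(xs, ys):
    """Maximum-likelihood slope for y = w x under Gaussian noise."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# The living-area/price table (price in $1000s).
areas = [2104, 1600, 2400, 1416, 3000]
prices = [400, 300, 370, 200, 540]

w = mle_slope(areas, prices)      # roughly 0.17 ($1000s per sq. ft.)
pred = w * 2000                   # predicted price for a 2000 sq. ft. house
```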
What do we mean by this (graphically)?
• If x = sq. ft. and y = price, then wx is the average price for a house with x = 2104 sq. ft.
• If x = height and y = weight, then wx is the average weight for all people 60 in. tall.
Multi-linear Regression
• Now instead of a single x, let’s say we have x, where it comes from a d-dimensional space

| Living area (sq. ft.) | No. of rooms | Price ($1000s) |
| --- | --- | --- |
| 2104 | 2 | 400 |
| 1600 | 2 | 300 |
| 2400 | 3 | 370 |
| 1416 | 2 | 200 |
| 3000 | 4 | 540 |

How do we think of doing regression?
• Remember there are d dimensions (d = 2 here)
• Can we visualize our data in a way that is easy to “regress”?
Matrix algebra to our rescue…
• out(x) = wᵀx = w1x[1] + w2x[2] + … + wdx[d]
• How do we learn w?
• Let’s define a cost function:
  J(w) = Σₖ (yₖ − wᵀxₖ)²
MLE is very similar to the simple regression story…
• The MLE is given by: w = (XᵀX)⁻¹ Xᵀy, where X is the n × d matrix whose kth row is xₖᵀ
• XᵀX is a d × d matrix:
  – whose (i, j)th element is Σₖ xₖ[i] xₖ[j]
• Xᵀy is a d-element vector:
  – with ith element Σₖ xₖ[i] yₖ
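As a sanity check, here is a minimal sketch of w = (XᵀX)⁻¹Xᵀy for d = 2, with the 2×2 inverse written out so no linear-algebra library is needed (the function name and toy data are illustrative):

```python
def fit_two_features(X, y):
    """Normal-equation solution w = (X^T X)^-1 X^T y for exactly two features."""
    # Accumulate X^T X (a 2x2 matrix) and X^T y (a 2-vector).
    a = sum(x[0] * x[0] for x in X)
    b = sum(x[0] * x[1] for x in X)
    d = sum(x[1] * x[1] for x in X)
    s0 = sum(x[0] * yi for x, yi in zip(X, y))
    s1 = sum(x[1] * yi for x, yi in zip(X, y))
    # Explicit 2x2 inverse: [[a, b], [b, d]]^-1 = (1/det) [[d, -b], [-b, a]].
    det = a * d - b * b
    w0 = (d * s0 - b * s1) / det
    w1 = (-b * s0 + a * s1) / det
    return w0, w1

# Noise-free data generated from y = 3*x1 + 2*x2; the MLE recovers (3, 2).
X = [(1, 1), (2, 1), (1, 3), (4, 2)]
y = [3 * x1 + 2 * x2 for x1, x2 in X]
w0, w1 = fit_two_features(X, y)
```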
How to solve this on a computer?
• Let’s say I have an initial guess for w
• I need to search for a suitable w that will make J(w) smaller
• Idea: use gradient descent!

Repeat until convergence:
  For every j = 1…d:
    Calculate the gradient: ∂J/∂wⱼ = −2 Σₖ (yₖ − wᵀxₖ) xₖ[j]
    Update: wⱼ ← wⱼ − α ∂J/∂wⱼ  (α is the learning rate)
Problem(s) with gradient descent
• It will converge: for linear regression the cost has a single global minimum, so GD will converge to the solution!
• It takes a long time if the training examples are large in number:
  – Each iteration scans through the entire training dataset
  – Can do stochastic gradient descent (SGD) in a similar way to what we discussed last time…
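The batch loop described above can be sketched as follows (names, step size, and toy data are illustrative; an SGD variant would instead update w after each single example):

```python
def gradient_descent(X, y, alpha=0.05, iters=2000):
    """Batch gradient descent on J(w) = sum_k (y_k - w.x_k)^2."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # Residuals y_k - w.x_k for the current guess.
        res = [yk - sum(wj * xj for wj, xj in zip(w, xk)) for xk, yk in zip(X, y)]
        # Full gradient: dJ/dw_j = -2 * sum_k res_k * x_k[j].
        grad = [-2 * sum(rk * xk[j] for rk, xk in zip(res, X)) for j in range(d)]
        # Simultaneous update of every coordinate.
        w = [wj - alpha * gj for wj, gj in zip(w, grad)]
    return w

# Noise-free data from w = (2, -1); GD recovers it.
X = [(1, 0), (0, 1), (1, 1), (2, 1)]
y = [2 * x1 - x2 for x1, x2 in X]
w = gradient_descent(X, y)
```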
Pesky detail…
• We always talked about the line as if it passed through the origin
• What if this is not the case?

[Figure: price vs. living area, comparing a fit through the origin with a fit that has an intercept]
Let’s fake it… neat trick!
• Create a fake input x0 with a value of 1 (always)

| x1 | x2 | y |
| --- | --- | --- |
| 2104 | 2 | 400 |
| 1600 | 2 | 300 |
| 2400 | 3 | 370 |
| 1416 | 2 | 200 |
| 3000 | 4 | 540 |

| x0 | x1 | x2 | y |
| --- | --- | --- | --- |
| 1 | 2104 | 2 | 400 |
| 1 | 1600 | 2 | 300 |
| 1 | 2400 | 3 | 370 |
| 1 | 1416 | 2 | 200 |
| 1 | 3000 | 4 | 540 |

y = w1x1 + w2x2 becomes y = w0x0 + w1x1 + w2x2 = w0 + w1x1 + w2x2
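The trick can be checked numerically: for a single input, least squares over (x0 = 1, x1) reduces to the classic mean/covariance shortcut, and w0 comes out as the intercept (the helper name `fit_line` is illustrative):

```python
def fit_line(xs, ys):
    """Least squares for y = w0*x0 + w1*x1 with the fake input x0 = 1,
    written via the mean/covariance shortcut for the one-input case."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    w1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
         sum((x - xbar) ** 2 for x in xs)
    w0 = ybar - w1 * xbar   # coefficient of the fake input x0 = 1
    return w0, w1

xs = [0, 1, 2, 3]
ys = [5 + 2 * x for x in xs]    # intercept 5, slope 2
w0, w1 = fit_line(xs, ys)
```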
Let’s say we know something about the noise added to each data point
• E.g.: I know the variance of the noise added to each data point…
| xᵢ | σᵢ² | yᵢ |
| --- | --- | --- |
| 0.5 | 4 | 0.5 |
| 1 | 1 | 1 |
| 2 | 0.25 | 1 |
| 2 | 4 | 3 |
| 3 | 0.25 | 2 |
Now, how do we do the MLE?
MLE with varying noise
Assuming independence among the noise terms, plug in the Gaussian density and simplify; setting d(LL)/dw = 0 for the maximum gives:
  w = (Σᵢ xᵢyᵢ/σᵢ²) / (Σᵢ xᵢ²/σᵢ²)
i.e., we minimize the weighted sum of squares Σᵢ (yᵢ − wxᵢ)²/σᵢ²
Weighted Regression
• We just saw “weighted regression”
• Points that have “higher confidence” (lower noise) matter more
• The rest are down-weighted by the variance of their noise
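For the one-input case, the weighted MLE has the closed form w = (Σ xᵢyᵢ/σᵢ²) / (Σ xᵢ²/σᵢ²); a sketch using the table from the previous slide (the helper name is illustrative):

```python
def weighted_slope(xs, ys, variances):
    """MLE for y = w x when point i has known noise variance sigma_i^2:
    each point is weighted by 1 / sigma_i^2."""
    num = sum(x * y / v for x, y, v in zip(xs, ys, variances))
    den = sum(x * x / v for x, v in zip(xs, variances))
    return num / den

# The slide's table: x_i, sigma_i^2, y_i.
xs = [0.5, 1, 2, 2, 3]
vs = [4, 1, 0.25, 4, 0.25]
ys = [0.5, 1, 1, 3, 2]
w = weighted_slope(xs, ys, vs)
```

With equal variances this reduces to the ordinary least-squares slope, as expected.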
Part II: Non-linear Regression
Non-linear regression…
• Suppose y is related to a function of x in such a way that the predicted values have a non-linear relationship…
| xᵢ | yᵢ |
| --- | --- |
| 0.5 | 0.05 |
| 1 | 2.5 |
| 2 | 3 |
| 3 | 2 |
| 3 | 3 |

Assume…
Non-linear MLE
• Ugly, ugly algebra!!! What do we do?
  – Line search
– Simulated annealing
– GD and SGD
– Newton’s method
– Expectation Maximization!
Polynomial Regression…
• All this while, we were talking about linear regression
• But, it may not be the best way to describe data
• Be careful about how to fit the data…
Suppose we add an additional term…
• Quadratic regression: each component is now called a term
• Each column is called a term column
• How many terms in a quadratic regression with p inputs?
  – 1 constant term
  – p linear terms
  – (p+1)C2 quadratic terms ⟹ O(p²) terms in total

Solving our MLE:
• Similar to our linear regression: w = (XᵀX)⁻¹(Xᵀy)
• Cost will be O(p⁶), since we invert a matrix whose size is O(p²) × O(p²)
Generalizing: p inputs, Qth degree polynomial… how many terms?
• = number of unique terms of the form x1^q1 · x2^q2 · … · xp^qp with q1 + q2 + … + qp ≤ Q
• = the number of lists of non-negative integers [q0, q1, …, qp] that sum to Q (q0 absorbs the unused degree)
• = (Q+p)CQ terms!!
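The count is easy to check with `math.comb`; note that for the quadratic case the slide's breakdown 1 + p + (p+1)C2 agrees with (2+p)C2:

```python
from math import comb

def n_poly_terms(p, Q):
    """Number of monomials of total degree <= Q in p inputs: C(Q+p, Q)."""
    return comb(Q + p, Q)

# Quadratic case (Q = 2): 1 constant + p linear + C(p+1, 2) quadratic terms.
p = 5
quadratic_count = 1 + p + comb(p + 1, 2)   # should equal n_poly_terms(5, 2)
```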
Notes of caution…
• Is a polynomial of degree 2 better than one of degree 5?
• A linear fit underfits the data:
  – the data shows structure not captured by the model
• A high-degree polynomial fit overfits the data:
  – the model fits the noise as strongly as the signal…

Moral of the story:
• Selecting the model is important
• More important is the selection of the features!!
Locally Weighted Regression (LWR)
• An approach to reduce the dependency on selecting features:
  – Many datasets don’t have linear descriptors
• We have seen this before:
  – in the weighted regression model
• How do we choose the right weights?
Using the Kernel Trick once again…
• out(x) = Σᵢ wᵢ Φᵢ(x), where Φᵢ(x) is the kernel function (a bump centered at cᵢ, e.g. a Gaussian Φᵢ(x) = exp(−‖x − cᵢ‖² / (2·KW²)))

How do we estimate w?
Using the Kernel Trick once again…
• where Φᵢ(x) is the kernel function
All cᵢ are held constant. We will just initialize them at random or on a uniformly spaced grid in d dimensions…
KW (the kernel width) is also held constant. It will be some value that ensures good overlap between the basis functions…
How do we estimate w?
• Same as before…
  – Given the Q basis functions, define a matrix Z such that Zₖⱼ = Φⱼ(xₖ)
  – Here xₖ is the kth input vector…
• Now, we solve: w = (ZᵀZ)⁻¹ Zᵀy
• How do we find the cᵢ and KW?
  – Use BGD / SGD…
  – Other methods will also work
These basis functions are also referred to as radial basis functions (RBFs)
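A minimal sketch of the Z-matrix fit with two Gaussian basis functions at fixed centers and fixed KW, using an explicit 2×2 inverse (all function names and the toy data are illustrative):

```python
import math

def rbf(x, c, kw):
    """Gaussian radial basis function centered at c with kernel width kw."""
    return math.exp(-((x - c) ** 2) / (2 * kw ** 2))

def fit_two_rbfs(xs, ys, centers, kw):
    """w = (Z^T Z)^-1 Z^T y for exactly two basis functions."""
    Z = [[rbf(x, c, kw) for c in centers] for x in xs]   # Z[k][j] = Phi_j(x_k)
    a = sum(z[0] * z[0] for z in Z)
    b = sum(z[0] * z[1] for z in Z)
    d = sum(z[1] * z[1] for z in Z)
    s0 = sum(z[0] * y for z, y in zip(Z, ys))
    s1 = sum(z[1] * y for z, y in zip(Z, ys))
    det = a * d - b * b                                  # explicit 2x2 inverse
    return [(d * s0 - b * s1) / det, (-b * s0 + a * s1) / det]

def predict(x, w, centers, kw):
    return sum(wj * rbf(x, c, kw) for wj, c in zip(w, centers))

# Data generated exactly from two bumps; the fit recovers the true weights.
centers, kw = [0.0, 2.0], 1.0
true_w = [1.5, -0.5]
xs = [-1.0, 0.0, 0.5, 1.0, 2.0, 3.0]
ys = [predict(x, true_w, centers, kw) for x in xs]
w = fit_two_rbfs(xs, ys, centers, kw)
```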
What are good radial basis function choices?
• We talked about overlaps…

[Figure: three plots of radial basis functions over living area vs. no. of rooms: too little overlap? too much overlap? just about right overlap…]
Robust Regression…
• Best quadratic fit:
  – what is the problem here?
• What would we want?
  – a better fit to the varying data!
  – How can we find the better-fitting curve?
LOESS-based Robust Regression
• After the initial fit, score each data point by how well it is fitted (a good data point vs. a horrible one, i.e., an outlier)

Repeat until convergence:
  For every k = 1…m:
    Let (xₖ, yₖ) be the kth data point
    Let ŷₖ be the current estimate of yₖ
    Choose a weight wₖ that is large if the data point is fitted well and very small if it is not
  Redo the regression with the weighted data points

How do we know we have converged? Use expectation maximization (EM)
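The reweighting loop can be sketched for the one-input case; the specific weight rule 1/(1 + residual²) is an assumed choice for illustration, not the one from the slides:

```python
def robust_slope(xs, ys, iters=10):
    """Iteratively reweighted fit of y = w x: after each fit, badly fitted
    points receive small weights and the regression is redone."""
    weights = [1.0] * len(xs)
    w = 0.0
    for _ in range(iters):
        # Weighted least squares with the current weights.
        w = sum(wk * x * y for wk, x, y in zip(weights, xs, ys)) / \
            sum(wk * x * x for wk, x in zip(weights, xs))
        # Assumed weight rule: down-weight points with large residuals.
        weights = [1.0 / (1.0 + (y - w * x) ** 2) for x, y in zip(xs, ys)]
    return w

# y = 2x plus one horrible data point: the plain fit is dragged far from 2,
# while the robust fit stays near the true slope.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 100]
plain = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
robust = robust_slope(xs, ys)
```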
Multilinear Interpolation
• How to create a piecewise, linear fit to the data?
Create a set of “knot points” equally spaced along the data…
Let’s assume that the data points are generated by a noisy function that is allowed to bend only at these knot points…
We can do a linear regression for every segment identified here…
How to find the best fit?
• With some algebraic manipulations…
[Figure: piecewise-linear fit with knot points q1 … q6 and segment heights such as h2 and h3]
Can we do classification with this?
• Map y to {0, 1}: negative and positive class
• Function: the logistic/sigmoid function g(z) = 1 / (1 + e⁻ᶻ), applied as g(θᵀx)
• Note g(θᵀx) → 1 as θᵀx → ∞
• g(θᵀx) → 0 as θᵀx → −∞
How do we do MLE on this?
Interpreting h(x) = g(θᵀx) as p(y = 1 | x; θ), the likelihood is L(θ) = ∏ᵢ h(xᵢ)^yᵢ (1 − h(xᵢ))^(1−yᵢ), so the log-likelihood is ℓ(θ) = Σᵢ [yᵢ log h(xᵢ) + (1 − yᵢ) log(1 − h(xᵢ))]. Maximizing it by gradient ascent gives the update θⱼ ← θⱼ + α Σᵢ (yᵢ − h(xᵢ)) xᵢ[j].
Another approach to maximize L(θ)
• Using Newton’s approach for finding a zero of a function, applied to ∇θ ℓ(θ):
  θ ← θ − H⁻¹ ∇θ ℓ(θ)
• Hessian H: an n × n matrix (n = number of parameters) that keeps track of all second partial derivatives
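For a single parameter the Hessian is a scalar and the update is just θ ← θ − ℓ′(θ)/ℓ″(θ); a minimal sketch (function names and the non-separable toy labels are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def newton_logistic(xs, ys, iters=25):
    """One-parameter logistic regression h(x) = g(theta * x) fit by Newton:
    l'(theta)  = sum_i (y_i - h(x_i)) x_i
    l''(theta) = -sum_i h(x_i) (1 - h(x_i)) x_i^2"""
    theta = 0.0
    for _ in range(iters):
        preds = [sigmoid(theta * x) for x in xs]
        grad = sum((y - p) * x for y, p, x in zip(ys, preds, xs))
        hess = -sum(p * (1 - p) * x * x for p, x in zip(preds, xs))
        theta -= grad / hess        # Newton step on the scalar parameter
    return theta

# Non-separable labels so the MLE is finite (separable data pushes theta to infinity).
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]
theta = newton_logistic(xs, ys)
```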
Generalizing further…
• Regression: y | x ~ Gaussian
• Classification: y | x ~ Bernoulli
• Begin by defining an exponential family of distributions:
  p(y; η) = b(y) · exp(ηᵀ T(y) − a(η))
  – η: the natural parameter
  – T(y): the sufficient statistic
  – a(η): the log partition function
Bernoulli and Gaussian as specific GLMs
Writing Bernoulli(φ) in the exponential-family form gives natural parameter η = log(φ/(1 − φ)), whose inverse is the sigmoid (logistic regression); writing the Gaussian with fixed variance gives η = μ, recovering linear regression.
Softmax Regression
• Instead of a response variable y taking values in {0, 1}, we can have y take one of k values {1, 2, …, k}
• Ex.: mail classification = {spam, personal mail, work mail, advertisement}
• This is a GLM with a multinomial response…
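The multinomial GLM turns k real-valued scores (one per class) into class probabilities via the softmax function; a minimal sketch (the scores are hypothetical):

```python
import math

def softmax(scores):
    """Turn k real-valued scores into a probability distribution over k classes."""
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for {spam, personal, work, advertisement}.
probs = softmax([2.0, 1.0, 0.5, -1.0])     # probs sum to 1; spam is most likely
```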
Part III: What do we do with Big Data?
Can we make Regression Faster?
• Cost is at least O(p²m):
  – where p is the number of features (columns)
  – and m is the number of training examples
• Usually only a small subset k of the p features is relevant: k << p
• What can we do to exploit this?
  – Variance inflation factor (VIF) regression: O(pm)
VIF regression
• Evaluation step:
  – approximate the partial correlation of each candidate variable (feature xᵢ) with y using a small pre-sampled set of data [stagewise regression]
• Search step:
  – test each xᵢ sequentially using an α-investing rule

D. Lin, D.P. Foster, L.H. Ungar, VIF Regression, arXiv, 2012
Other standard approaches also work…
• MapReduce
• Gather/Apply/Scatter (GAS) [to be seen in the future]
• Spark!

What you need to know:
• Regression is one of the most commonly used ML algorithms
• It comes in many flavors and can be generalized using GLMs
• Research still needs to be carried out for big datasets