Linear methods for regression, Hege Leite Størvold, Tuesday 12.04.05
TRANSCRIPT
Linear methods for regression
Hege Leite Størvold
Tuesday 12.04.05
Linear regression models
Assumes that the regression function E(Y|X) is linear
Linear models are old tools, but…
- Still very useful
- Simple
- Allow an easy interpretation of regressor effects
- Very flexible, since the Xi's can be any function of other variables (quantitative or qualitative)
Useful to understand because most other methods are generalizations of them.
$$E(Y|X) = f(X) = \beta_0 + \sum_{i=1}^{N}\beta_i X_i$$
Matrix Notation
X is the n × (p+1) matrix of input vectors
y is the n-vector of outputs (labels)
β is the (p+1)-vector of parameters

$$
X = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{pmatrix}
  = \begin{pmatrix}
      1 & x_{11} & x_{12} & \cdots & x_{1p} \\
      1 & x_{21} & x_{22} & \cdots & x_{2p} \\
      \vdots & \vdots & \vdots & \ddots & \vdots \\
      1 & x_{n1} & x_{n2} & \cdots & x_{np}
    \end{pmatrix}, \qquad
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}
$$
Least squares estimation
The linear regression model has the form
the βj’s are unknown parameters or coefficients.
Typically we have a set of training data (x1,y1), …, (xn,yn) from which we want to estimate the parameters β.
The most popular estimation method is least squares
$$f(X) = \beta_0 + \sum_{j=1}^{p} X_j\beta_j$$
Linear regression and least squares

Least Squares: find the solution, β̂, by minimizing the residual sum of squares (RSS):

$$RSS(\beta) = \sum_{i=1}^{N}\Big(y_i - \beta_0 - \sum_{j=1}^{p}x_{ij}\beta_j\Big)^2 = (y - X\beta)^T(y - X\beta)$$

Reasonable criterion when…
- training samples are random, independent draws, OR
- the yi's are conditionally independent given xi
Geometrical view of least squares

Simply find the best linear fit to the data.
ei is the residual of observation i.
(Panels: one covariate; two covariates.)
Solving Least Squares

Derivative of a quadratic product:

$$\frac{d}{dx}(Ax + b)^T C (Dx + e) = A^T C (Dx + e) + D^T C^T (Ax + b)$$

Then,

$$\frac{d\,RSS(\beta)}{d\beta} = \frac{d}{d\beta}(y - X\beta)^T I_N (y - X\beta) = -2X^T(y - X\beta)$$

Setting the first derivative to zero:

$$X^T(y - X\beta) = 0 \;\Rightarrow\; X^T X\beta = X^T y \;\Rightarrow\; \hat\beta = (X^T X)^{-1}X^T y$$
The normal equations

Assuming that (XTX) is non-singular, the normal equations give the unique least squares solution:

$$\hat\beta_{OLS} = (X^T X)^{-1}X^T y$$

Least squares predictions:

$$\hat y = X\hat\beta_{OLS} = X(X^T X)^{-1}X^T y$$

When (XTX) is singular the least squares coefficients are no longer uniquely defined, and some alternative strategy is needed to obtain a solution:
- Recoding and/or dropping redundant columns
- Filtering
- Controlling the fit by regularization
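As a quick numerical sketch (synthetic data; NumPy assumed, not part of the slides), the normal-equations solution can be computed with a linear solve rather than an explicit matrix inverse:

```python
import numpy as np

# Synthetic data: n = 50 samples, p = 2 covariates plus an intercept column.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # n x (p+1)
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.01 * rng.normal(size=n)

# Normal equations: solve (X^T X) beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Fitted values: the projection of y onto the column space of X
y_hat = X @ beta_hat
```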
Geometrical interpretation of least squares estimates
Predicted outcomes ŷ are the orthogonal projection of y onto the column space of X (which spans a subspace of Rn).
Properties of least squares estimators

If the Yi are independent, X is fixed and Var(Yi) = σ² is constant, then

$$E(\hat\beta) = \beta, \qquad Var(\hat\beta) = (X^T X)^{-1}\sigma^2, \qquad E(\hat\sigma^2) = \sigma^2 \;\text{ with }\; \hat\sigma^2 = \frac{1}{n-p-1}\sum_{i=1}^{n}\big(y_i - \hat f(x_i)\big)^2$$

If, in addition, Yi = f(Xi) + ε with ε ~ N(0, σ²), then

$$\hat\beta \sim N\big(\beta,\, (X^T X)^{-1}\sigma^2\big) \quad\text{and}\quad (n-p-1)\,\hat\sigma^2 \sim \sigma^2\chi^2_{n-p-1}$$
Properties of the least squares solution

To test the hypothesis that a particular coefficient βj = 0 we calculate

$$z_j = \frac{\hat\beta_j}{\hat\sigma\sqrt{v_j}}$$

where vj is the jth diagonal element of (XTX)-1.

Under the null hypothesis that βj = 0, zj will be distributed as tn-p-1, and hence a large absolute value of zj will reject the null hypothesis.

A (1-2α) confidence interval for βj can be formed by:

$$\big(\hat\beta_j - z^{(1-\alpha)}\hat\sigma\sqrt{v_j},\;\; \hat\beta_j + z^{(1-\alpha)}\hat\sigma\sqrt{v_j}\big)$$

An F-test can be used to test the nullity of a vector of parameters.
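A minimal sketch of the z-score and interval computation on hypothetical simulated data (NumPy assumed; the normal quantile 1.96 stands in for the exact t quantile):

```python
import numpy as np

# Simulated data: one strong coefficient (beta_3 = -2) and one null coefficient (beta_2 = 0).
rng = np.random.default_rng(1)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([0.5, 1.0, 0.0, -2.0])
y = X @ beta_true + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p - 1)      # unbiased estimate of sigma^2
v = np.diag(np.linalg.inv(X.T @ X))           # v_j: j-th diagonal of (X^T X)^{-1}
z = beta_hat / np.sqrt(sigma2_hat * v)        # z-scores, one per coefficient

# Approximate 95% interval for beta_1 (normal quantile instead of t)
j = 1
half = 1.96 * np.sqrt(sigma2_hat * v[j])
ci = (beta_hat[j] - half, beta_hat[j] + half)
```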
Significance of Many Parameters

We may want to test many features at once: compare a model M0 with k parameters (the full model) to an alternative model MA with m of the parameters from M0 (m < k, the reduced/alternative model).

Use the F statistic:

$$F = \frac{(RSS_A - RSS_0)/(k - m)}{RSS_0/(n - k - 2)} \;\sim\; F\big(k - m,\; n - k - 2\big)$$
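A sketch of the statistic on synthetic data (NumPy only; the data and the choice of n − k denominator degrees of freedom are illustrative assumptions, since texts differ by small constants):

```python
import numpy as np

# Full model: k = 4 parameters; reduced model: m = 2 (the last two columns are irrelevant).
rng = np.random.default_rng(2)
n = 80
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X_full[:, :2] @ np.array([1.0, 2.0]) + rng.normal(size=n)
X_red = X_full[:, :2]

def rss(X, y):
    """Residual sum of squares of the least squares fit."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return r @ r

k, m = X_full.shape[1], X_red.shape[1]
# Denominator degrees of freedom taken as n - k here (conventions vary slightly).
F = ((rss(X_red, y) - rss(X_full, y)) / (k - m)) / (rss(X_full, y) / (n - k))
# Compare F with the upper quantile of an F(k - m, n - k) distribution to decide.
```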
Example: Prostate cancer

Response: level of prostate antigen
Regressors: 8 clinical measures for men receiving a prostatectomy

Results from the linear fit:

| Term      | Coefficient | Std. error | Z score |
|-----------|-------------|------------|---------|
| intercept | 2.48        | 0.09       | 27.66   |
| lcavol    | 0.68        | 0.13       | 5.37    |
| lweight   | 0.30        | 0.11       | 2.75    |
| age       | -0.14       | 0.10       | -1.40   |
| lbph      | 0.21        | 0.10       | 2.06    |
| svi       | 0.31        | 0.12       | 2.47    |
| lcp       | -0.29       | 0.15       | -1.87   |
| gleason   | -0.02       | 0.15       | -0.15   |
| pgg45     | 0.27        | 0.15       | 1.74    |
Gram-Schmidt Procedure

1) Initialize z0 = x0 = 1
2) For j = 1 to p:
   - For k = 0 to j-1, regress xj on the zk's (univariate least squares estimates):
     $$\hat\gamma_{kj} = \frac{\langle z_k, x_j\rangle}{\langle z_k, z_k\rangle}$$
   - Then compute the next residual:
     $$z_j = x_j - \sum_{k=0}^{j-1}\hat\gamma_{kj}\, z_k$$
3) Let Z = [z0 z1 … zp] and let Γ be upper triangular with entries γ̂kj. Then
   $$X = Z\Gamma = ZD^{-1}D\Gamma = QR$$
   where D is diagonal with Djj = ||zj||.

Cost: O(Np²)
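The procedure above can be sketched directly in NumPy (an illustrative implementation on synthetic data, not the textbook's code):

```python
import numpy as np

def gram_schmidt_qr(X):
    """QR by successive orthogonalization: X = Z Gamma = (Z D^-1)(D Gamma) = QR."""
    n, p1 = X.shape
    Z = np.zeros((n, p1))
    Gamma = np.eye(p1)                                  # upper triangular, gamma_jj = 1
    for j in range(p1):
        Z[:, j] = X[:, j]
        for k in range(j):
            gamma = Z[:, k] @ X[:, j] / (Z[:, k] @ Z[:, k])  # univariate LS coefficient
            Gamma[k, j] = gamma
            Z[:, j] -= gamma * Z[:, k]                  # residual after regressing x_j on z_k
    D = np.diag(np.linalg.norm(Z, axis=0))
    Q = Z @ np.linalg.inv(D)                            # orthonormal columns
    R = D @ Gamma                                       # upper triangular
    return Q, R

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 3))])
Q, R = gram_schmidt_qr(X)
```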
Technique for Multiple Regression

Computing β̂ = (XTX)-1XTy directly has poor numerical properties.

QR Decomposition of X: decompose X = QR where
- Q is an N × (p+1) orthogonal matrix (QTQ = I(p+1))
- R is a (p+1) × (p+1) upper triangular matrix

Then

$$\hat\beta = (X^T X)^{-1}X^T y = (R^T Q^T Q R)^{-1}R^T Q^T y = (R^T R)^{-1}R^T Q^T y = R^{-1}Q^T y$$

1) Compute QTy
2) Solve Rβ̂ = QTy by back-substitution
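The two steps can be sketched with NumPy's built-in QR on synthetic data (`np.linalg.solve` stands in for a dedicated back-substitution routine):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, -1.0, 0.5]) + 0.01 * rng.normal(size=n)

Q, R = np.linalg.qr(X)                   # X = QR, Q^T Q = I, R upper triangular
beta_hat = np.linalg.solve(R, Q.T @ y)   # step 1: Q^T y; step 2: solve R beta = Q^T y
```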
Multiple outputs

Suppose we want to predict multiple outputs Y1, Y2, …, YK from multiple inputs X1, X2, …, Xp. Assume a linear model for each output:

$$Y_k = f_k(X) + \varepsilon_k = \beta_{0k} + \sum_{j=1}^{p}X_j\beta_{jk} + \varepsilon_k$$

With n training cases the model can be written in matrix notation:

Y = XB + E

where
- Y is the n×K response matrix, with ik entry yik
- X is the n×(p+1) input matrix
- B is the (p+1)×K matrix of parameters
- E is the n×K matrix of errors
Multiple outputs cont.

A straightforward generalization of the univariate loss function is

$$RSS(B) = \sum_{k=1}^{K}\sum_{i=1}^{n}\big(y_{ik} - f_k(x_i)\big)^2 = tr\big((Y - XB)^T(Y - XB)\big)$$

and the least squares estimates have the same form as before:

$$\hat B = (X^T X)^{-1}X^T Y$$

i.e. the coefficients for the kth outcome are just the least squares estimates of the single-output regression of yk on x0, x1, …, xp.

If the errors ε = (ε1, …, εK) are correlated, a modified model might be more appropriate (details in the textbook).
Why Least Squares?

Gauss-Markov Theorem: the least squares estimates have the smallest variance among all linear unbiased estimates.

However, there may exist a biased estimator with lower mean square error:

$$MSE(\hat\beta) = Var(\hat\beta) + \big[E(\hat\beta) - \beta\big]^2$$

The second term (the squared bias) is zero for least squares.
Subset selection and Coefficient Shrinkage

Two reasons why we are not happy with least squares:
- Prediction accuracy: LS estimates often provide predictions with low bias, but high variance.
- Interpretation: when the number of regressors is too high, the model is difficult to interpret. One seeks a smaller set of regressors with stronger effects.

We will consider a number of approaches to variable selection and coefficient shrinkage.
Subset selection and shrinkage: Motivation

Bias-variance trade-off:
- Goal: choose a model to minimize prediction error
- Method: sacrifice a little bit of bias to reduce the variance

$$MSE(\hat\beta) = Var(\hat\beta) + bias(\hat\beta)^2$$

Better interpretation: find the strongest factors in the input space.
Subset selection

Goal: to eliminate unnecessary variables from the model. We will consider three approaches:
- Best subset regression: choose the subset of size k that gives the lowest RSS.
- Forward stepwise selection: continually add the feature with the largest F-ratio.
- Backward stepwise selection: remove features with a small F-ratio.

The stepwise methods are greedy techniques, and are not guaranteed to find the best model.
Best subset regression

For each k ∈ {0, 1, 2, …, p}, find the subset of size k that gives the smallest RSS.

The leaps and bounds procedure works for p ≤ 40.

How to choose k? Choose the model that minimizes prediction error (not a topic here).

When p is large, searching through all subsets is not feasible. Can we seek a good path through the subsets instead?
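For small p the exhaustive search is easy to sketch (synthetic data; `best_subset` is a hypothetical helper written for illustration, not a library routine):

```python
import numpy as np
from itertools import combinations

def best_subset(X, y, k):
    """Brute-force best subset of size k: minimize RSS over all size-k column sets."""
    n, p = X.shape
    best, best_rss = None, np.inf
    for subset in combinations(range(p), k):
        Xs = np.column_stack([np.ones(n), X[:, subset]])   # always keep the intercept
        beta = np.linalg.lstsq(Xs, y, rcond=None)[0]
        r = y - Xs @ beta
        if r @ r < best_rss:
            best, best_rss = subset, r @ r
    return best, best_rss

rng = np.random.default_rng(5)
X = rng.normal(size=(60, 5))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=60)
subset, rss = best_subset(X, y, k=2)
```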
Forward Stepwise selection

Method:
- Start with the intercept model.
- Sequentially include the variable that most improves RSS(β), based on the F statistic:

$$F = \frac{RSS(\hat\beta) - RSS(\tilde\beta)}{RSS(\tilde\beta)/(n - k - 2)}$$

where β̃ is the fit with the candidate variable added to the current k-variable model.
- Stop when no new variable improves the fit significantly.
Backward Stepwise selection

Method:
- Start with the full model.
- Sequentially delete the predictor that produces the smallest value of the F statistic, i.e. increases RSS(β) the least.
- Stop when every predictor in the model produces a significant value of the F statistic.

Hybrids between forward and backward stepwise selection exist.
Subset selection

- Produces a model that is interpretable and possibly has lower prediction error.
- Forces some dimensions of X to zero, thus probably decreasing the variance of
$$\hat\beta \sim N\big(\beta,\, (X^T X)^{-1}\sigma^2\big)$$
- The optimal subset must be chosen to minimize prediction error (model selection: not a topic here).
Shrinkage methods

Use additional penalties/constraints to reduce the coefficients.

Shrinkage methods are more continuous than stepwise selection and therefore don't suffer as much from variability.

Two examples:
- Ridge Regression: minimize least squares subject to $\sum_{j=1}^{p}\beta_j^2 \le s$
- The Lasso: minimize least squares subject to $\sum_{j=1}^{p}|\beta_j| \le s$
Ridge regression

Shrinks the regression coefficients by imposing a penalty on their size. The complexity parameter λ controls the amount of shrinkage:

$$\hat\beta_{ridge} = \arg\min_{\beta}\Big\{\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}x_{ij}\beta_j\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2\Big\}$$

or equivalently

$$\hat\beta_{ridge} = \arg\min_{\beta}\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}x_{ij}\beta_j\Big)^2 \quad\text{subject to }\sum_{j=1}^{p}\beta_j^2 \le s$$

There is a one-to-one correspondence between s and λ.
Properties of ridge regression

Solution in matrix notation:

$$\hat\beta_{ridge} = (X^T X + \lambda I)^{-1}X^T y$$

- Adding λ > 0 to the diagonal of XTX before inversion makes the problem nonsingular even if X is not of full rank.
- The size constraint prevents the coefficient estimates of highly correlated variables from showing high variance.
- The quadratic penalty makes the ridge solution a linear function of y.
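A closed-form sketch on synthetic, centered data (NumPy assumed; the columns are centered so that the intercept is left unpenalized, and λ = 5 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 40, 3
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)                      # center columns: intercept handled separately
y = X @ np.array([2.0, 0.0, -1.0]) + 0.1 * rng.normal(size=n)
y = y - y.mean()

lam = 5.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
# The penalty shrinks the coefficient vector relative to OLS.
```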
Properties of ridge regression cont.

- Can also be motivated through Bayesian statistics, by choosing an appropriate prior for β.
- Does no automatic variable selection.
- The ridge existence theorem states that there exists a λ > 0 such that
$$MSE(\hat\beta_{ridge}) < MSE(\hat\beta_{OLS})$$
- The optimal complexity parameter must be estimated.
Example

The parameters are continuously shrunken towards zero as the amount of regularization grows.

Complexity parameter of the model: the effective degrees of freedom.
Singular value decomposition (SVD)

The SVD of the matrix X ∈ R^{n×p} has the form

$$X = UDV^T$$

where U ∈ R^{n×p} and V ∈ R^{p×p} are orthogonal matrices and D = diag(d1, …, dr) holds the non-zero singular values of X:

$$d_1 \ge d_2 \ge \cdots \ge d_r > 0$$

- r ≤ min(n, p) is the rank of X.
- The eigenvectors vi are called the principal components of X.
Linear regression by SVD

A general solution to y = Xβ can be written as

$$\hat\beta = \sum_{i=1}^{p}\omega_i\,\frac{u_i^T y}{d_i}\,v_i$$

- The filter factors ωi determine the extent of shrinking, 0 ≤ ωi ≤ 1, or stretching, ωi > 1, along the singular directions ui.
- For the OLS solution ωi = 1, i = 1, …, p, i.e. all the directions ui contribute equally.
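A sketch verifying the OLS case ωi = 1 in the SVD basis (synthetic data; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 30, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

U, d, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(d) V^T
beta_svd = Vt.T @ ((U.T @ y) / d)                  # omega_i = 1 for every direction
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]    # should coincide with beta_svd
```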
Ridge regression by SVD

In ridge regression the filter factors are given by

$$\omega_i = \frac{d_i^2}{d_i^2 + \lambda}, \qquad i = 1, \ldots, p$$

- Shrinks the OLS estimator in every direction, depending on λ and the corresponding di.
- The directions with low variance (small singular values) are the directions ridge shrinks the most.
  Assumption: y varies most in the directions of high variance.
The Lasso

A shrinkage method like ridge, but with important differences.

The lasso estimate:

$$\hat\beta_{lasso} = \arg\min_{\beta}\sum_{i=1}^{N}\Big(y_i - \beta_0 - \sum_{j=1}^{p}x_{ij}\beta_j\Big)^2 \quad\text{subject to }\sum_{j=1}^{p}|\beta_j| \le s$$

- The L1 penalty makes the solution nonlinear in y, so quadratic programming is needed to compute the solutions.
- Sufficient shrinkage will cause some coefficients to be exactly zero, so it acts like a subset selection method.
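In the special case of an orthonormal design (XTX = I), the lasso solution is known in closed form: soft-thresholding of the OLS coefficients. A sketch under that assumption (the penalty level 0.5 and the data are arbitrary illustrations):

```python
import numpy as np

def soft_threshold(b, lam):
    """Closed-form lasso for orthonormal X: shrink toward zero, clip at zero."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

rng = np.random.default_rng(8)
n, p = 50, 4
Q, _ = np.linalg.qr(rng.normal(size=(n, p)))       # orthonormal columns, Q^T Q = I
y = Q @ np.array([3.0, 0.2, -1.5, 0.1]) + 0.05 * rng.normal(size=n)

beta_ols = Q.T @ y                                 # OLS, since Q^T Q = I
beta_lasso = soft_threshold(beta_ols, lam=0.5)
# Sufficient shrinkage drives the small coefficients exactly to zero.
```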
Example

Coefficients plotted against

$$s = \frac{t}{\sum_{j=1}^{p}|\hat\beta_j|}$$

Note that the lasso profiles hit zero, while those for ridge do not.
A unifying view

We can view these linear regression techniques under a common framework:

$$\hat\beta = \arg\min_{\beta}\Big\{\sum_{i=1}^{N}\Big(y_i - \beta_0 - \sum_{j=1}^{p}x_{ij}\beta_j\Big)^2 + \lambda\sum_{j=1}^{p}|\beta_j|^q\Big\}$$

- λ includes bias, q indicates a prior distribution on β
- λ = 0: least squares
- λ > 0, q = 0: subset selection (counts the number of nonzero parameters)
- λ > 0, q = 1: the lasso
- λ > 0, q = 2: ridge regression
Methods using derived input directions

Goal: use linear combinations of the inputs as inputs in the regression. Includes:
- Principal Components Regression: regress on M < p principal components of X
- Partial Least Squares: regress on M < p directions of X weighted by y

The methods differ in the way the linear combinations of the input variables are constructed.
PCR

Use the linear combinations zm = Xvm as new features. vj is the principal component (column of V) corresponding to the jth largest element of D, i.e. the directions of maximal sample variance.

For some M ≤ p, form the derived input vectors [z1 … zM] = [Xv1 … XvM].

Regressing y on z1, …, zM gives the solution

$$\hat\beta^{PCR}(M) = \sum_{m=1}^{M}\hat\theta_m v_m, \qquad\text{where }\hat\theta_m = \frac{\langle z_m, y\rangle}{\langle z_m, z_m\rangle}$$
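A sketch of the PCR estimate via the SVD on synthetic, centered data (NumPy assumed; M = 2 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(9)
n, p, M = 60, 4, 2
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)                   # center before extracting components
y = rng.normal(size=n)
y = y - y.mean()

U, d, Vt = np.linalg.svd(X, full_matrices=False)
beta_pcr = np.zeros(p)
for m in range(M):
    z = X @ Vt[m]                        # m-th principal component scores
    theta = (z @ y) / (z @ z)            # univariate regression of y on z_m
    beta_pcr += theta * Vt[m]            # accumulate in original coordinates
```

Because the component scores are mutually orthogonal, these univariate regressions add up to the joint least squares fit on the retained components.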
PCR continued

The mth principal component direction vm solves:

$$\max_{\|\alpha\|=1,\;\alpha^T S v_l = 0,\; l=1,\ldots,m-1} Var(X\alpha)$$

The filter factors become

$$\omega_i = \begin{cases} 1 & \text{for } i \le M \\ 0 & \text{for } i > M \end{cases}$$

i.e. it discards the p − M smallest-eigenvalue components from the OLS solution. If M = p it gives the OLS solution.
Comparison of PCR and ridge

Shrinkage and truncation patterns as a function of the principal component index.
PLS

Idea: find directions that have high variance and high correlation with y. In the construction of each zm the inputs are weighted by the strength of their univariate effect on y.

Pseudo-algorithm: set $x_j^{(0)} = (x_j - \bar x_j)/\sqrt{var(x_j)}$ and $\hat y^{(0)} = \bar y\,1$. For m = 1, …, p:

1. Find the mth PLS component: $z_m = \sum_j \hat\varphi_{mj}\, x_j^{(m-1)}$, where $\hat\varphi_{mj} = \langle x_j^{(m-1)}, y\rangle$
2. Regress y on zm: $\hat\theta_m = \langle z_m, y\rangle / \langle z_m, z_m\rangle$
3. Update the fit: $\hat y^{(m)} = \hat y^{(m-1)} + \hat\theta_m z_m$
4. Orthogonalize each $x_j^{(m)}$ with respect to zm: $x_j^{(m)} = x_j^{(m-1)} - \dfrac{\langle z_m, x_j^{(m-1)}\rangle}{\langle z_m, z_m\rangle}\, z_m$

PLS solution:

$$\hat\beta_j^{PLS}(m) = \sum_{l=1}^{m}\hat\theta_l\,\hat\varphi_{lj}$$
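The pseudo-algorithm above can be sketched as follows (synthetic data; `pls_fit` is an illustrative helper, and with M = p the fit reproduces OLS, as the slides note):

```python
import numpy as np

def pls_fit(X, y, M):
    """Partial least squares fit of y, following the pseudo-algorithm above."""
    Xm = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize the inputs
    y_hat = np.full(len(y), y.mean())                # start from the intercept model
    for m in range(M):
        phi = Xm.T @ y                               # weights: univariate strength on y
        z = Xm @ phi                                 # m-th PLS component
        theta = (z @ y) / (z @ z)                    # regress y on z
        y_hat = y_hat + theta * z                    # update the fit
        Xm = Xm - np.outer(z / (z @ z), z @ Xm)      # orthogonalize inputs w.r.t. z
    return y_hat

rng = np.random.default_rng(10)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

fit_pls = pls_fit(X, y, M=p)    # M = p: should match the OLS fitted values
```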
PLS cont.

- Nonlinear function of y, because y is used to find the linear components.
- As with PCR, M = p gives the OLS estimate, while M < p directions produce a reduced regression.

The mth PLS direction $\hat\varphi_m$ is found by using the α that maximizes the covariation between the input and output variables:

$$\max_{\|\alpha\|=1,\;\alpha^T S\hat\varphi_l = 0,\; l=1,\ldots,m-1} Corr^2(y, X\alpha)\,Var(X\alpha)$$

where S is the sample covariance matrix of the xj.
PLS cont.

The filter factors for PLS become

$$\omega_i = 1 - \prod_{j=1}^{q}\Big(1 - \frac{d_i^2}{\theta_j}\Big)$$

where θ1 ≥ … ≥ θq are the Ritz values (not defined here).

Note that some ωi > 1, but it can be shown that PLS shrinks the OLS solution:

$$\|\hat\beta_{PLS}\|^2 \le \|\hat\beta_{OLS}\|^2$$

It can also be shown that the sequence of PLS components for m = 1, 2, …, p represents the conjugate gradient sequence for computing the OLS solution.
Comparison of the methods

Consider the general solution:

$$\hat\beta = \sum_{i=1}^{p}\omega_i\,\frac{u_i^T y}{d_i}\,v_i$$

- Ridge shrinks all directions, but shrinks the low-variance directions most: $\omega_i = \dfrac{d_i^2}{d_i^2 + \lambda}$, i = 1, …, p
- PCR leaves the M high-variance directions alone and discards the rest: ωi = 1 for i = 1, …, M; ωi = 0 for i = M+1, …, p
- PLS tends to shrink the low-variance directions, but can inflate some of the higher-variance directions: $\omega_i = 1 - \prod_{j=1}^{q}\big(1 - d_i^2/\theta_j\big)$
Comparison of the methods

Consider an example with two correlated inputs x1 and x2, ρ = 0.5. Assume true regression coefficients β1 = 4 and β2 = 2.

Coefficient profiles (β1, β2) for the different methods as the tuning parameters are varied: lasso, ridge, PCR, best subset, least squares, PLS.
Example: Prostate cancer

| Term      | LS     | Best subset | Ridge  | Lasso | PCR    | PLS    |
|-----------|--------|-------------|--------|-------|--------|--------|
| Intercept | 2.480  | 2.495       | 2.467  | 2.477 | 2.513  | 2.452  |
| lcavol    | 0.680  | 0.740       | 0.389  | 0.545 | 0.544  | 0.440  |
| lweight   | 0.305  | 0.367       | 0.238  | 0.237 | 0.337  | 0.351  |
| age       | -0.141 |             | -0.029 |       | -0.152 | -0.017 |
| lbph      | 0.210  |             | 0.159  | 0.098 | 0.213  | 0.248  |
| svi       | 0.305  |             | 0.217  | 0.165 | 0.315  | 0.252  |
| lcp       | -0.288 |             | 0.026  |       | -0.053 | 0.078  |
| gleason   | -0.021 |             | 0.042  |       | 0.230  | 0.003  |
| pgg45     | 0.267  |             | 0.123  | 0.059 | -0.053 | 0.080  |
| Test err. | 0.586  | 0.574       | 0.540  | 0.491 | 0.527  | 0.636  |
| Std. err. | 0.184  | 0.156       | 0.168  | 0.152 | 0.122  | 0.172  |