
Bias reduction: Case studies
Breast cancer gene expression study, revisited · WHO-ARI study

Patrick Breheny
University of Iowa
High-Dimensional Data Analysis (BIOS 7240)
March 6


Introduction

• In our lecture for today, we will revisit our two high-dimensional studies from the previous topic and analyze them with the reduced-bias approaches of this topic

• First, we consider an adaptive lasso model for the BRCA1 gene expression data

• As our initial estimator, let's use lasso estimates with λ chosen according to BIC:

library(ncvreg)  # using ncvreg for fitting due to its compatibility with BIC()
fit <- ncvreg(X, y, penalty='lasso')
b <- coef(fit, which=which.min(BIC(fit)))[-1]  # estimates at the BIC-minimizing lambda, intercept dropped

• Cross-validation would of course be a reasonable alternative
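For example, a minimal sketch of that alternative (object names here are illustrative; coef applied to a cv.ncvreg fit returns the coefficients at the CV-selected λ):

cvfit0 <- cv.ncvreg(X, y, penalty='lasso')  # lasso, with lambda chosen by cross-validation
b <- coef(cvfit0)[-1]                       # initial estimates, dropping the intercept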


Adaptive lasso fit

Once we have the initial estimator, we can fit an adaptive lasso model as follows:

library(glmnet)
w <- abs(b)^(-1)    # Calculate weights
w <- pmin(w, 1e10)  # cv.glmnet does not allow infinite weights
cvfit <- cv.glmnet(X, y, penalty.factor=w)

and plot the results as usual:

plot(cvfit)
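Similarly, to see how many variables the adaptive lasso retains at the CV-selected λ (a small sketch using standard glmnet accessors; s="lambda.min" requests the λ minimizing CV error):

b.alasso <- as.numeric(coef(cvfit, s="lambda.min"))[-1]  # drop the intercept
sum(b.alasso != 0)                                       # number of selected variables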


Adaptive lasso: Cross-validation (biased)

[Figure: CV(λ) versus log(λ) for the adaptive lasso with biased cross-validation; top axis gives the number of variables selected.]


Adaptive lasso: Cross-validation (unbiased)

[Figure: CV(λ) versus log(λ) for the adaptive lasso with unbiased cross-validation; top axis gives the number of variables selected.]


Regular lasso: Cross-validation

[Figure: CV(λ) versus log(λ) for the regular lasso; top axis gives the number of variables selected.]


Source of bias

• In the first figure, the CV error is not estimated in an unbiased manner

• The reason is that the left-out fold is not truly external to the fitting procedure, as it was used to obtain an initial estimator

• As a result, prediction error is underestimated

• To obtain an (approximately) unbiased estimate of CV error, one must cross-validate the entire procedure, including the initial estimate
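As noted in the remarks below, no comprehensive package carries this out for the adaptive lasso, but a rough sketch of cross-validating the full two-stage procedure by hand (fold assignments, helper objects, and the squared-error summary are illustrative choices) might look like:

set.seed(1)
fold <- sample(rep(1:10, length.out = nrow(X)))   # 10-fold assignments
pred <- numeric(nrow(X))
for (k in 1:10) {
  in.k <- fold == k
  # Stage 1, training folds only: lasso initial estimator with lambda chosen by BIC
  fit0 <- ncvreg(X[!in.k, ], y[!in.k], penalty='lasso')
  b0 <- coef(fit0, which=which.min(BIC(fit0)))[-1]
  w <- pmin(abs(b0)^(-1), 1e10)
  # Stage 2, training folds only: adaptive lasso with internal CV over lambda
  cvk <- cv.glmnet(X[!in.k, ], y[!in.k], penalty.factor=w)
  # Predict the held-out fold
  pred[in.k] <- predict(cvk, X[in.k, ], s="lambda.min")
}
mean((y - pred)^2)   # CV estimate of prediction error for the whole procedure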


Remarks

• CV errors:
  ◦ Lasso: 0.20
  ◦ Adaptive lasso (biased): 0.18
  ◦ Adaptive lasso (unbiased): 0.22

• This is an important cautionary example to keep in mind for the adaptive lasso: flexible, two-stage methods have certain advantages in terms of simplicity, but are also easy to make mistakes with

• Unfortunately, while existing R packages can be used to fit adaptive lasso models, there are not currently any comprehensive software packages for the adaptive lasso (that I am aware of) that carry out full cross-validation


MCP analysis

• MCP and SCAD achieve the adaptive lasso's goal of reducing the bias associated with the lasso, but do so in a single step and thus prove a bit more amenable to carrying out inference concerning predictive accuracy using cross-validation

• The ncvreg package is widely used for fitting MCP/SCAD penalized regression models; its syntax is fairly similar to that of glmnet

• Let's fit two penalized regression models to the BRCA1 data, one with γ = 3 and the other with γ = 7:

cvfit3 <- cv.ncvreg(X, y) # gam=3 is default

cvfit7 <- cv.ncvreg(X, y, gamma=7)
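To compare the two fits quickly, one can plot each CV object and look at the minimum CV error (a brief sketch; cve is the vector of cross-validation errors stored in a cv.ncvreg object):

plot(cvfit3)      # CV curve for gamma = 3
plot(cvfit7)      # CV curve for gamma = 7
min(cvfit3$cve)   # minimum CV error, gamma = 3
min(cvfit7$cve)   # minimum CV error, gamma = 7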


Results: MCP

[Figure: MCP coefficient paths β̂ versus log(λ); left panel γ = 3, right panel γ = 7.]


CV Results: MCP

[Figure: cross-validation error versus log(λ) for MCP; left panel γ = 3, right panel γ = 7; top axis gives the number of variables selected.]


summary

ncvreg provides a useful summary function for fitted CV objects:

> summary(cvfit3)
MCP-penalized linear regression with n=536, p=17322
At minimum cross-validation error (lambda=0.0464):
-------------------------------------------------
  Nonzero coefficients: 38
  Cross-validation error (deviance): 0.21
  R-squared: 0.58
  Signal-to-noise ratio: 1.39
  Scale estimate (sigma): 0.461
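To see which variables are behind those 38 nonzero coefficients, one can extract the estimates at the CV-selected λ (a brief sketch; coef on a cv.ncvreg object returns the coefficients at the λ minimizing CV error):

b3 <- coef(cvfit3)[-1]                          # drop the intercept
head(sort(abs(b3[b3 != 0]), decreasing=TRUE))   # largest selected effects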


summary

And the equivalent summary for γ = 7:

> summary(cvfit7)
MCP-penalized linear regression with n=536, p=17322
At minimum cross-validation error (lambda=0.0492):
-------------------------------------------------
  Nonzero coefficients: 52
  Cross-validation error (deviance): 0.21
  R-squared: 0.59
  Signal-to-noise ratio: 1.45
  Scale estimate (sigma): 0.455


Remarks

• For both models, the minimum error is CV = 0.21; very close to, although slightly larger than, the CV = 0.20 achieved by the lasso

• However, the two models select very different numbers of variables, both compared to each other and compared to the lasso, which selected 96 nonzero coefficients

• The most striking difference between the two solution paths is that for MCP with γ = 3, the optimal solution occurs in the region that is not locally convex (a way to check this is sketched below)

• As this is real data, we cannot know which estimates are more accurate, but personally, I would prefer the γ = 7 solution
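One way to check where the γ = 3 path leaves the locally convex region (a sketch, assuming the convex.min element of an ncvreg fit and the shade option of its plot method):

fit3 <- ncvreg(X, y)           # MCP with the default gamma = 3
fit3$convex.min                # last index at which the objective is locally convex
fit3$lambda[fit3$convex.min]   # the corresponding value of lambda
plot(fit3, shade=TRUE)         # the shaded part of the path is not locally convex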


SCAD

Finally, let us fit a SCAD-penalized regression model to this data; similar to the MCP case, we set γ = 8 here to increase the stability of the solution path:

> cvfit <- cv.ncvreg(X, y, gamma=8, penalty='SCAD')
> summary(cvfit)
SCAD-penalized linear regression with n=536, p=17322
At minimum cross-validation error (lambda=0.0478):
-------------------------------------------------
  Nonzero coefficients: 79
  Cross-validation error (deviance): 0.20
  R-squared: 0.61
  Signal-to-noise ratio: 1.53
  Scale estimate (sigma): 0.447


Results: SCAD (γ = 8)

The SCAD results are more lasso-like than those of the MCP models, as one would expect from the fact that the SCAD and lasso penalties are more similar

[Figure: SCAD (γ = 8) coefficient paths β̂ versus log(λ), and cross-validation error versus log(λ); top axis of the CV panel gives the number of variables selected.]


Remarks

• This is just one example, but these results are fairly representative, in my experience

• The prediction performance (as estimated by cross-validation) is typically similar between MCP/SCAD/lasso, but there can be substantial differences in terms of the estimates themselves

• The main advantage of MCP (or SCAD) in practice is the ability to achieve that prediction performance using fewer features

• Finally, the results of SCAD are almost always in between those of MCP and lasso


WHO-ARI: MCP

• Let us also revisit the WHO study of acute respiratory illness, which you have looked at a few times in your homework assignments

• Let us fit an MCP-penalized regression model to this data using γ = 6 and compare it to the fit of the lasso:

cvfit.mcp <- cv.ncvreg(XX, y, gamma=6, seed=2)
cvfit.las <- cv.ncvreg(XX, y, penalty="lasso", seed=2)  # same seed, so the same CV folds

• In making these kinds of comparisons, it is helpful to keep the CV fold assignments the same across methods; otherwise, you risk mistaking the effect of different folds for the effect of the penalty
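One way to guarantee matched folds, beyond passing the same seed as above, is to construct the fold assignments once and supply them to both fits (a sketch, assuming the fold argument of cv.ncvreg):

set.seed(2)
fold <- sample(rep(1:10, length.out = nrow(XX)))           # 10-fold assignments
cvfit.mcp <- cv.ncvreg(XX, y, gamma=6, fold=fold)          # same folds for MCP...
cvfit.las <- cv.ncvreg(XX, y, penalty="lasso", fold=fold)  # ...and for the lasso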


Results: CV

[Figure: cross-validated R² versus log(λ); left panel MCP, right panel lasso; top axis gives the number of variables selected.]


Results: Coefficient path

[Figure: coefficient paths β̂ versus log(λ); left panel MCP, right panel lasso.]


Summary: MCP

> summary(cvfit.mcp)
MCP-penalized linear regression with n=816, p=67
At minimum cross-validation error (lambda=0.0347):
-------------------------------------------------
  Nonzero coefficients: 27
  Cross-validation error (deviance): 1.23
  R-squared: 0.39
  Signal-to-noise ratio: 0.65
  Scale estimate (sigma): 1.111


Summary: Lasso

> summary(cvfit.las)
lasso-penalized linear regression with n=816, p=67
At minimum cross-validation error (lambda=0.0228):
-------------------------------------------------
  Nonzero coefficients: 39
  Cross-validation error (deviance): 1.22
  R-squared: 0.40
  Signal-to-noise ratio: 0.67
  Scale estimate (sigma): 1.104
