TRANSCRIPT

High-Dimensional Variable Selection in Nonlinear Models that Controls the False Discovery Rate

Lucas Janson
Harvard University, Department of Statistics

CMSA Big Data Conference, August 18, 2017

Collaborators: Emmanuel Candes (Stanford), Yingying Fan, Jinchi Lv (USC)
Problem Statement
Controlled Variable Selection
Given:
Y an outcome of interest (AKA response or dependent variable),
X_1, ..., X_p a set of p potential explanatory variables (AKA covariates, features, or independent variables)
How can we select important explanatory variables with few mistakes?
Applications to:
Medicine/genetics/health care
Economics/political science
Industry/technology
Controlled Variable Selection (cont’d)
What is an important variable?
We consider X_j to be unimportant if the conditional distribution of Y given X_1, ..., X_p does not depend on X_j. Formally, X_j is unimportant if it is conditionally independent of Y given X_{-j}:

$$Y \perp\!\!\!\perp X_j \mid X_{-j}$$

Markov Blanket of Y: smallest set S such that Y ⊥⊥ X_{-S} | X_S

For GLMs with no stochastically redundant covariates, this is equivalent to {j : β_j ≠ 0}

To make sure we do not make too many mistakes, we seek to select a set S that controls the false discovery rate (FDR):

$$\mathrm{FDR}(S) = \mathbb{E}\left[\frac{\#\{j \in S : X_j \text{ unimportant}\}}{\#\{j \in S\}}\right] \le q \qquad (\text{e.g., } 10\%)$$
“Here is a set of variables S, 90% of which I expect to be important”
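To make the FDR criterion concrete, here is a minimal Python sketch of the false discovery proportion whose expectation the FDR takes; the selected and null sets in the demo are hypothetical.

```python
# Minimal sketch: the false discovery proportion (FDP), whose
# expectation over repeated experiments is the FDR.
def fdp(selected, unimportant):
    """#{j in S : X_j unimportant} / #{j in S}, taken as 0 when S is empty."""
    selected = set(selected)
    if not selected:
        return 0.0
    return len(selected & set(unimportant)) / len(selected)

# Hypothetical example: one null variable among five selections.
print(fdp(selected={1, 4, 5, 6, 7}, unimportant={5, 9, 10}))  # 0.2
```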
Sneak Peek
New interpretation of knockoffs solves the controlled variable selection problem
Allows any model for Y and X1, . . . , Xp
Allows any dimension (including p > n)
Finite-sample control (non-asymptotic) of FDR
Practical performance on real problems
Analysis of the genetic basis of Crohn’s Disease (WTCCC, 2007)
≈ 5,000 subjects (≈ 40% with Crohn's Disease)
≈ 375,000 single nucleotide polymorphisms (SNPs) for each subject
Original analysis of the data made 9 discoveries by running marginal tests and selecting p-values to target an FDR of 10%
Model-free knockoffs used the same FDR target of 10% and made 18 discoveries, with many of the new discoveries confirmed by a larger meta-analysis
Review of Methods for Controlled Variable Selection
What is required for valid inference?
Method      | Low dimensions | Model for Y | Asymptotic regime | Sparsity | Random design
OLS p + BHq | Yes            | Yes         | No                | No       | No
ML p + BHq  | Yes            | Yes         | Yes               | No       | No
HD p + BHq  | No             | Yes         | Yes               | Yes      | Yes
Orig KnO    | Yes            | Yes         | No                | No       | No
New KnO     | No             | No          | No                | No       | Yes*
The Knockoffs Idea
Knockoffs (Barber and Candes, 2015)
y and X_j are n × 1 column vectors of data: n draws from the random variables Y and X_j, respectively; design matrix X := [X_1 ··· X_p]

(1) Construct knockoffs: the knockoffs X̃_j (with X̃ := [X̃_1 ··· X̃_p]) must satisfy

$$[X\ \tilde{X}]^\top [X\ \tilde{X}] = \begin{bmatrix} X^\top X & X^\top X - \mathrm{diag}\{s\} \\ X^\top X - \mathrm{diag}\{s\} & X^\top X \end{bmatrix}$$

(2) Compute knockoff statistics:
Sufficiency: W_j only a function of [X X̃]^⊤[X X̃] and [X X̃]^⊤ y
Antisymmetry: swapping the values of X_j and X̃_j flips the sign of W_j

(3) Find the knockoff threshold:
Order the variables by decreasing |W_j| and proceed down the list
Select only variables with positive W_j until the last time #{negatives}/#{positives} ≤ q

Comments:
Finite-sample FDR control, and leverages sparsity for power
Requires the data to follow a low-dimensional (n ≥ p) Gaussian linear model
Canonical approach: condition on X, rely heavily on the model for y
Generalizing the Knockoffs Procedure
(1) Construct knockoffs:
Artificial versions (“knockoffs”) of each variable
Act as controls for assessing the importance of the original variables

(2) Compute knockoff statistics:
Scalar statistic W_j for each variable
Measures how much more important a variable appears than its knockoff
Positive W_j denotes the original is more important; strength is measured by magnitude

(3) Find the knockoff threshold: (same as before)
Order the variables by decreasing |W_j| and proceed down the list
Select only variables with positive W_j until the last time #{negatives}/#{positives} ≤ q

Coin-flipping property: the key to knockoffs is that steps (1) and (2) are done specifically to ensure that, conditional on |W_1|, ..., |W_p|, the signs of the unimportant/null W_j are independently ±1 with probability 1/2
New Interpretation of Knockoffs
Knockoffs Without a Model for Y (Candes et al., 2016)
Instead of modeling y and conditioning on X, condition on y and model X (shifts the burden of knowledge from y onto X)

Explicitly,

$$\text{rows of } X = (X_{i,1}, \ldots, X_{i,p}) \overset{\text{iid}}{\sim} G,$$

where G can be arbitrary but is assumed known

As compared to original knockoffs, removes:
Restriction on dimension
Linear model requirement for Y | X_1, ..., X_p
“Sufficiency” constraint for W_j
The rows of X must be i.i.d., not the columns (covariates)
Nothing about y’s distribution is assumed or need be known
Robust to overfitting X’s distribution in preliminary experiments
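To make the design model concrete, here is a small Python sketch drawing the rows of X i.i.d. from a known G, assumed here to be a mean-zero Gaussian with AR(1) covariance (the setting of the robustness simulation shown below); the dimensions are reduced for a quick run.

```python
import numpy as np

# Rows of X drawn i.i.d. from a known distribution G; here G is assumed
# to be N(0, Sigma) with AR(1) covariance Sigma[j, k] = rho**|j - k|
# (smaller n, p than the talk's simulations, for speed).
rng = np.random.default_rng(0)
n, p, rho = 300, 100, 0.3
idx = np.arange(p)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])       # (p, p) covariance of G
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)  # each row is one draw from G
```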
Robustness

[Figure: power (left panel) and FDR (right panel) versus relative Frobenius norm error of the covariance used to construct the knockoffs, for the exact covariance, a graphical-lasso estimate, and empirical covariance estimates computed from 50%, 62.5%, 75%, 87.5%, and 100% of the data. Covariates are AR(1) with autocorrelation coefficient 0.3; n = 800, p = 1500; target FDR is 10%; Y comes from a binomial linear model with logit link function with 50 nonzero entries.]
Shifting the Burden of Knowledge
When is it appropriate?
1. Subjects sampled from a population, and
2a. Xj highly structured, well-studied, or well-understood, OR
2b. Large set of unsupervised X data (without Y ’s)
For instance, many genome-wide association studies satisfy all conditions:
1. Subjects sampled from a population (oversampling cases still valid)
2a. Strong spatial structure: linkage disequilibrium models, e.g., Markov chains, are well-studied and work well
2b. Other studies have collected same or similar SNP arrays on different subjects
The New Knockoffs Procedure
(1) Construct knockoffs: exchangeability

$$[X_1 \cdots X_j \cdots X_p\ \ \tilde{X}_1 \cdots \tilde{X}_j \cdots \tilde{X}_p] \overset{D}{=} [X_1 \cdots \tilde{X}_j \cdots X_p\ \ \tilde{X}_1 \cdots X_j \cdots \tilde{X}_p]$$

(2) Compute knockoff statistics:
Variable importance measure Z
Antisymmetric function f_j : R² → R, i.e., f_j(z_1, z_2) = −f_j(z_2, z_1)
W_j = f_j(Z_j, Z̃_j), where Z_j and Z̃_j are the variable importances of X_j and X̃_j, respectively

(3) Find the knockoff threshold: (same as before)
Order the variables by decreasing |W_j| and proceed down the list
Select only variables with positive W_j until the last time #{negatives}/#{positives} ≤ q
Step (1): Construct Knockoffs
Knockoff Construction
Proof that valid knockoff variables can be generated for any X distribution
If (X_1, ..., X_p) is multivariate Gaussian, exchangeability reduces to matching first and second moments when X_j and X̃_j are swapped. For Cov(X_1, ..., X_p) = Σ:

$$\mathrm{Cov}(X_1, \ldots, X_p, \tilde{X}_1, \ldots, \tilde{X}_p) = \begin{bmatrix} \Sigma & \Sigma - \mathrm{diag}\{s\} \\ \Sigma - \mathrm{diag}\{s\} & \Sigma \end{bmatrix}$$

For non-Gaussian X, this still gives second-order-correct approximate knockoffs
Linear algebra and semidefinite programming to find good s
Recently: construction for Markov chains and HMMs (Sesia et al., 2017)
Constructions also possible for grouped variables (Dai and Barber, 2016)
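Below is a minimal Python sketch of the Gaussian construction. The conditional mean and covariance follow from the joint covariance displayed above; the equicorrelated choice of s is a simplifying assumption standing in for the semidefinite program mentioned on the slide.

```python
import numpy as np

def gaussian_knockoffs(X, mu, Sigma, rng):
    """Sample Gaussian knockoffs X_tilde for rows X_i ~ N(mu, Sigma).

    The joint covariance of (X, X_tilde) is [[Sigma, Sigma - D],
    [Sigma - D, Sigma]] with D = diag{s}, so conditionally on X:
        mean = X - (X - mu) @ inv(Sigma) @ D
        cov  = 2 D - D @ inv(Sigma) @ D
    The equicorrelated s below is a simple assumed choice (an SDP can be
    solved for a better s); it is shrunk slightly to keep the joint
    covariance positive definite.
    """
    p = Sigma.shape[0]
    s_val = 0.999 * min(2.0 * np.linalg.eigvalsh(Sigma).min(),
                        Sigma.diagonal().min())
    D = np.diag(np.full(p, s_val))
    Sinv_D = np.linalg.solve(Sigma, D)                     # inv(Sigma) @ D
    cond_mean = X - (X - mu) @ Sinv_D
    cond_cov = 2.0 * D - D @ Sinv_D
    L = np.linalg.cholesky(0.5 * (cond_cov + cond_cov.T))  # symmetrize, factor
    return cond_mean + rng.standard_normal(X.shape) @ L.T

# Demo on a small AR(1) design (hypothetical sizes).
rng = np.random.default_rng(1)
p = 50
Sigma = 0.3 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=200)
X_tilde = gaussian_knockoffs(X, np.zeros(p), Sigma, rng)
```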
Step (2): Compute Knockoff Statistics
Strategy for Choosing Knockoff Statistics
Recall W_j is an antisymmetric function f_j of Z_j and Z̃_j (the variable importances of X_j and X̃_j, respectively):

$$W_j = f_j(Z_j, \tilde{Z}_j) = -f_j(\tilde{Z}_j, Z_j)$$

For example:
Z is the magnitude of the fitted coefficient β̂ from a lasso regression of y on [X X̃]
f_j(z_1, z_2) = z_1 − z_2

Lasso Coefficient Difference (LCD) statistic:

$$W_j = |\hat{\beta}_j| - |\hat{\beta}_{j+p}|,$$

where β̂_{j+p} is the fitted coefficient of the knockoff X̃_j
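A minimal Python sketch of the LCD statistic using scikit-learn's lasso; the fixed penalty level alpha is an assumption (a later slide notes it can be chosen by cross-validation on [X X̃]).

```python
import numpy as np
from sklearn.linear_model import Lasso

def lcd_statistics(X, X_tilde, y, alpha=0.1):
    """Lasso Coefficient Difference: W_j = |beta_hat_j| - |beta_hat_{j+p}|.

    Z_j and Z_tilde_j are the coefficient magnitudes of X_j and its
    knockoff in a lasso regression of y on the augmented design
    [X, X_tilde]; f_j(z1, z2) = z1 - z2 is antisymmetric. The penalty
    level alpha is an assumed fixed value.
    """
    p = X.shape[1]
    beta = Lasso(alpha=alpha).fit(np.hstack([X, X_tilde]), y).coef_
    return np.abs(beta[:p]) - np.abs(beta[p:])
```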
Exchangeability Endows Coin-Flipping
Recall the exchangeability property: for any j,

$$[X_1 \cdots X_j \cdots X_p\ \ \tilde{X}_1 \cdots \tilde{X}_j \cdots \tilde{X}_p] \overset{D}{=} [X_1 \cdots \tilde{X}_j \cdots X_p\ \ \tilde{X}_1 \cdots X_j \cdots \tilde{X}_p]$$

Coin-flipping property for W_j: for any unimportant variable j,

$$
\begin{aligned}
(Z_j, \tilde{Z}_j) &:= \Big( Z_j\big(y, [\cdots X_j \cdots \tilde{X}_j \cdots]\big),\ \tilde{Z}_j\big(y, [\cdots X_j \cdots \tilde{X}_j \cdots]\big) \Big) \\
&\overset{D}{=} \Big( Z_j\big(y, [\cdots \tilde{X}_j \cdots X_j \cdots]\big),\ \tilde{Z}_j\big(y, [\cdots \tilde{X}_j \cdots X_j \cdots]\big) \Big) \\
&= \Big( \tilde{Z}_j\big(y, [\cdots X_j \cdots \tilde{X}_j \cdots]\big),\ Z_j\big(y, [\cdots X_j \cdots \tilde{X}_j \cdots]\big) \Big) = (\tilde{Z}_j, Z_j)
\end{aligned}
$$

Hence

$$W_j = f_j(Z_j, \tilde{Z}_j) \overset{D}{=} f_j(\tilde{Z}_j, Z_j) = -f_j(Z_j, \tilde{Z}_j) = -W_j$$
Adaptivity and Prior Information in Wj
Recall LCD: W_j = |β̂_j| − |β̂_{j+p}|, where β̂_j, β̂_{j+p} come from an ℓ1-penalized regression

Adaptivity
Cross-validation (on [X X̃]) to choose the penalty parameter in LCD (see the sketch after this slide)
Higher-level adaptivity: CV to choose the best-fitting model for inference
− E.g., fit a random forest and an ℓ1-penalized regression; derive feature importances from whichever has lower CV error; still strict FDR control
Can even let the analyst look at (a masked version of) the data to choose the Z function

Prior information
Bayesian approach: choose a prior and model, and Z_j could be the posterior probability that X_j contributes to the model
Still strict FDR control, even if the prior is wrong or the MCMC has not converged
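Below is a sketch of that first adaptivity point, reusing the LCD construction: because the cross-validation is run on the augmented design [X X̃], originals and their knockoffs are treated symmetrically. Function and argument names are assumptions carried over from the earlier sketches.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def lcd_statistics_cv(X, X_tilde, y, cv=5):
    """LCD statistics with the lasso penalty chosen by cross-validation.

    The CV is run on the augmented design [X, X_tilde], so originals and
    knockoffs enter the tuning symmetrically (the point of the slide).
    """
    p = X.shape[1]
    beta = LassoCV(cv=cv).fit(np.hstack([X, X_tilde]), y).coef_
    return np.abs(beta[:p]) - np.abs(beta[p:])
```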
Step (3): Find the Knockoff Threshold
Find the Knockoff Threshold

Example with p = 10 and q = 20% = 1/5:

[Figure: the ten W_j are ordered by decreasing |W_j|; walking down the list, the running ratio #{negative W_j}/#{positive W_j} takes the values 0/1, 0/2, 0/3, 1/3, 1/4, 1/5, 2/5, 3/5, 3/6, 3/7. The threshold τ is set at the last point where the ratio is ≤ q = 1/5, and the variables with positive W_j above the threshold (|W_1|, |W_4|, |W_5|, |W_6|, |W_7|) are selected: S = {1, 4, 5, 6, 7}.]
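A minimal Python sketch of this thresholding step follows; the W values in the demo are hypothetical numbers chosen to reproduce the ratio sequence and the selected set of the example above.

```python
import numpy as np

def knockoff_threshold(W, q):
    """Smallest t among the |W_j| with #{j : W_j <= -t} / #{j : W_j >= t} <= q.

    This is the threshold described on the slide; the 'knockoff+' variant
    adds 1 to the numerator.
    """
    for t in np.sort(np.abs(W[W != 0])):
        pos = np.sum(W >= t)
        if pos > 0 and np.sum(W <= -t) / pos <= q:
            return t
    return np.inf  # no feasible threshold: select nothing

def knockoff_select(W, q):
    """Indices (1-based, as on the slides) of the positive W_j above the threshold."""
    return [j + 1 for j in np.where(W >= knockoff_threshold(W, q))[0]]

# Hypothetical W values matching the example (p = 10, q = 20%).
W = np.array([3.0, -2.0, -1.0, 2.4, 2.8, 1.5, 1.8, -1.2, 0.8, 0.5])
print(knockoff_select(W, q=0.2))  # [1, 4, 5, 6, 7]
```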
Intuition for FDR Control
$$
\begin{aligned}
\mathrm{FDR} &= \mathbb{E}\left[\frac{\#\{\text{null } X_j \text{ selected}\}}{\#\{\text{total } X_j \text{ selected}\}}\right]
= \mathbb{E}\left[\frac{\#\{\text{null positive } W_j \text{ with } |W_j| > \tau\}}{\#\{\text{positive } W_j \text{ with } |W_j| > \tau\}}\right] \\
&\approx \mathbb{E}\left[\frac{\#\{\text{null negative } W_j \text{ with } |W_j| > \tau\}}{\#\{\text{positive } W_j \text{ with } |W_j| > \tau\}}\right]
\le \mathbb{E}\left[\frac{\#\{\text{negative } W_j \text{ with } |W_j| > \tau\}}{\#\{\text{positive } W_j \text{ with } |W_j| > \tau\}}\right] \le q
\end{aligned}
$$

The approximation uses the coin-flipping property: each null W_j is sign-symmetric, so above any threshold the null positives and null negatives are exchangeable. The final inequality holds by the choice of τ.
GWAS Application
Genetic Analysis of Crohn’s Disease
2007 case-control study by WTCCC
n ≈ 5,000, p ≈ 375,000; preprocessing mirrored the original analysis
Strong spatial structure: second-order knockoffs generated using a genetic covariance estimate (Wen and Stephens, 2010)
Entire analysis took 6 hours of serial computation time; 1 hour in parallel
Knockoffs made twice as many discoveries as the original analysis
− Some new discoveries confirmed in larger study
− Some corroborated by work on nearby genes: promising candidates
− Similar result when HMM knockoffs applied to same data (Sesia et al., 2017)
Discussion
Lucas Janson (Harvard Statistics) Knockoffs for HD Controlled Variable Selection 17 / 18
Summary and Next Steps
By conditioning on Y and modeling X, knockoffs can be applied to high-dimensional and nonlinear problems, where the method is powerful, flexible, and appears robust
Some future directions for research:
Theoretical: rigorous guarantees on robustness
Methodological: develop knockoff constructions for new X distributions
Applied: team up with domain experts who know/control their X, e.g., gene knockout/knockdown, climate change modeling
Thank you!
Lucas Janson (Harvard Statistics) Knockoffs for HD Controlled Variable Selection 18 / 18
Appendix
Lucas Janson (Harvard Statistics) Knockoffs for HD Controlled Variable Selection 18 / 18
References
Barber, R. F. and Candes, E. J. (2015). Controlling the false discovery rate via knockoffs. Ann. Statist., 43(5):2055–2085.
Candes, E., Fan, Y., Janson, L., and Lv, J. (2016). Panning for gold: Model-free knockoffs for high-dimensional controlled variable selection. arXiv preprint arXiv:1610.02351.
Dai, R. and Barber, R. F. (2016). The knockoff filter for FDR control in group-sparse and multitask regression. arXiv preprint arXiv:1602.03589.
Sesia, M., Sabatti, C., and Candes, E. (2017). Gene hunting with knockoffs for hidden Markov models. arXiv preprint arXiv:1706.04677.
Wen, X. and Stephens, M. (2010). Using linear predictors to impute allele frequencies from summary or pooled genotype data. Ann. Appl. Stat., 4(3):1158–1182.
WTCCC (2007). Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447(7145):661–678.
Lucas Janson (Harvard Statistics) Knockoffs for HD Controlled Variable Selection 18 / 18
Simulations in Low-Dimensional Linear Model
[Two-panel plot: Power (left) and FDR (right) vs. coefficient amplitude (2 to 5); methods: BHq Marginal, BHq Max Lik., MF Knockoffs, Orig. Knockoffs.]
Figure: Power and FDR (target is 10%) for MF knockoffs and alternative procedures. The design matrix is i.i.d. N(0, 1/n), n = 3000, p = 1000, and y comes from a Gaussian linear model with 60 nonzero regression coefficients having equal magnitudes and random signs. The noise variance is 1.
Lucas Janson (Harvard Statistics) Knockoffs for HD Controlled Variable Selection 18 / 18
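For concreteness, the data-generating process behind this figure (and, with a logit link, the binomial variants on the next two slides) can be sketched in a few lines of numpy. This is my illustration of the stated design, not the authors' simulation code, and the amplitude value is only a placeholder.

import numpy as np

def simulate_design(n=3000, p=1000, k=60, amplitude=3.5, logistic=False, seed=0):
    """Design with i.i.d. N(0, 1/n) entries and a response from a linear
    model with k nonzero coefficients of equal magnitude and random signs.
    logistic=True gives the binomial-with-logit-link variant."""
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))   # entries ~ N(0, 1/n)
    beta = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)       # coefficient locations
    beta[support] = amplitude * rng.choice([-1.0, 1.0], size=k)
    eta = X @ beta
    if logistic:
        y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))  # logit link
    else:
        y = eta + rng.normal(size=n)                     # noise variance 1
    return X, y, support

Power and FDR curves like those in the figure would then be estimated by running each procedure on many such draws and averaging over the known support.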
Simulations in Low-Dimensional Nonlinear Model
[Two-panel plot: Power (left) and FDR (right) vs. coefficient amplitude (6 to 10); methods: BHq Marginal, BHq Max Lik., MF Knockoffs.]
Figure: Power and FDR (target is 10%) for MF knockoffs and alternative procedures. The design matrix is i.i.d. N(0, 1/n), n = 3000, p = 1000, and y comes from a binomial linear model with logit link function, and 60 nonzero regression coefficients having equal magnitudes and random signs.
Lucas Janson (Harvard Statistics) Knockoffs for HD Controlled Variable Selection 18 / 18
Simulations in High Dimensions
[Two-panel plot: Power (left) and FDR (right) vs. coefficient amplitude (8 to 12); methods: BHq Marginal, MF Knockoffs.]
Figure: Power and FDR (target is 10%) for MF knockoffs and alternative procedures. The design matrix is i.i.d. N(0, 1/n), n = 3000, p = 6000, and y comes from a binomial linear model with logit link function, and 60 nonzero regression coefficients having equal magnitudes and random signs.
Lucas Janson (Harvard Statistics) Knockoffs for HD Controlled Variable Selection 18 / 18
Simulations in High Dimensions with Dependence
[Two-panel plot: Power (left) and FDR (right) vs. autocorrelation coefficient (0.0 to 0.8); methods: BHq Marginal, MF Knockoffs.]
Figure: Power and FDR (target is 10%) for MF knockoffs and alternative procedures. The design matrix has AR(1) columns, and marginally each Xj ∼ N(0, 1/n). n = 3000, p = 6000, and y follows a binomial linear model with logit link function, and 60 nonzero coefficients with random signs and randomly selected locations.
Lucas Janson (Harvard Statistics) Knockoffs for HD Controlled Variable Selection 18 / 18
Checking Sensitivity to Misspecification Error
                                Concern about misspecification      Misspecification replicated
                                of Y|X            of X              in simulation?
Canonical (model Y, not X)      Yes               No                No
Knockoffs (model X, not Y)      No                Yes               Yes

Because the only modeling assumption behind knockoffs concerns X, a simulation that reuses the real design matrix and generates Y from a known model reproduces exactly the misspecification faced in practice.

Can actually check sensitivity to misspecification error!
Lucas Janson (Harvard Statistics) Knockoffs for HD Controlled Variable Selection 18 / 18
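This check is easy to script: reuse the real design matrix so that any misspecification of X's distribution is present exactly as in practice, generate Y from a model of our choosing, and measure the realized false discovery proportion. Below is a minimal harness; it is my sketch, and `select` is a stand-in for any complete knockoffs pipeline (knockoff construction plus the filter), not a real library call.

import numpy as np

def misspecification_check(X_real, select, amplitude=10.0, k=60, q=0.10,
                           n_reps=50, seed=0):
    """Estimate the realized FDR of a selection procedure on a *real* design.
    `select` is any function (X, y, q) -> numpy array of selected indices."""
    rng = np.random.default_rng(seed)
    n, p = X_real.shape
    fdps = []
    for _ in range(n_reps):
        beta = np.zeros(p)
        support = rng.choice(p, size=k, replace=False)
        beta[support] = amplitude * rng.choice([-1.0, 1.0], size=k)
        y = X_real @ beta + rng.normal(size=n)       # ground truth is known
        sel = select(X_real, y, q)                   # X model estimated as in practice
        false = np.setdiff1d(sel, support).size
        fdps.append(false / max(1, sel.size))
    return float(np.mean(fdps))                      # should be <= q if robust

If the procedure is robust to estimating X's distribution, the returned average false discovery proportion should stay at or below the nominal level q.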
Checking Sensitivity to Misspecification Error
Concern about misspecification
Y |X X
Canonical (model Y , not X) Yes No
model X, not Y No Yes
Misspecification replicatedin simulation?
No Yes
Can actually check sensitivity to misspecification error!
Lucas Janson (Harvard Statistics) Knockoffs for HD Controlled Variable Selection 18 / 18
Checking Sensitivity to Misspecification Error
Concern about misspecification
Y |X X
Canonical (model Y , not X) Yes No
model X, not Y No Yes
Misspecification replicatedin simulation?
No Yes
Can actually check sensitivity to misspecification error!
Lucas Janson (Harvard Statistics) Knockoffs for HD Controlled Variable Selection 18 / 18
Robustness on Real Data
[Two-panel plot: Power (left) and FDR (right) vs. coefficient amplitude (9 to 21).]
Figure: Power and FDR (target is 10%) for model-free knockoffs applied to subsamples of chromosome 1 of a real genetic design matrix; n ≈ 1,400.
Lucas Janson (Harvard Statistics) Knockoffs for HD Controlled Variable Selection 18 / 18
Computation of Second-Order Knockoffs
Given Cov(X1, . . . , Xp) = Σ, we need

    Cov(X1, . . . , Xp, X̃1, . . . , X̃p) = [ Σ              Σ − diag{s} ]
                                            [ Σ − diag{s}    Σ           ]

Equicorrelated (EQ) (fast, less powerful): s_j^EQ = 2λmin(Σ) ∧ 1 for all j

Semidefinite program (SDP) (slower, more powerful):

    minimize    Σ_j |1 − s_j^SDP|
    subject to  s_j^SDP ≥ 0
                diag{s^SDP} ⪯ 2Σ

(New) Approximate SDP:
− Approximate Σ as block diagonal so that the SDP separates
− Bisection search over a scalar multiplier of the solution to account for the approximation
− Faster than SDP, more powerful than EQ, and easily parallelizable
Lucas Janson (Harvard Statistics) Knockoffs for HD Controlled Variable Selection 18 / 18
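To make the EQ construction concrete, here is a minimal numpy sketch (my illustration, not the authors' code) for the Gaussian case, assuming X has mean zero and a known correlation matrix Σ (unit diagonal). The joint covariance above implies that, given X = x, the knockoffs are Gaussian with mean (Σ − diag{s})Σ⁻¹x and covariance 2 diag{s} − diag{s}Σ⁻¹diag{s}.

import numpy as np

def equicorrelated_gaussian_knockoffs(X, Sigma, seed=0):
    """Second-order knockoffs for mean-zero rows of X with correlation
    matrix Sigma, using the equicorrelated choice of s."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    s_val = min(2.0 * np.linalg.eigvalsh(Sigma)[0], 1.0)  # 2*lambda_min(Sigma) ^ 1
    S = s_val * np.eye(p)                                 # diag{s}
    Sigma_inv_S = np.linalg.solve(Sigma, S)               # Sigma^{-1} diag{s}
    cond_mean = X - X @ Sigma_inv_S                       # row-wise (Sigma-S)Sigma^{-1} x
    cond_cov = 2.0 * S - S @ Sigma_inv_S                  # 2 diag{s} - diag{s}Sigma^{-1}diag{s}
    cond_cov = (cond_cov + cond_cov.T) / 2                # symmetrize numerically
    L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(p))  # tiny jitter for PSD-ness
    return cond_mean + rng.standard_normal((n, p)) @ L.T

# toy usage with an AR(1) correlation matrix
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(5), np.arange(5)))
X = np.random.default_rng(1).multivariate_normal(np.zeros(5), Sigma, size=100)
Xk = equicorrelated_gaussian_knockoffs(X, Sigma)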
Sequential Independent Pairs Generates Valid Knockoffs
Algorithm 1 Sequential Conditional Independent Pairs

    for j = 1, . . . , p do
        sample X̃j from L(Xj | X-j, X̃1:j−1), conditionally independently of Xj
    end

Proof sketch (discrete case):

Denote the PMF of (X1:p, X̃1:j−1) by L(X-j, Xj, X̃1:j−1).

The conditional PMF of X̃j | X1:p, X̃1:j−1 is

    L(X-j, X̃j, X̃1:j−1) / Σu L(X-j, u, X̃1:j−1).

The joint PMF of (X1:p, X̃1:j) is therefore

    L(X-j, Xj, X̃1:j−1) · L(X-j, X̃j, X̃1:j−1) / Σu L(X-j, u, X̃1:j−1),

which is invariant to swapping Xj and X̃j; iterating over j yields the exchangeability property that makes the knockoffs valid.
Lucas Janson (Harvard Statistics) Knockoffs for HD Controlled Variable Selection 18 / 18
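To see the algorithm run end to end, here is a toy Python sketch for a fully enumerated discrete PMF. It tracks the joint L(X1:p, X̃1:j) exactly via the recursion in the proof sketch, so it is exponential in p and purely illustrative, not a practical implementation.

import numpy as np

def sequential_knockoffs(x_obs, pmf, labels, seed=0):
    """One pass of Algorithm 1. x_obs: observed p-tuple; pmf: dict mapping
    p-tuples to probabilities; labels: list of possible values per coordinate."""
    rng = np.random.default_rng(seed)
    p = len(x_obs)
    x_tilde = []
    f = dict(pmf)  # f[xx] = L(X = xx, Xt_{1:j} = knockoffs realized so far)
    for j in range(p):
        def slice_weights(xx):
            # f evaluated along coordinate j of xx, other coordinates fixed
            return [f.get(xx[:j] + (u,) + xx[j + 1:], 0.0) for u in labels]
        # sample Xt_j from L(X_j | X_-j, Xt_{1:j-1}), independently of X_j
        w = slice_weights(tuple(x_obs))
        xt_j = labels[rng.choice(len(labels), p=np.array(w) / sum(w))]
        x_tilde.append(xt_j)
        # fold the conditional PMF of Xt_j into the tracked joint
        f_new = {}
        for xx, val in f.items():
            w_xx = slice_weights(xx)
            tot = sum(w_xx)
            if tot > 0:
                f_new[xx] = val * w_xx[labels.index(xt_j)] / tot
        f = f_new
    return tuple(x_tilde)

# toy usage: a correlated binary pair
pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(sequential_knockoffs((1, 0), pmf, labels=[0, 1]))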
Proof of Control
FDR = E( #{null Xj selected} / #{total Xj selected} )

    = E( #{null positive |Wj| > τ} / #{positive |Wj| > τ} )

    ≈ E( #{null negative |Wj| > τ} / #{positive |Wj| > τ} )      (signs of null Wj are symmetric)

    ≤ E( #{negative |Wj| > τ} / #{positive |Wj| > τ} )  ≤  q     (by definition of τ)

More precisely:

mFDR = E( #{null Xj selected} / (q⁻¹ + #{total Xj selected}) )

     = E( #{null positive |Wj| > τ} / (q⁻¹ + #{positive |Wj| > τ}) )

     = E( [ #{null positive |Wj| > τ} / (1 + #{null negative |Wj| > τ}) ] · [ (1 + #{null negative |Wj| > τ}) / (q⁻¹ + #{positive |Wj| > τ}) ] )

The first factor has expectation at most 1 (it is a supermartingale and τ is a stopping time); the second factor is at most q by the definition of τ. Hence mFDR ≤ q.
Lucas Janson (Harvard Statistics) Knockoffs for HD Controlled Variable Selection 18 / 18
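The "definition of τ" invoked twice above is the knockoff+ stopping rule of Barber and Candes (2015): τ is the smallest threshold t for which (1 + #{j : Wj ≤ −t}) / max(1, #{j : Wj ≥ t}) ≤ q. A minimal numpy sketch of that rule follows; it uses one common convention for signs and ties and is my illustration, not the authors' code.

import numpy as np

def knockoff_plus_select(W, q=0.10):
    """Select {j : W_j >= tau} where tau is the knockoff+ threshold."""
    ts = np.sort(np.unique(np.abs(W[W != 0])))       # candidate thresholds
    for t in ts:
        ratio = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if ratio <= q:                               # definition of tau
            return np.flatnonzero(W >= t)            # selected variables
    return np.array([], dtype=int)                   # no feasible threshold

# toy usage
W = np.array([3.1, -0.2, 2.7, 0.5, -1.0, 4.0, 2.2, -0.3, 1.8, 2.5])
print(knockoff_plus_select(W, q=0.2))

The "+1" in the numerator is exactly the "1 + #{null negative}" term that drives the supermartingale bound in the display above.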