Controlling the Actual Number of False Discoveries at a Given Confidence Level
![Page 1: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/1.jpg)
Controlling the Actual Number of False Discoveries
at a Given Confidence Level
Joe Maisog
BIST-530 Final Project
December 3, 2008
![Page 2: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/2.jpg)
False Discovery Rate
• FDR (FPR) = proportion of positive tests which are actually false positives
• FDR methods control the FDR in the sense that
E{FDR} ≤ q
where q ∈ [0,1] is the desired level of control
Benjamini and Hochberg, 1995
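As a minimal sketch of the "regular" FDR method, the Benjamini-Hochberg step-up rule can be written in a few lines of Python. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions, not taken from the slides or from the project's R code.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up rule.

    Hypothetical helper for illustration; not the project's R code.
    """
    m = len(pvalues)
    # Sort hypothesis indices by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank r (1-based) with p_(r) <= r * q / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff_rank = rank
    # Step-up: reject every hypothesis ranked at or below the cutoff,
    # even if its own p-value missed its own threshold.
    return sorted(order[:cutoff_rank])


# Made-up p-values: 0.03 misses its threshold (0.025) but is still
# rejected because 0.035 meets the next one (0.0375).
example_pvalues = [0.2, 0.001, 0.035, 0.03]
print(benjamini_hochberg(example_pvalues, q=0.05))  # [1, 2, 3]
```

The rule guarantees E{FDR} ≤ q only on average over repeated experiments (under independence); it says nothing about how far the false discovery proportion in a single experiment may stray above q.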
![Page 3: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/3.jpg)
Korn’s Variants
Korn E et al., J of Statistical Planning and Inference 124(2): 379-98 (2004).
![Page 4: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/4.jpg)
Follow-Up Paper by Lusa et al.
• Lusa L, Korn EL, McShane LM, A class comparison method with filtering-enhanced variable selection for high-dimensional data sets, Stat Med. 2008 Dec 10;27(28):5834-49.
• C code (R package)
![Page 5: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/5.jpg)
A Problem
“Procedures targeting control of the expected number or proportion of false discoveries rather than the actual number or proportion can give a false sense of security. … Even with no correlation the results here [using “regular” FDR with simulated data] are troubling: 10% of the time the false discovery proportion will be 0.29 or more.” [emphasis mine]
![Page 6: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/6.jpg)
Analogy: Accuracy vs. Precision
High Accuracy, Low Precision
High Precision, Low Accuracy
FDR
http://en.wikipedia.org/wiki/Accuracy
![Page 7: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/7.jpg)
Two Jokes: Controlling Expectation Without a Confidence Level
• Three statisticians went out hunting, and came across a large deer. The first statistician fired, but missed, by a meter to the left. The second statistician fired, but also missed, by a meter to the right. The third statistician didn't fire, but shouted in triumph, "On the average we got it!"
• With one foot in a bucket of ice water, and one foot in a bucket of boiling water, you are, on the average, comfortable.
http://www.workjoke.com/statisticians-jokes.html
![Page 8: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/8.jpg)
Korn’s Solution
“[Procedures targeting control of the actual number or proportion of false discoveries] will allow statements such as ‘with 95% confidence, the number of false discoveries does not exceed 2’ or ‘with approximate 95% confidence, the proportion of false discoveries does not exceed 0.01.’” [emphasis mine]
![Page 9: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/9.jpg)
Korn’s Variants
Adjusted p-Values
• Actual number of false discoveries (“A”)
• Actual proportion of false discoveries (“B”)
• Full Algorithm
• Computationally Efficient Algorithm
Unadjusted p-Values
• Actual number of false discoveries (“A”)
• Actual proportion of false discoveries (“B”)
• Full Algorithm
• Computationally Efficient Algorithm
![Page 10: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/10.jpg)
Two Goals
1. Confirm Korn’s warning that, when “regular” FDR control is used, the realized false positive rate exceeds the expected rate in a fairly large fraction of cases.
2. Implement in R Korn’s method to control the actual number of false positives at a given confidence level, using the computationally efficient version.
![Page 11: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/11.jpg)
Definition
• k variables (e.g., genes)
• P(1) < P(2) < . . . < P(k) are the ordered p-values from the univariate tests
• H(1), H(2), . . . , H(k) are the corresponding null hypotheses
• T = { t1, t2, . . . , tj } is any subset of K = { 1, 2, . . . , k }
• Pr00 is the multivariate permutation distribution of p-values
![Page 12: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/12.jpg)
Definition
![Page 13: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/13.jpg)
Procedure To Control the Actual Number of False Discoveries
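The general idea of controlling the actual number of false discoveries by permutation can be sketched as follows. This is a simplified single-step variant, not Korn's step-down Procedure A or its computationally efficient version. It permutes the group labels, records the (u+1)-th largest |t| statistic across genes in each permutation, and uses a conservative upper quantile of those values as the rejection threshold. When every gene has the same degrees of freedom, ranking by |t| is equivalent to ranking by p-value. All function names, the toy data dimensions, and the shift of 5.0 are assumptions chosen for illustration.

```python
import math
import random


def abs_t(x, y):
    """Absolute pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    pooled_var = (ssx + ssy) / (nx + ny - 2)
    return abs(mx - my) / math.sqrt(pooled_var * (1 / nx + 1 / ny))


def gene_stats(data, labels):
    """|t| per gene; data[g][s] is gene g in sample s, labels[s] is 0 or 1."""
    stats = []
    for row in data:
        x = [v for v, lab in zip(row, labels) if lab == 0]
        y = [v for v, lab in zip(row, labels) if lab == 1]
        stats.append(abs_t(x, y))
    return stats


def single_step_threshold(data, labels, u, alpha=0.05, n_perm=200, seed=0):
    """Threshold on |t| so that, under the permutation distribution, more than
    u genes exceed it with probability at most about alpha (single-step sketch)."""
    rng = random.Random(seed)
    perm_labels = list(labels)
    null_values = []
    for _ in range(n_perm):
        rng.shuffle(perm_labels)  # random reassignment of group labels
        ranked = sorted(gene_stats(data, perm_labels), reverse=True)
        null_values.append(ranked[u])  # (u+1)-th largest |t| in this permutation
    null_values.sort()
    # Conservative empirical (1 - alpha) quantile of the permutation values.
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm))
    return null_values[index]


def simulate_toy_data(n_genes=60, n_active=10, n_per_group=8, shift=5.0, seed=1):
    """Toy data: the first n_active genes are shifted in group 1 (hypothetical sizes)."""
    rng = random.Random(seed)
    labels = [0] * n_per_group + [1] * n_per_group
    data = []
    for g in range(n_genes):
        row = [rng.gauss(0, 1) for _ in labels]
        if g < n_active:
            row = [v + shift * lab for v, lab in zip(row, labels)]
        data.append(row)
    return data, labels


data, labels = simulate_toy_data()
threshold = single_step_threshold(data, labels, u=2)
observed = gene_stats(data, labels)
rejected = [g for g, s in enumerate(observed) if s > threshold]
print(f"threshold on |t|: {threshold:.2f}; rejected genes: {rejected}")
```

A step-down procedure such as Korn's is less conservative because it recomputes the threshold as hypotheses are rejected. The single-step version is shown only because it fits in a few lines.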
![Page 14: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/14.jpg)
1000 Simulations in R
• 50 controls, 50 treatments, 1000 genes
• Noise ~ N(0,1), no cross-gene correlations
• 100 genes “activated” in treatments with increase = 0.3969 (p = 0.05)
• “Regular” FDR method to control E{FDR} at q = 0.05
• Korn’s method to control the number of actual FP’s at u = 50, with 95% confidence
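The first goal (checking how often the realized false positive rate exceeds q under "regular" FDR) can be sketched with a simplified simulation. One assumption departs from the setup above: each gene's test is a z-test with known unit variance rather than a t-test, so the test statistic is drawn directly as N(shift / SE, 1) with SE = sqrt(2/50) = 0.2. The shift of 0.3969 appears to be the mean difference that gives a two-sided p = 0.05 in a two-sample t-test with 98 degrees of freedom (about 1.9845 × 0.2). Function names are hypothetical, and the output need not match the figures obtained with the project's R code.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bh_rejections(pvalues, q):
    """Indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff = rank
    return order[:cutoff]


def simulate_fdp(n_sims=1000, n_genes=1000, n_active=100, shift=0.3969,
                 n_per_group=50, q=0.05, seed=0):
    """Realized false discovery proportion (FDP) of BH in each simulated data set.

    Assumption: z-tests with known variance 1 stand in for the t-tests.
    """
    rng = random.Random(seed)
    se = (2 / n_per_group) ** 0.5  # SE of the difference in group means
    fdps = []
    for _ in range(n_sims):
        pvals = []
        for g in range(n_genes):
            mu = shift / se if g < n_active else 0.0  # first n_active genes activated
            z = rng.gauss(mu, 1.0)
            pvals.append(2 * (1 - STD_NORMAL.cdf(abs(z))))  # two-sided p-value
        rejected = bh_rejections(pvals, q)
        false_pos = sum(1 for g in rejected if g >= n_active)
        # Convention: FDP is 0 when nothing is rejected.
        fdps.append(false_pos / len(rejected) if rejected else 0.0)
    return fdps


# Fewer simulations than the 1000 described above, to keep the run short.
fdps = simulate_fdp(n_sims=200)
mean_fdp = sum(fdps) / len(fdps)
frac_above = sum(f > 0.05 for f in fdps) / len(fdps)
print(f"mean FDP = {mean_fdp:.4f}, share of runs with FDP > q = {frac_above:.3f}")
```

The mean FDP estimates E{FDR}, which BH keeps at or below q, while the second number estimates how often a single experiment exceeds q.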
![Page 15: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/15.jpg)
Simulated Data Matrix
p-values
N1 = 50, N2 = 50, Ntot = 100
G1 = 100, G2 = 900, k = 1000
![Page 16: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/16.jpg)
Results: “Regular” FDR
• Mean FPR = 0.0394 (so, controlled at q = 0.05)
• But 17.5% of the time, FPR > 0.05
![Page 17: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/17.jpg)
Results: Korn’s Method
• 98.9% of the time, the actual number of false positives was ≤ 50
• Controlled at u = 50 with 95% confidence
![Page 18: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/18.jpg)
Conclusions
• 17.5% of the time, FPR > q = 0.05 with “regular” FDR
• Korn’s method controlled the actual number of false positives at u = 50 with 95% confidence (actually slightly conservative)
• Disadvantage: computationally intensive
• Examining someone else’s computer program can be difficult but very rewarding!
![Page 19: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/19.jpg)
Future Directions
• Try different parameters (e.g., signal size; number of subjects, variables, or permutations), or with correlated variables
• Try the method on real data
• Try Korn’s “Procedure B”, which controls the actual FDR at a given confidence level
• Try Lusa’s R package for feature selection
![Page 20: Controlling the Actual Number of False Discoveries at a Given Confidence Level](https://reader035.vdocuments.us/reader035/viewer/2022062314/568148ba550346895db5d406/html5/thumbnails/20.jpg)
References
• Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57: 289-300 (1995).
• Korn EL, Troendle JF, McShane LM, Simon R. Controlling the number of false discoveries: application to high-dimensional genomic data. Journal of Statistical Planning and Inference 124(2): 379-398 (2004).
• Lusa L, Korn EL, McShane LM. A class comparison method with filtering-enhanced variable selection for high-dimensional data sets. Stat Med 27(28): 5834-5849 (2008). R package available at: http://linus.nci.nih.gov/Data/LusaL/bioinfo/
• Westfall PH, Tobias RD, Rom D, Wolfinger RD, Hochberg Y. Multiple Comparisons and Multiple Tests. Cary, NC: SAS Institute, Inc., 1999.
• A copy of the R code developed for this project can be found here: http://bist.pbwiki.com/f/bist530FinalProject.r