Very Large Biomedical Data Sets - Stanford University, statweb.stanford.edu/~ckirby/brad/talks/2008Biomedical2.pdf

Very Large Biomedical Data Sets (Trying To Do Thousands of Hypothesis Tests at the Same Time)

Bradley Efron, Stanford University

Reference: “Microarrays, Empirical Bayes and the Two-Groups Model”, http://www-stat.stanford.edu/~brad/papers/twogroups.pdf

What is “Statistics”?

• Learning from experience
  – That arrives a little bit at a time.

• Clinical Trial
  – No one patient’s response is conclusive, but information can be accrued across patients….

Gene 4124: Prostate Cancer Study

• 50 Healthy Men
  ‒1.05, 0.34, 1.16, ‒0.29, ‒0.40, …, 0.13, ‒0.81, 0.71, 0.80
  Mean ‒0.033

• 52 Prostate Cancer Patients
  0.07, 1.67, 1.58, ‒1.06, ‒1.04, …, ‒1.05, 0.83, 0.21, 0.50
  Mean 0.325

• Question: Is gene 4124 “overexpressed” in prostate cancer patients?

Hypothesis Test
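
The question about gene 4124 is a two-sample comparison of means. The following is a minimal sketch of such a test, assuming a pooled-variance t statistic and a normal approximation to its null distribution (reasonable with about 100 degrees of freedom). The data are simulated, because the slide lists only a few of the 102 values, and the function names and the simulated mean shift of 0.35 are illustrative assumptions rather than anything taken from the talk.

```python
import math
import random
from statistics import NormalDist, mean, variance


def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic for mean(y) - mean(x)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    std_err = math.sqrt(pooled_var * (1.0 / nx + 1.0 / ny))
    return (mean(y) - mean(x)) / std_err


def one_sided_p_normal(t):
    """Upper-tail p-value, treating t as approximately standard normal."""
    return 1.0 - NormalDist().cdf(t)


# Simulated stand-ins for the 50 healthy and 52 cancer measurements (assumption).
rng = random.Random(0)
healthy = [rng.gauss(0.0, 1.0) for _ in range(50)]
cancer = [rng.gauss(0.35, 1.0) for _ in range(52)]

t_stat = two_sample_t(healthy, cancer)
print(f"t = {t_stat:.3f}, one-sided p = {one_sided_p_normal(t_stat):.4f}")
```

A large positive statistic (small p-value) would point toward overexpression in the cancer group; for a single gene tested in isolation, the usual 0.05 threshold would apply.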

Prostate Cancer Study

• 6033 genes

• 6033 z-values, comparing cancer patients with healthy controls for each gene

• Is gene 4124 still “interesting”?

PROSTATE CANCER DATA (Microarray) (Singh et al. 2002)

                      HEALTHY                       PROSTATE CANCER             TEST STATISTICS
            pat1    pat2   pat49   pat50    pat51   pat52  pat101  pat102       “z”
gene1      -0.93   -0.75   -1.08   -0.99    -0.58   -1.09    2.77    0.73       1.47
gene2      -0.84   -0.85   -0.16   -0.75     0.25   -0.83   -0.27   -0.82       3.57
gene3       0.06    0.10    0.22   -1.16     0.11    4.04    0.09   -1.10      -0.03
gene4      -0.36    2.42   -0.10   -1.13    -0.13   -0.36   -0.19    0.43      -1.13
gene5      -1.12    0.18    1.05    1.70     0.94   -1.08   -0.10   -0.14      -0.14
.
.
gene6031    0.35    0.10   -0.79   -0.91    -0.92   -1.17   -1.18   -0.82      -1.18
gene6032   -0.90    1.33   -0.88   -0.91    -0.87   -0.88   -0.89    0.09       0.10
gene6033   -0.25   -0.09   -0.67   -0.70    -0.80   -0.80    0.00   -0.79      -0.91

Question: Which genes, if any, are implicated in the development of prostate cancer?

Doing 6033 Hypothesis Tests at Once
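
To see why 6033 simultaneous tests change the picture, it helps to count how many rejections pure chance would produce. The sketch below assumes every gene is null and uses the conventional level 0.05 (an assumption, not a value from the slides); it also computes the one-sided z cutoff a Bonferroni correction would require. The function names are illustrative.

```python
from statistics import NormalDist

N_TESTS = 6033  # number of genes, from the slides
ALPHA = 0.05    # conventional significance level (assumption)


def expected_false_positives(n_tests, alpha):
    """Expected number of rejections if every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_z_cutoff(n_tests, alpha):
    """One-sided z cutoff that keeps the per-test level at alpha / n_tests."""
    return NormalDist().inv_cdf(1.0 - alpha / n_tests)


print(f"expected false positives: {expected_false_positives(N_TESTS, ALPHA):.2f}")
print(f"Bonferroni z cutoff:      {bonferroni_z_cutoff(N_TESTS, ALPHA):.2f}")
```

With roughly 300 chance rejections expected at the 0.05 level, a single gene that clears that threshold is no longer remarkable, while the Bonferroni cutoff is strict enough that few genes would pass it.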


False Discovery Rates (Benjamini and Hochberg 1995)


False Discovery Control Algorithm
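
A minimal sketch of the Benjamini and Hochberg (1995) step-up procedure, the standard false discovery control algorithm: order the N p-values, find the largest rank k with p_(k) <= (k/N) q, and reject the k smallest. The function name, the example p-values, and the level q = 0.05 are illustrative assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule."""
    n = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up: remember the largest rank whose p-value is under its threshold.
        if p_values[idx] <= rank / n * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example_p, q=0.05))
```

Because the rule looks for the largest qualifying rank, a p-value that misses its own threshold can still be rejected when a larger p-value further down the list meets its threshold.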

A SNP Study (Quertermous et al.)

• 1000 subjects: 500 cardiovascular, 500 healthy

• Polymorphisms examined at 550,000 locations on whole genome

• Look for correlation between polymorphisms and disease status

550,000 z-values, one for each SNP

• Mostly null!

• Fdr{|z| > 4} = 34.7/41 = 84%
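
The Fdr figure can be read as a ratio of expected null exceedances to observed exceedances. The sketch below assumes that 34.7 is the expected number of null z-values with |z| > 4 under a standard normal null and that 41 is the number actually observed; that reading and the function names are assumptions, not statements from the slide.

```python
from statistics import NormalDist


def expected_null_count(n_tests, cutoff):
    """Expected number of |z| > cutoff among n_tests standard normal z-values."""
    two_sided_tail = 2.0 * (1.0 - NormalDist().cdf(cutoff))
    return n_tests * two_sided_tail


def estimated_fdr(expected_null, observed):
    """Estimated false discovery rate: expected null count over observed count."""
    return expected_null / observed


# With exactly 550,000 SNPs the null expectation is close to the slide's 34.7.
print(f"expected null count: {expected_null_count(550_000, 4.0):.1f}")
print(f"estimated Fdr:       {estimated_fdr(34.7, 41):.3f}")
```

The ratio 34.7/41 evaluates to about 0.846, which the slide reports as 84%: most of the 41 SNPs beyond ±4 are what chance alone would be expected to produce.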

The Brain Data (Schwartzman et al. 2005)
