
Very Large Biomedical Data Sets (Trying To Do Thousands of Hypothesis Tests at the Same Time)

Bradley Efron, Stanford University

Reference: “Microarrays, Empirical Bayes and the Two-Groups Model”, http://www-stat.stanford.edu/~brad/papers/twogroups.pdf

What is “Statistics”?

• Learning from experience
  – that arrives a little bit at a time.

• Clinical Trial
  – No one patient’s response is conclusive, but information can be accrued across patients…

Gene 4124: Prostate Cancer Study

• 50 Healthy Men: −1.05, 0.34, 1.16, −0.29, −0.40, …, 0.13, −0.81, 0.71, 0.80

Mean −0.033

• 52 Prostate Cancer Patients: 0.07, 1.67, 1.58, −1.06, −1.04, …, −1.05, 0.83, 0.21, 0.50

Mean 0.325

• Question: Is gene 4124 “overexpressed” in prostate cancer patients?

Hypothesis Test
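One way to test a single gene is a two-sample t statistic comparing the 52 cancer patients with the 50 healthy controls (100 degrees of freedom), transformed to a z-value with z = Φ⁻¹(F₁₀₀(t)), where F₁₀₀ is the t-distribution CDF with 100 degrees of freedom. This appears to match the transformation used in the cited two-groups paper, but the slide itself does not show it. The sketch below uses only the Python standard library; the function names are hypothetical, and the t CDF is computed through the regularized incomplete beta function using the continued-fraction (modified Lentz) method familiar from Numerical Recipes.

```python
import math
from statistics import NormalDist, mean, variance


def two_sample_t(controls, cases):
    """Pooled-variance two-sample t statistic (cases minus controls)."""
    n1, n2 = len(controls), len(cases)
    pooled_var = ((n1 - 1) * variance(controls)
                  + (n2 - 1) * variance(cases)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(cases) - mean(controls)) / se


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the fraction
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term of the fraction
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_cdf(t, df):
    """P(T <= t) for Student's t with df degrees of freedom."""
    tail = 0.5 * _betainc_reg(df / 2.0, 0.5, df / (df + t * t))  # P(T > |t|)
    return 1.0 - tail if t > 0 else tail


def t_to_z(t, df):
    """z = Phi^{-1}(F_df(t)), computed from the smaller tail for accuracy."""
    tail = 0.5 * _betainc_reg(df / 2.0, 0.5, df / (df + t * t))
    z = NormalDist().inv_cdf(tail)  # normal quantile of P(T <= -|t|)
    return -z if t > 0 else z
```

For a gene with 50 control and 52 case values, `t_to_z(two_sample_t(controls, cases), 100)` gives its z-value; working from the smaller tail avoids rounding the CDF to exactly 1 for large t.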

Prostate Cancer Study

• 6033 genes

• 6033 z-values, comparing cancer patients with healthy controls for each gene

• Is gene 4124 still “interesting”?
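Why a single gene's result may stop looking special: if all 6033 genes were null, a conventional two-sided test at level 0.05 would still be expected to flag about 6033 × 0.05 ≈ 302 of them. The sketch below shows that arithmetic alongside the Bonferroni cutoff, a classical correction included only for comparison (it is not on the slide); the 0.05 level and the function names are illustrative assumptions.

```python
from statistics import NormalDist

N_GENES = 6033  # number of genes in the prostate study
ALPHA = 0.05    # conventional per-test level (illustrative assumption)


def expected_null_hits(n_tests, alpha):
    """Expected number of rejections if every hypothesis were null."""
    return n_tests * alpha


def two_sided_cutoff(alpha):
    """|z| cutoff for a two-sided N(0, 1) test at level alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


single = two_sided_cutoff(ALPHA)                # about 1.96
bonferroni = two_sided_cutoff(ALPHA / N_GENES)  # about 4.46
print(f"expected null hits at |z| > {single:.2f}: {expected_null_hits(N_GENES, ALPHA):.1f}")
print(f"Bonferroni cutoff for {N_GENES} tests: |z| > {bonferroni:.2f}")
```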

PROSTATE CANCER DATA (Microarray) (Singh et al. 2002)

Columns pat1–pat50: HEALTHY; pat51–pat102: PROSTATE CANCER; “z”: TEST STATISTICS

gene        pat1   pat2  pat49  pat50  pat51  pat52 pat101 pat102    “z”
gene1      -0.93  -0.75  -1.08  -0.99  -0.58  -1.09   2.77   0.73   1.47
gene2      -0.84  -0.85  -0.16  -0.75   0.25  -0.83  -0.27  -0.82   3.57
gene3       0.06   0.10   0.22  -1.16   0.11   4.04   0.09  -1.10  -0.03
gene4      -0.36   2.42  -0.10  -1.13  -0.13  -0.36  -0.19   0.43  -1.13
gene5      -1.12   0.18   1.05   1.70   0.94  -1.08  -0.10  -0.14  -0.14
…
gene6031    0.35   0.10  -0.79  -0.91  -0.92  -1.17  -1.18  -0.82  -1.18
gene6032   -0.90   1.33  -0.88  -0.91  -0.87  -0.88  -0.89   0.09   0.10
gene6033   -0.25  -0.09  -0.67  -0.70  -0.80  -0.80   0.00  -0.79  -0.91

Question: Which genes, if any, are implicated in the development of prostate cancer?

Doing 6033 Hypothesis Tests at Once

False Discovery Rates (Benjamini and Hochberg 1995)
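For reference, the false discovery rate of a rejection region 𝒵 can be written in the two-groups framework of the cited paper: each z-value is null with prior probability π₀ (density f₀, typically N(0, 1)) or non-null with probability π₁ = 1 − π₀ (density f₁). The notation below is a sketch of that framework; F₀(𝒵) and F(𝒵) denote the probabilities of 𝒵 under f₀ and f, and N is the number of tests. The last line, with π₀ conservatively set to 1, is the "expected null count divided by observed count" estimate.

```latex
\begin{aligned}
f(z) &= \pi_0 f_0(z) + \pi_1 f_1(z), \qquad \pi_1 = 1 - \pi_0,\\
\operatorname{Fdr}(\mathcal{Z}) &= \Pr\{\text{null} \mid z \in \mathcal{Z}\}
  = \frac{\pi_0 F_0(\mathcal{Z})}{F(\mathcal{Z})},\\
\widehat{\operatorname{Fdr}}(\mathcal{Z}) &= \frac{N\,\pi_0 F_0(\mathcal{Z})}{\#\{i : z_i \in \mathcal{Z}\}}.
\end{aligned}
```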

False Discovery Control Algorithm
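The Benjamini–Hochberg step-up rule can be sketched as follows: sort the N p-values, find the largest rank k with p₍ₖ₎ ≤ q·k/N, and reject the k hypotheses with the smallest p-values. This is a minimal standard-library version; the two-sided p-value conversion, the default q = 0.1, and the function names are illustrative choices, not taken from the slides.

```python
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value of a z-value under a N(0, 1) null."""
    return 2.0 * NormalDist().cdf(-abs(z))


def benjamini_hochberg(p_values, q=0.1):
    """Indices (in input order) rejected by the BH step-up rule at level q."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # ranks 1..n by p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Step-up: keep the LARGEST rank that meets its threshold,
        # even if some smaller ranks failed theirs.
        if p_values[i] <= q * rank / n:
            k_max = rank
    return sorted(order[:k_max])


# Usage with made-up z-values (not study data):
z_values = [4.2, 0.3, -3.9, 1.1, 2.8, -0.7]
rejected = benjamini_hochberg([two_sided_p(z) for z in z_values], q=0.1)
print(rejected)  # [0, 2, 4]
```

The step-up form matters: a hypothesis can be rejected even when a smaller-ranked p-value misses its own threshold, as long as some larger rank meets q·k/N.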

A SNP Study (Quertermous et al.)

• 1000 subjects: 500 cardiovascular, 500 healthy

• Polymorphisms examined at 550,000 locations across the whole genome

• Look for correlation between polymorphisms and disease status
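The slide does not say how each SNP's z-value was computed. One simple possibility, shown here purely as an assumption, codes each subject's genotype as the number of minor alleles (0, 1, 2) and disease status as 0/1, and uses z = √n · r, where r is their Pearson correlation; with no association, √n · r is approximately N(0, 1) for large n. The function name `snp_z` is hypothetical.

```python
import math


def snp_z(genotypes, status):
    """sqrt(n) * Pearson correlation between genotype codes and disease status."""
    n = len(genotypes)
    mg = sum(genotypes) / n
    ms = sum(status) / n
    sxy = sum((g - mg) * (s - ms) for g, s in zip(genotypes, status))
    sxx = sum((g - mg) ** 2 for g in genotypes)
    syy = sum((s - ms) ** 2 for s in status)
    if sxx == 0 or syy == 0:  # monomorphic SNP or single-class sample
        return 0.0
    r = sxy / math.sqrt(sxx * syy)
    return math.sqrt(n) * r


# Usage with made-up data for 8 subjects (1 = cardiovascular case, 0 = healthy):
genotypes = [0, 1, 2, 2, 0, 0, 1, 0]
status = [1, 1, 1, 1, 0, 0, 0, 0]
print(round(snp_z(genotypes, status), 3))
```

Repeating this for each of the 550,000 SNPs would give one z-value per SNP.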

550,000 z-values, one for each SNP:

• Mostly null!

• Fdr{|z| > 4} = 34.7/41 = 84%
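The Fdr figure for the SNP study is a ratio: the number of |z| > 4 exceedances expected from null SNPs divided by the number actually observed, 34.7/41 ≈ 0.846. A minimal sketch of that estimate, assuming a theoretical N(0, 1) null and a null proportion π₀ that defaults to 1 (both assumptions, chosen to be conservative); the function name `tail_fdr` is hypothetical.

```python
import math
from statistics import NormalDist


def tail_fdr(z_values, cutoff, pi0=1.0):
    """Estimate Fdr{|z| > cutoff} as expected null count / observed count."""
    n = len(z_values)
    null_tail = 2.0 * NormalDist().cdf(-cutoff)  # P0{|Z| > cutoff} under N(0, 1)
    expected_null = pi0 * n * null_tail
    observed = sum(1 for z in z_values if abs(z) > cutoff)
    # Undefined when nothing is observed; capped at 1 otherwise
    fdr = min(1.0, expected_null / observed) if observed else math.nan
    return expected_null, observed, fdr


# The slide's SNP counts plugged into the same ratio:
print(round(34.7 / 41, 3))  # 0.846
```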

The Brain Data (Schwartzman et al. 2005)
