simplystatistics.org @simplystats · able clinical successes—as in the case of treatment for...


Post on 05-Jan-2020



simplystatistics.org @simplystats


Not So Standard Deviations (with Hilary Parker of Stitch Fix)

Subscribe in iTunes: https://goo.gl/ZhWYbd

https://soundcloud.com/nssd-podcast

@NSSDeviations


NSSD Episode 23

https://goo.gl/d8eszr


leanpub.com/artofdatascience leanpub.com/conversationsondatascience


Protecting Health, Saving Lives —Millions at a Time

(of data points)


$ per (human) Genome

http://www.genome.gov/sequencingcosts/

The Measurement Revolution…


The Measurement Revolution…

Brian Caffo


The Measurement Revolution… Multiplied!

Brian Caffo


Data is Eating the World* (but analysis isn’t yet)

*see “Software is Eating the World” by Marc Andreessen


Not Enough Geeks


Epidemic of Bad Data Analysis

“The best way to prevent poor data analysis in the scientific literature is to (i) increase the number of trained data analysts in the scientific community and (ii) identify statistical software and tools that can be shown to improve reproducibility and replicability of studies.”

Leek & Peng (2015), PNAS


What is Data Science?

• Formulating a question that can be answered with data

• Assembling, cleaning, tidying data relevant to a question

• Exploring data, checking, eliminating hypotheses

• Developing a (statistical) model

• Making statistical inference

• Communicating findings


Inner-city Childhood Asthma

• Chronic inflammatory disorder of the airways

• Inflammation associated with (1) airway hyper-responsiveness; (2) airflow limitation (at least partially reversible); (3) respiratory symptoms (wheeze, cough)

• Airway inflammation can be present even in mild disease

• Racial/ethnic minorities often make up the majority of residents in inner cities

• Asthma prevalence rates of 25-28% in inner cities


Environmental Intervention and (Allergic) Asthma Pathogenesis

Pathway: Pollutant or Allergen Exposure → Specific IgE + sensitized mast cells → Inflammatory Mediators (histamine, leukotrienes) → Vasodilation, mucus (symptoms), bronchoconstriction

Interventions: Environmental Intervention; Anti-IgE; Anti-inflammatory Medication; Anti-mediator


PREACH Study

• Randomized intervention in homes in East Baltimore to lower indoor PM levels

• Two groups: Control, Air Cleaner

• Baseline and 6-month clinic and home visit

• 126 children 6-12 yrs old with asthma enrolled

• Homes had to have a smoker (> 5 cigs/day) living there at least 4 days/week

• Goal: Decrease PM2.5 and increase symptom-free days


PREACH Results (outcome)

Improvement of 1.9 symptom-free days


PREACH Results (PM2.5)


Bayesian Mixture Models


Comparison of Model Estimates: Change in Symptom-Free Days

Model Always-taker Never-taker Complier

1 5.2 (-0.1, 11.8)

2 -0.3 (-1.4, 0.9) 5.5 (0.4, 13.3)

3 3.0 (-2.5, 10.2) 4.1 (0.1, 10.8)

Original 1.9 (0.2, 3.6)
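The slides estimate stratum-specific effects with Bayesian mixture models. As a much simpler point of comparison, and not the method used for the table above, a complier effect can be sketched with the moment-based (Wald, or instrumental-variable) estimator. The sketch assumes randomized assignment, no defiers, and that assignment affects the outcome only through the intermediate variable. The function name, the binary `received` indicator, and the toy numbers are all hypothetical; none of them are PREACH data.

```python
from statistics import mean


def complier_effect(assigned, received, outcome):
    """Wald estimator: ITT effect on the outcome divided by ITT effect on uptake.

    assigned: 0/1 randomized arm for each unit
    received: 0/1 indicator of the intermediate variable (hypothetical)
    outcome:  numeric outcome for each unit
    """
    def arm_mean(values, arm):
        # Average of `values` over the units randomized to `arm`.
        return mean(v for v, z in zip(values, assigned) if z == arm)

    itt_outcome = arm_mean(outcome, 1) - arm_mean(outcome, 0)
    itt_uptake = arm_mean(received, 1) - arm_mean(received, 0)
    if itt_uptake == 0:
        raise ValueError("assignment did not change uptake; estimator undefined")
    return itt_outcome / itt_uptake


# Toy data, made up purely for illustration.
assigned = [1, 1, 1, 1, 0, 0, 0, 0]
received = [1, 1, 1, 0, 1, 0, 0, 0]
outcome = [6, 6, 4, 2, 4, 2, 2, 2]
print(complier_effect(assigned, received, outcome))
```

The ratio rescales the intention-to-treat difference by the share of units whose uptake was changed by assignment, which is why it is larger than the raw difference whenever compliance is imperfect.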


Air Pollution


WHO Global Burden of Disease

3.7 million deaths due to ambient air pollution (2012)

The Landscape


The Landscape

The 10 leading causes of death in the world, 2012 (WHO Global Burden of Disease)

Cause of death              Deaths
Ischaemic heart disease     7.4 million
Stroke                      6.7 million
COPD                        3.1 million
Lower respiratory in...     3.1 million
Trachea bronchus, lu...     1.6 million
HIV/AIDS                    1.5 million
Diarrhoeal diseases         1.5 million
Diabetes mellitus           1.5 million
Road injury                 1.3 million
Hypertensive...             1.1 million


The Global Landscape


Beijing, August 18, 2011

Beijing, December 5, 2011

What is an air pollution study?


Beijing

Shanghai

What is an air pollution study?


Air Pollution Monitoring


Monitor Locations


Spatial-Temporal Model

Labelled terms of the model equation:

• Pollutant level at location s and time t

• Fixed effects

• Gaussian process with correlation function ρ(⋅,⋅ | φ,κ)

• County-wide block average pollutant level
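A model consistent with these labels can be sketched as follows. This is one plausible form, not necessarily the equation shown on the slide: the symbols w_t(s), μ_t(s), ε_t(s), σ² and B (the county region) are assumptions introduced for illustration, while x_t, ρ, φ and κ are the names used in the slides.

```latex
\begin{aligned}
w_t(s) &= \mu_t(s) + \varepsilon_t(s)
  && \text{pollutant level = fixed effects + spatial error} \\
\varepsilon_t(\cdot) &\sim \mathrm{GP}\bigl(0,\ \sigma^2 \rho(\cdot,\cdot \mid \phi, \kappa)\bigr)
  && \text{Gaussian process with correlation function } \rho \\
x_t &= \frac{1}{|B|} \int_B w_t(s)\, ds
  && \text{county-wide block average}
\end{aligned}
```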


Predictive Distribution for x_t

Joint distribution of monitor values and block average is Normal
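Because the joint distribution is Normal, the distribution of the block average given the monitor values is also Normal, with the standard conditional mean and variance. A minimal sketch of that conditioning step in plain Python follows. The function names, the two-monitor setup, and every number are hypothetical, and the means and covariances are simply supplied rather than derived from a fitted spatial model.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b] as floats.
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def block_average_given_monitors(mu_x, var_x, mu_w, cov_ww, cov_xw, w):
    """Conditional mean and variance of a scalar x given a vector w.

    mean = mu_x + cov_xw' cov_ww^{-1} (w - mu_w)
    var  = var_x - cov_xw' cov_ww^{-1} cov_xw
    """
    weights = solve(cov_ww, cov_xw)  # cov_ww^{-1} cov_xw (cov_ww is symmetric)
    cond_mean = mu_x + sum(a * (wi - mi) for a, wi, mi in zip(weights, w, mu_w))
    cond_var = var_x - sum(a * c for a, c in zip(weights, cov_xw))
    return cond_mean, cond_var


# Hypothetical example: two monitors, both reading above their mean.
cond_mean, cond_var = block_average_given_monitors(
    mu_x=10.0, var_x=4.0,
    mu_w=[10.0, 10.0],
    cov_ww=[[4.0, 2.0], [2.0, 4.0]],
    cov_xw=[3.0, 3.0],
    w=[12.0, 14.0],
)
print(cond_mean, cond_var)
```

The conditional variance is smaller than the marginal variance of the block average, reflecting the information the monitors carry about it.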


Bayesian and Plug-in Approaches

• Predictive distribution of block average given monitor values

• Bayesian Model: We can use MCMC to sample from

Poisson likelihood (time series model)

Predictive model
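As the name suggests, a plug-in approach substitutes a predicted block average for the unknown exposure in the Poisson time series model, whereas the Bayesian approach samples the exposure and the health-model parameters together. A minimal sketch of the Poisson log-likelihood with plugged-in exposures follows. It assumes the log-linear form log E[y_t] = alpha + beta * x_t with no confounder terms, which is a simplification; the function name and the toy numbers are hypothetical.

```python
import math


def poisson_loglik(alpha, beta, exposure, counts):
    """Log-likelihood of daily counts under log E[y_t] = alpha + beta * x_t."""
    total = 0.0
    for x_t, y_t in zip(exposure, counts):
        eta = alpha + beta * x_t  # linear predictor on the log scale
        # log of the Poisson pmf: y * log(mu) - mu - log(y!)
        total += y_t * eta - math.exp(eta) - math.lgamma(y_t + 1)
    return total


# Toy inputs, made up for illustration: plugged-in exposures and daily counts.
x_hat = [0.0, 1.0]
deaths = [1, 2]
print(poisson_loglik(0.0, math.log(2), x_hat, deaths))
print(poisson_loglik(0.0, 0.0, x_hat, deaths))
```

A plug-in fit would maximize this function over alpha and beta while treating `x_hat` as known, so it ignores the uncertainty in the predicted exposure.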


Combined Estimates Across 20 Counties

Original approach underestimated risk by more than half (!)


Parable of Google Flu Trends


Parable of Personalized Medicine

Genomic signatures to guide the use of chemotherapeutics

Anil Potti^1,2, Holly K Dressman^1,3, Andrea Bild^1,3, Richard F Riedel^1,2, Gina Chan^4, Robyn Sayer^4, Janiel Cragun^4, Hope Cottrill^4, Michael J Kelley^2, Rebecca Petersen^5, David Harpole^5, Jeffrey Marks^5, Andrew Berchuck^1,6, Geoffrey S Ginsburg^1,2, Phillip Febbo^1–3, Johnathan Lancaster^4 & Joseph R Nevins^1–3

Using in vitro drug sensitivity data coupled with Affymetrix microarray data, we developed gene expression signatures that predict sensitivity to individual chemotherapeutic drugs. Each signature was validated with response data from an independent set of cell line studies. We further show that many of these signatures can accurately predict clinical response in individuals treated with these drugs. Notably, signatures developed to predict response to individual agents, when combined, could also predict response to multidrug regimens. Finally, we integrated the chemotherapy response signatures with signatures of oncogenic pathway deregulation to identify new therapeutic strategies that make use of all available drugs. The development of gene expression profiles that can predict response to commonly used cytotoxic agents provides opportunities to better use these drugs, including using them in combination with existing targeted therapies.

Numerous advances have been achieved in the development, selection and application of chemotherapeutic agents, sometimes with remarkable clinical successes—as in the case of treatment for lymphomas or platinum-based therapy for testicular cancers^1. In addition, in several instances, combination chemotherapy in the postoperative (adjuvant) setting has been curative. However, most people with advanced solid tumors will relapse and die of their disease. Moreover, administration of ineffective chemotherapy increases the probability of side effects, particularly those from cytotoxic agents, and of a consequent decrease in quality of life^1,2.

Recent work has demonstrated the value in using biomarkers to select individuals for various targeted therapeutics, including tamoxifen, trastuzumab and imatinib mesylate. In contrast, equivalent tools to select those most likely to respond to the commonly used chemotherapeutic drugs are lacking^3.

With the goal of developing genomic predictors of chemotherapy sensitivity that could direct the use of cytotoxic agents to those most likely to respond, we combined in vitro drug response data, together with microarray gene expression data, to develop models that could potentially predict responses to various cytotoxic chemotherapeutic drugs^4. We now show that these signatures can predict clinical or pathologic response to the corresponding drugs, including combinations of drugs. We further use the ability to predict deregulated oncogenic signaling pathways in tumors to develop a strategy that identifies opportunities for combining chemotherapeutic drugs with targeted therapeutic drugs in a way that best matches the characteristics of the individual.

RESULTS

A gene expression–based predictor of sensitivity to docetaxel

To develop predictors of cytotoxic chemotherapeutic drug response, we used an approach similar to previous work analyzing the NCI-60 panel^4 from the US National Cancer Institute (NCI). We first identified cell lines that were most resistant or sensitive to docetaxel (Fig. 1a,b) and then genes whose expression correlated most highly with drug sensitivity, and used Bayesian binary regression analysis to develop a model that differentiates a pattern of docetaxel sensitivity from that of resistance. A gene expression signature consisting of 50 genes was identified that classified cell lines on the basis of docetaxel sensitivity (Fig. 1b, right).

In addition to leave-one-out cross-validation, we used an independent dataset derived from docetaxel sensitivity assays in a series of 30 lung and ovarian cancer cell lines for further validation. The significant correlation (P < 0.01, log-rank test) between the predicted probability of sensitivity to docetaxel (in both lung and ovarian cell lines) (Fig. 1c, left) and the respective 50% inhibitory concentration (IC50) for docetaxel confirmed the capacity of the docetaxel predictor to predict sensitivity to the drug in cancer cell

Received 11 July; accepted 12 September; published online 22 October; corrected online 27 October 2006 (details online) and corrected after print 21 July 2008; doi:10.1038/nm1491

^1 Duke Institute for Genome Sciences and Policy, Duke University, Box 3382, Durham, North Carolina 27710, USA. ^2 Department of Medicine, Duke University Medical Center, Box 31295, Durham, North Carolina 27710, USA. ^3 Department of Molecular Genetics and Microbiology, Duke University Medical Center, Box 3054, Durham, North Carolina 27710, USA. ^4 Division of Gynecologic Surgical Oncology, H. Lee Moffitt Cancer Center & Research Institute, University of South Florida, 12902 Magnolia Drive, Tampa, Florida 33612, USA. ^5 Department of Surgery, Duke University Medical Center, Box 3627, Durham, North Carolina 27710, USA. ^6 Department of Obstetrics and Gynecology, Duke University Medical Center, Box 3079, Durham, North Carolina 27710, USA. Correspondence should be addressed to J.R.N. ([email protected]).

1294 VOLUME 12 | NUMBER 11 | NOVEMBER 2006 NATURE MEDICINE

ARTICLES

© 2011 Nature America, Inc. All rights reserved.


“Rock Star” Statisticians

New York Times


Deception at Duke


Bad process!


The Data Science Process


Institute of Medicine Report

• Data/metadata used should be made publicly available

• The computer code and fully specified computational procedures used should be made available

• Ideally, the computer code that is released will encompass all of the steps of computational analysis, including all data preprocessing steps
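One way to act on these recommendations is to keep every step, from raw input to final result, in a single script and to record a checksum of the input alongside the output. The sketch below is only an illustration of that idea and is not taken from the report: the function names, the "one number per line" input format, and the `NA` convention are all hypothetical.

```python
import hashlib
import json


def sha256_of(data: bytes) -> str:
    """Hex digest identifying exactly which input bytes were analyzed."""
    return hashlib.sha256(data).hexdigest()


def preprocess(raw_text):
    """Parse one numeric value per line, skipping blank lines and 'NA'."""
    values = []
    for line in raw_text.splitlines():
        line = line.strip()
        if line and line != "NA":
            values.append(float(line))
    return values


def analyze(values):
    """The 'analysis' step: a simple summary of the cleaned values."""
    return {"n": len(values), "mean": sum(values) / len(values)}


def run_pipeline(raw_bytes):
    """Run every step from raw bytes to result and record the input checksum."""
    values = preprocess(raw_bytes.decode("utf-8"))
    return {"input_sha256": sha256_of(raw_bytes), "result": analyze(values)}


# Hypothetical raw input, including a missing value and a blank line.
report = run_pipeline(b"1.0\nNA\n2.0\n\n3.0\n")
print(json.dumps(report, indent=2))
```

Because preprocessing lives in the same released code path as the analysis, a reader with the same input bytes can rerun the script and compare both the checksum and the result.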


Air Pollution and Health: A Perfect Storm?

• Estimating small health effects in the presence of much stronger signals

• Results inform substantial policy decisions and affect many stakeholders

• EPA regulations can cost billions of dollars

• Complex statistical methods are needed and subjected to intense scrutiny


Medicare Cohort Air Pollution Study


The success of a data analysis depends on the process, not the result.


The Future is Bright!

• A tremendous infrastructure on which to build

• Cheap hardware and Moore’s Law have made powerful computing available to all through the cloud

• Advanced software has abstracted complex details of data analysis

• The Internet allows analyses to be deployed to the entire world

• Volume of data is increasing dramatically

• A never-ending supply of difficult (but interesting) questions!


The Future is Bright!

• The future will favor those trained in data science

• Problems and data are coming in too fast

• A perfect training for interdisciplinary work

• Many leadership opportunities


Johns Hopkins Biostatistics

• Largest / oldest / best school of public health

• PhD program in Biostatistics (~10 per class)

• ScM program in Biostatistics (8-12 per class)

• Rigorous training with focus on science

• Applications: Environmental health, genomics, personalized medicine, medical imaging, wearable computing, clinical trials, infectious disease


Great Students


Great Faculty

Brian Caffo, Jeff Leek, Roger Peng
