Post on 11-Jan-2016


Page 1: Big Data Network Genomics Network Inference and Perturbation to Study Chemical-Mediated Cancer Induction Stefano Monti smonti@bu.edu Section of Computational

Big Data Network Genomics Network Inference and Perturbation

to Study Chemical-Mediated Cancer Induction

Stefano Monti

smonti@bu.edu

Section of Computational BioMedicine

Boston University School of Medicine

Biostatistics, BUSPH

Bioinformatics Program, BU

Graduate Program in Genetics & Genomics, BU

Broad Institute of MIT & Harvard

Page 2

Abstract

Development and application of novel methods of network inference and differential analysis from multiple genomic data types toward the elucidation of a chemical's mechanism(s) of cancer induction

Page 3

Abstract

Development and application of novel methods of network inference and differential analysis from high-dimensional data types toward the elucidation of functionally relevant modules

(generalization)

high-dimensional data types

functionally relevant modules

domain specific

Page 4

The Motivating Problem

Page 5

Goals

Development of “Carcinogenicity Biomarker(s)”

Carcinogenicity Prediction Model

Chemical

Carcinogen

Non-carcinogen

Understand Why
• Pathways affected
• Driver alterations
• Biomarkers
• …

Manuscript under Review

Page 6

Goals

Development of “Carcinogenicity Biomarker(s)”

Carcinogenicity Prediction Model

Chemical

Carcinogen

Non-carcinogen

Non-carcinogens Carcinogens

gene1 gene2 gene3 gene4 gene5 gene6 gene7

To generate this ‘matrix’, 100,000s of experiments need to be performed

1,000s of controls generated
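The chemicals-by-genes ‘matrix’ and the prediction model it feeds can be pictured with a toy example. The slides do not say which classifier is used, so the sketch below swaps in a nearest-centroid rule purely for illustration; the expression values, the helper names (`fit`, `predict`) and the new profile are all made up.

```python
# Toy chemicals-by-genes matrix: one row per chemical, one column per gene.
# All values are invented for illustration; they are not experimental data.
GENES = ["gene1", "gene2", "gene3", "gene4", "gene5", "gene6", "gene7"]
profiles = [
    [2.1, 1.8, 0.2, 0.1, 1.9, 0.3, 0.0],  # labeled carcinogen
    [1.7, 2.2, 0.1, 0.3, 2.0, 0.1, 0.2],  # labeled carcinogen
    [0.1, 0.2, 0.0, 0.2, 0.1, 0.3, 0.1],  # labeled non-carcinogen
    [0.3, 0.0, 0.2, 0.1, 0.2, 0.0, 0.3],  # labeled non-carcinogen
]
labels = ["carcinogen", "carcinogen", "non-carcinogen", "non-carcinogen"]


def centroid(rows):
    """Column-wise mean of a list of profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def fit(rows, row_labels):
    """Nearest-centroid 'model': one mean profile per class (an assumed stand-in)."""
    return {
        c: centroid([r for r, lab in zip(rows, row_labels) if lab == c])
        for c in sorted(set(row_labels))
    }


def predict(model, profile):
    """Return the class whose centroid is closest to the profile."""
    return min(model, key=lambda c: squared_distance(model[c], profile))


model = fit(profiles, labels)
new_profile = [1.9, 2.0, 0.1, 0.2, 1.8, 0.2, 0.1]
prediction = predict(model, new_profile)

# Per-gene difference between class centroids: a crude hint at candidate biomarkers.
difference = {
    g: c - n
    for g, c, n in zip(GENES, model["carcinogen"], model["non-carcinogen"])
}
```

A real model would be trained on the many thousands of profiles the slide mentions; the point here is only the layout of the data and the chemical-in, label-out interface.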

Page 7

In Progress

high-throughput data generation

384-well plate

100,000s profiles

Phase I
• 24 plates (liver and lung)
• ~200 compounds
• ~10,000 profiles

Future plans …

Phase II
• More tissue types (breast, prostate, etc.)
• More compounds (~1,500)
• Mixtures
• 100,000s profiles

Phase III
• iPSC-derived cells & 3D cultures
• “personalized exposure” models

Page 8

Generalization of the Motivating Problem

Comparison of a control state to multiple perturbation states

Standard approaches of gene-based differential analysis might miss salient (aggregate) differences

High-dimensional data (1000s of ‘features’)
• Usually representable as 2D [10K x 1K] matrices

Large sample size for the ‘control state’
• ≥1000 observations

Small sample size for each of the ‘perturbation states’
• ~10-100 observations/perturbation
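The remark that gene-based differential analysis might miss aggregate differences can be made concrete with a small simulation. This is a sketch under stated assumptions: two hypothetical genes are drawn from Gaussians, the control state has 1,000 samples and the perturbation state 50 (matching the orders of magnitude on the slide), and the perturbation removes the co-regulation between the genes without shifting either gene's mean.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_control, n_perturbed = 1000, 50

# Control state: gene B closely tracks gene A.
ctrl_a = [rng.gauss(0, 1) for _ in range(n_control)]
ctrl_b = [a + rng.gauss(0, 0.3) for a in ctrl_a]

# Perturbation state: both genes keep a mean of zero, but B no longer follows A.
pert_a = [rng.gauss(0, 1) for _ in range(n_perturbed)]
pert_b = [rng.gauss(0, 1) for _ in range(n_perturbed)]

# A per-gene comparison of means sees almost nothing...
mean_shift_a = mean(pert_a) - mean(ctrl_a)
mean_shift_b = mean(pert_b) - mean(ctrl_b)

# ...while the pairwise relationship has changed substantially.
r_control = pearson(ctrl_a, ctrl_b)
r_perturbed = pearson(pert_a, pert_b)
```

Here the per-gene mean shifts stay near zero while the correlation drops from close to 1 to close to 0, which is the kind of difference a network-level comparison is meant to capture.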

Page 9

Generalization of the Motivating Problem: an example

The Connectivity Map/LINCS project

Expression Profiling of Chemical/Genetic perturbations

• >10,000 compounds (most FDA-approved drugs)
• ~5,000 genetic perturbations (RNAi, CRISPR)
• 18 cell types, multiple doses, time-points

>1,000,000 profiles

Main Goal: Drug Discovery

Page 10

Approach Overview

[Figure: connectivity loss/gain matrix for Module 1, Module 2, …, Module p across Compound 1, Compound 2, …, Compound n, with Annotation and Wild-Type Network panels]

Page 11

Approach Overview

[Figure: connectivity loss/gain matrix for Module 1, Module 2, …, Module p across Compound 1, Compound 2, …, Compound n, with Annotation and Wild-Type Network panels]

Network construction

Module identification

Module/Network comparison
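One way to read the loss/gain figure is as a table of per-module connectivity changes between the wild-type network and each compound's network. The slides do not define "connectivity", so the sketch below assumes it is the mean absolute correlation among a module's genes; the correlation matrices, module memberships and function names are hypothetical.

```python
def module_connectivity(corr, module):
    """Mean absolute correlation over all gene pairs in a module (assumed definition)."""
    pairs = [(i, j) for i in module for j in module if i < j]
    return sum(abs(corr[i][j]) for i, j in pairs) / len(pairs)


def connectivity_changes(wild_type, perturbed, modules):
    """For each compound and module: connectivity under the compound minus wild type.

    Negative values read as loss of connectivity, positive values as gain.
    """
    return {
        compound: {
            name: module_connectivity(corr, genes)
            - module_connectivity(wild_type, genes)
            for name, genes in modules.items()
        }
        for compound, corr in perturbed.items()
    }


# Hypothetical modules over four genes (indices into the matrices below).
modules = {"module_1": [0, 1], "module_2": [2, 3]}


def toy_corr(r01, r23):
    """Build a 4x4 correlation matrix with only two non-zero off-diagonal pairs."""
    return [
        [1.0, r01, 0.0, 0.0],
        [r01, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, r23],
        [0.0, 0.0, r23, 1.0],
    ]


wild_type = toy_corr(0.8, 0.2)
perturbed = {
    "compound_1": toy_corr(0.1, 0.2),  # module_1 loses connectivity
    "compound_2": toy_corr(0.8, 0.9),  # module_2 gains connectivity
}

changes = connectivity_changes(wild_type, perturbed, modules)
```

In this toy case `compound_1` shows a loss of about 0.7 in `module_1` and `compound_2` a gain of about 0.7 in `module_2`, with the other entries unchanged.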

Page 12

Approach Details

networks’ construction

Correlation networks
• clustering vs. topology-based ‘module’ identification

Gaussian models
• Inverse covariance matrix → partial correlations

Correlation networks + “scale-free transformations”
• mostly for comparison w/ existing methods
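The step from an inverse covariance matrix to partial correlations can be sketched as follows, using the standard identity that the partial correlation between variables i and j is -P[i][j] / sqrt(P[i][i] * P[j][j]), where P is the inverse covariance (precision) matrix. The three simulated genes, the sample size and the helper names are assumptions for illustration; gene A drives both B and C, so B and C are correlated but conditionally independent given A.

```python
import random


def covariance_matrix(columns):
    """Sample covariance matrix of a list of equal-length variables."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    return [
        [
            sum((a - mi) * (b - mj) for a, b in zip(ci, cj)) / (n - 1)
            for cj, mj in zip(columns, means)
        ]
        for ci, mi in zip(columns, means)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (fine for small dense matrices)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def correlations(cov):
    n = len(cov)
    return [[cov[i][j] / (cov[i][i] * cov[j][j]) ** 0.5 for j in range(n)]
            for i in range(n)]


def partial_correlations(cov):
    """Partial correlations from the precision matrix P = cov^-1."""
    prec = invert(cov)
    n = len(prec)
    return [
        [1.0 if i == j else -prec[i][j] / (prec[i][i] * prec[j][j]) ** 0.5
         for j in range(n)]
        for i in range(n)
    ]


rng = random.Random(1)  # fixed seed for reproducibility
gene_a = [rng.gauss(0, 1) for _ in range(2000)]
gene_b = [a + rng.gauss(0, 0.5) for a in gene_a]  # driven by A
gene_c = [a + rng.gauss(0, 0.5) for a in gene_a]  # driven by A

cov = covariance_matrix([gene_a, gene_b, gene_c])
corr = correlations(cov)
pcor = partial_correlations(cov)
```

In this setup `corr[1][2]` is high (B and C move together), while `pcor[1][2]` is near zero once A is accounted for, so a partial-correlation network would keep the A-B and A-C edges and drop the indirect B-C edge. With ~10K genes the plain inversion used here is not practical, and some form of regularized estimation would be needed.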

Page 13

Approach Details

networks’ comparison

Covariance matrices comparison

Probabilistic model selection
• Bayes factor

Network topology
• Diffusion State Distance (M. Crovella) and related
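Diffusion State Distance is named here without a definition. The sketch below follows the usual published formulation as an assumption: for each node, build the vector of expected visit counts to every node by a random walk of k steps starting there, then take the L1 distance between two nodes' vectors. The walk length k = 5, the toy graph and the function names are illustrative choices, and the graph is assumed to have no isolated nodes.

```python
def transition_matrix(adj):
    """Row-normalize an adjacency matrix into random-walk transition probabilities."""
    return [[v / sum(row) for v in row] for row in adj]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def expected_visits(adj, k):
    """he[i][j]: expected visits to j by a k-step walk from i (step 0 included)."""
    n = len(adj)
    p = transition_matrix(adj)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # P^0
    he = [row[:] for row in power]
    for _ in range(k):
        power = matmul(power, p)  # P^t for t = 1..k
        he = [[h + q for h, q in zip(hr, qr)] for hr, qr in zip(he, power)]
    return he


def dsd(adj, k):
    """Pairwise L1 distances between rows of the expected-visits matrix."""
    he = expected_visits(adj, k)
    n = len(adj)
    return [[sum(abs(x - y) for x, y in zip(he[i], he[j])) for j in range(n)]
            for i in range(n)]


# Toy graph: node 0 is a hub linked to 1, 2 and 3; node 4 hangs off node 3.
adj = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
distances = dsd(adj, k=5)
```

Nodes 1 and 2 have identical neighborhoods, so after the first step their walks are indistinguishable and their distance is exactly 2 (each visits itself once at step 0); nodes in different parts of the graph end up further apart.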

Page 14

The Data

Gene expression profiles → networks’ inference

Protein-protein interaction → networks’ priors

“Cell painting” profiles → networks’ annotation

100K samples

10K features (genes)

Page 15

Deliverables

Computational Toolbox
• Network inference and visualization
• Module (i.e., sub-network) identification/comparison
• Network/module-based clustering/annotation

Analysis and cataloguing of chemical perturbations
• Chemicals’ putative mechanisms of action
• Interpretable carcinogenicity predictor(s)

A sandbox for researchers to develop and test new methods
• richly annotated multi-type data
• domain expertise to evaluate relevance/usefulness

Preliminary results for pursuit of further funding

Page 16

The Team

Stefano Monti, Ph.D. (Assoc. Professor)
Computational Biology, Cancer Genomics, Machine Learning (Bayesian Networks)

Paola Sebastiani, Ph.D. (Professor)
Biostatistics, Genetics/Genomics, Bayesian Graphical Models

Mark Crovella, Ph.D. (Professor)
Computer Science, Network Analysis

Simon Kasif (Professor)
Computational Biology, Systems Biology, Machine Learning

Francesca Mulas, Ph.D. (Post-doctoral Fellow)
Computational Biology/Bioinformatics, Computer Science

Daniel Gusenleitner, M.S. (Ph.D. student)
Bioinformatics, Computer Science, Machine Learning

Page 17

“Background” Team

BU-SRP: David Ozonoff, Basra Komal, Heather Henry (NIEHS)

Evans Foundation - ARC: Katya Ravid, Robin MacDonald

NTP/NIEHS: Scott Auerbach, Ray Tice

Broad Institute: Aravind Subramanian, Xiaodong Lu, Todd Golub, cMAP team

BU CBM/Bioinformatics/SPH: David Sherr (co-PI), Daniel Gusenleitner, Jessalyn Ubellacker, Tisha Meila, Harold Gomez, Yuxiang Tan, Liye Zhang, Elizabeth Moses, Teresa Wang, Marc Lenburg, Avi Spira

Page 18

The End