The Connectivity Map: A Portrait of the Database as Biomedical Laboratory

by Pablo Tamayo, Broad Institute and Oracle Corporation

Part I: What is the CMAP?

Introduction and overview

Connectivity Map

In a nutshell: it connects diseases with drugs using the language of genes.

It is organized as a publicly available online database that contains the signatures of many drugs in the language of gene expression.

It can be “queried” with genetic signatures of disease, in an approach known as in silico drug screening, in order to find matching drugs that are therefore identified as potential new treatments for the disease.

(‐)‐catechin 12,13‐EODE 3‐hydroxy‐DL‐kynurenine BW‐B70C DL‐PPMP MG‐132 cytochalasin demecolcine doxycycline minocycline monensin phenanthridinone phentolamine trichostatin tyrphostin yohimbine 15‐delta 17‐allylamino‐geldanamycin 17‐dimethylamino‐geldanamycin LY‐294002 acetylsalicylic 5186223 5186324 5213008 5286656 HC calmidazolium carbamazepine celastrol celecoxib clotrimazole colforsin decitabine docosahexaenoic ikarugamycin ionomycin pararosaniline quercetin rottlerin topiramate 5182598 5211181 5224221 5230742 5248896 5252917 fulvestrant geldanamycin genistein haloperidol monorden nordihydroguaiaretic prochlorperazine rosiglitazone sirolimus thioridazine tretinoin troglitazone valproic vorinostat wortmannin clozapine trifluoperazine 5109870 5114445 5140203 5149715 5151277 5152487 5162773 5253409 5255229 5279552 Y‐27632 blebbistatin bucladesine depudecin felodipine oxaprozin prazosin pyrvinium resveratrol monastrol butirosin mercaptopurine W‐13 benserazide colchicine tioguanine paclitaxel pentamidine novobiocin 4,5‐dianilinophthalimide nocodazole 5666823 LM‐1685 NU‐1025 butein thalidomide MK‐886 arachidonic ciclosporin nifedipine arachidonyltrifluoromethane 3‐aminobenzamide probucol U0125 splitomicin HNMPA‐(AM)3 dimethyloxalylglycine fisetin copper deferoxamine tetraethylenepentamine 1,5‐isoquinolinediol SC‐58125 gefitinib staurosporine indometacin sodium iloprost pirinixic dopamine imatinib rofecoxib cobalt quinpirole TTNPB diclofenac clofibrate oligomycin oxamic fasudil raloxifene tacrolimus tamoxifen dexamethasone 2‐deoxy‐D‐glucose azathioprine nitrendipine N‐phenylanthranilic flufenamic exisulind sulindac fludrocortisone prednisolone tomelukast sulfasalazine amitriptyline dexverapamil exemestane verapamil chlorpropamide tolbutamide mesalazine metformin phenformin Phenylalpha‐estradiol Chlorpromazine estradiol fluphenazine

It contains 164 different drugs in v1 (1079 in v2), including most FDA‐approved drugs.

The CMAP can significantly speed up drug discovery and help find new uses for old drugs.

The CMAP is housed at the Broad Institute in Cambridge, MA, and is publicly available at www.broad.mit.edu/cmap/

The Broad Institute is a research collaboration involving the MIT and Harvard academic and medical communities.

It was founded in 2003 through the far‐sighted generosity of philanthropists Eli and Edythe Broad.

The Institute is organized around interdisciplinary Scientific Programs and Scientific Platforms to enable scientists to collaborate on important projects with the objective of bringing the power of genomics to medicine.

The CMAP Team

People who have participated in the project include Irene Blat, Jean‐Philippe Brunet, Steve Carr, Jon Clardy, Paul Clemons, Emily Crawford, Stephen Haggarty, William Hahn, Jim Lerner, Joshua Modell, David Peck, Xiao Peng, Srilakshmi Raj, Michael Reich, Kenneth Ross, Aravind Subramanian, David Twomey, Ru Wei and Matthew Wrobel. Justin Lamb and Todd Golub (shown in photo below) lead the CMAP team.

Photo courtesy of Justin Ide/Harvard News Office

CMAP reference: Lamb et al. The Connectivity Map: Using Gene‐Expression Signatures to Connect Small Molecules, Genes, and Disease. Science 313 (5795), 1929 (2006).

The CMAP v1 runs on Oracle Database 10g Enterprise Edition Release 10.1 (64‐bit) with the partitioning, OLAP and data mining options.

What type of database is the CMAP?

It captures information about the experimental process that generates the data.

It stores the drug and disease signatures, plus the entire result set of each user query, which can be retrieved later.

It has about 5,800 registered users.

It is implemented as a Java/servlet application with a web interface.

The Connectivity Map has been useful for identifying novel therapeutics in leukemia and prostate cancer

Two articles in the October 2006 issue of Cancer Cell (Volume 10) show the use of the CMAP in leukemia and prostate cancer research to predict anticancer activity that was subsequently demonstrated in additional experiments on model systems.

Later in this presentation we will see the leukemia example in detail …

Part II: How does the CMAP work?

How was the CMAP created?

First, 164 (1079 v2) distinct drugs were selected and used at several doses and times, for a total of 564 (5774 v2) instances…

…on 4 different types of cell lines: breast, prostate, leukemia and melanoma…

…then those were profiled using Affymetrix arrays of DNA micro‐chips and a scanner…

…then a computer program identified the drug signatures. The drug signatures are ordered lists of genes, running from the genes that go up to the genes that go down…

…and they were finally stored in the CMAP database.
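As a rough illustration of what "a drug signature is an ordered list of genes" can mean in practice, here is a minimal Python sketch. It assumes each gene is ranked by the ratio of its expression in treated cells to its expression in control cells; that ranking rule, the function name `drug_signature` and the gene ids are assumptions made for this example, not a description of the actual CMAP pipeline.

```python
def drug_signature(treated, control):
    """Order gene ids from most up-regulated to most down-regulated.

    treated, control: dicts mapping gene id -> expression level (> 0).
    Ranking by the treated/control ratio is an assumption of this sketch.
    """
    ratios = {g: treated[g] / control[g] for g in treated if g in control}
    # Highest ratio first: genes that go up lead, genes that go down trail.
    return sorted(ratios, key=lambda g: ratios[g], reverse=True)


# Hypothetical expression levels for four genes.
treated = {"G1": 200.0, "G2": 50.0, "G3": 100.0, "G4": 400.0}
control = {"G1": 100.0, "G2": 100.0, "G3": 100.0, "G4": 100.0}

signature = drug_signature(treated, control)
print(signature)  # ['G4', 'G1', 'G3', 'G2']
```

Each dose/time instance of a drug would yield one such ordered list, which is what gets stored and later compared against a query.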

How is the CMAP queried?

Starting from two patient populations, A and B (e.g. Disease and Normal)…

…samples are extracted and profiled using Affymetrix arrays of DNA micro‐chips and a scanner…

…a computer program defines the disease signature: the genes that go up and the genes that go down…

…and the disease signature itself becomes the query to the CMAP.
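The query construction can be sketched in Python as follows. This is only an illustration under stated assumptions: genes are scored by the difference in mean expression between the two populations, and the top `n_up` and `n_down` genes form the signature. The scoring rule, the function name `disease_signature` and the sample values are hypothetical.

```python
from statistics import mean


def disease_signature(group_a, group_b, n_up, n_down):
    """Return (up_genes, down_genes) for population A relative to B.

    group_a, group_b: dicts mapping gene id -> list of expression values,
    one value per sample. Difference of means is an assumed scoring rule.
    """
    diff = {g: mean(group_a[g]) - mean(group_b[g])
            for g in group_a if g in group_b}
    ordered = sorted(diff, key=lambda g: diff[g], reverse=True)
    up = ordered[:n_up]
    # Take the tail of the ordering and reverse it so the most
    # down-regulated gene comes first.
    down = ordered[len(ordered) - n_down:][::-1]
    return up, down


# Hypothetical data: two samples per population.
disease = {"G1": [10, 12], "G2": [5, 5], "G3": [1, 3], "G4": [8, 8]}
normal = {"G1": [4, 6], "G2": [5, 5], "G3": [9, 11], "G4": [7, 7]}

up_genes, down_genes = disease_signature(disease, normal, n_up=1, n_down=1)
print(up_genes, down_genes)  # ['G1'] ['G3']
```

The two short gene lists, not the raw expression data, are what is submitted as the query.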

How to match diseases to drugs?

The disease X signature (top genes up and top genes down) is matched against all the drugs by using a statistical test.

The drug data span ~22,000 genes and 564 (5774 v2) drug instances.

Matches range from strong positive through weak positive, null and weak negative to strong negative.

Disease signature, e.g. 13 genes: 7 up and 6 down (populations A and B).

One Example in Detail…

Notice that the CMAP queries are not standard information retrieval queries such as:

SELECT <...> FROM CMAP <...>

This is because the actual link between drugs and disease does not exist until the query is made!

The match between the disease and the drug signatures is computed using a statistical test that compares the gene orderings of both signatures and computes a similarity score.

Let's see how it works…

CMAP queries use a Kolmogorov‐Smirnov statistical test

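For each drug x, the disease signature is compared with drug x's effect on genes, ordered from the genes that go up to the genes that go down. Two questions are asked:

Are the genes in the up signature enriched on the "up" side of drug x's ordering?

Are the genes in the down signature enriched on the "down" side of drug x's ordering?

The connectivity score is

$$S_x = \begin{cases} 0 & \text{if } \operatorname{sign}(K_{up}) = \operatorname{sign}(K_{down}) \\ K_{up} - K_{down} & \text{otherwise} \end{cases}$$

so that a drug scores highly only when the up and down signatures are enriched at opposite ends of its ordering (Lamb et al. 2006).

More formally, for a signature of size $t$ (with $t = t_{up}$, the size of the up signature, or $t = t_{down}$, the size of the down signature), $n$ the number of genes, and $V(j)$ the position of the $j$-th signature gene in drug x's ordered gene list:

$$a = \max_{j=1}^{t} \left[ \frac{j}{t} - \frac{V(j)}{n} \right], \qquad b = \max_{j=1}^{t} \left[ \frac{V(j)}{n} - \frac{j-1}{t} \right]$$

$$K_{up} = \begin{cases} a & \text{if } a > b \\ -b & \text{if } b > a \end{cases} \quad (t = t_{up}), \qquad K_{down} = \begin{cases} a & \text{if } a > b \\ -b & \text{if } b > a \end{cases} \quad (t = t_{down})$$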
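The definitions of a, b, K and the connectivity score can be tried out with a small Python sketch. It follows Lamb et al. (2006) in setting the score to zero when K_up and K_down have the same sign and to K_up − K_down otherwise. The function names `ks_score` and `connectivity_score`, and the ten-gene ordering, are hypothetical and chosen only for illustration.

```python
def ks_score(tag_genes, ranked_genes):
    """Signed Kolmogorov-Smirnov-style statistic K for one signature.

    ranked_genes: a drug instance's gene list, ordered from up to down.
    tag_genes: the genes of the up (or down) disease signature.
    """
    n = len(ranked_genes)
    position = {g: i + 1 for i, g in enumerate(ranked_genes)}  # 1-based
    # V(j): positions of the signature genes, in ascending order.
    v = sorted(position[g] for g in tag_genes if g in position)
    t = len(v)
    if t == 0:
        raise ValueError("no signature gene found in the ranked list")
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def connectivity_score(up_genes, down_genes, ranked_genes):
    """S = 0 if K_up and K_down share a sign, else K_up - K_down."""
    k_up = ks_score(up_genes, ranked_genes)
    k_down = ks_score(down_genes, ranked_genes)
    if (k_up > 0) == (k_down > 0):
        return 0.0
    return k_up - k_down


# Hypothetical drug instance: ten genes ordered from up to down.
ranked = ["G%d" % i for i in range(1, 11)]

same_direction = connectivity_score(["G1", "G2"], ["G9", "G10"], ranked)
opposite = connectivity_score(["G9", "G10"], ["G1", "G2"], ranked)
print(round(same_direction, 3), round(opposite, 3))  # 1.7 -1.7
```

In this toy case the disease's up genes sit at the top of the drug's ordering and its down genes at the bottom, giving a strongly positive score; swapping the two signatures gives the mirror-image negative score.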

It can be computed entirely inside the RDBMS:

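-- 'STATISTIC' returns the KS statistic; the default return value is the significance
SELECT stats_ks_test(drug_instance, disease_sig, 'STATISTIC') ks_statistic,
       stats_ks_test(drug_instance, disease_sig) p_value
FROM cmap.drugs c, cmap.sig s
WHERE c.gene_id = s.gene_id;  -- join drug and signature rows on gene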

Finally the top scoring drugs are selected

The 564 drug instances yield the connectivity scores S1, S2, S3, …, S564.

For example:

Drugs:     Sx      Sy      Sz
Result:    hit +   miss    hit –
p‐values:  0.01    0.3     0.02

Drugs are sorted by their connectivity scores, and hits are found by the pattern of dose/time instances of the same drug.

A second test is used to assess the statistical significance of each hit.
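The sorting and per-drug grouping step can be sketched in Python as below. This is an illustration only: the record layout, the function names `rank_instances` and `summarize_by_drug`, and the scores are hypothetical, and the second significance test mentioned above is not implemented here.

```python
from collections import defaultdict


def rank_instances(scores):
    """Sort (drug, instance_id, score) records, highest score first."""
    return sorted(scores, key=lambda rec: rec[2], reverse=True)


def summarize_by_drug(scores):
    """Group the dose/time instances of each drug and summarize them."""
    per_drug = defaultdict(list)
    for drug, _instance_id, score in scores:
        per_drug[drug].append(score)
    return {
        drug: {
            "n": len(vals),
            "mean": sum(vals) / len(vals),
            "positive": sum(1 for s in vals if s > 0),
            "negative": sum(1 for s in vals if s < 0),
        }
        for drug, vals in per_drug.items()
    }


# Hypothetical connectivity scores for five drug instances.
scores = [
    ("drugA", 1, 0.9),
    ("drugB", 2, -0.7),
    ("drugA", 3, 0.6),
    ("drugC", 4, 0.0),
    ("drugB", 5, -0.4),
]

ranked_instances = rank_instances(scores)
summary = summarize_by_drug(scores)
print(ranked_instances[0][0], ranked_instances[-1][0])  # drugA drugB
print(summary["drugA"]["positive"], summary["drugB"]["negative"])  # 2 2
```

A drug whose instances all score in the same direction, as drugA and drugB do here, shows the kind of consistent pattern across doses and times that the slide describes as a hit.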

Part III: The CMAP in action

Finding a way around glucocorticoid resistance in leukemia

Cancer is the most common cause of death from disease in children in developed countries, and the most frequent childhood malignancy is acute lymphoblastic leukemia (ALL).

dexamethasone

Glucocorticoids have been an important component of the treatment of acute lymphoblastic leukemia (ALL) for more than 50 years. However, it is still unknown what specific factors affect sensitivity and resistance to these drugs.