
Cell Line Models for Genome Wide Association Mapping in Cancer Drug Response
Alison Motsinger-Reif, PhD
Bioinformatics Research Center
Department of Statistics

Introduction

• Understanding variability in individual response to drug/chemical exposure is a key goal of pharmacogenomics and toxicogenomics
• Goals of gene mapping:
– Find efficient predictors of response (efficacy, toxicity, potency, etc.)
– Dissect the underlying mechanisms of differential response

Challenges in Dose Response Genetics

• Study design limitations
– Clinical trials
– Rarely have family data
– Limited sample size
– Limited replication opportunities…
• Limits ability to test basic genetic assumptions
– Are these traits heritable?
– Is this actually a genetics problem?


High-throughput in vitro assays of dose response can help assess the heritability of dose response and perform well-powered linkage and association analysis.

Current Uses of the Model

• Cytotoxicity mapping for chemotherapy
– Cytotoxics
– Monoclonal antibodies

• Evaluation of methods for capturing dose response associations

• Use of high throughput methodology for chemical exposure

Assay Methodology

• Alamar blue viability assays
• 6-point dose response curves
• Immortalized lymphoblastoid cell lines (LCLs)

Use of the Model

• We are using this model to interrogate genetic predictors of drug response for 45 chemotherapy drugs

• Heritability assessed with family-based samples
– CEPH cell lines

• Mapping in unrelated cohorts
– CHORI cohort
– 1000 Genomes


Lots of methods challenges in here along the way…

Variation in Cellular Sensitivity

• Typical dose response curves

Heritability Calculations

• Variance components analysis as implemented in MERLIN 1.1.2
– http://www.sph.umich.edu/csg/abecasis/Merlin/index.html

• Heritability (h²) of the growth rate for each vehicle was calculated

• h² was adjusted for the growth rate in the appropriate vehicle by using growth rate as a covariate

GWAS Study

• Genome-wide association studies (GWAS) for drugs with highly heritable response
• Children’s Hospital of Oakland Research Institute (CHORI) population-based cohorts used to generate cell lines
– 520 samples
• 650K SNP-chip data available for mapping
– Simulation experiments to prepare for association mapping
– Imputed to ~2 million variants

Association Mapping

• Previous studies have looked at fitting curves and then doing simple association tests on genotypes versus these values:
– EC50/IC50
– Hill slope
• These choices make LOTS of assumptions
– Assumptions about how associations may be happening
– Need methods that don’t make these assumptions
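As an illustration of the curve-summary approach described on this slide, the sketch below fits a four-parameter Hill curve to one six-point dose response curve and extracts an IC50 and Hill slope. It is a minimal sketch, not the pipeline used in the studies cited here: the parameterization, the simulated viabilities, the starting values, and the function name `hill` are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def hill(log_conc, top, bottom, log_ic50, slope):
    """Four-parameter Hill curve on a log10 concentration axis (assumed form)."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** (slope * (log_conc - log_ic50)))


# Hypothetical six-point curve: log10 concentrations and simulated viabilities.
log_conc = np.linspace(-2.0, 3.0, 6)
rng = np.random.default_rng(0)
viability = hill(log_conc, 1.0, 0.1, 0.5, 1.2) + rng.normal(0.0, 0.01, size=6)

# Least-squares fit; p0 is a rough starting guess, not a tuned value.
params, _ = curve_fit(hill, log_conc, viability, p0=[1.0, 0.0, 0.0, 1.0], maxfev=10000)
top, bottom, log_ic50, slope = params

print(f"log10(IC50) = {log_ic50:.2f}, Hill slope = {slope:.2f}")
```

Each fitted summary (for example `log_ic50`) would then be tested against genotype one value at a time, which is where the assumptions listed above enter: a genotype effect that changes the curve shape without moving the IC50 would be missed.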

Complicated Response Curves

Differences between phenotypes could be manifested in many ways.

Modeling Robustly

• The vector of responses across concentrations was modeled jointly using multivariate analysis of covariance (MANCOVA):

$$y_{ij} = \alpha + \mu_i + X_{ij}\beta + E_{ij}, \qquad E_{ij} \overset{iid}{\sim} N(0, \Sigma)$$

– $y_{ij}$ is the vector of responses for the $j$th LCL with genotype $i$
– $X_{ij}$ contains confounding covariates
– $\mu_i$ is the vector of effects due to genotype $i$

• Minimal modeling assumptions
– No assumptions made about the form of dose-response curves or how these curves vary between genotypes
– The assumption of multivariate normality seems reasonable in real data [4]
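The MANCOVA test for a single SNP can be sketched as follows. This is a minimal illustration of a Pillai's trace test with the standard F approximation, not the MAGWAS implementation: the function name `pillai_test`, the dummy coding of genotype, and the simulated data are assumptions, and the design matrix is assumed to have full column rank.

```python
import numpy as np
from scipy import stats


def pillai_test(Y, genotype, covariates):
    """Pillai's trace test of a genotype effect on a multivariate response.

    Y: (n, p) responses across concentrations; genotype: (n,) labels;
    covariates: (n, k) confounders. Returns (Pillai's trace, approximate p-value).
    """
    n_obs, p = Y.shape
    levels = np.unique(genotype)
    # Dummy-code genotype, using the first level as the reference group.
    G = np.column_stack([(genotype == g).astype(float) for g in levels[1:]])
    X_reduced = np.hstack([np.ones((n_obs, 1)), covariates])
    X_full = np.hstack([X_reduced, G])

    def resid_sscp(X):
        # Residual sums of squares and cross-products after regressing Y on X.
        B, *_ = np.linalg.lstsq(X, Y, rcond=None)
        R = Y - X @ B
        return R.T @ R

    E = resid_sscp(X_full)             # error SSCP
    H = resid_sscp(X_reduced) - E      # hypothesis SSCP for genotype
    V = float(np.trace(H @ np.linalg.inv(H + E)))

    q = G.shape[1]                     # hypothesis degrees of freedom
    v = n_obs - X_full.shape[1]        # error degrees of freedom
    s = min(p, q)
    m_par = (abs(p - q) - 1) / 2
    n_par = (v - p - 1) / 2
    F = ((2 * n_par + s + 1) / (2 * m_par + s + 1)) * V / (s - V)
    df1, df2 = s * (2 * m_par + s + 1), s * (2 * n_par + s + 1)
    return V, float(stats.f.sf(F, df1, df2))


# Simulated example: 300 LCLs, 6 concentrations, an additive shift per allele.
rng = np.random.default_rng(1)
n_lcl, n_conc = 300, 6
genotype = rng.integers(0, 3, size=n_lcl)
covariates = rng.normal(size=(n_lcl, 2))
Y = rng.normal(size=(n_lcl, n_conc)) + 0.5 * genotype[:, None]

V, p_value = pillai_test(Y, genotype, covariates)
print(f"Pillai's trace = {V:.3f}, p = {p_value:.2e}")
```

Because the test compares whole response vectors between genotype groups, it does not need a fitted curve and can pick up differences in any part of the dose range.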

Simulation Study Results

• MANCOVA has the most power to detect real signals (top) and is most robust for Hill slope alternatives (bottom)

MAGWAS

• Multivariate Analysis of covariance Genome-Wide Association Software
• Designed for GWAS having multivariate responses
• Allows for incorporation of covariates
• Command line based, platform independent
• Accepts data in PLINK format
• Computationally efficient
– “typical” GWAS in 2-20 minutes

Association Results

Drug Families

Association by Drug Family

• Each dose response curve was summarized by the mean viability across drug concentrations

• MANCOVA was used to jointly model the mean viabilities across drug families

• Information is combined across drug family

• Small differences can become detectable, even if not present for each drug individually
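The summarization step above can be sketched in a few lines. The data layout (a dictionary of per-drug arrays with one row per LCL and one column per concentration), the simulated values, and the variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
n_lcl, n_conc = 5, 6

# Hypothetical normalized viabilities: one (n_lcl, n_conc) array per drug.
platinum_family = ["carboplatin", "oxaliplatin"]
viability = {
    drug: rng.uniform(0.0, 1.0, size=(n_lcl, n_conc)) for drug in platinum_family
}

# One mean viability per LCL per drug; columns are the drugs in the family.
Y_family = np.column_stack(
    [viability[drug].mean(axis=1) for drug in platinum_family]
)
print(Y_family.shape)
```

The resulting matrix has one row per LCL and one column per drug, so it can serve as the multivariate response in the same MANCOVA model used for single drugs.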

Association by Drug Family

• Locus rs11639947 is associated (p < 10^-6) with response to the alkylating class (temozolomide and mitomycin), and is located upstream of NIN1/RPN12 binding protein 1 homolog (NOB1)

• Polymorphisms on NOB1 have been found to be associated with myelotoxicity in malignant glioma patients treated with temozolomide

 #   Drug          Chrom.  rsID        -log10(p)  Gene(s) nearby
 1   Carboplatin   5       rs1982901   6.23       None
 2   Cytarabine    3       rs12637988  6.21       MED12L / P2RY12
 3   Daunorubicin  9       rs7867736   6.34       None
 4   Etoposide     22      rs2076112   6.47       PLA2G6
 5   Fluorouracil  7       rs2270311   6.23       CHN2
 6   Fluorouracil  15      rs10152957  6.05       MEGF11
 7   Gemcitabine   2       rs4851774   6.26       FHL2
 8   Gemcitabine   3       rs513659    6.31       None
 9   Idarubicin    2       rs7582313   6.87       None
10   Mitoxantrone  10      rs7068798   6.05       C10orf67 (d)
11   Oxaliplatin   8       rs2897377   6.07       CSMD1
12   Oxaliplatin   17      rs1808918   6.01       GNA13 (d)
13   Paclitaxel    1       rs1338990   6.33       None
14   Paclitaxel    4       rs306005    6.47       SPATA5
15   Paclitaxel    5       rs31878     6.86       None
16   Temozolomide  10      rs531572    15.48      MGMT
17   Teniposide    22      rs8138023   6.30       NUP50 (u)
18   Topotecan     6       rs11966294  6.29       DDO

Table 4: Single nucleotide polymorphisms (SNPs) most associated with drug response, for each drug separately. The markers (u) and (d) indicate that genes are located within 100 kbp upstream or downstream of the SNP, respectively.

 #   Drug Class             Chrom.  rsID        -log10(p)  Gene(s) nearby
 1   DNA Alkylating Agents  2       rs7581424   6.13       HDAC4 (u)
 2   DNA Alkylating Agents  16      rs11639947  6.55       NOB1 (d) / WWP2 (d)
 3                                                         NQO1 (u) / NFAT5 (u)
 4   Platinum Agents        10      rs10821910  7.84       C10orf107
 5   TK Inhibitors          10      rs10762827  7.95       None

Table 5: Single nucleotide polymorphisms (SNPs) most associated with drug response, by drug family. The markers (u) and (d) indicate that genes are located within 100 kbp upstream or downstream of the SNP, respectively. TK stands for tyrosine kinase.

 #   Drug Class              Drugs
 1   Nucleosides             gemcitabine, cytarabine, cladribine, fludarabine, azacitidine
 2   TK inhibitors           dasatinib, sunitinib
 3   Tubulin binding agents  docetaxel, paclitaxel, vinblastine, vincristine, vinorelbine
 4   DNA alkylating agents   mitomycin, temozolomide
 5   Platinum agents         carboplatin, oxaliplatin
 6   Anthracyclines          daunorubicin, doxorubicin, epirubicin, idarubicin, mitoxantrone
 7   Fluoropyrimidines       floxuridine, 5-fluorouracil
 8   Epipodophyllotoxins     etoposide, teniposide

Table 6: Drug family membership for 25 anticancer agents. TK stands for tyrosine kinase.

Drug Clustering

• Can LCLs be used to predict drug families?
• Distance metrics between each pair of drugs were calculated from their vectors of viabilities
– Yai and Ybi are viabilities for the ith LCL for drugs a and b
– Xi is the matrix of covariates

• Distance between drugs a and b was estimated as one minus the average partial r-squared for a regressed on b and b regressed on a.

… the vector of normalized responses jointly provides more information than a single summary measure, such as half-maximal inhibitory concentrations (IC50). Simulation studies have shown this method to be both robust to differences in dose-response profiles between genotypes and powerful in the detection of real biological signals [6, 5]. The model used in association for any drug d at an SNP s is:

$$Y_{ij} = X_{ij}\beta + \mu_i + e_{ij}, \qquad e_{ij} \sim N_p(0, \Sigma), \tag{2}$$

where $Y_{ij}$ is the vector of normalized responses (across the six concentrations for d) for the jth individual having genotype i on s, $X_{ij}$ is the matrix of covariates for the first two PCs, temperature, growth rate and experimental batch, and $\mu_i$ is the vector of parameters modeling the effects of genotype i of s on d. Also, $N_p(0, \Sigma)$ is the multivariate normal distribution for vectors of length p = 6, with mean 0 and covariance $\Sigma$. The significance of the estimates for $\mu_i$ was assessed using Pillai’s trace [15]. All computations were performed using the software program MAGWAS [8]. Because association tests rely on large-sample asymptotic theory, only those loci which had at least 20 individuals in each genotype group were kept for association mapping. This left 1,278,133 SNPs for all drugs except 5-fluorouracil (971,593) and nilotinib (783,013).
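The genotype-count filter described above could look like the following sketch. The function name `keep_snp`, the 0/1/2 genotype coding with NaN for missing calls, and the decision to require at least two observed genotype groups are assumptions; the text does not say how loci with only two observed groups were handled.

```python
import numpy as np


def keep_snp(genotypes, min_per_group=20):
    """Return True if every observed genotype group has enough individuals.

    genotypes: 1-D float array coded 0/1/2, with NaN for missing calls
    (an assumed coding). A SNP with a single observed group is dropped
    because there is nothing to compare.
    """
    observed = genotypes[~np.isnan(genotypes)]
    _, counts = np.unique(observed, return_counts=True)
    return bool(counts.size > 1 and counts.min() >= min_per_group)


# Hypothetical SNPs: one with balanced groups, one with a rare homozygote.
common_snp = np.array([0.0] * 30 + [1.0] * 25 + [2.0] * 20)
rare_snp = np.array([0.0] * 60 + [1.0] * 30 + [2.0] * 5 + [np.nan] * 3)

print(keep_snp(common_snp), keep_snp(rare_snp))
```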

Association tests were also performed for each drug family, as described in Table 6. For this, the mean normalized viability across each dose-response curve was calculated for every LCL and every drug. The model used in association for any drug family d at an SNP s is again Equation 2, where Yij now represents the vector of mean normalized viabilities (across all drugs in family d) for the jth individual having genotype i on s. All other variables from Equation 2 remain the same, and p now equals the number of drugs in family d.

Drug clustering

Distance metrics between each pair of drugs were calculated from their vectors of normalized viabilities, as in the association study. Specifically, the distance between drugs a and b was calculated by first fitting:

$$Y_{ai} = Y_{bi}\beta + X_i\gamma,$$

where $Y_{ai}$ and $Y_{bi}$ are the vectors of normalized viabilities for the ith LCL for drugs a and b, respectively, and $X_i$ is the same corresponding matrix of covariates used in association mapping (the first two PCs, experimental batch, temperature and growth rate). The coefficient of partial determination (partial r-squared) of $Y_{bi}$ in predicting $Y_{ai}$ after controlling for $X_i$ was calculated for all possible pairs (a, b). Distance between drugs a and b was estimated as one minus the average partial r-squared for a regressed on b and b regressed on a. With this metric, it was not possible to include both 5-fluorouracil and nilotinib, since each cell line was exposed to exactly one of these agents. For this reason, and because nilotinib had lower laboratory replicability (see Table 2), nilotinib was removed from clustering. All other (28) drugs were clustered using the distance metric described above, with no a priori knowledge of drug family. Clustering was performed on the matrix of pairwise distances using the “hclust” function, with the argument “method=ward”, from the R statistical package [14].
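The distance metric and clustering step can be sketched as below. This is a simplified illustration under stated assumptions: each drug is reduced to one summary viability per LCL (the text uses vectors across concentrations), the data are simulated, the helper names are hypothetical, and SciPy's `linkage(..., method="ward")` stands in for R's `hclust`, so the two may not give identical trees.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def sse(y, X):
    """Residual sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)


def partial_r2(y, x, covars):
    """Partial r-squared of x in predicting y after controlling for covars."""
    reduced = np.hstack([np.ones((len(y), 1)), covars])
    full = np.hstack([reduced, x[:, None]])
    sse_reduced = sse(y, reduced)
    return (sse_reduced - sse(y, full)) / sse_reduced


def drug_distance_matrix(V, covars):
    """One minus the average partial r-squared over both regression directions."""
    k = V.shape[1]
    D = np.zeros((k, k))
    for a in range(k):
        for b in range(a + 1, k):
            r2 = 0.5 * (
                partial_r2(V[:, a], V[:, b], covars)
                + partial_r2(V[:, b], V[:, a], covars)
            )
            D[a, b] = D[b, a] = 1.0 - r2
    return D


# Simulated example: four drugs in two hypothetical families (0,1) and (2,3).
rng = np.random.default_rng(2)
n_lcl = 200
covars = rng.normal(size=(n_lcl, 2))
f1, f2 = rng.normal(size=n_lcl), rng.normal(size=n_lcl)
V = np.column_stack([
    f1 + 0.3 * rng.normal(size=n_lcl),
    f1 + 0.3 * rng.normal(size=n_lcl),
    f2 + 0.3 * rng.normal(size=n_lcl),
    f2 + 0.3 * rng.normal(size=n_lcl),
])

D = drug_distance_matrix(V, covars)
Z = linkage(squareform(D), method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

In this scalar simplification the two partial r-squared values for a pair are equal (both are the squared partial correlation), so the averaging only matters when each drug is represented by a vector of responses.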

Empirical support for Drug Clustering

MGMT and Temozolomide

• Proof of concept that LCLs can identify clinically significant genes in cancer drug efficacy.
• Manhattan plot for temozolomide
– The large red peak is for locus rs477693, located in the gene coding for MGMT (O6-methylguanine–DNA methyltransferase), a protein known to be associated with temozolomide efficacy [Hegi et al., 2005].

Gene Expression and MGMT

• MGMT repairs DNA that has been damaged (methylated), helping prevent cell death

• MGMT expression is also associated with rs531572

Clinical Validation

• Moffitt Cancer Center clinical trial
– 437 patients with high grade glioma
– 318 on standard of care (SOC)
• SOC
– Resection plus radiation plus temozolomide

                           Additive                    Genotypic
Group         N    deaths  HR (95% CI)*       p-value  HR (95% CI)*       p-value
all patients  437  375     0.93 (0.80, 1.09)  0.369    0.88 (0.70, 1.11)  0.293
SOC, male     200  171     0.79 (0.63, 0.99)  0.040    0.73 (0.52, 1.01)  0.057
SOC, female   118  94      1.14 (0.85, 1.53)  0.395    1.20 (0.75, 1.95)  0.448

• Evaluated 7 SNPs in linkage disequilibrium with this hit
• Looked at overall survival
• rs477692 – top hit

Monoclonal Antibodies

• Used the LCL model for testing a new class of drugs (anti-CD20):
– Rituximab
– Ofatumumab
• Used the CEPH pedigrees for linkage analysis
– Found 2 large peaks for follow-up
– Chr 3, Chr 12
– Overlapped for both drugs

Monoclonal Antibodies

• To narrow down genes
– Gene expression data
– 57 CEPH cell lines with available expression
– For genes in the region,
• One gene: CBLB
– CBLB encodes an E3 ubiquitin ligase
– Involved in T-cell and B-cell receptor downregulation
– CBLB loss provokes autoimmunity via loss of autoregulatory mechanisms

Functional Validation

• Rituximab’s target is CD20
• Tested whether knocking down CBLB changes CD20 expression

Immunofluorescence assay showing CD20 localization
CD20 gene expression is not altered by CBLB knockdown

Lessons from the LCL models

• LCLs are a promising approach for dose response mapping:
– Allow for research that is not possible with human subjects
– High throughput means that QC, both genotypic and phenotypic, is important
– Typical association methods may not capture the full array of potential differential response
– Support for known drug/chemical classes
– Dose response models seem as “complex” as complex trait mapping always is…

Other Methods Development Challenges Along the Way

• Dose response modeling
– EADRM
– Beam A, Motsinger-Reif A. Beyond IC50s: Towards Robust Statistical Methods for in vitro Association Studies. J Pharmacogenomics Pharmacoproteomics. 2014 Mar 1;5(1):1000121.
– Beam AL, Motsinger-Reif AA. Optimization of nonlinear dose- and concentration-response models utilizing evolutionary computation. Dose Response. 2011;9(3):387-409.

• Extending approaches for accurate permutation testing
– Che R, Jack JR, Motsinger-Reif AA, Brown CC. An adaptive permutation approach for genome-wide association study: evaluation and recommendations for use. BioData Min. 2014 Jun 14;7:9. doi: 10.1186/1756-0381-7-9. eCollection 2014.

Current Work

• Tyrosine Kinase Inhibitors
• Continued follow up of top hits
• Evaluating the model for PD1K inhibitors
• Exploring analysis methods to combine results across drugs
– Pathway analysis
– Cross-heritability
• Additional methods to build more complex models
– Gene-gene, gene-drug interactions
– Genomic prediction approaches and Bayesian variable selection
– Advances in permutation testing implementations

Current Work

• Drug combinations
– Chemotherapies are rarely given alone
– Modeling mixtures is a real challenge
– Evaluating methods for quantifying synergy
• Add inference to Chou-Talalay method
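For readers unfamiliar with the Chou-Talalay method mentioned above, the sketch below computes its combination index (CI) from the median-effect equation, fa / (1 - fa) = (D / Dm)^m, where CI < 1 is read as synergy, CI = 1 as additivity, and CI > 1 as antagonism. It is a minimal point-estimate sketch with no inference attached; the function names, doses, and simulated single-agent curves are assumptions for illustration.

```python
import numpy as np


def median_effect_fit(dose, fa):
    """Fit log10(fa / (1 - fa)) = m * log10(D) - m * log10(Dm); return (Dm, m)."""
    x = np.log10(dose)
    y = np.log10(fa / (1.0 - fa))
    m, intercept = np.polyfit(x, y, 1)
    return 10.0 ** (-intercept / m), m


def dose_for_effect(fa, Dm, m):
    """Single-agent dose needed to reach fraction affected fa."""
    return Dm * (fa / (1.0 - fa)) ** (1.0 / m)


def combination_index(d1, d2, fa, params1, params2):
    """CI for a two-drug mixture (doses d1, d2) that produced effect fa."""
    return d1 / dose_for_effect(fa, *params1) + d2 / dose_for_effect(fa, *params2)


def simulate_fa(dose, Dm, m):
    """Hypothetical single-agent curve that follows the median-effect equation."""
    return 1.0 / (1.0 + (Dm / dose) ** m)


doses = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
params1 = median_effect_fit(doses, simulate_fa(doses, 2.0, 1.0))
params2 = median_effect_fit(doses, simulate_fa(doses, 8.0, 1.0))

# The same mixture (1.0 of drug 1 plus 4.0 of drug 2) under two observed effects.
ci_additive = combination_index(1.0, 4.0, 0.5, params1, params2)
ci_synergy = combination_index(1.0, 4.0, 0.8, params1, params2)
print(ci_additive, ci_synergy)
```

The CI here is a single number per mixture and effect level, with no standard error or test attached, which is the gap the "add inference" item refers to.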

Synergy in the LCL Model

• Pilot Study
• 8 different drugs / 6 concentrations
• 7 combinations of mixtures tested
• 123 LCLs
• Contains 45 trios (a set of parents and single child)
• Hypothesis: synergy/antagonism is quantifiable in vitro and heritable

Synergy in the LCL Model

Genetic Etiology of Synergy

Summary

• In vitro assays can be used to assess the genetic component of dose response traits and to perform well-powered GWAS.
• Such new models require careful consideration and experimentation with new statistical approaches to answer biological questions.

Biology

Methods Development

Acknowledgments

NCSU: Daniel Rotroff, Kyle Roell, John Jack, Chad Brown, Fred Wright, Paul Gallins, Yihiu Zhou, David Reif

UNC Chapel Hill: Tammy Havener, Tim Wiltshire, Eric Peters, Michael Wagner, Kristy Richards, Paul Gallins, Nour Abdo

Moffitt: Howard McLeod, Kathleen Egan

CHORI: Ron Krauss, Marisa Wong-Medina

Funding: National Cancer Institute: R01 CA161608

Questions?
motsinger@stat.ncsu.edu
