
Upload: stephanie-hall

Post on 02-Jan-2016


Epidemiology 719

Quantitative methods in genetic epidemiology
Bhramar Mukherjee and Sebastian Zoellner
bhramar@umich.edu szoellne@umich.edu


Acknowledgements

• Peter Kraft (HSPH)

• Ken Rice (UW)

• Nilanjan Chatterjee (NCI)

• Stephen Chanock (NCI)

• Lu Wang (UM)

• Nan Laird (HSPH)

• Goncalo Abecasis (UM)


Course Overview: A brave new world


Reverse Effects


Central Course Theme

Genetic Association and Gene-Environment Interaction


Course Advice for You:

Assigned Paper 1

• GWAS of Age-related macular degeneration

• Initial GWAS identified four loci explaining one-half of the heritability. Appreciable predictive power.

• Additional GWAS to explain remaining heritability. Combined scan vs replication. Meta-Analysis.

Assigned Paper 2

• Collaborative Association Study of Psoriasis

• Examined ~1,500 cases / ~1,500 controls at ~500,000 SNPs

• Examined 20 promising SNPs in extra ~5,000 cases / ~5,000 controls

• Outcome: 7 regions of confirmed association with psoriasis

Assigned Paper 3

• Meta-analysis of colorectal cancer (COGENT study).

• A thorough evaluation of ten confirmed loci for colorectal cancer. Very detailed. Supplementary material also available online.

• Interesting combination of various study designs.


Tests for Association


Basic principle of GWAS


Depends on study design

• Case-control study

• Family-based study: case-parent triad, case-sib pairs being popular choices

• Longitudinal Cohort Study

• Looking at a secondary outcome under case-control sampling


The GWAS Mantra!


Primary Analysis

• Single marker association tests

• Genetic susceptibility model

- Dominant, recessive, co-dominant

• Which test to use

• Multiple testing correction


Case-Control Study: Standard Analysis


Pros and Cons

• Simple, Complete.

• Robust to misspecification of the true dominance pattern

• Less powerful.

• Unreliable for sparse tables
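The pros and cons above appear to describe the 2 d.f. genotype test on the 2 x 3 table of case/control status by genotype. A minimal sketch of that Pearson chi-square follows; the function name and the counts in the usage line are hypothetical, not taken from the slides.

```python
import math

def genotype_chi2(cases, controls):
    """Pearson chi-square (2 df) on a 2 x 3 table of genotype counts.

    cases, controls: counts ordered as (aa, Aa, AA).
    """
    table = [list(cases), list(controls)]
    row_tot = [sum(row) for row in table]
    col_tot = [cases[j] + controls[j] for j in range(3)]
    n = sum(row_tot)
    chi2 = 0.0
    for i in range(2):
        for j in range(3):
            # Expected count under independence of status and genotype
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 2 df the chi-square survival function is exactly exp(-x/2)
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value

# Hypothetical counts, for illustration only
chi2, p = genotype_chi2(cases=(100, 200, 100), controls=(150, 190, 60))
```

An empty genotype column gives a zero expected count and a division by zero, which is one concrete face of the sparse-table problem noted above.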


Pros and Cons

• Test statistic has single df, so more powerful.

• Simple to report.

• Not robust to misspecification of the true mode of dominance

• Does not present entire information in the data.


Armitage’s trend test

• Test linear trend in log(OR) with # A allele

• Test statistic still has single d.f.

• Simplicity, use information from the 2 x 3 array

• More robust than 2 x 2 tests, but less robust than the 2 d.f. test.
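As a sketch of the trend test, the function below computes the Cochran-Armitage statistic from the 2 x 3 counts, using the standard form in which the score sum among cases is compared with its null expectation. The function name, the default scores (0, 1, 2) and the example counts are assumptions for illustration.

```python
import math

def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test (1 df); counts ordered as (aa, Aa, AA)."""
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]
    # Score sums among cases and over the whole sample
    a = sum(w * r for w, r in zip(scores, cases))
    w_tot = sum(w * nj for w, nj in zip(scores, n))
    w2_tot = sum(w * w * nj for w, nj in zip(scores, n))
    # Observed minus expected score sum in cases, and its null variance
    u = a - R * w_tot / N
    var_u = (R * S / N ** 2) * (w2_tot - w_tot ** 2 / N)
    chi2 = u ** 2 / var_u
    # 1-df chi-square tail probability via the complementary error function
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts, for illustration only
chi2, p = trend_test(cases=(10, 20, 30), controls=(30, 20, 10))
```

Passing `scores=(0, 1, 1)` or `scores=(0, 0, 1)` collapses the table to the dominant or recessive 2 x 2 comparison, which is why the trend test sits between the 2 x 2 tests and the 2 d.f. test in robustness.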


Allelic test

• Previous tests were based on genotype

• Can also treat allele as the unit of observation.

• You have doubled the sample size!!


But…

• Serious impact on Type 1 error under departures from HWE

• Interpretation becomes trickier.
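A minimal sketch of the allelic test: each genotype is split into its two alleles and a 2 x 2 chi-square is computed on allele counts. The function name and the counts are hypothetical.

```python
import math

def allelic_test(cases, controls):
    """2 x 2 chi-square on allele counts; genotype counts ordered (aa, Aa, AA)."""
    def allele_counts(g):
        aa, het, AA = g
        return 2 * aa + het, het + 2 * AA  # (# a alleles, # A alleles)

    a1, b1 = allele_counts(cases)
    a0, b0 = allele_counts(controls)
    n = a1 + b1 + a0 + b0  # twice the number of individuals
    chi2 = n * (a1 * b0 - b1 * a0) ** 2 / (
        (a1 + b1) * (a0 + b0) * (a1 + a0) * (b1 + b0)
    )
    p_value = math.erfc(math.sqrt(chi2 / 2))  # 1-df chi-square tail
    return chi2, p_value

# Hypothetical counts; the pooled genotypes (40, 40, 40) have fewer
# heterozygotes than HWE would predict at allele frequency 0.5
chi2, p = allelic_test(cases=(10, 20, 30), controls=(30, 20, 10))
```

The statistic treats the two alleles of one person as independent observations, which is only justified under HWE. For this hypothetical table, which departs from HWE, it comes out at about 26.7, noticeably larger than a genotype-based analysis of the same individuals would give.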


Example

AIC: Akaike information criterion; the lower the value, the better the model fit


Using logistic regression

• Trick: Just code genotype differently

• Dominant: G=1 if AA or Aa, 0 otherwise

• Recessive: G=1 if AA, 0 otherwise

• Trend: G=# A alleles, thus G=2 if AA, =1 if Aa and 0 if aa

• Two df test: Create two dummy variables:

- G1=1 if Aa and 0 otherwise

- G2=1 if AA and 0 otherwise

- Perform likelihood ratio test of full (G1 and G2) vs reduced model (No G1, G2).

• Adjust for other variables, fit a multivariate model
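The codings above can be sketched as a small helper that turns a genotype string into the covariate(s) for a logistic regression. The function name, the model labels and the string representation of genotypes are assumptions for illustration; the resulting values would be passed to whatever regression routine is in use.

```python
def code_genotype(genotype, model):
    """Code a genotype such as "AA", "Aa" or "aa" for a given genetic model."""
    n_A = genotype.count("A")  # number of copies of the A allele
    if model == "dominant":
        return int(n_A >= 1)
    if model == "recessive":
        return int(n_A == 2)
    if model == "trend":
        return n_A
    if model == "codominant":
        # Two dummy variables (G1, G2) for the two df test
        return (int(n_A == 1), int(n_A == 2))
    raise ValueError(f"unknown model: {model}")

codes = [code_genotype(g, "trend") for g in ("aa", "Aa", "AA")]
```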


Example


Flip Flops

• Under a co-dominant model you see a non-monotone trend, i.e.,

• OR(Aa)<1 and OR(AA)>1

• You will likely miss these under the trend test.


Alternative tests

• Use alternative maximal test statistic

• Calculate dominant, recessive, trend, co-dominant: take maximum test statistic

• Use permutation to get right P-value

Caveats

• Resist temptations of going on a fishing expedition. “MOST SIGNIFICANT CODING”

• Mode of inheritance models were developed for simple mendelian disease with near-complete penetrance, much more difficult to believe for complex diseases.
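A rough sketch of the maximal-statistic idea with a permutation P-value follows. As a simplifying assumption it takes the maximum over the three 1 d.f. codings only (dominant, recessive, trend) and leaves out the 2 d.f. co-dominant statistic, which is on a different scale. All names, the number of permutations and the seed are hypothetical choices.

```python
import random

SCORES = {"dominant": (0, 1, 1), "recessive": (0, 0, 1), "trend": (0, 1, 2)}

def score_chi2(genotypes, status, scores):
    """1-df statistic N * r^2 between case status and the coded genotype."""
    x = [scores[g] for g in genotypes]  # genotypes are A-allele counts 0/1/2
    n = len(x)
    mx, my = sum(x) / n, sum(status) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, status))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in status)
    if sxx == 0 or syy == 0:
        return 0.0
    return n * sxy ** 2 / (sxx * syy)

def max_stat(genotypes, status):
    return max(score_chi2(genotypes, status, s) for s in SCORES.values())

def max_test(genotypes, status, n_perm=1000, seed=0):
    """Permutation P-value for the maximum over the three codings."""
    rng = random.Random(seed)
    observed = max_stat(genotypes, status)
    shuffled = list(status)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-status link
        if max_stat(genotypes, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated P-value strictly positive
    return observed, (exceed + 1) / (n_perm + 1)
```

Because the maximum is recomputed on every permuted data set, the resulting P-value already accounts for having tried several codings, which is what guards against reporting only the most significant one.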


Reliable Test

• Co-dominant is model free

• Not much loss of power unless AA (homozygous carriers) are very rare.

• Log additive is what is reported most of the time, risking some false positives but enhancing power.


QQ Plots

• Nice visual tools for checking association and systematic biases

• Plot observed -log10(P-values) versus expected under the global null (i.e. quantiles drawn from U[0,1])

• Since vast majority (>99%) of tested markers are not associated with the trait, plot should fall along y=x line (if we are lucky we will see a few departures in the tail).

• Departures could be due to stratification, cryptic relatedness, differential genotyping error, incorrect test.
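The coordinates of such a plot can be sketched as below. The plotting position (i - 0.5)/n for the expected uniform quantiles is one common choice and is an assumption here, as is the function name; P-values must be strictly positive for the logarithm to be defined.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs in increasing order."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Expected quantiles of n draws from U[0,1], on the -log10 scale
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))

# Hypothetical P-values, for illustration only
points = qq_points([0.8, 0.03, 0.4, 0.0002, 0.6])
```

Plotting the observed values against the expected ones and overlaying the line y=x gives the display described above; points peeling away from the line early, rather than only in the extreme tail, suggest a systematic bias.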


Clear population stratification bias


Family Based Studies

• You heard about population stratification.

• Solutions:

- Match on self-reported ethnicity

- Adjust for principal components extracted from markers.

- Use family based controls

• Case-sib (conditional logistic regression)

• Case-parent (TDT, FBAT)

• Nuclear Families

• Extended Pedigrees


Why use families

• Robust tests.

• Detect genotyping error through inheritance impossibilities (e.g., mother and father AA, offspring Aa).

• Do not have to think about selection of “good” controls.


Why not use families

• Case-control typically more powerful unless the disease is very rare.

• Analysis tools are not just logistic regression or the trend chi-squared test.

• Much harder to recruit (depending on the disease : late onset or childhood disorder).


Hypotheses

• Case-control design

• Family-based designs have no power to detect association unless linkage is present.

• When testing for association in family-based design, HA is always: both linkage and association are present between the marker and disease susceptibility locus (DSL) underlying the trait.


The null hypotheses


Mendelian Transmission


Transmission Disequilibrium Test

• Spielman et al, AJHG, 1993

• H0: Association but no linkage


A simple statistical test

• No variation in diagonal elements (homozygote parents have no uncertainty in determining the conditional genotype distribution of the offspring).

• Under the null (i.e., Mendelian transmission) x| x+y~Binomial (n=x+y, p=1/2)

• Similarly, like McNemar’s test:
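In its standard form the McNemar-type TDT statistic is (x - y)^2 / (x + y), referred to a chi-square distribution with 1 d.f. The sketch below computes it together with the exact binomial P-value implied by the null above. It assumes that x and y count heterozygous parents who transmit A and a respectively; the function name and example counts are hypothetical.

```python
import math

def tdt(x, y):
    """TDT from transmission counts of heterozygous parents.

    x: transmissions of allele A, y: transmissions of allele a (assumed labels).
    """
    n = x + y
    chi2 = (x - y) ** 2 / n
    p_asymptotic = math.erfc(math.sqrt(chi2 / 2))  # 1-df chi-square tail
    # Exact two-sided test: x | x + y ~ Binomial(n, 1/2) is symmetric,
    # so double the smaller tail and cap at 1
    k = min(x, y)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)
    return chi2, p_asymptotic, p_exact

# Hypothetical counts, for illustration only
chi2, p_asym, p_exact = tdt(60, 40)
```

Homozygous parents never enter the counts, matching the remark above that the diagonal cells carry no information.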


TDT Example

Test statistic:


FBAT : Family based association tests

• Extending TDT beyond case-parent trios

• Test statistic for FBAT mimics a natural covariance function between trait and genotype.

• i: family j: individual Sum over all i and j

• T: Trait (centered) X: Coded Genotype

• S: parental genotype or a sufficient statistic for parental genotype


Details

• The E(X|S) is calculated under Mendelian transmission.

• X-E(X|S): residual of the transmission of parental genotype to offspring.

• You basically assess whether there is any association between the trait and this genotype residual.


Test statistic

• Under all three nulls, E(U)=0.

• Note that all expectations and variances are taken over X, conditional on parental genotype and trait T.

Test Statistic:
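In the usual FBAT formulation, U is the sum over families i and individuals j of T_ij (X_ij - E(X_ij | S_i)), and the test statistic is Z = U / sqrt(Var(U)). The sketch below works this out for the simplest case only, under stated assumptions: one offspring per family, both parents genotyped, and X coded as the number of A alleles. Function names and the example trios are hypothetical.

```python
import math

def mendelian_moments(father, mother):
    """E(X|S) and Var(X|S) for the offspring A-allele count.

    father, mother: parental A-allele counts (0, 1 or 2).
    """
    mean = (father + mother) / 2
    # Only heterozygous parents add uncertainty: variance 1/4 each
    var = ((father == 1) + (mother == 1)) / 4
    return mean, var

def fbat_trios(trios):
    """trios: iterable of (T, X, father, mother) with T the centered trait."""
    u = var_u = 0.0
    for t, x, father, mother in trios:
        mean, var = mendelian_moments(father, mother)
        u += t * (x - mean)       # trait times transmission residual
        var_u += t * t * var      # variance over X, conditional on S and T
    z = u / math.sqrt(var_u)
    return u, var_u, z

# Three hypothetical affected offspring with T = 1
u, var_u, z = fbat_trios([(1, 2, 1, 1), (1, 1, 1, 0), (1, 1, 1, 2)])
```

With T = 1 for every offspring this reduces to the TDT: the three trios above contain three transmissions of A and one of a from heterozygous parents, and Z^2 equals (3 - 1)^2 / (3 + 1) = 1. If no parent is heterozygous, Var(U) is zero and the statistic is undefined.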


Another Tool: Conditional Likelihood

• Used in case-sib studies, in general in matched studies. Strata: Family/pair.

• Breslow et al (1978) first proposed this tool for matched case-control data.

• The R function clogit (survival package) does this. The underlying code uses a survival model, as there is a connection with the partial likelihood from Cox’s proportional hazards model.


• For case-sib studies, sib is the control and the genotype is the exposure. The contribution of a given pair to the conditional likelihood is:

$$\frac{\exp[\beta\,\mathrm{Genotype(case)}]}{\exp[\beta\,\mathrm{Genotype(case)}]+\exp[\beta\,\mathrm{Genotype(control)}]}$$

• Obtain variance-covariance of conditional MLE using inverse Fisher information.
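As a sketch of how the pair contributions above lead to an estimate, the function below maximizes the conditional likelihood for a single genotype covariate by Newton-Raphson and returns the standard error from the inverse Fisher information. It is not the clogit implementation; the function name, the fixed iteration count and the example data are assumptions.

```python
import math

def fit_case_sib(pairs, n_iter=25):
    """Conditional MLE of beta for 1:1 case-sib pairs.

    pairs: list of (genotype_case, genotype_sib), coded e.g. as A-allele counts.
    """
    diffs = [g_case - g_sib for g_case, g_sib in pairs]

    def score_and_info(beta):
        score = info = 0.0
        for d in diffs:
            # Pair contribution exp(b*gc) / (exp(b*gc) + exp(b*gs)),
            # rewritten in terms of the within-pair difference d
            p = 1.0 / (1.0 + math.exp(-beta * d))
            score += d * (1.0 - p)
            info += d * d * p * (1.0 - p)
        return score, info

    beta = 0.0
    for _ in range(n_iter):
        score, info = score_and_info(beta)
        beta += score / info  # Newton-Raphson step
    _, info = score_and_info(beta)
    return beta, 1.0 / math.sqrt(info)

# Hypothetical data: 30 pairs where only the case carries A, 10 the reverse
beta_hat, se = fit_case_sib([(1, 0)] * 30 + [(0, 1)] * 10)
```

Pairs with identical genotypes have d = 0 and drop out of the likelihood, so only discordant pairs are informative. If every discordant pair points the same way the MLE does not exist and the iteration will not settle.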


Summary

• Different tests for association in case-control studies: 2 by 3 table and logistic regression.

• Family based studies: tests and hypotheses.

• Study design choices, power, recruitment.