
Introduction to Genetic Association Analysis in Families

Hae-Won Uh

Department of Medical Statistics and Bioinformatics, LUMC

[email protected]

Leiden, June 24, 2011

1 / 43


Overview

1 Background

2 Genetic association analysis in families
   Score test
   Genetic correlation
   Using imputed data
   Incorporating family history

3 Summary and future work

2 / 43


Challenge: finding genetic variants affecting human health

Chromosomes: 22 pairs of autosomes, X, Y

Genes: ca. 23,000 protein-coding genes

Base pairs: ca. 3 billion DNA base pairs

Variants: 99.9% of bases are identical between all people → ca. 3 million SNPs

3 / 43


Terminology

DNA: strings of bases, A, T, G or C

Gene: a segment of DNA

Single Nucleotide Polymorphism (SNP): a one-letter variation in the DNA sequence

4 / 43


Genetic concept

Genotype (or SNP): AA, Aa, aa is coded as
   autosome: 0, 1, or 2 copies of the minor allele a
   X: 0, 1, or 2 for females & 0 or 2 for males

Hardy-Weinberg Equilibrium (HWE) assumption
   the alleles on the pair of chromosomes are independent
   p: frequency of the minor allele
   $P(AA) = (1-p)^2$, $P(Aa) = 2p(1-p)$, $P(aa) = p^2$

Genetic model
   specifies the relationship between genotypes and the disease
   dominant, recessive, additive, X-linked

5 / 43
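A minimal sketch in Python of the HWE genotype distribution and the autosomal 0/1/2 coding described above. It uses exact fractions so that the additive mean $2p$ and variance $2p(1-p)$, which later slides rely on, come out exactly. The function names are illustrative and do not come from any package.

```python
from fractions import Fraction


def hwe_genotype_probs(p):
    """Genotype probabilities under Hardy-Weinberg equilibrium.

    p is the minor-allele frequency; genotypes are coded by the number of
    minor alleles a: 0 = AA, 1 = Aa, 2 = aa (autosomal coding).
    """
    q = 1 - p
    return {0: q * q, 1: 2 * p * q, 2: p * p}


def mean_and_variance(probs):
    """Mean and variance of the 0/1/2 genotype code under `probs`."""
    mean = sum(x * w for x, w in probs.items())
    var = sum((x - mean) ** 2 * w for x, w in probs.items())
    return mean, var


if __name__ == "__main__":
    p = Fraction(1, 5)
    probs = hwe_genotype_probs(p)
    print(probs)                      # P(AA), P(Aa), P(aa)
    print(mean_and_variance(probs))   # equals (2p, 2p(1 - p))
```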


Methods for finding genetic variants

Linkage
   studies linkage of marker alleles and disease within families

Association
   seeks a marker allele that is present more frequently in cases than in controls
   can be more powerful than linkage methods for common diseases

7 / 43


Use of families in population-based association studies

Advantage
   most common diseases demonstrate familial aggregation
   more powerful than (traditional) family-based tests

Challenge
   need tests that correct for relatedness between subjects

8 / 43


Example: sibling pair - control study

Leiden Longevity Study (LLS)

9 / 43


LLS genome-wide association study (GWAS)

Study sample
   sibling pairs from 421 families: mean age 94 y
   1670 controls: mean age 58 y

(genotyped) SNPs: 500K

After imputation: 2.5 million

Score test
   retrospective likelihood
   corrects for correlation within families

10 / 43


Selective genotyping

Notation
   $X_i$: the genotype (0, 1, or 2) of subject $i$ ($i = 1, \dots, n$)
   $Y_i$: the phenotype
   $\bar{Y}$: the mean of $Y$, or the proportion of cases in case-control studies
   $S$: the ascertainment event

Retrospective likelihood (valid when selection depends on the phenotypes only)
   $P(X \mid Y, S) = P(X \mid Y)$

12 / 43


Score test for independent cases

The score statistic $U_X = \sum_{i=1}^{n} (Y_i - \bar{Y}) X_i$

$U_X$ is asymptotically normally distributed under $H_0$ with zero mean and variance

$\mathrm{Var}\,U_X = V_X \sum_{i=1}^{n} (Y_i - \bar{Y})^2$,

where $V_X$ is the variance of $X_i$.

Under $H_0$, the ratio $U_X^2 / \mathrm{Var}\,U_X \sim \chi^2_{(1)}$

To account for the correlations between relatives, modify $V_X$ (Uh et al., BMC Proc, 2009)

13 / 43
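The score test for independent subjects can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CCassoc implementation: `score_test_independent` is a hypothetical helper, $V_X$ is estimated by the sample variance of the genotypes, and the 1-df p-value uses the identity $P(\chi^2_1 > t) = \mathrm{erfc}(\sqrt{t/2})$.

```python
import math


def score_test_independent(x, y):
    """Score test for unrelated subjects (hypothetical helper).

    x: genotypes coded 0/1/2; y: phenotypes (0/1 for case-control data or a
    quantitative trait). V_X is estimated by the sample variance of x,
    which is an assumption of this sketch.
    Returns (U, Var U, chi-square statistic, 1-df p-value).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    resid = [yi - y_bar for yi in y]                 # Y_i - Ybar
    u = sum(r * xi for r, xi in zip(resid, x))       # U_X
    v_x = sum((xi - x_bar) ** 2 for xi in x) / n     # estimate of V_X
    var_u = v_x * sum(r * r for r in resid)          # Var U_X
    if var_u <= 0:
        raise ValueError("no variation in genotypes or phenotypes")
    chi2 = u * u / var_u
    p_value = math.erfc(math.sqrt(chi2 / 2))         # P(chi2_1 > chi2)
    return u, var_u, chi2, p_value


if __name__ == "__main__":
    x = [0, 1, 2, 1, 0, 2]
    y = [0, 0, 1, 1, 0, 1]
    print(score_test_independent(x, y))
```

For the toy data in the usage lines, $U_X = 2$ and $\mathrm{Var}\,U_X = 1$ (up to floating-point rounding), so the statistic is 4 with p ≈ 0.046.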


Genetic correlation

To account for the correlations between relatives, we modify $V_X$ using

Correlation coefficients $\rho_{ij}$

Kinship coefficients $\Phi_{ij}$

Identity by descent (IBD) sharing

15 / 43


Identity by descent (IBD)

Two alleles that are copies of a common ancestral allele are said to be IBD.

Subjects 3 and 4 share IBD = 1 (the paternal allele, a)

But in these families, they share 0 and 2 alleles IBD, respectively

16 / 43


IBD sharing by relative pairs

When IBD sharing cannot be assigned, we can assess the probability that two individuals share 0, 1, or 2 alleles IBD: $(\pi_0, \pi_1, \pi_2)$

Prior IBD probabilities are the probabilities of IBD sharing conditional only on the relationship between the 2 subjects

The prior values for a sibling pair: $(\pi_0, \pi_1, \pi_2) = (1/4, 1/2, 1/4)$

17 / 43


Kinship coefficients $\Phi_{ij}$

Measure of relatedness between two individuals i and j

Probability that two alleles, one sampled at random from each of the two individuals, are IBD

Derived from prior IBD probabilities:

$\Phi_{ij} = \tfrac{1}{4}\pi_1 + \tfrac{1}{2}\pi_2$.

18 / 43
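The kinship formula can be checked against the prior IBD probabilities of common relationships. The sketch below uses standard textbook values for non-inbred relatives, listed here for illustration, and also reports the correlation $\rho = 2\Phi$; the dictionary and function names are hypothetical.

```python
from fractions import Fraction as F

# Prior IBD probabilities (pi0, pi1, pi2) for non-inbred relatives
# (standard textbook values, listed for illustration).
PRIOR_IBD = {
    "MZ twins": (F(0), F(0), F(1)),
    "parent-offspring": (F(0), F(1), F(0)),
    "full sibs": (F(1, 4), F(1, 2), F(1, 4)),
    "half sibs": (F(1, 2), F(1, 2), F(0)),
    "grandparent-grandchild": (F(1, 2), F(1, 2), F(0)),
    "first cousins": (F(3, 4), F(1, 4), F(0)),
    "double first cousins": (F(9, 16), F(6, 16), F(1, 16)),
    "unrelated": (F(1), F(0), F(0)),
}


def kinship(pi):
    """Phi_ij = pi1 / 4 + pi2 / 2 from prior IBD probabilities."""
    pi0, pi1, pi2 = pi
    if pi0 + pi1 + pi2 != 1:
        raise ValueError("IBD probabilities must sum to 1")
    return F(1, 4) * pi1 + F(1, 2) * pi2


if __name__ == "__main__":
    for name, pi in PRIOR_IBD.items():
        phi = kinship(pi)
        print(f"{name:>24}: Phi = {phi}, rho = 2 * Phi = {2 * phi}")
```

For a sibling pair this gives $\Phi = \tfrac14\cdot\tfrac12 + \tfrac12\cdot\tfrac14 = 1/4$ and hence $\rho = 1/2$.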


Calculation of $\Phi_{ij}$

19 / 43


IBD sharing and kinship by relationship

20 / 43


Correlation coefficients of relatives

Define the correlation matrix K as follows

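$K = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{12} & 1 & \cdots & \rho_{2n} \\ \vdots & & \ddots & \vdots \\ \rho_{1n} & \rho_{2n} & \cdots & 1 \end{pmatrix}$.

$\rho_{ij}$ is twice the prior kinship coefficient.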

21 / 43


IBD sharing, kinship and correlation coefficients

ρ = 2× Φ

22 / 43


Correlation coefficients of relatives II

T: the transition matrix from the ITO matrices of Li and Sacks (Biometrics, 1954)

$\rho_{ij} = \pi_2 + \pi_1 \rho_T = \left(\tfrac{1}{2}\right)^R$,

where $\pi_k$ is the probability that the specified relatives share $k$ alleles IBD and $R$ is the degree of relationship. For autosomal loci, $\rho_T$ equals 1/2.

23 / 43
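A quick exact check of $\rho_{ij} = \pi_2 + \pi_1\rho_T$ with $\rho_T = 1/2$: for several autosomal relative pairs it reproduces both $(1/2)^R$ and twice the kinship coefficient. The $(\pi_1, \pi_2)$ values and degrees $R$ are standard textbook values chosen for illustration, and the helper name is hypothetical.

```python
from fractions import Fraction as F


def relative_correlation(pi1, pi2, rho_t=F(1, 2)):
    """rho_ij = pi2 + pi1 * rho_T; rho_T = 1/2 for autosomal loci."""
    return pi2 + pi1 * rho_t


# (pi1, pi2, degree of relationship R) for some autosomal relative pairs
EXAMPLES = {
    "parent-offspring": (F(1), F(0), 1),
    "full sibs": (F(1, 2), F(1, 4), 1),
    "grandparent-grandchild": (F(1, 2), F(0), 2),
    "half sibs": (F(1, 2), F(0), 2),
    "first cousins": (F(1, 4), F(0), 3),
}

for name, (pi1, pi2, degree) in EXAMPLES.items():
    rho = relative_correlation(pi1, pi2)
    kinship = F(1, 4) * pi1 + F(1, 2) * pi2
    # the three expressions for the correlation agree exactly
    assert rho == F(1, 2) ** degree == 2 * kinship
    print(f"{name:>24}: rho = {rho}")
```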


Correlation coefficients of relatives III

For autosomal genes

Multiplicative effect
   grandparent-grandchild: $0 + \tfrac{1}{2}\rho_T = 1/4$
   double first cousins: $1/16 + \tfrac{6}{16}\rho_T = 1/4$

Recessive effect
   Under HWE, $\rho_T = p/(1+p)$ with minor allele frequency $p$. The correlation of a sib pair is

$\rho^{rec}_{ij} = \frac{1}{4} + \frac{1}{2}\left(\frac{p}{1+p}\right) = \frac{1+3p}{4(1+p)}$

24 / 43
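The multiplicative examples and the recessive sib-pair correlation above can be verified with exact fractions. This sketch assumes only the formulas on the slide, $\rho_T = 1/2$ for the multiplicative effect and $\rho_T = p/(1+p)$ for the recessive effect under HWE; the helper names are hypothetical.

```python
from fractions import Fraction as F


def rho_t_recessive(p):
    """rho_T for a recessive effect under HWE: p / (1 + p)."""
    return p / (1 + p)


def sib_correlation_recessive(p):
    """Sib-pair correlation pi2 + pi1 * rho_T with (pi1, pi2) = (1/2, 1/4)."""
    return F(1, 4) + F(1, 2) * rho_t_recessive(p)


# Multiplicative examples from the slide, with rho_T = 1/2
assert 0 + F(1, 2) * F(1, 2) == F(1, 4)            # grandparent-grandchild
assert F(1, 16) + F(6, 16) * F(1, 2) == F(1, 4)    # double first cousins

# Recessive: agrees with the closed form (1 + 3p) / (4(1 + p))
for p in (F(1, 100), F(1, 10), F(3, 10), F(1, 2)):
    rho = sib_correlation_recessive(p)
    assert rho == (1 + 3 * p) / (4 * (1 + p))
    print(f"p = {float(p):.2f}: rho_rec = {float(rho):.4f}")
```

The recessive sib correlation increases from 1/4 as $p \to 0$ to 1/2 at $p = 1$, so for a rare recessive allele it is about half the multiplicative value of 1/2.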


Correlation coefficients of relatives IV

For X-linked SNPs
   Four basic correlations:

$\rho_T^{f,f} = 1/2$, $\rho_T^{f,m} = \rho_T^{m,f} = 1/\sqrt{2}$, $\rho_T^{m,m} = 0$,

where the superscripts indicate female pairs, mixed pairs, and male pairs, respectively.

For example,
   full sisters: $1/2 + \tfrac{1}{2}\rho_T^{f,f} = 3/4$
   maternal uncle and niece: $\tfrac{1}{4}\rho_T^{m,f} = \tfrac{1}{4} \cdot 1/\sqrt{2}$

26 / 43
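The X-linked correlations follow the same rule $\rho_{ij} = \pi_2 + \pi_1\rho_T$, with $\rho_T$ picked by the sexes of the pair. In this sketch the $(\pi_1, \pi_2)$ pairs are the bracketed coefficients used on these slides for sisters, brothers, sister-brother pairs, and the maternal uncle-niece example; the dictionary layout and function name are illustrative choices.

```python
import math

# Basic X-linked transition correlations (f = female, m = male)
RHO_T_X = {
    ("f", "f"): 0.5,
    ("f", "m"): 1 / math.sqrt(2),
    ("m", "f"): 1 / math.sqrt(2),
    ("m", "m"): 0.0,
}


def x_linked_correlation(pi1, pi2, sex_i, sex_j):
    """rho_ij = pi2 + pi1 * rho_T, with rho_T chosen by the sexes of the pair."""
    return pi2 + pi1 * RHO_T_X[(sex_i, sex_j)]


# (pi1, pi2, sex of first relative, sex of second relative)
PAIRS = {
    "full sisters": (0.5, 0.5, "f", "f"),
    "full brothers": (0.0, 0.5, "m", "m"),
    "sister-brother": (0.5, 0.0, "f", "m"),
    "maternal uncle-niece": (0.25, 0.0, "m", "f"),
}

for name, (pi1, pi2, s1, s2) in PAIRS.items():
    print(f"{name:>22}: rho = {x_linked_correlation(pi1, pi2, s1, s2):.4f}")
```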


Variance of autosomal SNP for sibling pair (s1, s2)

Multiplicative effect

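$E X_{s1} = 2p$, $\mathrm{Var}\,X_{s1} = 2p(1-p)$

$\mathrm{Var}(X_{s1} + X_{s2}) = \mathrm{Var}\,X_{s1} + \mathrm{Var}\,X_{s2} + 2\,\mathrm{Cov}(X_{s1}, X_{s2})$

$\mathrm{Cov}(X_{s1}, X_{s2}) = \rho_{s1,s2}\sqrt{\mathrm{Var}\,X_{s1}}\sqrt{\mathrm{Var}\,X_{s2}} = (1/2)\{2p(1-p)\}$

Recessive effect

$E X_{s1} = p^2$, $\mathrm{Var}\,X_{s1} = p^2(1-p^2)$

$\mathrm{Cov}(X_{s1}, X_{s2}) = \dfrac{1+3p}{4(1+p)}\, p^2(1-p^2)$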

27 / 43
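The sib-pair moments above can be confirmed by exact enumeration rather than through the correlation formulas. The sketch assumes unrelated parents in HWE with random mating, enumerates all parental alleles and Mendelian transmissions to two sibs, and returns $\mathrm{Var}\,X_{s1}$ and $\mathrm{Cov}(X_{s1}, X_{s2})$ for a given genotype coding. For the recessive indicator of $aa$ it yields the Bernoulli($p^2$) variance $p^2(1-p^2)$ and a covariance of $\frac{1+3p}{4(1+p)}\,p^2(1-p^2)$. Function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def additive(k):
    """Multiplicative (additive) coding: number of minor alleles."""
    return k


def recessive(k):
    """Recessive coding: indicator of genotype aa."""
    return 1 if k == 2 else 0


def sib_pair_moments(p, coding):
    """Exact Var(X_s1) and Cov(X_s1, X_s2) for a full-sib pair.

    Assumes unrelated parents whose four alleles are independent with
    minor-allele frequency p (HWE, random mating); 1 marks the minor allele.
    Each sib receives one allele chosen uniformly from each parent.
    """
    q = 1 - p
    e1 = e2 = e11 = e12 = F(0)
    for alleles in product((0, 1), repeat=4):            # f1, f2, m1, m2
        w_parents = F(1)
        for a in alleles:
            w_parents *= p if a else q
        father, mother = alleles[:2], alleles[2:]
        # indices of the alleles transmitted to sib 1 and sib 2
        for i1, j1, i2, j2 in product((0, 1), repeat=4):
            w = w_parents / 16
            x1 = coding(father[i1] + mother[j1])
            x2 = coding(father[i2] + mother[j2])
            e1 += w * x1
            e2 += w * x2
            e11 += w * x1 * x1
            e12 += w * x1 * x2
    return e11 - e1 * e1, e12 - e1 * e2


if __name__ == "__main__":
    p = F(1, 5)
    print(sib_pair_moments(p, additive))
    print(sib_pair_moments(p, recessive))
```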


Variance of X-linked SNP for sibling pair

Females carry 2 copies: $X \in \{0, 1, 2\}$ and $E X_f = 2p$
Males carry 1 copy: $X \in \{0, 2\}$ and $E X_m = 2p$

Variance of females or males

Females: $\sigma_f^2 = 2p(1-p)$

Males: $\sigma_m^2 = 4p(1-p)$

Covariance of sibling pairs

Sister-Sister: $[1/2 + \tfrac{1}{2}\rho_T^{f,f}]\,\sigma_f^2 = (3/4)\,2p(1-p)$

Brother-Brother: $[1/2 + 0\cdot\rho_T^{m,m}]\,\sigma_m^2 = 2p(1-p)$

Sister-Brother: $[0 + \tfrac{1}{2}\rho_T^{m,f}]\sqrt{\sigma_m^2}\sqrt{\sigma_f^2} = p(1-p)$

29 / 43
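The same enumeration idea applies to X-linked SNPs. This sketch assumes a hemizygous father and a mother in HWE, scores daughters 0/1/2 and sons 0/2 as on the slide, and returns the exact covariance for sister-sister, brother-brother, and sister-brother pairs; the function name and dictionary keys are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

PAIRS = (("f", "f"), ("m", "m"), ("f", "m"))   # sister-sister, brother-brother, sister-brother


def x_sib_covariances(p):
    """Exact covariances of X-linked genotype scores for sibling pairs.

    Assumes a father with one X allele and a mother with two, all alleles
    independent with minor-allele frequency p (1 marks the minor allele).
    A daughter is scored father's allele + one maternal allele (0/1/2);
    a son receives one maternal allele and is scored 0 or 2.
    """
    q = 1 - p
    sums = {pair: [F(0), F(0), F(0)] for pair in PAIRS}   # E[X1], E[X2], E[X1 X2]
    for fa, m1, m2 in product((0, 1), repeat=3):
        w_parents = (p if fa else q) * (p if m1 else q) * (p if m2 else q)
        mother = (m1, m2)
        # t1, t2: which maternal allele sib 1 and sib 2 receive
        for t1, t2 in product((0, 1), repeat=2):
            w = w_parents / 4
            score = {
                "f": (fa + mother[t1], fa + mother[t2]),   # daughters: 0/1/2
                "m": (2 * mother[t1], 2 * mother[t2]),     # sons: 0 or 2
            }
            for s1, s2 in PAIRS:
                x1, x2 = score[s1][0], score[s2][1]
                acc = sums[(s1, s2)]
                acc[0] += w * x1
                acc[1] += w * x2
                acc[2] += w * x1 * x2
    return {pair: e12 - e1 * e2 for pair, (e1, e2, e12) in sums.items()}


if __name__ == "__main__":
    p = F(1, 5)
    for pair, cov in x_sib_covariances(p).items():
        print(pair, cov)
```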


Score test for related cases

The score statistic $U_X = (Y - \bar{Y})^\top X$

$\mathrm{Var}\,U_X = (Y - \bar{Y})^\top K (Y - \bar{Y})\,\sigma_X^2$,

for binary & quantitative outcome $Y$

Under $H_0$, the ratio $U_X^2 / \mathrm{Var}\,U_X \sim \chi^2_{(1)}$

Implemented in CCassoc & QTassoc (www.msbi.nl/uh)

31 / 43
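A minimal Python sketch of the score test for related subjects, under assumptions that go beyond the slide: $K$ is assembled block-diagonally from per-family correlations (relatives in different families are uncorrelated), and $\sigma_X^2$ is estimated as $2\hat p(1-\hat p)$ with $\hat p$ the sample allele frequency, i.e. assuming HWE. The helpers are hypothetical and do not reproduce the CCassoc or QTassoc interface.

```python
import math


def block_diagonal(blocks):
    """Assemble K from per-family correlation blocks (no correlation across families)."""
    n = sum(len(b) for b in blocks)
    k = [[0.0] * n for _ in range(n)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                k[offset + i][offset + j] = float(value)
        offset += len(block)
    return k


def score_test_related(x, y, k):
    """Score test with correlated subjects (hypothetical helper).

    U = (Y - Ybar)^T X and Var U = (Y - Ybar)^T K (Y - Ybar) * sigma2_X, where
    K has 1 on the diagonal and rho_ij = 2 * kinship off it. sigma2_X is
    estimated as 2 p_hat (1 - p_hat), p_hat = mean(x) / 2 (assumes HWE).
    Returns (U, Var U, chi-square statistic, 1-df p-value).
    """
    n = len(y)
    y_bar = sum(y) / n
    r = [yi - y_bar for yi in y]
    u = sum(ri * xi for ri, xi in zip(r, x))
    p_hat = sum(x) / (2 * n)
    sigma2_x = 2 * p_hat * (1 - p_hat)
    quad = sum(r[i] * k[i][j] * r[j] for i in range(n) for j in range(n))
    var_u = quad * sigma2_x
    if var_u <= 0:
        raise ValueError("Var U is zero; the SNP or phenotype is monomorphic")
    chi2 = u * u / var_u
    return u, var_u, chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    sibs = [[1, 0.5], [0.5, 1]]          # full sibs: rho = 2 * (1/4)
    k = block_diagonal([sibs, sibs, [[1]], [[1]]])
    x = [1, 2, 1, 1, 0, 0]               # two case sib pairs, two controls
    y = [1, 1, 1, 1, 0, 0]
    print(score_test_related(x, y, k))
```

In this toy example the two sib pairs give $(Y-\bar Y)^\top K (Y-\bar Y) = 14/9$ instead of $4/3$ for $K = I$, so treating the relatives as independent would understate $\mathrm{Var}\,U_X$.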


When using imputed SNP data

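Instead of $X \in \{0, 1, 2\}$, we have the posterior probabilities $\varpi = (\varpi_0, \varpi_1, \varpi_2)^\top$ obtained by imputation software.

In the score statistic, replace $X$ with its expectation $\tilde{X}_i = E(X_i \mid \varpi_i) = \varpi_{i1} + 2\varpi_{i2}$:

$U_X = (Y - \bar{Y})^\top \tilde{X}$

The variance of the score statistic is

$\mathrm{Var}\,U_X = I_{imp} = I_{comp} - I_{comp|imp}$

The loss of information due to uncertainty for 1 subject (this equals the conditional variance $\mathrm{Var}(X_i \mid \varpi_i)$):

$I_{comp|imp;i} = \varpi_{i1}(1-\varpi_{i1}) + 4\varpi_{i2}(1-\varpi_{i2}) - 4\varpi_{i1}\varpi_{i2}$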

33 / 43
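The imputed-dosage quantities can be written directly in terms of the posterior probabilities. This sketch (function names hypothetical) computes the expected dosage $\varpi_1 + 2\varpi_2$ and the per-subject loss of information, and checks that the loss equals $\mathrm{Var}(X_i \mid \varpi_i) = E(X_i^2 \mid \varpi_i) - E(X_i \mid \varpi_i)^2$.

```python
def expected_dosage(post):
    """E(X | varpi) = varpi1 + 2 * varpi2 for posteriors (varpi0, varpi1, varpi2)."""
    w0, w1, w2 = post
    if abs(w0 + w1 + w2 - 1) > 1e-6:
        raise ValueError("posterior probabilities must sum to 1")
    return w1 + 2 * w2


def information_loss(post):
    """Per-subject loss: varpi1(1 - varpi1) + 4 varpi2(1 - varpi2) - 4 varpi1 varpi2."""
    _, w1, w2 = post
    return w1 * (1 - w1) + 4 * w2 * (1 - w2) - 4 * w1 * w2


def conditional_variance(post):
    """Var(X | varpi) = E(X^2 | varpi) - E(X | varpi)^2, computed directly."""
    _, w1, w2 = post
    return (w1 + 4 * w2) - (w1 + 2 * w2) ** 2


if __name__ == "__main__":
    for post in [(0.0, 1.0, 0.0), (0.25, 0.5, 0.25), (0.1, 0.3, 0.6)]:
        print(post, expected_dosage(post), information_loss(post),
              conditional_variance(post))
```

A genotype called with certainty loses no information; the uninformative posterior $(1/4, 1/2, 1/4)$ loses 0.5, the full HWE variance $2p(1-p)$ at $p = 1/2$.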


Efficiency measure $R^2_T$

Relative efficiency measure for the case-control design (Uh et al., BMC Genet, 2009)

$R^2_T = I_{imp} / I_{comp}$

Post-imputation: to assess the quality of imputed genotypes
   MACH r2, IMPUTE info, BEAGLE R2

Post-analysis: to assess the quality of imputed genotypes with respect to the association parameter
   SNPTEST info, CCassoc & QTassoc $R^2_T$

34 / 43
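A sketch of $R^2_T = I_{imp}/I_{comp}$ for unrelated subjects. The following are assumptions, not taken from the slides: $I_{comp} = \sigma_X^2 \sum_i (Y_i - \bar Y)^2$ with $\sigma_X^2 = 2\hat p(1-\hat p)$ estimated from the dosages, and $I_{comp|imp}$ is the sum of the per-subject losses weighted by $(Y_i - \bar Y)^2$, as the missing-information principle gives for a score that is linear in $X$. It illustrates the idea rather than the exact CCassoc/QTassoc computation, and `r2_t` is a hypothetical name.

```python
def r2_t(posteriors, y):
    """Relative efficiency R2_T = I_imp / I_comp for unrelated subjects (sketch).

    posteriors: list of (varpi0, varpi1, varpi2) per subject; y: phenotypes.
    Assumes I_comp = sigma2_X * sum (Y_i - Ybar)^2 with sigma2_X = 2 p (1 - p)
    estimated from the dosages, and I_comp|imp = sum (Y_i - Ybar)^2 * loss_i.
    """
    n = len(y)
    if n != len(posteriors):
        raise ValueError("one posterior triple per subject is required")
    y_bar = sum(y) / n
    weights = [(yi - y_bar) ** 2 for yi in y]
    dosages = [w1 + 2 * w2 for _, w1, w2 in posteriors]
    p_hat = sum(dosages) / (2 * n)
    sigma2_x = 2 * p_hat * (1 - p_hat)
    i_comp = sigma2_x * sum(weights)
    if i_comp <= 0:
        raise ValueError("I_comp is zero")
    i_loss = sum(
        wt * (w1 * (1 - w1) + 4 * w2 * (1 - w2) - 4 * w1 * w2)
        for wt, (_, w1, w2) in zip(weights, posteriors)
    )
    return (i_comp - i_loss) / i_comp


if __name__ == "__main__":
    y = [0, 1, 1, 0]
    print(r2_t([(1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 0)], y))
    print(r2_t([(0.9, 0.1, 0.0), (0.0, 0.8, 0.2), (0.0, 0.1, 0.9), (0.1, 0.9, 0.0)], y))
```

Genotypes known with certainty give $R^2_T = 1$; posteriors of $(1/4, 1/2, 1/4)$ for every subject carry no information and give $R^2_T = 0$.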


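Efficiency measure $R^2_T$ for (extra) QC

$\lambda_{GC} = \mathrm{median}(\text{observed test statistics}) / 0.456$, where 0.456 approximates the median of the $\chi^2_{(1)}$ distribution

[Figure panels: $R^2_T > 0.30$ and $R^2_T > 0.98$]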

35 / 43
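Genomic control needs only the standard library. Following the slide, $\lambda_{GC}$ divides the median 1-df chi-square statistic by 0.456 (the exact median of $\chi^2_1$ is about 0.455). The filtering helper, which keeps SNPs above an $R^2_T$ threshold such as 0.30 or 0.98 before computing $\lambda_{GC}$, is a hypothetical illustration of the QC step.

```python
import statistics

# Constant used on the slide; the exact median of chi2(1) is about 0.4549.
CHI2_1_MEDIAN = 0.456


def genomic_control_lambda(chi2_stats):
    """lambda_GC = median(observed 1-df chi-square statistics) / 0.456."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return statistics.median(chi2_stats) / CHI2_1_MEDIAN


def lambda_after_r2_filter(chi2_stats, r2_values, threshold):
    """Hypothetical QC step: keep SNPs with R2_T above `threshold`, then compute lambda_GC."""
    kept = [s for s, r2 in zip(chi2_stats, r2_values) if r2 > threshold]
    return genomic_control_lambda(kept)


if __name__ == "__main__":
    stats = [0.456, 10.0, 0.912]      # toy 1-df chi-square statistics
    r2 = [0.99, 0.20, 0.50]           # toy R2_T values for the same SNPs
    for threshold in (0.30, 0.98):
        print(threshold, lambda_after_r2_filter(stats, r2, threshold))
```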


Additional phenotypic information in LLS

37 / 43


More efficient test

CCassoc deals with related cases with ascertainment

Want: incorporate (available) phenotypic information of un-genotyped relatives for optimal weighting

How? The MQLS test (Thornton & McPeek, 2007) is an allelic test
   We want to develop a test directly from the score statistic
   In fact, we want to modify CCassoc

38 / 43


Multiplex-Case Control Score (MCCS) test

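Let $Y_N$ and $Y_M$ be the phenotypes of subjects with Non-missing and Missing genotypes. $K$ is the $N \times N$ correlation matrix of the genotyped subjects, and $K_{N,M}$ is the $N \times M$ correlation matrix between genotyped and un-genotyped subjects.

$U_X = (Y - \bar{Y})^\top X$ in CCassoc

Include the phenotypic information of un-genotyped relatives in

$Y^* = Y_N + K^{-1} K_{N,M} Y_M$

Then $U^*_X = (Y^* - \bar{Y}^*)^\top X$ in MCCS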

39 / 43


Example

1 case from 1 affected & 2 unaffected siblings:

$Y^* = 1 + (1/2 \;\; 1/2)\begin{pmatrix} 0 \\ 0 \end{pmatrix} = 1$

1 case from 2 affected & 1 unaffected siblings:

$Y^* = 1 + (1/2 \;\; 1/2)\begin{pmatrix} 1 \\ 0 \end{pmatrix} = 1.5$

1 case selected & typed from 3 affected siblings:

$Y^* = 1 + (1/2 \;\; 1/2)\begin{pmatrix} 1 \\ 1 \end{pmatrix} = 2$

40 / 43
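The $Y^*$ computation can be reproduced with a small linear solve. This plain-Python sketch (helper names hypothetical) implements $Y^* = Y_N + K^{-1}K_{N,M}Y_M$, with $K$ the correlation matrix of the genotyped subjects and $K_{N,M}$ their correlations with the un-genotyped relatives, and recovers the three sibship examples above (1, 1.5, and 2).

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def mccs_phenotypes(y_n, y_m, k_nn, k_nm):
    """Y* = Y_N + K^{-1} K_{N,M} Y_M (hypothetical helper).

    y_n: phenotypes of genotyped subjects; y_m: phenotypes of un-genotyped
    relatives; k_nn: N x N correlations among genotyped subjects;
    k_nm: N x M correlations between genotyped and un-genotyped subjects.
    """
    b = [sum(kij * yj for kij, yj in zip(row, y_m)) for row in k_nm]
    adjustment = solve(k_nn, b)
    return [yi + ai for yi, ai in zip(y_n, adjustment)]


if __name__ == "__main__":
    k_nn, k_nm = [[1]], [[0.5, 0.5]]       # one typed case, two full sibs
    for y_m in ([0, 0], [1, 0], [1, 1]):
        print(y_m, mccs_phenotypes([1], y_m, k_nn, k_nm))
```

With two genotyped cases and one affected un-genotyped sib, each case gets $Y^* = 4/3$: the extra phenotype is shared between the typed sibs rather than counted twice.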


Weight for LLS data

[Figure: Weight versus Family size; panels: Autosomal SNPs and X-linked SNPs]

427 nonagenarian sibling pairs
additional phenotypic information: age at death of family members (n = 2425 for controls & n = 277 for cases)

41 / 43


Summary and future work

We developed CCassoc & MCCS
   to test SNPs for multiplicative, dominant, and recessive effects in a related sample
   to test X-linked SNPs in a related sample
   for genome-wide scan data (e.g. 650K) and imputed genotypes (e.g. 2.5 million)
   to incorporate phenotypic information of un-genotyped relatives for optimal weighting

We want to extend MCCS:
   direct use of the family history score (Houwing-Duistermaat, 2009)
   extension to other outcomes

42 / 43


Acknowledgement

Medical Statistics: Jeanine Houwing-Duistermaat, Quinta Helmer

Molecular Epidemiology: Joris Deelen, Marian Beekman, Eline Slagboom

Financial support: VIDI grant (NWO 917.66.344) from the Netherlands Organization for Scientific Research; IOP genomics/SenterNovem (IGE05007)

43 / 43