genetics and methodology of twin and family studies

66
23 rd International Workshop on Statistical Genetics and Methodology of Twin and Family Studies: Advanced course Pak Sham (director) Lindon Eaves Jeff Barrett David Evans Goncalo Abecasis Mike Neale Hermine Maes Sarah Medland Dorret Boomsma Danielle Posthuma Meike Bartels John Hewitt (host) Jeff Lessem Marleen de Moor Stacey Cherny Ben Neale Shaun Purcell Manuel Ferreira Nick Martin Kate Morley Allan McRae

Upload: others

Post on 14-Apr-2022

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Genetics and Methodology of Twin and Family Studies

23rd International Workshop on Statistical Genetics and Methodology of Twin and Family Studies: Advanced course

Pak Sham (director)

Lindon Eaves Jeff Barrett David Evans Goncalo Abecasis Mike Neale Hermine Maes Sarah Medland Dorret Boomsma Danielle Posthuma Meike Bartels

John Hewitt (host) Jeff Lessem Marleen de Moor Stacey Cherny Ben Neale Shaun Purcell Manuel Ferreira Nick Martin Kate Morley Allan McRae

Page 2: Genetics and Methodology of Twin and Family Studies

Hunting QTLs – an introduction

Nick MartinQueensland Institute of Medical Research

Boulder workshop: March 2, 2009

Page 3: Genetics and Methodology of Twin and Family Studies

Mendel 1865 – genetics of discrete traits

Page 4: Genetics and Methodology of Twin and Family Studies

Stature in adolescent twins

Stature

190.0185.0

180.0175.0

170.0165.0

160.0155.0

150.0145.0

Women700

600

500

400

300

200

100

0

Std. Dev = 6.40 Mean = 169.1

N = 1785.00

Page 5: Genetics and Methodology of Twin and Family Studies

0

1

2

3

1 Gene 3 Genotypes 3 Phenotypes

0

1

2

3

2 Genes 9 Genotypes 5 Phenotypes

01234567

3 Genes 27 Genotypes 7 Phenotypes

05

1015

20

4 Genes 81 Genotypes 9 Phenotypes

R.A. Fisher, 1918

The explanation of quantitative inheritance in Mendelian terms

Page 6: Genetics and Methodology of Twin and Family Studies

unaffected affected

Disease liability

Single threshold

severe

Disease liability

Multiple thresholds

mildnormal mod

Multifactorial Threshold Model of Disease

Page 7: Genetics and Methodology of Twin and Family Studies

Complex Trait Model

Disease Phenotype

Commonenvironment

Marker Gene1

Individualenvironment

Polygenicbackground

Gene2

Gene3

Linkage

Linkagedisequilibrium

Mode ofinheritanceLinkage

Association

Page 8: Genetics and Methodology of Twin and Family Studies

Using genetics to dissect metabolic pathways: Drosophila eye color

Beadle & Ephrussi, 1936

Page 9: Genetics and Methodology of Twin and Family Studies

Finding QTLs

Linkage

Association

Page 10: Genetics and Methodology of Twin and Family Studies

First (unequivocal) positional cloning of a complex disease QTL !

Page 11: Genetics and Methodology of Twin and Family Studies

Linkage analysis

Page 12: Genetics and Methodology of Twin and Family Studies

Thomas Hunt Morgan – discoverer of linkage

Page 13: Genetics and Methodology of Twin and Family Studies

Linkage = Co-segregation

A2A4

A3A4

A1A3

A1A2

A2A3

A1A2 A1A4 A3A4 A3A2

Marker allele A1cosegregates withdominant disease

Page 14: Genetics and Methodology of Twin and Family Studies

Linkage Markers…

Page 15: Genetics and Methodology of Twin and Family Studies

x

1/4 1/4 1/4 1/4

Page 16: Genetics and Methodology of Twin and Family Studies

IDENTITY BY DESCENTSib 1

Sib 2

4/16 = 1/4 sibs share BOTH parental alleles IBD = 28/16 = 1/2 sibs share ONE parental allele IBD = 1

4/16 = 1/4 sibs share NO parental alleles IBD = 0

Page 17: Genetics and Methodology of Twin and Family Studies

For disease traits (affected/unaffected)Affected sib pairs selected

IBD = 2IBD = 1IBD = 0

1000

250

750

500

Expected 1 2 3 127 310

Markers

Page 18: Genetics and Methodology of Twin and Family Studies

For continuous measuresUnselected sib pairs

1.00

0.25

0.75

0.50

IBD = 0 IBD = 1 IBD = 2

Cor

rela

tion

betw

een

sibs

0.00

Page 19: Genetics and Methodology of Twin and Family Studies

Twin 1 mole count

Twin 2 mole count

EC

A

Q Q

AC

ErMZ = 1, rDZ = π̂

rMZ = 1, rDZ = 0.5

rMZ = rDZ = 1

q q

a a

c ce e

Page 20: Genetics and Methodology of Twin and Family Studies

Linkage for mole counts in Australian twin families

Page 21: Genetics and Methodology of Twin and Family Studies

Flat mole count: chromosome 9 linkage in Australian and UK twins

Australia

UK

Page 22: Genetics and Methodology of Twin and Family Studies

Linkage for MaxCigs24 in Australia and Finland

AJHG, in press

Page 23: Genetics and Methodology of Twin and Family Studies

Linkage Doesn’t depend on “guessing gene” Needs family data ! Works over broad regions and whole genome Only detects large effects (>10%) Requires large samples (10,000’s?) Can’t guarantee close to gene On the whole, has failed to deliver for complex

traits

Page 24: Genetics and Methodology of Twin and Family Studies

Association Looks for correlation between specific

alleles and phenotype (trait value, disease risk)

Page 25: Genetics and Methodology of Twin and Family Studies

1 2 3 4 51234567890 1234567890 1234567890 1234567890 1234567890

191,044,935 GTGAGTATGG CTTTCTTCCC TCTCCCGCCA CCCCCTGCCC CACACTGCAA Intron 1191,044,985 GCTGCAAACG CGGTACTTTC GGGCTCGCCT TTGACGTTAG GAAACTAGCC191,045,035 TGAGCCTATG CAGGGAAAAA AAATCGAAAA GGTCAATTTG TTAAGTAAGG191,045,085 TTAAATCTGG GTGATGCTCG GGTACAGTTT AAGAACCGAG GGAGACAGTT191,045,135 GATATGAGGG CGGTGGTTGA TGCGCTAAGA AATTGCGGGT TGGCTTTTTG191,045,185 TCCTCCTGCA TTCAAAATGA CATCAGAATC CTGCGGCTGA AGCGCGTCCC rs34774923191,045,235 CAGCATTCAT ACGTTGCATG ATGAGTTCTC ATCAGCTTAC ACAGCTACTG191,045,285 GAAGGTGATG CTCTTGCTGG TTCTGAATAT ACTCGTTTAA AATCCATTTT191,045,335 TGTTTTTTAA TTATAGAGCA GATCTCACCC AGTCCGAATG TGGAACAAAT191,045,385 AATTGTTATG CAGCGTCTGC TTAAAAGAAG TGTCGTAGGT GGGGAAGGAA191,045,435 GAGCGCAGGG GAATCAGTCA CCCACCTCTT TGTACAGTCT CTGGCGTGGT rs10218513191,045,485 CCAGAACCTC CTGCTCTAAA GAGAGAAGCG TGGGCCGGCT CCAGACAGTT191,045,535 CCATGTCTGT CCTTTTCATT AAAGTGCAAA ACGTCTCGGA ATTGTAATTA191,045,585 ACCTTGCAAA CAAACTGATG CCCTTTGTGA GCCAGAAATA GTGTCTGCCT191,045,635 TTTGAACTAA ATTCATTAAC AATTCTTTAA AATACCCTAG TGATTATAGG191,045,685 TAGCCCTGCC CTTAGTTGTA AAACTAGTAG ATACGGTCAG ATTAATGGAC191,045,735 GAAACTGCTC AGTACATGAG GTTTAAATGT TAGGTGGATA AGACTTATTT191,045,785 GAAGAGTTCT TGCTTTGCCT TATGCGGTTT GTCTCTAGTT ACTGGGTGAC191,045,835 TTTATTTGGT AAAAATGCGT TCAGCTGCAG TAGCATATTC AAGTGTTGCT rs2746073191,045,885 AGTTAGTAAT TATCTTTTTA ATTTTTTGTT TTAG

Association studies

Controls Cases Quantitative Traits

Single Nucleotide Polymorphisms

Page 26: Genetics and Methodology of Twin and Family Studies

Association More sensitive to small effects Need to “guess” gene/alleles (“candidate

gene”) or be close enough for linkage disequilibrium with nearby loci

May get spurious association (“stratification”) – need to have genetic controls to be convinced

Page 27: Genetics and Methodology of Twin and Family Studies

Variation: Single Nucleotide Polymorphisms

Page 28: Genetics and Methodology of Twin and Family Studies

Differences (between subjects) in DNA sequence are responsible for (structural) differences in proteins.

Page 29: Genetics and Methodology of Twin and Family Studies

Human OCA2 and eye colour

Zhu et al., Twin Research 7:197-210 (2004)

Page 30: Genetics and Methodology of Twin and Family Studies

OCA2 Mutations

Missense Mutations

Polymorphisms

G27R S86RC112F R290G

NW273KV H549QP385IM395L

A368V

T404M

A481TI473S

M446VV443I

R720C

W679C

W652R

K614N

T592IK614E

I617L

A724P

A787VS736L

G795RP743L

H/R615 I/T722L/F440

R/Q419

R/W305

D/A257

R290GNW273KV H549Q

P385IM395L

A368V

T404M

A481TI473S

M446VV443I

R720C

W679C

W652R

K614N

T592IK614E

I617L

A724P

A787VS736L

G795RP743L

W679R

H/R615 I/T722L/F440

R/Q419

R/W305

D/A257

N489D

Albinism data base www.cbc.umn.e

OculotaneousAlbinism type 2

Page 31: Genetics and Methodology of Twin and Family Studies

LD blocks in OCA2

Association with eye color

Page 32: Genetics and Methodology of Twin and Family Studies
Page 33: Genetics and Methodology of Twin and Family Studies

Sturm et al., AJHG 82: 424-431, 2008

Sulem et al, Nat Genet 39: 1443, 2007; Kayser et al, AJHG 82: 411-423, 2008; Eiberg et al, Hum Genet 123: 177-187, 2008

A single SNP within intron 86 of HERC2 determines Blue-Brown eye colour

rs12913832 C = Blue rs12913832 T = Brown

HLTF binding

HLTF = Helicase-like transcription factorUses ATP for energy to initiate unwinding of DNA, so allowing transcription

Page 34: Genetics and Methodology of Twin and Family Studies

OCA2 promoter

…21 kb

SWI/SNF Chromatin remodelling model for OCA2 gene activation

ATP

Brown eye colour

HLTFPol II

THERC2 intron 86

ADPHLTF

T

Locus Control Region

Heterochromatin

OCA2 open promoter

Pol II

Euchromatin

…21 kb

MITF LEF1

Page 35: Genetics and Methodology of Twin and Family Studies

SWI/SNF Chromatin remodelling model for OCA2 gene activation

Blue eye colour

CHERC2 intron 86

OCA2 promoter

…21 kb

ATP

HLTFPol II

Heterochromatin

…21 kb

OCA2 closed promoter

Pol IIHeterochromatinATP

HLTF

MITF LEF1

X

C

Page 36: Genetics and Methodology of Twin and Family Studies

Genetic architecture

Effect size

Number of variants

Number of variants

Number of variants Effect size of variants Frequency of risk allele Genetic mode of action

Genetic variance

Page 37: Genetics and Methodology of Twin and Family Studies

Allele Frequency

Effect size

Very very Rare

Common

VeryverySmall

Large

What is the genetic architecture of complex genetic disorders?

Not possibleMendelianDisorders

Possible and detectable spectrum of common complex genetic disorders

Not detectable/Not useful

Page 38: Genetics and Methodology of Twin and Family Studies

Allele Frequency

Effect size

Very very Rare

Common

VeryverySmall

Large

A potted history of genetic studies

Not possibleMendelianDisorders

Not detectable/Not useful

Linkage studies

Page 39: Genetics and Methodology of Twin and Family Studies

Allele Frequency

Effect size

Very very Rare

Common

VeryverySmall

Large

A potted history of genetic studies

Not possibleMendelianDisorders

Not detectable/Not useful

Linkage studies RR >3

Candidate association studies: Effect size RR ~2sample size- hundreds

Page 40: Genetics and Methodology of Twin and Family Studies

Association studies until 2007• Candidate regions

• Some successes but generally:– Poor replication– Small sample sizes

• Conclusion – effect sizes are smaller than expected– Poor selection of candidate regions

Page 41: Genetics and Methodology of Twin and Family Studies

500 000 - 1. 000 000 SNPsHuman Genome - 3,1x109 Base Pairs

Genome-Wide Association Studies

Page 42: Genetics and Methodology of Twin and Family Studies

High density SNP arrays – up to 1 million SNPs

Page 43: Genetics and Methodology of Twin and Family Studies

Genome-wide Association Studies• Multiple testing !

– Declare significance at 0.05/500,000 = 10-8

• Follow-up significant SNPs by genotyping in independent replication samples

• Validated associations:- significant across replication samples- same SNP- same allele

Page 44: Genetics and Methodology of Twin and Family Studies

5 x 10-8

CACNA1CAnkryin-G (ANK3)

Bipolar GWAS of 10,648 samples

Sample Cases Controls P-valueSTEP 7.4% 5.8% 0.0013WTCCC 7.6% 5.9% 0.0008EXT 7.3% 4.7% 0.0002Total 7.5% 5.6% 9.1×10-9

Sample Case Controls P-valueSTEP 35.7% 32.4% 0.0015WTCCC 35.7% 31.5% 0.0003EXT 35.3% 33.7% 0.0108Total 35.6% 32.4% 7×10-8

X

>1.7 million genotyped and (high confidence) imputed SNPs

Ferreira et al (Nature Genetics, 2008)

Page 45: Genetics and Methodology of Twin and Family Studies

GWAS for major depression

PCLO

Page 46: Genetics and Methodology of Twin and Family Studies

GWAS for Melanoma Association analysis of SNPs across a region of chromosome 20q11.22 for the combined sample. The x-axis is chromosomal position, the left y-axis –log10(p) for genotyped SNPs. Nature Genetics 2008 Jul;40(7):838-40.

Page 47: Genetics and Methodology of Twin and Family Studies
Page 48: Genetics and Methodology of Twin and Family Studies

200520062007 first quarter2007 second quarter2007 third quarter2007 fourth quarterFirst quarter 2008

Manolio, Brooks, Collins, J. Clin. Invest., May 2008 Stephen Channock

Page 49: Genetics and Methodology of Twin and Family Studies

Functional Classification of 284 SNPs Associated with Complex Traits

0 10 20 30 40 50 60

Other

Intronic

Missense

Synonymous

3' UTR

5' UTR

Percent of Associated SNPs

http://www.genome.gov/gwastudies/

n = 1

n = 2

n = 3

n = 13

n = 119

n = 146

Stephen Channock

Page 50: Genetics and Methodology of Twin and Family Studies

Allele Frequency

Effect size

Very very Rare

Common

VeryverySmall

Large

A potted history of genetic studies

Not possibleMendelianDisorders

Not detectable/Not useful

Linkage studies

Candidate association studies: Effect size RR ~2sample size- hundreds

Genome-wide association studies Effect size RR ~1.2Sample size - thousands

Page 51: Genetics and Methodology of Twin and Family Studies

Conclusions about genetic architecture of complex genetic traits based on GWAS results

Genetic variation is not tagged by GWAS SNPs

Effect sizes are smaller

Effects sizes of validated variants from 1st 16 GWAS studies

Page 52: Genetics and Methodology of Twin and Family Studies

Allele Frequency

Effect size

Very very Rare

Common

VeryverySmall

Large

A potted history of genetic studies

Not possibleMendelianDisorders

Not detectable/Not useful

Linkage studies

Candidate association studies: Effect size RR ~2sample size- hundreds

Genome-wide association studies Effect size RR ~1.2Sample size - thousands

Next Generation GWAS Effect size RR ~1.05Sample size –tens of thousands

Page 53: Genetics and Methodology of Twin and Family Studies

GWA of Height

Weedon et al. (in press) Nat Genet

Large numbers are needed to detect QTLs !!!

Collaboration is the name of the game !!!

1914 Cases (WTCCC T2D)4892 Cases (DGI)

6788 Cases (WTCCC HT) 8668 Cases (WTCCC CAD)

2228 Cases (EPIC) 13665 Cases

Significant result

Other loci?

Page 54: Genetics and Methodology of Twin and Family Studies

Usual paradigm of association studies

• Identify significant SNPs (not many)• Genotype in independent replication samples• Find causal variants to inform on biology

But only a small proportion of the known genetic variance is explained

What would we expect under a polygenic model of many, many variants each of small effect?

Page 55: Genetics and Methodology of Twin and Family Studies

A 5% risk increasing common variant

Relative risk 1.05

Population MAF 0.2

MAF in cases 0.2079

MAF in controls 0.1999

Power α=1×10-6

(N=10,000 cases, 10,000 controls)0.2%

Proportion ranked in top half(N=ISC)

72%

Page 56: Genetics and Methodology of Twin and Family Studies

Schizophrenia (ISC) Q-Q plot

Consistent with:

Stratification?

Genotyping bias?

Distribution of true polygenic effects?

Obs

erve

d -lo

g10(

p)

Expected -log10(p)

λ = 1.092

Page 57: Genetics and Methodology of Twin and Family Studies

Indexing polygenic variance with large sets of weakly associated alleles

Individuals’“polygenic scores”

Top 20% independent

SNPs

Discovery set

Target set

Score # of “nominal risk

alleles”

Do target cases have a higherallele load?

→ Independent SCZ studies (MGS, O’Donovan)→ Bipolar disorder (STEP-BD, WTCCC)→ Non-psychiatric disease (WTCCC)

→ ISCISC

Douglas Levinson, Pablo Gejman,Jianxin Shi and colleagues

Page 58: Genetics and Methodology of Twin and Family Studies

0

0.01

0.02

0.03P < 0.1

P < 0.2

P < 0.3

P < 0.4

P < 0.5

R2 P=2×10-28

0.008

5×10-11

7×10-9

1×10-12

0.71 0.050.30 0.65 0.23 0.06

Schizophrenia Bipolar disorder Non-psychiatric (WTCCC)

CAD CD HT RA T1D T2DMGSEuro.

MGSAf-Am O’Donovan STEP-BD WTCCC

A greater load of “nominal” schizophrenia alleles (from ISC)?

ISC X Test

Can predict bipolar from SzSNPs, but not other diseases

Predictive information on Risk from up to 50% of Top 20% SNPs in a GWAS !

Page 59: Genetics and Methodology of Twin and Family Studies

Vink et al, 2009 AJHG in press

Pathway (Ingenuity) analysis of GWAS for smoking

Page 60: Genetics and Methodology of Twin and Family Studies

Even for “simple” diseasesthe number of alleles is large

Ischaemic heart disease (LDR) >190 Breast cancer (BRAC1) >300 Colorectal cancer (MLN1) >140

Page 61: Genetics and Methodology of Twin and Family Studies

[Science 2004]

Complex disease: common or rare alleles?

Increasing evidence for Common Disease – Rare Variant

hypothesis (CDRV)

Page 62: Genetics and Methodology of Twin and Family Studies

62

Human 1M HapMap Coverage by Population

Human 1M CEU (mean 0.96 median 1.0)

Human 1M CHB+JPT (mean 0.95 median 1.0)

Human 1M YRI (mean 0.85 median 1.0)

GENOME COVERAGE ESTIMATED FROM 990,000 HAPMAP SNPs IN HUMAN 1M

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

>0 >0.1 >0.2 >0.3 >0.4 >0.5 >0.6 >0.7 >0.8 >0.9

MAX r2

CO

VER

AG

E O

F H

APM

AP

REL

EASE

21

~95%~94%

~74%

Page 63: Genetics and Methodology of Twin and Family Studies
Page 64: Genetics and Methodology of Twin and Family Studies

http://www.genemapping.org/

Page 65: Genetics and Methodology of Twin and Family Studies

We also run two journals (1)

• Editor: John Hewitt• Editorial assistant

Christina Hewitt• Publisher: Kluwer

/Plenum• Fully online• http://www.bga.org

Page 66: Genetics and Methodology of Twin and Family Studies

We also run two journals (2) Editor: Nick Martin Editorial assistant +

subscriptions: Lorin Grey

Publisher: Australian Academic Press

Fully online http://www.ists.qimr.e

du.au/journal.html