lecture 25: association genetics november 30, 2012

Page 1

Lecture 25: Association Genetics

November 30, 2012

Page 2

Announcements

Final exam on Monday, Dec. 10 at 11 am, in 3306 LSB

2010 exam and study sheets posted on website

Exam is mostly non-cumulative

Review session on Friday, Dec. 7

Extra credit lab next Wednesday: up to 10 points

Extra credit report due at final exam

Page 3

Last Time

Quantitative traits

Genetic basis

Heritability

Linking phenotype to genotype

QTL analysis introduction

Limitations of QTL

Page 4

Today

Association genetics

Effects of population structure

Transmission Disequilibrium Tests

Page 5

Quantitative Trait Locus Mapping

[Figure: cross between homozygous parents (ABC × abc); the F1 (ABC/abc) are intercrossed, producing progeny with recombinant chromosomes (e.g., ABc, aBc, Abc); progeny height is plotted against genotype at marker B (BB, Bb, bb). Modified from D. Neale]
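The idea behind this slide, that progeny differ in mean height by genotype at a marker linked to a QTL, can be sketched as a single-marker analysis. The minimal Python sketch below groups progeny by marker genotype and computes the class means and a one-way ANOVA F statistic. The function name `single_marker_anova` and the height values are hypothetical, and a real QTL scan would test many markers (or intervals) and convert F to a p-value.

```python
from statistics import mean


def single_marker_anova(genotypes, phenotypes):
    """One-way ANOVA of a phenotype on the genotype classes at one marker.

    Returns (class means, F statistic). Hypothetical helper for illustration.
    """
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)

    grand_mean = mean(phenotypes)
    n, k = len(phenotypes), len(groups)

    # Between-class sum of squares: how far class means sit from the grand mean
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Within-class sum of squares: scatter of individuals around their class mean
    ss_within = sum((y - mean(v)) ** 2 for v in groups.values() for y in v)

    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    class_means = {g: mean(v) for g, v in groups.items()}
    return class_means, f_stat


# Hypothetical progeny: genotype at marker B and height
genotypes = ["BB", "BB", "Bb", "Bb", "bb", "bb"]
heights = [10, 12, 8, 10, 6, 8]
means, f_stat = single_marker_anova(genotypes, heights)
print(means, f_stat)
```

A large F (relative to an F distribution with k − 1 and n − k degrees of freedom) suggests that the marker is linked to a QTL.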

Page 6

QTL for aggressive behavior in mice

X chromosome

Monoamine Oxidase A (MAOA)

Brodkin et al. 2002

http://people.bu.edu/jcherry/webpage/pheromone.htm


Page 7

Monoamine Oxidase A (MAOA)

Selectively degrades serotonin, norepinephrine, and dopamine

Located near QTL for aggressive behavior on the X chromosome

Levels of expression affected by a VNTR (minisatellite) locus in the promoter region

Sabol et al. 1998

Page 8

MAOA and childhood maltreatment

Caspi et al. 2002

Genotype-by-Environment interaction

Page 9

QTL Limitations

Biased toward detection of large-effect loci

Need very large pedigrees to do this properly

Limited genetic base: QTL may only apply to the two individuals in the cross!

Genotype x Environment interactions rampant: some QTL only appear in certain environments

Huge regions of the genome underlie each QTL, usually containing hundreds of genes

How to distinguish among candidates?

Page 10

Linkage Disequilibrium and Quantitative Trait Mapping

Linkage and quantitative trait locus (QTL) analysis

Need a pedigree and a moderate number of molecular markers

Each marker represents a very large chromosomal region

Association Studies with Natural Populations

No pedigree required

Need large numbers of genetic markers

Small chromosomal segments can be localized

Many more markers are required than in traditional QTL analysis

Cardon and Bell 2001, Nat. Rev. Genet. 2: 91-99

Page 11

Association Mapping

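[Figure: ancestral chromosomes carrying a causal mutation (*) on a T-G haplotype undergo recombination through evolutionary history, giving present-day chromosomes in a natural population (*TG, *TA, CG, CA); height is plotted by genotype at the T/C site (CC, CT, TT)]

Slide courtesy of Dave Neale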
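Association mapping relies on linkage disequilibrium (LD) between a marker and the causal site surviving many generations of recombination. A common way to quantify pairwise LD is D = p_AB − p_A·p_B and r² = D² / [p_A(1 − p_A)·p_B(1 − p_B)]. The sketch below computes both from a list of two-locus haplotypes; the haplotype sample is hypothetical (loosely modeled on the T/C and G/A sites on this slide), and the function name is only an illustration.

```python
def pairwise_ld(haplotypes, allele1, allele2):
    """Return (D, r2) between two biallelic loci.

    haplotypes: two-character strings; first character = allele at locus 1,
    second character = allele at locus 2 (e.g. "TG").
    allele1, allele2: the allele counted at locus 1 and locus 2.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == allele1 for h in haplotypes) / n
    p_b = sum(h[1] == allele2 for h in haplotypes) / n
    p_ab = sum(h[0] == allele1 and h[1] == allele2 for h in haplotypes) / n

    d = p_ab - p_a * p_b  # departure from random association of alleles
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0  # undefined if a locus is monomorphic; 0 here
    return d, r2


# Hypothetical sample of present-day haplotypes at a T/C site and a G/A site
sample = ["TG", "TG", "TA", "CG", "CA", "CA"]
d, r2 = pairwise_ld(sample, "T", "G")
print(f"D = {d:.4f}, r^2 = {r2:.4f}")
```

Because recombination erodes D each generation, only markers very close to the causal site keep a high r² in an old natural population, which is why association mapping needs dense markers.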

Page 12

Candidate Gene Associations vs. Whole Genome Scans

If LD is high and haplotype blocks are conserved, the entire genome can be efficiently scanned for associations with phenotypes

Simplest for case-control studies (e.g., disease, gender)

If LD is low, candidate genes are usually identified a priori, and a limited number are scanned for associations

Biased by existing knowledge

Use "candidate regions" identified in high-LD populations, then assess candidate genes within them in low-LD populations

[Figure: linkage map of one chromosome with marker positions, highlighting a QTL candidate region that is then used for candidate gene identification]

Page 13

Human HapMap Project and Whole Genome Scans

LD structure of human Chromosome 19 (www.hapmap.org)

One common SNP genotyped every 5 kb in 269 individuals; 9.2 million SNPs in total

Take advantage of haplotype blocks to efficiently scan genome

NATURE|Vol 437|27 October 2005
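Scanning with haplotype blocks works because one genotyped "tag" SNP can stand in for other SNPs in strong LD with it. A minimal sketch of greedy tag-SNP selection is below, assuming a table of pairwise r² values is already available and using r² ≥ 0.8 as the tagging threshold (a commonly used cutoff, but an assumption here, as are the SNP names and r² values). This is one simple tagging strategy, not necessarily the procedure the HapMap Project used.

```python
def r2_between(r2, s, t):
    """Look up r^2 for an unordered SNP pair; a SNP tags itself (r^2 = 1)."""
    if s == t:
        return 1.0
    return r2.get((s, t), r2.get((t, s), 0.0))


def greedy_tag_snps(snps, r2, threshold=0.8):
    """Greedily pick SNPs until every SNP has r^2 >= threshold with some chosen tag."""
    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP covering the most still-untagged SNPs (ties: first in list order)
        best = max(snps, key=lambda s: sum(r2_between(r2, s, t) >= threshold for t in untagged))
        tags.append(best)
        untagged -= {t for t in untagged if r2_between(r2, best, t) >= threshold}
    return tags


# Hypothetical pairwise r^2 values; unlisted pairs are treated as r^2 = 0
snps = ["s1", "s2", "s3", "s4", "s5"]
r2 = {("s1", "s2"): 0.90, ("s2", "s3"): 0.85, ("s1", "s3"): 0.70, ("s4", "s5"): 0.95}
tags = greedy_tag_snps(snps, r2)
print(tags)
```

Here two tags cover all five SNPs: one SNP per block of strong LD, which is how a genome-wide panel can be much smaller than the full SNP catalog.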

Page 14

Next-Generation Sequencing and Whole Genome Scans

The $1000 genome is on the horizon

Current cost with Illumina HiSeq 2000 is about $2000 for 10X depth

The 1000 Genomes Project has sequenced thousands of human genomes at low depth

Can detect most polymorphisms with frequency >0.01

True whole-genome association studies are now possible at a very large scale

http://www.1000genomes.org/

Page 15

Identifying genetic mechanisms of simple vs. complex diseases

Simple (Mendelian) diseases: Caused by a single major gene

High heritability; often can be recognized in pedigrees

Examples: Huntington’s disease, achondroplasia, cystic fibrosis, sickle cell anemia

Tools: Linkage analysis, positional cloning

Over 2900 disease-causing genes have been identified thus far (Human Gene Mutation Database: www.hgmd.cf.ac.uk)

Complex (non-Mendelian) diseases: Caused by the interaction between environmental factors and multiple genes with minor effects

Interactions between genes; low heritability

Examples: Heart disease, Type II diabetes, cancer, asthma

Tools: Association mapping, SNPs!

Over 35,000 SNP associations have been identified thus far:

http://www.snpedia.com

Slide adapted from Kermit Ritland

Page 16

Complicating factor: Trait Heterogeneity

The same phenotype can have multiple underlying genetic mechanisms

Slide adapted from Kermit Ritland

Page 17

Case-Control Example: Diabetes

Knowler et al. (1988) collected data on 4,920 individuals from the Pima and Papago Native American populations in the southwestern United States

High rate of Type II diabetes in these populations

Found significant associations with an immunoglobulin G marker (Gm)

Does this indicate underlying mechanisms of disease?

Knowler et al. (1988) Am. J. Hum. Genet. 43: 520

Page 18

Case-control test for association (case = diabetic, control = not diabetic)

Question: Is the Gm haplotype associated with risk of Type 2 diabetes?

                  Type 2 diabetes
Gm haplotype    present   absent   Total
present               8       29      37
absent               92       71     163
Total               100      100     200

(1) Test for an association, with a = 8, b = 29, c = 92, d = 71 and N = 200:

$\chi^2_1 = \frac{(ad - bc)^2 N}{(a+c)(b+d)(a+b)(c+d)} = \frac{[(8 \times 71) - (29 \times 92)]^2 (200)}{(100)(100)(37)(163)} = 14.62$

(2) The chi-square is significant (14.62 > 3.84, the 5% critical value with 1 degree of freedom). Therefore, presence of the Gm haplotype seems to confer a reduced occurrence of diabetes.

Slide adapted from Kermit Ritland
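The calculation on this slide can be checked with a few lines of Python. The sketch below implements the same 2×2 formula with the same layout (a = Gm present and diabetic, b = Gm present and not diabetic, c = Gm absent and diabetic, d = Gm absent and not diabetic). It converts the statistic to a p-value using the fact that a 1-df chi-square variable is the square of a standard normal, so P(χ² > x) = erfc(√(x/2)). Function names are illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df) for a 2x2 table laid out as
    rows = marker present/absent, columns = case/control."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + c) * (b + d) * (a + b) * (c + d))


def chi_square_1df_pvalue(x):
    """Upper-tail probability of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


# Gm haplotype x Type 2 diabetes counts from the table above
stat = chi_square_2x2(8, 29, 92, 71)
p = chi_square_1df_pvalue(stat)
print(f"chi-square = {stat:.2f}, p = {p:.2g}")
```

With the Gm counts this prints a chi-square of 14.62 and a p-value well below 0.001. The test says nothing about *why* the two variables are associated, which is the point of the next slide.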

Page 19

Case-control test for association (continued)

Question: Is the Gm haplotype actually associated with risk of Type 2 diabetes?

The real story: Stratify by American Indian heritage (index of Indian heritage: 0 = little or no Indian heritage; 8 = complete Indian heritage)

Index of Indian heritage   Gm haplotype   Percent with diabetes
0                          Present        17.8
0                          Absent         19.9
4                          Present        28.3
4                          Absent         28.8
8                          Present        35.9
8                          Absent         39.3

Within each heritage class, diabetes prevalence is nearly the same with or without the Gm haplotype, whereas prevalence rises steeply with Indian heritage.

Conclusion: The Gm haplotype is NOT a risk factor for Type 2 diabetes, but is a marker of American Indian heritage

Slide adapted from Kermit Ritland

Page 20

Population structure and spurious association

Assume populations are historically isolated

One population has a higher disease frequency by chance

Unlinked loci are also differentiated between the populations

Unlinked loci therefore show a disease association when the populations are lumped together

[Figure: a population with low disease frequency and a population with high disease frequency, separated by a gene flow barrier; symbols mark alleles at a neutral locus and alleles causing susceptibility to disease]
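The scenario on this slide can be reproduced with hypothetical counts. In the sketch below, two isolated populations differ in disease frequency (5% vs. 30%) and, by chance, in carrier frequency at an unlinked neutral marker (80% vs. 20%). Within each population the marker is independent of disease, so its 2×2 chi-square is exactly 0, yet the lumped sample shows a large chi-square. Counts use the same a, b, c, d layout as the Gm table (marker present/absent by case/control) and are invented for illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df); a = present/case, b = present/control,
    c = absent/case, d = absent/control."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + c) * (b + d) * (a + b) * (c + d))


# Hypothetical population 1: 1000 people, 50 cases, 80% carry the marker in cases and controls
pop_low = (40, 760, 10, 190)
# Hypothetical population 2: 1000 people, 300 cases, 20% carry the marker in cases and controls
pop_high = (60, 140, 240, 560)
# Lumping the populations together ignores structure
pooled = tuple(x + y for x, y in zip(pop_low, pop_high))

within_low = chi_square_2x2(*pop_low)
within_high = chi_square_2x2(*pop_high)
pooled_stat = chi_square_2x2(*pooled)
print(within_low, within_high, round(pooled_stat, 1))
```

As with the Gm haplotype, the pooled table makes the marker look "protective" only because carriers are concentrated in the low-disease population.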

Page 21

Association Study Limitations

Population structure: systematic ancestry differences between cases and controls

Genetic heterogeneity underlying trait

Random error/false positives

Inadequate genome coverage

Poorly-estimated linkage disequilibrium

Page 22

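Transmission Disequilibrium Test (TDT) (Spielman et al. 1993)

Compare diseased offspring genotypes to parental genotypes to test whether transmission at a locus violates Mendelian expectations

Controls for population structure

[Figure: example families showing parental and affected-offspring genotypes (Mm, mm)]

Counting transmissions from heterozygous (Mm) parents to affected offspring:

a = # times M transmitted

b = # times M not transmitted

$\chi^2_{TDT} = \frac{(a-b)^2}{a+b}$

Approximately distributed as $\chi^2$ with 1 degree of freedom

Slide adapted from Kermit Ritland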
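A minimal sketch of the TDT counting step, assuming each family is a trio (father, mother, affected child) with genotypes coded as the number of M alleles (0 = mm, 1 = Mm, 2 = MM). Homozygous parents can transmit only one allele type, so subtracting their contribution from the child's M count gives how many M alleles came from the heterozygous parents. The trio data and function names are hypothetical.

```python
def tdt_counts(trios):
    """Count transmissions of allele M from heterozygous (Mm) parents.

    trios: iterable of (father, mother, child) genotypes coded as the number
    of M alleles (0, 1, 2); every child is assumed to be affected.
    Returns (a, b): a = times M transmitted, b = times M not transmitted.
    """
    a = b = 0
    for father, mother, child in trios:
        parents = (father, mother)
        n_het = sum(1 for g in parents if g == 1)
        # MM parents always pass on one M, mm parents never do
        m_from_homozygotes = sum(g // 2 for g in parents if g != 1)
        m_from_hets = child - m_from_homozygotes
        if not 0 <= m_from_hets <= n_het:
            raise ValueError(f"Mendelian inconsistency in trio {(father, mother, child)}")
        a += m_from_hets
        b += n_het - m_from_hets
    return a, b


def tdt_statistic(a, b):
    """TDT chi-square (a - b)^2 / (a + b); approximately chi-square with 1 df."""
    if a + b == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (a - b) ** 2 / (a + b)


# Hypothetical trios: (father, mother, affected child)
trios = [(1, 0, 1), (1, 0, 1), (1, 1, 2), (1, 1, 1), (1, 0, 0), (2, 1, 2), (1, 2, 2)]
a, b = tdt_counts(trios)
print(a, b, tdt_statistic(a, b))
```

When both parents are Mm and the child is Mm, it is unknown which parent passed on M, but the trio still contributes exactly one transmission and one non-transmission, so the counts are unaffected.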

Page 23

Transmission Disequilibrium Test (TDT)

Compared with “standard” association tests:

Still requires tight LD, so many markers are needed

Is not affected by population stratification

Detects a signal only if there is both linkage and association; does not depend on the mode of inheritance

Uses only affected progeny (and parental genotypes), so the method is efficient

Page 24

Association Tests and Population Structure

Transmission disequilibrium tests have limited power and range of application:

sample size limitations

restricted allelic diversity

“Genomic Control” uses random markers throughout the genome to control for false associations

“Mixed Model” approach allows known relatedness and population structure to be incorporated simultaneously

Cardon and Bell 2001 Nature Reviews Genetics 2:91
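Genomic control can be sketched in a few lines. In the version assumed here (following the usual formulation), 1-df chi-square statistics at random, presumably unassociated markers estimate an inflation factor λ = median(statistics) / 0.4549, the median of a chi-square with 1 df. Candidate-marker statistics are then divided by λ. Capping λ at 1 is a common convention rather than something stated on the slide; the statistics and function names are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(null_marker_stats):
    """Inflation factor from 1-df chi-square statistics at random markers.

    Values below 1 are set to 1 so statistics are never inflated (a common convention).
    """
    return max(1.0, median(null_marker_stats) / CHI2_1DF_MEDIAN)


def genomic_control_adjust(test_stats, lam):
    """Deflate candidate-marker chi-square statistics by the inflation factor."""
    return [s / lam for s in test_stats]


# Hypothetical statistics at random markers, inflated by population structure
null_stats = [0.05, 0.30, 0.55, 1.20, 3.10]
lam = genomic_control_lambda(null_stats)
adjusted = genomic_control_adjust([14.62, 4.0], lam)
print(round(lam, 3), [round(s, 2) for s in adjusted])
```

The correction is uniform across the genome: every statistic is shrunk by the same factor, which is simple but can be conservative for markers that are truly associated.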

Page 25

ANOVA/Regression Model

$f(y_i) = \beta x_i + \varepsilon_i$

where $f$ is a (monotonic) transformation, $y_i$ is the phenotype (response variable) of individual $i$, $\beta$ is the effect size (regression coefficient), $x_i$ is the coded genotype (feature) of individual $i$, and $\varepsilon_i$ is the error (residual); the p-value tests $\beta = 0$

Goal: Find the effect size that best explains all (potentially transformed) phenotypes as a linear function of the genotypes, and estimate the p-value, i.e., the probability of an effect at least this large if the null hypothesis of no effect ($\beta = 0$) were true

http://www2.unil.ch/cbg/index.php?title=Genome_Wide_Association_Studies
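The regression model can be sketched for a single SNP with an additive genotype coding (0, 1 or 2 copies of an allele; this coding is an assumption, and other codings are possible). The function below fits an intercept in addition to the effect size β, returns β, its standard error and the t statistic, and approximates the two-sided p-value with a normal distribution instead of the t distribution, which is only reasonable for large samples. Any transformation f would be applied to the phenotypes before the call. The data and function name are hypothetical.

```python
import math


def snp_regression(genotypes, phenotypes):
    """Least-squares fit of phenotype on coded genotype (0, 1, 2).

    Fits an intercept plus slope beta. Returns (beta, standard error,
    t statistic, approximate two-sided p-value from a normal approximation).
    """
    n = len(genotypes)
    x_bar = sum(genotypes) / n
    y_bar = sum(phenotypes) / n
    sxx = sum((x - x_bar) ** 2 for x in genotypes)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(genotypes, phenotypes))

    beta = sxy / sxx
    intercept = y_bar - beta * x_bar
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)  # residual variance uses n - 2 degrees of freedom
    t = beta / se
    p_approx = math.erfc(abs(t) / math.sqrt(2))  # 2 * P(Z > |t|) for standard normal Z
    return beta, se, t, p_approx


# Hypothetical coded genotypes and (already transformed) phenotypes
x = [0, 0, 1, 1, 2, 2]
y = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
beta, se, t, p = snp_regression(x, y)
print(f"beta = {beta:.3f}, SE = {se:.4f}, t = {t:.1f}, p ~ {p:.1e}")
```

In a genome-wide scan this fit is repeated for every SNP, and the resulting p-values must then be corrected for the very large number of tests.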

Page 26

Mixed Model

The phenotype (response variable) of individual i is modeled with terms for:

effect of the target SNP

family effect (kinship coefficient)

population effect (e.g., admixture coefficient from Structure, or values of principal components)

effects of background SNPs

Implemented in the Tassel program (Wednesday in lab)
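The full mixed model, with a kinship-based random effect, is beyond a short sketch, but the role of the population-effect term can be shown with ordinary least squares. The sketch below adjusts a SNP effect for one population covariate (for example an admixture coefficient or a principal-component score) by regressing both phenotype and genotype on that covariate and then regressing residual on residual (the Frisch-Waugh-Lovell approach). It omits the family/kinship and background-SNP terms, is not how TASSEL fits the model, and all data and function names are hypothetical.

```python
def residuals(values, covariate):
    """Residuals from a simple least-squares regression of values on one covariate."""
    n = len(values)
    v_bar = sum(values) / n
    c_bar = sum(covariate) / n
    scc = sum((c - c_bar) ** 2 for c in covariate)
    b = sum((c - c_bar) * (v - v_bar) for c, v in zip(covariate, values)) / scc
    return [v - v_bar - b * (c - c_bar) for v, c in zip(values, covariate)]


def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx


def snp_effect_adjusted(genotypes, phenotypes, population):
    """SNP effect after adjusting for a population covariate:
    regress residualized phenotype on residualized genotype."""
    return ols_slope(residuals(genotypes, population), residuals(phenotypes, population))


# Hypothetical data: the phenotype depends only on population, but the
# allele is more common in population 1, so the naive SNP effect is spurious
population = [0, 0, 0, 1, 1, 1]
genotypes = [0, 0, 1, 1, 2, 2]
phenotypes = [1, 1, 1, 3, 3, 3]
naive = ols_slope(genotypes, phenotypes)
adjusted = snp_effect_adjusted(genotypes, phenotypes, population)
print(naive, adjusted)
```

The naive slope is 1.0 while the adjusted slope is 0, mirroring how the population-effect term removes structure-driven associations; the kinship term plays the analogous role for finer-scale relatedness.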

Page 27

Commercial Services for Human Genome-Wide SNP Characterization

NATURE|Vol 437|27 October 2005

Assay 1.2 million “tag SNPs” scattered across the genome using Illumina BeadArray technology

Ancestry analyses and disease/behavioral susceptibility