Statistical Genetics · galton.uchicago.edu/events/retreat/slides/Stephens.pdf · Retreat Talk 2012

Statistical Genetics Matthew Stephens Statistics Retreat, October 26th 2012 Matthew Stephens Retreat Talk 2012

Statistical Genetics

Matthew Stephens

Statistics Retreat, October 26th 2012

Matthew Stephens

Retreat Talk 2012

Two stories

- The two most influential statistical ideas in analysis of genetic association studies. [1]

- Sequence, sequence, everywhere.

[1] With apologies to Steve Stigler

Story I: Genetic Association Studies

Genetic association studies aim to identify genetic variants that modify risk of common diseases or affect other phenotypes (e.g. Type I Diabetes, height, LDL cholesterol).

The idea is absurdly simple: measure genetic variants (usually SNPs) and phenotypes in randomly-sampled individuals, and see which SNPs are correlated with phenotypes.

Story I: Genetic Association Studies

- Typical recent genome-wide studies have typed 500K-1M SNPs in thousands of (unrelated) phenotyped individuals.

- Basic Analysis: test each SNP, one-by-one, for statistical association with each phenotype.
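A minimal sketch of the one-SNP-at-a-time analysis, assuming a quantitative phenotype, genotypes coded as 0/1/2 allele counts, and simple linear regression as the test. The function names and the toy data are made up for illustration; real studies use dedicated software and more careful models.

```python
import math

def snp_association(genotypes, phenotype):
    """Regress the phenotype on one SNP; return (slope, t_statistic).

    Assumes at least 3 individuals and genotypes coded 0/1/2.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotype) / n
    sgg = sum((g - mean_g) ** 2 for g in genotypes)
    if sgg == 0:
        return 0.0, 0.0  # monomorphic SNP: nothing to test
    sgy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotype))
    slope = sgy / sgg
    rss = sum((y - mean_y - slope * (g - mean_g)) ** 2
              for g, y in zip(genotypes, phenotype))
    if rss == 0:
        # Perfect fit: the t-statistic is unbounded (or 0 if there is no slope).
        return slope, (math.copysign(math.inf, slope) if slope else 0.0)
    se = math.sqrt(rss / (n - 2) / sgg)
    return slope, slope / se

def scan(genotype_matrix, phenotype):
    """Test every SNP (one row per SNP) against the same phenotype."""
    return [snp_association(row, phenotype) for row in genotype_matrix]

# Toy data (invented): 2 SNPs typed in 6 individuals.
genotype_matrix = [
    [0, 1, 2, 0, 1, 2],
    [1, 1, 0, 2, 0, 1],
]
phenotype = [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]
for index, (slope, t_stat) in enumerate(scan(genotype_matrix, phenotype)):
    print(f"SNP {index}: slope={slope:.3f}, t={t_stat:.2f}")
```

With 500K-1M SNPs the same function is simply applied to each row in turn, which is why the significance threshold has to be very stringent.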

Progress identifying variants underlying common disease

Published Genome-Wide Associations through 09/2011
1,617 published GWA at p ≤ 5×10^-8 for 249 traits

NHGRI GWA Catalog, www.genome.gov/GWAStudies

Credit: Darryl Leja and Teri Manolio

The two most influential statistical ideas in GWAS

- Correction for unmeasured confounding (population structure).

- Imputation to combine studies.

Population Structure and Unmeasured Confounding

The Problem in a nutshell: What would happen if you conducted a Genetic Association study for “Chopstick Use” in San Francisco?
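One way to see the answer, as a sketch with invented counts: take two subpopulations that differ both in the frequency of some allele and in chopstick use, with no association at all inside either group. Pooling them produces a strong spurious association. The tables and function names below are hypothetical.

```python
def odds_ratio(table):
    """Odds ratio for a 2x2 table ((a, b), (c, d)).

    Rows: allele carrier / non-carrier.
    Columns: uses chopsticks / does not.
    """
    (a, b), (c, d) = table
    return (a * d) / (b * c)

def pool(*tables):
    """Add 2x2 tables cell by cell, ignoring which group each person came from."""
    return tuple(
        tuple(sum(t[i][j] for t in tables) for j in range(2))
        for i in range(2)
    )

# Invented counts, 100 people per group.
# Group A: 80% carriers, 90% chopstick users, independent of each other.
pop_a = ((72, 8), (18, 2))
# Group B: 20% carriers, 10% chopstick users, independent of each other.
pop_b = ((2, 18), (8, 72))

print("within A:", odds_ratio(pop_a))
print("within B:", odds_ratio(pop_b))
print("pooled  :", round(odds_ratio(pool(pop_a, pop_b)), 2))
```

Within each group the odds ratio is 1 (no association), yet the pooled table gives an odds ratio above 8: the allele merely marks group membership.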

Population Structure and Unmeasured Confounding

If you know the “genetic background” of the individuals in your study (e.g. which continent they inherited their genes from), then you can correct for it.
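When the background is known, one standard correction is to analyse each group separately and combine the results, for example with the Mantel-Haenszel common odds ratio. This is a sketch of that general technique, not necessarily the correction the talk had in mind; the counts are invented.

```python
def mantel_haenszel_odds_ratio(tables):
    """Common odds ratio across strata.

    Each stratum is a 2x2 table ((a, b), (c, d)) with rows
    carrier / non-carrier and columns trait present / absent.
    """
    numerator = 0.0
    denominator = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

# Invented counts: two groups with different allele and trait frequencies,
# and no association inside either group.
group_a = ((72, 8), (18, 2))
group_b = ((2, 18), (8, 72))

# Stratified by known background.
print(mantel_haenszel_odds_ratio([group_a, group_b]))
# Ignoring background (a single pooled stratum).
print(mantel_haenszel_odds_ratio([((74, 26), (26, 74))]))
```

The stratified estimate is 1, while the estimate that ignores the grouping is inflated.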

What if you don’t know it?

Principal Components Analysis to the rescue!

Novembre et al, Nature, 2008

Principal Components Analysis to the rescue!

Test for significance of genetic effect β, controlling for effects of genetic background (α):

y = vα + xβ + ε

Price et al, Nature Genetics, 2006
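A sketch of fitting this model for one SNP, assuming v is a single principal-component score per individual and adding an intercept (an assumption; the slide's formula omits it). It regresses v out of both y and x and then regresses the residuals on each other, which gives the same β as the joint least-squares fit. Function names and data are illustrative only.

```python
import math

def residualize(values, covariate):
    """Residuals of `values` after least-squares regression on an
    intercept and `covariate` (which must not be constant)."""
    n = len(values)
    mean_c = sum(covariate) / n
    mean_u = sum(values) / n
    scc = sum((c - mean_c) ** 2 for c in covariate)
    slope = sum((c - mean_c) * (u - mean_u)
                for c, u in zip(covariate, values)) / scc
    return [(u - mean_u) - slope * (c - mean_c)
            for c, u in zip(covariate, values)]

def adjusted_effect(y, x, v):
    """Estimate beta in y = intercept + v*alpha + x*beta + eps.

    Returns (beta, t_statistic). Assumes more than 3 individuals,
    x not fully explained by v, and a fit that is not exact.
    """
    ry = residualize(y, v)
    rx = residualize(x, v)
    sxx = sum(r * r for r in rx)
    beta = sum(a * b for a, b in zip(rx, ry)) / sxx
    rss = sum((a - beta * b) ** 2 for a, b in zip(ry, rx))
    dof = len(y) - 3  # intercept, alpha and beta were estimated
    se = math.sqrt(rss / dof / sxx)
    return beta, beta / se

# Invented example: the phenotype depends on background v only,
# but the genotype x is correlated with v.
v = [0, 0, 0, 1, 1, 1]
x = [0, 1, 0, 1, 2, 1]
y = [0.1, 0.0, -0.1, 2.9, 3.0, 3.1]
beta, t_stat = adjusted_effect(y, x, v)
print(f"adjusted beta={beta:.3f}, t={t_stat:.2f}")
```

In this toy data a regression of y on x alone would show a clear positive slope; once v is included, the estimated β is essentially zero. With several principal components the same idea applies with a multiple regression on all of them.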

The two most influential statistical ideas in GWAS

- Correction for unmeasured confounding (population structure).

- Imputation to combine studies.

Credit: Bryan Howie

Genotype imputation background

[Figure: reference haplotypes (0/1 alleles) above phenotyped GWAS samples (genotypes 0/1/2, with "?" for missing calls), at SNPs genotyped on an array.]

Genotype imputation background

[Figure: reference haplotypes above phenotyped GWAS samples, now including untyped SNPs, which appear as "?" in the GWAS samples.]

Genotype imputation background

[Figure: reference haplotypes above phenotyped GWAS samples, with the "?" entries filled in by imputed genotypes; an association signal is marked.]
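A deliberately crude sketch of the idea in these figures: fill in the untyped SNPs of a study haplotype by copying from the reference haplotype that matches best at the typed SNPs. This nearest-match rule is a stand-in chosen for brevity; practical imputation methods use probabilistic models that work on unphased genotypes and allow switching between reference haplotypes along the chromosome. All names and data below are made up.

```python
def impute_haplotype(observed, reference_panel):
    """Fill the None entries of `observed` from the closest reference haplotype.

    observed: list of 0/1 alleles, with None at untyped SNPs.
    reference_panel: list of fully typed 0/1 haplotypes of the same length.
    Ties are broken in favour of the earliest reference haplotype.
    """
    def mismatches(ref):
        return sum(1 for o, r in zip(observed, ref)
                   if o is not None and o != r)

    best = min(reference_panel, key=mismatches)
    return [r if o is None else o for o, r in zip(observed, best)]

# Invented reference panel: 3 haplotypes typed at 4 SNPs.
panel = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
# Study haplotype typed only at SNPs 0 and 2.
print(impute_haplotype([0, None, 1, None], panel))
```

Once every study sample has a value at every reference SNP, the untyped SNPs can be tested for association like any typed SNP.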

Imputation facilitates meta-analysis

[Figure: reference haplotypes above genotype panels for two studies, GWAS 1 and GWAS 2.]

Imputation facilitates meta-analysis

[Figure: reference haplotypes above the GWAS 1 and GWAS 2 panels after imputation; an association signal is marked.]

Imputation facilitates meta-analysis

[Figure: reference haplotypes above the imputed GWAS 1 and GWAS 2 panels.]

Type 2 diabetes: Zeggini et al., May 2008 (Nature Genetics)

Crohn’s disease: Barrett et al., Aug 2008 (Nature Genetics)

Type 1 diabetes: Cooper et al., Nov 2008 (Nature Genetics)
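Once both studies have estimates at the same SNP, combining them is straightforward. The sketch below uses inverse-variance fixed-effect weighting, a common choice assumed here for illustration; it is not claimed to be the method of the studies listed above, and the numbers are invented.

```python
import math

def fixed_effect_meta(estimates):
    """Combine per-study effect estimates for one SNP.

    estimates: list of (beta, standard_error), one pair per study.
    Returns (combined_beta, combined_se, z_score).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se

# Invented estimates for the same SNP from GWAS 1 and GWAS 2.
beta, se, z = fixed_effect_meta([(0.2, 0.1), (0.4, 0.1)])
print(f"beta={beta:.3f}, se={se:.4f}, z={z:.2f}")
```

Neither study alone is very convincing in this toy example (z of 2 and 4), but the combined standard error is smaller than either study's own.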

Story II: Sequence, Sequence, Everywhere

Sequencing Assays, and Statistical Challenges

Although DNA sequencing is best known for obtaining “genome sequences”, it is now routinely used for measuring cellular processes to try to understand how cells operate. For example:

- Gene expression (RNA-seq).

- Chromatin openness (DNase-seq).

- Transcription Factor Binding (ChIP-seq).

- Histone modifications (ChIP-seq).

A key question is how/why cells differ from one another (they share the same DNA!).

Chromatin and DNA structure

Figure from Felsenfeld and Groudine. Nature, 2003

The Data

The basic structure of these assays is the same:

- Do something clever to get bits of the DNA that you want (e.g. the bits that contact a modified histone, or the bits that are bound by a particular transcription factor).

- Sequence these bits (producing millions of little sequences).

- Work out where in the genome each sequence came from.

- The number of sequences coming from each location (usually 0 or 1) is a measure of the “intensity” of the process at that location.

- Basic model: an inhomogeneous Poisson process, x_ib ∼ Poi(λ_ib).
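The last two steps can be sketched as follows, assuming reads have already been mapped and are summarised by their start positions. Reading x_ib as the count for sample i at base b is an interpretation (the slide does not define the indices); the function names and data are illustrative.

```python
import math
from collections import Counter

def count_reads(start_positions, region_start, region_end):
    """Number of mapped reads starting at each base in
    [region_start, region_end)."""
    tally = Counter(start_positions)
    return [tally.get(b, 0) for b in range(region_start, region_end)]

def poisson_loglik(counts, rates):
    """Log-likelihood of independent counts x_b ~ Poisson(lambda_b).

    All rates must be strictly positive.
    """
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1)
               for x, lam in zip(counts, rates))

# Invented read start positions for one sample in a tiny region.
counts = count_reads([100, 102, 102, 105], 100, 104)
print(counts)

# Compare two candidate intensity profiles for the same counts.
flat = [0.75] * 4
peaked = [0.8, 0.2, 1.8, 0.2]
print(poisson_loglik(counts, flat), poisson_loglik(counts, peaked))
```

Because most positions have a count of 0 or 1, individual bases carry little information; estimating λ well requires borrowing strength across neighbouring positions.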

Example: Histone Modification H3K4me1

Can you spot the difference?

[Figure: two tracks over genomic positions 32230000 to 32290000, vertical axis from 0.00 to 0.08. Panels: "Left Ventricle, H3K4me1" and "Right Ventricle, H3K4me1".]

Data from Scott Smemo, Nobrega lab

Advertisement: STAT 45800

We have preliminary ideas and methods for dealing with these data, based on wavelets for count data (work with H. Shim).
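For orientation only, here is a sketch of one standard multiscale view of count data, in the spirit of a Haar wavelet; it is not a description of the methods referred to above. Counts are summed in pairs up a binary tree, and each split is summarised by the share of the parent's count that falls in the left child. Under a Poisson model, the left child's count given the parent total is binomial, so each split can be analysed separately. Names and data are made up.

```python
def multiscale_sums(counts):
    """Levels of pairwise sums, from the raw counts up to the grand total.

    The number of counts must be a power of two.
    """
    n = len(counts)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    levels = [list(counts)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels

def left_fractions(levels):
    """For every parent node, the fraction of its count in the left child
    (None when the parent count is zero)."""
    fractions = []
    for child, parent in zip(levels, levels[1:]):
        fractions.append([child[2 * j] / p if p else None
                          for j, p in enumerate(parent)])
    return fractions

levels = multiscale_sums([1, 0, 2, 1])
print(levels)
print(left_fractions(levels))
```

Comparing two samples then amounts to asking at which scales and locations these left-child fractions differ.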

In STAT 45800 we will try “crowd-sourcing” these ideas, to see how much further progress we can make.

Aim: to combine expertise in Bioinformatics, Computing, Biology and Statistics, to make more progress together than any of us could do alone!

Acknowledgements

- Bryan Howie, Heejung Shim.

- Funding: NHGRI, NIH GTEX project, and NIH ENDGAME consortium.
