illumina’s gwas roadmap

© 2010 Illumina, Inc. All rights reserved. Illumina, illuminaDx, Solexa, Making Sense Out of Life, Oligator, Sentrix, GoldenGate, GoldenGate Indexing, DASL, BeadArray, Array of Arrays, Infinium, BeadXpress, VeraCode, IntelliHyb, iSelect, CSPro, and GenomeStudio are registered trademarks or trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.

Illumina’s GWAS Roadmap: next-generation genotyping studies in the post-1KGP era

Daniel Peiffer, PhD
Sr. Product Manager, Genotyping Applications

Overview

First-gen GWAS vs. Next-gen GWAS

Next-gen Sequencing and the 1kGP Revolution

Illumina’s GWAS Roadmap

Presentation Notes

This talk begins with a brief overview of some of the critical milestones of the past decade that have brought us, as a community of human genetics researchers, to where we are today. Next, we’ll review some of the successes of GWAS (genome-wide association studies) in the latter half of the decade, and we will also begin a conversation about what is missing, what is needed next, and where the new hypotheses that will lead the community forward are coming from. This leads into a discussion of the tools Illumina is developing to enable researchers to explore these new hypotheses more fully throughout 2010. Lastly, we’ll have a short discussion of sequencing and arrays: two complementary technologies that each have their strengths and each have their own place in a researcher’s toolbox, or arsenal, for going after the variants that contribute to their disease or trait of interest.

First-gen GWAS vs. Next-gen GWAS

The GWAS approach is successful in human genetics

Year    # of publications
2005    2
2006    8
2007    89
2008    151
2009    222

First publications in 2005

Almost 600 total publications since

Over 3500 associations published

Wide range of phenotypes and diseases

Published Genome-Wide Associations through 9/2009: 536 published GWA at p < 5 x 10^-8

NHGRI GWA Catalog: www.genome.gov/GWAStudies

Presentation Notes

GWAS has been successful for many different phenotypes and has identified variants across the entire genome. This slide was downloaded from the NHGRI GWAS Catalog, available at the website listed here, and clearly shows the diversity of diseases and traits that have been studied using a GWAS approach and the many significant findings discovered as a result of these experiments. This is truly remarkable: many of these disorders have been documented for centuries, even millennia, but for many of them it is only through the GWAS efforts of the past few years that an understanding of the contributing genetic variants has begun to emerge.

The Case of the Missing Heritability

For most common diseases, the sum of individual effects found so far is much less than the total estimated heritability

[Figure: bar chart of heritability, split into Explained and Missing (0% to 100%), for diseases/traits ordered from rare to common. Adapted from Manolio et al 2009]

Presentation Notes

Many genetic studies have successfully detected both common and rare genetic variants for both single-gene and complex traits. However, much more is needed to understand the genetics behind many of these disorders, and this is highlighted by the observation of missing heritability for many common diseases and traits. In other words, for many diseases, the sum of the individual effects of the identified associated variants is often much less than the total estimated heritability of those diseases. To illustrate this point, this graph lists a number of different disorders along the x-axis and the percentage of explained heritability along the y-axis. For very rare Mendelian disorders such as Huntington’s or cystic fibrosis, on the left-hand side of the graph, the genes and variants that contribute were identified decades ago through traditional linkage analysis, and cumulatively they explain almost all of the observed heritability. However, even though many variants have been identified that contribute to more common disorders such as AMD, Crohn’s, lupus, and T2D, cumulatively these variants explain a much smaller fraction of the observed heritability. The question remains how much more could be found if scientists had better tools. So, what are we, as a community, missing? Where is it? And what tools are needed to find it?

Tackling the Full Spectrum of Variants in Disease

[Figure: effect size (small to large) plotted against allele frequency (low to high), with the following regions]

Very Rare Variants, Large Effect Size: LINKAGE, SEQUENCING (NEW!)

Rare/Intermediate Variants, Intermediate Effect Size: Next-gen GWAS (NEW!)

Common Variants, Small Effect Size: GWAS, Next-gen GWAS (NEW!)

(Common Variants, Large Effect Size)

(Rare Variants, Minimal Effect Size)

Presentation Notes

One way to think about this is in the context of two variables, as shown in this plot: risk allele frequency along the x-axis and effect size along the y-axis. The sweet spot, if you will, for disease studies is marked by the blue band that runs almost diagonally from rare variants of large effect down and across to common variants of small effect. Outside that blue band, in the upper right, are common variants with large effect sizes; there aren’t many of those, or researchers would be finding them regularly with the tools that have been available for common-variant GWAS. In the lower left corner, rare variants of small effect are also of limited interest: they are almost impossibly difficult to identify and, even if they could be found, would have marginal importance for understanding disease at the population level. Along this band, the extremes have been well covered by available technologies. Rare variants that confer large effect sizes have been identified using linkage mapping techniques for decades now, with over 2000 hits for Mendelian disorders. Likewise, common variants that confer small to modest effect sizes have been well explored using available GWAS tools for the past five years. The great unexplored area of this curve, therefore, is the center section: variants of rare to intermediate allele frequency that confer intermediate effect sizes. It is through a second generation of GWAS tools, enabling “rich” GWAS, that researchers will be able to explore this class of variation for association with disease. As more is learned about the true spectrum of variation through projects such as the 1000 Genomes Project, the tools for exploring common variation will improve as well, so this segment of the curve will also benefit from new “rich” GWAS tools.
Lastly, as next-generation sequencing matures and becomes more available, it will lend itself nicely to exploring the left-hand extreme of the curve, the rare variants of large effect, taking over where traditional linkage mapping left off.
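
One way to make the two axes of this plot concrete is the fraction of trait variance that a single variant explains. The sketch below is only an illustration, assuming a quantitative trait standardized to unit variance, an additive model, and Hardy-Weinberg equilibrium, under which a variant with minor allele frequency p and per-allele effect beta explains 2p(1 - p)beta^2 of the variance. The function name and the example frequencies and effect sizes are hypothetical, chosen only to mark regions of the plot; they are not taken from the talk.

```python
def variance_explained(maf, beta):
    """Fraction of phenotypic variance explained by one biallelic variant.

    Assumes an additive model, Hardy-Weinberg equilibrium, and a phenotype
    standardized to unit variance, so beta is in standard-deviation units.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("maf must be between 0 and 0.5")
    return 2.0 * maf * (1.0 - maf) * beta ** 2


# Hypothetical (maf, beta) pairs, chosen only to illustrate regions of the plot.
examples = {
    "common, small effect": (0.30, 0.05),
    "intermediate frequency, intermediate effect": (0.02, 0.30),
    "rare, small effect": (0.005, 0.05),
}

for label, (maf, beta) in examples.items():
    print(f"{label}: {variance_explained(maf, beta):.5f}")
```

Under these assumptions, a rare variant of small effect contributes very little variance at the population level, which is consistent with the limited interest in the lower left corner of the plot.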

Enabling discoveries with the right technology

[Figure: effect size (small to large) plotted against MAF (rare to common), with regions labeled SEQ, SEQ or ARRAYS, Next-Gen Arrays, and 1st Gen Arrays]

Next-gen Sequencing and the 1kGP Revolution

a new era beyond the HapMap Project

[Figure: diagram with the labels Next-Gen Sequencing, High Density Custom Arrays, Targeted resequencing, Next-gen GWAS Arrays, and ARRAYS]

Presentation Notes

But, what’s next? Well, the future of human genetics is intimately linked to sequencing-based discovery efforts. As next-gen sequencing matures, the catalogue of variation that is available for creating microarrays will grow at an unprecedented rate. These new variants can be deployed on custom arrays or high density standard arrays that in turn will identify regions of the genome for further discovery efforts, funneling back into targeted resequencing efforts, for example. Suffice it to say that sequencing and arrays are evolving hand in hand to enable the next wave of discoveries.

The 1,000 Genomes Project

Sequence 2,500 genomes to complete the picture of genetic variation

Project Goals

1. Accelerate fine-mapping efforts in gene regions identified through genome-wide association studies or candidate gene studies

2. Improve the power of future genetic association studies by enabling design of next-generation genotyping microarrays that more fully represent human genetic variation

3. Enhance the analysis of ongoing and already completed association studies by improving our ability to “impute” or “predict” untyped genetic variants

Achieve a nearly complete catalog of common human genetic variants with frequency 1% or higher.

Presentation Notes

Indeed, massive next-generation resequencing projects such as the 1000 Genomes Project are delivering a wealth of new information about the true spectrum of variants present in populations, and all of this new content is available for designing the next generation of microarrays for GWAS. In fact, it is spelled out in the 1000 Genomes mission statement: to improve the power of future genetic association studies by enabling design of next-generation microarrays…

New Content for Next-gen GWAS Arrays

Rich content to explore new hypotheses and enable new discoveries

Project            Year       Approx. cumulative  Tag SNPs needed   Lower limit of     % variation tagged
                              SNPs found          for max coverage  allele frequency   (r2 > 0.8)
                                                                    targeted
HapMap             2003-2007  3M                  ~0.6M             5%                 >90%
1kG Pilot Project  2008-2009  13M                 ~2.5M             2.5%               ~80%
1kG Full Project   2010       35M*                ~5.0M             1%                 >90%

* Estimated

Sequence to discover SNPs >1% MAF (1000-Genomes project)

Leverage the power of LD to select tagSNPs and remove redundancy

Include progressively more SNPs at lower allele frequencies (5%, 2.5%, 1%)

Presentation Notes

So, what does this mean in practice? Step one is to sequence to discover SNPs, an effort that is being carried out by the 1000 Genomes Project, as we’ve already discussed. Next, the same concept of haplotype blocks and tagSNPs that was so effective for the first generation of GWAS arrays continues to apply to these new variants. The only difference is that a more complete picture of the variants and their LD blocks is now available for improved tagSNP selection. Furthermore, as the data from the 1000 Genomes Project are to be released in stages, this process of selecting tags can be applied iteratively down to lower and lower frequencies. This table begins with the HapMap and its ~3 million SNPs, which could be tagged by slightly over half a million well-chosen markers. Arrays designed from the HapMap had a lower limit of allele frequency of ~5%, and currently available arrays are able to tag about 90% of all variation in this category. Next, the first phase of the 1000 Genomes Project has identified upwards of 17 million variants so far, though, again applying an intelligent tagging approach, we estimate about 2.5 million will be needed to capture approximately 80% of all variants down to 2.5% MAF. Lastly, the final phase of the 1000 Genomes Project is anticipated to deliver ~35 million new variants. We estimate that ~90% of these down to 1% MAF can be captured using ~5 million variants.
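
The idea of using LD to select tagSNPs and remove redundancy can be sketched in a few lines. The example below is a simple greedy heuristic on a toy genotype matrix, assuming unphased 0/1/2 allele counts and the r2 > 0.8 threshold quoted in the table; it is not Illumina's selection procedure, and the function names and data are hypothetical.

```python
import numpy as np


def pairwise_r2(genotypes):
    """Squared Pearson correlation between SNP columns.

    genotypes: (samples x SNPs) matrix of 0/1/2 allele counts.
    """
    corr = np.atleast_2d(np.corrcoef(genotypes, rowvar=False))
    return np.nan_to_num(corr) ** 2


def greedy_tag_snps(r2, threshold=0.8):
    """Pick tags until every SNP has r2 >= threshold with at least one tag."""
    covers = r2 >= threshold
    np.fill_diagonal(covers, True)  # every SNP captures itself
    untagged = np.ones(r2.shape[0], dtype=bool)
    tags = []
    while untagged.any():
        # Choose the SNP that captures the most still-untagged SNPs.
        gains = (covers & untagged).sum(axis=1)
        best = int(gains.argmax())
        tags.append(best)
        untagged &= ~covers[best]
    return tags


def coverage(r2, tags, threshold=0.8):
    """Fraction of SNPs captured (r2 >= threshold) by at least one tag."""
    captured = (r2[:, tags] >= threshold).any(axis=1)
    captured[tags] = True
    return float(captured.mean())


# Toy data (hypothetical): SNP 0 and SNP 1 are in perfect LD, SNP 2 is not.
genotypes = np.array([
    [0, 0, 2],
    [1, 1, 0],
    [2, 2, 1],
    [0, 0, 1],
    [1, 1, 2],
    [2, 2, 0],
])
r2 = pairwise_r2(genotypes)
tags = greedy_tag_snps(r2)
print(tags, coverage(r2, tags))
```

In this toy case one of the two redundant SNPs is dropped, while the third SNP must be typed directly. The `coverage` helper corresponds to the "% variation tagged" column: the share of known SNPs captured by the chosen tags at the given threshold.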

HapMap Represents a Small Part of All Variation

SNPs by observations in 60 CEU Samples

[Figure: count of SNPs (x 10^6, 0.0 to 1.0) plotted against number of minor alleles observed (0 to 60), comparing 1000 Genomes with HapMap]

Presentation Notes

The need for more comprehensive chips can be seen just by comparing the amount of content available before the 1000 Genomes Project with what is now coming out of it. This plot shows the number of CEU SNPs in the HapMap database as a function of the number of times the minor allele was observed. Maximizing coverage of the common SNPs that predominantly populate the HapMap database has been the logical approach for years because, until recently, it was the most comprehensive database of SNP information across many samples. Looking at just the CEU data from the 1000 Genomes Project released through the end of 2009, we have now ascertained over three times as many putative SNPs as we knew about from the HapMap data. Additionally, we now have a much richer understanding of the full frequency spectrum of SNPs.

GWAS Roadmap Review

[Figure: array products plotted by data points per sample against content source. Content sources: HapMap Phase 1, HapMap Phase 2, HapMap Phase 3, 1,000 Genomes Project. Array products: HumanHap300, HumanHap500, Human660-Quad, Human1M-Duo, HumanOmni1-Quad & OmniExpress, Future GWAS Products. Data points per sample: 317K, 550K, 660K, 1M, 2.5M, 5M. Legend: Current, Projected]

Announcement at ASHG, Oct 2009

The Omni Family of Microarrays

Next-generation GWAS. NOW.

OmniExpress*: Highest-throughput array with industry-proven quality at an exceptional price.

Omni1-Quad: Optimal combination of common SNPs, CNVs, and content from 1kGP.

Omni1S-8: Takes researchers from Omni1/Express to 2.5M

Omni2.5-Quad: The most optimal and comprehensive set of both common and rare SNP content from the 1kGP

Omni2.5S: ~2.5M additional markers providing rare 1kGP content

Omni5: The ultimate GWAS tool providing near complete coverage of common and rare variation

MAF > 5%, MAF > 2.5%, MAF > 1%

Presentation Notes

So, on a single slide, here is the Omni Family of Microarrays mapped out. The family begins with the Omni1 and OmniExpress arrays on the left-hand side of this slide and proceeds through the Omni1S and Omni2.5, leading into the Omni2.5S and the Omni5. Because these products are being designed on successive releases of the 1000 Genomes Project data, we see a stepwise progression from a 5% MAF target down to a 1% MAF target with the Omni5 and Omni2.5S products. The Omni1 and OmniExpress are available now and were designed primarily from information available from the HapMap project, with a small fraction of new variants identified by the 1kGP incorporated. As the Roadmap proceeds, the fraction of new 1000 Genomes Project data incorporated into the chip designs will increase with each step. The whole idea is to provide a clear path for researchers to begin accessing these new, rarer variants from the 1kGP as quickly as possible throughout 2010, so that new discoveries can be made faster than if the community waited for the 5M as a standalone product. Furthermore, the Roadmap gives researchers the flexibility to jump into next-gen GWAS at whatever stage is most appropriate for their disease, trait, budget, and long- and short-term research goals.

Content optimized from next-gen re-sequencing efforts such as 1000 Genomes.

Pushing the boundary of GWAS content into unexplored territory

Cost effective path for researchers that want to ride the cutting edge today

Roadmap Paths

Path  Step 1       Step 2    Step 3    Total Markers
1     OmniExpress  Omni1S    Omni2.5S  ~4.4 Million
2     Omni1        Omni1S    Omni2.5S  ~5 Million
3     Omni2.5      Omni2.5S            ~5 Million
4     Omni5                            ~5 Million

2010 GWAS Roadmap

Multiple chips made easier with the Multi-use Workflow

[Figure: workflow diagram with the columns Roadmap Entry Point, Second Array, and Third Array; boxes: Omni1-Quad Multi-use, OmniExpress Multi-use, Omni2.5 Multi-use, Omni1S Multi-use, Omni2.5S Multi-use]

Omni2.5 Details

CEU Coverage Estimates: HapMap vs. 1kGP Reference Data

This is not just an array with “new” content! The Omni2.5 array is a complete game-changer!

[Figure: bar chart of % captured at r2 > 0.8 (0% to 100%) for Competitor “New Array”*, Competitor “Old 900K”, 660W, Omni1/OmniExpress, and Omni2.5; series: HapMap 5%, 1kGP 5%, 1kGP 2.5%]

*Base content only

Genomic Coverage Stats for African Populations

YRI Coverage Estimates: HapMap vs. 1kGP Pilot Data

[Figure: bar chart of % captured at r2 > 0.8 (0 to 1) for Competitor "New Array"*, Competitor "Old 900K", 660W, Omni1/OmniExpress, and Omni2.5; series: HapMap 5%, 1kGP 5%, 1kGP 2.5%]

*Base content only

Genomic Coverage Stats for Asian Populations

CHB/JPT Coverage Estimates: HapMap vs. 1kGP Pilot Data

[Figure: bar chart of % captured at r2 > 0.8 (0 to 1) for Competitor "New Array"*, Competitor "Old 900K", 660W, Omni1/OmniExpress, and Omni2.5; series: HapMap 5%, 1kGP 5%, 1kGP 2.5%]

*Base content only

Illumina GWAS Portfolio at a Glance

                                                        Omni2.5                  Omni1                    OEx                      CytoSNP12
Number of Markers per Sample                            2,450,000                1,140,419                733,202                  301,232
Number of Samples per BeadChip                          4                        4                        12                       12
Scan Times per Sample (minutes)                         15                       13                       5                        3
Spacing (Mean / Median / 90th percentile largest gap)   1.18 / 0.63 / 2.74       2.4 / 1.2 / 6.4          4.1 / 2.2 / 9.2          9.6 / 6.2 / 18.6
Markers Within 10 kb of a RefSeq Gene                   1,233,932                618,959                  381,329                  148,666
Non-Synonymous SNPs§                                    49,564                   32,110                   12,134                   3,480
MHC / ADME / Indel                                      11,149 / 27,895 / 0      19,081 / 22,429 / 459    7,566 / 16,680 / 0       761 / 2,382 / 0
Sex Chromosome (X / Y / PAR Loci)                       57,061 / 1,897 / 554     27,493 / 2,322 / 1,157   18,239 / 1,697 / 540     15,063 / 2,841 / 1,579
Mitochondrial SNPs                                      93                       27                       0                        0
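
The spacing figures above summarize the gaps between adjacent markers. As a rough sketch of how such a summary could be computed, the example below takes marker coordinates on a single chromosome and reports the mean, median, and 90th percentile gap. It assumes positions in base pairs and gaps reported in kb (the slide does not state units), and it treats "90th percentile largest gap" as the 90th percentile of the gap distribution; the function name and the toy positions are hypothetical.

```python
import statistics


def spacing_summary(positions_bp):
    """Mean, median, and 90th percentile gap (in kb) between adjacent markers.

    positions_bp: marker coordinates in base pairs on one chromosome.
    """
    ordered = sorted(positions_bp)
    gaps_kb = [(b - a) / 1000.0 for a, b in zip(ordered, ordered[1:])]
    if len(gaps_kb) < 2:
        raise ValueError("need at least three markers")
    # quantiles(n=10) returns the nine decile cut points; the last one is
    # the 90th percentile.
    p90 = statistics.quantiles(gaps_kb, n=10)[-1]
    return statistics.mean(gaps_kb), statistics.median(gaps_kb), p90


# Toy positions (hypothetical), giving gaps of 1, 2, 3, and 4 kb.
mean_gap, median_gap, p90_gap = spacing_summary([0, 1000, 3000, 6000, 10000])
print(mean_gap, median_gap, p90_gap)
```

A genome-wide figure would need to pool gaps across chromosomes without counting the jump from one chromosome to the next, which this sketch leaves out.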

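Enabling discoveries with next-gen GWAS

[Figure: effect size (axis ticks 1.1, 1.5, 3.0, 6.0, 12.0) plotted against allele frequency (axis ticks 1%, 5%, 25%), with regions labeled Sequencing and Arrays (Omni1, Omni2.5M, Omni5M); corner regions marked (Common Variants, Large Effect Size) and (Rare Variants, Minimal Effect Size)]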

Summary

First-generation GWAS has provided a foundation for beginning to understand the genetic architecture of many diseases and traits.

However, first-generation GWAS was limited by the extent of knowledge about the spectrum of variation in humans in the HapMap era.

NGS re-sequencing efforts, such as 1kGP, are providing a much more comprehensive catalog of common variation (>1% MAF) in diverse populations.

Next-gen GWAS tools are leveraging this expanded catalog of variation to drive a new wave of genetic discovery by enabling exploration of the rare-variant hypothesis and higher-resolution CNV research with cost-effective tools.

Thank you!
