contemporary research in human genomics - human genome variation

Posted on 30-Nov-2014


CONTEMPORARY RESEARCH IN HUMAN GENOMICS

Genetics, Ethics and the Law, May 29-31, 2009

Josyf Mychaleckyj, D.Phil.
Center for Public Health Genomics
University of Virginia

Slide 2

Joe Mychaleckyj

Today we’ll review…

• Genome Wide Association Studies (GWAS)
• Copy Number Variants (CNVs)
• Medical Resequencing
• Direct-to-Consumer Services (DTC)


Slide 3

Genome Wide Association Studies (GWAS)

Slide 4


Single Nucleotide Polymorphisms: SNPs (‘SNiPs’)

Chromosome #1: A C C G C G T G T C

Chromosome #2: A C C G T G T G T C

C, T are the 2 different alleles for this SNP

Mutation = rare variant
Polymorphism = frequent variant (> 1% prevalence)
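As a small illustration of this slide, the sketch below finds the position where the two example chromosome sequences differ and labels a variant using the slide's 1% rule of thumb. The function names are hypothetical, and the sketch assumes the two sequences are already aligned base-for-base.

```python
def snp_sites(seq1, seq2):
    """Return (position, allele_1, allele_2) wherever two aligned sequences differ.
    Positions are 0-based; spaces are ignored so the slide's spaced letters can be used."""
    a = seq1.replace(" ", "")
    b = seq2.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def classify_variant(minor_allele_freq, threshold=0.01):
    """Label a variant with the slide's rule of thumb: > 1% frequency = polymorphism."""
    return "polymorphism" if minor_allele_freq > threshold else "rare variant (mutation)"


print(snp_sites("A C C G C G T G T C", "A C C G T G T G T C"))  # [(4, 'C', 'T')]
print(classify_variant(0.25))   # polymorphism
print(classify_variant(0.002))  # rare variant (mutation)
```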

Slide 5


3 Possible SNP Genotypes

Each person carries pairs of chromosomes with a separate allele at the SNP position on each chromosome.

• A A: Homozygote, frequency f(AA)
• A G: Heterozygote, frequency f(AG)
• G G: Homozygote, frequency f(GG)

f(AA) + f(AG) + f(GG) = 1
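The genotype frequencies on this slide can be estimated by counting genotypes in a sample. The sketch below uses a small hypothetical set of genotypes. It also derives the A allele frequency, because each homozygote carries two copies of an allele and each heterozygote carries one.

```python
from collections import Counter


def genotype_frequencies(genotypes):
    """Frequencies of the three genotypes at a biallelic A/G SNP.
    Each genotype is a 2-character string; 'GA' is counted as 'AG'."""
    counts = Counter("".join(sorted(g)) for g in genotypes)
    n = len(genotypes)
    return {g: counts[g] / n for g in ("AA", "AG", "GG")}


def allele_frequency_A(freqs):
    """f(A) = f(AA) + f(AG)/2: homozygotes carry two A alleles, heterozygotes one."""
    return freqs["AA"] + freqs["AG"] / 2


sample = ["AA", "AG", "GA", "GG", "AG", "AA", "AA", "GG"]  # hypothetical genotypes
freqs = genotype_frequencies(sample)
print(freqs)                      # {'AA': 0.375, 'AG': 0.375, 'GG': 0.25}
print(sum(freqs.values()))        # 1.0
print(allele_frequency_A(freqs))  # 0.5625
```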

Slide 6


Case-Control Association Study
Cases = clinical disease; Controls = disease free

e.g. blue allele frequency: 0.48 (48%) in cases vs 0.41 (41%) in controls

Quantitative Trait Locus (QTL) Association Study
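One common way to summarise a case-control difference like the blue-allele example is the allelic odds ratio. The slide does not say which statistic was used, so the sketch below is only an illustration. A significance test would also need the allele counts, and the slide does not give them.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Odds of carrying the allele in cases divided by the odds in controls,
    computed from the allele frequency in each group."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


print(round(allelic_odds_ratio(0.48, 0.41), 2))  # 1.33
```

An odds ratio of about 1.3 falls within the modest range that later slides describe as typical of GWAS findings.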

Slide 8


Genome Wide Association Study

• SNPs are the most common type of human genome variant by number (10-15 million)

• Stable, easy to assay, accurately genotyped
• Able to multiplex 1000’s of SNPs into the same assay

Affymetrix Human 6.0: 906,000 SNPs, 946,000 probes for CNV

Illumina 1M-Duo

Slide 9


GWAS
• SNPs are present in genes (where they can affect proteins), but since coding sequence is only ~2% of the genome, the vast majority of human SNPs lie outside protein-coding sequence

• Genotype a dense map of SNPs across all chromosomes of the human genome

• Studies with 500,000 SNPs are becoming routine and 1 million SNP panels are available

• There is no need to test all 10M SNPs because of SNP-SNP correlations (linkage disequilibrium)
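The usual measure of the SNP-SNP correlation behind this bullet is r² linkage disequilibrium. When r² between two SNPs is close to 1, genotyping one of them largely captures the other. The minimal sketch below assumes phased two-SNP haplotypes are already available. The function name and the toy haplotypes are hypothetical.

```python
def ld_r_squared(haplotypes):
    """r^2 linkage disequilibrium between two biallelic SNPs.
    haplotypes: phased 2-character strings, one allele per SNP (e.g. 'AC')."""
    n = len(haplotypes)
    a, b = haplotypes[0][0], haplotypes[0][1]  # reference allele at each SNP
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h[0] == a and h[1] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # coefficient of disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


print(ld_r_squared(["AC", "AC", "GT", "GT"]))  # 1.0: either SNP predicts the other
print(ld_r_squared(["AC", "AT", "GC", "GT"]))  # 0.0: no correlation
```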

Slide 10


GWAS approach

Does not assume prior knowledge of genes or biology

Hardy J, Singleton A. N Engl J Med. 2009 Apr 23;360(17):175


Slide 11

Genome wide Association Analysis of Coronary Artery Disease, NEJM 2007

Slide 12


But Common Diseases are Complex

[Figure: Clinical monogenic disease vs clinical complex disease. Monogenic example: hemochromatosis, HFE gene, SNP rs1800562 (C282Y); DNA GGGGAAGAGCAGAGATATACGT[A/G]CCAGGTGGAGCACCCAGGCCTG, protein VPPGEEQRYT[C/Y]QVEHPGLD; P(hemochromatosis | C282Y homozygote) ~ 60-100%. Complex disease: Genes 1-5 and Environments 1-3 all contribute.]
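The DNA and protein sequences in this figure are linked by the genetic code. The sketch below translates the rs1800562 flanking sequence with each allele. It shows that the G allele gives a TGC codon (cysteine) and the A allele gives a TAC codon (tyrosine), which is the C/Y change the slide labels C282Y. Two assumptions apply. First, the reading frame starts at the first base of the flank, chosen because it reproduces the protein segment shown (GEEQRYT…). Second, the residue number 282 comes from the slide and is not computed by the code.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order
# (third base varying fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate DNA in reading frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Flanking sequence of rs1800562 as shown on the slide
LEFT = "GGGGAAGAGCAGAGATATACGT"
RIGHT = "CCAGGTGGAGCACCCAGGCCTG"

for allele in "GA":
    print(allele, translate(LEFT + allele + RIGHT))
# G GEEQRYTCQVEHPGL   (TGC codon -> cysteine)
# A GEEQRYTYQVEHPGL   (TAC codon -> tyrosine)
```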

Slide 13


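Monogenic vs Complex Disease

• Number of genes: monogenic, 1 or a small number; complex, many
• Role of variant: monogenic, often etiologic (severe phenotype); complex, susceptibility / molecular pathology ?
• Penetrance: monogenic, highly penetrant; complex, modest penetrance
• Odds ratio: monogenic, high; complex, modest/low
• Selection and frequency: monogenic, strong selection => low frequency/rare; complex, weak/no selection => high frequency/common
• Location: monogenic, coding sequence; complex, non-coding/regulation (?)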

Slide 14


What are GWAS Finding?

• Typically detected variants are common (allele frequency >10%)
• Low genotype risk: odds ratios of 1.1-1.5
• Small sibling relative risk
• Causal variants have not been mapped: function unknown, and major signals occur in non-coding regions
• Penetrance model not well known

Slide 15


Example: Crohn Disease

First susceptibility gene for Crohn disease: NOD2, SNP rs17221417

• Genotype relative risk (GRR): heterozygote = 1.29, homozygote = 1.92
• Allele frequency 0.287
• Sibling risk ratio = 1.02
• Familial risk in NOD2 has been estimated at 1.19-1.49 but varies with population

Lewis, J Med Genet 2007; Economou, Am J Gastroenterol 2004
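The small sibling risk ratio follows from the other numbers on the slide. The sketch below applies the standard single-locus variance formula of Risch (1990), λs = 1 + (½·V_A + ¼·V_D) / K², to the Crohn example. It assumes Hardy-Weinberg proportions and takes 0.287 as the risk-allele frequency. The slide does not say how its 1.02 was derived, so treat this as a consistency check rather than the original calculation.

```python
def sibling_relative_risk(p, grr_het, grr_hom):
    """Single-locus sibling relative risk (lambda_s) via Risch's variance formula.
    p: risk-allele frequency; grr_het, grr_hom: genotype relative risks versus the
    non-risk homozygote. Assumes Hardy-Weinberg genotype proportions."""
    q = 1.0 - p
    f0, f1, f2 = 1.0, grr_het, grr_hom            # relative penetrances
    k = q * q * f0 + 2 * p * q * f1 + p * p * f2  # mean penetrance (relative scale)
    a = (f2 - f0) / 2                             # half the homozygote difference
    d = f1 - (f0 + f2) / 2                        # dominance deviation
    v_add = 2 * p * q * (a + d * (q - p)) ** 2    # additive genetic variance
    v_dom = (2 * p * q * d) ** 2                  # dominance variance
    return 1 + (0.5 * v_add + 0.25 * v_dom) / k ** 2


lam = sibling_relative_risk(p=0.287, grr_het=1.29, grr_hom=1.92)
print(round(lam, 2))  # 1.02, matching the slide
```

Because λs is a ratio, the penetrances only need to be known up to a common scale factor, which is why the genotype relative risks are enough.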

Slide 16

Hindorff, PNAS 2009

>200 GWAS published as of December 2008

Slide 17


Nature Genetics 41, 666-676 (2009), published online 10 May 2009: Genome-wide association study identifies eight loci associated with blood pressure

Slide 18


The GWAS conundrum: little variance/risk is explained by GWAS alleles

• Obesity
– FTO and MC4R: <2% of variance
• Lipids
– 30 gene loci; proportion of variance explained in each trait:
– 9.3% for HDL cholesterol
– 7.7% for LDL cholesterol
– 7.4% for triglycerides
• Diabetes
– 18 replicated loci: combined sibling relative risk ~1.07

Slide 19


Example: Height

• Highly heritable (heritability ~0.8)
• Combined sample of ~63,000
• 54 validated variants in multiple genes
• Each locus explains ~0.3% - 0.5% of the phenotypic variance
• Total variance explained < 5% overall
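Comparing the two figures on this slide shows the size of the gap. If heritability is ~0.8 and the known variants explain under 5% of phenotypic variance, they account for at most about 6% of the heritable variance. The short calculation below uses the slide's upper bound of 5%. The function name is illustrative only.

```python
def fraction_of_heritability_explained(variance_explained, heritability):
    """Share of the heritable (genetic) variance accounted for by the known loci."""
    return variance_explained / heritability


share = fraction_of_heritability_explained(0.05, 0.8)  # slide: < 5% explained, h2 ~ 0.8
print(f"{share:.0%}")  # 6%
```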

Slide 20


What are we missing?

• Population differences
• Alleles with small effect sizes
• Copy number variants
• Rare variants
• Epigenetic effects

Slide 21


• Genotype and phenotype datasets made available as rapidly as possible to a wide range of scientific investigators

• Grantees are expected to develop a sharing plan consistent with the GWAS policy.

• Plan should include data submission to the NIH GWAS data repository (dbGaP). (http://grants.nih.gov/grants/guide/notice-files/NOT-OD-07-088.html)

Pezzolesi et al Diabetes 2009

Slide 22


http://www.ncbi.nlm.nih.gov/gap

Slide 24


NIH GWAS Data Sharing Issues

• Sharing of individual genotype & phenotype data with any approved researcher worldwide
(*Public access to genetic summary statistics)
• Review by a central NIH Data Use Committee (DUC) not constituted by the study
• Informed consent templates for new GWAS
• ‘Retrofitting’ existing cohorts to conform to NIH policy
– Adequacy of consents
– Data sharing clauses
– Use of data for research purposes not intended or foreseen
• Ancestry, ethnic origins: harm to community

http://grants.nih.gov/grants/gwas/

Slide 25

PLoS Genetics, Aug 2008

[Figure: Example results for one SNP. Allele frequency axis from 0.0 to 1.0 marking the Personal Genome, the Mixture and the Reference Sample, with the region labelled "More likely to be in mixture".]

Summing over all SNPs, one can infer with very high confidence whether the person (or a close relative) is more likely to be in the mixture than in the reference sample.
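The sketch below is a simplified version of the per-SNP distance statistic used in the PLoS Genetics study cited on this slide (Homer et al.). For each SNP, D = |y − reference| − |y − mixture|, where y is the person's own allele frequency (0, 0.5 or 1). D is positive when the person looks more like the mixture. The evidence is then summed over all SNPs as a one-sample t statistic. All data here are simulated and hypothetical, and the helper names are illustrative. The study itself used real genotypes and more careful null calibration.

```python
import math
import random


def homer_d(y, mix, ref):
    """Per-SNP distance: positive when the individual's allele frequency y
    (0, 0.5 or 1) is closer to the mixture than to the reference."""
    return abs(y - ref) - abs(y - mix)


def homer_t(person, mixture, reference):
    """Sum the evidence over all SNPs as a one-sample t statistic of D."""
    d = [homer_d(y, m, r) for y, m, r in zip(person, mixture, reference)]
    n = len(d)
    mean = sum(d) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in d) / (n - 1))
    return mean / (sd / math.sqrt(n))


def individual_freq(p, rng):
    """Allele frequency within one simulated diploid genome: 0, 0.5 or 1."""
    return ((rng.random() < p) + (rng.random() < p)) / 2


def simulate(n_snps=5000, pool_size=25, seed=1):
    """Hypothetical data: a pooled mixture that contains `member`, an independent
    reference pool, and a `non_member` drawn from the same population."""
    rng = random.Random(seed)
    member, non_member, mixture, reference = [], [], [], []
    for _ in range(n_snps):
        p = rng.uniform(0.1, 0.9)  # population allele frequency at this SNP
        y = individual_freq(p, rng)
        pool = [y] + [individual_freq(p, rng) for _ in range(pool_size - 1)]
        ref_pool = [individual_freq(p, rng) for _ in range(pool_size)]
        member.append(y)
        non_member.append(individual_freq(p, rng))
        mixture.append(sum(pool) / pool_size)
        reference.append(sum(ref_pool) / pool_size)
    return member, non_member, mixture, reference


member, non_member, mixture, reference = simulate()
print("member:    ", round(homer_t(member, mixture, reference), 1))
print("non-member:", round(homer_t(non_member, mixture, reference), 1))
```

Each SNP shifts D only slightly for a pool member, by roughly the person's share of the pool. Across thousands of SNPs, however, the member's t statistic stands far above the near-zero value expected for a non-member. This accumulation is what makes summary allele frequencies identifying.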


Slide 26

Copy Number Variants (CNVs)

Slide 27


Copy Number Variants

• Submicroscopic structural genome rearrangements (cf. cytogenetics, FISH)
– ~ 10 – 10,000 base pairs in length
– Insertions, deletions, duplications (2+ copies), inversions
• Copy number variant or polymorphism: a polymorphism is a more common CNV (frequency > 1% = copy number polymorphism, CNP)
• Common feature of the genome
• Assay using genome-wide SNP or CNV arrays
– Electronic FISH study

Slide 28


Copy number variants (CNVs)

The Copy Number Variation (CNV) Project: http://www.sanger.ac.uk/humgen/cnv/

Slide 29


~11kb deletion on chromosome 8 revealed by ultra-high resolution CGH. Blue lines: individuals with two copies. Red line: individual with zero copies.

The Copy Number Variation (CNV) Project: http://www.sanger.ac.uk/humgen/cnv/

Points are SNPs or probes from GWAS Array
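In a plot like this one, each probe reports a log2 intensity ratio against a diploid reference, so log2 ratio = log2(copy number / 2). The sketch below turns ratios into idealised copy-number calls and reports runs of consecutive non-diploid probes as candidate CNVs. The probe positions, ratios and minimum run length are hypothetical. Real callers use statistical segmentation methods rather than simple rounding. A true zero-copy deletion would give a log2 ratio of minus infinity, but measured ratios saturate at large negative values like those used here.

```python
def copy_number(log2_ratio):
    """Idealised copy-number call against a diploid reference:
    log2_ratio = log2(CN / 2), so CN = 2 * 2**log2_ratio, rounded."""
    return max(0, round(2 * 2 ** log2_ratio))


def cnv_segments(positions, log2_ratios, min_probes=3):
    """Return (start, end, copy_number) for runs of at least `min_probes`
    consecutive probes sharing the same non-diploid copy number."""
    calls = [copy_number(r) for r in log2_ratios]
    segments, run_start = [], None
    for i, cn in enumerate(calls + [2]):  # trailing 2 closes any open run
        if run_start is not None and cn != calls[run_start]:
            if i - run_start >= min_probes:
                segments.append((positions[run_start], positions[i - 1], calls[run_start]))
            run_start = None
        if run_start is None and cn != 2 and i < len(calls):
            run_start = i
    return segments


# Hypothetical probes every 1 kb; probes 8-12 sit inside a homozygous deletion.
positions = [1000 * k for k in range(20)]
ratios = [0.05, -0.1, 0.02, 0.1, -0.05, 0.0, 0.08, -0.02,
          -3.1, -2.9, -3.3, -3.0, -2.8,
          0.03, -0.07, 0.1, 0.0, -0.04, 0.06, 0.01]
print(cnv_segments(positions, ratios))  # [(8000, 12000, 0)]
```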

Slide 30


Location and frequency of CNVs in the genome

Nature. 2006 Nov 23;444(7118):444-54


Slide 31

Medical Resequencing: Next Generation Sequencing (NGS)

Slide 32


Public Reference Human Genome Sequence (2001, 2004) is Haploid and Chimeric

[Figure: DNA Library 1, Individual 1; DNA Library 2, Individual 2; DNA Library 3, Individual 3]

Slide 33


Next Generation Sequencing (NGS) enables Diploid Sequencing of an individual

Positions of variants (SNPs, CNVs, etc.)

Hundreds of millions of small random sequence ‘reads’

Slide 34


Mapping of Individual Variants (SNPs, CNVs)

N = 1 individual

[Figure: shotgun reads from the individual aligned to the reference genome]

Slide 35


Mapping of Individual Variants

• Random reads from diploid genome sequencing
– Align random shotgun reads from a single individual’s diploid library & look for high-quality mismatches
– Find heterozygous positions
• Medical sequencing (to determine disease risk profile)
– Incorporation of sequence and variants in the medical record
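The steps above (align reads, keep high-quality mismatches, find heterozygous positions) can be sketched as a naive genotype call at one reference position. The function counts only base calls above a quality cutoff and then decides from the fraction of non-reference bases. All thresholds (quality 20, depth 8, 20%/80% heterozygote bounds) and names are hypothetical. Production variant callers use probabilistic genotype likelihoods instead of fixed cutoffs.

```python
from collections import Counter


def call_genotype(ref_base, pileup, min_qual=20, min_depth=8, het_low=0.2, het_high=0.8):
    """Naive diploid genotype call at one reference position.

    pileup: (base, phred_quality) pairs from the reads aligned over the position.
    Only high-quality base calls are counted; thresholds are hypothetical."""
    good = [base for base, qual in pileup if qual >= min_qual]
    if len(good) < min_depth:
        return None  # not enough high-quality coverage to call
    alt_counts = Counter(b for b in good if b != ref_base)
    if not alt_counts:
        return (ref_base, ref_base)
    alt, n_alt = alt_counts.most_common(1)[0]
    fraction = n_alt / len(good)
    if fraction < het_low:
        return (ref_base, ref_base)  # mismatches look like sequencing errors
    if fraction > het_high:
        return (alt, alt)  # homozygous non-reference
    return (ref_base, alt)  # heterozygous position


reads = [("C", 30)] * 6 + [("T", 32)] * 5 + [("T", 8)]  # last read is low quality
print(call_genotype("C", reads))  # ('C', 'T')
```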

Slide 36


ABBA00000000

Slide 37


‘Project Jim’

Bio-IT World June 2007

1.3 percent of Watson’s genome did not match the existing reference genome:
• > 600,000 novel SNPs
• < 68,000 insertions and deletions compared to the reference sequence, 3 bp - 7 kb

Slide 38


NGS of Diploid Genomes

5 completely sequenced as of May 2009:
• J. Craig Venter
• James Watson
• Yoruba (West Africa, HGVS)
• Chinese (YH)
• Korean (SJK, May 2009)

Levy et al, PLoS Biology, 2007

Slide 39


Scientific American 2006

Slide 40


Slide 41


2008: Announcement of the $5,000 Genome


Slide 42

Direct-to-Consumer Services

Slide 43


Bio-IT World November 2008

Company | Launch | Platform | List Cost | Counselor
deCODEme | Nov-07 | Illumina | $985 | Referrals
23andMe | Nov-07 | Illumina | $399 | No
Navigenics | Apr-08 | Affymetrix | $2500 + $250 annual subscription | On staff
SeqWright | Jan-08 | Affymetrix | $998 | No

Slide 44


Slide 45


Rival genetic tests leave buyers confused

Firms that offer to predict your risk of disease give worryingly varied results
Nic Fleming (September 7, 2008)

Slide 46


Different companies produce differing assessments of risk

• Different genetic variants reviewed and included: threshold for inclusion
• Level of expertise in companies to review the literature
• Different statistical models for risk prediction: no ‘right’ answer
• How frequently updated: new findings in the literature
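The first and third points above can be illustrated with one common simplifying model. In a multiplicative (log-additive) odds model, a baseline risk is converted to odds, multiplied by the odds ratio for each variant the customer carries, and converted back to a probability. This is an assumption for illustration only, not any company's actual method. The variant names, odds ratios, baseline risk and company panels are all hypothetical. Even with the same customer and the same model, a different choice of variants changes the reported risk.

```python
def combined_risk(baseline_risk, odds_ratios):
    """Absolute risk under a multiplicative (log-additive) odds model:
    baseline odds multiplied by the odds ratio of each included variant."""
    odds = baseline_risk / (1 - baseline_risk)
    for odds_ratio in odds_ratios:
        odds *= odds_ratio
    return odds / (1 + odds)


# Hypothetical per-genotype odds ratios for one customer.
customer = {"variant_1": 1.3, "variant_2": 0.8, "variant_3": 1.2}
company_a = ["variant_1", "variant_2"]  # hypothetical inclusion choices
company_b = ["variant_1", "variant_3"]

for name, panel in (("A", company_a), ("B", company_b)):
    risk = combined_risk(0.10, [customer[v] for v in panel])
    print(name, f"{risk:.1%}")
# A 10.4%
# B 14.8%
```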
