analyzing copy number variation in the human genome
Analyzing Copy Number Variation in the Human Genome
Jeff Bailey, S5-432
Forms of genetic variation
Continuum of Genomic Variation (scale: Nucleotide to Cytogenetics)
Single base-pair changes
  Point mutations (1 per 800 bp)
Small insertions/deletions
  Frameshift, microsatellite, minisatellite
Mobile elements
  Retroelement insertions (300 bp to 10 kb)
Large-scale genomic copy number variation (>10 kb)
  Large-scale deletions
  Segmental duplications
  Local rearrangements
Chromosomal variation
  Translocation, inversion, fusion
Structural Variants (SV)
Copy Number Variation
Method 1: Copy Number Variation: Array Comparative Genomic Hybridization
[Figure: array CGH profile (blue line; >green, >red) with regions labeled Gain, Loss, Gain. Modified from Feuk et al. Nat Rev Genet 2006]
Two genomic surveys of normal individuals identified 76 and 255 CNV regions by array CGH (Sebat et al. Science 2004; Iafrate et al. Nat Genet 2004)
30% of CNVs overlap duplicated regions (variant SD = CNV) (Sebat et al. Science 2004)
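A minimal sketch of how gain and loss regions can be read off an array CGH profile, assuming per-probe test and reference intensities and a symmetric log2-ratio threshold. The threshold of 0.3, the function names, and the intensity values are illustrative assumptions, not values from the cited studies.

```python
import math
from itertools import groupby

def log2_ratios(test, reference):
    """Per-probe log2(test / reference) intensity ratio."""
    return [math.log2(t / r) for t, r in zip(test, reference)]

def call_probe(ratio, threshold=0.3):
    """Label one probe; the threshold is a hypothetical cutoff."""
    if ratio >= threshold:
        return "gain"
    if ratio <= -threshold:
        return "loss"
    return "neutral"

def call_segments(ratios, threshold=0.3):
    """Collapse runs of consecutive gain or loss probes into
    (state, first_probe_index, last_probe_index) segments."""
    calls = [call_probe(r, threshold) for r in ratios]
    segments = []
    index = 0
    for state, run in groupby(calls):
        length = len(list(run))
        if state != "neutral":
            segments.append((state, index, index + length - 1))
        index += length
    return segments

# Hypothetical intensities for eight consecutive probes.
test_signal = [100, 100, 160, 170, 100, 55, 60, 100]
reference_signal = [100] * 8
segments = call_segments(log2_ratios(test_signal, reference_signal))
print(segments)
```

Real pipelines normalize intensities and use statistical segmentation rather than a fixed per-probe cutoff; the sketch only shows the gain/loss logic.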
Segmental Duplications (SD)
5.4% of the genome (>90% identity and >1 kb)
Properties:
•Clustered
•Complex regions
chr22: 99.1% identical over 180 kb (VCF/DiGeorge Syndrome in 1 in 3000 births)
SDs predispose to copy number variation
Bailey and Eichler (2006) Nat Rev Genet
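The segmental duplication definition above (>90% identity and >1 kb) can be sketched as a simple filter over pairwise alignments. The `Alignment` record and its field names are hypothetical; only the two thresholds and the 180 kb / 99.1% example come from the slide.

```python
from dataclasses import dataclass

@dataclass
class Alignment:
    name: str          # hypothetical label for the aligned pair
    length_bp: int     # aligned length in base pairs
    identity: float    # fraction of identical bases, 0.0 to 1.0

def is_segmental_duplication(aln, min_identity=0.90, min_length_bp=1000):
    """Apply the slide's thresholds: >90% identity and >1 kb."""
    return aln.identity > min_identity and aln.length_bp > min_length_bp

alignments = [
    Alignment("chr22 example from the slide", 180_000, 0.991),
    Alignment("short repeat (hypothetical)", 300, 0.99),
    Alignment("diverged copy (hypothetical)", 5_000, 0.85),
]
flags = [is_segmental_duplication(a) for a in alignments]
print(flags)
```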
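Non-allelic Homologous Recombination (Lupski, 1999)
[Figure: misaligned pairing between duplicated segments D and D' flanking interval I, drawn from centromere (Cen) to telomere (Tel), and the resulting GAMETES with recombinant D-D' and D'-D junctions]
Change in Dosage Sensitive Genes → phenotype or disease
Dynamic Regions: predisposed to further rearrangements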
Complex disease associations
CNV: Disease Association
CCL3L1: Decreased copies increase HIV/AIDS susceptibility (Gonzalez et al. 2005). Increased copies increase risk of rheumatoid arthritis (McKinney et al. 2008)
FCGR3B: Decreased copies increase risk of lupus nephritis (Aitman et al. 2006)
APP: Duplication leading to early-onset Alzheimer disease (Rovelet-Lecrux et al. 2006)
UGT2B17: Deletion associated with 2-fold increased risk of osteoporosis (Yang et al. 2008)
Synuclein: Triplication causes Parkinson disease (Singleton et al. 2003)
DEFB4: More than 5 copies of beta-defensins associated with 1.7-fold increased risk of psoriasis (Hollox et al. 2008). Fewer than 4 copies associated with 3-fold increased risk of Crohn disease (Fellermann et al. 2006)
LCE3B & LCE3C: Multigene deletion of late cornified envelope genes is associated with psoriasis (de Cid et al. 2009)

1) Recurrent germline rearrangements causing congenital disease
2) Rare CNVs causing disease in a small proportion of affected individuals in a Mendelian fashion
3) Common CNVs that are responsible for a proportion of complex genetic risk in many individuals
Method 2: End-Sequence Pair (ESP) Analysis
~1.1 million fosmid end-sequence pairs derived from a single donor (sequenced by MIT to help close gaps in the reference genome)
Dataset:
  1,122,408 fosmid pairs preprocessed (15.5X genome coverage)
  639,204 fosmid pairs BEST pairs (8.8X genome coverage)
Results:
  Fosmid insert size tightly distributed around mean (40 kb)
  Compare fosmid optimal placements to detect deviations from expected:
    < 32 kb: putative insertion within fosmid
    > 48 kb: putative deletion within fosmid
[Figure: a concordant fosmid insert placed on the reference genome, with panels for Inversions, Insertion, and Deletion]
Tuzun*, Bailey*, Sharp* et al. Nat. Genet 2005
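The end-sequence pair logic above can be sketched as a small classifier, assuming each fosmid's two end reads have already been placed on the reference. The 32 kb and 48 kb cutoffs and the 40 kb mean insert come from the slide; the `EndPair` record, the strand convention for inversions, and the 2.9 Gb genome size used for the coverage check are assumptions made for illustration.

```python
from dataclasses import dataclass

MIN_SPAN_BP = 32_000   # below this: putative insertion (slide value)
MAX_SPAN_BP = 48_000   # above this: putative deletion (slide value)

@dataclass
class EndPair:
    left_pos: int       # reference position of the left end read
    right_pos: int      # reference position of the right end read
    left_strand: str    # "+" or "-"
    right_strand: str   # "+" or "-"

def classify(pair):
    """Classify one fosmid by how its ends map to the reference.
    Assumption: concordant ends map on opposite strands, so ends
    on the same strand are flagged as a putative inversion."""
    if pair.left_strand == pair.right_strand:
        return "inversion"
    span = pair.right_pos - pair.left_pos
    if span < MIN_SPAN_BP:
        return "insertion"   # donor carries extra sequence
    if span > MAX_SPAN_BP:
        return "deletion"    # donor lacks reference sequence
    return "concordant"

def physical_coverage(n_pairs, insert_bp=40_000, genome_bp=2.9e9):
    """Clone coverage: pairs x mean insert / assumed genome size."""
    return n_pairs * insert_bp / genome_bp

calls = [
    classify(EndPair(1_000_000, 1_040_000, "+", "-")),
    classify(EndPair(1_000_000, 1_025_000, "+", "-")),
    classify(EndPair(1_000_000, 1_060_000, "+", "-")),
    classify(EndPair(1_000_000, 1_040_000, "+", "+")),
]
print(calls)
print(round(physical_coverage(1_122_408), 1), round(physical_coverage(639_204), 1))
```

Under the assumed 2.9 Gb genome size, the two dataset sizes reproduce the slide's 15.5X and 8.8X figures.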
Fosmid SV Project
Fosmid end sequencing of 8 HapMap individuals
  1695 structural variants
  525 novel insertion sequences
(Kidd et al. 2008 Nature 453:56)
Mechanisms:
  NAHR: non-allelic homologous recombination
  NHEJ: repair of double strand breaks
  VNTR: strand slippage
  Retrotransposition: insertion of L1, SVA or Alu element
Method 3: Whole Genome Sequencing
Genome Resequencing Studies
  SNPs: 3.2 M bases
  Non-SNP: 9.1 M bases (22% of events, 74% of variant bases)
  (Levy et al. PLoS Biol 2007:e266)
Read Depth, Mismapping Pairs
Future: Perfect Whole Genome Assembly
Summary of Human Genome Copy Number Variation (12/2006)
Summary of recent analyses of structural variation in the human genome (12/06).
Reference | Analysis | # Individuals | # Events | Av. bp | Median (bp) | Total Mbp
Mills, 2006 | Align trace data | 36 | 415434 | 20 | 2 | 8.36
Hinds, 2006 | Oligo arrayCGH | 1000 [sic; one of the two counts is missing] | 1379 | 947 | 0.14
McCarroll, 2006 | HapMap SNP genotyping | 269 | 538 | 16874 | 6887 | 9.08
Conrad, 2006 | HapMap SNP genotyping | 180* | 609 | 34996 | 17217 | 21.31
Tuzun, 2005 | Paired End-sequence | 1 | 269 | 55706 | 25230 | 14.98
Redon, 2006 | Affymetrix 500K data | 269 | 980 | 165996 | 63140 | 162.68
Iafrate, 2004 | BAC Array-CGH | 55** | 246 | 146189 | 150395 | 35.96
Sharp, 2006 | BAC Array-CGH | 47 | 124 | 170019 | 164704 | 21.08
Wong, 2006 | BAC Array-CGH | 105 | 1365*** | 185504 | 175314 | 253.21
Sebat, 2004 | ROMA-CGH | 20 | 72 | 350670 | 199800 | 25.25
Redon, 2006 | BAC Array-CGH | 269 | 913 | 349880 | 227889 | 319.44
All Vars | NA | NA | 323573 | 1901 | 2 | 615.10
All Vars > 1 kb | NA | NA | 4131 | 148578 | 93356 | 613.77
* effectively independent individuals equal to number of trios
** 39 healthy controls, 16 with karyotype abnormalities
*** accounting for only those sites that showed in 2 or more individuals
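The Total Mbp column in the summary table appears to be the number of events multiplied by the average event size. A quick check on four rows, with values copied from the table (the function name is illustrative):

```python
def total_mbp(n_events, average_bp):
    """Total variant sequence in Mbp: events x average size."""
    return n_events * average_bp / 1e6

# (study, events, average bp, Total Mbp as printed in the table)
rows = [
    ("Tuzun, 2005", 269, 55706, 14.98),
    ("Sharp, 2006", 124, 170019, 21.08),
    ("Wong, 2006", 1365, 185504, 253.21),
    ("Sebat, 2004", 72, 350670, 25.25),
]
for study, events, average_bp, printed in rows:
    computed = total_mbp(events, average_bp)
    print(f"{study}: computed {computed:.2f} Mbp, table {printed:.2f} Mbp")
```

Rows with a heavily rounded average, such as Mills, 2006 (average 20 bp), will not reproduce exactly.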
20% of the human genome is CNV?
3000+ genes with exons in these regions CNV?
(Currently 30% of genome and 9473 genes)
How many genes are truly CNV?
Lack of breakpoint precision? BACs: 150-250 kb clones of which only a part of the sequence may be CNV
False positives? Multiple studies increase the proportion of false positives, since true positives tend to overlap
[Figure: extents of a BAC, a gene, and a CNV; calls from Study #1, #2, #3 marked as TP or FP]
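Why merging studies inflates the false positive proportion can be shown with a toy model: true positives are drawn from the same real sites and so overlap between studies, while each study's false positives are its own. All counts below are hypothetical, and the model takes the extreme case where every study finds the same true sites.

```python
def false_positive_fraction(studies, true_sites):
    """Fraction of the merged call set that is not a true site."""
    merged = set().union(*studies)
    return len(merged - true_sites) / len(merged)

# Hypothetical: 90 real CNV sites, found by every study.
true_sites = set(range(90))

def make_study(study_id, n_false=10):
    """Each study reports all true sites plus its own false calls."""
    false_calls = {f"fp-{study_id}-{j}" for j in range(n_false)}
    return true_sites | false_calls

one_study = false_positive_fraction([make_study(1)], true_sites)
three_studies = false_positive_fraction(
    [make_study(1), make_study(2), make_study(3)], true_sites
)
print(one_study, three_studies)
```

In this toy case each study is 10% false positives on its own, but the union of three studies is 25% false positives, because the 90 true sites are counted once and the 30 false calls accumulate.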
Design of Custom oligonucleotide aCGH
1. Select genomic regions to target for probe design
2. Merge overlapping regions
3. Select oligonucleotide probe sequences (average 12/exon) and place on microarray
•Equal number of probes per exon (exon size 3 bp to 10 kb).
•Limitation: NimbleGen algorithm creates equally spaced probes across a region.
Bailey et al. Cytogenet Genome Res 2008
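The region-merging step and equally spaced probe placement can be sketched as follows. This is an illustration of the two ideas only, not the published design pipeline or the NimbleGen algorithm; the function names and coordinates are hypothetical, and 12 probes per region mirrors the slide's stated average.

```python
def merge_regions(regions):
    """Merge overlapping (start, end) target regions."""
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def evenly_spaced_probe_starts(start, end, n_probes=12):
    """Place n_probes start positions at equal spacing in a region."""
    step = (end - start) / n_probes
    return [start + int(i * step) for i in range(n_probes)]

targets = [(100, 200), (150, 300), (400, 500)]
merged = merge_regions(targets)
probe_starts = evenly_spaced_probe_starts(0, 120)
print(merged)
print(probe_starts)
```

Equal spacing within a region means probe density depends on region length, which is the limitation the slide notes when the goal is an equal number of probes per exon.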
Detection Method
[Figure: exon structure, probe regions, and log2 probe intensity from hybridization for Exon 1 to Exon 5. Mean intensity difference, in slide order: -0.2 SD, +1.1 SD, +1.4 SD, +0.6 SD, +1.2 SD]
Step #1: Seed (+1.1 SD, +1.4 SD)
Step #2: Extension (+0.6 SD, +1.2 SD; not -0.2 SD)
Result: 4-exon partial-gene CNV
Bailey et al. Cytogenet Genome Res 2008
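One way to read the seed and extension steps is as a two-threshold scan over per-exon scores. The sketch below is an interpretation, not the published algorithm: the seed threshold (1.0 SD), the extension threshold (0.5 SD), and the rule that a seed is two adjacent exons are assumptions chosen so that the slide's example values yield its 4-exon call. It handles gains only.

```python
def seed_and_extend(scores, seed_threshold=1.0, extend_threshold=0.5):
    """Return (first_exon_index, last_exon_index) of a called gain,
    or None. Thresholds are hypothetical, in SD units."""
    n = len(scores)
    for i in range(n - 1):
        # Step 1 (seed): two adjacent exons both above the seed cutoff.
        if scores[i] >= seed_threshold and scores[i + 1] >= seed_threshold:
            left, right = i, i + 1
            # Step 2 (extension): grow outward while neighbors pass
            # the looser extension cutoff.
            while left > 0 and scores[left - 1] >= extend_threshold:
                left -= 1
            while right < n - 1 and scores[right + 1] >= extend_threshold:
                right += 1
            return (left, right)
    return None

# Mean intensity differences from the slide, Exon 1 to Exon 5.
exon_scores = [-0.2, 1.1, 1.4, 0.6, 1.2]
call = seed_and_extend(exon_scores)
print(call)
```

With these assumed cutoffs the seed lands on Exons 2 and 3 and extends through Exons 4 and 5, while Exon 1 is excluded.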
CNV in RHD
[Figure: sample tracks for GM18507, GM18517, GM18956, GM19129, GM12156, GM18502, GM19240, GM12878, and GM18555 across Chr 1 (kb) 25,350 to 25,390, with gene model, exons, probe regions, and segmental duplications]
Bailey et al. Cytogenet Genome Res 2008
Detecting >500 bp and >5% freq
8,599 CNV regions: 3.7% of genome (112.7 Mb)
2 genomes: 1,098 CNVs, 0.78% (24 Mb)
Conrad et al. 2009 Nature
Causal CNVs
Conrad et al. 2009 Nature
Infectious Disease Genetics
Complex interplay that results in infectious disease phenotype
Potential host defense responses and pathogen virulence are encoded in the respective genomes.
SD and CNV represent key mechanisms for adaptation and diversification of responses for both host and pathogen.
The study of SD and CNV is necessary to fully understand the genetics and biology of infectious disease pathogenesis.
[Figure: Human Genome, Pathogen Genome, Vector Genome, Environment]
Human CNV typing and association studies
Comprehensive CNV Typing Chip (1st generation)
  Collaboration with the Eichler Lab
  Preferentially targeting gene CNVs (5,000 CNVs → 1000 genic regions → 30% host defense)
  Agilent and NimbleGen oligoarray platforms
  Defining copy number responsive probes
  Defining copy specific probes to remove cross-hybridization
Case-control studies to examine infectious disease and immune phenotypes for association with CNVs
Human Malaria
Malaria: 2-3 million deaths per year
“strongest known force for evolutionary selection in the recent history of the human genome” (Kwiatkowski 2005 Am J Hum Genet)
HbS, HbC, HbE, thalassemia, ABO, Duffy null, SE Asian ovalocytosis, IL-4, CR1, HLA-DRB ...
Hypothesis: Strong selection will have impacted CNVs
Testing case-control samples for CNV associations with resistance to infection and cerebral malaria.
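A case-control CNV association test can be sketched as a 2x2 table of copy-number class against disease status, summarized by an odds ratio and a two-sided Fisher exact test. The counts below are invented for illustration and do not come from any study in these slides; the function names are hypothetical, and real analyses would also account for population structure and copy-number genotyping error.

```python
from math import comb

def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]]:
    a = cases low-copy,    b = cases high-copy,
    c = controls low-copy, d = controls high-copy."""
    return (a * d) / (b * c)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value: sum the hypergeometric
    probabilities of all tables no more likely than the observed."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    return sum(
        prob(x) for x in range(low, high + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )

# Hypothetical counts: 100 cases and 100 controls.
estimate = odds_ratio(30, 70, 15, 85)
p_value = fisher_exact_two_sided(30, 70, 15, 85)
print(round(estimate, 2), p_value)
```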