Science Webinar Series: DNA Target Enrichment Strategies
10 June, 2009
Brought to you by the Science/AAAS Business Office
Sponsored by:
Participating Experts:
Daniel Turner, Ph.D., Wellcome Trust Sanger Institute, Cambridge, UK
Kelly Frazer, Ph.D., Scripps Genomic Medicine, San Diego, CA
www.opengenomics.com/SureSelect
Daniel J Turner
Head of Sequencing Technology Development
Wellcome Trust Sanger Institute
DNA Target Enrichment Strategies – bringing efficiencies to genome sequencing
Target enrichment strategies
PCR
on array
in solution
Target enrichment strategies: PCR
• Design primers that are specific for the region of interest
• Amplify
• Sequence
Population sequencing of ACTN3
[Figure legend: X, R]
1,438 samples
57 populations
The α-actinin-3 deficiency trade-off:
Compared to R577 homozygotes, 577X homozygotes have:
• lower muscle strength and mass
• reduced capacity for rapid energy generation
• increased endurance capacity
• increased fatigue recovery
• enhanced muscle metabolic efficiency
MacArthur et al. 2007. Nature Genetics 39:1261-1265
MacArthur et al. 2008. Hum Mol Genet 17:1076-86
Acoustic shearing
96-well library prep
[Figure: ACTN3, CTSF; 25 kb]
Quail et al. (2008) Nat. Methods 5, 1005-1010
SPRI bead clean-ups
Custom adapters and barcoded PCR primers
Sequencing Strategy
[Figure: lanes 1, 3, 5, 7 and lanes 2, 6, 8]
Uniformity of coverage
[Figure: coverage plots for Fragment 2, Fragment 8 and Fragment 10]
Results
• Uniformity is governed by the accuracy of pooling
• 80% with a coverage within 2-fold range of the median
• 99.9% accuracy for genotype calling
• 63 high-confidence SNPs identified, 27 of them novel and 23 rare
• Analysis of non-European HapMap samples and HGDP samples ongoing
Target enrichment strategies: PCR
• Limit of 5–20 kb per PCR
• Difficult to multiplex, optimise and normalise
• Uses a lot of DNA
• Expensive if multiplexing
• But very effective
Target enrichment strategies: on array
• Hybridise sample DNA to target-specific probes on a microarray
• Wash to remove background
• Elute
• Sequence
Target enrichment strategies: in solution
• Hybridise sample DNA to target-specific probes in solution
• Capture probe / target
• Wash to remove background
• Elute
• Sequence
gDNA Fragmentation
Target size: 100-300 bp
Target size: 100-400 bp
• Shorter fragments hybridize more efficiently
• Optimized settings give tighter distribution of fragment sizes
Library purification
SPRI beads: easily automated
allow elution in a wider variety of buffers
PCR and GC bias
[Figure: panels (a) and (b), "Without PCR prior to hybridization" and "With PCR prior to hybridization"; axes: GC content (%) and percentile of unique sequence ordered by GC content]
Evaluation parameters
• Completeness: % of target bases covered by ≥ 1 sequence read
• Specificity: % of sequences mapping to target regions
• Uniformity: variation in coverage
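The three evaluation parameters can be sketched in a few lines of Python. This is an illustration only: the function name and inputs are hypothetical, and the slide does not say how "variation in coverage" is quantified, so the coefficient of variation used here is an assumption.

```python
def evaluate_enrichment(target_coverage, reads_on_target, total_mapped_reads):
    """Return (completeness %, specificity %, coefficient of variation).

    target_coverage: per-base read depth over the targeted bases (hypothetical input).
    reads_on_target / total_mapped_reads: read counts from an aligner (hypothetical inputs).
    """
    n = len(target_coverage)
    # Completeness: % of target bases covered by at least one read.
    completeness = 100.0 * sum(1 for c in target_coverage if c >= 1) / n
    # Specificity: % of mapped reads that fall on the target regions.
    specificity = 100.0 * reads_on_target / total_mapped_reads
    # Uniformity: ASSUMPTION, summarised as the coefficient of variation of depth.
    mean = sum(target_coverage) / n
    variance = sum((c - mean) ** 2 for c in target_coverage) / n
    cv = (variance ** 0.5) / mean if mean > 0 else float("nan")
    return completeness, specificity, cv


# Toy data, not taken from the talk.
completeness, specificity, cv = evaluate_enrichment([0, 10, 20, 30, 40], 80, 100)
```

A lower coefficient of variation means flatter coverage, so less extra sequencing is needed to reach a given depth at the least-covered bases.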
Completeness
On array ~ 98.6% of targeted bases
In solution ~ 99.5% of targeted bases
PCR ≤ 100% of targeted bases
Specificity
On array up to 70% on target
In solution up to 80% on target
PCR up to 100% on target
On array 90% of CTR at 30x
In solution 95% of CTR at 30x
[Figure: % of CTR bases (90% to 100%) against coverage (0 to 30-fold); series labelled 14M, 7.5M, 6.8M, 6.5M, 6.2M, 5.8M and Array 6.5M]
Sequence uniformity
%GC vs %Coverage
Target enrichment strategies
PCR
on array
in solution
• Enables large-scale projects, which would not be realistic with PCR
• Not easily scalable
• Requires expensive hardware
Target enrichment strategies
PCR
on array
in solution
• Enables large-scale projects, which would not be realistic with PCR
• Simple & relatively rapid to perform
• Scalable & easily automated
• Uses least DNA
• Requires expensive hardware
• No whole exome set available commercially
Acknowledgements
Wellcome Trust Sanger Institute: Lira Mamanova, Carol Scott, Iwanka Kozarewa, Daniel MacArthur, Chris Tyler-Smith, Qasim Ayub, Liz Huckle, Alison Coffey, Eleanor Howard, Aarno Palotie
Agilent Technologies: Emily LeProust, Fred Ernani
Nimblegen: Tom Albert, Heike Fiegler, Greg McGuiness
Enrichment of sequencing targets from the human genome
Kelly A Frazer, PhD
Director, Genomic Biology
Scripps Genomic Medicine
June 10, 2009
What is targeted sequencing?
[Figure labels: genomic DNA, select regions]
Define sequence targets
Target enriched samples
Sequence
Next‐Gen Sequencing
• Low cost of generating raw sequence per nucleotide ($0.00001 per base).
• Best suited to generating large amounts of raw sequence data per sample (10^9 nucleotides per day).
Still too costly and too low-throughput to perform whole-genome sequencing on many different DNA samples
Why perform targeted sequencing?
To efficiently use current technologies for population‐based sequencing studies, it is necessary to enrich for specific loci in the human genome.
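The cost argument can be made concrete with a back-of-the-envelope calculation. Only the per-base cost comes from the slide. The genome size, the 30-fold depth and the 40% on-target fraction are illustrative assumptions, and the 3.6 Mb target size is borrowed from the target set described later in this talk.

```python
COST_PER_BASE_USD = 0.00001      # from the slide
GENOME_SIZE_BP = 3_100_000_000   # ASSUMPTION: approximate human genome size
TARGET_SIZE_BP = 3_600_000       # 3.6 Mb target set described later in the talk


def raw_sequencing_cost(region_bp, mean_depth, on_target_fraction=1.0):
    """Raw sequencing cost in USD to reach mean_depth over region_bp.

    on_target_fraction scales up the sequencing needed when only part of the
    reads land on the region of interest (1.0 for whole-genome sequencing).
    """
    bases_needed = region_bp * mean_depth / on_target_fraction
    return bases_needed * COST_PER_BASE_USD


# ASSUMPTIONS: 30-fold mean depth, 40% of reads on target after enrichment.
whole_genome_usd = raw_sequencing_cost(GENOME_SIZE_BP, 30)
targeted_usd = raw_sequencing_cost(TARGET_SIZE_BP, 30, on_target_fraction=0.4)
print(f"whole genome / targeted cost ratio: {whole_genome_usd / targeted_usd:.0f}")
```

The sketch covers raw sequencing cost only. The enrichment assay itself adds a per-sample cost that is not modelled here.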
Population Sequence Studies
• Sequence‐based association studies
Healthy elderly cohort versus individuals with age‐related diseases
• Functional annotation of genomic intervals
9p21 interval associated with CAD and T2D
Sample enrichment methods
• PCR – enriches target sequences with high specificity but difficult to scale
• Hybridization based methods – long oligonucleotides in solution allow for efficient capture of ~3.5 Mb of sequence targets
• Microdroplet PCR – encapsulation of PCR reactions allows for simultaneous amplification of ~4,000 targeted elements
Important parameters
• Efficiency of assay design
– The fraction of targeted base pairs for which an assay can be designed
• Specificity of target enrichment
– The fraction of high quality reads that map directly on the targeted sequences
• Coverage uniformity across targeted sequences
– If coverage differs greatly then one has to sequence deeply to adequately cover underrepresented bases
• Reproducibility across technical replicates & samples
• Systematic allelic biases resulting in drop-out effects
– Errors of this nature result in high rates of incorrectly called heterozygous variant sites
Target Enrichment by Solution Hybridization
Make Genomic DNA Fragment Libraries
Agilent Microarray
- synthesize 120-mer oligonucleotides
- convert to biotinylated RNA capture probes
- hybridization with DNA
- capture and wash
- elution and PCR amplification
- sequence targeted sequences
3.6 Mb of Targeted Sequences
• 624 genes
– 9,215 exons
– 4,886 evolutionarily conserved sequences (ECS)
– total 3.2 Mb of sequence
• 3 Contiguous Regions
– 9p21: 196 kb
– APOE: 100 kb
– 8q24.21: 125 kb
Probe design efficiency
[Figure, panels (a) and (b): Chr9 region 21950000 to 22130000 containing CDKN2A, CDKN2B, CDKN2BAS and C9orf53, with a zoomed view of 21960000 to 22000000 (tracks: genes, Repeat Mask, Probes); FOXO1 gene on Chr13 (tracks: Repeat Mask, ECS Block, ECS Signal, Probes)]
• 622 genes: CDS (97%), UTR (88%), ECS (86%)
• Three genomic intervals – 37% to 55%
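Design efficiency was defined earlier as the fraction of targeted base pairs for which an assay can be designed. A minimal sketch of that calculation follows, assuming targets and probes are given as half-open (start, end) intervals on a single chromosome. The function names and the toy coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def design_efficiency(targets, probes):
    """Fraction of targeted base pairs overlapped by at least one probe."""
    targets = merge_intervals(targets)
    probes = merge_intervals(probes)
    target_bp = sum(end - start for start, end in targets)
    covered_bp = 0
    # After merging, intervals within each list are disjoint, so summing
    # pairwise overlaps counts every covered target base exactly once.
    for t_start, t_end in targets:
        for p_start, p_end in probes:
            overlap = min(t_end, p_end) - max(t_start, p_start)
            if overlap > 0:
                covered_bp += overlap
    return covered_bp / target_bp


# Toy example: a 1 kb target with three 120-mer probes, two of them overlapping.
efficiency = design_efficiency([(0, 1000)], [(0, 120), (100, 220), (600, 720)])
```

Repeat masking lowers this fraction wherever probes cannot be placed. That would be consistent with the lower values reported for the contiguous genomic intervals, although the slide does not state the cause.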
Specificity of target enrichment
Percent of base pairs corresponding to filtered reads:
• 38.6% map directly on target
• 47.8% map on or near target (+/- 150 bp)
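The on-target and near-target percentages can be computed per base pair along the following lines. This is a sketch under stated assumptions: reads and targets are half-open (start, end) intervals on one chromosome, and targets are disjoint and far enough apart that their 150 bp flanks do not overlap. All names and coordinates are hypothetical.

```python
def overlap_bp(read, regions):
    """Number of bases of `read` that fall inside any of the disjoint `regions`."""
    start, end = read
    return sum(max(0, min(end, r_end) - max(start, r_start))
               for r_start, r_end in regions)


def specificity_bp(reads, targets, flank=150):
    """Return (% of read bases on target, % of read bases on or near target)."""
    # Extend each target by `flank` bp on both sides for the "near target" count.
    padded = [(max(0, start - flank), end + flank) for start, end in targets]
    total = sum(end - start for start, end in reads)
    on_target = sum(overlap_bp(read, targets) for read in reads)
    on_or_near = sum(overlap_bp(read, padded) for read in reads)
    return 100.0 * on_target / total, 100.0 * on_or_near / total


# Toy data: one target and four 100 bp reads.
targets = [(1000, 2000)]
reads = [(900, 1000), (1950, 2050), (5000, 5100), (1200, 1300)]
pct_on, pct_near = specificity_bp(reads, targets)
```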
Coverage uniformity across targeted sequences
Normalized coverage: the observed coverage of each base divided by the mean coverage of all targeted bases
88.4% of all bases fell within ¼ to 4 times the mean coverage
98.3% of all bases covered by at least one read
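The normalization and the two summary figures on this slide can be reproduced as follows. The per-base depths are toy values, not data from the talk, and the function names are hypothetical.

```python
def normalized_coverage(coverage):
    """Divide the observed coverage of each base by the mean over all targeted bases."""
    mean = sum(coverage) / len(coverage)
    return [depth / mean for depth in coverage]


def fraction_within(normalized, low=0.25, high=4.0):
    """Fraction of bases whose normalized coverage lies between low and high."""
    return sum(1 for x in normalized if low <= x <= high) / len(normalized)


# Toy per-base depths over ten targeted bases (mean depth is 10).
coverage = [0, 2, 10, 10, 10, 10, 10, 10, 10, 28]
normalized = normalized_coverage(coverage)
within_range = fraction_within(normalized)               # 1/4 to 4 times the mean
covered = sum(1 for depth in coverage if depth >= 1) / len(coverage)
```

Normalizing by the mean makes runs of different sequencing depth comparable, which is why the same thresholds can be applied to both enrichment methods in this talk.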
Reproducibility of coverage
Technical replicates r² ~ 0.95
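The slide does not say how r² was computed. One plausible reading, assumed here, is the squared Pearson correlation between the coverage values of two replicates measured over the same bases or targets. The function name and the input values are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists of coverage values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Toy coverage values for the same four targets in two replicates.
replicate_1 = [1, 2, 3, 4]
replicate_2 = [1, 3, 2, 4]
r2 = r_squared(replicate_1, replicate_2)
```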
Variant calling accuracy: comparison to microarray genotypes
~ 4,100 SNPs
At QS >= 30: detection rate = 93%, concordance rate = 99.3%
No systematic allelic biases
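The two accuracy figures can be sketched as below. The definitions are assumptions, since the slide does not spell them out: detection rate is taken as the fraction of array-genotyped sites with a sequencing call at or above the quality threshold, and concordance rate as the fraction of those detected sites whose genotype matches the array. Site names and genotypes are made up.

```python
def compare_to_array(array_genotypes, seq_calls, min_qs=30):
    """Return (detection_rate, concordance_rate) of sequencing calls against array genotypes.

    array_genotypes: {site: genotype}
    seq_calls: {site: (genotype, quality_score)}
    """
    # Keep only sequencing calls at array sites that pass the quality threshold.
    detected = {site: genotype
                for site, (genotype, qs) in seq_calls.items()
                if site in array_genotypes and qs >= min_qs}
    detection_rate = len(detected) / len(array_genotypes)
    concordant = sum(1 for site, genotype in detected.items()
                     if genotype == array_genotypes[site])
    concordance_rate = concordant / len(detected) if detected else float("nan")
    return detection_rate, concordance_rate


# Hypothetical sites and genotypes, for illustration only.
array = {"site1": "AG", "site2": "CC", "site3": "TT", "site4": "AA"}
calls = {"site1": ("AG", 45), "site2": ("CT", 38),
         "site3": ("TT", 60), "site4": ("AA", 12)}
detection_rate, concordance_rate = compare_to_array(array, calls)
```

A systematic allelic bias would show up as discordance concentrated at heterozygous array sites, for example array "AG" called as "AA", so splitting the concordance by array genotype is a natural extension.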
The solution hybridization-based method is well suited to enriching loci on the megabase scale from the human genome for population sequence studies
Microdroplet PCR Workflow
Primer library – up to 4000 different elements
Fragmented genomic DNA template
Primer design efficiency
• 47 genes, 435 exons
– 29 from ENCODE intervals
– 8 TRP channel superfamily
– 11 deep venous thrombosis
• 457 amplicons of varying sizes (119-956 bp) and GC content (33-74%)
PCR assays were successfully designed for all exons
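The GC content quoted for the amplicons is a simple per-sequence percentage. A minimal sketch follows, assuming that ambiguous bases such as N are excluded from the denominator. The helper name and the example sequences are hypothetical.

```python
def gc_content(sequence):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    sequence = sequence.upper()
    counted = sum(sequence.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (sequence.count("G") + sequence.count("C")) / counted


# Made-up sequences, for illustration only.
example_gc = gc_content("atgcNN")   # N bases are ignored (ASSUMPTION)
high_gc = gc_content("GGCC")
```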
Specificity of target enrichment
• 78% of filtered reads successfully mapped to a targeted amplicon
• Off-target reads aligned across the genome in a random fashion, suggesting that background sequence is due to non-specific genomic DNA carryover rather than off-target amplification
Coverage uniformity across targeted sequences
Normalized coverage: the observed coverage of each base divided by the mean coverage of all targeted bases
89.6% of all bases fell within ¼ to 4 times the mean coverage
99.6% of all bases covered by at least one read
Only one amplicon completely failed
Reproducibility of coverage
Sample to sample r² ~ 0.96
Variant calling accuracy: comparison to microarray genotypes
~ 450 SNPs
At QS >= 30: detection rate = 97.6%, concordance rate = 99.1%
Accuracy was similar in ENCODE versus non-ENCODE interval variants and between samples of African and European ancestry, indicating that allelic biases are minimal
The microdroplet PCR process is extremely efficient with almost 100% of all primer pairs successful. The data generated is well suited for performing population‐based sequence studies.
Selecting a method
• Study design
– Known functional elements or entire intervals
– Total amount of targeted sequences
– Number of samples
• Sequencing Technology
Acknowledgements
STSI/Scripps Genomic Medicine
Ryan Tewhey
Kazu Nakano
Wendy Wang
Sarah Murray
Olivier Harismendy
Eric Topol
Look out for more webinars in the series at:
www.sciencemag.org/webinar
For related information on this webinar topic, go to:
www.opengenomics.com/SureSelect
To provide feedback on this webinar, please e‐mail
your comments to webinar@aaas.org