Sequencing Technologies and Applications at JGI, Feng Chen, Ph.D., 05/14/2012, MGM Workshops
![Page 1: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/1.jpg)
Sequencing Technologies and Applications at JGI
Feng Chen, Ph.D.
05/14/2012
MGM Workshops
![Page 2: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/2.jpg)
Outline
• Overview of sequencing technologies at JGI
• Pacific Biosciences potentials
• Highlights of application development
![Page 3: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/3.jpg)
Staying State of the Art
• 454 early access: 07/2005
• Solexa early access: 01/2007
• 454 in production: 04/2007
• SOLiD early access: 10/2007
• Solexa in production: 07/2008
• MegaBACE offline: 08/2007
• AB 3730 reduced: 12/2007
• Illumina GAIIx
• 454 Titanium: 05/2009
• 454 1K: 12/2009
• Illumina HiSeq 2000
• PacBio
• Ion Torrent
• Illumina MiSeq
• ONT
![Page 4: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/4.jpg)
Emerging Sequencing Technologies
• Illumina MiSeq (improvement)
• Illumina HiSeq 2500
• Ion Torrent PGM
• Ion Torrent Proton
![Page 5: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/5.jpg)
Illumina Improvement
• Longer read length (250 bp)
• 3-fold more reads (15 M)
• Higher throughput (5-7 Gb)
• Faster run time
Two run configurations
• Fast run config can be done in 27 hours and produce 120 Gb
• Standard run config remains the same (600 Gb in 17 days)
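To compare the two run configurations on the same footing, the quoted figures can be converted to output per day. The sketch below uses only the numbers on this slide (120 Gb in 27 hours, 600 Gb in 17 days); the function name `gb_per_day` is illustrative, and the calculation ignores setup and turnaround time between runs.

```python
def gb_per_day(output_gb: float, run_hours: float) -> float:
    """Average output in gigabases per 24-hour day over a single run."""
    return output_gb / (run_hours / 24.0)


# Figures quoted on the slide.
fast_run = gb_per_day(output_gb=120, run_hours=27)
standard_run = gb_per_day(output_gb=600, run_hours=17 * 24)

print(f"Fast run:     {fast_run:.1f} Gb/day")
print(f"Standard run: {standard_run:.1f} Gb/day")
print(f"Ratio:        {fast_run / standard_run:.2f}x")
```

On these numbers the fast configuration yields roughly three times more bases per day, while the standard configuration yields five times more bases per run.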
![Page 6: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/6.jpg)
Promises from Ion Torrent
![Page 7: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/7.jpg)
Oxford Nanopore Technologies
• Long read length: > 50 kb
• High output: > 1 Gb/hr
• “Run until…”
• Cheap: ~$40/Gb
• Error rate: < 4%
![Page 8: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/8.jpg)
Evolution of JGI Sequencing Platforms
[Figure: instrument units by platform (ABI3730xl, Roche/454, GAIIx, HiSeq) with budget ($ millions), output (trillions of bases), and staff (FTE) for 2009, 2010, and 2011]
• FY2009: 49 units, 24 FTEs, $11M, 1 Tb
• FY2010: 22 units, 15 FTEs, $11M, 6 Tb
• FY2011: 15 units, 9 FTEs, $8M, 29 Tb
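The fiscal-year figures on this slide (units, FTEs, budget, output) can be turned into two derived ratios that make the trend explicit: budget per terabase and terabases per FTE. This is a rough sketch that assumes the quoted budget and staff numbers cover the same scope as the quoted output; the dictionary keys are illustrative names.

```python
# Values as quoted on the slide: budget in $ millions, output in terabases.
fiscal_years = {
    "FY2009": {"units": 49, "ftes": 24, "budget_musd": 11, "output_tb": 1},
    "FY2010": {"units": 22, "ftes": 15, "budget_musd": 11, "output_tb": 6},
    "FY2011": {"units": 15, "ftes": 9, "budget_musd": 8, "output_tb": 29},
}


def derived_ratios(year: dict) -> dict:
    """Return budget per terabase ($M/Tb) and output per staff member (Tb/FTE)."""
    return {
        "musd_per_tb": year["budget_musd"] / year["output_tb"],
        "tb_per_fte": year["output_tb"] / year["ftes"],
    }


for name, year in fiscal_years.items():
    ratios = derived_ratios(year)
    print(f"{name}: {ratios['musd_per_tb']:.2f} $M/Tb, {ratios['tb_per_fte']:.2f} Tb/FTE")
```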
![Page 9: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/9.jpg)
JGI Current Sequencing Platforms

| | Illumina HiSeq | Pacific Biosciences RS | Illumina MiSeq | Illumina GAIIx | Roche/454 FLX-Ti |
|---|---|---|---|---|---|
| Units | 8 | 2 | 2 | 5 | 2 |
| Reads | 1,400 million per flowcell | 0.04 million per SMRT Cell | 5 million per flowcell | 210 million per flowcell | 1 million per PTP |
| Average read length | 150 bp | 2,700 bp | 150 bp | 150 bp | 450 bp |
| Total bases | 325 billion per flowcell | 0.100 billion per SMRT Cell | 2.1 billion per flowcell | 75 billion per flowcell | 0.450 billion per PTP |
| Run time | 16.5 days | 0.08 days (2 hours) | 1 day | 14 days | 0.3 days (8 hours) |
| Applications | Primary sequence generator at JGI | de novo, cDNA, 16S ID, validation | 16S, sample QC, R&D | Replaced by HiSeq | 16S (replaced by MiSeq) |

Platform categories on the slide: Major Platforms, Supplement Platform, Platforms being Phased-out
![Page 10: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/10.jpg)
Portfolio of Library Capabilities

STANDARD
DNA De Novo and Reseq:
• Std frag 270bp, 500bp (amplified/unamp)
• tight insert 250bp, 500bp (amplified/unamp)
• CLIP-PE 4kb, 8kb
Transcriptome Diversity/Counting:
• RNASeq stranded
• RNASeq with/without rRNA depletion (Prok and Meta)
• small RNASeq
• PET RNASeq (5’ and 3’)
Environmental Diversity Profiling:
• 16S Profiling

CUSTOM/R&D
DNA De Novo:
• CLIP-PE fosmid
• CLIP-PE 20kb
• LFPE 4kb, 8kb
• Haplotype resolved sequencing
• single cells/fragments
• PacBio WGS
• PacBio amplicon sequencing
Functional Genomics:
• TSS prokaryotic RNAseq
• Tn insertion site profiling sequencing
• PacBio FL RNA
• PacBio methylation sequencing
• pools of 96 fosmids indexed libraries
• Bisulfite Seq
• chromatin IP
• nano RNAseq
![Page 11: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/11.jpg)
Outline
• Overview of sequencing technologies at JGI
• Pacific Biosciences potentials
• Highlights of application development
![Page 12: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/12.jpg)
Pacific Biosciences Technology
• Single Molecule
– Sequence directly from the molecules in your sample, not the amplification product
• Real time
– Direct observation of natural DNA synthesis in a continuous and processive manner
• Phospholinked Nucleotides
– Fluorescent label is at gamma-phosphate position
– Naturally cleaved during incorporation
![Page 13: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/13.jpg)
Pacific Biosciences Advantages
• Fast run time
• Long read length
• No amplification biases
• Able to measure DNA polymerase kinetics
– Inter-pulse distance
– Pulse duration
• Multiple sequencing modes
– Standard
– Strobe
– Circular consensus
• Disadvantages: high error (indel), low throughput
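The two kinetic measurements named on this slide can be illustrated with a small sketch. It assumes each incorporation event is recorded as a fluorescence pulse with a start and end time, that pulse duration is end minus start, and that inter-pulse distance is the gap between the end of one pulse and the start of the next. The data layout and function names are hypothetical, not the instrument's actual output format.

```python
from typing import List, Tuple

# A pulse is (start_time, end_time) in seconds; this layout is an assumption.
Pulse = Tuple[float, float]


def pulse_durations(pulses: List[Pulse]) -> List[float]:
    """Width of each pulse: how long the labeled nucleotide was held."""
    return [end - start for start, end in pulses]


def inter_pulse_distances(pulses: List[Pulse]) -> List[float]:
    """Gap between the end of one pulse and the start of the next."""
    return [nxt[0] - cur[1] for cur, nxt in zip(pulses, pulses[1:])]


# Made-up example trace with three incorporation events.
trace = [(0.0, 0.5), (1.0, 1.25), (3.0, 3.5)]
print(pulse_durations(trace))
print(inter_pulse_distances(trace))
```

A long gap before a pulse, such as the one before the third event here, is the kind of kinetic signal that can indicate a modified base in the template.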
![Page 14: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/14.jpg)
Less GC Bias Than Newest Illumina Chemistry
[Figure: panels at 28% GC and 73% GC comparing V3 HiSeq, V2 HiSeq, and V2 GAIIx]
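GC bias is judged by relating coverage to the GC content of a genome or region, such as the 28% and 73% GC examples on this slide. As a minimal sketch of the first step, the code below computes GC fraction for a whole sequence and for fixed windows. The function names and window parameters are illustrative, and ambiguous bases (anything other than A, C, G, T) are excluded from the denominator by assumption.

```python
from typing import List


def gc_fraction(seq: str) -> float:
    """Fraction of called bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def windowed_gc(seq: str, window: int, step: int) -> List[float]:
    """GC fraction of each full-length window along the sequence."""
    return [
        gc_fraction(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


print(gc_fraction("GGCCAATT"))
print(windowed_gc("GGGGAAAA", window=4, step=4))
```

Plotting read depth against these window values for each platform is one common way to visualize the kind of bias the slide compares.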
![Page 15: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/15.jpg)
PacBio Data Improves Assembly
Least improved genomes (.. but started out in good shape)
Most improved genome: 53 / 71 (75%) gaps closed
11% of gaps were closed incorrectly with either errors in consensus or misassemblies
![Page 16: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/16.jpg)
PacBio Data Coverage
[Figure: Allpaths assembly (Illumina only) shown with Illumina coverage and PacBio coverage tracks]
• Great coverage of PacBio in gap region
• ~100x coverage
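Coverage tracks like the ones on this slide are per-base read depths computed from alignment intervals. The sketch below shows one simple way to build such a profile and average it over a region such as an assembly gap. It assumes alignments are given as half-open (start, end) coordinates on a single reference; the function names and example intervals are hypothetical.

```python
from typing import List, Tuple


def depth_profile(ref_length: int, alignments: List[Tuple[int, int]]) -> List[int]:
    """Per-base depth from half-open (start, end) alignment intervals."""
    diff = [0] * (ref_length + 1)
    for start, end in alignments:
        start, end = max(0, start), min(ref_length, end)
        if start < end:
            diff[start] += 1   # depth rises where the read begins
            diff[end] -= 1     # and falls just past where it ends
    depth, running = [], 0
    for delta in diff[:ref_length]:
        running += delta
        depth.append(running)
    return depth


def mean_depth(depth: List[int], start: int, end: int) -> float:
    """Average depth over the half-open region [start, end)."""
    return sum(depth[start:end]) / (end - start)


profile = depth_profile(10, [(0, 5), (3, 8), (4, 10)])
print(profile)
print(mean_depth(profile, 3, 8))
```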
![Page 17: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/17.jpg)
PacBio Read Length
[Figure: read length (bp, 0 to 3,500) over a timeline from Oct-10 to Dec-11, annotated with "We started from here", software versions V 1.1.2, V 1.2, V 1.2.1, and V 1.2.2, C2 chemistry, successful upgrades, laser overpower, and instrument fine tuning]
![Page 18: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/18.jpg)
Transcriptome/FL-cDNA Sequencing
Goals: capture the 5’ and 3’ ends of the transcripts and splicing variants
[Figure: alignment before and after correction, shown with annotation and coverage tracks (0x to 800x)]
![Page 19: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/19.jpg)
Transcriptome Coverage
[Figure: PacBio subreads aligned to annotated transcripts, one example per category]
• Transcripts hit (73.3%)
• Transcripts tiled (38.6%)
• Transcripts covered by ≥ 1 subread (36.5%)
• 1/3 of the transcripts (1/2 of transcripts hit by this dataset) are covered by at least one single PacBio subread
• There is NO ambiguity if splice variants are detected
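The three coverage categories on this slide can be made concrete with a small classifier. This is a sketch under assumed definitions: "hit" means at least one subread overlaps the transcript, "tiled" means overlapping subreads together span it end to end, and "single_subread" means one subread alone spans it. Coordinates are half-open and relative to the transcript; the function name and labels are illustrative.

```python
from typing import List, Tuple


def classify_transcript(length: int, subreads: List[Tuple[int, int]]) -> str:
    """Return the most specific category that the subread alignments support."""
    # Clip alignments to the transcript and drop those that do not overlap it.
    hits = [
        (max(0, s), min(length, e))
        for s, e in subreads
        if s < e and s < length and e > 0
    ]
    if not hits:
        return "missed"
    if any(s == 0 and e == length for s, e in hits):
        return "single_subread"
    # Sweep left to right; a gap between subreads means the tiling is broken.
    covered_to = 0
    for s, e in sorted(hits):
        if s > covered_to:
            break
        covered_to = max(covered_to, e)
    return "tiled" if covered_to >= length else "hit"


print(classify_transcript(100, [(10, 50)]))
print(classify_transcript(100, [(0, 60), (50, 100)]))
print(classify_transcript(100, [(-5, 120)]))
```

Note that the slide's percentages are nested (every tiled transcript is also hit), whereas this function reports only the most specific label.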
![Page 20: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/20.jpg)
Error Correction revealed isoforms
J. Martin, Z. Wang
![Page 21: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/21.jpg)
Outline
• Overview of sequencing technologies at JGI
• Pacific Biosciences potentials
• Highlights of application development
![Page 22: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/22.jpg)
Application Development
• Large-insert paired-end sequencing
- 3-5 kb, 8-10 kb, and >20 kb insert size
- CLIP-PE: developed in-house
• RNA sequencing
- 5’ and 3’ end targeted and full-length sequencing
- Metatranscriptome sequencing
• 16S rRNA profiling and identification
- iTag on Illumina MiSeq and 16S ID on PacBio
• Haplotype-resolved sequencing
- Single chromosome sequencing
• Functional genomics:
- Gene synthesis
- Large scale gene disruption
![Page 23: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/23.jpg)
16S Tagging on MiSeq
Targeting V4 region in 16S gene (291 nt in length)
• Use 3rd-read indexing strategy and custom forward sequencing primer to maximize the use of Illumina’s limited read length
• 2x250 bp run to ensure read overlap
[Figure: amplicon design showing the 16S gene with its hypervariable regions (HVR) and the V4 target, the 16S-specific primer, spacer, Illumina adapters 1 and 2, and the Read1, Read2, and barcode priming sites; labeled for Illumina and 454]
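The reason a 2x250 bp run ensures overlap on the 291-nt V4 target is simple arithmetic, sketched below. The calculation assumes both reads start at opposite ends of the 291-nt region and run their full length; the function name is illustrative.

```python
def read_overlap(amplicon_len: int, read_len: int) -> int:
    """Bases covered by both reads of a pair sequenced from opposite ends."""
    return max(0, 2 * read_len - amplicon_len)


print(read_overlap(291, 250))  # 2x250 run on the V4 target
print(read_overlap(291, 150))  # a 2x150 run leaves very little overlap
print(read_overlap(291, 100))  # a 2x100 run leaves a gap in the middle
```

With 2x250 reads the pair overlaps by 209 bases, which gives ample room to merge the two reads into one consensus sequence even after trimming low-quality read ends.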
![Page 24: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/24.jpg)
Amplicon Modification
• 96 samples are pooled in one MiSeq run
• High quality sequencing data were obtained from both reads
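Pooling 96 samples in one run means each read must later be assigned back to its sample by its index (barcode) read. The sketch below shows the idea with exact-match lookup. The barcode sequences, sample names, and the "undetermined" bin are hypothetical, and production demultiplexers typically also tolerate sequencing errors in the index.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def demultiplex(
    reads: List[Tuple[str, str]], barcode_to_sample: Dict[str, str]
) -> Dict[str, List[str]]:
    """Group (index_read, insert_read) pairs by the sample the index maps to."""
    bins = defaultdict(list)
    for index_seq, insert_seq in reads:
        sample = barcode_to_sample.get(index_seq, "undetermined")
        bins[sample].append(insert_seq)
    return dict(bins)


# Hypothetical barcodes and reads for illustration only.
barcodes = {"ACGTAC": "sample_01", "TGCATG": "sample_02"}
pooled = [
    ("ACGTAC", "GATTACA"),
    ("TGCATG", "CCGGTTA"),
    ("ACGTAC", "TTAGGCA"),
    ("NNNNNN", "ACACACA"),
]
print(demultiplex(pooled, barcodes))
```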
![Page 25: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/25.jpg)
Illumina MiSeq Suitable for 16S Tagging
• MiSeq data largely agrees with 454 PyroTag data
• Major differences are in low abundance clusters
![Page 26: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/26.jpg)
Functional Genomics through “Transposon bombing”
• Random Tn insertion mutagenesis
• Cell growth at multiple conditions
• High throughput insertion site sequencing
• Map insertion sites to reference sequence for functional annotation
![Page 27: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/27.jpg)
High-throughput sequencing reveals that “essential” genes appear as transposon-free regions
[Figure: Illumina read depth (0 to 230) plotted along genes; transposon insertions fall in the non-essential genes on either side, while an insertion-free site marks the essential gene dihydroxy-acid dehydratase (required for biosynthesis of amino acids)]
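Once insertion sites are mapped to the reference, spotting candidate essential genes amounts to counting insertions inside each gene and flagging those with none. The sketch below does this with a sorted list and binary search. Gene names, coordinates, and insertion positions are hypothetical, genes are half-open (start, end) intervals, and a real analysis would also account for gene length and insertion density before calling a gene essential.

```python
import bisect
from typing import Dict, List, Tuple


def insertions_per_gene(
    genes: List[Tuple[str, int, int]], insertion_sites: List[int]
) -> Dict[str, int]:
    """Count mapped insertion sites falling inside each gene's [start, end)."""
    sites = sorted(insertion_sites)
    return {
        name: bisect.bisect_left(sites, end) - bisect.bisect_left(sites, start)
        for name, start, end in genes
    }


def insertion_free_genes(counts: Dict[str, int]) -> List[str]:
    """Genes with no insertions at all: candidates for being essential."""
    return [name for name, count in counts.items() if count == 0]


# Hypothetical annotation and insertion positions.
genes = [("geneA", 0, 1000), ("geneB", 1000, 2500), ("geneC", 2500, 3000)]
sites = [50, 400, 999, 2600, 2999]
counts = insertions_per_gene(genes, sites)
print(counts)
print(insertion_free_genes(counts))
```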
![Page 28: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/28.jpg)
Tn Insertion Reveals Essential Genes
Pseudomonas stutzeri RCH2
• Essential genes: 508 (12%)
• Non-essential genes: 3,542 (80%)
• Uncertain: 362 (8%)
[Figure: expected distribution from random insertions compared with the observed distribution of insertions]
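The comparison between expected and observed insertion distributions rests on a simple null model: if insertions land uniformly at random, the chance that a gene escapes all of them shrinks with gene length and library size. The sketch below uses a Poisson approximation for that probability and also recomputes the category percentages from the counts on this slide. The genome length, gene length, and insertion count in the example are hypothetical, and the Poisson model is an assumption rather than the method used in the talk.

```python
import math


def prob_no_insertion(gene_len: int, genome_len: int, n_insertions: int) -> float:
    """Poisson estimate of a gene receiving zero insertions under uniform random insertion."""
    expected_hits = n_insertions * gene_len / genome_len
    return math.exp(-expected_hits)


# Hypothetical example: a 1 kb gene, a 4 Mb genome, and 40,000 insertions.
print(prob_no_insertion(gene_len=1_000, genome_len=4_000_000, n_insertions=40_000))

# Category counts as quoted on the slide.
categories = {"essential": 508, "non_essential": 3542, "uncertain": 362}
total = sum(categories.values())
shares = {name: round(100 * count / total) for name, count in categories.items()}
print(total, shares)
```

Under these example numbers a gene of that size would almost never be insertion-free by chance, which is why a long gene without insertions is strong evidence of essentiality while a short one may remain uncertain.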
![Page 29: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/29.jpg)
Single Chromosome Sequencing
• Metaphase chromosomes
• Single chromosome in droplet or micro-well
• MDA/PCR amplification
Abbreviations: MM: micromanipulator; MF: microfluidics; LCM: Laser Capture Microdissector
![Page 30: Sequencing Technologies and Applications at JGI Feng Chen, Ph.D. 05/14/2012 MGM Workshops](https://reader033.vdocuments.us/reader033/viewer/2022051516/56649de65503460f94adf45d/html5/thumbnails/30.jpg)
Thank you very much!
Questions?