Final Results
Genome Assembly Team
Kelley Bullard, Henry Dewhurst, Kizee Etienne, Esha Jain, VivekSagar KR, Benjamin Metcalf, Raghav Sharma, Charles Wigington, Juliette Zerick
Original Pipeline

PRE-PROCESSING
• Inputs: 454 raw reads, Illumina raw reads
• Pre-processing outputs: 454 reads, Illumina reads
• Statistical analysis: read stats
• Tools: FastQC, PRINSEQ, NGS QC

REFERENCE SELECTION
• Published genomes from public databases: V. vulnificus YJ016, V. vulnificus CMCP6, V. vulnificus MO6-24/O
• Align Illumina reads against the reference: bwa
• Compare mapping statistics: samstats
• Output: reference genome

DENOVO ASSEMBLY (Illumina / 454 / hybrid)
• 454 de novo: Newbler, CABOG, SUTTA
• Illumina de novo: Allpaths LG, SOAPdenovo, Velvet, Taipan, SUTTA
• Hybrid de novo: Ray, MIRA
• Parameter optimization
• Align Illumina reads against 454 contigs: MacVector, CLC wb
• Outputs: contigs * 3, unmapped reads
• Evaluation: GAGE, Hawk-eye
• Reference evaluation: DNAdiff, MUMmer

REFERENCE BASED ASSEMBLY
• Illumina/(454?) reference based assembly: AMOScmp
• Outputs: contigs, unmapped reads
• Reference evaluation: DNAdiff, MUMmer

CONTIG MERGING
• All possible combinations of the best 3
• Tools: Minimus, MAIA, PAGIT, Mauve
• Output: draft/finished genome

GENOME FINISHING
• Gap filling, nucleotide identity
• Tools: MUMmer, GRASS, built-in
• Evaluation: GAGE
• Outputs: scaffolds, finished genome

LEGEND
Process; 454; Illumina; hybrid; Info.; Chosen Ref.; Assemblers (454, Illumina, hybrid)
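The contig-merging stage of the original pipeline calls for all possible combinations of the best 3 assemblies. A minimal sketch of enumerating those combinations follows. The assembly labels and the `merge_candidates` helper are hypothetical, and the slide does not say whether only pairs were merged or also all three at once, so the combination sizes are left as a parameter.

```python
from itertools import combinations


def merge_candidates(assemblies, sizes=(2, 3)):
    """Return every combination of the given assemblies for each requested size."""
    candidates = []
    for size in sizes:
        candidates.extend(combinations(assemblies, size))
    return candidates


# Hypothetical labels standing in for the three best-scoring assemblies.
best_three = ["assembly_A", "assembly_B", "assembly_C"]

for combo in merge_candidates(best_three):
    print(" + ".join(combo))
```

Passing `sizes=(2,)` restricts the enumeration to pairwise merges.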
Read Visualization – spot the differences
Pipeline / Read Processing / Assembler Results / Contig Merging / Assembler Review / Pipeline / Final Results
Comparison of 454 Reads for 08-2462 (low coverage) and 2541-90 (improved coverage)
Read Visualization - more is better!
Nav 08-2462 454 reads compared to Nav 08-2462 Illumina reads.
Read Visualization – cousins or siblings?
Nav_2541-90 and Vul_06-2432 (454 and Illumina reads) coverage comparison.
V. navarrensis (454; non-preprocessed | pre-processed)
Metric 2423-01 08-2462 2541-90 2756-81
Per Base Seq. Quality
Per Seq. Quality Score
Per Base Seq. Content
Per Base GC Content
Per Seq. GC Content
Per Base N Content
Seq. Length Dist.
Seq. Dup. Levels
Overrepresented Seqs.
Kmer Content
V. vulnificus (454; non-preprocessed | preprocessed)
Metric 2009V_1368 06-2432 08-2435 08-2439 07-2444
Per Base Seq. Quality
Per Seq. Quality Score
Per Base Seq. Content
Per Base GC Content
Per Seq. GC Content
Per Base N Content
Seq. Length Dist.
Seq. Dup. Levels
Overrepresented Seqs.
Kmer Content
V. navarrensis (Illumina; non-preprocessed | preprocessed)
Metric 2423-01 08-2462 2541-90 2756-81
Per Base Seq. Quality
Per Seq. Quality Score
Per Base Seq. Content
Per Base GC Content
Per Seq. GC Content
Per Base N Content
Seq. Length Dist.
Seq. Dup. Levels
Overrepresented Seqs.
Kmer Content
V. vulnificus (Illumina; non-preprocessed | preprocessed)
Metric 2009V_1368 06-2432 08-2435 08-2439 07-2444
Per Base Seq. Quality
Per Seq. Quality Score
Per Base Seq. Content
Per Base GC Content
Per Seq. GC Content
Per Base N Content
Seq. Length Dist.
Seq. Dup. Levels
Overrepresented Seqs.
Kmer Content
ARE – Assembly Score
Reference-guided vs de-Novo assembly
Figure: ARE (y-axis, 0 to 90) by assembler: AMOScmp, Newbler (ref), CABOG, Newbler (dn), SOAPdn, Velvet, Ray. Series: 454 (Vul_06-2432), 454 (Nav_2541-90), Illumina (Vul_06-2432), Illumina (Nav_2541-90).
Summary of Reference-guided assembly
• Used the V. vulnificus (CMCP6) reference strain
• 84% coverage
• De novo assemblers overall gave higher assembly scores than reference-based assembly
De Novo Assembly
Figure: Newbler (denovo). ARE (y-axis, 0 to 100) against k-mer size (x-axis: 40, 50, 100) for Nav_2541-90 and Vul_06-2432.
De Novo Assembly
Figure: CABOG. ARE (y-axis, 0 to 50) against k-mer size (x-axis: 15, 22, 25) for Nav_2541-90 and Vul_06-2432.
De Novo Assembly
Figure: SOAPdenovo. ARE (y-axis, 0 to 4) against k-mer size (x-axis: 20, 30, 40, 50, 60, 70) for Nav_2541-90 and Vul_06-2432.
De Novo Assembly
Figure: Velvet. ARE (y-axis, 0 to 7) against k-mer size (x-axis: 19, 25, 31) for Nav_2541-90 and Vul_06-2432.
De-Novo Assembler Comparison (Optimal Parameters)
Figure: ARE (y-axis, 0 to 100) by assembler: CABOG, Newbler (dn), Ray, Ray (hybrid), SOAPdn, Velvet. Series: 454 (Vul_06-2432), Illumina (Vul_06-2432), 454 (Nav_2541-90), Illumina (Nav_2541-90), Hybrid (Vul_06-2432), Hybrid (Nav_2541-90).
Final Results – V. vulnificus
Figure: assemblers (labels include Velvet and CABOG) compared on 3 criteria: Assembly Score, Span Ratio, and 1/(Break Points). Higher scores on all criteria are preferable. Newbler (dn) has been removed to show the variance among the other tools.
Final Results – V. vulnificus
Figure: assemblers compared on 3 criteria: Assembly Score, Span Ratio, and 1/(Break Points) (axis label: 1000/(Break Points)). Higher scores on all criteria are preferable.
Summary of de-Novo results
• OLC assemblers (CABOG, Newbler) differed considerably in ARE from de Bruijn graph based assemblers (SOAPdenovo, Velvet)
• The hybrid assembler, Ray, did not perform as well in terms of assembly score
Merging: Vul_06-2432
Assembler | AMOScmp | CABOG | Newbler (dn;454) | Newbler (ref;454) | Newbler (ref;Illumina) | Ray (454) | Ray (Illumina) | Ray (hybrid) | SOAPdn | Velvet
AMOScmp | - | 164.00 | 234.69 | 6.35 | 4.69 | 63.51 | 55.13 | 64.51 | 44.38 | 67.22
CABOG | 164.00 | - | 225.12 | 101.30 | 62.66 | 73.23 | 93.88 | 98.11 | 75.98 | 113.08
Newbler (dn;454) | 234.69 | 221.89 | - | 5.48 | ND | 311.98 | ND | 419.76 | 104.46 | 127.01
Newbler (ref;454) | 6.35 | 99.30 | 5.48 | - | 1.44 | 67.72 | 64.99 | 72.79 | 35.07 | 72.34
Newbler (ref;Illumina) | 4.69 | 62.66 | ND | 1.44 | - | 35.28 | ND | ND | ND | ND
Ray (454) | 63.50 | 72.56 | 311.99 | 67.72 | 35.28 | - | 33.81 | 49.94 | 22.92 | 37.68
Ray (Illumina) | 55.13 | 93.88 | ND | 64.99 | ND | 33.81 | - | ND | ND | ND
Ray (hybrid) | 64.51 | 97.17 | 419.76 | 72.79 | ND | 49.94 | ND | - | ND | ND
SOAPdn | 44.38 | 75.98 | 104.46 | 35.07 | ND | 22.92 | ND | ND | - | ND
Velvet | 67.22 | 113.08 | 127.01 | 72.34 | ND | 37.68 | ND | ND | ND | -
Merging: Nav_2541-90
Assembler | AMOScmp | CABOG | Newbler (dn) | Newbler (ref;454) | Newbler (ref;Illumina) | Ray (454) | Ray (Illumina) | Ray (hybrid) | SOAPdn | Velvet
AMOScmp | - | 133.95 | ND | 0.03 | 0.03 | 15.26 | 14.00 | 15.77 | 11.23 | 45.32
CABOG | 133.95 | - | ND | 107.60 | 114.60 | 82.62 | 92.44 | 92.53 | 80.73 | 123.02
Newbler (dn) | ND | ND | - | ND | ND | 54.21 | 59.81 | 60.47 | 33.17 | 94.89
Newbler (ref;454) | 0.03 | 107.60 | 59.94 | - | 0.11 | 11.6 | 11.78 | 11.86 | 10.17 | 39.2
Newbler (ref;Illumina) | 0.03 | 114.60 | ND | 0.28 | - | 12.66 | 12.15 | 12.41 | 9.6 | 39.60
Ray (454) | 15.26 | 82.62 | 54.21 | 11.60 | 12.66 | - | 59.19 | 76.36 | 13.65 | 63.75
Ray (Illumina) | 14.01 | 92.44 | 59.81 | 11.78 | 12.15 | 33.79 | - | 24.21 | 11.54 | 39.84
Ray (hybrid) | 15.77 | 92.53 | 60.47 | 11.86 | 12.41 | 40.33 | 36.79 | - | 14.06 | ND
SOAPdn | 11.22 | 80.73 | 33.17 | 10.04 | 9.54 | 13.61 | 11.40 | 13.91 | - | 8.47
Velvet | 45.32 | 123.02 | 94.89 | 39.20 | 39.84 | 64.54 | 39.84 | ND | 8.31 | -
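One way to read the two merging tables is to rank the merged pairs by their table value and take the top pair for each strain. The sketch below does this for a handful of cells copied from the tables. It assumes the values are assembly scores where higher is better, treats ND as a missing value, and uses a hypothetical helper name, `rank_merges`.

```python
ND = None  # "ND" cells in the tables: no value reported

# Excerpt of (assembler A, assembler B) -> value, copied from the merging tables.
# Each value is read from the row of the first-named assembler.
merge_scores = {
    "Vul_06-2432": {
        ("Newbler (dn;454)", "Ray (hybrid)"): 419.76,
        ("Newbler (dn;454)", "Ray (454)"): 311.98,
        ("AMOScmp", "Newbler (dn;454)"): 234.69,
        ("CABOG", "Newbler (dn;454)"): 225.12,
        ("AMOScmp", "CABOG"): 164.00,
        ("CABOG", "Velvet"): 113.08,
        ("Newbler (dn;454)", "Ray (Illumina)"): ND,
    },
    "Nav_2541-90": {
        ("AMOScmp", "CABOG"): 133.95,
        ("CABOG", "Velvet"): 123.02,
        ("CABOG", "Newbler (ref;Illumina)"): 114.60,
        ("Newbler (dn)", "Velvet"): 94.89,
        ("Ray (454)", "Ray (hybrid)"): 76.36,
        ("AMOScmp", "Newbler (dn)"): ND,
    },
}


def rank_merges(pairs):
    """Sort merged pairs by value, highest first, skipping ND entries."""
    scored = [(pair, value) for pair, value in pairs.items() if value is not None]
    return sorted(scored, key=lambda item: item[1], reverse=True)


for strain, pairs in merge_scores.items():
    (first, second), score = rank_merges(pairs)[0]
    print(f"{strain}: {first} + {second} ({score:.2f})")
```

On this excerpt the top pair for Vul_06-2432 is Newbler (dn;454) + Ray (hybrid) at 419.76, and for Nav_2541-90 it is AMOScmp + CABOG at 133.95.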
Assembler Review
Assembler Status 454 Illumina Hybrid Algorithm
Allpaths LG Paired-end only DBG
AMOScmp BB
CABOG OLC
MIRA ZEBRA
Newbler OLC
Ray DBG
SOAPdenovo DBG
SUTTA Unresolved errors BB
Velvet DBG
BB = branch-and-bound; OLC = overlap-layout-consensus; DBG = de Bruijn graph; ZEBRA
MIRA performed as well as our merged contigs, but its 40 hr run time makes it impractical
Final Pipeline

PRE-PROCESSING
• Inputs: 454 raw reads, Illumina raw reads
• Pre-processing outputs: 454 reads, Illumina reads
• Statistical analysis: read stats
• Tools: FastQC, PRINSEQ

DENOVO ASSEMBLY (Illumina / 454 / hybrid)
• 454 de novo: Newbler, CABOG
• Illumina de novo: Velvet
• Hybrid de novo: Ray, MIRA
• Align Illumina reads against 454 contigs
• Output: contigs

CONTIG MERGING (Minimus)
• Merge Ray-hyb/Newbler
• Merge CABOG/Velvet
• MIRA-hyb
• Output: draft genome

LEGEND
Process; 454; Illumina; hybrid; Info.; Assemblers (454, Illumina, hybrid)
Splinter
Pipeline 1
Strain | NUM | AVG | N50 | Assembly Size | Assembly Score
Nav_2423-01 | 106 | 42657.2 | 156064 | 4.52 | 136.53
Nav_08-2462 | 149 | 25736.8 | 51230 | 3.83 | 19.48
Nav_2541-90 | 166 | 26172.5 | 130386 | 4.34 | 62.57
Nav_2756-81 | 107 | 42939.4 | 131591 | 4.59 | 122.31
Vul_2009v-1368 | 83 | 57787.2 | 401973 | 4.80 | 345.03
Vul_06-2432 | 57 | 85122.7 | 322525 | 4.85 | 419.76
Vul_08-2435 | 111 | 42872.9 | 230373 | 4.76 | 144.01
Vul_08-2439 | 98 | 50885.7 | 250789 | 4.99 | 210.94
Vul_07-2444 | 70 | 73255.1 | 492706 | 5.13 | 656.10

Pipeline 2
Strain | NUM | AVG | N50 | Assembly Size | Assembly Score
Nav_2423-01 | 125 | 35357.0 | 164305 | 4.42 | 111.36
Nav_08-2462 | 451 | 311.9 | 2253 | 0.14 | 0.09
Nav_2541-90 | 106 | 40547.5 | 169781 | 4.30 | 123.02
Nav_2756-81 | 111 | 41840.8 | 132119 | 4.64 | 124.55
Vul_2009v-1368 | 97 | 49705.8 | 228408 | 4.82 | 170.81
Vul_06-2432 | 167 | 28489.7 | 78353 | 4.76 | 32.53
Vul_08-2435 | 193 | 24903.7 | 204178 | 4.85 | 75.19
Vul_08-2439 | 114 | 44047.9 | 180889 | 5.02 | 134.64
Vul_07-2444 | 143 | 35905.1 | 130942 | 5.13 | 85.93
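The two result tables can be compared strain by strain. The sketch below copies the Assembly Score column from each table and reports which pipeline scores higher. It assumes the first table belongs to Pipeline 1 and the second to Pipeline 2, and that a higher Assembly Score is better; `better_pipeline` is a hypothetical helper name.

```python
# Assembly Score per strain as (Pipeline 1, Pipeline 2), copied from the two tables.
assembly_scores = {
    "Nav_2423-01": (136.53, 111.36),
    "Nav_08-2462": (19.48, 0.09),
    "Nav_2541-90": (62.57, 123.02),
    "Nav_2756-81": (122.31, 124.55),
    "Vul_2009v-1368": (345.03, 170.81),
    "Vul_06-2432": (419.76, 32.53),
    "Vul_08-2435": (144.01, 75.19),
    "Vul_08-2439": (210.94, 134.64),
    "Vul_07-2444": (656.10, 85.93),
}


def better_pipeline(score_p1, score_p2):
    """Name the pipeline with the higher Assembly Score, or 'tie' if equal."""
    if score_p1 > score_p2:
        return "Pipeline 1"
    if score_p2 > score_p1:
        return "Pipeline 2"
    return "tie"


choice = {strain: better_pipeline(*scores) for strain, scores in assembly_scores.items()}

for strain, winner in choice.items():
    print(f"{strain}: {winner}")
```

With these values Pipeline 1 scores higher for seven of the nine strains; Nav_2541-90 and Nav_2756-81 score higher under Pipeline 2.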
Visualization
Merged
Newbler Ray Hybrid