Assembly: before and after
A talk I gave at the Dec 2013 Assembly Masterclass at UC Davis. Really licensed under CC0. Updated May 2014 for the presentation I gave at the combined SeRC Nordic Assembly Workshop in Stockholm, Sweden, May 14th 2014.
![Page 2: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/2.jpg)
A warning
The list is by no means complete
Nor do we have experience with all the programs mentioned
![Page 3: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/3.jpg)
Sample → DNA isolation → DNA → Sequencing → Reads → Assembly → Genome assembly
QC after each step
![Page 4: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/4.jpg)
Reads → Assembly → Genome assembly
QC
![Page 5: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/5.jpg)
FastQC
![Page 6: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/6.jpg)
Prinseq
![Page 7: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/7.jpg)
Many others…
www.nipgr.res.in/ngsqctoolkit.html
![Page 8: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/8.jpg)
preqc (SGA)
http://arxiv.org/abs/1307.8026
![Page 9: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/9.jpg)
Reads → Assembly → Genome assembly
Grooming
![Page 10: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/10.jpg)
Format conversion
http://en.wikipedia.org/wiki/FASTQ_format
Fastq format hell
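One common corner of the format problem is the quality encoding. A minimal sketch, assuming the input uses the older Illumina Phred+64 encoding (not Solexa scores) and the target is Phred+33; the function name is made up for illustration:

```python
def phred64_to_phred33(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33.

    Assumption: plain Phred scores with offset 64. Solexa-scaled scores,
    which can be negative, are rejected here rather than converted.
    """
    converted = []
    for char in quality:
        score = ord(char) - 64
        if score < 0:
            raise ValueError(f"{char!r} is not a valid Phred+64 character")
        converted.append(chr(score + 33))
    return "".join(converted)


if __name__ == "__main__":
    # 'h' encodes Q40 with offset 64; 'I' encodes Q40 with offset 33
    print(phred64_to_phred33("hhhh"))
```

Only the quality line of each record changes; header and sequence lines stay as they are.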
![Page 11: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/11.jpg)
Adapter/quality trimming
http://www.biostars.org/p/53528/
Celera assembler
Overlap based trimming
Fastx Toolkit
Seqtk
PrinSeq
NGS QC Toolkit
Trimmomatic
BioPieces
Cutadapt
……
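To make the idea of quality trimming concrete, here is a deliberately simple sketch: cut low-quality bases off the 3' end of a read. It is not the algorithm of any tool listed above (those typically use sliding windows or running sums); the function name, the threshold of 20 and the Phred+33 offset are assumptions for illustration.

```python
def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33):
    """Remove trailing bases whose quality is below min_q.

    seq and qual are the sequence and quality lines of one FASTQ record.
    Returns the trimmed (seq, qual) pair, possibly empty.
    """
    end = len(seq)
    # walk backwards until a base with acceptable quality is found
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # 'I' is Q40 and '#' is Q2 in Phred+33
    print(trim_3prime("ACGTACGT", "IIIII###"))
```

Reads that end up very short after trimming are usually dropped, and for paired data the mate has to be handled consistently.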
![Page 12: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/12.jpg)
Mate pair splitting and orientation
Illumina paired end reads: 150 – 600 bases
Illumina mate pair reads: 2 – 40 kilobases
454 mate pair reads: 2 – 40 kilobases, with linker
![Page 13: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/13.jpg)
Mate pair splitting and orientation
Illumina paired end reads
Illumina mate pair reads
454 mate pair reads
linker
junction
paired end reads ‘contamination’
![Page 14: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/14.jpg)
Mate pair splitting and orientation
Check what orientation your assembler expects for the reads!
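If the orientation does not match, one common fix is to reverse-complement both reads of each pair. A minimal sketch, assuming outward-facing mate pair reads that need to become inward-facing; the function names are illustrative and the quality strings are simply reversed along with the sequences:

```python
# translation table for complementing DNA, keeping case and N
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_pair(read1, read2):
    """Reverse-complement both reads of a pair.

    Each read is a (sequence, quality) tuple. Quality strings are
    reversed so that they stay aligned with the flipped sequence.
    """
    seq1, qual1 = read1
    seq2, qual2 = read2
    return (
        (reverse_complement(seq1), qual1[::-1]),
        (reverse_complement(seq2), qual2[::-1]),
    )


if __name__ == "__main__":
    print(flip_pair(("AACG", "ABCD"), ("TTGN", "EFGH")))
```

Whether this step is needed at all depends on the library type and on the assembler, so check the documentation first.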
![Page 15: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/15.jpg)
Reads → Assembly → Genome assembly
Preparing
![Page 16: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/16.jpg)
Error-correction
Stand-alone or built into assembler
![Page 17: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/17.jpg)
Merging pairs
List from Torsten Seemann’s blog
http://thegenomefactory.blogspot.no/2012/11/tools-to-merge-overlapping-paired-end.html
COPE http://sourceforge.net/projects/coperead/
SeqPrep https://github.com/jstjohn/SeqPrep
FLASH http://www.cbcb.umd.edu/software/flash
fastq-join http://code.google.com/p/ea-utils/wiki/FastqJoin
PANDAseq https://github.com/neufeld/pandaseq
mergePairs.py http://code.google.com/p/standardized-velvet-assembly-report/source/browse/trunk/mergePairs.py
Recent addition
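The idea behind merging, as a minimal sketch: when the fragment is shorter than the two reads combined, the end of read 1 overlaps the start of the reverse-complemented read 2. The code below uses exact matching only and is not how any of the listed tools work (they tolerate mismatches and use base qualities); `merge_pair` and `min_overlap` are made-up names.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10):
    """Merge an overlapping read pair into one longer sequence.

    Looks for the longest exact overlap between the end of read1 and the
    start of the reverse complement of read2. Returns the merged sequence,
    or None when no overlap of at least min_overlap bases is found.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for overlap in range(longest, min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTAACCGG"
    forward = fragment[:14]                       # read 1, from the left end
    reverse = reverse_complement(fragment[8:])    # read 2, from the right end
    print(merge_pair(forward, reverse, min_overlap=5) == fragment)
```

With short minimum overlaps, chance matches become likely, which is one reason the real tools score overlaps instead of requiring identity.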
![Page 18: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/18.jpg)
Extend reads
http://140.116.235.124/~tliu/arf-pe/
![Page 19: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/19.jpg)
Digital normalisation
http://arxiv.org/abs/1203.4802
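A minimal sketch of the idea described in the linked paper: keep a read only if the median abundance of its k-mers, among the reads kept so far, is still below a cutoff. Assumptions for illustration: exact counts in a `Counter` instead of a memory-efficient probabilistic counter, no handling of reverse-complement k-mers, and made-up function names.

```python
from collections import Counter
from statistics import median


def kmers(seq: str, k: int):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalisation(reads, k: int = 20, cutoff: int = 20):
    """Return the subset of reads that passes the coverage cutoff.

    A read is kept, and its k-mers counted, only while the median count
    of its k-mers is below cutoff. Reads shorter than k are skipped.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue
        if median(counts[kmer] for kmer in read_kmers) < cutoff:
            counts.update(read_kmers)
            kept.append(read)
    return kept


if __name__ == "__main__":
    reads = ["ACGTTGCAAG"] * 10 + ["TTTTCCCCGGGG"]
    print(digital_normalisation(reads, k=4, cutoff=3))
```

In the toy run, only the first three copies of the repeated read survive, while the read seen once is kept.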
![Page 20: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/20.jpg)
Estimate kmer to use
preqc (SGA)
http://arxiv.org/abs/1307.8026
![Page 21: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/21.jpg)
Reads → Assembly → Genome assembly
What can the reads tell us about the genome
![Page 22: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/22.jpg)
kmer-based
preqc (SGA) http://arxiv.org/abs/1307.8026
Kmerspectrumanalyzer
khmer from Titus
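What these tools start from is the k-mer spectrum: how many distinct k-mers occur once, twice, three times and so on in the reads. A minimal sketch with exact counting, which only works for small data sets; the function name is made up and reverse-complement k-mers are not collapsed:

```python
from collections import Counter


def kmer_spectrum(reads, k: int) -> Counter:
    """Return a histogram mapping abundance -> number of distinct k-mers."""
    counts = Counter(
        read[i:i + k]
        for read in reads
        for i in range(len(read) - k + 1)
    )
    return Counter(counts.values())


if __name__ == "__main__":
    spectrum = kmer_spectrum(["ACGTA", "ACGTA", "CGTAC"], k=4)
    for abundance in sorted(spectrum):
        print(abundance, spectrum[abundance])
```

On real data, k-mers with very low abundance are mostly sequencing errors, and the position of the main peak reflects the k-mer coverage of the genome.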
![Page 23: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/23.jpg)
Reads → Assembly → Genome assembly
This talk
![Page 24: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/24.jpg)
Reads → Assembly → Genome assembly
QC
![Page 25: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/25.jpg)
Genome assembly
Comparing to each other
Metrics
Merging
Improvement
Visualization
Validation
Comparing to reference
![Page 26: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/26.jpg)
Genome assembly
Comparing to each other
Metrics
Merging
Improvement
Visualization
Validation
Comparing to reference
![Page 27: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/27.jpg)
Assemblathon stats
http://korflab.ucdavis.edu/datasets/Assemblathon/Assemblathon2/Basic_metrics/assemblathon_stats.pl
OR
https://github.com/lexnederbragt/sequencetools/
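As an illustration of the kind of metric such scripts report, here is a minimal sketch of N50: the length of the sequence at which the running total, going from longest to shortest, reaches half of the assembly size. This is not taken from either script above; the function names and the simple FASTA handling are assumptions.

```python
def sequence_lengths(fasta_lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)          # a header starts a new sequence
        elif line and lengths:
            lengths[-1] += len(line)   # sequence may span several lines
    return lengths


def n50(lengths):
    """Return the N50 of a list of sequence lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    print(n50([80, 70, 50, 40, 30, 20]))
```

For a file on disk, `n50(sequence_lengths(open("contigs.fasta")))` would do, with `contigs.fasta` standing in for a real path.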
![Page 28: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/28.jpg)
Genome assembly
Comparing to each other
Metrics
Merging
Improvement
Visualization
Validation
Comparing to reference
![Page 29: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/29.jpg)
Gap closing
IMAGE2
![Page 30: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/30.jpg)
Correcting bases
Quiver from Pacific Biosciences
![Page 31: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/31.jpg)
Separate scaffolding
![Page 32: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/32.jpg)
Genome assembly
Comparing to each other
Metrics
Merging
Improvement
Visualization
Validation
Comparing to reference
![Page 33: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/33.jpg)
Assembly merging/reconciliation
![Page 34: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/34.jpg)
Genome assembly
Comparing to each other
Metrics
Merging
Improvement
Visualization
Validation
Comparing to reference
![Page 35: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/35.jpg)
Mapped genomic reads
FRCbam
![Page 36: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/36.jpg)
Mapped transcriptomic reads
![Page 37: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/37.jpg)
Gene finding
![Page 38: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/38.jpg)
Binning
Nederbragt et al, 2010
![Page 39: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/39.jpg)
Genome assembly
Comparing to each other
Metrics
Merging
Improvement
Visualization
Validation
Comparing to reference
![Page 40: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/40.jpg)
Genome browser(s)
IGV
![Page 41: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/41.jpg)
Genome assembly
Comparing to each other
Metrics
Merging
Improvement
Visualization
Validation
Comparing to reference
![Page 42: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/42.jpg)
Comparative measures
Log Average Probability (LAP)
Assembly Likelihood Evaluation (ALE)
See also Howison, Zapata and Dunn (2013) Toward a statistically explicit understanding of de novo sequence assembly, doi: 10.1093/bioinformatics/btt525
![Page 43: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/43.jpg)
Genome assembly
Comparing to each other
Metrics
Merging
Improvement
Visualization
Validation
Comparing to reference
![Page 44: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/44.jpg)
Reference comparison
Mauve assembly metrics
![Page 45: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/45.jpg)
Review
![Page 46: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/46.jpg)
Too many tools…
http://seqanswers.com/wiki/Software/list
![Page 47: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/47.jpg)
Too many tools…
http://wwwdev.ebi.ac.uk/fg/hts_mappers
88 short-read mappers
![Page 48: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/48.jpg)
Embargo!
![Page 49: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/49.jpg)
Benchmarking, anyone?
![Page 50: Assembly: before and after](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e8544b4c905fc368b4579/html5/thumbnails/50.jpg)
All-in-one assembly pipeline
doi:10.1186/1471-2105-15-126