genome assembly: then and now — v1.1
DESCRIPTION
This talk was given on 2014-06-19 for the Genome Center’s Bioinformatics Core as part of a one-week workshop on using Galaxy. It covers the Assemblathon projects as well as other aspects of genome assembly. A version of this talk, with embedded notes, is also available on SlideShare. Note that this is an evolving talk; older and newer versions are also available on SlideShare.
TRANSCRIPT
![Page 1: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/1.jpg)
Genome assembly: then and now
Keith Bradnam
Image from Wellcome Trust
![Page 2: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/2.jpg)
Image from flickr.com/photos/dougitdesign/5613967601/
Contents
Sequencing 101
Genome assembly: then
Genome assembly: now
Assemblathon 1 & 2
Advice & Angst
The future
![Page 3: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/3.jpg)
More info
✤ http://assemblathon.org
✤ http://gigasciencejournal.com
✤ http://twitter.com/assemblathon
![Page 4: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/4.jpg)
Sequencing 101
A, C, G, T...
Image from nlm.nih.gov
![Page 5: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/5.jpg)
Read
![Page 6: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/6.jpg)
Read pair
![Page 7: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/7.jpg)
Read pair
Mate pair
![Page 8: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/8.jpg)
![Page 9: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/9.jpg)
Contigs
![Page 10: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/10.jpg)
![Page 11: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/11.jpg)
![Page 12: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/12.jpg)
Scaffold
NNNNNNNNNNNNNNNNNNN
![Page 13: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/13.jpg)
Assembly size
[Figure: an assembly drawn as 12 scaffolds, some containing runs of Ns, with lengths of 70, 25, 20, 15, 15, 15, 10, 10, 5, 5, 5, and 5 Mbp]
![Page 14: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/14.jpg)
Assembly size
[Figure: an assembly drawn as 12 scaffolds, some containing runs of Ns, with lengths of 70, 25, 20, 15, 15, 15, 10, 10, 5, 5, 5, and 5 Mbp; total = 200 Mbp]
![Page 15: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/15.jpg)
N50 length
[Figure: 12 scaffolds of 70, 25, 20, 15, 15, 15, 10, 10, 5, 5, 5, and 5 Mbp; assembly size = 200 Mbp]
![Page 16: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/16.jpg)
N50 length
[Figure: 12 scaffolds of 70, 25, 20, 15, 15, 15, 10, 10, 5, 5, 5, and 5 Mbp; assembly size = 200 Mbp]
![Page 17: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/17.jpg)
N50 length
[Figure: 12 scaffolds of 70, 25, 20, 15, 15, 15, 10, 10, 5, 5, 5, and 5 Mbp; assembly size = 200 Mbp; running total = 70 Mbp]
![Page 18: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/18.jpg)
N50 length
[Figure: 12 scaffolds of 70, 25, 20, 15, 15, 15, 10, 10, 5, 5, 5, and 5 Mbp; assembly size = 200 Mbp; running total = 95 Mbp]
![Page 19: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/19.jpg)
N50 length
[Figure: 12 scaffolds of 70, 25, 20, 15, 15, 15, 10, 10, 5, 5, 5, and 5 Mbp; assembly size = 200 Mbp; running total = 95 Mbp]
![Page 20: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/20.jpg)
N50 length
[Figure: 12 scaffolds of 70, 25, 20, 15, 15, 15, 10, 10, 5, 5, 5, and 5 Mbp; assembly size = 200 Mbp; running total = 115 Mbp]
![Page 21: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/21.jpg)
N50 length
[Figure: 12 scaffolds of 70, 25, 20, 15, 15, 15, 10, 10, 5, 5, 5, and 5 Mbp; assembly size = 200 Mbp; running total = 115 Mbp]
![Page 22: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/22.jpg)
N50 length
[Figure: 12 scaffolds of 70, 25, 20, 15, 15, 15, 10, 10, 5, 5, 5, and 5 Mbp; assembly size = 200 Mbp]
![Page 23: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/23.jpg)
N50 length
[Figure: 12 scaffolds of 70, 25, 20, 15, 15, 15, 10, 10, 5, 5, 5, and 5 Mbp]
![Page 24: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/24.jpg)
N50 length
[Figure: 12 scaffolds of 70, 25, 20, 15, 15, 15, 10, 10, 5, 5, 5, and 5 Mbp]
![Page 25: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/25.jpg)
N50 length
[Figure: the same assembly with two 5 Mbp scaffolds removed, leaving 10 scaffolds of 70, 25, 20, 15, 15, 15, 10, 10, 5, and 5 Mbp]
![Page 26: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/26.jpg)
N50 length
[Figure: the same assembly with two 5 Mbp scaffolds removed, leaving 10 scaffolds of 70, 25, 20, 15, 15, 15, 10, 10, 5, and 5 Mbp]
![Page 27: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/27.jpg)
N50 length
[Figure: 10 scaffolds of 70, 25, 20, 15, 15, 15, 10, 10, 5, and 5 Mbp; assembly size = 190 Mbp]
![Page 28: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/28.jpg)
N50 length
[Figure: 10 scaffolds of 70, 25, 20, 15, 15, 15, 10, 10, 5, and 5 Mbp; assembly size = 190 Mbp]
![Page 29: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/29.jpg)
N50 for two assemblies
![Page 30: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/30.jpg)
N50 for two assemblies
208 Mbp 190 Mbp
![Page 31: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/31.jpg)
N50 for two assemblies
208 Mbp assembly: N50 = 15 Mbp
190 Mbp assembly: N50 = 25 Mbp
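The N50 walkthrough on the preceding slides (sort the sequences from longest to shortest, then add lengths until the running total reaches half the assembly size) can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `n50` is made up here, and the two length lists are the scaffold labels as read from the N50 slides (lengths in Mbp), with the 200 Mbp version assumed to contain two extra 5 Mbp scaffolds.

```python
def n50(lengths):
    """Return the length of the sequence at which the running total
    (longest first) reaches at least half of the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no sequence lengths given")


# Scaffold lengths in Mbp, as read from the slides (an assumption).
trimmed = [70, 25, 20, 15, 15, 15, 10, 10, 5, 5]  # 190 Mbp assembly
full = trimmed + [5, 5]                            # 200 Mbp assembly

print(sum(full), n50(full))        # 200 20
print(sum(trimmed), n50(trimmed))  # 190 25
```

Dropping the two shortest scaffolds shrinks the assembly yet raises N50 from 20 to 25 Mbp, which is why N50 values from assemblies of different sizes are not directly comparable.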
![Page 32: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/32.jpg)
NG50 for two assemblies
208 Mbp 190 Mbp
![Page 33: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/33.jpg)
NG50 for two assemblies
![Page 34: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/34.jpg)
NG50 for two assemblies
Expected genome size = 250 Mbp
![Page 35: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/35.jpg)
Expected genome size = 250 Mbp
NG50 for two assemblies
![Page 36: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/36.jpg)
NG50 = 15 Mbp for both assemblies
Expected genome size = 250 Mbp
NG50 for two assemblies
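NG50 uses the same running-total idea as N50, but measures against half of the expected genome size rather than half of the assembly size. A minimal sketch follows; the function name `ng50` and the choice to return `None` when the assembly covers less than half the genome are illustrative assumptions, and the length list is the 190 Mbp example as read from the earlier slides (lengths in Mbp).

```python
def ng50(lengths, genome_size):
    """Return the length of the sequence at which the running total
    (longest first) reaches half of the expected genome size, or None
    if the assembly never gets that far."""
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= genome_size:
            return length
    return None


# Scaffold lengths in Mbp for the 190 Mbp example (as read from the slides).
assembly = [70, 25, 20, 15, 15, 15, 10, 10, 5, 5]

print(ng50(assembly, 250))  # 15
```

With an expected genome size of 250 Mbp the target is 125 Mbp, which is reached at a 15 Mbp scaffold, matching the NG50 of 15 Mbp shown for the 190 Mbp assembly. Because every assembly of the same genome is measured against the same target, NG50 values can be compared directly.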
![Page 37: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/37.jpg)
You should check that high N50 values are not simply due to lots of Ns in the scaffolds!
![Page 38: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/38.jpg)
Assembly 'x'
![Page 39: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/39.jpg)
Assembly 'x'
Size: 859 Mbp
Number of scaffolds: 28
N50 = 70.3 Mbp
![Page 40: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/40.jpg)
Assembly 'x'
Size: 859 Mbp
Number of scaffolds: 28
N50 = 70.3 Mbp
Ns = 90.6% !!!
![Page 41: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/41.jpg)
Assembly 'x'
Size: 859 Mbp
Number of scaffolds: 28
N50 = 70.3 Mbp
Ns = 90.6% !!!
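One way to run this check is to measure what fraction of the scaffold sequence is N and to split scaffolds back into contigs at the gaps before recomputing length statistics. The sketch below is illustrative only: the helper names `percent_n` and `split_into_contigs`, the `min_gap` parameter, and the toy sequences are all made up for this example and are not taken from Assembly 'x'.

```python
import re


def percent_n(scaffolds):
    """Percentage of all scaffold bases that are N (gap) characters."""
    total = sum(len(s) for s in scaffolds)
    n_count = sum(s.upper().count("N") for s in scaffolds)
    return 100.0 * n_count / total if total else 0.0


def split_into_contigs(scaffold, min_gap=1):
    """Split a scaffold into contigs at runs of at least min_gap Ns."""
    pieces = re.split("[Nn]{%d,}" % min_gap, scaffold)
    return [p for p in pieces if p]


# Toy scaffolds (hypothetical): one with a 12 bp gap, one gap-free.
scaffolds = ["ACGT" + "N" * 12 + "ACGT", "ACGTACGT"]

print(round(percent_n(scaffolds), 1))
print([len(c) for s in scaffolds for c in split_into_contigs(s)])
```

Comparing scaffold lengths against the contig lengths recovered this way shows how much of an impressive scaffold N50 is real sequence and how much is gap.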
![Page 42: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/42.jpg)
Basic assembly metrics
![Page 43: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/43.jpg)
Basic assembly metrics
| Metric | Description |
| --- | --- |
| Assembly size | With or without very short contigs? |
| N50 / NG50 | For contigs and/or scaffolds |
| Coverage | When compared to a reference sequence |
| Errors | Base errors from alignment to reference sequence and/or input read data |
| Number of genes | From comparison to reference transcriptome and/or set of known genes |
![Page 44: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/44.jpg)
Basic assembly metrics
| Metric | Description |
| --- | --- |
| Assembly size | With or without very short contigs? |
| N50 / NG50 | For contigs and/or scaffolds |
| Coverage | When compared to a reference sequence |
| Errors | Base errors from alignment to reference sequence and/or input read data |
| Number of genes | From comparison to reference transcriptome and/or set of known genes |
And many, many more...
![Page 45: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/45.jpg)
Genome assembly
Back in the day...
![Page 46: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/46.jpg)
Genome assembly
Back in the day...
1998
![Page 47: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/47.jpg)
Genome assembly: then
![Page 48: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/48.jpg)
Genetic maps ✓
Genome assembly: then
![Page 49: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/49.jpg)
Genetic maps ✓
Physical maps ✓
Genome assembly: then
![Page 50: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/50.jpg)
Genetic maps ✓
Physical maps ✓
Understanding of target genome ✓
Genome assembly: then
![Page 51: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/51.jpg)
Genetic maps ✓
Physical maps ✓
Understanding of target genome ✓
Haploid / low heterozygosity genome ✓
Genome assembly: then
![Page 52: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/52.jpg)
Genetic maps ✓
Physical maps ✓
Understanding of target genome ✓
Haploid / low heterozygosity genome ✓
Accurate & long reads ✓
Genome assembly: then
![Page 53: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/53.jpg)
Genetic maps ✓
Physical maps ✓
Understanding of target genome ✓
Haploid / low heterozygosity genome ✓
Accurate & long reads ✓
Resources (time, money, people) ✓
Genome assembly: then
![Page 54: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/54.jpg)
So what was the result of spending millions of dollars to assemble genomes of well-characterized species, with accurate long reads and detailed maps?
![Page 55: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/55.jpg)
✤ 2000: published genome size = 125 Mbp
✤ 2007: genome size = 157 Mbp
✤ 2012: genome size = 135 Mbp
Arabidopsis thaliana
![Page 56: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/56.jpg)
✤ 2000: published genome size = 125 Mbp
✤ 2007: genome size = 157 Mbp
✤ 2012: genome size = 135 Mbp
✤ Amount sequenced = 119 Mbp
Arabidopsis thaliana
![Page 57: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/57.jpg)
✤ 2000: published genome size = 125 Mbp
✤ 2007: genome size = 157 Mbp
✤ 2012: genome size = 135 Mbp
✤ Amount sequenced = 119 Mbp
✤ Ns = 0.2% of genome
Arabidopsis thaliana
![Page 58: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/58.jpg)
Two views of the same gene
![Page 59: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/59.jpg)
Two views of the same gene
Top: from genome sequence view on TAIR web site
Bottom: from gene sequence file on TAIR FTP site
![Page 60: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/60.jpg)
Drosophila melanogaster
✤ Genome published 2000
✤ Heterochromatin finished 2007
![Page 61: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/61.jpg)
Drosophila melanogaster
✤ Genome published 2000
✤ Heterochromatin finished 2007
✤ Ns = 4% of genome
![Page 62: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/62.jpg)
Caenorhabditis elegans
✤ Genome published 1998
✤ 2004: last N removed
![Page 63: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/63.jpg)
Caenorhabditis elegans
✤ Genome published 1998
✤ 2004: last N removed
✤ 1998–2014: genome sequence changes
![Page 64: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/64.jpg)
Caenorhabditis elegans
✤ Genome published 1998
✤ 2004: last N removed
✤ 1998–2014: genome sequence changes
✤ 558 insertions
✤ 230 deletions
✤ 614 substitutions
![Page 65: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/65.jpg)
Caenorhabditis elegans
✤ Genome published 1998
✤ 2004: last N removed
✤ 1998–2014: genome sequence changes
✤ 558 insertions
✤ 230 deletions
✤ 614 substitutions
} Nov 2012
![Page 66: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/66.jpg)
Saccharomyces cerevisiae
✤ Genome published 1997
✤ 12 Mbp genome
✤ 1,653 changes to genome since 1997
![Page 67: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/67.jpg)
Saccharomyces cerevisiae
✤ Genome published 1997
✤ 12 Mbp genome
✤ 1,653 changes to genome since 1997
✤ Last changes made in 2011
![Page 68: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/68.jpg)
Genetic maps ✓
Physical maps ✓
Understanding of target genome ✓
Haploid / low heterozygosity genome ✓
Accurate & long reads ✓
Resources (time, money, people) ✓
Genome assembly: then
![Page 69: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/69.jpg)
Genetic maps ✗
Physical maps ✗
Understanding of target genome ✗
Haploid / low heterozygosity genome ✗
Accurate & long reads ✗
Resources (time, money, people) ✗
Genome assembly: now
![Page 70: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/70.jpg)
Assembling & finishing a genome is not easy!
![Page 71: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/71.jpg)
Assemblathons
A new idea is born
Image from flickr.com/photos/dullhunk/4422952630
![Page 72: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/72.jpg)
![Page 73: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/73.jpg)
If you sequence 10,000 genomes...
...you need to assemble 10,000 genomes
![Page 74: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/74.jpg)
How many assembly tools are out there?
![Page 75: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/75.jpg)
How many assembly tools are out there?
bambus2, Ray, Celera, MIRA, ALLPATHS-LG, SGA, Curtain, Metassembler, Phusion, ABySS, Amos, Arapan, CLC, Cortex, DNAnexus, DNA Dragon, Edena, Forge, Geneious, IDBA, Newbler, PRICE, PADENA, PASHA, Phrap, TIGR, Sequencher, SeqMan NGen, SHARCGS, SOPRA, SSAKE, SPAdes, Taipan, VCAKE, Velvet, Arachne, PCAP, GAM, Monument, Atlas, ABBA, Anchor, ATAC, Contrail, DecGPU, GenoMiner, Lasergene, PE-Assembler, Pipeline Pilot, QSRA, SeqPrep, SHORTY, fermi, Telescoper, Quast, SCARPA, Hapsembler, HapCompass, HaploMerger, SWiPS, GigAssembler, MSR-CA, MaSuRCA, GARM, Cerulean, TIGRA, ngsShoRT, PERGA, SOAPdenovo, REAPR, FRCBam, EULER-SR, SSPACE, Opera, mip, gapfiller, image, PBJelly, HGAP, FALCON, Dazzler, GGAKE, A5, CABOG, SHRAP, SR-ASM, SuccinctAssembly, SUTTA, Ragout, Tedna, Trinity, SWAP-Assembler, SILP3, AutoAssemblyD, KGBAssembler, MetAMOS, iMetAMOS, MetaVelvet-SL, KmerGenie, Nesoni, Pilon, Platanus, CGAL, GAGM, Enly, BESST, Khmer, GRIT, IDBA-MTP, dipSPAdes, WhatsHap, SHEAR, ELOPER, OMACC
![Page 76: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/76.jpg)
How many assembly tools are out there?
![Page 77: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/77.jpg)
How many assembly tools are out there?
![Page 78: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/78.jpg)
How many assembly tools are out there?
Which is the best?
![Page 79: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/79.jpg)
Comparing assemblers
✤ Can't fairly compare two assemblers if they:
![Page 80: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/80.jpg)
Comparing assemblers
✤ Can't fairly compare two assemblers if they:
✤ produced assemblies from different species
![Page 81: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/81.jpg)
Comparing assemblers
✤ Can't fairly compare two assemblers if they:
✤ produced assemblies from different species
✤ assembled same species, but used sequence data from different sequencing technologies
![Page 82: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/82.jpg)
Comparing assemblers
✤ Can't fairly compare two assemblers if they:
✤ produced assemblies from different species
✤ assembled same species, but used sequence data from different sequencing technologies
✤ used same sequencing technologies but have different sequence libraries
![Page 83: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/83.jpg)
Comparing assemblers
✤ Can't fairly compare two assemblers if they:
✤ produced assemblies from different species
✤ assembled same species, but used sequence data from different sequencing technologies
✤ used same sequencing technologies but have different sequence libraries
✤ Even using different options for the same assembler may produce very different assemblies!
![Page 84: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/84.jpg)
The PRICE genome assembler has 52 command-line options!!!
![Page 85: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/85.jpg)
The PRICE genome assembler has 52 command-line options!!!
how many of them are you going to learn?
![Page 86: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/86.jpg)
A genome assembly competition
![Page 87: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/87.jpg)
An attempt to standardize some aspects of the genome assembly process
Genome assembly contests
![Page 88: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/88.jpg)
✤ 2010–2011
✤ Used synthetic data
✤ Small genome (~100 Mbp)
✤ We knew the answer!
Assemblathon 1
![Page 89: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/89.jpg)
Here we go again
![Page 90: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/90.jpg)
| | Type of data | Number of genomes | Size of genomes | Do we know the answer? |
| --- | --- | --- | --- | --- |
| Assemblathon 1 | Synthetic | 1 | Small | ✓ |
![Page 91: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/91.jpg)
| | Type of data | Number of genomes | Size of genomes | Do we know the answer? |
| --- | --- | --- | --- | --- |
| Assemblathon 1 | Synthetic | 1 | Small | ✓ |
| Assemblathon 2 | Real | 3 | Large | ✗ |
![Page 92: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/92.jpg)
Melopsittacus undulatus
Boa constrictor constrictor
Maylandia zebra
![Page 93: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/93.jpg)
Bird
Snake
Fish
![Page 94: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/94.jpg)
Why these three species?
![Page 95: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/95.jpg)
Why these three species?
Because they were there
![Page 96: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/96.jpg)
| Species | Estimated genome size |
| --- | --- |
| Bird | 1.2 Gbp |
| Fish | 1.0 Gbp |
| Snake | 1.6 Gbp |
Assemble this!
![Page 97: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/97.jpg)
| Species | Estimated genome size | Illumina |
| --- | --- | --- |
| Bird | 1.2 Gbp | 285x (14 libraries) |
| Fish | 1.0 Gbp | 192x (8 libraries) |
| Snake | 1.6 Gbp | 125x (4 libraries) |
Assemble this!
![Page 98: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/98.jpg)
| Species | Estimated genome size | Illumina | Roche 454 |
| --- | --- | --- | --- |
| Bird | 1.2 Gbp | 285x (14 libraries) | 16x (3 libraries) |
| Fish | 1.0 Gbp | 192x (8 libraries) | |
| Snake | 1.6 Gbp | 125x (4 libraries) | |
Assemble this!
![Page 99: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/99.jpg)
| Species | Estimated genome size | Illumina | Roche 454 | PacBio |
| --- | --- | --- | --- | --- |
| Bird | 1.2 Gbp | 285x (14 libraries) | 16x (3 libraries) | 10x (2 libraries) |
| Fish | 1.0 Gbp | 192x (8 libraries) | | |
| Snake | 1.6 Gbp | 125x (4 libraries) | | |
Assemble this!
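To get a feel for how much raw data these coverage figures imply, multiply fold coverage by the estimated genome size. The sketch below is a rough illustration only: it assumes the genome sizes and Illumina coverage values pair up in the order the species are listed (bird, fish, snake), and the function name `total_gbp` is made up for this example.

```python
# Assumption: (estimated genome size in Gbp, Illumina fold coverage),
# paired in the order the species are listed on the slide.
datasets = {
    "bird": (1.2, 285),
    "fish": (1.0, 192),
    "snake": (1.6, 125),
}


def total_gbp(genome_size_gbp, coverage):
    """Approximate read volume in Gbp implied by a fold-coverage figure."""
    return genome_size_gbp * coverage


for species, (size, cov) in datasets.items():
    print(f"{species}: roughly {total_gbp(size, cov):.0f} Gbp of Illumina reads")
```

Under those assumptions each species comes with a few hundred Gbp of Illumina reads, which gives a sense of the scale participants had to handle.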
![Page 100: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/100.jpg)
Who took part?
![Page 101: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/101.jpg)
Who took part?
![Page 102: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/102.jpg)
Who took part?
21 teams
43 assemblies
52,013,623,777 bp of sequence
![Page 103: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/103.jpg)
| Species | Competitive entries |
| --- | --- |
| Bird | 12 |
| Fish | 10 |
| Snake | 12 |
Entries
![Page 104: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/104.jpg)
| Species | Competitive entries | Evaluation entries |
| --- | --- | --- |
| Bird | 12 | 3 |
| Fish | 10 | 6 |
| Snake | 12 | 0 |
Entries
![Page 105: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/105.jpg)
Goals
![Page 106: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/106.jpg)
Goals
✤ Assess 'quality' of assemblies
![Page 107: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/107.jpg)
Goals
✤ Assess 'quality' of assemblies
✤ Define quality!
![Page 108: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/108.jpg)
Goals
✤ Assess 'quality' of assemblies
✤ Define quality!
✤ Produce ranking of assemblies for each species
![Page 109: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/109.jpg)
Goals
✤ Assess 'quality' of assemblies
✤ Define quality!
✤ Produce ranking of assemblies for each species
✤ Produce ranking of assemblers across species?
![Page 110: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/110.jpg)
Who did what?
| Person/group | Jobs |
| --- | --- |
| Me, Ian Korf, and Joseph Fass | Perform various analyses of all assemblies |
| David Schwartz et al. | Produce & evaluate optical maps |
| Jay Shendure et al. | Produce Fosmid sequences (bird & snake only) |
| Martin Hunt & Thomas Otto | Perform REAPR analysis |
| Dent Earl & Benedict Paten | Help with meta-analysis of final rankings |
![Page 111: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/111.jpg)
91 co-authors!
flickr.com/photos/jamescridland/613445810
![Page 112: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/112.jpg)
Results!
![Page 113: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/113.jpg)
Lots of results!
![Page 114: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/114.jpg)
![Page 115: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/115.jpg)
102 different metrics!
![Page 116: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/116.jpg)
10 key metrics
![Page 117: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/117.jpg)
| Key metric | Description |
| --- | --- |
| 1 | NG50 scaffold length |
| 2 | NG50 contig length |
| 3 | Amount of assembly in 'gene-sized' scaffolds |
| 4 | Number of 'core genes' present |
| 5 | Fosmid coverage |
| 6 | Fosmid validity |
| 7 | Short-range scaffold accuracy |
| 8 | Optical map: level 1 |
| 9 | Optical map: levels 1–3 |
| 10 | REAPR summary score |
![Page 118: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/118.jpg)
![Page 119: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/119.jpg)
1) Scaffold NG50 lengths
✤ Can calculate NG50 length for each assembly
✤ But also calculate NG60, NG70 etc.
✤ Plot all results as a graph
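
As a rough illustration of the bullets above, the sketch below computes NG(x) lengths for x = 10 to 90, assuming NG50 is measured against an estimated genome size rather than the total assembly size. The function name `ngx_length` and the scaffold lengths are made up for this example; they are not Assemblathon data.

```python
def ngx_length(lengths, genome_size, x=50):
    """Return the NG(x) length: the scaffold length L such that scaffolds
    of length >= L together cover at least x% of the estimated genome size.
    Returns 0 if the assembly is too small to reach that fraction."""
    running_total = 0
    for length in sorted(lengths, reverse=True):
        running_total += length
        # Integer comparison avoids floating-point rounding issues.
        if running_total * 100 >= genome_size * x:
            return length
    return 0


# Hypothetical scaffold lengths (bp) and estimated genome size (bp).
scaffold_lengths = [90, 70, 50, 40, 30, 20]
estimated_genome_size = 400

# One value per threshold; these are the points one would plot as a curve.
ng_series = {
    x: ngx_length(scaffold_lengths, estimated_genome_size, x)
    for x in range(10, 100, 10)
}

for x, length in ng_series.items():
    print(f"NG{x}: {length}")
```

Because this toy assembly totals only 300 bp against a 400 bp genome estimate, the higher thresholds (NG80, NG90) come out as 0, which is one reason plotting the whole series can be more informative than a single NG50 value.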
![Page 120: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/120.jpg)
1) Scaffold NG50 lengths
![Page 121: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/121.jpg)
2) Contig vs scaffold NG50
![Page 122: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/122.jpg)
2) Contig vs scaffold NG50
![Page 123: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/123.jpg)
2) Contig vs scaffold NG50
![Page 124: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/124.jpg)
3) Gene-sized scaffolds
![Page 125: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/125.jpg)
![Page 126: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/126.jpg)
![Page 127: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/127.jpg)
![Page 128: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/128.jpg)
3) Gene-sized scaffolds
✤ Some assembly folks get a little obsessed by length
✤ How long is 'long enough' for a scaffold?
✤ What if you just wanted to find genes?
✤ Average vertebrate gene = ~25 Kbp
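
The 'gene-sized' idea can be sketched as the fraction of sequence that sits in scaffolds of at least 25 Kbp. The choice of denominator (total assembly size here, though an estimated genome size would also be reasonable) is an assumption of this sketch, and the scaffold lengths are invented.

```python
GENE_SIZED_BP = 25_000  # ~average vertebrate gene length quoted on the slide


def fraction_in_gene_sized_scaffolds(lengths, denominator, min_length=GENE_SIZED_BP):
    """Fraction of `denominator` (bp) contained in scaffolds >= min_length."""
    gene_sized_total = sum(length for length in lengths if length >= min_length)
    return gene_sized_total / denominator


# Hypothetical scaffold lengths in bp.
lengths = [120_000, 40_000, 25_000, 10_000, 5_000]
fraction = fraction_in_gene_sized_scaffolds(lengths, denominator=sum(lengths))
print(f"{fraction:.1%} of the assembly is in gene-sized scaffolds")
```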
![Page 129: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/129.jpg)
3) Gene-sized scaffolds
![Page 130: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/130.jpg)
4) Core genes
![Page 131: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/131.jpg)
![Page 132: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/132.jpg)
![Page 133: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/133.jpg)
![Page 134: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/134.jpg)
4) Core genes
✤ Used CEGMA (Core Eukaryotic Gene Mapping Approach)
✤ CEGMA uses a set of 458 'Core Eukaryotic Genes' (CEGs)
✤ CEGs are conserved in: S. cerevisiae, S. pombe, A. thaliana, C. elegans, D. melanogaster, and H. sapiens
✤ How many full-length CEGs are in each assembly?
![Page 135: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/135.jpg)
![Page 136: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/136.jpg)
4) Core genes
Core genes (out of 458)

| Species | Best individual assembly | Across all assemblies |
| --- | --- | --- |
| Bird | 420 | 442 |
| Fish | 436 | 455 |
| Snake | 438 | 454 |

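
The gap between the best individual assembly and the total across all assemblies can be pictured as a set union. The sketch below uses invented assembly names and CEG identifiers (`assembly_A`, `CEG001`, and so on are placeholders, not CEGMA output).

```python
# Hypothetical sets of full-length core genes found in each assembly.
ceg_hits = {
    "assembly_A": {"CEG001", "CEG002", "CEG003"},
    "assembly_B": {"CEG002", "CEG004"},
    "assembly_C": {"CEG001", "CEG005"},
}

# Best individual assembly: the one with the most core genes.
best_name, best_set = max(ceg_hits.items(), key=lambda item: len(item[1]))

# Across all assemblies: every core gene found by at least one assembly.
found_anywhere = set().union(*ceg_hits.values())

print(f"Best individual assembly: {best_name} ({len(best_set)} core genes)")
print(f"Across all assemblies: {len(found_anywhere)} core genes")
```

A core gene that is missing from the best assembly but present in another one is therefore in the reads somewhere; it was lost (or fragmented) during assembly rather than being absent from the genome.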
![Page 137: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/137.jpg)
4) Core genes
![Page 138: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/138.jpg)
ABYSS MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVML-------KNVED
BCM   MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVML-------KNVED
CRACS MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVML-------KNVED
CURT  MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVML-------KNVED
GAM   MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVMLFYEVRKIKNVED
MERAC MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVML-------KNVED
PHUS  MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVML-------KNVED
RAY   MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVML-------KNVED
SGA   MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVML-------KNVED
SYMB  MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVMLFYEVRKIKNVED
SOAP  MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVML-------KNVED
      ************************************************       *****

ABYSS FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
BCM   FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
CRACS FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
CURT  FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
GAM   FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNNLPHTHI
MERAC FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
PHUS  FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
RAY   FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
SGA   FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
SYMB  FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
SOAP  FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
      ******************************************************

ABYSS ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
BCM   ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
CRACS ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
CURT  ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
GAM   YGHALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLK------------------
MERAC ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
PHUS  ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
RAY   ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
SGA   ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
SYMB  ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
SOAP  ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
         ***************************************
4) Core genes
![Page 139: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/139.jpg)
8 & 9) Optical maps
![Page 140: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/140.jpg)
![Page 141: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/141.jpg)
![Page 142: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/142.jpg)
![Page 143: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/143.jpg)
![Page 144: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/144.jpg)
8 & 9) Optical maps
✤ Stretch out DNA
✤ Cut with restriction enzymes
✤ Note lengths of fragments
✤ Compare to in silico digest of scaffolds
✤ Not all scaffolds suitable for analysis
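
The 'in silico digest' step in the list above can be sketched as follows: find every occurrence of a restriction site in a scaffold and report the resulting fragment lengths, which could then be compared with fragment lengths measured from the optical map. The enzyme used here (EcoRI, site GAATTC, cutting after the first base) is only an illustrative assumption; the slides do not say which enzyme was used, and the function name and scaffold sequence are invented.

```python
def in_silico_digest(sequence, site, cut_offset=0):
    """Return fragment lengths after cutting `sequence` at every occurrence
    of the recognition `site`. `cut_offset` is the cut position within the
    site (e.g. 1 for an enzyme that cuts after the first base)."""
    sequence = sequence.upper()
    site = site.upper()

    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        # Step forward one base so overlapping sites are also found.
        start = sequence.find(site, start + 1)

    boundaries = [0] + cut_positions + [len(sequence)]
    # Skip zero-length fragments (e.g. a cut exactly at the sequence start).
    return [end - begin for begin, end in zip(boundaries, boundaries[1:]) if end > begin]


# Hypothetical scaffold containing two EcoRI sites.
scaffold = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
fragments = in_silico_digest(scaffold, site="GAATTC", cut_offset=1)
print(fragments)
```

A real comparison would also need to handle runs of Ns inside scaffolds and very short fragments that an optical map cannot resolve, which is presumably part of why not all scaffolds were suitable for analysis.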
![Page 145: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/145.jpg)
8 & 9) Optical maps
Image from University of Wisconsin-Madison
![Page 146: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/146.jpg)
8 & 9) Optical maps
![Page 147: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/147.jpg)
8 & 9) Optical maps
![Page 148: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/148.jpg)
8 & 9) Optical maps
![Page 149: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/149.jpg)
What does this all mean?
![Page 150: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/150.jpg)
102 metrics per assembly
10 key metrics
1 final ranking
![Page 151: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/151.jpg)
![Page 152: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/152.jpg)
![Page 153: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/153.jpg)
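
| Assembly | Number of core genes | Rank | Z-score |
| --- | --- | --- | --- |
| CRACS | 438 | 1 | +0.68 |
| SYMB | 436 | 2 | +0.59 |
| PHUS | 435 | 3 | +0.54 |
| BCM | 434 | 4 | +0.49 |
| SGA | 433 | 5 | +0.44 |
| MERAC | 430 | 6 | +0.30 |
| ABYSS | 429 | 7 | +0.25 |
| SOAP | 428 | 8 | +0.21 |
| RAY | 422 | 9 | –0.08 |
| GAM | 415 | 10 | –0.41 |
| CURT | 360 | 11 | –3.02 |
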
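
The Z-scores on this slide can be reproduced from the core gene counts if one assumes they are standard scores computed with the population standard deviation across the 11 assemblies. That assumption is inferred from the numbers themselves; the slide does not state how the scores were calculated.

```python
from statistics import mean, pstdev

# Core gene counts per assembly, as listed on the slide.
core_gene_counts = {
    "CRACS": 438, "SYMB": 436, "PHUS": 435, "BCM": 434, "SGA": 433,
    "MERAC": 430, "ABYSS": 429, "SOAP": 428, "RAY": 422, "GAM": 415,
    "CURT": 360,
}

counts = list(core_gene_counts.values())
mu = mean(counts)
sigma = pstdev(counts)  # population standard deviation (an assumption)

# Z-score: how many standard deviations each assembly is from the mean.
z_scores = {
    name: round((count - mu) / sigma, 2)
    for name, count in core_gene_counts.items()
}

for name, z in z_scores.items():
    print(f"{name:6s} {z:+.2f}")
```

Unlike a plain rank, the Z-score shows how far apart assemblies really are: ranks 1 to 8 sit within about half a standard deviation of each other, while the last-placed assembly is roughly three standard deviations below the mean.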
![Page 154: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/154.jpg)
![Page 155: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/155.jpg)
![Page 156: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/156.jpg)
![Page 157: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/157.jpg)
![Page 158: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/158.jpg)
![Page 159: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/159.jpg)
![Page 160: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/160.jpg)
![Page 161: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/161.jpg)
![Page 162: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/162.jpg)
![Page 163: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/163.jpg)
What does this all mean?
![Page 164: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/164.jpg)
No really, what does this all mean?
![Page 165: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/165.jpg)
Some conclusions
✤ Very hard to find assemblers that performed well across all 10 key metrics
✤ Assemblers that perform well in one species do not always perform as well in another
✤ Bird & snake assemblies appear better than fish
✤ No real 'winner' for bird and fish
![Page 166: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/166.jpg)
SGA — best assembler for snake?
![Page 167: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/167.jpg)
SGA — best assembler for snake?
![Page 168: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/168.jpg)

| Description | Rank of snake SGA assembly |
| --- | --- |
| NG50 scaffold length | 2 |
| NG50 contig length | 5 |
| Amount of assembly in 'gene-sized' scaffolds | 7 |
| Number of 'core genes' present | 5 |
| Fosmid coverage | 2 |
| Fosmid validity | 2 |
| Short-range scaffold accuracy | 3 |
| Optical map: level 1 | 2 |
| Optical map: levels 1–3 | 1 |
| REAPR summary score | 2 |

![Page 169: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/169.jpg)
![Page 170: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/170.jpg)
Best assembler across species?
![Page 171: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/171.jpg)
Best assembler across species?

| Assembler | Number of 1st places (out of 27) |
| --- | --- |
| BCM | 5 |
| Meraculous | 4 |
| Symbiose | 4 |
| Ray | 3 |

Excluding evaluation entries
![Page 172: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/172.jpg)
![Page 173: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/173.jpg)
Ray performance

| Species | Final ranking |
| --- | --- |
| Bird | 7th |
| Fish | 7th |
| Snake | 9th |

![Page 174: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/174.jpg)
![Page 175: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/175.jpg)
![Page 176: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/176.jpg)
![Page 177: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/177.jpg)
![Page 178: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/178.jpg)
![Page 179: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/179.jpg)
![Page 180: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/180.jpg)
BCM bird assemblies

| Assembler | Final rank | NGS data used in assembly | Coverage Z-score | Validity Z-score | NG50 contig Z-score |
| --- | --- | --- | --- | --- | --- |
| BCM - evaluation | 1 | Illumina + 454 | +2.0 | +1.4 | +1.5 |
| BCM - competitive | 2 | Illumina + 454 + PacBio | –0.3 | –0.8 | +2.7 |

![Page 181: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/181.jpg)
BCM evaluation scaffold
NNNNNNNNNNNNNNNNNNN
![Page 182: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/182.jpg)
BCM evaluation scaffold
NNNNNNNNNNNNNNNNNNN
BCM competition scaffold
NNNNNNNNNNNNNNNNNNN
![Page 183: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/183.jpg)
BCM evaluation scaffold
NNNNNNNNNNNNNNNNNNN
BCM competition scaffold
NNNNNNNNNNNNNNNNNNN
PacBio sequence
![Page 184: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/184.jpg)
BCM evaluation scaffold
NNNNNNNNNNNNNNNNNNN
BCM competition scaffold
CGTCGNNATCNNGGTTACG
![Page 185: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/185.jpg)
BCM evaluation scaffold
NNNNNNNNNNNNNNNNNNN
BCM competition scaffold
CGTCGNNATCNNGGTTACG
Mismatches from the PacBio sequence penalized the alignment score more than matching unknown bases (Ns) did
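
A toy scoring scheme shows how this can happen. The slides do not name the tool or the option involved, so everything below is hypothetical: the scoring values (match +1, mismatch -3, N scored 0), the function name, and the 'true' reference sequence are invented purely to illustrate the effect. Only the competition scaffold string is taken from the slide.

```python
def toy_alignment_score(reference, scaffold, match=1, mismatch=-3, unknown=0):
    """Score an ungapped, equal-length alignment base by base.
    Unknown bases (N) in the scaffold are neither rewarded nor penalized."""
    if len(reference) != len(scaffold):
        raise ValueError("toy example expects sequences of equal length")
    score = 0
    for ref_base, scaffold_base in zip(reference, scaffold):
        if scaffold_base == "N":
            score += unknown
        elif scaffold_base == ref_base:
            score += match
        else:
            score += mismatch
    return score


# Hypothetical 'true' sequence for the region (not from the slides).
reference = "CGTAGTTATGAAGGATACC"

# Evaluation entry: the gap is left as unknown bases.
evaluation_scaffold = "N" * len(reference)
# Competition entry: gap partly filled with error-prone long-read sequence.
competition_scaffold = "CGTCGNNATCNNGGTTACG"

evaluation_score = toy_alignment_score(reference, evaluation_scaffold)
competition_score = toy_alignment_score(reference, competition_scaffold)
print(evaluation_score, competition_score)
```

With these made-up penalties, the scaffold that fills most of the gap correctly (11 matching bases, 4 mismatches) still scores lower than the scaffold that says nothing at all, which is the kind of outcome the slide describes.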
![Page 186: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/186.jpg)
The choice of one command-line option, used by one tool in the calculation of one key metric...
...probably made enough difference to drop the PacBio-containing assembly to 2nd place.
![Page 187: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/187.jpg)
Other conclusions
✤ Different metrics tell different stories
✤ Heterozygosity was a big issue for bird & fish assemblies
✤ Final rankings very sensitive to changes in metrics
✤ N50 is a semi-useful predictor of assembly quality
![Page 188: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/188.jpg)
![Page 189: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/189.jpg)
![Page 190: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/190.jpg)
Inter-specific differences matter
![Page 191: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/191.jpg)
![Page 192: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/192.jpg)
Inter-specific differences matter
✤ The three species have genomes with different properties
  ✤ repeats
  ✤ heterozygosity
✤ The three genomes had very different NGS data sets
  ✤ Only bird had PacBio & 454 data
  ✤ Different insert sizes in short-insert libraries
![Page 193: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/193.jpg)
The Big Conclusion
![Page 194: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/194.jpg)
The Big Conclusion
"You can't always get what you want"
Sir Michael Jagger, 1969
![Page 195: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/195.jpg)
What comes next?
![Page 196: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/196.jpg)
What comes next?
![Page 197: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/197.jpg)
What comes next?
3?
![Page 198: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/198.jpg)
A wish list for Assemblathon 3
![Page 199: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/199.jpg)
![Page 200: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/200.jpg)
![Page 201: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/201.jpg)
![Page 202: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/202.jpg)
![Page 203: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/203.jpg)
![Page 204: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/204.jpg)
![Page 205: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/205.jpg)
A wish list for Assemblathon 3
✤ Only have 1 species
✤ Teams have to 'buy' resources using virtual budgets
✤ Factor in CPU time/cost?
✤ Agree on metrics before evaluating assemblies
✤ Encourage experimental assemblies
✤ Use new FASTG genome assembly file format
✤ Get someone else to write the paper!
![Page 206: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/206.jpg)
Intermission
![Page 207: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/207.jpg)
NGS must die!
![Page 208: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/208.jpg)
![Page 209: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/209.jpg)
NGS must die!
‘NGS’ is used to refer to everything post-Sanger
Pyrosequencing was developed ~1996
![Page 210: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/210.jpg)
![Page 211: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/211.jpg)
![Page 212: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/212.jpg)
![Page 213: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/213.jpg)
![Page 214: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/214.jpg)
![Page 215: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/215.jpg)
![Page 216: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/216.jpg)
![Page 217: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/217.jpg)
NGS madness
Next generation sequencing
aka second generation sequencing
but there’s also: third generation sequencing
fourth generation sequencing
next-next generation sequencing
next-next-next generation sequencing
![Page 218: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/218.jpg)
![Page 219: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/219.jpg)
NGS madness

| Technology | According to some papers… | According to other papers… |
| --- | --- | --- |
| Complete Genomics | 2nd generation | 3rd generation |
| Ion Torrent | 2nd generation | 3rd generation |
| PacBio | 2nd generation | 3rd generation |
| Oxford Nanopore | 3rd generation | 4th generation |

![Page 220: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/220.jpg)
NGS madness
“PacBio is a 2.5th generation”
“Helicos lies between the transition of next-generation to third generation”
![Page 221: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/221.jpg)
![Page 222: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/222.jpg)
![Page 223: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/223.jpg)
NGS madness
There are different sequencing methodologies, and there are different sequencing platforms.
Use one or the other.
Or just say ‘current sequencing technologies’.
![Page 224: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/224.jpg)
Intermission
![Page 228: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/228.jpg)
![Page 229: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/229.jpg)
![Page 230: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/230.jpg)
![Page 231: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/231.jpg)
I looked at the shortest 10 sequences in 34 different genome assemblies…
![Page 232: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/232.jpg)
![Page 233: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/233.jpg)
![Page 234: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/234.jpg)
![Page 235: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/235.jpg)
From a vertebrate genome assembly with 72,214 sequences…
![Page 236: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/236.jpg)
![Page 237: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/237.jpg)
![Page 238: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/238.jpg)
![Page 239: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/239.jpg)
![Page 240: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/240.jpg)
From a vertebrate genome assembly with 72,214 sequences…
Lengths of the 10 shortest sequences: 100, 100, 99, 88, 87, 76, 73, 63, 12, and 3 bp
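A check like this is easy to run on your own assembly. The Python sketch below is only an illustration, not the script used for the talk: the function names `read_fasta` and `shortest_lengths` are made up for this example, and it assumes the assembly is a plain-text FASTA file.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def shortest_lengths(records: Iterable[Tuple[str, str]], n: int = 10) -> List[int]:
    """Return the lengths of the n shortest sequences, shortest first."""
    return heapq.nsmallest(n, (len(seq) for _, seq in records))


# Usage (hypothetical file name):
# with open("assembly.fa") as handle:
#     print(shortest_lengths(read_fasta(handle)))
```

Sequences of only a few bp in the output are a cue to ask whether they belong in the assembly at all.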
![Page 241: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/241.jpg)
![Page 242: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/242.jpg)
![Page 244: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/244.jpg)
Data from Lex Nederbragt’s blog, June 2014
![Page 245: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/245.jpg)
![Page 246: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/246.jpg)
Long-read technology
Moleculo read data from Illumina BaseSpace, July 2013
![Page 247: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/247.jpg)
Long-read technology
From https://flxlexblog.wordpress.com (Lex Nederbragt's blog)
PacBio data
![Page 248: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/248.jpg)
Long-read technology
MinION from Oxford Nanopore
![Page 249: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/249.jpg)
![Page 250: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/250.jpg)
Where is the data?
![Page 251: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/251.jpg)
![Page 252: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/252.jpg)
Where is the data?
Nick Loman published the first real-world data on June 10th
![Page 253: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/253.jpg)
![Page 254: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/254.jpg)
Single chromosome assembly?
![Page 255: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/255.jpg)
![Page 256: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/256.jpg)
![Page 257: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/257.jpg)
Tackling heterozygosity
The 1000 Genomes Project plans to sequence 15 'trios' at high depth
![Page 258: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/258.jpg)
Hi-C
✤ Nature Biotechnology, 31, 2013
✤ Burton et al.
✤ Selvaraj et al.
✤ Kaplan & Dekker
![Page 259: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/259.jpg)
The future of genome assembly
![Page 260: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/260.jpg)
Kwik-E-Assembler
acgtaacacaancac gggaacnnnacatta acnactagcataata nnnnnnnnnnaacac actttaaattatatc
The future of genome assembly
![Page 261: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/261.jpg)
![Page 262: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/262.jpg)
![Page 263: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/263.jpg)
![Page 264: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/264.jpg)
![Page 265: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/265.jpg)
![Page 266: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/266.jpg)
The future of genome assembly
✤ At some point we will look back with embarrassment at this era.
✤ Assembly must, and will, get better, but...
✤ ...'perfect' genomes may remain elusive.
✤ Data management will remain an issue:
✤ the human genome -> human genomes -> tissue-specific genomes
![Page 267: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/267.jpg)
Summary
![Page 268: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/268.jpg)
![Page 269: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/269.jpg)
![Page 270: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/270.jpg)
![Page 271: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/271.jpg)
![Page 272: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/272.jpg)
Summary
✤ There is no real consensus on how to make a good genome assembly
✤ Try different assemblers, try different command-line options
✤ Decide what it is you want to get out of a genome assembly
✤ Look at your input and output data
✤ Wait 5 years and come back, we’ll (probably) have solved everything!
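As one small example of looking at your output data, the Python sketch below reports how much of each assembled sequence is made up of unknown bases (N). It is an illustration only: the function names `n_fraction` and `gap_report` are hypothetical, and it assumes sequences are already loaded as (name, sequence) pairs.

```python
from typing import Iterable, List, Tuple


def n_fraction(seq: str) -> float:
    """Fraction of a sequence made up of N (unknown) bases, ignoring case."""
    if not seq:
        return 0.0
    return seq.upper().count("N") / len(seq)


def gap_report(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, int, float]]:
    """Return (name, length, fraction of Ns) for each (name, sequence) pair."""
    return [(name, len(seq), n_fraction(seq)) for name, seq in records]


# Usage with a toy record (not real data):
# for name, length, frac in gap_report([("scaffold_1", "ACGTNNNNAC")]):
#     print(name, length, f"{frac:.0%} N")
```

A scaffold that is mostly N may inflate summary statistics without adding usable sequence, so it is worth checking before deciding what you want from the assembly.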
![Page 273: Genome assembly: then and now — v1.1](https://reader030.vdocuments.us/reader030/viewer/2022013003/554f471eb4c905423f8b49e0/html5/thumbnails/273.jpg)
Resources
✤ Lex Nederbragt’s blog - https://flxlexblog.wordpress.com
✤ Nick Loman’s blog - http://pathogenomics.bham.ac.uk/blog/
✤ Assemblathon twitter feed - https://twitter.com/assemblathon