new genome assembly with next generation sequencers · 2011. 5. 6. · next generation sequencing...
Genome Assembly With Next Generation Sequencers
3 May, 2011
Jongsun Park
Personal Genomics Institute
Table of Contents
1 Central Dogma and –Omics Studies
2 History of Sequencing Technologies
3 Genome Assembly Processes With NGS Sequences
4 Current Status of Plant Genomes
Central Dogma and -Omics Studies
Central Dogma in Molecular Biology and Bioinformatics
DNA → RNA → Proteins
Genomics, Transcriptomics, Proteomics
Advances of Biotechnologies
Bioinformatics
Further –Omics Studies With Bioinformatics
Genomics Population Genomics
Phylogenomics
Transcriptomics
Gene Expression
Regulatory Network
Non-coding RNAs
Epigenomics
Proteomics Metabolomics
Pathway Analysis
3D structure
Position of Sequences in Central Dogma!
RNA: AGCUACGUGAGAGACGUACUGUAC…
DNA: AGCTACGTGAGAGACGTACTGTAC…
Protein: MASTWTSWAMTCCAAMST…
Target for “Sequencing”
Predicted from sequences
History of Sequencing Technologies
Classical Method: Sanger Sequencing
- Using dideoxynucleotides (ddNTPs), DNA synthesis is terminated at random positions in four separate reaction tubes, one for each base.
Figure: four reactions, each containing the DNA template and a primer. Adding ddATP, ddTTP, ddGTP, or ddCTP yields new DNA strands terminated at A, T, G, or C, respectively.
http://en.wikipedia.org/wiki/File:Sequencing.jpg
Automated Method: Sanger Sequencing
- Using four fluorescent dyes, a scanner can read the sequence directly.
- With 96 or 384 capillaries, one machine can read 96 or 384 different samples at one time.
Figure: DNA template and primer extended into new DNA strands terminated at A, T, G, or C by dye-labeled ddATP, ddTTP, ddGTP, or ddCTP.
http://en.wikipedia.org/wiki/File:Radioactive_Fluorescent_Seq.jpg
Capacity of Automated Sanger Sequencing
- Per capillary, 700 bases (Quality Value higher than 20) can be read in one hour.
- Per machine, an ABI 3730 with the 384-capillary kit can yield 700 * 384 * 24 = 6,451,200 bp per day, without considering the sample preparation process.
https://products.appliedbiosystems.com/ab/en/US/adirect/ab?cmd=catNavigate2&catID=600533&tab=Overview
- To obtain 1x coverage of the human genome with one such machine, it will take 465 days (3,000,000,000 bp / 6,451,200 bp = 465.029..).
- For de novo assembly, 6x to 10x coverage is usually required: at 10x it will take about 4,650 days to obtain enough sequence for a genome project.
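The arithmetic on this slide can be reproduced with a minimal Python sketch. The per-capillary rate and capillary count are taken from the slide; the human genome size of about 3,000,000,000 bp is an assumption of this sketch, and the function names are illustrative only.

```python
# Throughput figures quoted on the slide.
BASES_PER_CAPILLARY_PER_HOUR = 700   # bases with Quality Value above 20
CAPILLARIES = 384                    # ABI 3730 with the 384-capillary kit
HOURS_PER_DAY = 24

# Assumption of this sketch: a haploid human genome of about 3 Gb.
HUMAN_GENOME_BP = 3_000_000_000


def bases_per_day():
    """Bases one machine yields per day, ignoring sample preparation."""
    return BASES_PER_CAPILLARY_PER_HOUR * CAPILLARIES * HOURS_PER_DAY


def days_for_coverage(coverage, genome_bp=HUMAN_GENOME_BP):
    """Days of continuous running needed to reach the given fold coverage."""
    return coverage * genome_bp / bases_per_day()


print(f"{bases_per_day():,} bp per day")
for fold in (1, 6, 10):
    print(f"{fold}x coverage: about {days_for_coverage(fold):,.0f} days")
```

At 10x coverage this works out to more than twelve years on a single machine, which is the motivation for the higher-throughput platforms that follow.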
Next Generation Sequencing (NGS) Technologies
- Next (or current) generation sequencing technologies have accelerated genome sequencing projects and have broadened the range of applications of genome sequences.
Solexa; Illumina
SOLiD; ABI
GS-Titanium; Roche 454
SMRT; Pacific Biosciences
Helicos; Helicos BioSciences
NGS: 454 Technology
NGS: Solexa Technology
NGS: SOLiD Technology (1)
NGS: SOLiD Technology (2)
NGS: SOLiD Technology (3)
Capacities of Next Generation Sequencers
ABI 3730; ABI: 384 x 700 bp = 268,800 bp = 269 Kb (per one reaction / 1 hr)
GS-Titanium; Roche 454: 950,000 x 450 bp = 427,500,000 bp = 427.5 Mb (per one reaction / 2-3 days)
Solexa GA2; Illumina: 35,000,000 x 7 x (151 x 2) bp = 73,990,000,000 bp = 74.0 Gb (per one reaction / 12 days)
SOLiD 4; ABI: 1,400,000,000 x 75 bp (50+25) = 105,000,000,000 bp = 105.0 Gb (per one reaction / 11 days)
HiSeq2000; Illumina: 70,000,000 x 14 x (101 x 2) bp = 197,960,000,000 bp = 198.0 Gb (per one reaction / 8 days)
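As a rough check, the per-run yields can be recomputed from the read counts and read lengths listed on this slide. This is only a sketch: the dictionary layout and function names are illustrative, and the 454 run time is taken as 2.5 days, the midpoint of the quoted 2-3 days, which is an assumption.

```python
# (reads per run, bases per read, run time in days), copied from the slide.
# Assumption: 2.5 days for GS-Titanium, the midpoint of the quoted 2-3 days.
RUNS = {
    "ABI 3730": (384, 700, 1 / 24),
    "GS-Titanium": (950_000, 450, 2.5),
    "Solexa GA2": (35_000_000 * 7, 151 * 2, 12),
    "SOLiD 4": (1_400_000_000, 50 + 25, 11),
    "HiSeq2000": (70_000_000 * 14, 101 * 2, 8),
}


def run_yield_bp(platform):
    """Total bases produced by one run: number of reads times read length."""
    reads, read_length, _days = RUNS[platform]
    return reads * read_length


def yield_per_day_bp(platform):
    """Average bases per day over the duration of one run."""
    _reads, _read_length, days = RUNS[platform]
    return run_yield_bp(platform) / days


for name in RUNS:
    per_run = run_yield_bp(name)
    per_day_gb = yield_per_day_bp(name) / 1e9
    print(f"{name}: {per_run:,} bp per run, {per_day_gb:.3f} Gb per day")
```

Normalizing by run time makes the platforms comparable, since a single run takes anywhere from one hour to twelve days.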
Single Read and Mate-Pair (Paired-end) Sequences
- Single read is the simplest method: each sample (cluster or bead) is read in one direction only.
- The mate-pair (or paired-end) method generates short reads from both ends of each fragment, similar to BAC-end, fosmid-end, and cosmid-end sequences.
- Mate-pair (paired-end) reads are useful for joining contigs into larger sequences that contain gaps; this is called scaffolding.
Summary of Costs For NGS Technologies
The sequence explosion, Nature, 464, 670-671 (2010)
Pros and Cons of NGS Technologies
Pros
- Large number of reads per run
- Extremely low cost for generating a huge amount of sequence
- Diverse applications, not only for genomics but also for transcriptomics, small RNAs, epigenomics, etc., with small modifications of protocols
Cons
- Short read length (36 bp to 151 bp, or 450 bp)
- Sequencing quality differs by platform (GS, Solexa, and SOLiD)
- Large data sets are difficult to handle with ordinary programs, including those for de novo assembly
- A small amount of sequence cannot be obtained at low cost
Genome Assembly Processes With NGS Sequences
Whole Genome Shotgun Sequencing Strategy
- An assembly process is essential for a genome project because the read length of each sequence is less than 1 kb.
- For Sanger sequences, assembly was conducted with several popular programs, such as phrap and PCAP3.
Figure labels: Genome Assembly, Scaffolding, Assembled genome
Genome Assembly Process
- We can perform genome assembly manually!
23 sequences should be compared with each other!
23C2 = 23 * 22 / 2 = 253 comparisons!
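The count of all-against-all comparisons can be checked with a short Python sketch. The read names are placeholders invented for illustration; the larger read counts are the ones quoted elsewhere in these slides.

```python
from itertools import combinations
from math import comb


def pairwise_comparisons(n_reads):
    """Number of unordered read pairs: n choose 2 = n * (n - 1) / 2."""
    return comb(n_reads, 2)


# Enumerate every pair explicitly for the 23-read example.
reads = [f"read_{i}" for i in range(1, 24)]   # placeholder names
pairs = list(combinations(reads, 2))
print(len(pairs), pairwise_comparisons(23))

# The same formula applied to the larger read sets mentioned in the slides.
for n in (6_200_000, 563_466_202):
    print(f"{n:,} reads -> {pairwise_comparisons(n):,} comparisons")
```

Because the count grows with the square of the number of reads, explicit enumeration is only practical for toy cases such as the 23 reads here.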
How to Find Overlapped Sequences?
- Using a dynamic programming algorithm, we can write a program that finds similar sequences.
- The complexity of this algorithm is O(n^3).
http://www.avatar.se/molbioinfo2001/dynprog/dynamic.html
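A minimal sketch of dynamic programming for sequence comparison is given below. It uses the common global alignment recurrence with a linear gap penalty, which fills an (m+1) x (n+1) table and therefore runs in O(mn) time; it is not necessarily the exact formulation the slide refers to. The scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per base.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell is the best of: (mis)match, gap in b, gap in a.
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch
            )
            score[i][j] = max(
                diagonal,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    return score[-1][-1]


print(global_alignment_score("ACGT", "ACGT"))
print(global_alignment_score("ACGT", "AGT"))
```

Two reads with a high score are candidates for overlap; an assembler would repeat this for every pair of reads, which is why the number of pairs matters so much.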
Classification of Alignments (1)
Considering whole sequences for alignment?
- Yes: Global alignment
- No: Local alignment
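The practical difference shows up in the recurrence: a local alignment lets the score reset to zero, so only the best-matching region counts. The sketch below uses the Smith-Waterman recurrence with a linear gap penalty; the scoring values (match +2, mismatch -1, gap -1) and the example sequences are assumptions for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    best = 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch
            )
            # The 0 lets an alignment start fresh anywhere in the table.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


# Only the shared ACGT region contributes to the score.
print(local_alignment_score("TTTTACGTTTTT", "GGGACGTGGG"))
```

A global alignment of the same two sequences would be penalized for the unrelated flanking bases, while the local score reflects only the shared region.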
Classification of Alignments (2)
How many sequences are considered for alignment?
- 2 sequences: Pair-wise alignment
- More than 2 sequences: Multiple alignment
Famous Bioinformatics Tools for Alignments
- Global alignment: ClustalW, T-Coffee, and MUSCLE
- Local alignment: FASTA, and BLAST (Basic Local Alignment Search Tool, provided by NCBI)
- Pair-wise alignment: BLAST and FASTA
- Multiple sequence alignment: ClustalW, T-Coffee, MUSCLE, etc.
Example of Genome Assembly: Vitis vinifera
Pair-wise comparison of 6,200,000 reads
6,200,000C2 = 19,219,996,900,000 comparisons
Genome Assemblers
http://www.phrap.org/
Short-read Sequence Assembly (1)
- Short-read sequences generated by NGS machines cause several problems for well-established genome assemblers.
- Too many reads: all-against-all comparison requires an impractical amount of computation.
- Too short reads: reliable overlaps for building long contigs cannot be found.
- To reduce the computational cost, a new algorithm was required.
- Short reads require another strategy to build reliable contig sequences.
- Handling so many sequences also causes technical problems with physical memory, hard disk space, and computing power.
Short-read Sequence Assembly (2)
Pair-wise comparison of 563,466,202 reads
563,466,202C2 = 158,747,080,116,419,301 comparisons?!
De Bruijn Graph: Alternative Method For Alignment
De Bruijn Graph Algorithm For Alignment (1)
- This algorithm has been used to find overlapping short-read sequences quickly.
- This algorithm consists of three parts:
i) Generating k-mer sequences
ii) Constructing the de Bruijn graph
iii) Resolving the graph to generate sequences
De Bruijn Graph Algorithm For Alignment (2)
K = 3
Read 1: GCAAAACACTTA…
k-mers of read 1 (nodes 1-1 to 1-10): GCA, CAA, AAA, AAA, AAC, ACA, CAC, ACT, CTT, TTA
De Bruijn Graph Algorithm For Alignment (3)
K = 3
Read 1: GCAAAACACTTA… (nodes 1-1 to 1-10)
Read 2: ACACTTATTCGT (nodes 2-1 to 2-10)
k-mers of read 2: ACA, CAC, ACT, CTT, TTA, TAT, ATT, TTC, TCG, CGT
k-mers shared with read 1: ACA, CAC, ACT, CTT, TTA
De Bruijn Graph Algorithm For Alignment (4)
Figure: the graph of read 1 (nodes 1-1 to 1-10) merged with read 2 (nodes 2-1 to 2-10), which adds the k-mers TAT, ATT, TTC, TCG, and CGT.
Resolved sequence: GCAAAACACTTATTCGT
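The three steps (generate k-mers, build the graph, resolve it) can be sketched in Python on the two example reads. This is a simplified illustration, not the method of any particular assembler: nodes are (k-1)-mers, each distinct k-mer is one edge, and the graph is resolved by walking an Eulerian path. All function names are illustrative, and k = 5 is an assumption chosen so that every (k-1)-mer in this example is unique.

```python
from collections import defaultdict


def kmers(read, k):
    """Step i: every k-mer of a read, from left to right."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_graph(reads, k):
    """Step ii: nodes are (k-1)-mers; each distinct k-mer is one edge."""
    edges = defaultdict(list)
    distinct = {kmer for read in reads for kmer in kmers(read, k)}
    for kmer in sorted(distinct):
        edges[kmer[:-1]].append(kmer[1:])
    return edges


def resolve(edges):
    """Step iii: follow an Eulerian path and spell out the sequence."""
    in_degree = defaultdict(int)
    for targets in edges.values():
        for target in targets:
            in_degree[target] += 1
    # Start where out-degree exceeds in-degree, else at any node.
    start = next(
        (n for n in edges if len(edges[n]) - in_degree[n] == 1),
        next(iter(edges)),
    )
    remaining = {node: list(targets) for node, targets in edges.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])


READS = ["GCAAAACACTTA", "ACACTTATTCGT"]
print(kmers(READS[0], 3))                 # the ten 3-mers of read 1
print(resolve(build_graph(READS, k=5)))   # merged sequence
```

With K = 3, as drawn on the slides, this simplified version would collapse the repeated AAA k-mer into a single edge and lose one base, which is why the sketch uses a larger k. No read-against-read comparison is needed at any point: overlapping reads are joined simply because they produce the same k-mers.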
Genome Assemblers For Short Read Sequences
Examples of Plant Genome de novo Assembly
5,937,915,739 bp
# of contigs 950 ea
Total length 976,089 bp
Maximum length 12,606 bp
Average length 1,027.46 bp
N50 length 3,061 bp
Species name: Lithocarpus hancei, Ficus altissima, Ficus altissima, Ficus altissima, Ficus tinctoria, Ficus tinctoria
# of contigs 462,868 355,052 132,590 247,376 337,777 476,937
Total length 112,614,098 87,502,701 33,293,636 61,369,608 87,427,716 116,554,688
Maximum length 1,748 1,090 1,688 1,334 1,274 1,578
Average length 243.30 246.45 251.10 248.08 258.83 244.38
N50 length 237 239 245 241 248 3061
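The assembly statistics in these tables can be computed from a list of contig lengths. A minimal sketch follows, using the usual definition of N50: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The contig lengths in the example are hypothetical and not taken from the assemblies above.

```python
def n50(lengths):
    """Contig length at which the cumulative sum reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 needs at least one contig")


def assembly_stats(lengths):
    """The statistics reported in the tables above."""
    return {
        "# of contigs": len(lengths),
        "Total length": sum(lengths),
        "Maximum length": max(lengths),
        "Average length": round(sum(lengths) / len(lengths), 2),
        "N50 length": n50(lengths),
    }


# Hypothetical contig lengths in bp, for illustration only.
example = [2, 2, 2, 3, 3, 4, 8, 8]
print(assembly_stats(example))

# The average in the first table is total length / number of contigs.
print(round(976_089 / 950, 2))
```

Since N50 is itself the length of one of the contigs, it can never exceed the maximum contig length.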
Giant Panda Genome Project
Sordaria macrospora Genome Project
Large Scale Genome Projects: 1000 Human Genomes
http://www.1000genomes.org/page.php
http://en.wikipedia.org/wiki/File:Genetic_Variation.jpg
http://www.ldl.genomics.cn/page/index.jsp
Large Scale Genome Projects: BGI 1000 Genomes
http://genome10k.org/
Large Scale Genome Projects: Genome 10K Project
Current Status of Plant Genome Projects
Species name Method Size (Mb) # of contigs # of transcripts
Arabidopsis lyrata WGS 206.67 695 32,670
Medicago truncatula BAC, WGS 278.69 9 38,334
Selaginella moellendorffii WGS 212.76 768 22,285
Lycopersicon esculentum WGS, BAC 794.60 7,409 49,389
Solanum phureja WGS 702.58 57,681 110,512
Ricinus communis WGS 362.47 28,518 38,613
Mimulus guttatus WGS 416.66 11,243 47,442
Manihot esculenta WGS 321.73 2,216 27,501
Phoenix dactylifera WGS 284.68 234,704 -
Prunus persica WGS 227.25 202 28,689
Oryza glaberrima WGS 316.08 5,309 -
Unpublished 11 Higher Plant Genomes
All pictures are from wikipedia.
Species name Journal Method Size (Mb) # of contigs # of transcripts
Arabidopsis thaliana Nature, 2000 BAC +α 119.19 5 32,615
Oryza sativa japonica Science, 2002; Nature, 2005 BAC +α 372.08 12 66,710
Oryza sativa indica Science, 2002; PLoS Biology, 2005 WGS 426.32 10,267 49,710
Oryza sativa japonica (Syngenta) PLoS Biology, 2005 WGS 391.14 7,777 45,824
Populus trichocarpa Science, 2006 WGS 485.51 22,012 45,555
Vitis vinifera Nature, 2007 WGS, Complete 497.51 35 30,434
Carica papaya Nature, 2008 WGS 369.69 17,677 28,589
Lotus japonicus DNA Research, 2008 WGS 323.24 110,945 26,700
Sorghum bicolor Nature, 2009 WGS 738.54 3,304 36,338
Zea mays Science, 2009 BAC, WGS 2,061.02 11 53,764
Cucumis sativus Nature Genetics, 2009 WGS 243.57 47,488 26,682
Glycine max Nature, 2010 WGS 996.90 4,262 62,199
Brachypodium distachyon Nature, 2010 WGS 273.27 197 32,255
Malus x domestica Nature Genetics, 2010 WGS 871.56 144,621 57,386*
14 Published Higher Plant Genomes from 11 Species
All pictures are from wikipedia.
Species name Journal Method Size (Mb) # of contigs # of transcripts
Chlamydomonas reinhardtii Science, 2007 WGS 112.31 88 16,709
Micromonas pusilla CCMP1545 Science, 2009 WGS 22.04 27 10,547
Micromonas sp. RCC299 Science, 2009 WGS 20.99 17 10,108
Ostreococcus lucimarinus CCE9901 PNAS, 2007 WGS 13.2 21 7,488
Ostreococcus sp. RCC809 Not published yet WGS 13.41 22 7,492
Ostreococcus tauri PNAS, 2007 WGS 12.58 118 7,725
Coccomyxa sp. C169 Not published yet WGS 48.95 45 9,629
7 Unicellular Plant Genomes
All pictures are from wikipedia.
Distribution of 32 Plant Genome Sizes
Figure: bar chart of genome size (Mb) for 32 plant genomes in ascending order, with the unicellular plants marked.
Values (Mb): 12.6, 13.2, 13.4, 21.0, 22.0, 49.0, 112.3, 119.7, 206.7, 212.8, 225.9, 227.3, 273.3, 284.7, 307.5, 316.1, 321.7, 323.2, 362.5, 369.7, 372.3, 391.1, 416.7, 426.3, 485.5, 497.5, 702.6, 738.5, 781.8, 871.6, 996.9, 2061.0
Distribution of Number of Transcripts of 29 Plants
Figure: bar chart of the number of transcripts for 29 plants in ascending order, with the unicellular plants marked.
Values (# of transcripts): 7,488; 7,492; 7,725; 9,629; 10,108; 10,547; 16,709; 22,285; 26,682; 26,700; 27,501; 28,589; 28,689; 30,434; 32,255; 32,670; 33,410; 36,338; 38,613; 45,555; 45,824; 47,442; 49,483; 49,710; 53,423; 53,764; 62,199; 67,393; 110,512
Relationship between Genome Size and Transcripts
Figure: bar chart of # of transcripts / genome size (Mb) for 29 plants in ascending order, with the unicellular plants marked.
Values: 26.1, 49.2, 61.2, 62.4, 63.3, 77.3, 82.6, 85.5, 93.8, 104.7, 106.5, 113.9, 116.6, 117.2, 118.0, 118.1, 126.2, 148.8, 157.3, 158.1, 173.7, 181.0, 196.7, 279.2, 478.5, 481.6, 558.7, 567.3, 614.1
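The ratio plotted here is the number of transcripts divided by the genome size in Mb. A minimal sketch, using three rows copied from the genome tables in these slides (the dictionary layout and function name are illustrative):

```python
# (genome size in Mb, number of transcripts), copied from the tables above.
GENOMES = {
    "Zea mays": (2061.02, 53_764),
    "Chlamydomonas reinhardtii": (112.31, 16_709),
    "Ostreococcus tauri": (12.58, 7_725),
}


def transcripts_per_mb(size_mb, n_transcripts):
    """Transcript density: number of transcripts per Mb of genome."""
    return n_transcripts / size_mb


for species, (size_mb, n_transcripts) in GENOMES.items():
    density = transcripts_per_mb(size_mb, n_transcripts)
    print(f"{species}: {density:.1f} transcripts per Mb")
```

The small unicellular genomes sit at the high end of the scale and the maize genome at the low end, which matches the smallest and largest values in the chart.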
Comparisons with Genomes in Other Kingdoms
Figure: chart of genome size (Mb) by group; the y-axis runs from -500.0 to 3500.0.
Bacteria (5 species): 5.0
Oomycota (6 species): 102.9
Eukaryota (23 species): 23.8
Fungi (256 species): 24.1
Metazoa (90 species): 1587.1
Viridiplantae (32 species): 392.2
Cucumber Genome Sequences
Thank you for your attention!