Post on 15-Jan-2016
The International Tomato Sequencing Project: The First Cornerstone of the SOL Project
Lukas Mueller
on behalf of
International SOL Tomato Sequencing Project
Overview
• Aims
• Why sequence the tomato genome?
• How to sequence the tomato genome?
• Who is sequencing the tomato genome?
• Resources for Sequencing the Tomato Genome
  – Genetic Map
  – BAC libraries
  – Overgo mapping
  – BAC End Sequences
  – Minimal Tiling Path
  – Bioinformatics
• Summary
Steven D. Tanksley, Jim J. Giovannoni
Stephen Stack, Joyce van Eck
Doil Choi, Byung Dong Kim
Mingsheng Chen, Zhukuan Cheng
Chuanyou Li, Hongqing Ling, Yongbiao Xue
Graham Seymour, Gerard Bishop
Ramesh Sharma, Jiten Khurana, Akhilesh Tyagi
Willem Stiekema, P. Lindhout, Taco Jesse
Rene Klein Lankhorst
Mondher Bouzayen, Mathilde Causse
Daisuke Shibata, Satoshi Tabata
Antonio Granell, Miguel A. Botella
Giovanni Giuliano, Luigi Frusciante
Aims
• Provide a high quality reference sequence for the Solanaceae genomes
• Derive “virtual” genomes for other Solanaceae by mapping their sequences onto the tomato sequence and by using comparative genetic maps
• Provide the prerequisite for studying natural diversity and linking genotype to phenotype
• Build a Solanaceae bioinformatics platform to integrate, analyze and distribute the information
[Figure: phylogeny of flowering plant clades (asterid I to V, rosid I to III, caryophyllids, hamamelid I and II, ranunculids, paleoherb II, Magnoliales, monocots, Laurales), annotated with crop families and sequenced references: Rubiaceae (coffee); Compositae (sunflower, safflower, lettuce); Leguminosae (soybean, Medicago); Rosaceae (apple, peach, cherry); Salicaceae (poplar); Malvaceae (cotton); Sterculiaceae (cocoa); Rutaceae (citrus); Brassicaceae; Gramineae (maize, wheat); Musaceae (banana); Liliaceae (onion); Chenopodiaceae (sugarbeet, spinach); Solanaceae; Arabidopsis; Rice.]
WHY SOLANACEAE? Solanaceae is part of a unique clade of flowering plants. Genome research in Solanaceae will provide a reference anchor and enable comparative genomics and systematics throughout this clade.
Why sequence tomato?
• Tomato is the most intensively researched Solanaceae genome. It encodes approx. 35,000 genes, most of them in euchromatic regions that correspond to less than 25% of the total DNA in the tomato nucleus (220-250 Mb).
• Tomato provides the smallest diploid genome for which homozygous inbreds are available.
• Its sequence will facilitate positional cloning in tomato and other Solanaceae genomes (via synteny maps).
How to sequence the tomato genome?
• Whole Genome Shotgun
  – Advantages: fast, cheaper, OK with a reference genome
  – Disadvantages: unordered contigs
• Methylation Filtering (Tobacco)
  – Advantages: selects for expressed genome, cheaper
  – Disadvantages: unordered contigs
• Tiling Path (Arabidopsis, Drosophila, Rice)
  – Advantages: sequence and gene order; select gene-rich regions; easy to divide work
  – Disadvantages: relatively expensive, time consuming
• ORDER IMPORTANT FOR COMPARING GENOMES
Tomato Genome Structure
• 12 chromosomes
• 950 Mb of total DNA
• 220 Mb contiguous, gene-rich euchromatin
• Sequence only gene-rich euchromatin (>90% of all genes)
• Tiling path method preferred
• Drosophila used, and Medicago is using, a similar strategy
[Figure: chromosome structure, showing the telomere (7 bp telomeric repeat), the 162 bp sub-telomeric repeat, euchromatin, pericentric heterochromatin, and the centromere.]
BAC libraries
• All libraries derived from Solanum lycopersicum Heinz 1706.
• HindIII library (Rod Wing, Clemson U)
  – ~120,000 clones, 120 kb average size
  – ~15x coverage
  – FPC contigged
  – Overgo analysis
  – 75,000 clones BAC end sequenced
• MboI library
  – 50,000 clones, 140 kb average size
  – Will be BAC end sequenced
• EcoRI library (being prepared)
  – Will be BAC end sequenced
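The "~15x coverage" figure for the HindIII library can be checked with simple arithmetic: clones times average insert size, divided by genome size. The sketch below assumes the 950 Mb total genome size quoted elsewhere in this talk; the function name is only illustrative.

```python
# Fold coverage of a BAC library = clones * average insert size / genome size.
GENOME_MB = 950  # total tomato DNA in Mb, as quoted in this talk

def fold_coverage(n_clones: int, insert_kb: float, genome_mb: float = GENOME_MB) -> float:
    """Return the library's genome coverage as a multiple of the genome size."""
    return n_clones * insert_kb / (genome_mb * 1000)  # convert Mb to kb

hindiii = fold_coverage(120_000, 120)  # HindIII library
mboi = fold_coverage(50_000, 140)      # MboI library

print(f"HindIII: {hindiii:.1f}x")
print(f"MboI:    {mboi:.1f}x")
```

By the same arithmetic, the MboI library would add roughly 7x on top of the HindIII library.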
F2-2000 Genetic Map
• Parents: Solanum lycopersicum x Solanum pennellii
• Mapping population of 80 F2 individuals
• # Markers: 1579
• Total cM: 1453
• Density: 1 marker/0.92 cM
• SGN http://sgn.cornell.edu/

Marker types:
rflp            345
ssr             149
tm               43
p-mrkr           39
cos             576
est-by-clone    265
unknown           8
caps             21
cosii            98
kfg              35
Total          1579
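As a quick consistency check on the map statistics, the marker counts per type can be summed and divided into the total map length. This is only a sketch using the numbers on this slide; the variable names are illustrative.

```python
# Marker counts per type on the F2-2000 map, as listed on the slide.
marker_counts = {
    "rflp": 345, "ssr": 149, "tm": 43, "p-mrkr": 39, "cos": 576,
    "est-by-clone": 265, "unknown": 8, "caps": 21, "cosii": 98, "kfg": 35,
}
TOTAL_CM = 1453  # total genetic map length in cM

total_markers = sum(marker_counts.values())
cm_per_marker = TOTAL_CM / total_markers  # average spacing between markers

print(total_markers)
print(f"{cm_per_marker:.2f} cM per marker")
```

The sum reproduces the stated total of 1579 markers, and the average spacing rounds to the 0.92 cM quoted above.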
Tying the Genetic Map to the Physical Map: Overgos
• Overgos are “overlapping oligos”, short, very hot probes, developed from genetic markers of the F2-2000 map
• Overgos are organized in 96-well plates; analyses are carried out with row and column pools
• Pools are hybridized to BAC filters, and the raw pool results are then deconvoluted
• A total of 1536 overgos developed (16 plates)
• Analysis of all plates is complete
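The row and column pooling can be illustrated with a small sketch. On a standard 8 x 12 plate there are 20 pools instead of 96 single probes, and a BAC that lights up in exactly one row pool and one column pool points to a single well. The function, pool labels, and BAC names below are hypothetical, and the project's actual deconvolution software may handle ambiguous hits differently.

```python
from itertools import product

def deconvolute(row_hits, col_hits):
    """Assign BACs to plate wells from pooled hybridization results.

    row_hits / col_hits map a pool label (row "A".."H", column "1".."12")
    to the set of BACs that hybridized with that pool.
    """
    # Invert the pool results: for each BAC, which row and column pools hit it?
    rows_by_bac, cols_by_bac = {}, {}
    for row, bacs in row_hits.items():
        for bac in bacs:
            rows_by_bac.setdefault(bac, set()).add(row)
    for col, bacs in col_hits.items():
        for bac in bacs:
            cols_by_bac.setdefault(bac, set()).add(col)

    assigned, ambiguous = {}, {}
    # Only BACs seen in at least one row pool AND one column pool can be placed.
    for bac in rows_by_bac.keys() & cols_by_bac.keys():
        wells = sorted(f"{r}{c}" for r, c in product(rows_by_bac[bac], cols_by_bac[bac]))
        if len(wells) == 1:
            assigned[bac] = wells[0]   # unique row/column intersection
        else:
            ambiguous[bac] = wells     # several candidate wells
    return assigned, ambiguous

# Toy data with made-up BAC names.
row_hits = {"A": {"bac1", "bac2"}, "C": {"bac2"}}
col_hits = {"3": {"bac1"}, "7": {"bac2"}, "9": {"bac3"}}
assigned, ambiguous = deconvolute(row_hits, col_hits)
print(assigned)
print(ambiguous)
```

In this toy run bac1 resolves to well A3, bac2 is ambiguous between two wells, and bac3 is dropped because it appears only in a column pool.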
Overgo Anchoring Results
Anchors:
• 652 anchor markers are involved in plausible non-conflicted associations with BACs.
• 4857 good marker-BAC associations
FPC contigs:
• 1880 BACs in 705 plausible contigs
• 2166 BAC singletons
• 652 seed BACs ==> 1/3 of euchromatic genome sequence
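The "1/3 of euchromatic genome" estimate can be reproduced roughly, assuming each seed BAC carries about the 120 kb average insert of the HindIII library, that seed BACs do not overlap, and that the euchromatin spans the 220 Mb quoted earlier in this talk. All three are simplifying assumptions for illustration.

```python
SEED_BACS = 652
AVG_INSERT_KB = 120    # assumption: HindIII library average insert size
EUCHROMATIN_MB = 220   # gene-rich euchromatin, as quoted in this talk

seed_mb = SEED_BACS * AVG_INSERT_KB / 1000  # total sequence in seed BACs, in Mb
fraction = seed_mb / EUCHROMATIN_MB         # share of euchromatin, ignoring overlaps

print(f"{seed_mb:.1f} Mb, {fraction:.0%} of euchromatin")
```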
Distribution of Anchor Markers on Chromosomes

# anchors    cM chr length    cM per anchor
92           165              1.8
79           143              1.8
67           171              2.6
62           137              2.2
40           119              3.0
63           101              1.6
51           112              2.2
34            87              2.6
40           116              2.9
41            87              2.1
43           103              2.4
39           120              3.1

+ 1000 markers from Keygene AFLP map
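Reading each row of the table as (number of anchors, chromosome length in cM), the third column is simply the length divided by the anchor count. A minimal sketch that recomputes it, with rows in the order they appear in the transcript:

```python
# (number of anchors, chromosome length in cM), in transcript order.
rows = [
    (92, 165), (79, 143), (67, 171), (62, 137), (40, 119), (63, 101),
    (51, 112), (34, 87), (40, 116), (41, 87), (43, 103), (39, 120),
]

# Average genetic distance between anchors on each chromosome.
cm_per_anchor = [round(length / anchors, 1) for anchors, length in rows]

for (anchors, length), density in zip(rows, cm_per_anchor):
    print(f"{anchors:>3} anchors over {length:>3} cM: {density} cM per anchor")
```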
Verification of overgo mappings
• Fluorescence In Situ Hybridization (FISH)
  – BAC probe on pachytene chromosomes
• IL lines (Zamir lab)
  – Map BACs to IL lines
  – CAPS assays
(Hans de Jong)
Summary of FISH verification
• Song-Bin Cheng, Hans de Jong (Holland, chromosome 6):
– 9 BACs analyzed
– 8 mapped to chromosome 6 in the right order
– 1 BAC gave signals on centromere of chromosome 1
• Sangheob Lee, Doil Choi (Korea, chromosome 2):
– 27 BACs analyzed with FISH
– 25 confirmed to specific location, same order as F2-2000 map
– 2 match to other chromosomes
• Chuanyou Li (China, chromosome 3)
– >30 BACs being analyzed
• Stephen Stack (USA):
– Telomere and heterochromatic boundary determination
– FISH service for countries without FISH capability
BAC end sequences
• Total of 400,000 reads (200,000 BACs from both ends) selected from the 3 BAC libraries
• Batch of 75,000 BACs in process (HindIII library)
• ~45,000 BAC end sequences already obtained (ftp://ftp.sgn.cornell.edu/tomato_genome/)
• Average read length 655bp
• Annotation in progress
• SeqWright Inc, Houston, TX
• SeqWright is sponsoring a happy hour after this session.
Obtaining the Tiling Path
[Figure: markers A, B, C on the genetic map; overgos; anchored BACs; two BACs labeled “seed BAC”.]
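One way to think about choosing a minimal tiling path is as an interval-cover problem: given BACs with known positions on a contig, repeatedly pick the overlapping BAC that reaches farthest. The sketch below is a generic greedy algorithm, not the project's actual selection procedure; the coordinates and BAC names are made up for illustration.

```python
def minimal_tiling_path(bacs, start, end):
    """Greedy interval cover of [start, end] by BACs given as (name, s, e)."""
    path, covered = [], start
    remaining = sorted(bacs, key=lambda b: b[1])  # order by start position
    i = 0
    while covered < end:
        best = None
        # Among BACs starting at or before the covered frontier,
        # keep the one that extends farthest to the right.
        while i < len(remaining) and remaining[i][1] <= covered:
            if best is None or remaining[i][2] > best[2]:
                best = remaining[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in BAC coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path

# Hypothetical contig coordinates in kb.
bacs = [
    ("seedA", 0, 120), ("b1", 60, 170), ("b2", 100, 230),
    ("b3", 150, 260), ("seedB", 220, 340),
]
path = minimal_tiling_path(bacs, 0, 340)
print(path)
```

A real pipeline would also require a minimum overlap between neighbors and would confirm each join experimentally; the greedy step only illustrates why fewer, well-placed BACs reduce redundant sequencing.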
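[Figure: sequencing progress by country (US, Korea, China, UK, India, NL, France, Japan, Spain, US, US, Italy), showing BACs finished and BACs in process; bar values in the transcript: 20, 14, 16, 59, 4, 5.]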
Overview: sgn.cornell.edu -> About -> tomato sequencing
Building a Bioinformatics Platform for the Solanaceae
• Project-wide standards for quality, gene naming, annotation (http://sgn.cornell.edu/solanaceae-project/)
• Create a unified web presence for the entire project
• Develop distributed model for annotation, web presentation, involving different centers in SOL countries
• All data and programs developed in the project are shared in an open source format
• Integrate all data into the SOL bioinformatics platform, facilitating a systems approach to explore diversity and adaptation and the complex interactions that occur on all levels of biological organization
[Figure: participating bioinformatics centers: SGN, Agronanotech, Kazusa, CAS, Genome India, VIB Ghent.]
Annotation Phases
1. First pass annotations of sequences and gene models on BAC basis, available immediately
2. BAC based, common, distributed platform, stable BAC-based identifiers
3. Chromosome based, stable identifiers
Summary
• Sequencing of tomato is under way by a consortium of 10 countries
• High quality, ordered sequence using BAC tiling path
• BAC ends available, overgo results verified by FISH analyses
• Sequence will be tied to other Solanaceae and closely related species (coffee and beyond)
• Provide a foundation for shared biology for this economically important clade of plants
Acknowledgments
SOL community
Tomato Sequencing Project
Funding: National Science Foundation; other national funding sources; Keygene NV; SeqWright Inc. (Happy Hour)
Colleagues: Steven Tanksley, Jim Giovannoni, Joyce van Eck, Stephen Stack
SGN: Teri Solow, Beth Skwarecky, Nick Taylor, Robert Buels, John Binns, Chenwei Lin