The New Zealand Institute for Plant & Food Research Limited
Potato Genome Sequencing Consortium, notes from the edge
Dr Susan Thomson, Dr Mark Fiers, Dr Jeanne Jacobs
Potato Genome Sequencing – why?
Solanaceae - important family (tomato, eggplant, petunia, tobacco, and capsicum)
Potato is now the 3rd largest global food crop
Potato Genome Sequencing – the beginning
The Potato Genome Sequencing Consortium is an initiative of Wageningen University & Research Center
PGSC brings together a global community to complete the project.
Individual partners were assigned different chromosomes.
PGSC – member countries
PGSC – the beginning
1995 – Genetic map of potato, built from the diploid mapping population SH (SH83-92-488) x RH (RH89-039-16)
2001 – Ultra High Density (UHD) genetic map generated, ~10,000 AFLP markers (genome ~840 Mb, 12 markers/Mb)
2002 – BAC library constructed using RH89-039-16. 85,000 BACs with an average insert of 120 kb. 73,000 fingerprinted by AFLP.
2006 – AFLP analysis of BACs used to build up contigs of overlapping BACs. Selective AFLPs used to anchor certain BACs (and contigs) to physical map.
2005/6 – Initiate genome sequencing. BAC-by-BAC Sanger sequencing. Start with anchored seed BACs. 6x coverage, 800-1000 BACs/chromosome.
Dec 2009 – end date for full annotated potato genome sequence
Early 2008 – BAC sequencing status: chromosome 7 not started, others very few BACs done.
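The 2001 map and 2002 BAC library figures in the timeline above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, the library coverage is a derived value that the slides do not state, and it assumes the ~840 Mb genome size applies to RH and that 1 Mb = 1000 kb.

```python
# Figures quoted in the timeline slides; names below are illustrative only.
GENOME_SIZE_MB = 840      # approximate potato genome size
AFLP_MARKERS = 10_000     # approximate marker count on the UHD map
BAC_CLONES = 85_000       # clones in the RH BAC library
MEAN_INSERT_KB = 120      # average BAC insert size


def marker_density(markers: int, genome_mb: float) -> float:
    """Average number of markers per megabase of genome."""
    return markers / genome_mb


def library_coverage(clones: int, insert_kb: float, genome_mb: float) -> float:
    """Genome equivalents held in a clone library (assumes 1 Mb = 1000 kb)."""
    total_insert_mb = clones * insert_kb / 1000
    return total_insert_mb / genome_mb


density = marker_density(AFLP_MARKERS, GENOME_SIZE_MB)
coverage = library_coverage(BAC_CLONES, MEAN_INSERT_KB, GENOME_SIZE_MB)
print(f"{density:.1f} markers/Mb")          # 11.9, quoted as 12 on the slide
print(f"{coverage:.1f}x genome coverage")   # 12.1 (derived, not from the slides)
```

These are averages. Markers and clones are not spread evenly along the chromosomes, so a high average density does not rule out local gaps in the map.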
PGSC – the worries
Sanger BAC-by-BAC sequencing is slow.
Despite the UHD map of 10,000 markers, there are still large gaps in the physical map, reducing the number of seed BACs.
Made more problematic by ‘stops’ caused by repeat elements and a lack of overlapping BACs.
PGSC – the solutions
Bigger and better machines!
Next Generation Sequencing (NGS) technologies are making Whole Genome Shotgun (WGS) sequencing more financially feasible (data/$).
RH is highly heterozygous, leading to assembly issues.
Continue RH sequencing using mainly NGS methods
PGSC – the solutions
Introducing a new line, DM: DM 1-3 516 R44
Doubled monohaploid, homozygous line. (Ref: Lightbourn GJ, Jelesko JG, Veilleux RE. 2007. Genome 50 (5):492–501.)
DM flowers well.
Can be used as female parent in crosses with most diploid potato germplasm.
PGSC – mapping
No genetic knowledge for DM 1-3 516 R44
Diploid mapping population:
DM x DI (China Runtush)
F1 x DI
Mapping population
2 x 96-well plates with DNA of the mapping population, along with the parents. Generated by the International Potato Center (CIP), Peru.
PGSC – mapping
Preliminary scaffold assembly of DM derived from Illumina data (generated by Beijing Genome Institute, BGI):
No. of sequences: 57,681
Max scaffold length: 5,398,072 bp
Min scaffold length: 100 bp
Total assembly length: 702,581,734 bp
Average scaffold length: 12,180 bp
Median scaffold length: 179 bp
N50*: 429,749 bp
* N50: sort the scaffolds from largest to smallest and lay them end to end. The N50 is the size of the scaffold at which the running total reaches 50% of the total assembly length.
As at 20 August 2009
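The N50 definition above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline BGI used: the function name and the toy scaffold lengths are made up, and it uses the total assembly length as the 50% reference.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of scaffold lengths.

    Scaffolds are sorted from largest to smallest and summed; the N50 is
    the length of the scaffold at which the running total first reaches
    half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running total always reaches half")


# Toy example with made-up lengths (total 1500, half 750):
# 500 -> 500, then 500 + 400 = 900 >= 750, so the N50 is 400.
toy_scaffolds = [100, 200, 300, 400, 500]
print(n50(toy_scaffolds))  # 400
```

The gap between the median (179 bp) and the N50 (429,749 bp) in the table shows why N50 is the more informative summary: most scaffolds are tiny, but most of the assembled sequence sits in large ones. The average is simply total length divided by scaffold count, 702,581,734 / 57,681, or about 12,180 bp.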
PGSC – mapping
550 newly generated SSR markers*, allocated by institute (country):
- 100: Plant and Food Research (New Zealand)
- 100: Universidad Nacional Agraria La Molina (Peru)
- 100: International Potato Centre (Peru)
- 100: Scottish Crop Research Institute (Scotland)
- 50: Instituto Nacional de Tecnologia Agropecuaria (Argentina)
- 50: The Irish Agriculture and Food Development Authority, Teagasc (Eire)
- 50: Institute of Bioengineering (Russian Federation)
*SSRs generated by BGI, China
Preliminary results: of 44 SSRs tested, 14 were monomorphic, 15 showed polymorphism within DI, and 15 showed polymorphism between DM and DI.
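As a quick check on the SSR figures above, the allocation can be tallied and the preliminary screening results expressed as proportions. This is only a sketch: the variable names are illustrative, and it assumes the three screening categories are mutually exclusive, which the slide implies (14 + 15 + 15 = 44) but does not state.

```python
# SSR markers allocated to each institute, as listed on the slide.
ssr_allocation = {
    "Plant and Food Research": 100,
    "Universidad Nacional Agraria La Molina": 100,
    "International Potato Centre": 100,
    "Scottish Crop Research Institute": 100,
    "Instituto Nacional de Tecnologia Agropecuaria": 50,
    "Teagasc": 50,
    "Institute of Bioengineering": 50,
}
total_ssrs = sum(ssr_allocation.values())
print(f"Total SSRs allocated: {total_ssrs}")

# Preliminary screening results (assumed to be mutually exclusive categories).
screening = {
    "monomorphic": 14,
    "polymorphic within DI": 15,
    "polymorphic between DM and DI": 15,
}
tested = sum(screening.values())
for category, count in screening.items():
    print(f"{category}: {count}/{tested} = {count / tested:.0%}")
```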
PGSC – mapping
- 148 Sequence Tagged Markers (STM). Known to map to regions spanning all 12 chromosomes.
- ~60 Ste markers, currently being mapped in an SHxRH population. Generated by large scale in-silico design of SSRs from ESTs in public database. (Ref: Tang J, Baldwin SJ, Jacobs JM, Linden CG, Voorrips RE, Leunissen JA, van Eck H, Vosman B. BMC Bioinformatics. 2008 Sep 15;9:374)
PGSC – mapping
Mapping data will be combined with results from:
SNP data – EST data aligned to DM scaffold. (Robin Buell, courtesy of SolCAP USDA project http://solcap.msu.edu/)
Design ~2000 markers for use with BeadXpress (Illumina) (Glenn Bryan, Scottish Crop Research Institute)
Aiming for > 1000 mapped.
DArT data* – Two discovery arrays with over 30,000 probes to begin. Discovered 3000 candidate markers.
It is hoped that 1000 to 1500 unique DM markers will segregate in the mapping population.
Sequencing of 7000 DArT markers will also be carried out.
* Diversity Arrays http://www.diversityarrays.com/
PGSC – assembly
Plans for an in silico* pipeline to improve the scaffold assembly, bringing together data from:
- SOL Genomics Network
- Tomato genome
- Markers; SSR, SNP and DArT
- RH UHD/physical map information
* Dan Bolser, University of Dundee, Scotland
PGSC – the present & future
Sequencing in progress, by line (table columns on the slide: Sanger sequencing, Illumina runs, Roche/454 runs):
RH
- WGS + long jump libraries: 10x coverage
- WGS: 60x coverage
- BAC library: 150,000 BAC end sequences + 2,000 BAC clones
- Random sheared BAC library (~100 kb): 120,000 BAC end sequences
DM
- WGS + long jump libraries: 10x coverage
- WGS + 500 bp to 10 kb libraries: 65x coverage
- Fosmid library (~35 kb): 100,000 end sequences
- BAC library: 200,000 BAC end sequences
Data from transcriptome sequencing will be added into the assembly pipeline: 16 runs, covering a combination of different tissues and conditions for DM and also RH.
Acknowledgements
Plant & Food Research is part of the international Potato Genome Sequencing Consortium (PGSC).
For more information, visit http://potatogenome.net. Website going live as of 1st September.
PFR – Lincoln
Jeanne Jacobs
Mark Fiers
Samantha Baldwin