Next Generation Sequencing in
Crop Improvement
Dr.S.Uma,
Principal Scientist & Head,
Crop Improvement Division,
NRCB, Trichy
FRUIT CROPS
Fruit crops belong to several families of flowering plants.
Selection for valuable traits by classical breeding strategies is complicated by
- long crop duration
- long juvenile period
- clonal propagation
Genomic tools that can help
- dense genetic maps
- whole-genome sequences
Overall view of crop improvement
Several "omics" (genomics, transcriptomics, metabolomics)
aim to dissect biological events, with particular attention to
- fruit and flower development,
- plant-pathogen interaction,
- parthenocarpy,
- plant habit,
- dormancy break,
- post-harvest events,
as well as other events correlated to plant management,
plant productivity and fruit shelf-life.
State of plant genome sequencing
• 45 genome sequences publicly available for 19 plant species
• 15 projects were underway
List of whole genome sequenced plants
Sl. No. | Category | No. | Crops
1 | Cereals | 4 | Rice, Sorghum, Maize, Foxtail millet
2 | Legumes | 3 | Soybean, Pigeon pea, Common bean
3 | Fibre crops | 3 | Cotton, Cannabis, Flax
4 | Tuber crops | 2 | Potato, Cassava
5 | Vegetables | 3 | Tomato, Sugar beet, Cucumber
6 | Fruits | 10 | Grapes, Papaya, Peach, Apple, Sweet orange, Clementine orange, Strawberry, Pear, Banana
7 | Oil and fuel crops | 4 | Mustard, Date palm, Jatropha, Castor bean
8 | Tree | 4 | Rose gum, Amborella, Cocoa, Poplar
9 | Flowers | 3 | Monkey flower, Lotus, Columbine
10 | Medicinal plants | 2 | Capsella, Selaginella
11 | Fodder crops | 2 | Medicago, Eragrostis tef
12 | Grasses | 2 | Switch grass, Brachypodium
13 | Non flowering | 1 | Physcomitrella
14 | Algae | 1 | Chlamydomonas
15 | Model crop | 1 | Arabidopsis
Total | | 45 |
Fruit crops and their genome details
Fruit crop | Scientific name | Genome size | Year of completion | Annotated genes
Grapes | Vitis vinifera | 490 Mb | 2007 | 26,346
Papaya | Carica papaya | 372 Mb | 2008 | 28,629
Peach | Prunus persica | 227 Mb | 2010 | 27,852
Apple | Malus domestica | 742.3 Mb | 2010 | 57,386
Sweet orange | Citrus sinensis | 319 Mb | 2010 | 25,376
Clementine orange | Citrus clementina | 296 Mb | 2010 | 25,385
Strawberry | Fragaria vesca ssp. vesca | 240 Mb | 2010 | 34,809
Banana | Musa acuminata | 613 Mb | 2012 | 36,542
Banana | Musa balbisiana | 523 Mb | 2013 | 36,638
Pear | Pyrus spp. | 577.3 Mb | 2014 | 43,419
Why banana….?
• ‘Kalpatharu’ - ‘Plant of virtues’.
• Forms the basis of food and nutritional security
• Adaptation and acclimatization to a wide range of agro-climatic zones
• Socio-economic importance
• Banana alone contributes 1.99% of the Indian GDP.
• India accounts for 17% of the world’s production
• As a center of origin - wide genetic diversity
How diverse is the crop….?
Global Status of Banana
India - leader in the global banana industry
But stands low in production of quality fruits
And is nil with respect to export
Growth of Banana Industry
Country | Area (million ha) | Production (million tonnes)
World | 5.14 | 106.53
India | 0.796 | 28.45
Fusarium wilt, BBrMV, Pseudostem borer, Scarring beetle, Bunchy top
Why genomics for banana….?
Genome sequences for model crops like Arabidopsis and rice are available,
but only limited synteny exists between banana and these model crops.
The banana genome (613 Mb) is nearly 5X the size of Arabidopsis (125 Mb) and
about 30% larger than rice (466 Mb).
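The size comparison can be checked directly from the three genome sizes quoted on this slide. The short sketch below only redoes that arithmetic; the dictionary and function names are illustrative and not part of the original talk.

```python
# Genome sizes in megabases, exactly as quoted on the slide.
GENOME_SIZE_MB = {"banana": 613, "arabidopsis": 125, "rice": 466}


def fold_size(species_a, species_b):
    """How many times larger species_a's genome is than species_b's."""
    return GENOME_SIZE_MB[species_a] / GENOME_SIZE_MB[species_b]


def percent_larger(species_a, species_b):
    """Percentage by which species_a's genome exceeds species_b's."""
    return (fold_size(species_a, species_b) - 1.0) * 100.0


banana_vs_arabidopsis = fold_size("banana", "arabidopsis")
banana_vs_rice_pct = percent_larger("banana", "rice")

print(f"Banana / Arabidopsis: {banana_vs_arabidopsis:.1f}x")
print(f"Banana vs rice: {banana_vs_rice_pct:.0f}% larger")
```

With these figures the banana genome works out to a little under five times the Arabidopsis genome and roughly a third larger than rice.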
Crop-specific unique features
- Non-grassy monocot
- Tropical fruit crop
- Polyploidy (2x, 3x, 4x)
- Parthenocarpy
[Figure panels: Polyploidy - Diploid (2X), Triploid (3X), Tetraploid (4X), Pentaploid (5X); Non-grassy monocot; Parthenocarpy]
Apart from these, banana is a
good model crop for genetic studies
- as it is one of the few plant species with bi-parental
cytoplasmic inheritance
- and one of the few crops where a virus has integrated into the
host genome (Banana streak virus)
Domestication
- Human interventions
Super Domestication
- Human + Biotechnology
M. acuminata diploid (AA) X M. balbisiana diploid (BB)
Banana A Genome Sequencing
36,542 protein-coding genes
Sequenced by combining Sanger, Roche 454 and Illumina technologies, using doubled-haploid (DH) plants
- M. acuminata ‘Pahang’ was sequenced.
- Almost 90% of the genome (523 Mb) has been sequenced.
- From this, 36,542 protein-coding genes were identified,
  slightly more than in the human genome.
- Interestingly, among the sequenced plants, banana has the highest number of
  regulatory genes (TFs) identified.
Banana B Genome Sequencing (Pisang Klutuk Wulung)
Re-sequencing project - collaboration between
- Laboratory of Fruit Breeding and Biotechnology, Department of Biosystems, Katholieke
Universiteit Leuven, Belgium
- The Centre for Research in Biotechnology for Agriculture and the Institute of
Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur,
Malaysia.
Published by Davey et al.,
BMC Genomics 2013, 14:683
Illumina HiSeq 2000 used
(BGI, China)
B genome is 79% the size of the A genome
36,638 predicted functional genes,
a number nearly identical to the 36,542 (87%) of the
A genome.
Valuable - the B genome harbours diverse resistance genes for many
biotic and abiotic stresses affecting banana, especially black
streak and Xanthomonas wilt, along with cold and drought tolerance.
It offers wide therapeutic and non-edible applications.
Decoding the B genome will help in tapping trait-specific genes and
regulatory genes which may not be present in the A genome.
Comparison B & A Genome Sequencing
Genes for specific traits - parthenocarpy, resistance genes etc.
Regulatory genes - silencing of genes
Comparison of A & B Genome
Markers
Between the A and B genomes,
a larger number of repeats was identified
in the A genome,
but SSRs (measured in bp and
kbp) are higher in the B genome.
These markers are already
being exploited for fingerprinting
and QTL development.
Isolation of novel genes, promoters, miRNA etc
miRNA
In spite of its smaller genome size,
the B genome contains more
repetitive elements and
more known and novel miRNAs,
but fewer conserved miRNA families,
than the A genome.
The high number of miRNA loci here could be
related to the higher number of transposable
elements present.
These are thought to have contributed to the generation of
species-specific miRNA genes.
The predicted targets of several B-specific miRNAs are proteins with
unknown function that appear to be present only in the B genome,
representing functional networks distinct to the B genome.
miRNAs = small, single-stranded RNA molecules (~21-mer) that bind to complementary sequences of one or more mRNAs (often transcription factor mRNAs)
Regulation of Transcript Abundance by MicroRNAs
[Figure labels: mature miRNA within the RNA-induced silencing complex (RISC); miRNA-RISC complex binding to complementary mRNA]
miRNAs either degrade or impair the translation of their target mRNAs!
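The idea of a miRNA binding a complementary stretch of mRNA can be illustrated with a small string search. This is only a sketch under simplifying assumptions: it looks for perfect (or near-perfect) Watson-Crick complementarity along the whole miRNA, ignores G:U wobble pairs and the position-specific rules that real target-prediction tools apply, and uses made-up toy sequences rather than any banana miRNA.

```python
# Watson-Crick pairing for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def find_target_sites(mirna, mrna, max_mismatches=0):
    """Return 0-based mRNA positions where the miRNA could base-pair.

    A site is a window of the mRNA that equals the reverse complement
    of the miRNA, allowing up to max_mismatches mismatched positions.
    """
    site = reverse_complement(mirna)
    mrna = mrna.upper()
    hits = []
    for start in range(len(mrna) - len(site) + 1):
        window = mrna[start:start + len(site)]
        mismatches = sum(1 for a, b in zip(window, site) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


# Hypothetical 21-nt miRNA and a toy mRNA carrying one matching site.
toy_mirna = "AUGGCUAACGUUCGGAUCCAG"
toy_mrna = "GGGAAA" + reverse_complement(toy_mirna) + "CCCUUU"

print(find_target_sites(toy_mirna, toy_mrna))
```

Raising `max_mismatches` loosens the search, which is closer to how imperfectly paired target sites would be picked up.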
Taken from http://www.ambion.com/techlib/resources/miRNA/index.html
Importance of transcriptome sequencing
WGS provides overall information about the genome,
but transcriptome sequencing is essential because it
- provides information on gene expression under a specific physiological status
(specific tissues, developmental stage, disease challenge etc.)
- analyses only the transcribed portion of the genome
- and thus helps to interpret the functional elements of the
genome.
Variety/cultivar | Technique | Tissue | Stage/stress | No. of expressed genes | Reference | Laboratory
M. acuminata, subgroup Cavendish ‘Grand Nain’ | cDNA-AFLP | Fruit | Crown rot disease | 10 | Lassois et al. 2011 | University of Liege, Gembloux, Belgium
M. acuminata ssp. burmannicoides ‘Calcutta-4’, cv. Rose, FHIA 17, Williams | cDNA-AFLP | Root | Fusarium | 76 | Munro 2008 | University of Pretoria, Pretoria, South Africa
M. acuminata ssp. burmannicoides ‘Calcutta-4’ | SSH | Root | Fusarium | 83 | Swarupa et al. 2013 | IIHR, Bangalore, India
M. acuminata cv. Manoranjitham (AAA) | SSH | Leaf | Eumusae leaf spot | 805 | Uma et al. 2011 | NRCB, Trichy, India
M. acuminata cv. Karthobiumtham (ABB) | SSH | Root | Nematode | 256 | Backiyarani et al. 2014 | NRCB, Trichy, India
M. acuminata cv. Tuu Gia | SAGE | Leaf | - | 5292 | Coemans et al. 2005 | KUL, Belgium
M. acuminata ssp. burmannicoides ‘Calcutta-4’ | EST sequencing | - | Hot and cold stress, fruit ripening | 220 | Santos et al. 2005 | EMBRAPA, Brazil
Status of EST sequencing in Banana - pre NGS Era
Variety/cultivar | Technique | Tissue | Stage/stress | No. of expressed genes | Reference | Laboratory
M. acuminata ‘Williams’ | Combination of SSH and microarray | Fruit | Fusarium | 79 | Van den Berg et al. 2007 | FABI, University of Pretoria, South Africa
M. acuminata cv. Grand Nain | Combination of SSH and microarray | Fruit | Early ripening of banana | 84 | Manrique-Trujillo et al. 2007 | Centro de Investigación y de Estudios Avanzados del IPN, México
M. acuminata AAA group | Combination of SSH and microarray | Fruit | Early ripening of banana | 265 | Xu et al. 2007 | Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical Agricultural Sciences, China
M. acuminata L. AAA group | Combination of SSH and microarray | Fruit | Ethylene biosynthesis initiation in fruits | 22 | Jin et al. 2009 | Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical Agricultural Sciences, China
M. acuminata, Dwarf Cavendish (AAA) | Combination of SSH and microarray | Fruit | Ethylene biosynthesis initiation in fruits | 37 | Kesari et al. 2007 | NBRI, Lucknow, India
Exploitation of NGS in banana for various stresses
Variety/cultivar | Technique | Tissue | Stage/stress | No. of expressed genes | Reference | Laboratory
M. acuminata cv. Manoranjitham (AAA) | Illumina sequencing | Leaf | Eumusae leaf spot | 6195 | Uma et al. 2014 | NRCB, Trichy
M. acuminata cv. Grand Naine | Illumina sequencing | Leaf | Eumusae leaf spot | 6212 | Uma et al. 2014 | NRCB, Trichy
M. acuminata cv. Karthobiumtham (ABB) | Illumina sequencing | Leaf | Nematode | 5736 | Uma et al. 2014 | NRCB, Trichy
M. acuminata cv. Nendran | Illumina sequencing | Leaf | Nematode | 6521 | Uma et al. 2014 | NRCB, Trichy
M. acuminata cv. Saba | Illumina sequencing | Leaf | Drought | 2546 | Uma et al. 2014 | NRCB, Trichy
M. balbisiana cv. Attikol | Ion Torrent sequencing | Whole parts | Wild | 3301 | Uma et al. 2014 | NRCB, Trichy
M. acuminata | Illumina sequencing | - | - | - | Ravishankar et al. 2013 | IIHR, Bangalore
M. acuminata | Illumina sequencing | Roots | Fusarium | 3331 | Ting-Ting Bai et al. 2013 | SAU, Guangdong, China
Variety/cultivar | Technique | Tissue | Stage/stress | No. of expressed genes | Reference | Laboratory
M. acuminata Calcutta 4 and Cavendish Grande Naine | 454 GS-FLX Titanium technology | Leaf | Mycosphaerella musicola | 10,645 | Passos et al. 2013 | Instituto de Ciências Biológicas, Brazil
M. acuminata subgroup Cavendish (AAA) | Illumina sequencing | Root | Fusarium | 842 | Chunqiang Li et al. 2013 | Chinese Academy of Tropical Agricultural Sciences, Haikou, China
M. acuminata subgroup Cavendish (AAA) | Illumina sequencing | Root | Fusarium | 5008 | Chunqiang Li et al. 2012 | Guangdong Academy of Agricultural Sciences, Guangzhou 510640, China
Musa acuminata AAA group, ‘Brazilian’ | Illumina sequencing | Root | Fusarium | 2,825 | Zhuo Wang et al. 2012 | Chinese Academy of Tropical Agricultural Sciences, Hainan 571101, China
Pre-NGS era - 2005-14
- Techniques: SSH, RDA, cDNA-AFLP, SAGE, DNA microarray etc.
- Only 12 EST sequencing studies completed, with an average of about 600 genes each.
- Information was limited and not sufficient to understand
the host-stress interaction.
- Prediction of the level of gene expression was also limited,
as it needs prior knowledge of the genome sequence.
NGS era - 2012-14
- 12 transcriptome sequencing studies completed within 2 years, which
led to annotation with an average of 30,000 genes.
This shows the magnitude of the importance of NGS.
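The "average of 600 genes" figure for the pre-NGS era can be reproduced from the twelve expressed-gene counts listed in the two pre-NGS tables. The sketch below simply averages those twelve numbers as transcribed from the tables; the variable names are illustrative.

```python
from statistics import mean, median

# Expressed-gene counts from the twelve pre-NGS studies tabulated above
# (cDNA-AFLP, SSH, SAGE, EST sequencing, SSH + microarray).
PRE_NGS_GENE_COUNTS = [10, 76, 83, 805, 256, 5292, 220, 79, 84, 265, 22, 37]

pre_ngs_mean = mean(PRE_NGS_GENE_COUNTS)
pre_ngs_median = median(PRE_NGS_GENE_COUNTS)

print(f"Studies: {len(PRE_NGS_GENE_COUNTS)}")
print(f"Mean genes per study: {pre_ngs_mean:.0f}")
print(f"Median genes per study: {pre_ngs_median}")
```

The mean comes out close to 600, but it is pulled up by the single SAGE study (5292 genes); the median is far lower, which makes the contrast with the NGS era even sharper.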
Musa B genome whole transcriptome
Assembled using the MIRA assembler
Generated 82,413 contigs
Functionally annotated with
BLAST2GO against the SWISSPROT, TrEMBL,
COG and PlantCyc databases.
35,783 protein-coding genes
193,826 Gene Ontology terms
predicted
cv. Attikol: wild, indigenous, unique accession; donor source
Musa B genome whole transcriptome
Annotated results
The maximum number of transcripts hit
Arabidopsis genes, followed by rice
Functional classification through COG indicated that
most of the transcripts function in
- post-translational modification, protein turnover and chaperones,
- followed by signal transduction mechanisms
- and intracellular trafficking and vesicular
transport
Total of 3,301 defense-related genes
detected
The maximum number of transcripts were involved in
- PAMP, followed by
- plant hormone biosynthesis
- genes involved in cell wall modifications
SSR markers identified from this
genome have been tested in various
germplasm accessions and clearly
distinguish the A genome, the B genome and
wild species.
Musa transcriptome for different stresses
Sigatoka
Transcriptome
Nematode
Transcriptome
Drought
Transcriptome
Fusarium wilt
Transcriptome (IIHR)
Cultivar | Challenged | Unchallenged
Resistant | C-R | U-R
Susceptible | C-S | U-S
For each stress, four different cDNA libraries were constructed and sequenced on the Illumina platform.
Based on digital gene expression studies, unique genes expressed only in the resistant cultivar under challenged conditions were identified.
Defense-related genes commonly expressed under all stress conditions in the respective resistant cultivars were also identified.
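The two selections described here, genes seen only in the challenged resistant library and genes shared by the resistant cultivars across stresses, map naturally onto set operations. The sketch below assumes each library has already been reduced to a set of expressed gene identifiers; the gene IDs and the dictionary layout are hypothetical and only illustrate the logic.

```python
def genes_unique_to(target, libraries):
    """Genes expressed in the target library and in no other library."""
    others = set().union(
        *(genes for name, genes in libraries.items() if name != target)
    )
    return libraries[target] - others


def genes_common_to(gene_sets):
    """Genes present in every one of the given gene sets."""
    return set.intersection(*gene_sets)


# Hypothetical expressed-gene sets for one stress (four libraries).
toy_libraries = {
    "C-R": {"g1", "g2", "g3", "g7"},
    "U-R": {"g1", "g2"},
    "C-S": {"g2", "g4"},
    "U-S": {"g2", "g5"},
}
unique_cr = genes_unique_to("C-R", toy_libraries)

# Hypothetical C-R-specific genes for three different stresses.
toy_cr_unique_by_stress = [
    {"g3", "g7", "g9"},   # leaf spot
    {"g3", "g8", "g9"},   # nematode
    {"g3", "g9", "g11"},  # drought
]
common_defense = genes_common_to(toy_cr_unique_by_stress)

print(sorted(unique_cr), sorted(common_defense))
```

In practice the expressed-gene sets would come from read counts passed through an expression threshold, which this sketch leaves out.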
[Figure: Percentage of genes expressed in challenged and unchallenged resistant and susceptible cultivars (categories UR, CR, US, CS; axis from 60% to 78%)]
[Figure: Venn diagram for genes significantly expressed (>2 fold) in challenged and unchallenged resistant and susceptible cultivars]
[Figure: Significantly expressed genes in resistant and susceptible cultivars]
[Figure: Total genes expressed in resistant and susceptible cultivars]
[Figure: Heat map for significantly expressed genes in challenged and unchallenged resistant and susceptible cultivars]
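The ">2 fold" criterion used for the significantly expressed genes can be sketched as a log2 fold-change filter between the challenged and unchallenged libraries of one cultivar. This is an illustration only: it assumes the counts are already normalised, adds a pseudocount of 1 to avoid division by zero, treats both up- and down-regulation as changes, and does no statistical testing. The gene names and counts are made up.

```python
import math


def log2_fold_change(challenged, unchallenged, pseudocount=1.0):
    """log2 ratio of challenged to unchallenged expression."""
    return math.log2((challenged + pseudocount) / (unchallenged + pseudocount))


def significantly_changed(counts, min_fold=2.0, pseudocount=1.0):
    """Keep genes whose expression changes by more than min_fold.

    counts maps gene -> (challenged, unchallenged) normalised counts.
    Returns gene -> log2 fold change for the genes that pass.
    """
    threshold = math.log2(min_fold)
    passed = {}
    for gene, (challenged, unchallenged) in counts.items():
        lfc = log2_fold_change(challenged, unchallenged, pseudocount)
        if abs(lfc) > threshold:
            passed[gene] = lfc
    return passed


# Hypothetical normalised counts: (challenged, unchallenged).
toy_counts = {
    "geneA": (99, 9),    # strongly induced
    "geneB": (30, 24),   # essentially unchanged
    "geneC": (4, 39),    # strongly repressed
}
changed = significantly_changed(toy_counts)
print(changed)
```

Genes passing this filter in each of the four libraries are what a Venn diagram or heat map of the kind captioned above would then summarise.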
Another tool for quick identification of resistance genes with a high number of transcripts
Allows us to speculate on the possible pathways involved
in the resistance mechanism(s)
Schematic representation of nematode (P. coffeae)
resistance mechanism(s) in banana
SSRs and SNPs developed from various transcriptomes of
Musa cultivars
Library | M. eumusae SSR | M. eumusae SNP | Nematode (P. coffeae) SSR | Nematode (P. coffeae) SNP | Water deficit stress SSR | Water deficit stress SNP
Unchallenged Resistant | 4825 | 83 | 11650 | 432 | 9692 | 451
Challenged Resistant | 9153 | 355 | 3408 | 287 | 8128 | 351
Unchallenged Susceptible | 5008 | 79 | 8250 | 143 | 8478 | 92
Challenged Susceptible | 9239 | 271 | 7512 | 305 | 3775 | 334
Total | 28225 | 788 | 30820 | 1167 | 30073 | 1228
The 6 transcriptome data sets available with us facilitated the identification of SSR and
SNP markers across resistant (R) and susceptible (S) cultivars.
In-silico predicted polymorphic markers are being validated for MAS
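How SSRs are mined from assembled transcripts is not described on the slides, so the following is only a generic sketch of the idea: scan each transcript for short motifs repeated in tandem. The minimum repeat numbers (6 for di-, 5 for tri-, 4 for tetranucleotide motifs) are assumed values chosen for illustration, and the transcript is a made-up sequence.

```python
import re

# Assumed minimum tandem repeat counts per motif length (illustrative only).
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit."""
    for size in range(1, len(motif)):
        if len(motif) % size == 0 and motif[:size] * (len(motif) // size) == motif:
            return False
    return True


def find_ssrs(sequence, min_repeats=None):
    """Return (start, motif, repeat_count) for each SSR found.

    start is 0-based; motifs such as 'AGAG' or 'AAA' that are really
    repeats of a shorter unit are skipped to avoid double counting.
    """
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    sequence = sequence.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


# Hypothetical transcript with an (AG)8 and a (GAT)5 repeat.
toy_transcript = "TTGAC" + "AG" * 8 + "CCTGA" + "GAT" * 5 + "ACGTT"
ssrs = find_ssrs(toy_transcript)
print(ssrs)
```

Comparing the repeat counts found at the same locus in resistant and susceptible libraries is one way candidate polymorphic SSRs could be shortlisted in silico before wet-lab validation.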
Bioinformatics
Databases and software developments
- Genetic resources (MGIS)
- Genetic data (TropGENE DB)
- Genomic data (data_Musa)
- Development of BioMoby services to improve
interoperability and data integration of databases
- Development of a software tool, Genoma, for storage,
annotation and analysis of ESTs
- Development of a web-based infrastructure for the
management of genomics resources
- Development of a web-based infrastructure based
on GBrowse to visualize sequence annotations
Institutions credited on the slide: Bioversity, EMBRAPA, CIRAD, INIBAP
Banana Marker Database Development
(SSR)
Markers identified have been linked to their respective metabolic pathways, which shall narrow down the selection of primers based on our objective.
The database developed is being used
• to develop trait-specific markers for use in marker-assisted breeding
• for the development of QTLs in banana
• to develop a high-resolution map for the A and B genomes
• and can also be used for developing a Musa DNA chip
With transcriptome data obtained through NGS,
we have gained knowledge on
- gene sequence information- novel/unigenes
- role of regulatory elements- kinases, TFs
- cultivar specific gene expression
- stress specific gene expression
- highly regulated pathway
- change in metabolic activities- enzymes
- sub cellular level gene expression- e.g. mitochondria
- most influenced organ
- genes not affected by stress
- common gene expression or cascade pattern for a particular
stress
- development of stress-specific marker(s) etc.