Proteome bioinformatics and genetics for associating proteins with grain phenotype
International Gluten Workshop, 11th; Beijing (China); 12-15 Aug 2012
Rudi Appels, Centre for Comparative Genomics, Murdoch University, Australia
Paula Moolhuijzen, Brett Chapman, Wujun Ma, Dean Diepeveen, Matthew Bellgard,
Centre for Comparative Genomics, Murdoch University and Department of Food and Agriculture WA, Australia.
Yueming Yan, Shunli Wang,
Capital Normal University, Beijing
Angela Juhasz,
Agricultural Institute, Martonvásár, Hungary
Frank Bekes,
FBFD Pty Ltd, Beecroft, Sydney, Australia 2119
Centre for Comparative Genomics (CCG) at Murdoch University
Supercomputer: • Stage 1A Pawsey Centre (SKA) • Ranked 87 in the world • 9600 cores
• Genome sequencing and high resolution genetic maps of wheat
• Integrating new wheat protein level analyses
• Translating research findings to industry – the Decision Matrix
• The integration of new efforts to obtain reference sequences for the bread wheat and barley genomes is accelerating gene discovery.
• Locations of traits and proteins on DNA sequence assemblies, via genetic maps, define gene networks.
• The genomic resources are refining molecular marker development and mapping strategies for combining yield with quality attributes of the grain that meet market requirements.
Locations of proteins within a genetic map can be determined. One of the first examples was published by Amiour et al (2003), using 2D gels to identify chromosomal locations of amphiphilic proteins from wheat grains. Later, Chen et al (2007) carried out mapping using MALDI-TOF defined peaks of gliadin. Progress in the DNA sequencing of the wheat transcribed genes now allows higher resolution maps to be established.
Amiour N, et al (2003) Theor. Appl. Genet. 108: 62–72
Chen J, et al (2007) Rapid Comm Mass Spectrometry 21: 2913–2917
2007 – 2012: Suites of genomic resources and knowledge have been established to provide the foundation for sequencing the wheat and barley genomes:
• International Wheat Genome Sequencing Consortium (www.wheatgenome.org)
• UK WISP consortium (www.wheatisp.org)
• International Barley Sequencing Consortium (www.barleygenome.org)
• European TriticeaeGenome FP7 project (www.triticeaegenome.eu)
The initiatives built on long standing resources such as:
• KOMUGI in Japan (www.shigen.nig.ac.jp/wheat/komugi/)
• Graingenes in the USA (wheat.pw.usda.gov/GG2/index.shtml)
• Extensive EST collections (ITEC http://avena.pw.usda.gov/genome/)
• Reducing the complexity of the wheat genome through flow sorting of chromosome arms has formed the basis for the international effort to produce a reference sequence for the variety Chinese Spring.
• All the chromosome arms now have a completed survey sequence analysis. This provides a pool of DNA contigs that can be used to anchor gene sequences and proteins to chromosome arms.
The array technologies for assaying single nucleotide polymorphisms (SNPs) are now establishing genetic maps with 2000-3000 molecular markers.
Map shown: chromosomes 1A, 1B and 1D, from an Avalon x Cadenza cross.
Allen AM, Barker GLA, Berry ST, Coghill JA, Gwilliam R, Kirby S, Robinson P, Brenchley RC, D’Amore R, McKenzie N, Waite D, Hall A, Bevan M, Hall N, Edwards KJ (2011) Transcript-specific, single-nucleotide polymorphism discovery and linkage analysis in hexaploid bread wheat (Triticum aestivum L.). Plant Biotechnology Journal 2011: 1–14
The 9000 SNP array (“chip”) technology for assaying SNPs has been used to establish a 2000 molecular marker map for a set of 225 doubled haploid lines from a Westonia x Kauz cross.
A large study in Australia is examining progeny from a complex cross (MAGIC; currently a 4-way cross using Baxter, Yitpi, Westonia and Chara, with 1500 lines, markers from a 9K SNP chip, and markers from a 90K chip planned). This work is being carried out at CSIRO with Colin Cavanagh.
Chromosome 7A
In an 8-way cross using Baxter, Yitpi, Westonia, AC Barrie (Canada), Alsen (US), Pastor (CIMMYT), Xiaoyan 54 (China) and Volcani (Israel), 5000 lines are being characterized.
In a large population of 5,000 lines (as required for accurate mapping) it is not feasible to phenotype all progeny.
The marker information can be used to define families of progeny for phenotyping.
For the 1500 lines from the 4-way MAGIC cross, a population of 370 families has been defined for phenotyping (in duplicated/randomized designs). While we are still in the middle of this analysis (which includes milling yield), some QTL for % wet gluten at the LMW-glutenin locus of chromosome 1B are evident.
It is interesting that in the high resolution maps the QTL may not be exactly superimposed on the LMW-glutenin locus.
GluStar system for “wet gluten” measurements on 4.5 g flour
• MAGIC and assignment of a QTL for % wet gluten to 1B, near the LMW glutenin locus but not coincident with it
• The high density of markers allows a fine resolution of map location when 1,500 progeny are analyzed
Tömösközi S, Budapest University of Technology and Economics; http://www.labintern.hu
Li et al (2010). BMC Plant Biology 10:124
To determine protein fingerprints as a “phenotype” we have explored MALDI-TOF as a means of increasing the number of lines we can analyse.
Low molecular weight glutenins
High molecular weight glutenins (70,000–90,000 Da)
Gao L et al (2010). J Ag Food Chem 58: 2777–2786
Li et al (2009). Cereal Sci. 50: 295-301
HMW-GS    Mr (Da) deduced from coding gene    Mr (Da) by MALDI-TOF
1Ax2*     86309                               86200
1Bx6      Unknown                             86500
1Bx7      82524                               82300
1Bx7OE    83134                               82900
1Bx7b*    Unknown                             82600
1Bx13     Unknown                             83000
1Bx14     84012                               83600
1Bx17     78607                               77900, 78400
1Bx20     Unknown                             82100
1Dx2      87022                               87000
1Dx3      Unknown                             85400
1Dx5      88128                               87900
1By8      75156                               74900
1By8a*    Unknown                             74800
1By8b*    Unknown                             75000
1By9      73515                               73300
1By15     75733                               74900
1By16     Unknown                             76900
1By18     Unknown                             75000
1By20     Unknown                             74900
1Dy10     67473                               67300
1Dy12     68652                               68300
Li et al (2009) Cereal Sci. 50: 295-301
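The agreement between the two mass columns can be checked with a few lines of Python. This is only an illustrative sketch: the masses are copied from the table, subunits with an unknown gene-deduced mass are left out, and 1Bx17 is omitted because two MALDI-TOF values are listed for it. The function name `mass_differences` is made up for this example.

```python
# (subunit, Mr deduced from coding gene, Mr by MALDI-TOF), all in Da, from the table.
# Rows with "Unknown" deduced mass and the two-valued 1Bx17 row are omitted.
HMW_GS = [
    ("1Ax2*", 86309, 86200),
    ("1Bx7", 82524, 82300),
    ("1Bx7OE", 83134, 82900),
    ("1Bx14", 84012, 83600),
    ("1Dx2", 87022, 87000),
    ("1Dx5", 88128, 87900),
    ("1By8", 75156, 74900),
    ("1By9", 73515, 73300),
    ("1By15", 75733, 74900),
    ("1Dy10", 67473, 67300),
    ("1Dy12", 68652, 68300),
]


def mass_differences(rows):
    """Return (subunit, deduced - measured in Da, difference as % of deduced)."""
    out = []
    for name, deduced, measured in rows:
        diff = deduced - measured
        out.append((name, diff, 100.0 * diff / deduced))
    return out


if __name__ == "__main__":
    for name, diff, pct in mass_differences(HMW_GS):
        print(f"{name:8s} {diff:5d} Da  {pct:5.2f} %")
```

For the rows included here, the MALDI-TOF value is lower than the gene-deduced value in every case, with the largest gap for 1By15.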
The MALDI-TOF based analyses of the LMW and HMW glutenins have provided a good basis for establishing a high throughput analysis for breeding programs. This analysis now runs as a fee-for-service (Saturn Biotech; AUS$6/sample).
The glutenin subunit protein loci known to date, however, can account for only approximately 60% of the variation in measured grain quality attributes.
More detailed genetic analysis is yielding new information.
Chromosome 1D
L29183 L33288 L33529
The classic designation of the Westonia LMW glutenin locus on chromosome 1D is LMWG-D3c (in addition to A3c, B3h). The Kauz designation is not known.
Peaks from:
Westonia = L33288
Kauz = L29183, L33529
Peaks found in LMWG-D3c (based on Li et al 2010):
33021
33290
33453
Map based on DH lines from a Westonia x Kauz cross
Li et al (2010). BMC Plant Biology 10:124
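The comparison of mapped peaks against the LMWG-D3c reference peaks can be sketched as a nearest-peak lookup. The peak values are those listed on this slide; the 10 Da tolerance and the function name `match_peaks` are assumptions made for illustration, not part of the published method.

```python
# Reference peaks (Da) listed for LMWG-D3c (based on Li et al 2010).
LMWG_D3C_PEAKS = [33021, 33290, 33453]

# Peaks mapped on chromosome 1D in the Westonia x Kauz DH population.
OBSERVED = {"Westonia": [33288], "Kauz": [29183, 33529]}


def match_peaks(observed, reference, tol_da=10):
    """Pair each observed peak with its closest reference peak.

    tol_da is an assumed tolerance in Da. Peaks with no reference peak
    within the tolerance are mapped to None.
    """
    matches = {}
    for peak in observed:
        nearest = min(reference, key=lambda ref: abs(ref - peak))
        matches[peak] = nearest if abs(nearest - peak) <= tol_da else None
    return matches


if __name__ == "__main__":
    for parent, peaks in OBSERVED.items():
        print(parent, match_peaks(peaks, LMWG_D3C_PEAKS))
```

Under this assumed tolerance the Westonia peak L33288 pairs with the 33290 reference peak, while neither Kauz peak has a counterpart in the LMWG-D3c list.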
L32831 L31965
Chromosome 7A
Classical mapping of LMW-glutenin loci defined the chromosome 1A, 1B and 1D loci based on single dimension SDS PAGE technology (Gupta and Shepherd, 1994), and it was noted then that the protein family was complex.
We now find that some of the peaks in the MALDI-TOF are mapping to other chromosomes, such as chromosome 7A.
We used our wheat proteome database to see if we could identify the L32831 and L31965 proteins.
Gupta and Shepherd (1994). Two-step one-dimensional SDS-PAGE analysis of LMW subunits of glutenin. I. Variation and genetic control of the subunits in hexaploid wheats. Theor. Appl. Genet. 80: 65-74
In this analysis we are accessing a complex part of the LMW glutenin protein spectrum that was not available for analysis in the earlier SDS gel-based studies.
Criteria for database search:
(1) Qualitative – amino acid composition (occurrence of QQQ etc) consistent with being co-extracted with LMW-glutenins (gliadins removed beforehand)
(2) Quantitative – molecular weight within 10 Da
IWGSC_4DS_v1_2275417.fa.genscan.pep.1 31960
IWGSC_2AL_v1_6356128.fa.genscan.pep.2 31960
IWGSC_4BS_v1_4917914.fa.genscan.pep.1 31960
IWGSC_1AL_v2_3915175.fa.genscan.pep.1 31960
Komugi_AJ133603_1 31960
IWGSC_3B_v1_10586963.fa.genscan.pep.1 31961
IWGSC_5DS_v1_2734070.fa.genscan.pep.1 31961
IWGSC_2BS_v1_5247743.fa.genscan.pep.3 31961
>Komugi_AJ133603_1 AJ133603 7209247 [Triticum aestivum] Triticum aestivum mRNA for alpha-gliadin storage protein, clone alpha-9
MVRVTVPQLQPQNPSQQQPQEQ
VPLVQQQQFLGQQQPFPPQQPYP
QPQPFPSQQPYLQLQPFPQPQLP
YSQPQPFRPQQPYPQPQPQYSQP
QQPISQQQQQQQQQQQQQQQQ
QQQQQQQILQQILQQQLIPCMDV
VLQQHNIVHGRSQVLQQSTYQLL
QELCCQHLWQIPEQSQCQAIHNV
VHAIILHQQQKQQQQPSSQVSFQ
QPLQQYPLGQGSFRPSQQNPQAQ
GSVQPQQLPQFEEIRNLALQTLPA
MCNVYIPPYCTIAPFGIFGTNYR
Query : L31965
Query : L32831
IWGSC_4BL_v1_6996674.fa.genscan.pep.4 31980
Solomon_Q8H0J4_WHEAT 31934
Solomon_B2ZRD2_WHEAT 32829
>Solomon_B2ZRD2_WHEAT B2ZRD2 SubName: Full=Alpha-gliadin; [Triticum aestivum (Wheat).]
MKTFLILALLAIVATTATTAGRVPVPQL
QPQNPSQQQPQEQVPLVQQQQFLGQ
QQPFPPQQPYPQPQPFPSQQPYLQLQP
FPQPQLPYSQPQPFRPQQPYPQPQPQY
SQPQQPISQQQQQQQQQQQQQQQEQ
QILQQILQQQLIPCMDVVLQQHNIAH
GRSQVLQQSTYQLLQELCCQHLWQIP
EQSQCQAIHNVVHAIILHQQQKQQQQ
PSSQFSFQQPLQQYPLGQGSSRPSQQN
PQAQGSVQPQQLPQFEEIRNLALQTLP
AMCNVYIPPYCTIAPFGIFGTN
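The two search criteria (a QQQ-type motif and a molecular weight within 10 Da of the query peak) can be expressed as a simple filter. The sketch below is not the pipeline used for the slides: it assumes masses are computed from the full translated sequence using standard average residue masses, with no signal peptide removal or modifications, and the names `average_mass` and `search` as well as the toy sequences in the usage example are hypothetical.

```python
# Standard average residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.1086, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water per chain for the free N- and C-termini


def average_mass(seq):
    """Average molecular mass (Da) of an unmodified peptide chain."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def search(proteins, query_mass, tol_da=10.0, motif="QQQ"):
    """Return (name, mass) for sequences that contain the motif and whose
    computed mass lies within tol_da of query_mass, closest first."""
    hits = []
    for name, seq in proteins.items():
        mass = average_mass(seq)
        if motif in seq and abs(mass - query_mass) <= tol_da:
            hits.append((name, round(mass, 1)))
    return sorted(hits, key=lambda hit: abs(hit[1] - query_mass))


if __name__ == "__main__":
    # Toy sequences only; a real search would read the database FASTA files.
    toy = {"toy_a": "QQQG", "toy_b": "GGGG"}
    print(search(toy, query_mass=460.0))
```

In practice the `proteins` dictionary would be filled from the database FASTA records and `query_mass` set to the MALDI-TOF peak of interest, for example 31965 or 32831.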
This analysis suggests that there are probably more genetic loci for major seed storage proteins than we have found to date.
Genome sequencing and proteome analyses, combined with genetic mapping, can define these new loci and provide molecular markers for breeding and selection.
Chromosome 7A
L32831 L31965
It turns out that a 1980 report did find LMWG/gliadins on 4B and 7A:
Salcedo G, Prada J, Sanchez-Monge R, Aragoncillo C (1980). Aneuploid analysis of low molecular weight gliadins from wheat. Theor Appl Genet 56: 65-69
The “hits” on chromosome 7A will be resolved now that we have started to sequence this chromosome as a national project in Australia. This is part of the International Wheat Genome Sequencing Consortium (IWGSC), in which different countries around the world are each sequencing a chromosome.
Chromosome 7A
L32831 L31965
The Wheat Proteome database:
Motivation: wheat genome, transcriptome and proteome studies are now advanced and need a reference proteome database for
• annotating the genes in the wheat genome
• assigning peptides, obtained from high level proteomic analyses, to wheat proteins
Content of proteins/peptides:
• wheat/Triticum entries from SwissProt, UniProt, TrEMBL (2,690)
• translations from the KOMUGI full-length cDNA collection (13,717)
• peptides from INRA (France), USDA (USA), CNU (China) labs (still sorting out a final non-redundant set)
• IWGSC-genome-wide-sequence (GWS) gene model translations (144,920)
The Wheat Proteome database:
(1) Translations of conserved genes.
The IWGSC-GWS database for each chromosome arm typically identifies 4000-9000 genic sequences per chromosome. These include gene fragments and pseudogenes. Following their identification, genes conserved between wheat, Brachypodium, rice, sorghum and barley (Klaus Mayer “chromosome zipper”) can be clustered into syntenic groups.
(2) Non-redundant proteins/peptides known to originate from wheat.
30-40% of the gene complement in wheat and barley does not reside in the conserved syntenic gene order space.
All gene and protein/peptide sequences need to be anchored to the IWGSC-GWS chromosome arm DNA sequences. So far only 205 KOMUGI translations and 6 from the SwissProt/UniProt/TrEMBL dataset have been anchored to the IWGSC-GWS translations, so there is quite a bit of curation to carry out.
To complete this presentation it is important to consider translating research findings to industry:
(1) Further streamlining of the MALDI-TOF scoring of wheat proteins
(2) Assigning a toxicity score to specific proteins in considering celiac and wheat allergy reactions to wheat flour
The aim is to be able to enter specific features of the wheat grain as a number into a Decision Matrix.
Decision Matrix (figure): the feature columns are Genome fingerprint, Gene marker, Protein marker and Other traits; a weight is assigned to each feature; the final column holds the selection index values.
For each breeding line (matrix rows) the feature score (matrix columns) is multiplied by the feature weight. These products are then added to provide a selection index (SI).
This SI is used to rank breeding lines for suitability for an end-product in industry.
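The selection index described above is a weighted sum across the feature columns. A minimal sketch follows; the weights, scores and line names are invented purely for illustration, and the function names `selection_index` and `rank_lines` are not from the presentation.

```python
FEATURES = ["genome fingerprint", "gene marker", "protein marker", "other traits"]

# Hypothetical feature weights, one per column of the Decision Matrix.
WEIGHTS = [1.0, 2.0, 3.0, 0.5]

# Hypothetical feature scores, one row per breeding line.
MATRIX = {
    "line_A": [1, 0, 1, 2],
    "line_B": [0, 1, 1, 4],
    "line_C": [1, 1, 0, 0],
}


def selection_index(scores, weights):
    """SI for one breeding line: sum of feature score x feature weight."""
    if len(scores) != len(weights):
        raise ValueError("one weight is needed per feature score")
    return sum(score * weight for score, weight in zip(scores, weights))


def rank_lines(matrix, weights):
    """Return (line, SI) pairs ordered from highest to lowest SI."""
    indexed = {line: selection_index(row, weights) for line, row in matrix.items()}
    return sorted(indexed.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for line, si in rank_lines(MATRIX, WEIGHTS):
        print(f"{line}: SI = {si}")
```

Changing the weights to suit a different end-product re-ranks the same lines without rescoring them, which is the practical appeal of keeping scores and weights separate.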
(1) For further streamlining of the MALDI-TOF scoring of wheat proteins, we are following the MALDIquant process described by Sebastian Gibb (IMISE, University of Leipzig).
Dean Diepeveen
Panels: 1: raw; 2: variance stabilization; 3: smoothing; 4: baseline correction; 5: peak detection; 6: peak plot
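The order of the processing steps in the panels can be illustrated with a small pure-Python sketch. This is not MALDIquant (an R package) and does not reproduce its algorithms: a square-root transform, a moving average, a rolling-minimum baseline and a local-maximum test are used here as simple stand-ins, and all function names and window sizes are assumptions.

```python
import math


def stabilize(intensity):
    """Step 2: square-root transform to stabilize the variance."""
    return [math.sqrt(max(value, 0.0)) for value in intensity]


def smooth(values, half_window=2):
    """Step 3: moving average (window truncated at the spectrum edges)."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def remove_baseline(values, half_window=20):
    """Step 4: subtract a rolling-minimum estimate of the baseline."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(values[i] - min(values[lo:hi]))
    return out


def detect_peaks(mz, values, min_height):
    """Step 5: local maxima at or above min_height, as (m/z, value) pairs."""
    peaks = []
    for i in range(1, len(values) - 1):
        if values[i] >= min_height and values[i - 1] < values[i] >= values[i + 1]:
            peaks.append((mz[i], values[i]))
    return peaks


def process(mz, intensity, min_height=1.0):
    """Run steps 2 to 5 on a raw spectrum and return the detected peaks."""
    corrected = remove_baseline(smooth(stabilize(intensity)))
    return detect_peaks(mz, corrected, min_height)


if __name__ == "__main__":
    # Synthetic spectrum: flat background of 4 with one triangular peak at m/z 50.
    mz = list(range(100))
    raw = [4.0 + max(0.0, 100.0 - 20.0 * abs(i - 50)) for i in mz]
    print(process(mz, raw))
```

The detected peak list per line is what would then be scored and entered into the Decision Matrix as a protein feature.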
(2) Assigning a toxicity score to specific proteins in considering celiac disease (CD) and wheat allergy (WA) reactions to wheat flour
Proof of concept by Angela Juhasz and Frank Bekes, carried out on the data set published by DuPont et al (2011).
Every protein in the wheat grain defined by DuPont et al (2011) was assigned a toxicity score, which is the result of the amount of protein in the grain x the number of epitopes present that are known to relate to CD and/or WA.
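The toxicity score defined above (amount of protein x number of epitopes) can be sketched as follows. Counting epitopes by exact substring matching is an assumption, as are the units of `amount`; the sequence and epitope strings in the usage example are toy placeholders, not real CD or WA epitopes.

```python
def count_epitopes(sequence, epitopes):
    """Count occurrences (overlapping ones included) of each epitope string."""
    total = 0
    for epitope in epitopes:
        start = sequence.find(epitope)
        while start != -1:
            total += 1
            start = sequence.find(epitope, start + 1)
    return total


def toxicity_score(amount, sequence, epitopes):
    """Toxicity score = amount of protein in the grain x number of epitopes."""
    return amount * count_epitopes(sequence, epitopes)


if __name__ == "__main__":
    # Toy placeholders: "ABC" and "CAB" stand in for a curated epitope list.
    print(toxicity_score(0.5, "ABCABCAB", ["ABC", "CAB"]))
```

A protein that carries no listed epitope scores zero regardless of its abundance, so the score separates abundant but harmless proteins from those relevant to CD or WA.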
The proteins of the wheat grain form a significant phenotype in breeding, industry processing and marketing, and will become more important in defining the product.