
Viewpoint

Molecular modelling for investigating structure–function relationships of soy glycinin

Thushan S. Withana-Gamage (a) and Janitha P.D. Wanasundara (a, b, *)

(a) Agriculture and Agri-Food Canada, Saskatoon SK, Canada S7N 0X2 (e-mails: [email protected]; [email protected])
(b) Department of Food and Bioproduct Sciences, University of Saskatchewan, Saskatoon SK, Canada S7N 5A9 (Tel.: +1 306 956 7684; fax: +1 306 956 7247; e-mail: [email protected])
* Corresponding author.

Trends in Food Science & Technology 28 (2012) 153–167
0924-2244/$ - see front matter Crown Copyright © 2012 Published by Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.tifs.2012.06.014

Increasing global food protein demand drives research to improve existing sources for efficient use or to convert unconventional sources into mainstream protein ingredients. Structure at all levels is the most important intrinsic property that dictates the suitability of a protein for food use. High-throughput sequencing has accelerated genome mapping of food plants; however, the information is poorly utilized in food protein research. The use of bioinformatics for data mining and of molecular modelling for revealing structure–function relations is discussed using soybean glycinin as a model protein. This in silico approach complements the process of understanding food protein molecular structure and of linking the physico-chemical properties of a molecule with the functionalities that the protein provides in food.

Introduction

The conventional approach to discovering new proteins for food applications relies on stepwise isolation and purification, followed by generating functional property profiles under predetermined conditions applicable to foods. Most studies on the functional properties (FP) of food proteins have screened protein sources in vitro and in model foods by a hit-or-miss approach. In this research area, the emphasis is on the extrinsic factors that govern protein functionality; for example, the conditions of protein processing, the denaturation state of the protein, and other associated components that affect protein properties are usually highlighted. The link between technologically valuable FP and the innate properties of the protein at the molecular and structural level is less explored.

Modelling structure–function relations to quantify food protein functionalities using the Quantitative Structure Activity Relationship (QSAR) approach has evolved since the work of Nakai (Nakai, 1983; Nakai & Li-Chan, 1988; Townsend & Nakai, 1983; Voutsinas, Cheung, & Nakai, 1983; Voutsinas, Nakai, & Harwalkar, 1983), and other groups continue to extend the capability of this area (Pripp, Isaksson, Stepaniak, Sørhaug, & Ardö, 2005). Furthermore, Liebman (1998) evaluated data mining approaches for investigating the relationships between the structure and functions of proteins, aimed at rational molecular design for directed uses. In reviewing the development of structure–function relationships of food proteins through molecular modelling, Kumosinski, Brown, and Farrell (1991a, 1991b) used the primary sequences of κ-casein and αs1-casein to generate secondary and unrefined three-dimensional structures, demonstrating the ability of molecular modelling to resolve certain structure–function relations of these proteins relevant to food applications.

Bioinformatics of the post-genomic era has revealed considerable information on plant proteins, particularly on desirable quality traits of food crops; however, these exponentially expanding data and tools have barely been used to advance food protein research. The knowledge gap on the molecular biology of food proteins, their structures and their functions/properties can be narrowed through three-dimensional (3-D) molecular modelling based on homology models. Protein structure is closely linked with protein function, and structural genomics has the potential to inform knowledge of protein function. The development of molecular modelling programs and their application has been formalized in designing new drugs, where it is called computer assisted drug design (CADD) or computer assisted molecular design (CAMD). A basic principle of this research area is that the biological activity of a molecule depends on the three-dimensional placement of specific functional groups. In this area, hypotheses associating structural properties with bioactivities have been developed and validated for predicting the properties and activities of new chemical entities, using computational tools in conjunction with conventional techniques that examine the structural properties of existing compounds. Food protein functionality research can take a similar approach to understanding the FP of food protein molecules and to rationalizing modifications of those molecules for enhanced functionality.

According to global per capita food consumption data, plant foods comprise 61.3% of dietary protein intake (FAOSTAT, 2010), indicating the value of plant proteins as a food macromolecule. About 84.1% of dietary plant proteins originate from seeds (FAOSTAT, 2010), including cereals, legumes, oilseeds and nuts, because of their high protein density and abundant use in our food. Although the protein content of seeds varies considerably, a major fraction of the protein is stored as a source of C, N, and S that is mobilized to support seedling growth; these proteins are known as seed storage proteins (SSP) (Derbyshire, Wright, & Boulter, 1976). As the main protein source of seeds, SSP play a crucial role in the food industry in satisfying the protein demand of the human food supply. It is estimated that, on average, six units of plant protein are required to produce one unit of muscle protein (Pimentel & Pimentel, 2003); consequently, only ~15% of the protein and energy of animal feed crops reaches the human mouth (Aiking, 2011). Increasing the direct use of plant proteins in human food may help to overcome this inefficiency. Under the current forecast for global food demand and supply, efficient use of plant proteins will be needed to satisfy future protein requirements for food and feed. The efficiency of plant protein use should therefore be improved at both levels of food consumption: before and during eating (desired functionalities in food products) and after eating (nutritional value and safety).

In this communication, we show the value of bioinformatics databases and tools in predicting important parameters of protein molecules related to food functionality, and the possibility of using these tools and information to screen proteins for selected FP. In addition, we discuss the ability of a computational proteomics approach to investigate the relationships of SSP among food crops and to guide molecular structure model selection.

Among plant food proteins and SSP, soybean glycinin is characterized at many levels, from the genes involved in expressing the protomers (subunits) of the multimeric protein and its crystal structure to the FP of its homotrimers and homohexamers (Fukushima, 1991; Tandang-Silvas, Tecson-Mendoza, Mikami, Utsumi, & Maruyama, 2011). In this communication, we show that the 3-D structures of soybean glycinin trimers and hexamers can be developed through homology modelling using available information on molecular structure and genetic relationships. We analyzed the most obvious characteristics of the homology-based glycinin structures that may be useful in predicting the physico-chemical properties and functionalities of the hexameric and trimeric forms, and compared them with laboratory data available in the literature on similar glycinin molecule forms.

Homology modelling

Homology or comparative modelling involves computational procedures that predict the 3-D structure of a protein using its amino acid sequence as the target and a solved homologous protein structure as the template. Homology modelling of the 3-D structure of an unknown protein, based on its evolutionary relationship with an experimentally determined structure, is regarded as a high-throughput, low-resolution technique and has been successfully incorporated into the drug discovery process in the pharmaceutical industry by screening many homology models (Cavasotto & Phatak, 2009; Hillisch, Pineda, & Hilgenfeld, 2004; Maggio & Ramnarayan, 2001). Bordoli et al. (2009) describe the step-by-step process of protein structure homology modelling using the SWISS-MODEL workspace. The review by Forster (2002) describes the value of 3-D models and modelling techniques for exploring protein–ligand and protein–protein complexes. The ProFunc server (http://www.ebi.ac.uk/thornton-srv/databases/ProFunc) helps identify the likely biochemical function(s) of a protein from its 3-D structure (Laskowski, Watson, & Thornton, 2005). To date, homology modelling of the 3-D structure of food-related proteins and computer (in silico) prediction of protein characteristics in relation to food use are limited to a few studies on allergenic proteins (Barre, Borges, & Rougé, 2005; Barre, Jacquet, Sordet, Culerrier, & Rougé, 2007; Cabanos et al., 2010; Schein, Ivanciuc, & Braun, 2007). Recently, we described the expected properties of cruciferin, the main SSP of the Brassicaceae family, using homology models (Withana-Gamage, Hegedus, Qiu, & Wanasundara, 2011).

For structure–function or structure–property studies, many food proteins lack experimentally obtained structural details. X-ray crystallography and two-dimensional nuclear magnetic resonance (2-D NMR) spectroscopy have been the most widely used techniques for protein 3-D structure determination. However, the difficulty of obtaining satisfactory crystals for X-ray analysis and the low solubility of large molecules for NMR analysis have restricted the structural information available for many food proteins. Structure modelling therefore offers a plausible way to obtain the 3-D structure of a food protein that has no experimental structure but is closely related to a well-characterized protein. The de novo or ab initio technique is an alternative, template-independent structure modelling approach. It depends mainly on the primary amino acid sequence and on in silico folding of the protein to its native state according to free energy landscape theory, using computational algorithms (Bonneau & Baker, 2001). Because the calculations are complex, the ab initio method is still limited to small protein molecules, so its application to SSP, which have several hundred amino acid residues, is marginal. Given these limitations of template-independent methods, homology-based modelling offers an alternative way to understand protein molecular structure and to investigate how structural properties relate to function under defined conditions, which could be useful for food proteins. Such understanding will make it possible to design plausible structural modifications that can be extended to the level of genetic expression to achieve desirable molecular properties.

Amino acid sequence, structure, and genetic relationship of SSP

The amino acid sequence of SSP provides a means of understanding the relationships among similar groups of proteins in different plants that have evolved from common ancestral gene(s). A search of the protein knowledgebase (UniProtKB, 2010) with the term "seed storage proteins" returns a total of 723 sequences. Among these 723 entries, 477 amino acid sequences are categorized under nutrient reservoir activity in the Gene Ontology (GO) hierarchy. Comparing the primary structures of SSP reveals homologies within each plant family and is one of the most important considerations in knowledge-based structure modelling for protein property prediction. The phylogenetic tree of these 477 amino acid sequences, constructed with the interactive tree of life (iTOL) web server (Ciccarelli et al., 2006; Letunic & Bork, 2007, http://itol.embl.de/) after sequence alignment with ClustalW 2.0 (Larkin et al., 2007, http://www.ebi.ac.uk/Tools/msa/clustalw2/), is shown in Fig. 1. The majority of these sequences are from the globulin family (Fig. 1, Rings a & b), which is widely distributed in cereals (mainly rice and oat), legumes, and oilseeds. The total protein content of these seeds ranges from 10% (cereals) to ~40% (legumes and oilseeds) (Shewry, Napier, & Tatham, 1995). Currently, the Protein Data Bank (PDB, http://www.pdb.org/pdb/home/home.do) contains the molecular structures of 35 SSP, most of which are from the eudicotyledon clade (Fig. 1, Ring d). The availability of such experimentally determined tertiary and quaternary structures makes it possible to model up to hundreds of similar yet unknown proteins within each homology group or fold class (Maggio & Ramnarayan, 2001). The sequence identities (>30% match) of SSP aligned with the primary sequence of the best available 3-D structure are illustrated in Ring c of Fig. 1, and the availability of matching templates is indicated as filled columns in Ring e of Fig. 1.
Sequence identity greater than 30% is found for most of the globulins and albumins of eudicotyledons, including food legumes (e.g. Cicer arietinum, Lupinus albus, Phaseolus vulgaris, Pisum sativum, Vicia faba), oilseeds (Ara h 3 of Arachis hypogaea, procruciferin of Brassica napus, A1bB2 and A2B1a subunits of Glycine max) and some cereal globulins (Avena sativa and Oryza sativa). However, the 3-D structures of prolamins with known primary sequences (i.e. gliadin/glutelin of Triticum aestivum, Hordeum vulgare and Avena sativa; zein of Zea mays, Oryza sativa, and Sorghum bicolor) of the Poaceae (Gramineae) family have yet to be characterized experimentally (Fig. 1, Ring c). In the absence of homologous structures, knowledge-based structure prediction of proteins such as those in the prolamin superfamily, especially the high molecular weight proteins, is complicated. The phylogenetic analysis of SSP shown here makes clear that 7S and 11/12S proteins are closely related, mostly because they share a common ancestor (Fig. 1, Ring a). Given the number of available template structures and the knowledge of globulin family SSP structures, it is possible to build good-quality homology models for the globulins of many food crops.
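The >30% identity cut-off used for Ring c can be made concrete with a small calculation. The sketch below assumes the target and template rows are already aligned (for example, exported from a ClustalW 2.0 alignment, with "-" marking gaps) and counts identity over columns where neither row has a gap. Tools differ in the denominator they use (aligned columns, shorter sequence length, or full alignment length), so this convention is an assumption, and the toy sequences are hypothetical rather than real SSP fragments.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a.upper() == b.upper())
    return 100.0 * identical / len(pairs)


def has_usable_template(aligned_target: str, aligned_template: str,
                        cutoff: float = 30.0) -> bool:
    """True when target-template identity exceeds the >30% threshold."""
    return percent_identity(aligned_target, aligned_template) > cutoff


# Hypothetical toy alignment (not real SSP sequences)
target = "MKLV-AGQE"
template = "MRLVSAG-E"
print(round(percent_identity(target, template), 1))  # 6 of 7 gap-free columns match → 85.7
```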

Knowledge-based structure prediction

Among higher plants, sequencing of four whole genomes and draft assemblies of twenty genomes has been completed, and another 76 genome sequencing projects are in progress (The National Center for Biotechnology Information, NCBI, Entrez Genome Project Database, http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj). Recently, the entire 1.1-gigabase genome of Glycine max was sequenced using a whole-genome shotgun approach (Schmutz et al., 2010). These projects have generated a large number of amino acid sequences, but only partial information about those genes and their products (i.e. proteins) is available. Structural information on the proteins is required to understand their physiological (in-plant system) and physico-chemical (related to FP in food systems) properties. According to statistics from the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB; as a member of the wwPDB, http://www.wwpdb.org/, the RCSB PDB curates and annotates PDB data), 86.8% of protein structures have been determined by X-ray crystallography and only 12.4% by solution NMR spectroscopy, with around 23 structures deposited in the PDB per day (March, 2011). In general, both structure determination methods are expensive, slow, and difficult to apply to all proteins, especially SSP. The microheterogeneity of most SSP is a barrier to obtaining good-quality experimental structures, and only 19 food-related SSP have been characterized to date. Fortunately, structure prediction based on evolution (i.e. homology modelling and threading methods) and on global free energy minimization (i.e. ab initio methods) has narrowed the gap between protein sequences and 3-D structures. In homology modelling, a high-quality structure can be modelled with a template when the sequence identity is over 50%, and a sequence identity over 30% may give a reasonably good structure. The accuracy of such modelled structures is comparable to that of structures obtained from medium-resolution NMR or low-resolution X-ray diffraction (Baker & Sali, 2001). Comparative models with good stereochemistry can be obtained from web-based modelling expert systems such as the SWISS-MODEL workspace (Bordoli et al., 2009) or from installable software such as MODELLER (Sali & Blundell, 1993).
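The identity thresholds quoted above (Baker & Sali, 2001) can be written as a simple rule of thumb. This is only a sketch: the tier labels are our own wording, and in practice template choice also weighs alignment coverage and the resolution of the solved structure.

```python
def expected_model_quality(identity_pct: float) -> str:
    """Map target-template sequence identity (%) to the expected homology-model quality."""
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_pct > 50.0:
        return "high quality"
    if identity_pct > 30.0:
        return "reasonably good"
    return "low; consider threading or ab initio methods"


for pid in (72.0, 41.5, 22.0):
    print(pid, expected_model_quality(pid))
```

Because the text says "over 50%" and "over 30%", the boundaries are strict: exactly 50% falls in the lower tier.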

Fig. 1. The phylogenetic tree showing evolutionary relationships among storage proteins (SSP) of edible seeds. Ring (a) Major SSP clades: Brassicaceae in red, Fabaceae in yellow, Poaceae in blue, other species in black. Ring (b) The phylogeny of major plant protein families: 11–13S globulins (green), 7S globulins (blue), 2S albumins (magenta), prolamin zein (red), and prolamin gliadin/glutelin (orange). Ring (c) Percentage identity (black–white) of each SSP to the best available template. Ring (d) Relationship of SSPs with the plant clade and group: Eudicotyledon clade (pink) and monocotyledon group (cyan). Ring (e) Bars represent the length of the sequence; a filled bar indicates that an experimentally determined structure is available for the sequence. Sequences and a few selected structures are shown outside Ring (e). The inset relates to Ring (b) and Ring (e). Amino acid sequences were obtained from the UniProt database. The phylogenetic tree was generated using ClustalW 2.0 (Larkin et al., 2007, http://www.ebi.ac.uk/Tools/msa/clustalw2/) and the interactive tree of life (iTOL) web server (Ciccarelli et al., 2006; Letunic & Bork, 2007, http://itol.embl.de/).

Homology or comparative modelling uses the principle of evolutionary conservation of primary structural features between an unknown protein and a molecule of known structure. The basic steps of homology model building for a protein with a known primary sequence are shown in Fig. 2a. The process includes the initial step of recognizing a known experimental structure (i.e. the template), target–template alignment, building the model in silico (including backbone, side chain, and loop generation), and finally refinement of the modelled structure (including energy minimization and model validation) (Bordoli et al., 2009; Kopp & Schwede, 2004; Martí-Renom et al., 2000). The amino acid sequence of the SSP of interest can be obtained from protein databases (e.g. UniProtKB/TrEMBL) if the protein has been sequenced and the information deposited. Structures of protein molecules with close homology to the target can be identified using a gapped BLAST or PSI-BLAST query (Altschul et al., 1997) against a template library such as the PDB. A suitable template is selected based on two criteria: (i) the percentage sequence identity between the target and template sequences and (ii) the experimental quality of the available solved structure (i.e. resolution) (Bordoli et al., 2009). After the best template is selected, the target and template sequences have to be aligned to construct a 3-D model; manual intervention may sometimes be required to minimize misalignment. In the next phase, if necessary, side-chain packing geometries should be corrected by energy minimization using force fields such as GROMOS96 (van Gunsteren et al., 1996) or CHARMM22 (Brooks et al., 1983). The loop regions of the molecule must be carefully optimized or even reconstructed using programs such as MODELLER (Sali & Blundell, 1993).
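The two template-selection criteria described above (sequence identity, then experimental resolution) can be sketched as a ranking over BLAST hits. The resolutions of 1FXZ (2.80 Å) and 2D5F (1.90 Å) are those reported for the glycinin crystal structures, but the identity values and the placeholder ID "XXXX" are hypothetical. How the two criteria are traded off (here: keep hits within a few percentage points of the best identity, then prefer the best resolution) is our assumption, not a rule stated by Bordoli et al. (2009).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateHit:
    pdb_id: str
    identity_pct: float   # target-template identity reported by BLAST/PSI-BLAST
    resolution_a: float   # X-ray resolution in angstroms (lower is better)


def select_template(hits, identity_window=5.0, min_identity=30.0):
    """Keep hits above `min_identity`, shortlist those within `identity_window`
    points of the best identity, then return the best-resolved one (or None)."""
    usable = [h for h in hits if h.identity_pct > min_identity]
    if not usable:
        return None
    best_identity = max(h.identity_pct for h in usable)
    shortlist = [h for h in usable if best_identity - h.identity_pct <= identity_window]
    # Lowest resolution value wins; ties go to the higher identity
    return min(shortlist, key=lambda h: (h.resolution_a, -h.identity_pct))


hits = [
    TemplateHit("1FXZ", 48.0, 2.80),   # identity values are hypothetical
    TemplateHit("2D5F", 45.0, 1.90),
    TemplateHit("XXXX", 25.0, 1.50),   # placeholder hit below the 30% cut-off
]
print(select_template(hits).pdb_id)  # → 2D5F
```

Narrowing `identity_window` makes identity dominate; with a window of 2 points the same hits select 1FXZ instead.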

Fig. 2. Process steps of homology modelling and its application for building soybean glycinin protomer models. (a) Major steps involved in homology or comparative modelling of protein tertiary structure. (b) Structure modelling of the five known subunits of soybean (Glycine max). Subunits are divided into two groups according to their homology: group I – A1bB2, A1aB1b, and A2B1a; group II – A3B4 and A5A4B3. The 3-D structures of two protomers, A1aB1b of group I and A3B4 of group II, have been experimentally determined (PDB codes: 1FXZ and 2D5F, respectively). Tertiary and quaternary structures of all five protomers were modelled using the two templates. Dark coloured loop areas in the final protomers show the constructed disordered regions. (c) Bar charts show the homology in terms of sequence identity, sequence similarity, and gaps of each subunit, with the corresponding sequence (black and grey) and template (other colours).

The stereochemical quality of the modelled structures can be assessed using tools such as PROCHECK (Laskowski, MacArthur, Moss, & Thornton, 1993), Verify3D (Lüthy, Bowie, & Eisenberg, 1992), and ProSA (Wiederstein & Sippl, 2007). As mentioned earlier, the alternative ab initio approach to protein structure modelling is independent of template matching and of conserved structural patterns of the molecule.
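One of the central stereochemical checks in tools such as PROCHECK is the Ramachandran plot of backbone φ/ψ torsion angles. As a sketch of the underlying geometry only (not of PROCHECK itself), the function below computes a torsion angle from four atom positions using the IUPAC sign convention: φ of residue i uses C(i−1), N(i), Cα(i), C(i), and ψ uses N(i), Cα(i), C(i), N(i+1). The coordinates in the usage line are toy values.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle in degrees, in [-180, 180], about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)      # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


print(round(dihedral_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                         (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)), 1))  # → 90.0
```

A validation tool then checks where each residue's (φ, ψ) pair falls relative to allowed regions of the Ramachandran plot.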

Structure of soybean 12S protein

Information on protein structure and FP is available in the literature for soybean glycinin, a 12S globulin (Maruyama et al., 2004; Prak et al., 2005; Tezuka, Yagasaki, & Ono, 2004). In addition, data on physico-chemical, thermal and techno-functional properties and on molecular structures are available for the five identified subunit or protomer variants of glycinin (Adachi, Kanamori, et al., 2003; Adachi, Okuda, et al., 2003; Adachi, Yagasaki, Gidamis, Mikami, & Utsumi, 2001; Fukushima, 1991; Maruyama et al., 1999, 2004; Prak et al., 2005; Tezuka, Taira, Igarashi, Yagasaki, & Ono, 2000; Tezuka et al., 2004). Glycinin, the major soybean SSP, has a hexameric quaternary structure. Five subunit variants out of the seven identified protomers may assemble randomly to form the glycinin hexamer (Fukushima, 1991). Two groups of glycinin subunit variants have been identified according to their homology: group I (A1aB1b, A2B1a, and A1bB2) and group II (A3B4 and A5A4B3) (Nielsen et al., 1989). The molecular structures of A1aB1b of group I and A3B4 of group II have been determined by X-ray crystallography at resolutions of 2.80 Å (Adachi et al., 2001, PDB code: 1FXZ) and 1.90 Å (Adachi, Kanamori, et al., 2003, PDB code: 2D5F), respectively. The three other glycinin protomers can be built using the template of their respective group (Fig. 2b), exploiting the high degree of homology among them (Fig. 2c, Supplementary data Fig. S1). The hypervariable regions (HVRs) of glycinin (Wright, 1987) gave poor atomic density maps (Adachi, Kanamori, et al., 2003; Adachi et al., 2001) owing to their molecular heterogeneity, so the available crystal structures do not contain them. As a result, no structural or functional information about these inserted regions can be extracted from the available glycinin template structures. When physiological (e.g. immunogenicity) or physico-chemical (e.g. hydration properties, electrochemical properties, chemical reactivity) properties are concerned, the loops, including the disordered regions or HVRs, are perhaps equally important. Kealley et al. (2008) have indicated that the non-ordered and ordered structural domains of glycinin contribute differently to the mobility and rigidity of the molecule.
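Because the HVRs are absent from the 1FXZ and 2D5F crystal structures, they show up as runs of gaps in the template row of a target–template alignment, and those runs are the segments a loop-modelling step has to build. A minimal sketch, assuming a gapped alignment with "-" for gaps; the toy sequences are hypothetical, and the optional `min_len` filter mirrors the idea of singling out long loops such as those of more than 12 residues treated step by step with MODELLER.

```python
def missing_template_segments(aligned_target, aligned_template, min_len=1):
    """Return (start, end, length) for target segments (1-based, inclusive,
    ungapped target numbering) that are aligned to gaps in the template."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    segments, pos, start = [], 0, None
    for t_res, tpl_res in zip(aligned_target, aligned_template):
        if t_res == "-":
            continue                  # column has no target residue
        pos += 1
        if tpl_res == "-":
            if start is None:
                start = pos           # a template-free stretch begins here
        elif start is not None:
            segments.append((start, pos - 1))
            start = None
    if start is not None:
        segments.append((start, pos))
    return [(s, e, e - s + 1) for s, e in segments if e - s + 1 >= min_len]


target = "MKTAYIAKQRQISFVKSHFSRQ"     # hypothetical toy sequences
template = "MKTAY------ISFVKSH--RQ"
print(missing_template_segments(target, template))  # → [(6, 11, 6), (19, 20, 2)]
```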

Structure-based prediction of physico-chemical and functional properties of glycinin

The available data on the structure and the functional and physico-chemical properties of glycinin subunit variants have been obtained either from protein isolates of relevant mutant lines or from microbial expression of cDNA sequences. The glycinin protomers A1bB2, A2B1a, and A5A4B3 could be modelled without any HVRs, to be consistent with the available crystal structures of the A1aB1b and A3B4 subunits. These structures are available as Supplementary data Fig. S2 and are called core structures (hereafter referred to as Modelcore throughout this communication). This naming follows Maruyama et al. (1999), who designated the β-conglycinin subunits isolated from the deletion mutants αc and α′c, which lack the extension regions or HVRs, as core regions. Loops can be constructed for all five glycinin subunit variants using the MODELLER program, which uses an optimization-based approach (Fiser, Do, & Sali, 2000); hereafter, the structures with loops are referred to as Modelcore+HVR. The loop regions or HVRs (Supplementary data Fig. S1) involving more than 12 amino acid residues can be built using the step-by-step procedure of the MODELLER program (Sali & Blundell, 1993). According to Fiser et al. (2000), loops of 12 residues can be predicted with MODELLER with an average accuracy of 2.61 ± 0.16 Å. Stereochemical evaluation of the loop regions with the PROCHECK and Verify3D programs confirms that these regions have been built without serious errors (data not shown). The lack of higher-order secondary structure in the disordered loop regions (Adachi, Kanamori, et al., 2003) may allow flexible (free-flowing) conformations in these protein regions. Therefore, rather than omitting these regions from the molecule, it is better to include at least less-accurate loops to understand properties within the protein structure. Details of the homology modelling of 11S proteins used in this study are explained in our previous communication on 11S cruciferin (Withana-Gamage et al., 2011).
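Loop accuracies such as the 2.61 ± 0.16 Å quoted from Fiser et al. (2000) are root-mean-square deviations (RMSD) between modelled and reference atom positions. A minimal sketch, assuming both coordinate sets are already in a common reference frame (no superposition is performed here) and that atoms are matched one-to-one; the coordinates are toy values.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched atoms, assuming both
    coordinate sets are already expressed in the same reference frame."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]        # toy loop atoms
reference = [(0.0, 0.0, 0.0), (1.5, 3.0, 0.0), (3.0, 0.0, 4.0)]
print(round(rmsd(model, reference), 3))  # sqrt((0 + 9 + 16) / 3) → 2.887
```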

The molecular structures of the protomers of the five defined glycinin subunit variants (A1aB1b, A2B1a, A1bB2, A3B4, and A5A4B3) and of their respective homotrimers, generated with (Modelcore+HVR) and without (Modelcore) loop regions, are used to explore physico-chemical properties and to understand and predict structure–function relations.

Surface hydrophobicity and related properties

The hydropathy profile of a protein, determined from its linear amino acid sequence, gives little information about the overall hydrophobicity of the molecule at the tertiary or any higher structural level, even when the structure is known. The surface hydrophobicity (S0) of a protein plays an important role in determining solubility, emulsifying and foaming properties in food-related systems (Nakai, 1983). Surface hydrophobicity can be measured in two ways: by the ability of the protein to bind small fluorescent molecules such as cis-parinaric acid (CPA) or 8-anilino-1-naphthalenesulfonic acid (ANS), and by its adsorption onto polymer materials such as phenyl- or butyl-Sepharose, generally determined by hydrophobic column chromatography (the higher the surface hydrophobicity, the stronger the adsorption to the column).
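To see why a sequence-based hydropathy profile says little about the folded surface, it helps to look at how it is computed: each residue receives a Kyte and Doolittle (1982) hydropathy value and a window average is slid along the chain, with no 3-D information involved; averaging over the whole chain gives a grand average of hydropathy (GRAVY). This is an illustrative sketch: the window of 9 residues is an assumed (if common) choice, and whether the values in Table 1 were computed exactly this way is not stated here.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9):
    """Sliding-window mean hydropathy (one value per full window).
    Non-standard residue letters raise KeyError."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over the whole chain."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return sum(values) / len(values)


print(round(gravy("IVL"), 2))                       # → 4.17
print(len(hydropathy_profile("IVLDEK", window=3)))  # → 4
```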

Glycinin protomers can be arranged in descending order of average hydrophobicity (H), calculated from the primary amino acid sequence, as A1bB2 > A2B1a > A1aB1b > A3B4 > A5A4B3 (Table 1 & Supplementary data Table S1). When the hydrophobicity values of amino acid residues, assigned according to the scale of Kyte and Doolittle (1982), are plotted on the solvent-accessible surface of the Modelcore homotrimers (Supplementary data Fig. S2), all glycinin protomers (Table 1) show relatively more hydrophobic residues on the IE face (the face containing interchain S–S bonds; Adachi et al., 2001) than on the IA face (the face containing intrachain S–S bonds; Adachi et al., 2001). Among the Modelcore subunit variants, the number of hydrophobic residues on the IA (36–47) and IE (52–59) faces does not differ much (Table 1). However, the measured surface hydrophobicity values of glycinin subunits in both the trimeric (proglycinin) and hexameric (mature glycinin) forms of soybean 11S differ (Maruyama et al., 1999; Prak et al., 2005; Tezuka et al., 2004).
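Table 1 lists Mr values calculated from sequence alongside elemental formulae. As a quick consistency check, an average molecular mass can be recomputed from a formula. The atomic weights below are rounded standard values (an assumption; sequence tools such as ProtParam use slightly different constants). Applied to the A1aB1b, A1bB2, A2B1a and A5A4B3 formulae, this reproduces the tabulated 53.6, 54.3, 54.4 and 63.8 kDa. The A3B4 formula as printed is identical to that of A1aB1b and therefore also gives about 53.6 kDa rather than the listed 58.2 kDa, so that cell is worth checking against the original source.

```python
import re

# Rounded standard atomic weights (g/mol)
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def formula_mass(formula: str) -> float:
    """Average molecular mass (Da) of a formula such as 'C2333H3660N686O741S14'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += ATOMIC_WEIGHT[element] * (int(count) if count else 1)
    return mass


for name, formula in [("A1aB1b", "C2333H3660N686O741S14"),
                      ("A5A4B3", "C2765H4325N817O902S11")]:
    print(name, round(formula_mass(formula) / 1000, 1), "kDa")
# → A1aB1b 53.6 kDa
# → A5A4B3 63.8 kDa
```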

    Multiple sequence alignments of group I and II glycininproteins and the elucidated structures of A1aB1b (PDBcode: 1FXZ) and A3B4 (PDB code: 2D5F) verify sixHVRs in group I and five HVRs in group II(Supplementary data Fig. S1). In glycinin group I, HRV-II(A1aB1b A2B1a: 18 and A1bB2: 15 residues) andHVR-V (A1aB1b: 48, A2B1a: 41, and A1bB2: 35 residues)are located on IE face, while HVR-III (A1aB1b A2B1a:20 and A1bB2: 22 residues) is located on IA face(Supplementary data Fig. S3). Among the group II glycinin,HVR-II (A3B4: 12 and A5A4B3: 13 residues), HVR-IV

    Table 1. Theoretical physico-chemical parameters of modelled glycinin subunits and experimental values reported for functional properties ofglycinin subunits.

    Property Subunit

    A1aB1b A1bB2 A2B1a A3B4 A5A4B3

    Formula C2333H3660N686O741S14

    C2377H3719N691O736S17

    C2363H3714N694O748S18

    C2333H3660N686O741S14

    C2765H4325N817O902S11

    Mra (KDa) 53.6 54.3 54.4 58.2 63.8Asx Glx (%) 30.1 28.3 30.0 28.2 30.3His Arg Lys (%) 12.4 11.3 10.7 12.8 13.9Acidic:Basic 2.4:1 2.5:1 2.8:1 2.2:1 2.2:1Ha 0.81 0.59 0.66 0.82 0.95pIa 5.78 6.01 5.46 5.52 5.17ASAb (A2) 24,515.0 23,547.4 23,018.2 26,034.1 26,247.2Pocket area of central chanel cavityc (A3) 2603.3 5639.8 4151.5 3478.0 3999.9Individual pocket openingc (A3) 477.6 (9) 883.0 (9) 990.6 (10) 644.3 (3) 708.9 (3)Proline residuesd (%) 6.1 5.6 5.4 7.2 6.7Hydrophobic residuese (%) 28.3 30.3 28.8 29.9 28.5Surface hydrophobic residuesf: IA face 36 43 37 41 39ModelCore IE face 54 52 56 59 58Surface hydrophobic residuesf: IA face 28 24 25 38 35ModelCore HVR IE face 34 31 38 50 38Number of eSH groups IA face 6 3 6 3 3

    IE face 6 6 6 3 3335769646

    usiners btructu

    ter vibunit

    159T.S. Withana-Gamage, J.P.D. Wanasundara / Trends in Food Science & Technology 28 (2012) 153e167Number of SeS groups IA face 3IE face 3

    Hydrophobic chromatography (min)g,h

    Hydrophobic chromatographyi (min) 67.0ANS binding So (at 30

    C)g,h

    Solubility (pH 4.8, m 0.05) g,h (%)Emulsifying propertiesg,h (mm)Emulsifying propertiesi (mm) 10.5Denaturation temperaturei (C) 78.1

    a H is the grand average hydrophobicity. pI and Mr are calculatedb Solvent-accessible surface area (ASA) was calculated for homotrimc Pocket area: size of the cavities around central channel of core-s

    and the number of openings is given in parenthesis.d No. of proline residues per single subunit.e Mol% of sum of Val, Pro, Leu, Ile, Phe, and Trp residues.f Surface exposed hydrophobic residues were counted manually afg A1aB1b, A1bB2 and A2B1a were reported together as Group I suh Maruyama et al., 2004 (glycinin hexamer form).i Prak et al., 2005 (proglycinin trimer form).3 3 33 3 3

    7.7 0.3 71.7 0.3 61.2 0.24.4 e 71.1 65.5

    9.5 128 95 30.8 3.1 2.37.3 19.4 11.7 18.85.1 73.3 78.0 73.9

    g 1sequence of the molecule.y rolling ball method with a radius of 1.4 A.res, in addition the size of the mouth opening of individual pocket

    sualizing the molecule with VMD software.s.

    160 T.S. Withana-Gamage, J.P.D. Wanasundara / Trends in Food Science & Technology 28 (2012) 153–167

    (the longest among glycinin protomers, A3B4: 73 and A5A4B3: 104 residues) and HVR-III (22 residues in both A3B4 and A5A4B3) are found on both the IE and IA faces. Given their size, these disordered regions can occupy a significant portion of the molecular surface area of both the IE and IA faces of the protomers. The surface properties of glycinin protomers with HVRs (Modelcore HVR; Fig. 3) and without HVRs (Modelcore; Supplementary data Fig. S2) show drastically different hydrophobic residue profiles, although most of the HVRs are deficient in hydrophobic residues. Maruyama et al. (2004) and Tezuka et al. (2004) have reported contradictory surface hydrophobicity estimates for soybean glycinin composed of similar types of subunits when the estimation method is changed from ANS binding to hydrophobic column chromatography. According to ANS binding-based surface hydrophobicity, the glycinin homohexamers are in the decreasing order A5A4B3 > A3B4 > group I (A1aB1b, A1bB2, A2B1a) (Tezuka et al., 2004), whereas this order changes to A3B4 > A5A4B3 > group I when assessed by hydrophobic column chromatography (Maruyama et al., 2004) (Supplementary data Table S1). Careful examination of the glycinin Modelcore HVR structures shows that group II has a higher number of well-exposed hydrophobic residues on the IA face than group I (Fig. 3 and Table 1). For all five subunit variants, the third hypervariable region (HVR-III) is located on the IA face, whereas HVR-IV and HVR-V are located on the IE face of the trimeric protein molecules (Supplementary data Fig. S1 and Fig. S3). According to the primary sequences of the glycinin subunits, HVR-III does not contain hydrophobic residues, but HVR-V of group I and HVR-IV of group II contain a considerable number of lipophilic residues and reside on the IE face (8–9 residues and 14–16 residues, respectively) (Supplementary data Fig. S1). However, the relatively short HVR-V of group I protomers may not become exposed when the hexamer is formed. The HVR-IV of the group II protomers may protrude from the surface of the homohexamer because of its long chain length (Fig. 3). HVR-IV of group II contains more hydrophobic residues than the other HVRs, and this protruding HVR-IV can enhance the surface hydrophobicity of group II glycinin. The glycinin molecule composed of A5A4B3 subunits has higher surface hydrophobicity than its A3B4 counterpart when assessed by ANS binding (Tezuka et al., 2004). The opposite trend (i.e. A3B4 > A5A4B3) has been reported for hydrophobicity values assessed using phenyl- and butyl-Sepharose column chromatography (Maruyama et al., 2004). The discrepancies between these studies may be related to differences in the accessibility of the molecular surface. The surface area of a protein molecule that is exposed to the surrounding solvent is referred to as the solvent accessible surface area (ASA) and can be estimated by the rolling ball method with a probe radius of 1.4 Å (Lee & Richards, 1971). The ASA values of glycinin Modelcore HVR are reported in Table 1. The hydrophobic residues and electrostatic potential of the glycinin trimers can be mapped on the ASA (Fig. 3). The extension of HVR-III differs between the A3B4 and A5A4B3 protomers. This is evident in the side view of the molecules (90° rotation of the IE or IA face; Fig. 3 iv and v). Furthermore, the centre channel of the A3B4 homotrimer is covered by HVR-III (Fig. 3 iv), but this disordered region does not extend as far as HVR-III of A5A4B3 (Fig. 3 v and Supplementary data Fig. S2 v). The centre channel of the A5A4B3 trimer is not covered by HVR-III, suggesting that ANS can access it more easily than in the A3B4 molecule.
Surface adsorption of protein molecules on the hydrophobic (Sepharose) column may be easier for A3B4 owing to its shorter HVR-III; on the other hand, the much more relaxed and highly hydrophilic arms of HVR-III may sterically hinder the binding of the A5A4B3 hexamer to the hydrophobic column via the IA face. The high surface hydrophobicity reported for A1bB2 proglycinin (which is in trimer configuration and has the shortest HVRs among all five protomers), as determined by hydrophobic column chromatography (Prak et al., 2005), may be due to the differences in HVR length.
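The probe-sphere idea behind ASA can be illustrated with a short sketch. Note that the code below is not the Lee & Richards (1971) slicing algorithm cited above and not the tool used by the authors; it uses the related Shrake–Rupley numerical approach, in which test points are placed on each atom's sphere expanded by the 1.4 Å probe radius and counted as accessible when no neighbouring expanded sphere buries them. The atom tuple format, the radii and the function names are illustrative assumptions.

```python
import math
from typing import List, Tuple

# (x, y, z, van der Waals radius), all in Å; an illustrative input format
Atom = Tuple[float, float, float, float]

PROBE_RADIUS = 1.4  # water probe radius in Å, as quoted in the text


def sphere_points(n: int) -> List[Tuple[float, float, float]]:
    """Approximately uniform points on a unit sphere (golden-section spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        radius = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((radius * math.cos(theta), radius * math.sin(theta), z))
    return points


def shrake_rupley_asa(atoms: List[Atom], n_points: int = 960) -> List[float]:
    """Per-atom solvent accessible surface area (Å^2) by the Shrake-Rupley method."""
    unit_points = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        expanded_i = ri + PROBE_RADIUS
        # Keep only neighbours whose expanded spheres can overlap atom i's expanded sphere.
        neighbours = [
            (xj, yj, zj, rj + PROBE_RADIUS)
            for j, (xj, yj, zj, rj) in enumerate(atoms)
            if j != i
            and math.dist((xi, yi, zi), (xj, yj, zj)) < expanded_i + rj + PROBE_RADIUS
        ]
        accessible = 0
        for ux, uy, uz in unit_points:
            point = (xi + expanded_i * ux, yi + expanded_i * uy, zi + expanded_i * uz)
            # A test point is buried if it lies inside any neighbour's expanded sphere.
            if all(math.dist(point, (xj, yj, zj)) >= r_exp for xj, yj, zj, r_exp in neighbours):
                accessible += 1
        areas.append(4.0 * math.pi * expanded_i ** 2 * accessible / n_points)
    return areas
```

For an isolated atom every test point is accessible, so the result reduces to 4π(r + 1.4)². In a folded protein, summing the per-atom values over residues would give residue-level exposure that could, in principle, be compared between the IE and IA faces.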

    Solubility

    The solubility properties of a protein depend on the physico-chemical nature of the molecular surface. Moreover, protein solubility under a given set of conditions is the thermodynamic manifestation of the equilibrium between protein–protein and protein–solvent interactions, and relates to the net free energy change arising from the interaction of hydrophobic and hydrophilic residues on the protein surface with the surrounding solvent. Therefore, the distribution of electrostatic surface potential (which may relate to the salt-binding sites) of a molecule and its surface hydrophobicity are critical factors influencing the solubility of a protein (Damodaran, 2008). For the Modelcore HVR glycinin homotrimers, we calculated the electrostatic surface potential by solving the Poisson–Boltzmann equation using the Adaptive Poisson–Boltzmann Solver (APBS) (Baker, Sept, Joseph, Holst, & McCammon, 2001) plug-in (developed by Michael G. Lerner, University of Michigan) of PyMOL (Warren L. DeLano, DeLano Scientific, San Carlos, CA, http://www.pymol.org). Electrostatic surface potentials of the Modelcore of soybean protomers generally show a slightly more positive (basic) charge on the IE face than on the IA face. The electrostatic surface potential of the IA face of group II homotrimers (i.e. A3B4 and A5A4B3, Supplementary data Fig. S1) shows a prominent negative charge and aligns well with the lowest value for acidic:basic residues (2.2:1) among the glycinin protomers (Table 1). As with surface hydrophobicity, the surface electrostatic potential of glycinin Modelcore HVR shows remarkable differences when mapped to the surface representation of the homology models (Fig. 3). Generally, the HVRs are rich in acidic residues (Asx and Glx, Supplementary data Fig. S1), which may reduce the intensity of positive charge on the IE face. The long HVRs, with a high number of polar residues, and the dominant negative charge on both faces of homotrimers or homohexamers may lead to high solubility of group II proteins. Repulsion between protein molecules due to negative charges may contribute to this property. According to Prak et al. (2005), low ionic strength (μ = 0.08) resulted in precipitation of proglycinin protomers, with incomplete solubility for A2B1a and A1bB2, when the pH changed from 5.7 to 6.7. In the same study, very low solubility was observed at high ionic strength (μ = 0.5) for all proglycinin protomers except A1aB1b (60% soluble) at pH values lower than 5.8. At the pI and low ionic strength, the charged residues on the protein surface may be neutralized, and hydrophobic protein–protein interactions may reduce solubility. The high solubility of A1aB1b (Prak et al., 2005) may be related to its smaller number of surface hydrophobic residues. In the hexamer structure, solubility is a combination of the surface hydrophobicity and charge distribution of the IA face, because the IE faces of the trimers interact to form the hexamer (Adachi, Okuda, et al., 2003) and may not be exposed to the solvent phase. According to Maruyama et al. (2004), the solubility values of glycinin at μ = 0.5 are in the order group I > A3B4 > A5A4B3, which agrees with the decreasing order of surface hydrophobic differences explained here.

    Fig. 3. Surface characterization of developed soybean glycinin models with HVRs (Modelcore HVR). (a) Surface hydrophobicity (i: A1aB1b, ii: A1bB2, iii: A2B1a, iv: A3B4, & v: A5A4B3). Distribution of hydrophilic and hydrophobic residues assigned according to the Kyte and Doolittle (1982) scale is represented in green (hydrophilic) and red (hydrophobic) on the solvent accessible surface of the models. (b) Electrostatic potentials of molecular surfaces of glycinin models are indicated in colour, and the values range from +5 kT/e (blue) to −5 kT/e (red).
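APBS solves the full Poisson–Boltzmann equation on a grid. As a much cruder, hedged illustration of why ionic strength matters for the surface potential and solubility arguments above, the sketch below uses the Debye–Hückel (linearized, point-charge) approximation: each charge contributes a potential that decays as exp(−κr)/r, and the Debye length κ⁻¹ shrinks as ionic strength rises. The constants (Bjerrum length of about 0.71 nm, κ⁻¹ of about 0.304/√I nm for a 1:1 salt in water at 25 °C) are standard textbook approximations, the charge coordinates are hypothetical, and the model ignores the low-dielectric protein interior, so it is not a substitute for APBS.

```python
import math
from typing import Iterable, Tuple

# Approximate constants for water at 25 °C (assumptions of this sketch)
BJERRUM_LENGTH_NM = 0.71   # l_B = e^2 / (4π ε0 εr kB T), in nm
DEBYE_PREFACTOR_NM = 0.304  # Debye length ≈ 0.304 / sqrt(I) nm for a 1:1 salt, I in mol/L


def debye_length_nm(ionic_strength: float) -> float:
    """Debye screening length (nm) for a monovalent salt at ionic strength I (mol/L)."""
    return DEBYE_PREFACTOR_NM / math.sqrt(ionic_strength)


def screened_potential_kt_per_e(
    point: Tuple[float, float, float],
    charges: Iterable[Tuple[float, float, float, float]],
    ionic_strength: float,
) -> float:
    """Debye-Hückel potential (kT/e) at `point` from point charges (x, y, z in nm; q in units of e).

    `point` must not coincide with any charge position (r = 0 is undefined).
    """
    kappa = 1.0 / debye_length_nm(ionic_strength)
    phi = 0.0
    for x, y, z, q in charges:
        r = math.dist(point, (x, y, z))
        # e*phi/kT for one charge: l_B * q * exp(-kappa * r) / r
        phi += BJERRUM_LENGTH_NM * q * math.exp(-kappa * r) / r
    return phi
```

With these approximations the Debye length is about 1.07 nm at μ = 0.08 and about 0.43 nm at μ = 0.5, the two ionic strengths used by Prak et al. (2005). A cluster of negative charges is therefore screened much more strongly at the higher ionic strength.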

    Emulsion and foam formation

    The emulsifying and foaming properties of proteins are related to molecular surface properties, including surface hydrophobicity, ligand binding ability, molecular flexibility, and structural stability (Damodaran, 2008; Kumosinski & Farrell, 1994; Prak et al., 2005; Voutsinas, Cheung, & Nakai, 1983). The solubility of a protein strongly relates to protein–salt–water interactions and depends on the hydrophobicity and availability of salt-binding sites of the molecules (Kumosinski & Farrell, 1994), which in turn influence emulsifying and foaming properties as well as heat-induced gel formation. The ability to denature at the interface (surface denaturation) and molecular flexibility are critical factors that contribute to the interfacial activity of a protein (Damodaran, 2008). In addition, the degree of exposure of hydrophobic residues influences the thermodynamic stability of the protein: proteins with many exposed hydrophobic patches are more susceptible to thermal and interfacial denaturation than those with more hydrophobic residues buried inside (Damodaran, 1994). The emulsifying ability of soybean glycinin subunits has been reported in various studies (Supplementary data Table S1). The order of emulsifying ability of glycinin subunits reported by Maruyama et al. (2004) is closer to the order of hydrophobicity values obtained with the ANS probe by Tezuka et al. (2004) than to the order obtained by hydrophobic chromatography. The high number of hydrophobic residues on both the IA and IE faces and the exposed hydrophobic residues in the HVRs of A5A4B3 and A3B4 (Fig. 3) may have contributed to their higher emulsifying ability compared with group I protomers (A1aB1b, A1bB2, and A2B1a). The calculated main cavity area around the central channel of the glycinin Modelcore HVR structures is in the decreasing order A1bB2 (5639.8 Å³) > A2B1a (4151.5 Å³) > A5A4B3 (3999.9 Å³) > A3B4 (3478.9 Å³) > A1aB1b (2603.3 Å³), similar to the descending order of emulsifying ability values reported by Prak et al. (2005) (Table 1).
The size of the pocket opening also shows a similar pattern, except for A1bB2 and A2B1a (Table 1), indicating a potential relationship between the central cavity area and the emulsifying properties of these proteins. Globulin Modelcore structures are generally compact and less flexible. The work of Maruyama et al. (1999) on β-conglycinin devoid of extension regions (αc and α′c) confirms the reduced flexibility of the compact molecule, indicating that the HVRs can remarkably influence emulsifying ability. The order of proglycinin subunits according to the residue length of the longest HVR on the IE face (i.e. HVR-V in group I and HVR-IV in group II) is A5A4B3 (104) > A3B4 (74) > A1aB1b (48) > A2B1a (41) > A1bB2 (35) (Supplementary data Fig. S1), which closely follows the order of emulsifying ability of proglycinin, except that higher values were previously reported for A1aB1b than for A5A4B3 (Table 1, Prak et al., 2005). The HVRs of group I and group II contain 7–8 and 12–14 hydrophobic (lipophilic) residues, respectively. It can be postulated that the lengthy HVRs on the IE face and the considerably high number of lipophilic residues of proglycinin, or of the dissociated product of hexameric glycinin (i.e. trimers), together may confer more favourable interfacial activity for emulsion formation than the glycinin hexamer. According to Martin, Bos, and van Vliet (2002), glycinin in the 3S/7S form (at pH 3, owing to dissociation of the 11S form) adsorbs much faster at the air/water interface than the 11S form (at pH 6.7), showing that the less compact structure and the exposure of favourable residues of the IE face may affect the interfacial activities of glycinin.
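Counting hydrophobic residues in an HVR segment, as done above for the IE-face HVRs, can be sketched with the Kyte and Doolittle (1982) hydropathy scale that is also used to colour Fig. 3. This sketch assumes that a residue counts as hydrophobic when its hydropathy index is positive; the authors' exact criterion is not stated here, and the example segment is a made-up string, not a glycinin sequence.

```python
from typing import Dict, List

# Kyte & Doolittle (1982) hydropathy index
KYTE_DOOLITTLE: Dict[str, float] = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def count_hydrophobic(segment: str, threshold: float = 0.0) -> int:
    """Number of residues whose hydropathy exceeds `threshold` (assumed cut-off)."""
    return sum(1 for aa in segment.upper() if KYTE_DOOLITTLE.get(aa, 0.0) > threshold)


def gravy(segment: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over recognised residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in segment.upper() if aa in KYTE_DOOLITTLE]
    return sum(values) / len(values)


def hydropathy_profile(sequence: str, window: int = 9) -> List[float]:
    """Sliding-window mean hydropathy, the quantity plotted in a Kyte-Doolittle plot."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]
```

For instance, `count_hydrophobic("EEQDLVIKNE")` returns 3 (L, V and I), and the GRAVY of the same string is about −1.24, describing an acidic, mostly hydrophilic segment with a small lipophilic patch.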

    Heat-induced gel formation

    Heat-induced gel formation, or thermal gelation, is another important property that proteins contribute to food products such as processed meat, heat-set gels, and cakes. The contribution of glycinin to the understanding of heat-induced gelation of SSP is well documented (Salleh et al., 2002; Tezuka et al., 2000, 2004; Utsumi, Matsumara, & Mori, 1997). According to Yamauchi, Yamagishi, and Iwabuchi (1991), heat-induced gel formation of soy glycinin proceeds first by thermal unfolding, then by association–dissociation of subunits, followed by aggregation involving, to a certain extent, sulfhydryl–disulphide exchange. Partial denaturation of the native protein structure on heating depends on the stability of the molecular structure against increased thermal energy, and is a prerequisite for subsequent aggregation to form the protein gel network (Utsumi et al., 1997). In fact, the thermal stability of a protein is related to structural features such as the cavity size of the molecule, the number of proline residues (Fukuda, Maruyama, Salleh, Mikami, & Utsumi, 2008; Tandang-Silvas et al., 2010), the occurrence of hydrophobic amino acids (Kumar & Nussinov, 2001), the length of loop regions (Chakravarty & Varadarajan, 2002), and the elimination of surface loops (Kumar & Nussinov, 2001). The calculated main cavity area around the central channel of glycinin Modelcore HVR is in the order A1bB2 (5639.8 Å³) > A2B1a (4151.5 Å³) > A5A4B3 (3999.9 Å³) > A3B4 (3478.9 Å³) > A1aB1b (2603.3 Å³) (Table 1). Studies on 7S (Fukuda et al., 2008) and 11S (Tandang-Silvas et al., 2011) globulins have demonstrated that proteins with a large cavity size have low thermal stability; therefore, the thermal stability of the glycinin homotrimers could be predicted in the descending order A1aB1b > A3B4 > A5A4B3 > A2B1a > A1bB2. This is exactly the order of thermal stability reported for soybean proglycinin at the subunit level by Prak et al. (2005), suggesting that the cavity size of a protein molecule is a good parameter for predicting its thermal stability. The lower proportion of proline residues in A1bB2 (5.6%) and A2B1a (5.4%) than in A1aB1b, A5A4B3, and A3B4 (6.1, 6.7 and 7.2%, respectively) may further contribute to the thermal destabilization of the A1bB2 and A2B1a homotrimers (Table 1). Proteins with long loops are more susceptible to heat-induced denaturation than those with shorter loops (Chakravarty & Varadarajan, 2002; Kumar & Nussinov, 2001). Although A3B4, A5A4B3 and A1aB1b have longer HVRs than the other subunit variants, features such as a high number of proline residues and a small cavity size may have negated the effect of the loop length difference on thermal stability. The type and stability of a thermally induced gel can be predicted by evaluating surface hydrophobicity, charge distribution, sulfhydryl/disulphide (–SH/S–S) content and the size of the cavities (Damodaran, 2008; Shimada & Matsushita, 1980). Soybean glycinin contains two S–S bonds: one is interchain (between the acidic and basic chains; A1aB1b: Cys12–Cys45, A1bB2: Cys31–Cys64, A2B1a: Cys28–Cys61, A3B4: Cys32–Cys65, and A5A4B3: Cys33–Cys66) and the other is intrachain (within the acidic chain; A1aB1b: Cys88–Cys298, A1bB2: Cys107–Cys304, A2B1a: Cys104–Cys307, A3B4: Cys108–Cys385, and A5A4B3: Cys109–Cys351) (Supplementary data Fig. S3).
Using the disulphide bond-deficient mutants C12G and C88S of proglycinin A1aB1b, Adachi and co-workers revealed that the contribution of the inter- and intrachain disulphide bonds to thermal stability is low, particularly for the proglycinin A1aB1b protomer (Adachi, Okuda, et al., 2003).
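The cavity-size argument above is essentially a rank comparison, which can be made explicit with a small sketch. The cavity values are those quoted in the text, and the observed order is the thermal stability order attributed above to Prak et al. (2005). The Spearman formula used here is the standard one for complete rankings without ties; with only five variants, perfect rank agreement is suggestive rather than statistically conclusive. The function names are illustrative.

```python
from typing import Dict, List

# Main cavity area around the central channel of glycinin Modelcore HVR trimers,
# as quoted in the text (Table 1)
CAVITY: Dict[str, float] = {
    "A1bB2": 5639.8,
    "A2B1a": 4151.5,
    "A5A4B3": 3999.9,
    "A3B4": 3478.9,
    "A1aB1b": 2603.3,
}

# Thermal stability order of proglycinin reported by Prak et al. (2005), most stable first
OBSERVED_STABILITY: List[str] = ["A1aB1b", "A3B4", "A5A4B3", "A2B1a", "A1bB2"]


def predict_stability_order(cavity: Dict[str, float]) -> List[str]:
    """Smaller cavity -> predicted higher thermal stability (most stable first)."""
    return sorted(cavity, key=cavity.get)


def spearman_rho(order_a: List[str], order_b: List[str]) -> float:
    """Spearman rank correlation for two complete rankings of the same items, no ties."""
    n = len(order_a)
    rank_b = {name: i for i, name in enumerate(order_b)}
    d_squared = sum((i - rank_b[name]) ** 2 for i, name in enumerate(order_a))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))
```

Sorting the five cavity areas in ascending order reproduces the observed stability order exactly, giving ρ = 1.0; a fully reversed ranking would give ρ = −1.0.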

    The content of –SH and S–S bonds affects the hardness of heat-induced protein gels because of the disulphide bond exchange that may occur during heating (Shimada & Matsushita, 1980; Tezuka et al., 2004). HVR-V of group I and HVR-IV of group II contain six (two per protomer) and three (one per protomer) –SH residues on the IE face, respectively (Supplementary data Fig. S2 and Fig. S3). In the glycinin hexamer, these –SH residues may be hidden inside the molecule and may not participate in forming S–S bonds in the initial stage of heating. All glycinin trimers have three more –SH residues embedded in the IA face, with the potential to form disulphide bonds when conditions are favourable (such as during heat-induced aggregation). Three additional –SH residues are found in the group I subunit variants except the A1bB2 protomer (Cys53 of A1aB1b and Cys69 of A2B1a) on sheet B of the acidic chain (Supplementary data Fig. S1a); these protrude towards the periphery of the molecule (Supplementary data Fig. S3), suggesting their availability to form extra cross-links. Such molecular characteristics of group I variants may contribute to a higher gel strength than that of group II. This rationalization is also consistent with the higher breaking stress reported by Tezuka et al. (2004) for curd generated from group I variants than for curd from group IIa (A5A4B3) and group IIb (A3B4). The presence of a high number of free –SH groups on the IE face of group I variants, together with external conditions (e.g. pH, ionic strength, and temperature changes) that facilitate the opening of the hexameric structure, may further enhance S–S bond formation during aggregation and strengthen the gel structure.

    Opportunity of homology modelling in food protein functionality studies

    Protein tertiary structure is a source of useful information for predicting function and is widely used in structure-based methods for functional annotation (Kinoshita & Nakamura, 2003; Thornton, Todd, Milburn, Borkakoti, & Orengo, 2000). Global folds of proteins as well as local structural motifs are important in the structure-based prediction of biochemical functions. O'Brien (1991) proposes the physico-chemical properties, geometrical indices, topological indices and electrostatic properties of a protein molecule's structure as four closely related categories of molecular descriptors that can be employed to describe protein functional properties and biological activities. Similarly, properties of protein structure and sequence, i.e. numerical parameters derived from surface geometry, sequence conservation, electrostatics and solvent accessibility, are used to predict a protein's likely active sites for biological functions (Laskowski et al., 2005; ProFunc http://www.ebi.ac.uk/thornton-srv/databases/profunc/index.html). For example, ligand docking for binding-site identification in enzyme activity prediction utilizes binding pockets on the protein surface that are derived by considering residue conservation, compactness, convexity, protrusion, rigidity, hydrophobicity and charge density (Ma et al., 2001; Rossi, Marti-Renom, & Sali, 2006). Where structure–property relationships are concerned, the physical properties of the protein as well as the structural determinants of protein intermolecular forces are important. The most important FP of food proteins, such as the formation of emulsions, foams, gels and adhesive/cohesive films, are related to protein surface characteristics that include charge density, hydrophobicity, and steric and electrical forces.
Knowledge of the quantitative characterization of the interactions of protein molecules in chemical (and biological) space in terms of food functional properties is limited. Therefore, choosing appropriate molecular descriptors is one of the challenges in deriving structure–property relationships of food proteins.
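Several of the sequence-level descriptors used in this paper (Asx + Glx and His + Arg + Lys counts, the acidic:basic ratio, and proline content) are simple to compute from a primary sequence, as the hedged sketch below shows. Following the amino acid analysis convention, Asn and Gln are counted together with Asp and Glu as Asx and Glx; whether the authors counted residues exactly this way is an assumption, and the `SequenceDescriptors` class and `describe` function are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SequenceDescriptors:
    asx_glx: int            # Asp + Asn + Glu + Gln (Asx + Glx)
    his_arg_lys: int        # His + Arg + Lys
    acidic_to_basic: float  # (Asx + Glx) / (His + Arg + Lys)
    proline_percent: float  # Pro as a percentage of all residues
    cysteine: int           # candidate -SH / S-S sites


def describe(sequence: str) -> SequenceDescriptors:
    """Compute simple composition descriptors from a one-letter amino acid sequence."""
    counts = Counter(sequence.upper())
    acidic = sum(counts[aa] for aa in "DNEQ")
    basic = sum(counts[aa] for aa in "HRK")
    length = sum(counts.values())
    return SequenceDescriptors(
        asx_glx=acidic,
        his_arg_lys=basic,
        acidic_to_basic=acidic / basic if basic else float("inf"),
        proline_percent=100.0 * counts["P"] / length,
        cysteine=counts["C"],
    )
```

For the hypothetical sequence "DDEENQKRPC", `describe` reports 6 acidic (Asx + Glx) and 2 basic residues, an acidic:basic ratio of 3.0, 10% proline and one cysteine. Structure-derived descriptors such as ASA or cavity area would still require a 3-D model.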

    In this communication, we selected Asx + Glx, His + Arg + Lys, S0, pI, solvent accessible surface area, pocket area of the central cavity, size of individual pocket openings, number of Pro residues, total number of hydrophobic residues, and the numbers of hydrophobic residues, –SH residues and S–S groups separately for the IE and IA faces as descriptors or indices of the molecular structure of glycinin variants generated through homology modelling. The relationships and trends observed between the quantitative values of these molecular parameters and the functional properties experimentally obtained for the same glycinin variants (from the literature) show moderate predictive power (Supplementary data Table S2), indicating the need to select the most suitable molecular parameters and data analysis methods for enhanced predictive power. Identification of the best descriptors and development of a predictive model using an appropriate subset of descriptors may be achieved through regression analysis or pattern recognition techniques. In addition, digestibility and release of bioactive sequences (by nick-site prediction for gastro-intestinal enzymes), flavour molecule binding (docking or ligand binding), allergenicity prediction (epitope sequences, protein–protein interaction), and the design of modification sites for site-directed mutagenesis are possible with the modelled molecular structure.

    Fig. 4. The proposed pathway for applying homology modelling and models in screening food proteins for desired functional properties. With the enormous amount of amino acid sequence information available in databases and the limited number of 3-D structures, development of knowledge-based molecular structure models of several food proteins is quite possible. Initial screening of proteins at molecular or subunit levels in silico for selected physico-chemical properties that are predictors of technologically valuable functional properties can be achieved at a fraction of the cost and time compared to wet-lab techniques.

    A focused and integrated approach is needed to link food crop genomics information to expressed proteins and then to their structure and the properties required in food. Therefore, building interfaces between in vitro screening and biological/chemical modification will help to identify and select proteins for efficient use. The proposed mechanism for homology structure-based pre-screening of food proteins for desired functionality, and the intervention stages for functionality modification, is depicted in Fig. 4. The expected outcome is the efficient use of proteins synthesized and deposited in plants, through a selection or enhancement process at the molecular level, to perform the desired biological activity or functionality in a complex system, which is our food.

    Conclusion

    In this paper, we show the possibility of using homology modelling to predict the structure and physico-chemical properties of glycinin at the molecular level on an in silico platform, as an approach to understand and investigate properties that are important in processing functionality. Although the functional properties of food proteins are expressed at a macroscopic length scale, the structure-related properties of the constituent molecules largely contribute to them. Homology modelling allows the 3-D structure of a protein to be predicted on the basis of genetic relationship or from related proteins that are well studied. This communication shows one of the ways in which food protein scientists can utilize bioinformatics (with emphasis on homology modelling) to screen or investigate the suitability of a protein for specific functionalities needed in food. This approach resembles drug design in pharmaceutical and medicinal chemistry. The proposed approach does, however, require proper validation with well-defined food proteins and appropriate in vitro data for FP. Homology modelling allows the molecular structure of a protein of interest to be derived, and its structural properties can be investigated to obtain the physico-chemical properties of the molecule that are important in processing functionality. Therefore, homology modelling can be complementary to the existing approaches to food protein structure and function prediction.

    Acknowledgements

    This work is supported by the Agriculture and Agri-Food Canada (AAFC) funded project RBPI 1827. SaskCanola is acknowledged for the Dr. Roger Rimmer Graduate Scholarship provided to T. S. Withana-Gamage. We thank Dr. B. Dave Oomah of AAFC Summerland, BC for his valuable suggestions and critical reading of the manuscript.

    Supplementary data

    Supplementary data related to this article can be found online at http://dx.doi.org/10.1016/j.tifs.2012.06.014.

    References

    Adachi, M., Kanamori, J., Masuda, T., Yagasaki, K., Kitamura, K., Mikami, B., et al. (2003). Crystal structure of soybean 11S globulin: glycinin A3B4 homohexamer. Proceedings of the National Academy of Sciences of the United States of America, 100, 7395–7400.

    Adachi, M., Okuda, E., Kaneda, Y., Hashimoto, A., Shutov, A. D., Becker, C., et al. (2003). Crystal structures and structural stabilities of the disulfide bond-deficient soybean proglycinin mutants C12G and C88G. Journal of Agricultural and Food Chemistry, 51, 4633–4639.

    Adachi, M., Yagasaki, K., Gidamis, A. B., Mikami, B., & Utsumi, S. (2001). Crystal structure of soybean proglycinin A1aB1b homotrimer. Journal of Molecular Biology, 305, 291–305.

    Aiking, H. (2011). Future protein supply. Trends in Food Science & Technology, 22, 112–120.

    Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25, 3389–3402.

    Baker, D., & Sali, A. (2001). Protein structure prediction and structural genomics. Science, 294, 93–96.

    Baker, N. A., Sept, D., Joseph, S., Holst, M. J., & McCammon, J. A. (2001). Electrostatics of nanosystems: application to microtubules and the ribosome. Proceedings of the National Academy of Sciences of the United States of America, 98, 10037–10041.

    Barre, A., Borges, J.-P., & Rougé, P. (2005). Molecular modelling of the major peanut allergen, Ara h 1 and other homotrimeric allergens of the cupin superfamily: a structural basis for their IgE-binding cross-reactivity. Biochimie, 78, 499–506.

    Barre, A., Jacquet, G., Sordet, C., Culerrier, R., & Rougé, P. (2007). Homology modelling and conformational analysis of IgE-binding epitopes of Ara h 3 and other legumin allergens with a cupin fold from tree nuts. Molecular Immunology, 44, 3243–3255.

    Bonneau, R., & Baker, D. (2001). Ab initio protein structure prediction: progress and prospects. Annual Review of Biophysics and Biomolecular Structure, 30, 173–189.

    Bordoli, L., Kiefer, F., Arnold, K., Benkert, P., Battey, J., & Schwede, T. (2009). Protein structure homology modeling using SWISS-MODEL workspace. Nature Protocols, 4, 1–13.

    Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S., & Karplus, M. (1983). CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. Journal of Computational Chemistry, 4, 187–217.

    Cabanos, C., Tandang-Silvas, M. R., Odijk, V., Brostedt, P., Tanaka, A., Utsumi, S., et al. (2010). Expression, purification, cross-reactivity and homology modeling of peanut profilin. Protein Expression and Purification, 73, 36–45.

    Cavasotto, C. N., & Phatak, S. S. (2009). Homology modeling in drug discovery: current trends and applications. Drug Discovery Today, 14, 676–682.

    Chakravarty, S., & Varadarajan, R. (2002). Elucidation of factors responsible for enhanced thermal stability of proteins: a structural genomics based study. Biochemistry, 25, 8152–8161.

    Ciccarelli, F. D., Doerks, T., von Mering, C., Creevey, C. J., Snel, B., & Bork, P. (2006). Toward automatic reconstruction of a highly resolved tree of life. Science, 311, 1283–1287.

    Damodaran, S. (1994). Structure–function relationship of food proteins. In N. S. Hettiarachchy, & G. R. Ziegler (Eds.), Protein functionality in food systems (pp. 1–37). New York: Marcel Dekker.

    Damodaran, S. (2008). Amino acids, peptides and proteins. In S. Damodaran, K. L. Parkin, & O. R. Fennema (Eds.), Fennema's food chemistry (pp. 217–329). Boca Raton: CRC Press.

    Derbyshire, E., Wright, D. J., & Boulter, D. (1976). Legumin and vicilin, storage proteins of legume seeds. Phytochemistry, 15, 3–24.

    FAOSTAT (December, 2010). Food and Agricultural Organization, Statistics Division, FAOSTAT food balance sheets. http://faostat.fao.org/site/368/

    Fiser, A., Do, R. K. G., & Sali, A. (2000). Modeling of loops in protein structures. Protein Science, 9, 1753–1773.

    Forster, M. J. (2002). Molecular modelling in structural biology. Micron, 33, 365–384.

    Fukuda, T., Maruyama, N., Salleh, M. R., Mikami, B., & Utsumi, S. (2008). Characterization and crystallography of recombinant 7S globulins of Adzuki bean and structure–function relationships with 7S globulins of various crops. Journal of Agricultural and Food Chemistry, 56, 4145–4153.

    Fukushima, D. (1991). Structures of plant storage proteins and their functions. Food Reviews International, 7, 353–381.

    Hillisch, A., Pineda, L. F., & Hilgenfeld, R. (2004). Utility of homology models in the drug discovery process. Drug Discovery Today, 9, 659–669.

    Kealley, C. S., Rout, M. K., Dezfoili, M. R., Strounina, E., Whittaker, A. K., Appelqvist, I. A. M., et al. (2008). Structure and molecular mobility of soy glycinin in the solid state. Biomacromolecules, 9, 2937–2946.

    Kinoshita, K., & Nakamura, H. (2003). Identification of protein biochemical functions by similarity search using the molecular surface database eF-site. Protein Science, 12, 1589–1595.

    Kopp, J., & Schwede, T. (2004). Automated protein structure homology modeling: a progress report. Pharmacogenomics, 5, 405–416.

    Kumar, S., & Nussinov, R. (2001). How do thermophilic proteins deal with heat? Cellular and Molecular Life Sciences, 58, 1216–1233.

    Kumosinski, T. F., Brown, E. M., & Farrell Jr., H. M. (1991a). Molecular modeling in food research. Trends in Food Science & Technology, 2, 110–115.

    Kumosinski, T. F., Brown, E. M., & Farrell Jr., H. M. (1991b). Molecular modeling in food research: applications. Trends in Food Science & Technology, 2, 190–193.

    Kumosinski, T. F., & Farrell Jr., H. M. (1994). Solubility of proteins: protein–salt–water interactions. In N. S. Hettiarachchy, & G. R. Ziegler (Eds.), Protein functionality in food systems (pp. 39–77). New York: Marcel Dekker.

    Kyte, J., & Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. Journal of Molecular Biology, 157, 105–132.

    Larkin, M. A., Blackshields, G., Brown, N. P., Chenna, R., McGettigan, P. A., McWilliam, H., et al. (2007). ClustalW and ClustalX version 2. Bioinformatics, 23, 2947–2948.

    Laskowski, R. A., MacArthur, M. W., Moss, D. S., & Thornton, J. M. (1993). PROCHECK: a program to check the stereochemical quality of protein structures. Journal of Applied Crystallography, 26, 283–291.

    Laskowski, R. A., Watson, J. D., & Thornton, J. M. (2005). ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Research, 33(Web Server issue), W89–W93.

    Lee, B., & Richards, F. M. (1971). The interpretation of protein structure: estimation of static accessibility. Journal of Molecular Biology, 55, 379–400.

    Letunic, I., & Bork, P. (2007). Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics, 23, 127–128.

    Liebman, M. N. (1998). Information: a renewable resource in the analysis of protein structure and function. In D. J. Sessa, & J. L. Willett (Eds.), Paradigm for successful utilization of renewable resources (pp. 88–106). Champaign: AOCS Press.

    Lüthy, R., Bowie, J. U., & Eisenberg, D. (1992). Assessment of protein models with three-dimensional profiles. Nature, 356, 283–285.

    Ma, B., Kumar, S., Tsai, C.-J., Wolfson, H., Sinha, N., & Nussinov, R. (2001). Protein–ligand interactions: induced fit. Encyclopedia of Life Sciences. Chichester: John Wiley & Sons Ltd. http://dx.doi.org/10.1038/npg.els.0003140. http://www.els.net

    Maggio, E. T., & Ramnarayan, K. (2001). Recent developments in computational proteomics. Structural Bioinformatics, 6, 996–1004.

    Martin, A. H., Bos, M. A., & van Vliet, T. (2002). Interfacial rheological properties and conformational aspects of soy glycinin at the air/water interface. Food Hydrocolloids, 16, 63–71.

    Marti-Renom, M. A., Stuart, A. C., Fiser, A., Sanchez, R., Melo, F., & Sali, A. (2000). Comparative protein structure modeling of genes and genomes. Annual Review of Biophysics and Biomolecular Structure, 29, 291–325.

    Maruyama, N., Prak, K., Motoyama, S., Choi, S.-K., Yagasaki, K., Ishimoto, M., et al. (2004). Structure–physicochemical function relationships of soybean glycinin at subunit levels assessed by using mutant lines. Journal of Agricultural and Food Chemistry, 52, 8197–8201.

    Maruyama, N., Sato, R., Wada, Y., Matsumura, Y., Goto, H., Okuda, E., et al. (1999). Structure–physicochemical functional relationships of soybean β-conglycinin constituent subunits. Journal of Agricultural and Food Chemistry, 47, 5278–5284.

    Nakai, S. (1983). Structure–function relationships of food proteins with an emphasis on the importance of protein hydrophobicity. Journal of Agricultural and Food Chemistry, 31, 676–683.

    Nakai, S., & Li-Chan, E. (1988). Hydrophobic interactions in food systems. Boca Raton: CRC Press.

    Neilsen, N. C., Dickinson, C. D., Cho, T. J., Thanh, V. H.,Scallon, B. J., Fischer, R. L., et al. (1989). Characterization of theglycinin gene family in soybean. The Plant Cell, 1, 313e328.

    OBrien, J. (1991). Molecular modeling in food research: an excitingfuture. Trends in Food Science & Technology, 2, 185e186.

    Pimentel, D., & Pimentel, M. (2003). Sustainability of meat-based andplant-based diets and the environment. American Journal ofClinical Nutrition, 78, 660Se663S.

    Prak, K., Nakatani, K., Katsube-Tanaka, T., Adachi, M., Maruyama, N.,& Utsumi, S. (2005). Structureefunction relationships of soybeanproglycinins at subunit levels. Journal of Agricultural and FoodChemistry, 53, 3650e3657.

    Pripp, A. H., Isaksson, T., Stepaniak, L., Sørhaug, T., & Ardö, Y. (2005). Quantitative structure activity relationship modelling of peptides and proteins as a tool in food science. Trends in Food Science & Technology, 16, 484–494.

    Rossi, A., Martí-Renom, M. A., & Sali, A. (2006). Localization of binding sites in protein structure by optimization of a composite scoring function. Protein Science, 15, 2366–2380.

    Sali, A., & Blundell, T. L. (1993). Comparative protein modelling by satisfaction of spatial restraints. Journal of Molecular Biology, 234, 779–815.

    Salleh, M. R. B. M., Maruyama, N., Adachi, M., Hontani, N., Saka, S., Kato, N., et al. (2002). Comparison of protein chemical and physicochemical properties of rapeseed cruciferin with those of soybean glycinin. Journal of Agricultural and Food Chemistry, 50, 7380–7385.

    Schein, C. H., Ivanciuc, O., & Braun, W. (2007). Bioinformatics approaches to classifying allergens and predicting cross-reactivity. Immunology and Allergy Clinics of North America, 27, 1–27.

    Schmutz, J., Cannon, S. B., Schlueter, J., Ma, J., Mitros, T., Nelson, W., et al. (2010). Genome sequence of the palaeopolyploid soybean. Nature, 463, 178–183.

    Shewry, P. R., Napier, J. A., & Tatham, A. S. (1995). Seed storage proteins: structure and biosynthesis. The Plant Cell, 7, 945–956.

    Shimada, K., & Matsushita, S. (1980). Relationship between thermocoagulation of proteins and amino acid compositions. Journal of Agricultural and Food Chemistry, 28, 413–417.

    Tandang-Silvas, M. R. G., Fukuda, T., Fukuda, C., Prak, K., Cabanos, C., Kimura, A., et al. (2010). Conservation and divergence on plant seed 11S globulins based on crystal structures. Biochimica et Biophysica Acta, 1804, 1432–1442.

    Tandang-Silvas, M. R. G., Tecson-Mendoza, E. M., Mikami, B., Utsumi, S., & Maruyama, N. (2011). Molecular design of seed storage proteins for enhanced food physicochemical properties. Annual Review of Food Science and Technology, 2, 59–73.

    Tezuka, M., Taira, H., Igarashi, Y., Yagasaki, K., & Ono, T. (2000). Properties of tofus and soy milks prepared from soybeans having different subunits of glycinin. Journal of Agricultural and Food Chemistry, 48, 1111–1117.

    Tezuka, M., Yagasaki, K., & Ono, T. (2004). Changes in characters of soybean glycinin groups I, IIa, and IIb caused by heating. Journal of Agricultural and Food Chemistry, 52, 1693–1699.

    Thornton, J. M., Todd, A. E., Milburn, D., Borkakoti, N., & Orengo, C. A. (2000). From structure to function: approaches and limitations. Nature Structural Biology, (Supplement), 11, 991–994.

    Townsend, A.-A., & Nakai, S. (1983). Relationships between hydrophobicity and foaming characteristics of food proteins. Journal of Food Science, 48, 588–594.

    Utsumi, S., Matsumura, Y., & Mori, T. (1997). Structure–function relationships of soy proteins. In S. Damodaran, & A. Paraf (Eds.), Food proteins and their application (pp. 257–291). New York: Marcel Dekker Inc.

    van Gunsteren, W. F., Billeter, S. R., Eising, A. A., Hünenberger, P. H., Krüger, P., Mark, A. E., et al. (1996). Biomolecular simulation: The GROMOS96 manual and user guide. Zürich: vdf Hochschulverlag AG an der ETH Zürich.

    Voutsinas, L. P., Cheung, E., & Nakai, S. (1983). Relationships of hydrophobicity to emulsifying properties of heat denatured proteins. Journal of Food Science, 48, 26–32.

    Voutsinas, L. P., Nakai, S., & Harwalkar, V. R. (1983). Relationships between protein hydrophobicity and thermal functional properties of food proteins. Canadian Institute of Food Science and Technology Journal, 16, 185–190.

    Wiederstein, M., & Sippl, M. (2007). ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Research, 35, W407–W410.

    Withana-Gamage, T. S., Hegedus, D. D., Qiu, X., & Wanasundara, J. P. D. (2011). In silico homology modeling to predict functional properties of cruciferin. Journal of Agricultural and Food Chemistry, 59, 12925–12938.

    Wright, D. J. (1987). The seed globulins. In B. J. F. Hudson (Ed.), Developments in food proteins, Vol. 5 (pp. 81–157). London: Elsevier.

    Yamauchi, F., Yamagishi, T., & Iwabuchi, S. (1991). Molecular understanding of heat-induced phenomena of soybean protein. Food Reviews International, 7, 283–322.

  • Websites (accessed on December, 2010)

    http://itol.embl.de/itol.cgi
    http://modbase.compbio.ucsf.edu/modbase-cgi/index.cgi
    http://swissmodel.expasy.org/
    http://www.ebi.ac.uk/Tools/msa/clustalw2/
    http://www.ncbi.nlm.nih.gov/bioproject/
    http://www.pdb.org/pdb/home/home.do
    http://www.proteinmodelportal.org/
    http://www.pymol.org/
    http://www.uniprot.org/


    T.S. Withana-Gamage, J.P.D. Wanasundara / Trends in Food Science & Technology 28 (2012) 153–167

    Molecular modelling for investigating structure–function relationships of soy glycinin

    Introduction
    Homology modelling
    Amino acid sequence, structure, and genetic relationship of SSP
    Knowledge-based structure prediction
    Structure of soybean 12S protein
    Structure-based prediction of physico-chemical and functional properties of glycinin
    Surface hydrophobicity and related properties
    Solubility
    Emulsion and foam formation
    Heat-induced gel formation
    Opportunity of homology modelling in food protein functionality studies
    Conclusion
    Acknowledgements
    Appendix A. Supplementary data
    References
    Websites (accessed on December, 2010)