19 Biplot Analysis of Multi-environment Trial Data


Weikai Yan and L.A. Hunt

Department of Plant Agriculture, University of Guelph, Guelph, Ontario N1G 2W1, Canada

Introduction

Regional multi-environment trials (MET) are conducted every year for all major crops throughout the world, constituting a costly but essential step towards new crop genotype release and cultivar recommendation. MET are essential because the presence of genotype–environment interaction (GE), i.e. differential genotype responses in different environments, complicates cultivar evaluation. Some important concepts, such as ecological region, ecotype, mega-environment, specific adaptation and stability, all originate from GE. Were there no GE, a single cultivar would prevail all over the world and a single trial would suffice for cultivar evaluation (Gauch and Zobel, 1996). GE constitutes a major challenge to cultivar improvement, and MET data analysis constitutes an important aspect of plant breeding. Because of this, improvement in the methods used for MET data analysis should be of interest to the plant-breeding community. This chapter deals with the biplot method, which has been receiving attention in recent years.

Utilities of multi-environment trial data analysis

The primary objective of MET is, of course, to identify superior cultivars. The most common practice used to achieve this end is to compare the mean yield of genotypes across test environments (usually year–location combinations) represented in the MET. The validity of this practice is, however, based on the usually unstated assumption that the environments in the MET belong to a single mega-environment, defined as a group of locations in which the same set of cultivars perform best across a number of years. Although usually unstated, cultivar evaluation is always specific to single mega-environments. If the test environments are sufficiently heterogeneous, the cultivars that are selected based on mean yield may not be the best in some of the test environments; in extreme cases, they may not even be the best in any of the environments. Thus, a second utility of MET data analysis, prior to cultivar evaluation, should be to investigate the relationships among the test environments and the possibility of mega-environment differentiation within the target environment. Identification of mega-environments would allow exploitation of the GE that is repeatable across years.

For a given mega-environment, genotypes should be evaluated for mean yield (or, in more general terms, mean performance) and stability across test environments. The ideal cultivar should be one that is both high-yielding and stable. Mean performance is simply the mean across all environments, whereas stability is a measure of variability across environments. Most research has focused on quantification of stability, and numerous stability measures have been proposed (Lin et al., 1986; Lin and Binns, 1994; Kang, 1998). As a third utility, for a given mega-environment and parallel to cultivar evaluation, individual test environments should be evaluated for their ability to provide data that allow for discrimination among genotypes and, at the same time, for the extent to which they represent the target mega-environment.

©CAB International 2002. Quantitative Genetics, Genomics and Plant Breeding (ed. M.S. Kang)

The ultimate reason for differential stability among genotypes and for differential results from various test environments is non-repeatable GE. Since this type of GE cannot be effectively exploited, it must be avoided. A fourth utility of MET data analysis should be the development of a better understanding of the causes of GE. Such an understanding may help to avoid confounding plant responses to specific and rare conditions with overall cultivar evaluation.

To summarize, MET data analysis should, and potentially can, fulfil four functions: (i) investigation of possible mega-environment differentiation in the target environment; (ii) selection of superior cultivars for individual mega-environments; (iii) selection of better test environments; and (iv) development of a better understanding of the causes of GE. An ideal MET data-analysis system should accomplish all four tasks so that the information contained in the MET is maximally exploited and utilized.

Visualization of multi-environment trial data

With the belief that ‘a picture is worth a thousand words’, many attempts have been made to graphically present MET data. The general pattern of such a graphical display of MET data is to plot the mean yield of each genotype against a measure of stability, which can be any parameter that is listed in Lin et al. (1986), among others.

Another popular presentation of MET data is based on the Finlay and Wilkinson (1963) model, in which the yield of each genotype is plotted against the mean yield of each environment and in which each genotype is represented by a fitted straight line. Philosophically, this type of graphical display of MET data is very attractive, since it clearly indicates differential genotype responses to test environments. The problem with this method is that the environmental means are not always a good, and are frequently a poor, measure of environments, such that the fitted lines in most cases only account for a small fraction of the total GE (Zobel et al., 1988).

A visualization method that is similar to that of Finlay and Wilkinson (1963) but which explains more GE was developed by Gauch and Zobel (1997). In this method, the nominal yields of genotypes are plotted against the first interaction principal component (IPC1) scores of environments, so that each genotype is represented by a line with the mean yield as the intercept and the genotype IPC1 score as the slope. Such a plot indicates the ‘which-won-where’ patterns of the data, provided that the IPC1 explains most of the GE.

The recently developed GGE-biplot method (Yan et al., 2000, 2001) provides a more elegant and useful display of MET data. It effectively addresses both the issue of mega-environment differentiation and the issue of genotype selection for a given mega-environment based on mean yield and stability. It also allows environments to be evaluated just as well as genotypes. In addition, it facilitates interpretation of GE as genotypic factor by environmental factor interactions (Yan and Hunt, 2001). In the rest of the chapter, we shall describe the rationale and applications of the GGE-biplot methodology in MET data analysis.

The GGE-biplot Methodology

The GGE-biplot methodology consists of two concepts: biplot and ‘GGE’. Both concepts are discussed below.


The concept of biplot

The concept of biplot was first proposed by Gabriel (1971). The main ideas follow. Any two-way table or matrix X that contains n rows and m columns can be regarded as the product of two matrices: A, with n rows and r columns, and B, with r rows and m columns. Therefore, matrix X can always be decomposed into its two component matrices A and B. If r happens to be 2, matrix X is referred to as a rank-two matrix. Each row in matrix A has two values that can be displayed as a point in a two-dimensional plot. Similarly, each column in matrix B has two values and can also be displayed as a point in a two-dimensional plot. When both the n rows of A and the m columns of B are displayed in a single plot, the plot is called a ‘biplot’. Therefore, the biplot of a rank-two matrix contains n + m points, as compared with n × m values in the matrix per se, and yet contains all the information of the matrix.
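As a minimal sketch of this idea, the following Python/NumPy snippet builds a rank-two table from two small component matrices. The numbers are made up purely for illustration; only the symbols A, B and X come from the text.

```python
import numpy as np

# Hypothetical row markers (n = 3 rows, r = 2 values each).
A = np.array([[2.0, 1.0],
              [1.0, -1.0],
              [-3.0, 0.0]])
# Hypothetical column markers (r = 2 values for each of m = 3 columns).
B = np.array([[1.0, 0.5, 2.0],
              [0.0, 1.0, -1.0]])

X = A @ B  # the 3 x 3 two-way table

# X has rank two, so the n + m = 6 points carry all n * m = 9 values.
print(np.linalg.matrix_rank(X))
print(X)
```

Each row of A and each column of B is one point of the biplot, and every cell of X is the inner product of one row point with one column point.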

One interesting property of a biplot is that each of the n × m values can be precisely recovered by viewing the n + m points on the biplot. Assume that we have three-genotype × three-environment data on yield and that it is a rank-two matrix. After decomposition of the data into its two component matrices, the three genotypes and three environments can be presented in a biplot, as shown in Fig. 19.1. The yield of genotype i in environment j, $Y_{ij}$, can be recovered by the following formula:

$$Y_{ij} = |OE_j| \cos\alpha_{ij} \, |OG_i| = |OE_j| \, OP_{ij}$$

where $|OG_i|$ is the absolute distance from the biplot origin O to the marker of genotype i, $|OE_j|$ is the absolute distance from the biplot origin O to the marker of environment j, $\alpha_{ij}$ is the angle between the vectors $OG_i$ and $OE_j$, and $OP_{ij} = \cos\alpha_{ij} \, |OG_i|$ is the projection of the marker of genotype i on to the vector of environment j. To compare yields of the three genotypes in environment E1, we have

$$\begin{aligned}
Y_{11} &= |OE_1| \cos\alpha_{11} \, |OG_1| = |OE_1| \, OP_{11} \\
Y_{21} &= |OE_1| \cos\alpha_{21} \, |OG_2| = |OE_1| \, OP_{21} \\
Y_{31} &= |OE_1| \cos\alpha_{31} \, |OG_3| = |OE_1| \, OP_{31}
\end{aligned}$$

where $OP_{11}$, $OP_{21}$ and $OP_{31}$ are the projections of the markers of the genotypes on to the vector of environment E1 or its extension. Since $|OE_1|$ is non-negative and common to all genotypes, comparisons among $Y_{11}$, $Y_{21}$ and $Y_{31}$ can be performed by simply visualizing $OP_{11}$, $OP_{21}$ and $OP_{31}$. In our example (Fig. 19.1), it is obvious that $OP_{11} > OP_{21} > OP_{31}$, and therefore, $Y_{11} > Y_{21} > Y_{31}$. Note that $OP_{11}$ and $OP_{21}$ are above average, whereas $OP_{31}$ is below average, since $\cos\alpha_{11}$ and $\cos\alpha_{21}$ are positive whereas $\cos\alpha_{31}$ is negative.

Fig. 19.1. The geometry of the biplot.
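The recovery formula can be checked numerically. The sketch below uses one hypothetical genotype marker and one hypothetical environment marker (the coordinates are invented) and shows that the length-times-cosine expression, the length-times-projection expression and the plain inner product of the two markers all give the same value.

```python
import numpy as np

# Hypothetical markers in a rank-two biplot.
G = np.array([2.0, 1.0])   # genotype i
E = np.array([1.0, 0.5])   # environment j

len_G = np.linalg.norm(G)            # |OG_i|
len_E = np.linalg.norm(E)            # |OE_j|
cos_a = G @ E / (len_G * len_E)      # cos(alpha_ij)
OP = len_G * cos_a                   # signed projection of G on the E axis

# All three expressions are the same quantity, Y_ij.
print(len_E * cos_a * len_G, len_E * OP, G @ E)
```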

Approximation of any two-way table using a rank-two matrix

A biplot is obviously an elegant display of a rank-two matrix. In reality, however, it is rare that a two-way data set is exactly a rank-two matrix. Nevertheless, if a two-way data set, e.g. the yield data of a number of genotypes tested in a number of environments, can be approximated by a rank-two matrix, the latter can then be displayed in a biplot (Gabriel, 1971). The process of decomposing matrix X into its component matrices A and B is called ‘singular value decomposition’ (SVD), the result of which is r principal components (r equals the smaller of n and m). If the first two principal components (PC1 and PC2) explain a large proportion of the total variation of X, X is said to be sufficiently approximated by a rank-two matrix and can be approximately displayed in a biplot.
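A rough sketch of the rank-two approximation, using NumPy's SVD routine, is given below. The function name and the 4 × 3 table are illustrative assumptions, and ‘proportion explained’ is taken here to mean the share of the total sum of squares of X reproduced by the first two components.

```python
import numpy as np

def rank_two_approximation(X):
    """Return the least-squares rank-two approximation of X and the
    proportion of the total sum of squares of X that it reproduces."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X2 = (U[:, :2] * s[:2]) @ Vt[:2, :]
    explained = (s[:2] ** 2).sum() / (s ** 2).sum()
    return X2, explained

# Made-up 4 x 3 table, used only to exercise the function.
X = np.array([[1.0, 2.0, 0.5],
              [-1.0, 0.0, 1.5],
              [0.5, -2.0, -1.0],
              [-0.5, 0.0, -1.0]])
X2, explained = rank_two_approximation(X)
print(round(float(explained), 3))
```

When `explained` is close to 1, little is lost by displaying X2 instead of X.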

The concept of GGE

The concept of GGE originates from analysis of MET of crop cultivars. The yield of a genotype (or any other measure of genotype performance) in an environment is a mixed effect of genotype main effect (G), environment main effect (E) and GE. In normal MET, E accounts for 80% and G and GE each account for about 10% of the total variation. For the purpose of cultivar evaluation, however, only G and GE are relevant (Gauch and Zobel, 1996). Furthermore, both G and GE must be considered in cultivar evaluation: hence the term ‘GGE’ (Yan et al., 2000). Simultaneous examination of G and GE is, thus, an important principle in cultivar evaluation.
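To see how the total variation of a genotype × environment table of means splits into E, G and GE, a sketch along the following lines can be used. The function name is hypothetical, the small table is illustrative only, and the partition is the ordinary additive one for a table with a single mean per cell.

```python
import numpy as np

def partition_variation(Y):
    """Split the total sum of squares of a genotype x environment table
    of means into E, G and GE parts, as proportions of the total."""
    grand = Y.mean()
    g_eff = Y.mean(axis=1, keepdims=True) - grand   # genotype main effects
    e_eff = Y.mean(axis=0, keepdims=True) - grand   # environment main effects
    ge = Y - grand - g_eff - e_eff                  # interaction
    ss = {"E": Y.shape[0] * (e_eff ** 2).sum(),
          "G": Y.shape[1] * (g_eff ** 2).sum(),
          "GE": (ge ** 2).sum()}
    total = ((Y - grand) ** 2).sum()
    return {k: float(v / total) for k, v in ss.items()}

# Illustrative yields (t/ha): 3 genotypes (rows) x 3 environments (columns).
Y = np.array([[4.5, 2.9, 5.9],
              [4.4, 2.9, 5.7],
              [3.2, 2.4, 4.2]])
shares = partition_variation(Y)
print(shares)
```

The G and GE shares together are the part of the data that a GGE biplot sets out to display.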

Models for constructing a GGE biplot

The GGE biplot displays the GGE part of a MET data set. Compared with other types of biplots, a GGE biplot has the advantage in that it: (i) displays most information that is relevant to cultivar evaluation; and (ii) displays only the information that is relevant to cultivar evaluation. A GGE biplot can be generated based on SVD of: (i) environment-centred data; (ii) environment-centred and within-environment standard deviation-scaled data; and (iii) environment-centred and within-environment standard error-scaled data.

Singular value decomposition of environment-centred data

The model for a GGE biplot based on SVD of environment-centred data is:

$$Y_{ij} - \bar{Y}_j = \lambda_1 \xi_{i1} \eta_{j1} + \lambda_2 \xi_{i2} \eta_{j2} + \varepsilon_{ij} \qquad (19.1)$$

where:

$Y_{ij}$ is the mean yield of genotype i in environment j;
$\bar{Y}_j$ is the mean yield across all genotypes in environment j;
$\lambda_1$ and $\lambda_2$ are the singular values for the first and second principal components, PC1 and PC2, respectively;
$\xi_{i1}$ and $\xi_{i2}$ are the PC1 and PC2 scores, respectively, for genotype i;
$\eta_{j1}$ and $\eta_{j2}$ are the PC1 and PC2 scores, respectively, for environment j;
$\varepsilon_{ij}$ is the residual of the model associated with genotype i in environment j.

To display the PC1 and PC2 in a biplot, the equation is rewritten as:

$$Y_{ij} - \bar{Y}_j = \xi^*_{i1} \eta^*_{j1} + \xi^*_{i2} \eta^*_{j2} + \varepsilon_{ij}$$

where $\xi^*_{in} = \lambda_n^{1/2} \xi_{in}$ and $\eta^*_{jn} = \lambda_n^{1/2} \eta_{jn}$, with n = 1, 2. Although all scaling methods are equally valid, this method has the advantage that PC1 and PC2 have the same unit, which is the square root of the original unit, such as t ha−1 for yield. This property is important when genotypes are visually evaluated by both mean yield and stability (discussed later).

A GGE biplot is generated by plotting $\xi^*_{i1}$ and $\eta^*_{j1}$ against $\xi^*_{i2}$ and $\eta^*_{j2}$, respectively. Although this type of biplot has been used previously in MET data analysis (e.g. Cooper et al., 1997), methods for the utilization of the information contained in a biplot to its fullest extent became available only recently (Yan, 1999; Yan et al., 2000).
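The model with symmetric scaling might be sketched in Python/NumPy as follows. The function name `gge_biplot_scores` and the small table are assumptions made for illustration; the signs of singular vectors are arbitrary, so the markers may come out mirrored relative to other software.

```python
import numpy as np

def gge_biplot_scores(Y):
    """Y: genotypes (rows) x environments (columns) table of means.
    Returns genotype markers, environment markers and the proportion of
    the GGE sum of squares displayed by PC1 and PC2."""
    Yc = Y - Y.mean(axis=0, keepdims=True)   # environment-centred data
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    root = np.sqrt(s[:2])                    # lambda_n ** (1/2), n = 1, 2
    geno = U[:, :2] * root                   # xi*_in  (one row per genotype)
    env = Vt[:2, :].T * root                 # eta*_jn (one row per environment)
    explained = (s[:2] ** 2).sum() / (s ** 2).sum()
    return geno, env, explained

# Illustrative 3 x 3 table of means (t/ha); rows are genotypes.
Y = np.array([[4.5, 2.9, 5.9],
              [4.4, 2.9, 5.7],
              [3.2, 2.4, 4.2]])
geno, env, explained = gge_biplot_scores(Y)
print(geno, env, sep="\n")
```

Because the two sets of markers share the singular values equally, the inner product of a genotype marker and an environment marker reproduces the fitted part of the model for that cell.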

Singular value decomposition of within-environment standard deviation-scaled data

The second model that can be used to generate a GGE biplot is:

$$(Y_{ij} - \bar{Y}_j)/s_j = \lambda_1 \xi_{i1} \eta_{j1} + \lambda_2 \xi_{i2} \eta_{j2} + \varepsilon_{ij} \qquad (19.2)$$

where $s_j$ is the standard deviation for genotype means for environment j, and all other parameters are the same as in Equation 19.1. This model removes the units of the data and assumes an equal ability of all environments to discriminate among genotypes, which may be an undesired property for genotype–environment data analysis. It is useful for analysing genotype–trait data, however, in which different traits use different units.
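The scaling step of this second model can be sketched as below. The function name is hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not say which divisor is intended.

```python
import numpy as np

def sd_scaled(Y):
    """Environment-centre Y, then divide each column by the standard
    deviation of the genotype means in that environment (s_j)."""
    Yc = Y - Y.mean(axis=0, keepdims=True)
    s_j = Y.std(axis=0, ddof=1, keepdims=True)   # assumption: sample SD
    return Yc / s_j

# Illustrative 3 x 3 table of means (t/ha); rows are genotypes.
Y = np.array([[4.5, 2.9, 5.9],
              [4.4, 2.9, 5.7],
              [3.2, 2.4, 4.2]])
Z = sd_scaled(Y)
print(Z.std(axis=0, ddof=1))   # every environment now has unit spread
```

After scaling, every column has the same spread, which is what gives all environments an equal say in the SVD.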

Singular value decomposition of within-environment standard error-scaled data

The third model is based on:

$$(Y_{ij} - \bar{Y}_j)/z_j = \lambda_1 \xi_{i1} \eta_{j1} + \lambda_2 \xi_{i2} \eta_{j2} + \varepsilon_{ij} \qquad (19.3)$$

where $z_j$ is the standard error for environment j. Since $z_j$ can be estimated only with replicated data, this model can only be used when replicated data are available. It is preferred for all types of two-way data when replicated observations are available, since it adjusts for any heterogeneity among testers, which can be environments, traits, etc.

Alternative models for generating a GGE biplot

To make sure that the abscissa of the GGE biplot represents the mean yield of the genotypes, Yan et al. (2001) proposed the following model:

$$Y_{ij} - \bar{Y}_j = b_j \alpha_i + \lambda_1 \xi_{i1} \eta_{j1} + \varepsilon_{ij} \qquad (19.4)$$

where $\alpha_i$ is the main effect of genotype i, $b_j$ is the regression coefficient of the environment-centred yield of genotypes in environment j against the genotype main effects, $\lambda_1$ is the singular value for the PC1 from subjecting the residue of the regressions to SVD, $\xi_{i1}$ and $\eta_{j1}$ are the scores for genotype i and environment j on PC1, respectively, and $\varepsilon_{ij}$ is the residual associated with genotype i in environment j. The regressions $b_j \alpha_i$ in Equation 19.4 correspond to PC1 in Equation 19.1, and the PC1 in Equation 19.4 corresponds to PC2 in Equation 19.1. Equations 19.2 and 19.3 can also have their counterparts of Equation 19.4.

Biplots based on Equation 19.4 are more interpretable than those based on Equation 19.1, since the abscissa represents exactly the genotype main effects and, therefore, the ordinate is a measure of stability (or variability, or GE). This advantage, however, is offset by the explanation of a smaller percentage of the total GGE variation (Yan et al., 2001). A recent finding is that biplots based on Equation 19.1 can also be used to approximately indicate the main effects and stability of the genotypes through axis rotation (discussed later). Therefore, the alternative models will not be further discussed in this chapter.
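For readers who want to experiment with Equation 19.4, one possible reading of it is sketched below: regress each environment-centred column on the genotype main effects through the origin, then take the first principal component of what is left. The function name, the regression through the origin and the returned tuple are assumptions for illustration, not a description of the authors' software.

```python
import numpy as np

def regression_plus_pc1(Y):
    """Fit b_j * alpha_i by regression, then extract PC1 of the residue."""
    Yc = Y - Y.mean(axis=0, keepdims=True)   # environment-centred data
    alpha = Yc.mean(axis=1)                  # genotype main effects, alpha_i
    b = Yc.T @ alpha / (alpha @ alpha)       # regression coefficients, b_j
    resid = Yc - np.outer(alpha, b)          # residue of the regressions
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    return alpha, b, s[0], U[:, 0], Vt[0, :]

# Illustrative 3 x 3 table of means (t/ha); rows are genotypes.
Y = np.array([[4.5, 2.9, 5.9],
              [4.4, 2.9, 5.7],
              [3.2, 2.4, 4.2]])
alpha, b, lam1, xi1, eta1 = regression_plus_pc1(Y)
print(alpha, b, lam1)
```

Under this reading the regression coefficients average to one across environments, and the PC1 genotype scores are orthogonal to the genotype main effects, which is why the ordinate can be read as GE.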

Biplot Analysis of Multi-environment Trial Data: an Example

This section exemplifies biplot analysis of MET data using the 1993 Ontario winter-wheat performance trial data. Efforts will be made to demonstrate how a GGE biplot can be used to address the four major utilities of MET data analysis.


The steps in biplot analysis

The sample data are presented in Table 19.1, which contains the mean yield of 18 winter-wheat genotypes tested in nine Ontario locations in 1993. The trials were replicated four to six times at each location, but we present only the mean data for the purpose of illustration. Generating a GGE biplot based on Equation 19.1 from the Table 19.1 data involves the following steps:

1. Centring the data, i.e. subtracting the respective environmental means from each of the cells.


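Table 19.1. Yield data (t ha−1) of 18 genotypes in nine environments.

| Genotype | BH93 | EA93 | HW93 | IN93 | KE93 | NN93 | OA93 | RN93 | WP93 | Mean |
|---|---|---|---|---|---|---|---|---|---|---|
| ANN | 4.5 | 4.2 | 2.9 | 3.1 | 5.9 | 4.5 | 4.4 | 4.0 | 2.7 | 4.0 |
| ARI | 4.4 | 4.8 | 2.9 | 3.5 | 5.7 | 5.2 | 5.0 | 4.4 | 2.9 | 4.3 |
| AUG | 4.7 | 4.6 | 3.1 | 3.5 | 6.1 | 5.0 | 4.7 | 3.9 | 2.6 | 4.2 |
| CAS | 4.7 | 4.7 | 3.4 | 3.9 | 6.2 | 5.3 | 4.2 | 4.9 | 3.5 | 4.5 |
| DEL | 4.4 | 4.6 | 3.5 | 3.9 | 5.8 | 5.4 | 5.2 | 4.1 | 2.8 | 4.4 |
| DIA | 5.2 | 4.5 | 3.0 | 3.8 | 6.6 | 5.0 | 4.0 | 4.3 | 2.8 | 4.4 |
| ENA | 3.4 | 4.2 | 2.7 | 3.2 | 5.3 | 4.3 | 4.2 | 4.1 | 2.0 | 3.7 |
| FUN | 4.9 | 4.7 | 4.4 | 4.0 | 5.5 | 5.8 | 4.2 | 5.1 | 3.6 | 4.7 |
| HAM | 5.0 | 4.7 | 3.5 | 3.4 | 6.0 | 4.9 | 5.0 | 4.5 | 2.9 | 4.4 |
| HAR | 5.2 | 4.7 | 3.6 | 3.8 | 5.9 | 5.3 | 3.9 | 4.5 | 3.3 | 4.5 |
| KAR | 4.3 | 4.5 | 2.8 | 3.4 | 6.1 | 5.3 | 4.9 | 4.1 | 3.2 | 4.3 |
| KAT | 3.2 | 3.0 | 2.4 | 2.4 | 4.2 | 4.3 | 3.4 | 4.1 | 2.1 | 3.2 |
| LUC | 4.1 | 3.9 | 2.3 | 3.7 | 4.6 | 5.2 | 2.6 | 5.0 | 2.9 | 3.8 |
| M12 | 3.3 | 3.9 | 2.4 | 2.8 | 4.6 | 5.1 | 3.3 | 3.9 | 2.6 | 3.5 |
| REB | 4.4 | 4.7 | 3.7 | 3.6 | 6.2 | 5.1 | 3.9 | 4.2 | 2.9 | 4.3 |
| RON | 4.9 | 4.7 | 3.0 | 3.9 | 6.1 | 5.3 | 4.3 | 4.3 | 3.0 | 4.4 |
| RUB | 3.8 | 5.0 | 3.4 | 3.4 | 4.8 | 5.3 | 4.3 | 4.9 | 3.4 | 4.3 |
| ZAV | 4.2 | 4.7 | 3.6 | 3.9 | 6.6 | 4.8 | 5.0 | 4.4 | 3.1 | 4.5 |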

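Table 19.2. PC1 and PC2 scores for each genotype and each environment used in constructing the GGE biplot.

| Genotype | PC1 | PC2 |
|---|---|---|
| ANN | −0.14 | −0.44 |
| ARI | −0.17 | −0.22 |
| AUG | −0.19 | −0.41 |
| CAS | −0.43 | −0.31 |
| DEL | −0.32 | −0.23 |
| DIA | −0.31 | −0.07 |
| ENA | −0.60 | −0.51 |
| FUN | −0.51 | −0.79 |
| HAM | −0.40 | −0.22 |
| HAR | −0.37 | −0.39 |
| KAR | −0.17 | −0.32 |
| KAT | −1.35 | −0.18 |
| LUC | −0.73 | −0.86 |
| M12 | −0.94 | −0.10 |
| REB | −0.20 | −0.06 |
| RON | −0.31 | −0.05 |
| RUB | −0.11 | −0.42 |
| ZAV | −0.48 | −0.37 |

| Environment | PC1 | PC2 |
|---|---|---|
| RN93 | 0.19 | −0.72 |
| NN93 | 0.44 | −0.63 |
| WP93 | 0.54 | −0.59 |
| IN93 | 0.66 | −0.36 |
| HW93 | 0.77 | −0.32 |
| BH93 | 0.97 | −0.23 |
| EA93 | 0.76 | −0.08 |
| KE93 | 1.11 | −0.62 |
| OA93 | 0.85 | −0.96 |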


2. Subjecting the environment-centred data to SVD, which results in singular values and in genotype and environment scores for each of the n principal components, n being the number of environments. SVD is a complex mathematical operation that decomposes a matrix into two component matrices using the least-squares method. Fortunately, it has become a routine function in all major statistical analysis systems. The SAS package (SAS Institute, 1996) has an SVD function in the IML or MATRIX procedure, so that performing the SVD of a matrix takes no more than a single statement. The PRINCOMP procedure of SAS, which performs principal-component analysis, gives outputs in which the singular values are tied with the genotype (row) eigenvectors.
3. Partitioning the singular value into genotype and environment scores for each of the principal components. Theoretically, the singular value can be partitioned in any proportion, but symmetrical partitioning is preferred because it results in the same units for both the genotype scores and the environment scores and for all principal components.
4. Plotting the PC1 scores against the PC2 scores to generate a biplot. Biplots using other principal components are also possible. The plotting can be done using a spreadsheet, but the abscissa and ordinate must be drawn to scale.
5. Labelling the biplot with the genotype and environment names, which can be a very tedious job.
6. Adding supplementary lines to facilitate visualization and interpretation of the biplot.
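Steps 1 to 5 can be sketched in a few lines of Python/NumPy applied to the Table 19.1 means. This is an illustrative sketch and not the procedure used by the authors' software; the variable names are assumptions, and because the signs of singular vectors are arbitrary the printed scores may differ in sign from those of other programs.

```python
import numpy as np

genotypes = ("ANN ARI AUG CAS DEL DIA ENA FUN HAM "
             "HAR KAR KAT LUC M12 REB RON RUB ZAV").split()
environments = "BH93 EA93 HW93 IN93 KE93 NN93 OA93 RN93 WP93".split()

# Mean yields (t/ha) from Table 19.1: one row per genotype,
# one column per environment, in the order listed above.
Y = np.array([
    [4.5, 4.2, 2.9, 3.1, 5.9, 4.5, 4.4, 4.0, 2.7],
    [4.4, 4.8, 2.9, 3.5, 5.7, 5.2, 5.0, 4.4, 2.9],
    [4.7, 4.6, 3.1, 3.5, 6.1, 5.0, 4.7, 3.9, 2.6],
    [4.7, 4.7, 3.4, 3.9, 6.2, 5.3, 4.2, 4.9, 3.5],
    [4.4, 4.6, 3.5, 3.9, 5.8, 5.4, 5.2, 4.1, 2.8],
    [5.2, 4.5, 3.0, 3.8, 6.6, 5.0, 4.0, 4.3, 2.8],
    [3.4, 4.2, 2.7, 3.2, 5.3, 4.3, 4.2, 4.1, 2.0],
    [4.9, 4.7, 4.4, 4.0, 5.5, 5.8, 4.2, 5.1, 3.6],
    [5.0, 4.7, 3.5, 3.4, 6.0, 4.9, 5.0, 4.5, 2.9],
    [5.2, 4.7, 3.6, 3.8, 5.9, 5.3, 3.9, 4.5, 3.3],
    [4.3, 4.5, 2.8, 3.4, 6.1, 5.3, 4.9, 4.1, 3.2],
    [3.2, 3.0, 2.4, 2.4, 4.2, 4.3, 3.4, 4.1, 2.1],
    [4.1, 3.9, 2.3, 3.7, 4.6, 5.2, 2.6, 5.0, 2.9],
    [3.3, 3.9, 2.4, 2.8, 4.6, 5.1, 3.3, 3.9, 2.6],
    [4.4, 4.7, 3.7, 3.6, 6.2, 5.1, 3.9, 4.2, 2.9],
    [4.9, 4.7, 3.0, 3.9, 6.1, 5.3, 4.3, 4.3, 3.0],
    [3.8, 5.0, 3.4, 3.4, 4.8, 5.3, 4.3, 4.9, 3.4],
    [4.2, 4.7, 3.6, 3.9, 6.6, 4.8, 5.0, 4.4, 3.1],
])

Yc = Y - Y.mean(axis=0)                             # step 1: centre by environment
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)   # step 2: SVD
root = np.sqrt(s[:2])                               # step 3: symmetric partitioning
geno = U[:, :2] * root
env = Vt[:2].T * root

# Steps 4 and 5: labelled coordinates, ready to be plotted to scale.
for name, (pc1, pc2) in zip(genotypes + environments, np.vstack([geno, env])):
    print(f"{name:5s} {pc1:6.2f} {pc2:6.2f}")
print("PC1 + PC2 share:", round(float((s[:2] ** 2).sum() / (s ** 2).sum()), 2))
```

The coordinates can be passed to any plotting tool, provided that both axes use the same scale.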

As can be seen, although the biplot is an elegant tool for visualizing MET data, the process is tedious, if not difficult, even for well-trained biometricians. Fortunately, a Windows application, GGEbiplot, was recently created (Yan, 2001), which fully automates the biplot analysis process. All biplots presented below are the direct outputs of this software. In these biplots, the genotypes are labelled with lower-case letters and the environments with upper-case letters.

Visualizing the performance of different genotypes in a given environment

This is a direct application of the biplot theory described in Fig. 19.1 and associated descriptions. To visualize the performance of different genotypes in a given environment, say, BH93, draw a line that passes through the biplot origin and the marker of BH93; this may be called the BH93 axis. The genotypes will be ranked according to their projections on to the BH93 axis (Fig. 19.2). Thus, the order of yields of the genotypes in BH93 was: kat < m12 < ena < luc < ann < . . . < har ≈ cas < fun. The line passing through the biplot origin and perpendicular to the BH93 axis separates genotypes that yielded below the mean (kat, m12, ena, luc and ann) from genotypes that yielded above the mean (all other genotypes) in BH93.
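The ranking along an environment axis amounts to sorting signed projections. A small sketch follows; the function name and the marker coordinates are hypothetical and are not taken from Table 19.2.

```python
import numpy as np

def rank_on_axis(points, axis_marker):
    """Signed projections of each point on the line through the origin
    and axis_marker, plus the point indices from lowest to highest."""
    unit = axis_marker / np.linalg.norm(axis_marker)
    proj = points @ unit
    return proj, np.argsort(proj)

# Hypothetical markers for three genotypes and one environment.
geno = np.array([[0.5, -0.8],
                 [-1.3, -0.2],
                 [0.4, 0.3]])
env = np.array([1.0, -0.2])

proj, order = rank_on_axis(geno, env)
print(order)      # genotype indices, poorest to best in this environment
print(proj > 0)   # True where the genotype is above the environment mean
```

Swapping the roles of the arguments (environment markers as `points`, a genotype marker as `axis_marker`) ranks environments along a genotype axis in the same way.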

Visualizing the relative adaptation of a given genotype in different environments

Analogous to the above, to visualize the relative performance of a given genotype, say, rub, in different environments, draw a line that passes through the biplot origin and the marker of rub, which may be called the rub axis. The environments would be ranked along the rub axis in the direction towards the marker of rub (Fig. 19.3). Thus, the relative performance of rub in different environments was: RN93 > NN93 > WP93 > IN93 > BH93 > EA93 > KE93 > OA93. The line passing through the biplot origin and perpendicular to the rub axis separates environments in which rub yielded below the mean (OA93, KE93 and EA93) from environments in which rub yielded above the mean (all other environments, except BH93). Environment BH93 was right on the perpendicular line, implying that rub yielded near the mean in BH93.


Visual comparison of two genotypes in different environments

Biplot comparison of two genotypes is an extension of the basic biplot principle. To compare two genotypes, connect the two genotypes to be compared, say, aug and rub, with a straight line (called a connector line) and draw a line that is perpendicular to the connector line and passes through the biplot origin (Fig. 19.4). This perpendicular line separates environments where aug yielded better than rub from environments where rub yielded better than aug. Thus, Fig. 19.4 indicates that aug was better than rub in OA93, KE93, EA93 and BH93, and rub was better than aug in the other five environments. Based on the basic principle of biplot geometry described earlier, the two genotypes would yield exactly the same in environments whose markers fall on the perpendicular line. If all environments fall on the same side of the perpendicular line, the genotype with the environments on its side would yield better than the other genotype in all environments. If the two genotypes are spatially close, they are likely to have yielded similarly in all or most of the environments.

Fig. 19.2. Ranking of the genotypes based on their performance in environment BH93.

Fig. 19.3. Ranking of the environments based on the relative performance of genotype rub.
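In coordinate terms, the two-genotype comparison reduces to the sign of an inner product: the approximated yield difference between two genotypes in an environment is the environment marker multiplied by the connector vector, and it is zero exactly when the environment lies on the perpendicular line. The sketch below uses invented markers and a hypothetical function name.

```python
import numpy as np

def compare_two_genotypes(g1, g2, env):
    """For each environment marker return +1 where g1 is predicted to
    out-yield g2, -1 where g2 out-yields g1, and 0 where they tie."""
    diff = env @ (g1 - g2)   # difference of the two approximated yields
    return np.sign(diff)

# Hypothetical markers.
g1 = np.array([0.5, 1.0])
g2 = np.array([1.0, -1.0])
env = np.array([[2.0, 1.0],
                [2.0, -1.0],
                [4.0, 1.0]])   # this one lies on the perpendicular line

result = compare_two_genotypes(g1, g2, env)
print(result)
```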

Visual identification of the best genotype(s) for each environment

A further extended application of the biplot geometry is to visually identify the highest-yielding genotypes for each of the environments in a single step. For this purpose, the genotypes that are located far away from the biplot origin are connected with straight lines so that a polygon or vertex hull is formed with all other genotypes contained within the vertex hull (Fig. 19.5). The vertex genotypes in our example are fun, zav, ena, kat and luc. These genotypes are the most responsive genotypes; they are either the best or the poorest genotypes in some or all of the environments. Perpendicular lines to the sides of the vertex hull are drawn, starting from the biplot origin, to divide the biplot into five sectors, each having a vertex genotype. The beauty of Fig. 19.5 is this: the vertex genotype for each sector is the one that gave the highest yield for the environments that fall within that sector. Thus, genotype fun gave the highest yield in environments RN93, NN93, WP93, IN93, HW93, BH93 and EA93 and genotype zav gave the highest yield in environments OA93 and KE93. The other vertex genotypes, i.e. ena, kat and luc, did not give the highest yield in any of the environments. Actually, they were the poorest genotypes in some or all of the environments.

Now we explain why the above statements are valid. According to the section ‘Visual comparison of two genotypes in different environments’, the line perpendicular to the polygon side that connects genotypes luc and fun facilitates the comparison between luc and fun; fun yielded higher than luc in all environments because all environments are on the side of fun. Likewise, the line perpendicular to the polygon side that connects genotypes zav and fun facilitates the comparison between zav and fun; fun yielded higher than zav in the seven environments that fall into the fun sector because they are on the side of fun. Within the fun sector, fun has the longest vector (distance from biplot origin to the marker of a genotype); it therefore gave higher yields than other genotypes in these seven environments, for reasons discussed in the section ‘Visualizing the performance of different genotypes in a given environment’. Collectively, fun gave the highest yield in environments that fell in its sector. Using the same reasoning, zav was the best genotype in environments KE93 and OA93.

Fig. 19.4. Comparison of the two genotypes aug and rub in different environments.
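The polygon view can be imitated numerically: find the hull vertices among the genotype markers, then find, for each environment, the genotype with the largest inner product. The sketch below uses a standard monotone-chain convex hull and invented markers; the function names are hypothetical and this is not the algorithm of any particular biplot program.

```python
import numpy as np

def convex_hull(points):
    """Indices of the hull vertices, counter-clockwise (monotone chain)."""
    pts = [tuple(p) for p in points]
    idx = sorted(range(len(pts)), key=lambda i: pts[i])

    def cross(o, a, b):
        return ((pts[a][0] - pts[o][0]) * (pts[b][1] - pts[o][1])
                - (pts[a][1] - pts[o][1]) * (pts[b][0] - pts[o][0]))

    def half(seq):
        chain = []
        for i in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], i) <= 0:
                chain.pop()
            chain.append(i)
        return chain[:-1]

    return half(idx) + half(idx[::-1])

def winners(geno, env):
    """Index of the genotype with the largest approximated yield
    (inner product) for each environment marker."""
    return np.argmax(geno @ env.T, axis=0)

# Hypothetical markers: four outer genotypes and one near the origin.
geno = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.0, -2.0], [0.5, 0.5]])
env = np.array([[1.0, 0.2], [0.1, 1.0], [1.5, -0.5]])

hull = convex_hull(geno)
best = winners(geno, env)
print(sorted(hull), list(best))
```

In this example every winning genotype is a hull vertex, in line with the argument above, and environments sharing the same winner form one group.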

Visualizing groups of environments

Another utility of Fig. 19.5 is that the environments are grouped based on the best genotypes and we have two groups of environments: KE93 and OA93 as one group, with zav being the highest-yielding genotype, and the other seven environments as another group, with fun being the highest-yielding genotype.

The environment groups suggest different mega-environments. In our example, KE93 and OA93 represent eastern Ontario and the other environments represent western and southern Ontario. The hypothesis that eastern Ontario is a different mega-environment from the rest of Ontario for winter-wheat production was tested and confirmed using 1989–2000 Ontario winter-wheat performance trial data (Yan, 1999). Assuming two mega-environments, the variance component for genotype–mega-environment interaction explained 80% of the total GE (Yan, 1999).

Visualizing the mean performance and stability of genotypes

Once mega-environments are defined, cultivar selection should be specific to individual mega-environments. For a given mega-environment, genotypes are evaluated based on mean performance (such as mean yield) and stability across environments. Assuming that the nine environments in our example belong to a single mega-environment, a ‘mean’ environment can be defined in the biplot, using the means of the environment PC1 and PC2 scores across all environments. The mean yield of the genotypes can then be approximated by nominal yields of the genotypes in that mean environment.


Fig. 19.5. The polygon view of the GGE biplot indicating the best genotype(s) in each environment and groups of environments.


In Fig. 19.6, a line is drawn that passes through the biplot origin and the mean environment, which is marked by an oval at its positive end. This line will be called the mean-environment axis. Another line is drawn that passes through the biplot origin and is perpendicular to the mean-environment axis. These two lines constitute ‘the mean-environment coordination’.

The projections of the genotypes on to the mean-environment axis approximate the mean yield of the genotypes. Thus, the mean yield of the genotypes is in the following order: fun > cas ≈ har > . . . > rub > ann > luc > ena > m12 > kat. This order is highly consistent with the actual mean yield of the genotypes (Table 19.1). The parallel lines in Fig. 19.6 facilitate ranking the genotypes based on their predicted mean yield. Since the biplot contains both G and GE and since the two axes of the mean-environment coordination are orthogonal, if projections of the genotypes on to the mean-environment axis approximate the mean yield of the genotypes, projections of the genotypes on to the perpendicular axis must approximate the GE associated with the genotypes. The longer the projection of a genotype, regardless of direction, the greater the GE associated with the genotype, which is a measure of variability or instability of the genotype across environments. Thus, the performance of genotypes luc and fun is highly variable (less stable), whereas genotypes ron and reb are highly stable.
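The two projections described above can be sketched in a few lines. This is a minimal illustration, assuming that the mean environment is the average of the environment (PC1, PC2) scores and that the genotype scores are expressed in the same biplot coordinates. All scores and labels are hypothetical and do not reproduce Fig. 19.6.

```python
import math

# Hypothetical (PC1, PC2) scores; illustrative numbers, NOT the chapter's data.
genotype_scores = {"g1": (2.0, 1.0), "g2": (1.0, 2.0), "g3": (-1.0, 0.5)}
environment_scores = {"e1": (3.0, 4.0), "e2": (5.0, 2.0)}


def mean_environment_coordinates(genotypes, environments):
    """Return {genotype: (along_axis, perpendicular)}.

    along_axis approximates mean yield; |perpendicular| approximates the
    GE (instability) associated with the genotype.
    """
    n = len(environments)
    mean_x = sum(x for x, _ in environments.values()) / n
    mean_y = sum(y for _, y in environments.values()) / n
    norm = math.hypot(mean_x, mean_y)
    if norm == 0.0:
        raise ValueError("mean environment coincides with the biplot origin")
    ux, uy = mean_x / norm, mean_y / norm  # unit vector along the mean-environment axis
    vx, vy = -uy, ux                       # unit vector along the perpendicular axis
    return {
        name: (gx * ux + gy * uy, gx * vx + gy * vy)
        for name, (gx, gy) in genotypes.items()
    }


coords = mean_environment_coordinates(genotype_scores, environment_scores)
ranking_by_mean = sorted(coords, key=lambda g: coords[g][0], reverse=True)
most_stable = min(coords, key=lambda g: abs(coords[g][1]))
print(ranking_by_mean, most_stable)
```

Sorting on the first coordinate gives the approximate mean-yield order, and the absolute value of the second coordinate plays the role of the projection length on the perpendicular axis.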

It should be pointed out that stability per se is not necessarily a positive factor. High stability is desirable only when associated with a high mean yield. A genotype with high stability is highly undesirable if it is associated with a low mean yield; it is simply a genotype that is consistently poor. It is even less desirable than genotypes with low stability but high mean yield.

Fig. 19.6. The mean-environment coordination showing the mean yield and stability of each of the genotypes.

An ideal genotype is one that has both high mean yield and high stability. The centre of the concentric circles in Fig. 19.7a represents the position of an ‘ideal’ genotype, which is defined by a projection on to the mean-environment axis that equals the longest vector of the genotypes that had above-average mean yield and by a zero projection on to the perpendicular line (zero variability across environments). A genotype is more desirable if it is closer to the ‘ideal’ genotype. Thus, genotypes cas and har are more desirable than genotype fun, even though the latter had the highest mean yield. The low-yielding genotypes kat, m12, luc, ena and ann are, of course, undesirable because they are far away from the ‘ideal’ genotype.
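The comparison with the ‘ideal’ genotype can be expressed as a distance calculation. The sketch below assumes that each genotype is already described by its projection on to the mean-environment axis and on to the perpendicular line, and that ‘above-average mean yield’ corresponds to a positive projection on to the mean-environment axis. The coordinates and the labels gA to gC are hypothetical.

```python
import math

# Hypothetical (mean-axis, perpendicular) coordinates; NOT the chapter's data.
genotype_coords = {"gA": (3.0, 4.0), "gB": (2.8, 0.0), "gC": (-1.0, 0.0)}


def distance_to_ideal_genotype(coords):
    """Return the ideal point and each genotype's distance to it.

    The ideal genotype lies on the mean-environment axis at a distance equal
    to the longest vector among genotypes with a positive projection on that
    axis, and has zero projection on the perpendicular line.
    """
    above_average = [c for c in coords.values() if c[0] > 0]
    if not above_average:
        raise ValueError("no genotype has a positive mean-axis projection")
    ideal_along = max(math.hypot(a, p) for a, p in above_average)
    distances = {
        name: math.hypot(a - ideal_along, p) for name, (a, p) in coords.items()
    }
    return (ideal_along, 0.0), distances


ideal, distances = distance_to_ideal_genotype(genotype_coords)
ranking = sorted(distances, key=distances.get)
print(ideal, ranking)
```

In this toy example gA has the largest projection on to the mean-environment axis but is unstable, so the stable gB ends up closer to the ideal point. This is the same pattern as the fun versus cas and har comparison in the text.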

Visualizing the discriminating ability and representativeness of environments

Although MET are conducted primarily for genotype evaluation, they can also be used in evaluating environments. An ideal environment should be highly differentiating of the genotypes and at the same time representative of the target environment. Assuming that the test environments used in the MET are representative samples of the target environment, the ideal environment should be located on the mean-environment axis. The centre of the concentric circles represents the ideal environment, which has the longest vector of the test environments that had positive projections on to the mean-environment axis (Fig. 19.7b). An environment is more desirable if it is closer to the ‘ideal’ environment. Therefore, BH93, EA93, HW93 and IN93 were relatively desirable test environments, whereas OA93 and RN93 were relatively undesirable test environments.
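The same idea can be sketched for test environments. This is a minimal illustration under stated assumptions: the mean-environment axis is taken through the average of the environment (PC1, PC2) scores, vector length stands for discriminating ability, and closeness to the axis stands for representativeness. The scores and the labels e1 to e3 are hypothetical.

```python
import math

# Hypothetical (PC1, PC2) environment scores; NOT the chapter's data.
environment_scores = {"e1": (3.0, 4.0), "e2": (5.0, 2.0), "e3": (4.0, 3.0)}


def rank_test_environments(environments):
    """Rank environments by distance to the 'ideal' environment."""
    n = len(environments)
    mean_x = sum(x for x, _ in environments.values()) / n
    mean_y = sum(y for _, y in environments.values()) / n
    norm = math.hypot(mean_x, mean_y)
    if norm == 0.0:
        raise ValueError("mean environment coincides with the biplot origin")
    ux, uy = mean_x / norm, mean_y / norm
    # Coordinates along and perpendicular to the mean-environment axis.
    coords = {
        name: (x * ux + y * uy, -x * uy + y * ux)
        for name, (x, y) in environments.items()
    }
    positive = [c for c in coords.values() if c[0] > 0]
    if not positive:
        raise ValueError("no environment has a positive projection")
    # Ideal environment: longest vector, placed on the mean-environment axis.
    ideal_along = max(math.hypot(a, p) for a, p in positive)
    distances = {
        name: math.hypot(a - ideal_along, p) for name, (a, p) in coords.items()
    }
    return sorted(distances, key=distances.get), distances


ranking, distances = rank_test_environments(environment_scores)
print(ranking)
```

Here e3 lies exactly on the mean-environment axis and is therefore the most representative, while e2 has the longest vector but deviates from the axis.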

Discussion and Conclusions

Strength of the GGE-biplot approach

The GGE-biplot approach graphically displays genotype main effect and genotype–environment interaction of a MET, which are the only two parts of yield variation that are relevant to genotype evaluation and mega-environment identification. Assuming that the GGE of a MET is sufficiently approximated by the first two principal components, all individual genotype–environment relationships in the MET should be displayed by the GGE biplot. Such a biplot graphically addresses three of the four utilities of MET data analysis listed in the introduction of this chapter, namely: (i) investigating possible mega-environment differentiation in the target environment; (ii) selecting superior genotypes for individual mega-environments; and (iii) selecting better test environments. In addition, the GGE biplot also facilitates pairwise genotype comparisons. The GGE biplot does not directly address the fourth utility of the MET data analysis, i.e. understanding the causes of GE. To fulfil this task, information other than yield per se is necessary. Once such information is available, the genotype and environment scores can be related to genotypic and environmental factors, so that the observed genotype–environment interactions can be explained in terms of interactions between genotypic factors and environmental factors (Yan and Hunt, 2001). Therefore, the GGE biplot is an ideal approach for MET data analysis.

Constraints of the GGE-biplot approach

All methods have their limitations. The limitations of the GGE biplot lie in four aspects. First, it requires balanced data; secondly, it may explain only a small portion of the total GGE; thirdly, it lacks a measure of uncertainty; and, fourthly, although elegant, GGE biplot analysis is tedious to perform using conventional tools. Now that the GGEbiplot software is available, the fourth constraint is no longer an issue. Once the data are prepared, all functions are just a ‘mouse-click’ away. All the figures presented in this chapter, along with many other options, are the direct outputs of this software.

Although quite common, unbalanced MET data are really not a problem of the GGE-biplot approach; they are a problem of experimental design and execution, which creates problems for all kinds of analyses. The GGEbiplot software offers two options for this problem. It allows generation of a balanced subset, which can be used in GGE-biplot analysis; alternatively, missing cells are automatically replaced by mean yields of the respective environments. In either case, unbalancedness means that part of the information contained in the data cannot be utilized effectively. Therefore, experimental design and execution should be improved to prevent missing cells as far as possible.
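The second option, replacing missing cells by environment means, can be sketched as follows. This is only an illustration of the idea and not the GGEbiplot software's code; the table layout (rows are genotypes, columns are environments, `None` marks a missing cell), the numbers and the function name are assumptions.

```python
# Hypothetical genotype-by-environment yield table; None marks a missing cell.
yields = [
    [4.0, None, 5.0],
    [3.0, 2.0, None],
    [5.0, 4.0, 3.0],
]


def fill_missing_with_environment_means(table):
    """Return a copy of the table with each missing cell replaced by the
    mean of the observed values in the same environment (column)."""
    filled = [row[:] for row in table]
    for j in range(len(table[0])):
        observed = [row[j] for row in table if row[j] is not None]
        if not observed:
            raise ValueError(f"environment {j} has no observed values")
        env_mean = sum(observed) / len(observed)
        for row in filled:
            if row[j] is None:
                row[j] = env_mean
    return filled


balanced = fill_missing_with_environment_means(yields)
print(balanced)
```

After environment-centring, a cell filled in this way has a deviation of zero, so it contributes nothing to the estimated G or GE. This is one way to see why part of the information is lost when the data are unbalanced.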

The GGE biplot may explain only a small proportion of the GGE when the genotype main effect is considerably smaller than the GE and when the GE pattern is complex. In such cases, the GGE biplot consisting of PC1 and PC2 may not be sufficient to explain the GGE, even though the most important pattern of the MET is already displayed. To remedy this problem, the GGEbiplot software offers options for viewing biplots of PC3 vs. PC4, PC5 vs. PC6, etc.
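How much of the GGE each pair of principal components displays can be checked from the singular values. The sketch below assumes that the singular values come from a singular value decomposition of the environment-centred genotype-by-environment table, so that their squares partition the GGE sum of squares. The values and the function name are hypothetical.

```python
def proportion_explained_by_pairs(singular_values):
    """Return the share of the total sum of squares captured by each
    consecutive pair of principal components (PC1+PC2, PC3+PC4, ...)."""
    squares = [s * s for s in singular_values]
    total = sum(squares)
    if total == 0:
        raise ValueError("all singular values are zero")
    return [sum(squares[k:k + 2]) / total for k in range(0, len(squares), 2)]


# Hypothetical singular values, sorted in decreasing order.
singular_values = [6.0, 3.0, 2.0, 1.0]
pairs = proportion_explained_by_pairs(singular_values)
print(pairs)
```

A low first value would indicate that the PC1 vs. PC2 biplot alone displays only part of the GGE and that the higher-order biplots deserve a look.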

Fig. 19.7. Comprehensive evaluation of genotypes and environments. (a) Comparison of genotypes with the ‘ideal’ genotype for both mean yield and stability. (b) Comparison of environments with the ‘ideal’ environment based on both discriminating ability and representativeness of the target environment.

Unlike conventional approaches, which allow calculation of probability for a particular hypothesis, the GGE-biplot approach does not have a measure of uncertainty. Therefore, the GGE biplot is better used as a hypothesis-generator rather than as a decision-maker (Yan et al., 2001), and hypotheses based on biplots should be tested using conventional statistical methods. For example, biplots based on individual years of Ontario winter-wheat performance trials suggested that eastern Ontario sites and other sites of Ontario belong to different mega-environments, and this hypothesis was tested and confirmed by variance component analysis (Yan, 1999). Sometimes, the biplot distance between two genotypes or two environments, relative to the biplot size, may be sufficiently informative about the significance of the difference between them.

Other applications of the GGE-biplot approach

The GGE-biplot methodology was developed for MET data analysis. It is a generic method, however. It has been successfully used in analysing genotype–trait data (Yan and Rajcan, 2002), diallel-cross data (Yan and Hunt, 2002), host genotype–pathogen race data (W. Yan, unpublished), etc. The GGE-biplot methodology and the GGEbiplot software described in this chapter should thus be useful for the graphical presentation of all types of two-way data that conform to an entry–tester data structure. A demo version of the GGEbiplot software can be downloaded at www.ggebiplot.com.

Acknowledgements

Dr Rich Zobel and Dr Hugh Gauch are acknowledged for their stimulating suggestions and critiques during the development of the GGE-biplot methodology. Dr Paul Cornelius and Dr Jose Crossa are acknowledged for their valuable encouragement and editing of our first paper on the GGE-biplot methodology.

References

Cooper, M., Stucker, R.E., DeLacy, I.H. and Harch, B.D. (1997) Wheat breeding nurseries, target environments, and indirect selection for grain yield. Crop Science 37, 1168–1176.

Finlay, K.W. and Wilkinson, G.N. (1963) The analysis of adaptation in a plant breeding program. Australian Journal of Agricultural Research 14, 742–754.

Gabriel, K.R. (1971) The biplot graphic display of matrices with application to principal component analysis. Biometrika 58, 453–467.

Gauch, H.G. and Zobel, R.W. (1996) AMMI analysis of yield trials. In: Kang, M.S. and Gauch, H.G. (eds) Genotype-by-Environment Interaction. CRC Press, Boca Raton, Florida, pp. 1–40.

Gauch, H.G. and Zobel, R.W. (1997) Identifying mega-environments and targeting genotypes. Crop Science 37, 311–326.

Kang, M.S. (1998) Using genotype by environment interaction for crop cultivar development. Advances in Agronomy 62, 199–252.

Lin, C.S. and Binns, M.R. (1994) Concepts and methods for analyzing regional trial data for cultivar and location selection. Plant Breeding Reviews 11, 271–297.

Lin, C.S., Binns, M.R. and Lefkovitch, L.P. (1986) Stability analysis: where do we stand? Crop Science 26, 894–900.

SAS Institute Inc. (1996) Version 6, SAS/STAT User’s Guide. SAS Institute, Cary, North Carolina.

Yan, W. (1999) Methodology of cultivar evaluation based on yield trial data – with special reference to winter wheat in Ontario. PhD dissertation, University of Guelph, Guelph, Ontario, Canada.

Yan, W. (2001) GGEbiplot – a Windows application for graphical analysis of multi-environment trial data and other types of two-way data. Agronomy Journal 93(5), 1111.

Yan, W. and Hunt, L.A. (2001) Genetic and environmental causes of genotype by environment interaction for winter wheat yield in Ontario. Crop Science 41(1), 19–25.

Yan, W. and Hunt, L.A. (2002) Biplot analysis of diallel data. Crop Science 41(1), 21–30.

Yan, W. and Rajcan, I. (2002) Biplot evaluation of test sites and trait relations of soybean in Ontario. Crop Science 41(1), 11–20.

Yan, W., Hunt, L.A., Sheng, Q. and Szlavnics, Z. (2000) Cultivar evaluation and mega-environment investigation based on the GGE biplot. Crop Science 40(3), 597–605.

Yan, W., Cornelius, P.L., Crossa, J. and Hunt, L.A. (2001) Two types of GGE biplots for analyzing multi-environment trial data. Crop Science 41(3), 656–663.

Zobel, R.W., Wright, M.J. and Gauch, H.G. (1988) Statistical analysis of a yield trial. Agronomy Journal 80, 388–393.
