Dealing with Statistical Interactions in Plant Breeding within the LABKEY™ software
Soula, Julie¹; Vincourt, Patrick²; Bernard, Jean-Marc³; Bardet, Sébastien¹; Royer, Frédéric³
¹ Doriane SAS, 31 av. Jean Médecin, 06000 Nice, France.
² Open-Source-Biology, 12 avenue V Segoffin, 31400 Toulouse, France.
³ Biosearch Data Management, 39 bd Dubouchage, 06000 Nice, France.
Conclusion
LABKEY™ is currently developing tools to help breeders deal with various types of interaction. Using two practical examples, this study shows how LABKEY™ developments address this purpose.
Results and discussion: The PCA on interactions mainly exhibited a single important axis, accounting for 20% of the total variance. As shown in Figure 1.a, the drought stress index did not appear to be associated with the main component of GxE interaction. This drought index is in fact correlated with the variation in environment mean yield (r² = 0.28): its effect is already captured by the environment main effect, which explains why it no longer impacts the GxE interaction. Soil depth, which is also a key parameter of the SUNFLO crop model, is however highly correlated with the first principal component.
Figure 1.b shows how to detect particular adaptations of sunflower hybrids: some of them seem to be better adapted to deep soils, while others appear to be better adapted to soils that cannot supply water to the crop throughout its development cycle.
This example shows how crucial it is to record relevant agronomic covariates in order to decipher the GxE interaction.
References
1. Cadic E., Debaeke P., Langlade L., Grezes-Besset B., Pauquet J., Coque M., André T., Chatre S., Casadebaig P., Mangin B., Vincourt P. (2012) Phenotyping the response of sunflower (Helianthus annuus L.) to drought scenarios in multi-environmental trials for the purpose of association genetics. Plant Animal Genome Int. Conf.
2. Casadebaig P., Guilioni L., Lecoeur J., Christophe A., Champolivier L., Debaeke P. (2011) SUNFLO, a model to simulate genotype-specific performance of the sunflower crop in contrasting environments. Agricultural and Forest Meteorology 151(2): 163-178.
3. Soula J., Duminil T., Bardet S., Bernard J.M., Royer F. (2014) Consideration of field heterogeneities in the calculation of variety means from agronomic trial data: comparisons between a nearest neighbor adjustment method and experimental designs like lattices and latinized alpha-designs. Plant Animal Genome Conf., San Diego.
4. Technow F., Schrag T.A., Schipprack W., Bauer E., Simianer H., Melchinger A.E. (2014) Genome properties and prospects of genomic prediction of hybrid performance in a breeding program of maize. Genetics 197(4): 1343-1355.
5. Yates F. (1933) The analysis of replicated experiments when the field results are incomplete. Emp. Jour. Exp. Agr. 1: 129-142.
* Acknowledgements
The experimental data for the GxE example were kindly provided by the member companies (BIOGEMMA, RAGT, SOLTIS and SYNGENTA) of the SUNYFUEL and OLEOSOL Consortium, which benefited from financial support from the French ANR, the French public funds for competitiveness clusters (FUI), the European Regional Development Fund (ERDF), the Région Midi Pyrénées, the Departmental Board of Aveyron (France), and the Cities Cluster of Rodez (France).
II. Female x Male (FxM) cross design: predicting SCA for unobserved crosses?
Material & Method: 25 female lines were crossed with 8 male lines to produce 90 hybrids out of the 200 possible combinations, i.e. with a high proportion (55%) of missing data. Each female line was crossed with a minimum of two male lines. PCA was performed on the FxM interaction estimates using the same approach as described above. We then used the SCA coordinates on the first principal components to predict the SCA for unobserved combinations.
Results and discussion: The first two components of SCA account for the major part of the {female * male} interaction (55% of the total variance). Based on these first two components, there is good agreement (r² = 0.68) between the observed and calculated SCA for observed combinations, as shown in Figure 2.a. We therefore propose to use the same predictor for unobserved combinations. In this way, we obtain a table of (positive and negative) predicted interactions: see Figure 2.b. As pointed out by Technow et al. (2014), hybrid breeders often test combining ability with a limited number and range of testers, which limits the possibility of exploring a wider range of combinations to detect particular hybrid combinations. This example shows how, with the same investment in hybrid production and experimentation, it becomes possible to explore a wider space of genetic diversity and to propose inferences for unobserved combinations.
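The prediction step can be illustrated with a short Python sketch. The poster does not spell out how the predictor was computed, so this is only one plausible reading: unobserved cells are given a null interaction, the table is decomposed, and the first two components are used to rebuild every cell. The function name `predict_sca` and the simulated data are hypothetical.

```python
import numpy as np

def predict_sca(sca, observed, n_components=2):
    """Rank-k prediction of a female x male interaction table.

    sca      : females x males array of interaction estimates
    observed : boolean array, True where the cross was actually tested
    Unobserved cells are set to 0 (null interaction) before the decomposition.
    """
    filled = np.where(observed, sca, 0.0)
    c, s, dt = np.linalg.svd(filled, full_matrices=False)
    k = n_components
    # Sum over the first k components of (singular value * row score * column score).
    return (c[:, :k] * s[:k]) @ dt[:k]

# Hypothetical data with the same layout as the example: 25 females, 8 males, 90 crosses.
rng = np.random.default_rng(1)
true_sca = rng.normal(size=(25, 2)) @ rng.normal(size=(2, 8))
observed = np.zeros(25 * 8, dtype=bool)
observed[rng.choice(25 * 8, size=90, replace=False)] = True
observed = observed.reshape(25, 8)

predicted = predict_sca(true_sca, observed)
# Agreement between observed and calculated SCA, on the tested crosses only.
r2 = np.corrcoef(predicted[observed], true_sca[observed])[0, 1] ** 2
```

The values of `predicted` at the untested cells play the role of the predicted interactions; `r2` corresponds to the agreement measured on the tested crosses.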
Figure 2.b: Predicted interaction values using the first two principal components. Black: predicted interaction values for tested hybrids. Red (negative), green (positive): predicted interaction values for untested hybrids.
Figure 2.a: Modeling the specific combining abilities (SCA): the first two components account for 68% of the SCA variation on observed combinations. On the bottom right, male and female parents on the first two principal components.
M1 M2 M3 M4 M5 M6 M7 M8 GCA-F
F1 -1,37 0,37 -1,09 2,56 1,01 -1,83 1,73 -1,38 2,49
F2 -0,04 0,03 -0,10 0,17 -0,02 -0,18 0,16 -0,03 -0,86
F3 -1,26 -0,07 0,45 0,19 1,90 1,15 -0,80 -1,55 -4,27
F4 0,03 0,04 -0,15 0,20 -0,13 -0,29 0,24 0,06 1,88
F5 -0,89 0,01 0,09 0,47 1,18 0,37 -0,19 -1,05 6,16
F6 0,58 -0,07 0,17 -0,64 -0,62 0,21 -0,25 0,64 -2,54
F7 -0,08 -0,02 0,07 -0,06 0,15 0,16 -0,12 -0,10 0,54
F8 6,88 -0,75 1,61 -7,10 -7,66 1,65 -2,33 7,69 2,11
F9 -0,50 0,12 -0,34 0,84 0,40 -0,56 0,54 -0,51 -1,00
F10 0,78 0,18 -0,74 0,58 -1,48 -1,62 1,26 1,04 -2,74
F11 -0,16 0,18 -0,63 1,04 -0,22 -1,19 1,03 -0,06 -2,70
F12 0,41 0,17 -0,66 0,70 -0,96 -1,37 1,10 0,61 -3,51
F13 1,06 -0,13 0,29 -1,16 -1,15 0,34 -0,43 1,18 8,82
F14 -0,06 -0,02 0,08 -0,08 0,14 0,17 -0,14 -0,09 3,11
F15 0,11 -0,39 1,37 -2,12 0,77 2,65 -2,25 -0,13 6,34
F16 -0,03 0,11 -0,37 0,57 -0,20 -0,71 0,61 0,03 -6,00
F17 1,22 -0,49 1,56 -3,16 -0,51 2,78 -2,52 1,12 -0,36
F18 -3,82 -0,75 3,22 -2,22 6,99 7,12 -5,49 -5,06 3,71
F19 0,47 0,56 -2,05 2,76 -1,96 -4,12 3,41 0,94 -5,00
F20 0,01 0,00 0,00 -0,01 -0,01 0,00 0,00 0,01 -7,10
F21 2,09 0,11 -0,70 -0,37 -3,12 -1,83 1,26 2,56 -2,41
F22 -3,03 0,65 -1,84 4,81 2,62 -2,93 2,89 -3,17 -0,85
F23 -0,02 -0,46 1,62 -2,41 1,10 3,17 -2,67 -0,33 4,54
F24 -2,23 0,61 -1,82 4,25 1,61 -3,08 2,91 -2,24 -3,21
F25 -0,16 0,02 -0,05 0,18 0,17 -0,06 0,07 -0,18 2,83
GCA-M -2,30 1,67 4,25 1,24 2,04 -2,63 -3,40 -0,87 29,22
Introduction
One of the biggest challenges for plant breeders is dealing with interactions. They aim to identify the specific adaptation of cultivars to a particular set of environments, while also hoping to detect wide adaptation in some cultivars. In hybrid breeding, they also aim to identify which {female * male} combinations offer the most promising specific combining ability (SCA). After the pioneering work of Tukey (1949) and Mandel (1971), the AMMI ("Additive main effects and multiplicative interaction") framework became very popular during the last decade. Based on two practical examples analyzed using the R software, this study intends to show how LABKEY™ (Soula et al., 2014) will provide users with tools to identify the key factors involved in Genotype * Environment (GxE) interaction and to predict specific combining abilities (SCA) for unobserved hybrid combinations.
General framework
The AMMI approach models the result of the combination of two factors (Genotype and Environment, or Female and Male, hereafter named I and J) as the sum of the additive main effects and of the multiplicative effects of both factors, the latter being calculated through a Principal Component Analysis (PCA) on the interaction estimates:
$$Y_{ij} = m + a_i + b_j + ab_{ij}, \qquad ab_{ij} = \sum_k U_k \, c_{ik} \, d_{jk}$$
where Y_ij is the result for the combination of levels i and j, m is the grand mean, a_i and b_j are the additive main effects of factors I and J, and ab_ij represents their interaction; c_ik and d_jk are the coordinates of levels i and j on component k of the interaction, and U_k is the eigenvalue of the kth component.
In the two examples described below, there were missing data in the two-way designs. In the particular case of the factorial cross design, this study addresses the question of interaction prediction.
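As an illustration of the AMMI decomposition, here is a minimal Python sketch for a complete two-way table (no missing cells). It is not the LABKEY™ implementation: the function name `ammi_decompose` and the simulated data are hypothetical, and the multiplicative terms are obtained by a singular value decomposition of the interaction table. The scaling constant returned here is the singular value; how it relates to the eigenvalue of the PCA depends on how the coordinates are scaled, which is an assumption of this sketch.

```python
import numpy as np

def ammi_decompose(y, n_components=2):
    """Split a complete I x J table into additive effects and multiplicative terms."""
    m = y.mean()                              # grand mean
    a = y.mean(axis=1) - m                    # main effects of factor I (rows)
    b = y.mean(axis=0) - m                    # main effects of factor J (columns)
    ab = y - m - a[:, None] - b[None, :]      # interaction estimates
    c, u, dt = np.linalg.svd(ab, full_matrices=False)
    k = n_components
    return m, a, b, ab, c[:, :k], u[:k], dt[:k].T

# Hypothetical 6 x 4 table (e.g. 6 genotypes in 4 environments).
rng = np.random.default_rng(0)
y = rng.normal(size=(6, 4))
m, a, b, ab, c, u, d = ammi_decompose(y, n_components=3)

# With all non-null components kept, the multiplicative terms rebuild the interaction.
ab_rebuilt = (c * u) @ d.T
```

Keeping fewer components (for example `n_components=2`) gives a smoothed version of the interaction table, which is the form used for interpretation and prediction.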
I. GxE case: understand the trends exhibited by the PCA thanks to environmental covariates
Material & Method: 347 sunflower hybrids were tested in at least 8 of 16 {year * location * experimental conditions} combinations (3 years, France, irrigated/non-irrigated; Cadic et al., 2012; see Acknowledgements). In each environment, yield estimates were obtained by BLUP, taking into account the spatial variation in the field when necessary. GxE interactions were obtained by subtracting the estimates of the additive effects of genotype and of environment from the observed yield. In the LABKEY™ software, these estimates are obtained through Yates' method (Yates, 1933) of estimating missing data, which implicitly assumes a null interaction for the unobserved {genotype * environment} combinations. These estimates are also those produced by the LSMEANS statement of the GLM procedure in SAS©.
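The following Python sketch illustrates the idea of estimating additive effects from an incomplete two-way table. It does not reproduce Yates' original procedure: it assumes that a direct least-squares fit of the additive model on the observed cells gives the same estimates, and the function name `additive_fit` and the small table are hypothetical.

```python
import numpy as np

def additive_fit(y):
    """Least-squares fit of y_ij = m + a_i + b_j on observed cells (NaN = missing).

    Returns the fitted additive value for every cell, including missing ones.
    """
    n_i, n_j = y.shape
    obs = ~np.isnan(y)
    rows, cols = np.nonzero(obs)              # same row-major order as y[obs]
    # Design matrix: intercept, row indicators, column indicators.
    x = np.zeros((rows.size, 1 + n_i + n_j))
    x[:, 0] = 1.0
    x[np.arange(rows.size), 1 + rows] = 1.0
    x[np.arange(rows.size), 1 + n_i + cols] = 1.0
    coef, *_ = np.linalg.lstsq(x, y[obs], rcond=None)
    return coef[0] + coef[1:1 + n_i][:, None] + coef[1 + n_i:][None, :]

# Hypothetical, purely additive 4 x 3 table with two missing cells.
full = 10.0 + np.array([-1.0, 0.0, 1.0, 2.0])[:, None] + np.array([0.5, -0.5, 0.0])[None, :]
y = full.copy()
y[0, 1] = np.nan
y[2, 0] = np.nan

fitted = additive_fit(y)
# Interaction = observed minus additive fit; unobserved cells get a null interaction.
interaction = np.where(np.isnan(y), 0.0, y - fitted)
```

Because the example table is purely additive, `fitted` recovers the two missing cells exactly and `interaction` is zero everywhere; with real data, the non-zero residuals on observed cells are the interaction estimates submitted to the PCA.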
A PCA was performed on the GxE interaction. Various environmental covariates, such as a drought stress index evaluated in each environment through the SUNFLO crop model (Casadebaig et al., 2011) and soil depth, were then correlated with the main components of interaction.
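A minimal Python sketch of this last step follows, assuming the environment coordinates are taken from a singular value decomposition of the genotype x environment interaction table. The function name `covariate_correlations` and the toy data are hypothetical.

```python
import numpy as np

def covariate_correlations(interaction, covariates, n_components=2):
    """Correlate environmental covariates with environment coordinates.

    interaction : genotypes x environments array of interaction estimates
    covariates  : dict mapping a covariate name to one value per environment
    """
    _, _, vt = np.linalg.svd(interaction, full_matrices=False)
    env_scores = vt[:n_components].T          # environments x components
    return {
        name: np.array([np.corrcoef(values, env_scores[:, k])[0, 1]
                        for k in range(n_components)])
        for name, values in covariates.items()
    }

# Toy example: the interaction is driven by a single environmental gradient.
genotype_response = np.array([1.0, -2.0, 0.5, 0.5])
gradient = np.array([-1.5, -0.5, 0.5, 1.5])   # centered, one value per environment
interaction = np.outer(genotype_response, gradient)

corr = covariate_correlations(interaction, {"soil_depth": gradient + 10.0}, n_components=1)
```

In this toy case the covariate is perfectly correlated (up to sign) with the first component; the sign of a principal component is arbitrary, so only the absolute value of the correlation is meaningful.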
Figure 1.a: Correlation circle of the 16 environments and two environmental covariates (drought index and soil depth) on the first two principal components.
Figure 1.b: Hybrids showing a high interaction with environments on the first principal component. The other hybrids are represented by black dots. Arrows represent the 16 environments and the soil depth (in green).