

Dealing with Statistical Interactions in Plant Breeding within the LABKEY™ software

Soula, Julie (1); Vincourt, Patrick (2); Bernard, Jean-Marc (3); Bardet, Sébastien (1); Royer, Frédéric (3)

(1) Doriane SAS, 31 av. Jean Médecin, 06000 Nice, France.
(2) Open-Source-Biology, 12 avenue V Segoffin, 31400 Toulouse, France.
(3) Biosearch Data Management, 39 bd Dubouchage, 06000 Nice, France.

Conclusion

LABKEY™ is currently developing tools to help breeders deal with various types of interaction. Using two practical examples, this study shows how this purpose is addressed in LABKEY™ developments.

Results and discussion: The PCA on the interactions exhibited mainly a single important axis, accounting for 20% of the total variance. As shown in Figure 1.a, the drought stress index did not appear to be associated with the main component of the GxE interaction. In fact, this drought index is highly correlated with the variation in average environment yield (r² = 0.28); its effect is therefore at least partly captured by the environment main effect, which is subtracted before the interaction is computed, and this explains why it no longer affects the GxE interaction. Soil depth, which is also a key parameter of the SUNFLO crop model, is however highly correlated with the first principal component.

Figure 1.b shows how particular adaptations of sunflower hybrids can be detected: some hybrids seem better adapted to deep soils, while others appear better adapted to soils that cannot supply water to the crop throughout the development cycle.

This example shows how crucial it is to record relevant agronomic covariates in order to decipher the GxE interaction.

References

1. Cadic E, Debaeke P, Langlade L, Grezes-Besset B, Pauquet J, Coque M, André T, Chatre S, Casadebaig P, Mangin B, Vincourt P (2012) Phenotyping the response of sunflower (Helianthus annuus L.) to drought scenarios in multi-environmental trials for the purpose of association genetics. Plant Animal Genome Int. Conf.

2. Casadebaig P, Guilioni L, Lecoeur J, Christophe A, Champolivier L, Debaeke P (2011) SUNFLO, a model to simulate genotype-specific performance of the sunflower crop in contrasting environments. Agricultural and Forest Meteorology 151(2): 163-178

3. Soula J, Duminil T, Bardet S, Bernard JM, Royer F (2014) Consideration of field heterogeneities in the calculation of variety means from agronomic trial data: comparisons between a nearest neighbor adjustment method and experimental designs like lattices and latinized alpha-designs. Plant Animal Genome Conf., San Diego

4. Technow F, Schrag TA, Schipprack W, Bauer E, Simianer H, Melchinger AE (2014) Genome properties and prospects of genomic prediction of hybrid performance in a breeding program of maize. Genetics 197(4): 1343-135

5. Yates F (1933) The analysis of replicated experiments when the field results are incomplete. Emp. Jour. Exp. Agr. 1: 129-142

* Acknowledgements

The experimental data for the GxE example were kindly provided by the companies BIOGEMMA, RAGT, SOLTIS and SYNGENTA, members of the SUNYFUEL and OLEOSOL Consortium, which received financial support from the French ANR, the French public funds for competitiveness clusters (FUI), the European Regional Development Fund (ERDF), the Région Midi-Pyrénées, the Departmental Board of Aveyron (France), and the Cities Cluster of Rodez (France).

II. Female x Male (FxM) cross design: predicting SCA for unobserved crosses?

Material & Method: 25 female lines were crossed with 8 male lines to produce 90 hybrids, i.e. 90 of the 200 possible combinations, leaving a high proportion of missing data. Each female line was crossed with at least two male lines. A PCA was performed on the FxM interaction estimates using the same approach as described for the GxE case. We then used the SCA coordinates on the first principal components to predict the SCA for unobserved combinations.
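The prediction step can be illustrated with a minimal sketch. The poster does not spell out how the predictor is computed, so the code below assumes that unobserved cells of the interaction matrix are set to zero (the null-interaction assumption), that the PCA is carried out as a singular value decomposition, and that the prediction for every cell is the reconstruction from the first two components. The function name `predict_interaction` and the simulated data are hypothetical.

```python
import numpy as np

def predict_interaction(interaction, observed, n_components=2):
    """Rank-k reconstruction of a female x male interaction matrix.

    Assumption: unobserved cells are set to zero before the decomposition.
    Returns predictions for every cell and r2 on the observed cells.
    """
    ab = np.where(observed, interaction, 0.0)
    u, s, vt = np.linalg.svd(ab, full_matrices=False)
    k = n_components
    predicted = (u[:, :k] * s[:k]) @ vt[:k]
    # Agreement between observed and reconstructed values, observed cells only
    r = np.corrcoef(ab[observed], predicted[observed])[0, 1]
    return predicted, r ** 2

# Simulated example with the same dimensions as the cross design (not real data)
rng = np.random.default_rng(1)
n_f, n_m = 25, 8
true_sca = rng.normal(size=(n_f, 2)) @ rng.normal(size=(2, n_m))
observed = rng.random((n_f, n_m)) < 0.45
observed[:, :2] = True   # toy shortcut: every female has at least two crosses
sca = np.where(observed, true_sca, 0.0)
predicted, r2 = predict_interaction(sca, observed)
```

Here `predicted[~observed]` holds the inferred interaction values for crosses that were never made, and `r2` plays the role of the adequacy measure computed on observed combinations.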

Results and discussion: The first two components of SCA account for the major part of the {female * male} interaction (55% of the total variance). Based on these first two components, there is good agreement (r² = 0.68) between the observed and calculated SCA for observed combinations, as shown in Figure 2.a. We therefore propose to use the same predictor for unobserved combinations. In this way, we obtain a table of (positive and negative) predicted interactions: see Figure 2.b. As pointed out by Technow et al. (2014), hybrid breeders often test combining ability with a limited number and range of testers, which limits the possibility of exploring a wider range of combinations to detect particular hybrid combinations. This example shows how, with the same investment in hybrid production and experimentation, it becomes possible to explore a wider space of genetic diversity and to propose inferences for unobserved combinations.

Figure 2.a: Modeling the specific combining abilities (SCA): the first two components account for 68% of the SCA variation on observed combinations. Bottom right: male and female parents on the first two principal components.

Figure 2.b: Predicted interaction values using the first two principal components. Black: predicted interaction values for tested hybrids. Red (negative), green (positive): predicted interaction values for non-tested hybrids.

           M1     M2     M3     M4     M5     M6     M7     M8  GCA-F
F1      -1.37   0.37  -1.09   2.56   1.01  -1.83   1.73  -1.38   2.49
F2      -0.04   0.03  -0.10   0.17  -0.02  -0.18   0.16  -0.03  -0.86
F3      -1.26  -0.07   0.45   0.19   1.90   1.15  -0.80  -1.55  -4.27
F4       0.03   0.04  -0.15   0.20  -0.13  -0.29   0.24   0.06   1.88
F5      -0.89   0.01   0.09   0.47   1.18   0.37  -0.19  -1.05   6.16
F6       0.58  -0.07   0.17  -0.64  -0.62   0.21  -0.25   0.64  -2.54
F7      -0.08  -0.02   0.07  -0.06   0.15   0.16  -0.12  -0.10   0.54
F8       6.88  -0.75   1.61  -7.10  -7.66   1.65  -2.33   7.69   2.11
F9      -0.50   0.12  -0.34   0.84   0.40  -0.56   0.54  -0.51  -1.00
F10      0.78   0.18  -0.74   0.58  -1.48  -1.62   1.26   1.04  -2.74
F11     -0.16   0.18  -0.63   1.04  -0.22  -1.19   1.03  -0.06  -2.70
F12      0.41   0.17  -0.66   0.70  -0.96  -1.37   1.10   0.61  -3.51
F13      1.06  -0.13   0.29  -1.16  -1.15   0.34  -0.43   1.18   8.82
F14     -0.06  -0.02   0.08  -0.08   0.14   0.17  -0.14  -0.09   3.11
F15      0.11  -0.39   1.37  -2.12   0.77   2.65  -2.25  -0.13   6.34
F16     -0.03   0.11  -0.37   0.57  -0.20  -0.71   0.61   0.03  -6.00
F17      1.22  -0.49   1.56  -3.16  -0.51   2.78  -2.52   1.12  -0.36
F18     -3.82  -0.75   3.22  -2.22   6.99   7.12  -5.49  -5.06   3.71
F19      0.47   0.56  -2.05   2.76  -1.96  -4.12   3.41   0.94  -5.00
F20      0.01   0.00   0.00  -0.01  -0.01   0.00   0.00   0.01  -7.10
F21      2.09   0.11  -0.70  -0.37  -3.12  -1.83   1.26   2.56  -2.41
F22     -3.03   0.65  -1.84   4.81   2.62  -2.93   2.89  -3.17  -0.85
F23     -0.02  -0.46   1.62  -2.41   1.10   3.17  -2.67  -0.33   4.54
F24     -2.23   0.61  -1.82   4.25   1.61  -3.08   2.91  -2.24  -3.21
F25     -0.16   0.02  -0.05   0.18   0.17  -0.06   0.07  -0.18   2.83
GCA-M   -2.30   1.67   4.25   1.24   2.04  -2.63  -3.40  -0.87  29.22

Introduction

One of the biggest challenges for plant breeders is dealing with interactions. Breeders aim to identify the specific adaptation of cultivars to a particular set of environments, while also hoping to detect wide adaptation in some cultivars. In hybrid breeding, they also aim to identify which {female * male} combinations offer the most promising specific combining ability (SCA). Following the pioneering work of Tukey (1949) and Mandel (1971), the AMMI ("Additive Main effects and Multiplicative Interaction") framework has become very popular over the last decade. Based on two practical examples analyzed with the R software, this study shows how LABKEY™ (Soula et al., 2014) will provide users with tools to identify the key factors involved in Genotype * Environment (GxE) interaction and to predict specific combining abilities (SCA) for unobserved hybrid combinations.

General framework

The AMMI approach models the result of combining two factors (Genotype and Environment, or Female and Male, hereafter named I and J) as the sum of additive main effects and multiplicative effects of both factors, the latter being calculated through a Principal Component Analysis (PCA) on the interaction estimates:

$$Y_{ij} = m + a_i + b_j + ab_{ij}, \qquad ab_{ij} = \sum_{k} U_k \, c_{ik} \, d_{jk}$$

where $Y_{ij}$ is the value for the combination of level i of factor I with level j of factor J, $m$ is the grand mean, $a_i$ and $b_j$ are the additive main effects of factors I and J, and $ab_{ij}$ represents their interaction; $c_{ik}$ and $d_{jk}$ are the coordinates on component k of the interaction, and $U_k$ is the eigenvalue of the k-th component.

In the two examples described below, there were missing data in the two-way designs. In the particular case of the factorial cross design, this study addresses the question of interaction prediction for unobserved combinations.
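The AMMI decomposition described in the general framework can be sketched for a complete two-way table as follows. This is an illustration under stated assumptions, not the LABKEY™ implementation: the PCA step is written as a singular value decomposition, the function name `ammi_decompose` is hypothetical, and the singular values `s` returned by NumPy are used as the scaling of each component (how they map to the eigenvalue of each component depends on the PCA convention).

```python
import numpy as np

def ammi_decompose(table, n_components=2):
    """Split a complete two-way table into additive main effects and a
    multiplicative interaction rebuilt from the first components."""
    y = np.asarray(table, dtype=float)
    m = y.mean()                            # grand mean
    a = y.mean(axis=1) - m                  # main effects of factor I (rows)
    b = y.mean(axis=0) - m                  # main effects of factor J (columns)
    ab = y - m - a[:, None] - b[None, :]    # interaction estimates
    # PCA step: SVD of the interaction matrix (assumed convention)
    u, s, vt = np.linalg.svd(ab, full_matrices=False)
    k = n_components
    ab_k = (u[:, :k] * s[:k]) @ vt[:k, :]   # interaction from k components
    return m, a, b, ab, ab_k, s

# Simulated 6 x 4 table, for illustration only
rng = np.random.default_rng(0)
toy = rng.normal(size=(6, 4))
m, a, b, ab, ab_k, s = ammi_decompose(toy, n_components=2)
explained = s**2 / np.sum(s**2)   # share of interaction variation per component
```

The vector `explained` is the kind of quantity reported when a component is said to account for a given percentage of the interaction variance.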

I. GxE case: understanding the trends exhibited by the PCA thanks to environmental covariates

Material & Method: 347 sunflower hybrids were each tested in at least 8 of 16 {year * location * experimental conditions} combinations (3 years, France, irrigated/non-irrigated; Cadic et al., 2012 *). In each environment, yield estimates were obtained by BLUP, taking into account spatial variation in the field when necessary. GxE interactions were obtained by subtracting the estimated additive effects of genotype and environment from the observed yield. In the LABKEY™ software, these estimates are obtained through Yates' method of estimation of missing data (Yates, 1933), a method which implicitly assumes a null interaction for the unobserved {genotype * environment} combinations. These estimates are also those produced by the LSMEANS statement of the GLM procedure in SAS.
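The estimation of additive effects from an incomplete two-way table can be sketched as below. This is a stand-in under assumptions, not the LABKEY™ code: it fits the additive model by ordinary least squares on the observed cells, which for a connected design is understood here to give the same fitted values as filling the missing cells under a null-interaction assumption. The function name `additive_fit` and the toy data are hypothetical.

```python
import numpy as np

def additive_fit(table):
    """Least-squares fit of y_ij = m + a_i + b_j on observed cells.

    Missing cells are coded as NaN. Returns the additive fitted value for
    every cell and the interaction estimates (zero where unobserved).
    """
    y = np.asarray(table, dtype=float)
    n_i, n_j = y.shape
    rows, cols = np.nonzero(~np.isnan(y))
    # Design matrix: intercept, one dummy per row level, one per column level
    x = np.zeros((rows.size, 1 + n_i + n_j))
    x[:, 0] = 1.0
    x[np.arange(rows.size), 1 + rows] = 1.0
    x[np.arange(rows.size), 1 + n_i + cols] = 1.0
    # Minimum-norm solution; fitted values are unique for a connected design
    coef, *_ = np.linalg.lstsq(x, y[rows, cols], rcond=None)
    fitted = coef[0] + coef[1:1 + n_i][:, None] + coef[1 + n_i:][None, :]
    interaction = np.where(np.isnan(y), 0.0, y - fitted)
    return fitted, interaction

# Toy genotype x environment table with two unobserved combinations
obs = np.array([[5.1, 6.0, np.nan],
                [4.2, 5.5, 3.9],
                [6.3, np.nan, 5.0],
                [5.0, 5.8, 4.1]])
fitted, interaction = additive_fit(obs)
```

The matrix `interaction` is then the input to the PCA, with the unobserved cells contributing zero.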

A PCA was performed on the GxE interaction. Various environmental covariates, such as soil depth and a drought stress index evaluated in each environment with the SUNFLO crop model (Casadebaig et al., 2011), were then correlated with the main components of the interaction.
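Relating covariates to the interaction components can be sketched as follows, assuming the environments are the columns of the interaction matrix and that their coordinates come from a singular value decomposition. The function names, the simulated soil depth values and the simulated genotype sensitivities are all hypothetical and serve only to show the mechanics.

```python
import numpy as np

def environment_scores(interaction, n_components=2):
    """Coordinates of the environments (columns) on the first principal
    components of a genotype x environment interaction matrix."""
    ab = np.asarray(interaction, dtype=float)
    u, s, vt = np.linalg.svd(ab, full_matrices=False)
    return vt[:n_components].T * s[:n_components]   # (n_env, n_components)

def covariate_correlations(scores, covariate):
    """Pearson correlation of one environmental covariate with each component."""
    covariate = np.asarray(covariate, dtype=float)
    return np.array([np.corrcoef(scores[:, k], covariate)[0, 1]
                     for k in range(scores.shape[1])])

# Simulated data: 30 genotypes, 16 environments (not the poster's data)
rng = np.random.default_rng(2)
soil_depth = rng.uniform(50, 200, size=16)    # one value per environment
sensitivity = rng.normal(size=30)             # genotype response to soil depth
gxe = (np.outer(sensitivity, soil_depth - soil_depth.mean()) / 50
       + rng.normal(scale=0.5, size=(30, 16)))
scores = environment_scores(gxe)
r = covariate_correlations(scores, soil_depth)
```

The sign of each component is arbitrary, so only the magnitude of each correlation in `r` is meaningful when deciding which covariate is associated with which axis.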

Figure 1.a: Correlation circle of the 16 environments and two environmental covariates (drought index and soil depth) on the first two principal components.

Figure 1.b: Hybrids showing a high interaction with environments on the first principal component. The other hybrids are represented by black dots. Arrows represent the 16 environments and the soil depth (in green).