ii - graduate school imgs/sample manuscript.pdfselection and phenotypic analysis. he has also an...
TRANSCRIPT
![Page 1: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/1.jpg)
![Page 2: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/2.jpg)
ii
ii
PREDICTIVE HYBRID RICE BREEDING USING GENOMIC SELECTION
AND ITS INTEGRATION INTO RICE BREEDING PROGRAMS
USING RESEARCH MANAGEMENT APPROACHES
TAMERLANE MARK SIARON NAS
SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF THE PHILIPPINES LOS BAÑOS
IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR
THE DEGREE OF
DOCTOR OF PHILOSOPHY
(Genetics)
JUNE 2016
![Page 3: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/3.jpg)
![Page 4: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/4.jpg)
iv
iv
BIOGRAPHICAL SKETCH
The author is a rice researcher in Syngenta, a leading global crop solutions and crop
biotechnology company, as a Genetics Project and Molecular Breeding Lead since 2014.
Previously, he was involved in hybrid rice breeding at DuPont Pioneer (2006-2014) and at
the International Rice Research Institute (1999-2006), and in fruit crops breeding at the
Institute of Plant Breeding in UPLB (1997-1999). His current interests and research focus
are increasing genetic gain, rice heterotic pools and associated methods such as genomic
selection and phenotypic analysis. He has also an interest in the application of good
leadership and management principles in breeding programs.
He graduated with a B.S. Biology degree major in Genetics from UPLB in 1997.
In 2003, he received his M.Sc. degree in Genetics also from UPLB with a minor in Plant
Breeding, and was a DOST scholar.
The author is the eldest among five children of Mr. Felicito C. Nas and Mrs. Loida
Siaron Nas of Polangui, Albay. He is married to Mrs. Gretchen Ocampo Nas, and blessed
with one daughter, Rebekah Ysabelle.
TAMERLANE MARK S. NAS
![Page 5: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/5.jpg)
v
v
ACKNOWLEDGEMENTS
I wish to share this very significant milestone of my career with the following
individuals and institutions, and honor them for their invaluable contributions to this work.
First, my academic advisers, Dr. Jose E. Hernandez, Dr. Merlyn S. Mendioro, Dr.
Consorcia E. Reaño, Dr. Ma. Genaleen Q. Diaz and Dr. Mimosa C. Ocampo, for their
guidance since the beginning of my graduate program and through the conduct of this
research study. Syngenta was very generous in providing financial support in every aspect
of this work. Dr. John de Leon, Dr. Manny Logroño and Dr. Harish Gandhi, my
Syngenta superiors for supporting my career development and for allowing me to take this
graduate program on top of my responsibilities. Dr. Suresh Kadaru, my Syngenta
colleague, provided tremendous help in many areas of this study, particularly the marker
work. Dr. Nonoy Bandillo of University of Nebraska at Lincoln and Dr. Franco Asoro
of Iowa State University were very thorough in sharing their knowledge on various
strategies in genomic selection, and programming codes used in this study. Syngenta’s
Trialing Team provided excellent legwork in conducting the field trials, while the HMU
Team performed a great job in producing the hybrid seeds. My colleagues in Seed Product
Development also deserve a toast to teamwork, for setting a very rewarding workplace
which is very conducive for this study. I want to thank my former team in DuPont Pioneer:
Jahleel Acedo Mendoza, Jomar Punzalan, Gelo Fontanilla, Nerio Camposano, Jessie
Fernandez, Jr. and Herson Arcilla for our shared loyalty to integrity in conducting
research. My former superiors, Dr. Dennis Byron and Mr. Emmanuel Serrano (DuPont
Pioneer), Dr. Sant S. Virmani (IRRI) and Dr. Violeta N. Villegas (IPB), also influenced
my decisions regarding my career by serving as role models and inspirations. Dr. Conrad
Balatero, Dr. Glenn Gregorio, Dr. Bert Collard and Dr. Emma Sales served as mentors
in various points of my professional life. My spiritual family, Victory General Santos,
kept me grounded on what is essential in life. My wife Gretchen and daughter Rebekah
Ysabelle were always there to give their love, support, and understanding. Finally to Jesus
Christ, my Lord and Savior for providing all these wonderful people, to You be the glory
and honor.
![Page 6: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/6.jpg)
vi
vi
TABLE OF CONTENTS
PAGE
IPR PAGE
TITLE PAGE
APPROVAL PAGE
BIOGRAPHICAL SKETCH
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDICES
LIST OF ACRONYMS
ABSTRACT
CHAPTER 1. INTRODUCTION
CHAPTER 2. REVIEW OF LITERATURE
Increasing Genetic Gain
Yield Trialing and Phenotypic Analysis
Achieving High Heritability in Field Trials
Best Linear Unbiased Prediction (BLUP)
Use of General Combining Ability in Hybrid Breeding
Marker-Aided Selection
Current use of markers in rice breeding
Mapping Quantitative Trait Loci
i
ii
iii
iv
v
vi
xi
xiii
xvi
xvii
xix
1
5
5
6
7
8
10
11
11
12
![Page 7: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/7.jpg)
vii
vii
Limitations of Traditional MAS
Linkage disequilibrium-based mapping
Genomic Selection
Statistical Models of Estimating GEBV
Limitations of Stepwise Regression Models
Ridge Regression BLUP
Bayesian Methods
Kernel and Machine Learning Methods
Accuracy of Genomic Selection
Research Management of Plant Breeding Programs
Knowledge, Experience and Skill Requirements from
Plant Breeders
Breeding Programs as Part of Meta-Organizations
Structure of Research Organizations
Introducing Change into Breeding Organizations
CHAPTER 3. MATERIALS AND METHODS
Phenotyping and phenotypic analysis
Genotyping
Preparation and Processing of Tissue Samples
Quality Filtering and Reformatting of SNP Markers
Estimation of Genetic Relationships
Implementing Genomic Selection Models
Design of Training and Validation Populations
PAGE
14
16
17
19
19
20
21
24
25
26
26
27
28
29
32
32
33
33
33
34
34
35
![Page 8: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/8.jpg)
viii
viii
Procedure for Cross Validation
Comparison of Prediction Accuracies
Optimization of Genomic Selection Parameters
Creating a Genomic Selection Project Proposal
Declaration of Research Funding and Non Conflict of Interest
CHAPTER 4. RESULTS AND DISCUSSION
Quality of Field Trial Data
By-Location Coefficients of Variation
Distributions of Yield, Days to 50% Flowering and Plant Height
Distribution of Hybrids Across Locations
Analysis of Multiple Locations
Variance Components and Computed Trait Heritabilities
Deriving BLUPs: Fitting Linear Mixed Models
Shrinkage Towards the Mean
Deriving General Combining Ability
Marker Coverage and Population Structure
Descriptive Statistics on SNP Marker Data
Genomic Relationships and Principal Components
Evaluation of Genomic Prediction Methods
Genomic BLUP (GBLUP) and Ridge Regression
Effect of Trait Heritability on Prediction Accuracy
Effect of Training Population Size on Prediction Accuracy
PAGE
36
37
38
39
39
40
40
40
41
43
43
44
46
47
49
51
52
53
56
57
59
62
![Page 9: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/9.jpg)
ix
ix
Effect of Genomic Selection Model on Prediction Accuracy
Correlations of the Different Genomic Selection Models
Population Structure as Covariate
Optimization of Genomic Selection Parameters
Whole Model Test for the Generalized Linear Model
Effect Summary and Effect Tests
Prediction Profiles and Application to Breeding Programs
Integrating Genomic Selection into Hybrid Rice Breeding Programs
Assumptions on the Hypothetical Breeding Program
Rationale on Increasing the Effectiveness of Breeding
Programs
Objectives of the Project Being Proposed
Stakeholder Analysis
Problem Analysis
Project Planning Matrix
Implementation Schedule
Management Arrangements
Budgetary Requirements
Recommendations for Inbred Rice Breeding Programs
CHAPTER 5. SUMMARY AND CONCLUSION
Usefulness of Genomic Selection
Optimizing Genomic Selection Procedures
PAGE
65
69
71
76
78
78
80
83
83
85
85
86
88
92
95
96
99
102
105
105
106
![Page 10: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/10.jpg)
x
x
Implementing Genomic Selection through a Research Management
Approach
LITERATURE CITED
APPENDICES
PAGE
107
108
118
![Page 11: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/11.jpg)
xi
xi
LIST OF TABLES
TABLE PAGE
1
2
3
4
5
6
7
8
9
10
11a
11b
Full factorial design for optimization of genomic selection
parameters.
Variance components of the phenotypes yield, days to 50%
flowering and plant height derived from REML-fitted linear
mixed models, and computed heritabilities. Variance components,
except genotypic variance, were divided by the number of levels
per source of variation.
LSD threshold matrix comparing prediction accuracy means of
GBLUP and Ridge Regression for GCA of 122 rice parental lines
for three traits.
HSD threshold matrix of genomic selection accuracy means
between all pairs of traits.
HSD threshold matrix of prediction accuracies between two cross
validation methods or training population size.
HSD threshold matrix among prediction accuracy means between
all pairs of genomic selection models.
Spearman’s rank correlation coefficient between pairs of genomic
selection models across all traits.
HSD threshold matrix of prediction accuracy means between
subpopulations and mixed population.
HSD threshold matrix of prediction accuracy means between
pairs of genomic selection models using subpopulation prediction
Full factorial design of genomic selection accuracy means,
heritability, training population size and genomic selection model
used in optimization.
Whole model test for the generalized linear model created to
optimize genomic selection.
Goodness of fit test for the generalized linear model created to
optimize genomic selection.
38
45
58
62
65
68
69
73
76
77
78
78
![Page 12: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/12.jpg)
xii
xii
TABLE
12
13
14
15
16
17
18
19
20
LogWorth, FDR LogWorth and FDR p-values of main effects and
interaction in the generalized linear model.
Effects Test of main effects and interactions in the generalized
linear model.
Operational considerations of a hypothetical hybrid rice breeding
program using DH as a means of rapid inbred production.
Stakeholder map of the project summarizing the concerns of the
most relevant stakeholders in project implementation.
Project planning matrix on increasing genetic gain of breeding
programs by integrating genomic selection.
Project Gantt chart showing milestones in project
implementation.
Service provided to breeders by breeding program support staff.
Cost comparison between breeding programs with and without
genomic selection.
Projected budget of integrating genomic selection over a ten-year
period
PAGE
79
79
84
87
93
96
98
100
101
![Page 13: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/13.jpg)
xiii
xiii
LIST OF FIGURES
FIGURE PAGE
1
2
3
4
5
6
7
8
9
10
11
A general genomic selection scheme applied to a breeding
population showing the partitioning of the population into
training and prediction sets. Only the training set is phenotyped
(e.g. yield trials) instead of the whole population.
A diagram of a matrix organization with product delivery
managed as projects across functions.
Cross validation procedure for ten-fold and three-fold schemes,
representing 0.9 and 0.667 proportions of training population
size. Each partition was successively used as validation
population from the prediction model derived from the training
population.
Box plots of coefficients of variation of 332 locations grouped
into seasons. Connecting letter report was derived from
comparison of means using Student’s t-test, α=0.05.
Trait x location box plots used in checking location data for
quality showing the spread of data points per location.
Unexplained data points in each location were discarded.
Distributions of yield, days to flowering (DTF) and plant height
across locations planted in wet and dry seasons.
Heat map showing distribution of genotypes across locations.
Male and female parents are also shown, as represented by hybrid
progenies and not as tested genotypes.
Observed and adjusted values (BLUPs) for yield, days to 50%
flowering and plant height of 510 genotypes showing shrinkage
of adjusted means toward the analysis mean.
General combining ability (GCA) for yield (kg), DTF (days) and
plant height (cm) of parental lines.
Distribution of {-1,0,1} allele calls in the marker matrix derived
from scores from 43,344 SNP loci.
Heat map of realized relationship matrix of 122 parental lines
showing three main clusters representing one female and two
male clusters.
19
28
37
41
42
42
44
49
51
53
55
![Page 14: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/14.jpg)
xiv
xiv
FIGURE
12
13
14
15
16
17
18
19
20
21
22
Principal component analysis using marker data: (a) highest two
principal components in 2D plot, (b) highest three principal
components in 3D plot, (c) scree plot showing magnitude of
eigenvalues and variance explained by the principal components,
and (d) summary table of eigenvalues.
Correlation of GEBVs in RR-BLUP using marker data directly
(Ridge Regression) and genomic relationships (GBLUP).
Variability chart of prediction accuracies per trait in 122 rice
parental lines. First and second level factors were interchanged
between the two graphs.
Box plots of prediction accuracy per trait in 122 rice parental
lines and comparison circles based on Tukey’s honest significant
difference test.
Variability chart of prediction accuracy per cross validation
method. The first and second level factors were interchanged
between charts.
Box plots of prediction accuracy per cross validation method
(training population size) and comparison circles based on
Tukey’s honest significant difference test.
Variability chart of prediction accuracy per genomic selection
model. The first and second level factors were interchanged
between charts.
Box plots of prediction accuracy per genomic selection model
and comparison circles based on Tukey’s honest significant
difference test.
Scatterplot matrices of correlations between pairs of genomic
selection models for all traits and each trait individually.
Box plots of prediction accuracy of overall means using whole
population and subpopulations 2 and 3 jointly, and comparison
circles based on Tukey’s honest significant difference test.
Variability chart for prediction accuracy showing mean
differences for mixed population (All) and subpopulation
(Subpop) predictions for each genomic selection model per trait.
PAGE
56
58
61
62
64
65
67
68
70
73
74
![Page 15: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/15.jpg)
xv
xv
FIGURE
23
24
25
26
27
28
29
30
31
Box plots of prediction accuracy of GS model means using
subpopulations 2 and 3 jointly, and comparison circles based on
Tukey’s honest significant difference test.
Contour map showing general trend of relationship between
genomic selection accuracy, heritability and training population
size.
Prediction profiles of selected combinations of variables in the
genomic selection model: (A) Low heritability and small training
population size, (B) Low heritability and large training population
size, (C) High heritability and large training population size, and
(D) High heritability and small training population size.
A scheme for hybrid breeding program using a reciprocal
recurrent selection that creates 10,000 new inbreds and 10,000
new hybrids every breeding cycle.
Problem tree diagram showing some causes and effects of low
rate of genetic gain in breeding programs.
Life cycle of products from a breeding program that releases one
new hybrid every year. Monitoring of objectively verifiable
indicators as described in the project planning matrix is shown by
the arrows.
Organization structure of the hypothetical breeding program in
which genomic selection is to be applied in the project proposal.
A breeding scheme with full integration of genomic selection
showing the porportions of tested and predicted inbred GCAs
Breeding schemes for inbred rice development with and without
genomic selection. Genomic selection can drastically reduce trial
plots. In these schemes, testcrossing is not required.
PAGE
75
80
82
83
91
92
97
102
104
![Page 16: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/16.jpg)
xvi
xvi
LIST OF APPENDICES
APPENDIX
A
B
C
D
E
Sample script for deriving BLUPs and GCAs implemented in
R.
Sample script for predicting phenotypes using RR- BLUP
implemented in R.
Sample script for predicting phenotypes using Bayesian Ridge
Regression implemented in R.
Sample script for predicting phenotypes using Bayesian CPi
implemented in R.
Sample script for predicting phenotypes using Bayesian Lasso
implemented in R.
PAGE
118
120
121
122
123
![Page 17: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/17.jpg)
xvii
xvii
LIST OF ACRONYMS
BC
BLB
BLUE
BLUP
COGS
CV
DA
DH
DNA
DTF
EBV
FDR
GBLUP
GCA
GEBV
GLM
HRCP
LASSO
LSD
MAS
OFTD
Backcross
Bacterial Leaf Blight
Best Linear Unbiased Estimates
Best Linear Unbiased Prediction
Cost of Goods
Coefficient of Variation
Department of Agriculture
Doubled Haploids
Deoxyribonucleic Acid
Days to 50% flowering
Empirical Breeding Value
False Discovery Rate
Genomic Best Linear Unbiased Prediction
General Combining Ability
Genome Estimated Breeding Values
Generalized Linear Model
Hybrid Rice Commercialization Program
Least Absolute Shrinkage and Selection Operator
Least Significant Difference
Marker Assisted Selection
On-Farm Techno-Demo
![Page 18: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/18.jpg)
xviii
xviii
QTL
REML
RFLP
RIL
RR-BLUP
SCA
SNP
TBV
Quantitative Trait Loci
Residual Maximum Likelihood
Restriction Fragment Length Polymorphism
Recombinant Inbred Lines
Ridge Regression Best Linear Unbiased Prediction
Specific Combining Ability
Single Nucleotide Polymorphism
True Breeding Value
![Page 19: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/19.jpg)
xix
xix
ABSTRACT
NAS, TAMERLANE MARK SIARON. University of the Philippines Los Baños, June
2016. Predictive Hybrid Rice Breeding Using Genomic Selection and Its Integration
into Rice Breeding Programs Using Research Management Approaches.
Major Professor: Jose E. Hernandez, Ph.D.
This is the first research work on genomic selection in hybrid rice, a new procedure
in crop breeding that uses genotype data from a large set of random markers across the
genome that allows prediction of the phenotype from the marker data alone, the genomic
prediction model being trained from a population with phenotype and genotype data.
The accuracy of genomic selection in predicting general combining ability (GCA)
for the quantitative traits yield, days to 50% flowering and plant height was assessed, each
trait with computed heritabilities of 0.3130, 0.5036 and 0.5486, respectively. GCA of 122
parental lines were computed from yield trials and historical data using Best Linear
Unbiased Predictions (BLUPs). Concurrently, the 122 parental lines were fingerprinted
using 60,000 SNP markers, resulting to 43,344 high-quality SNP loci. Principal component
analysis and the realized relationship matrix revealed a population structure with one
female cluster (CMS) and two male clusters (restorers). Four genomic selection models
were explored: Ridge Regression BLUP (RR-BLUP), Bayesian Ridge Regression
(BayesRR), Bayesian C Pi (BayesCPi) and Bayesian Lasso (BayesL). In addition, training
population sizes of 0.667 and 0.9 as proportions of overall population size were used.
Marker densities were not considered as these were not relevant for chip-based marker
platforms.
Overall prediction accuracy was significantly influenced by trait heritability and
training population size, but not by genomic prediction model. Plant height has the highest
![Page 20: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/20.jpg)
xx
xx
mean prediction accuracy (0.59) while yield has the lowest (0.31). Larger training
population size (0.9) has a higher mean prediction accuracy of 0.47 compared to a smaller
training population size (0.667) with a mean accuracy of 0.36. There were no significant
differences among the prediction accuracy means for the four genomic selection models.
To account for population structure, prediction was done on pooled members of the
two clusters representing restorer lines, excluding the cluster with CMS lines. Mean
accuracy of within subpopulation prediction was 0.52 compared to 0.45 obtained with
whole population predictions. All trait predictions also increased, in particular, yield with
an increased prediction accuracy of 0.38.
A generalized linear model was used to create prediction profiles for various
combinations of trait heritability, training population size and genomic selection model.
The prediction profiles can simulate heritability and training population size values not
included in the experiment, which can be used to optimize genomic selection parameters.
We propose the introduction of genomic selection into a crop breeding program
using a research management approach by identifying and analyzing the problem (low
genetic gain), identifying impact to stakeholders, and proposing a project that will
implement genomic selection proofs of concept and training for breeders. Proofs of
concepts can be completed within two years while full scale up of genomic selection to
steady state can be attained in six years. Genomic selection is projected to save 32% of
breeding program resources by substituting DNA fingerprint for phenotype plot data.
![Page 21: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/21.jpg)
1
CHAPTER 1
INTRODUCTION
Plant breeding programs in many research institutions and companies worldwide
are constantly being upgraded to maximize genetic gain. These programs incorporate
schemes that increase response to selection such as improving the accuracy of phenotyping,
increasing selection differential, and reducing cost and breeding cycle time. Response to
selection has been a key driver of decisions in product development strategies particularly
in the private industry. Decision-makers involved in setting the general direction of plant
breeding programs realize that frontloading investments are necessary to answer key
questions and establish optimized schemes that will ultimately increase genetic gain in
routine breeding processes.
Efforts to implement more efficient breeding programs by addressing the associated
factors of the breeding process are part of a coordinated approach of plant breeding and
other fields such as social sciences on the greater issue of global food security. The global
food security outlook for the 21st century is not very bright. FAO (2009) estimates that the
world’s population will reach 9 billion by 2050, 90% of which will occur in the developing
world. Consequently, cereal production needs to increase by 50%, from 2.1 to 3.0 billion
tons; 80% of the food production increases is projected to come from increases in yields
and cropping intensity. Plant breeding programs should exert constant intervention by
improving crop yields per unit area and unit time, thus this task presents a formidable
challenge to plant breeders.
![Page 22: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/22.jpg)
2
Conventional plant breeding has proven to be a then cutting edge tool in combating
an impending worldwide famine in the middle of the 20th century. The Green Revolution
transformed agriculture worldwide with an increase in world grain production by over
250% from 1950 to 1984 (Kindall and Pimentel, 1994). With the current increasing global
trends in population, shortage of arable land, and decrease in yield gains, plant breeding
will once again address this challenge. But it will do so in a better capacity because plant
breeders have at their disposal new tools such as molecular markers. The challenges now
will not be solved by the same conventional technologies that brought us the Green
Revolution. In rice, molecular markers have been documented to dramatically simplify
breeding (Collard et al., 2008). An excellent example of a high-impact marker-enhanced
breeding is the development of submergence tolerant rice varieties (Septiningsih et al.,
2009).
However, molecular markers in rice breeding have been used mostly for major
genes with large effects. Small-effect quantitative trait loci (QTL) governing extremely
complex traits that are agronomically important such as yield are currently being assessed
by phenotyping in multiple locations. This adds constraints to plant breeding research in
developing countries, which is largely underfunded. Phenotyping, or conducting multi-
environment yield trials for example, is perhaps the most expensive component of a
breeding program.
In large agri-biotech companies, a significant percentage of resources and funding
of local breeding programs are devoted to multi-location testing. In corn breeding
programs, breeders evaluate tens of thousands of doubled haploid (DH) lines and
testcrosses every cropping season in multiple locations in a target population of
![Page 23: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/23.jpg)
3
environments. In a purely phenotype-based evaluation system, this will translate to tens of
thousands of yield plots. The capacity to test genotypes is usually the baseline from which
logistical decisions are made in a local breeding program, such as number of populations
or families, number of inbreds or DHs per family, number of testers and number of
locations. It is therefore important for breeding programs to devise and adopt strategies that
will address the factors that contribute to genetic gain through efficient evaluation of
genotypes.
The general objective of this research was to evaluate the use of genome-wide
markers in augmenting phenotyping of quantitative traits that are most important in hybrid
rice breeding by predicting breeding values, and initially explore the use of genomic
prediction models in a hypothetical breeding program.
The specific objectives of this research were as follows:
1. Investigate the usefulness of whole-genome markers in generating genomic
prediction models for yield, plant height and days to flowering in hybrids rice
breeding by conducting multi-location trials and augmenting the data generated
with existing trial datasets.
2. Compare the adequacy of different statistical models for genomic selection for
quantitative traits in different types of training and validation population and
different types of genomic selection models, and create an optimized model for
various combinations of heritability, training population size and genomic selection
model.
3. Propose a research management-based approach in introducing genomic selection
into existing breeding programs
![Page 24: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/24.jpg)
4
This research was conducted in various Syngenta locations. Genotyping service
was provided by Syngenta's high-throughput SNP marker facilities located in Toulouse,
France. Field trials were conducted in various locations throughout the Philippines.
![Page 25: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/25.jpg)
5
CHAPTER 2
REVIEW OF LITERATURE
Increasing Genetic Gain
A breeding program's year-over-year success is usually measured by genetic gain.
Genetic gain (G), broadly defined as the increase in performance through artificial genetic
improvement programs, is positively correlated to the standardized selection differential
(i), accuracy of selection (rA) broad-sense heritability (H) and phenotypic standard
deviation (σP). In the seed industry, cost (c) and time (t) are considered in the area of
resource management and are used in estimating return of research investment, as shown
in the equation below.
𝐺 =𝑖𝑟𝐴√𝐻𝜎𝑃
(𝑡)(𝑐)
Improvements in breeding processes typically address these factors. Use of markers
to screen for traits increases the mean of selected individuals and therefore increases
standardized selection differential. In most cases, such as yield trials, testing under
conditions that simulate the target environment ensures high correlation between testing
and target environments and therefore increases phenotyping accuracy. Optimization of
trial designs by improving field techniques to minimize within-location and across-location
errors increase broad sense heritability. A carefully-planned breeding scheme takes into
account the need for necessary variation, and would address phenotypic standard deviation.
![Page 26: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/26.jpg)
6
The easiest genetic gain component to manipulate so far is breeding cycle time. In
a hybrid breeding program based on heterotic pools, the most critical stage is to testcross
new inbreds. In many seed companies with an industrialized inbred production system,
new inbreds are generated in a sufficiently rapid manner, hence hybrid crop breeders would
generally focus more on evaluating testcrosses rather than evaluating segregating breeding
populations. To rapidly generate inbreds for testcrossing, a number of approaches have
been incorporated into breeding programs worldwide such as rapid generation advance
(Ikehashi and HilleRisLambers, 1977) and doubled haploid technology (Maluszinski et al.,
2003). A breeding program that uses doubled haploids can provide thousands of
homozygous lines for testcrossing per year.
Yield Trialing and Phenotypic Analysis
Phenotyping has been the cornerstone of the numerous plant breeding success
stories, and will continue to be so in the future. Yield is arguably the most important crop
trait. It is also the most evaluated, yet it has one of the lowest heritabilities (Teich, 1984;
Gomez and Gomez, 1984). This is mainly due to genotype x environment interactions or
GxE (Horner and Frey, 1957; Fox and Rosielle, 1982), spatial trends in the field (Vollmann
et al., 1996) and extraneous errors associated with experimental procedure (Gilmour A.R.
et al., 1997).
Fisher (1930) first proposed the decomposition of phenotypic variance in his classic
book “The Genetical Theory of Natural Selection,” and further elaborated in succeeding
plant breeding textbooks (Falconer, 1960; Lynch and Walsh, 1998; Bernardo, 2010). The
general model for phenotypic variance is:
![Page 27: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/27.jpg)
7
𝜎𝑃2 = 𝜎𝐺
2 + 𝜎𝐺𝑥𝐸2 + 𝜎𝑒
2
The variance due to environment (𝜎𝐸2) is held at zero in the context of plant breeding
because genotypes are assumed to be tested in similar environments. Genetic variance (𝜎𝐺2)
can further be decomposed into additive variance (𝜎𝐴2), dominance variance (𝜎𝐷
2) and
epistatic variance (𝜎𝐼2).
Achieving High Heritability in Field Trials
Heritability can provide an estimate of the quality of a field trial by comparing the
realized heritability with published values from other experiments. Broad sense heritability
(H) is defined as the proportion of the phenotypic variance that is explained by genetic
variance:
𝐻 =𝜎𝐺
2
𝜎𝐺2 + 𝜎𝐺𝑥𝐸
2 + 𝜎𝑒2
Heritability is also defined as the regression coefficient of G on P:
𝛽𝐺,𝑃 =𝑐𝑜𝑣(𝐺, 𝑃)
𝜎𝑃2 =
𝜎𝐺2
𝜎𝐺2 + 𝜎𝐺𝑥𝐸
2 + 𝜎𝑒2
= 𝐻
For most quantitative traits such as yield, plant breeders are more interested in the
narrow sense heritability (ℎ2) because alleles and not genotypes are passed from parent to
![Page 28: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/28.jpg)
8
progeny (Bernardo, 2010). Narrow sense heritability is the amount of phenotypic variance
that can be attributed to additive genetic variance, and can be represented as the regression
of breeding values on phenotypic values:
𝛽𝐴,𝑃 =𝑐𝑜𝑣(𝐴, 𝑃)
𝜎𝑃2 =
𝜎𝐴2
𝜎𝐺2 + 𝜎𝐺𝑥𝐸
2 + 𝜎𝑒2
= ℎ2
The variance of phenotypic means across 𝐽 environments and 𝑅 replications is
given by the formula:
𝜎𝑃2 = 𝜎𝐺
2 +𝜎𝐺𝑥𝐸
2
𝐽+
𝜎𝑒2
𝐽𝑅
The formula above reinforces the importance of locations and replications in field
trials because they reduce the effects of “masking” variances (Bernardo, 2010). Trials can
be established in multiple locations, in homogenous environmental conditions and in
several replications. Increasing the number of replications reduce 𝜎𝑒2 while increasing the
number of locations minimize both 𝜎𝐺𝑥𝐸2 and 𝜎𝑒
2.
Best Linear Unbiased Prediction (BLUP)
BLUP was originally developed in animal breeding but has not gained immediate
appreciation in plant breeding until in recent years, particularly in annual crops. Piepho and
co-workers (2007) postulate that this is due to the large amount of phenotypic information
generated per genotype so best linear unbiased estimates (BLUE) and BLUP may not
necessarily be advantageous. Also in animal breeding, estimation procedures are necessary
![Page 29: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/29.jpg)
9
due to lack of direct observations as in the case of selecting bulls for dairy milk yield
breeding. In fact, the first application of BLUPs was in dairy herds. Another reason is the
relative inaccuracy of genetic variance estimates in plants, due to limited number of
genotypes and more complex covariance structures.
Henderson (1949, 1950) first used linear mixed models in animal breeding by using
correction factors for estimating genetic improvement and predicting breeding values in
dairy herds. These concepts were subsequently proven mathematically as BLUEs
(Henderson et al., 1959) and BLUPs (Henderson, 1963) although it was not until 1973 that
the term Best Linear Unbiased Estimates and Best Linear Unbiased Predictions were
coined (Henderson, 1973).
BLUP is a method for estimating random effects from the mixed model represented
by the equation:
𝑦 = 𝑋𝛽 + 𝑍𝑢 + 𝑒
where 𝑦 is a vector of phenotypic observations, 𝛽 is a vector of fixed effects, 𝑢 is a vector
of random effects, 𝑋 and 𝑍 are the associated matrices and 𝑒 is a vector of random residuals.
The fixed effects can be estimated by Best Linear Unbiased Estimates (BLUEs). The
random effects, assumed have the distribution 𝑢~𝑀𝑉𝑁(0, 𝐺) and 𝑒~𝑀𝑉𝑁(0, 𝑅) where
𝑀𝑉𝑁(𝜇, 𝑉) denotes the multivariate normal distribution with mean vector 𝜇 and variance-
covariance matrix 𝑉, can be estimated by BLUPs (Piepho et al., 2007). Variance
components of 𝐺 and 𝑅 are estimated by statistical programs usually by using Residual
Maximum Likelihood (REML) proposed by Bartlett (1937) and first applied on estimating
components of variance in unbalanced data by Patterson and Thompson (1971).
![Page 30: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/30.jpg)
10
A desirable property of BLUP is shrinkage towards the mean. In plant breeding
context, the “shrinkage mean” anticipates the regression of progeny to observed mean (Hill
and Rosenberger, 1985). Shrinkage increases accuracy by reducing variance, resulting to
smaller mean squared errors (MSE). Searle et. al. (2006) reported that BLUP generally
maximizes correlation of true genotypic values and predicted genotypic values, which is
aligned with the phenotyping accuracy component of the genetic gain equation. BLUP can
also exploit information from relatives by using genetic correlation arising from pedigrees
and marker relationships.
Use of General Combining Ability in Hybrid Breeding
The concept of general combining ability (GCA) and specific combining ability
(SCA) was originally defined in corn by Sprague and Tatum (1942), in which they defined
GCA as “the average performance of a line in hybrid combinations,” and SCA as cases
wherein “certain combinations do relatively better or worse than would be expected on the
basis of the average performance of the lines involved.” An incomplete diallel that excludes
reciprocal crosses will have this model for the trait to be analyzed:
𝑌𝑖𝑗𝑘 = 𝜇 + 𝑔𝑖 + 𝑔𝑗 + 𝑠𝑖𝑗 + 𝑒𝑖𝑗𝑘
where 𝑔𝑖 and 𝑔𝑗 are the GCAs of parents i and j, and 𝑠𝑖𝑗 (= 𝑠𝑗𝑖) is the SCA of i x j cross.
If the parents are drawn from the same distribution, for example absence of heterotic
pooling and no gender limitations (e.g. male sterility), the total phenotypic variance is:
𝜎𝑃2 = 2𝜎𝐺𝐶𝐴
2 + 𝜎𝑆𝐶𝐴2 + 𝜎𝑒
2
![Page 31: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/31.jpg)
11
However in established hybrid crops such as corn, distinct heterotic pools exist,
while in hybrid rice, parental lines are usually classified into restorers and cytoplasmic
male-sterile (CMS) lines. The model can more clearly be represented as:
𝑌 = 𝜇 + 𝐺𝐶𝐴𝑚𝑎𝑙𝑒 + 𝐺𝐶𝐴𝑓𝑒𝑚𝑎𝑙𝑒 + 𝑆𝐶𝐴𝑓𝑒𝑚𝑎𝑙𝑒𝑥𝑚𝑎𝑙𝑒 + 𝑒
GCAs are often assumed similar to breeding values in hybrid breeding hence
additive variance is the most relevant component of genetic variance to hybrid breeders. In
reciprocal recurrent selection schemes, alleles are continuously accumulated every
breeding cycle, which is very important in achieving increases in rate of genetic gain. Use
of GCA and SCA have been further elaborated by Comstock and co-workers (1949).
Marker-Aided Selection
Current Use of Markers in Rice Breeding
Markers have been a key component of rice breeding programs in the last few
decades. Improvements in marker technologies from the first use of isozyme markers
(Tanksley and Rick, 1980) and restriction fragment length polymorphisms or RFLPs
(Beckmann and Soller, 1986) to the current technologies such as single nucleotide
polymorphisms (SNPs) and genotyping by sequencing (GbS) have increased the
popularity, accuracy and usefulness of MAS particularly for simple, monogenic traits, as
well as assessment of germplasm diversity (Virk et al., 1996).
![Page 32: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/32.jpg)
12
In rice breeding, markers have been generally used to screen for characters
controlled by major genes in inbred parental lines. Such characters include fertility
restoration (Nas et al., 2000; Sattari et al., 2007), bacterial leaf blight (Zhai et al., 2002;
Shanti et al., 2010; Basavaraj et al., 2010), rice blast (Singh et al., 2012) and
thermosensitive genetic male sterility (Nas et al., 2005). The most common traits on which
markers were used in hybrid rice breeding are disease resistance (bacterial leaf blight and
blast). Mackill (2007), Collard et al. (2005) and Collard et al. (2008) provided a
comprehensive reviews of the use of markers in rice breeding and associated technologies
in the marker lab that can be translated into the field.
Although markers have been used in diversity analysis, purity assessments, plant
variety protection, hybridity screens and many other applications, mapping and tagging
genes comprise the majority of marker use (Mackill, 2007).
Mapping Quantitative Trait Loci
One of the main uses of molecular markers is the construction of linkage maps,
which are useful in determining regions on chromosomes that contain genes. Chromosome
regions containing genes that control complex traits (polygenic or multifactorial traits) are
called quantitative trait loci (QTL).
Mapping QTLs require generating bi-parental populations of which the parental
lines have contrasting phenotypes for the trait of interest, for example, chalky and non-
chalky grains in rice. Mohan and co-workers (1997) suggests a population size of 50 to 250
individuals for preliminary mapping and larger populations for higher resolution mapping.
It is ideal that both parental lines are highly homozygous, which is not an issue in self-
![Page 33: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/33.jpg)
13
pollinated crops such as rice. In corn and other cross-pollinated crops, inbreeding
depression may present some challenges.
Mapping populations may be derived from various generations. F2 populations are
the easiest to create as these are just selfed seeds harvested from F1 plants. Backcross (BC)
populations are generated by crossing the F1 to one of the parental lines. Recombinant
inbred lines (RILs) are usually obtained by single seed descent from an F2 population for
six or more generations. Doubled haploids (DHs) are essentially similar to RILs except that
they are derived from F1 plants through anther culture or crossing with an inducer genotype.
McCouch and Deorge (1995), and Paterson (1996) outlined the advantages and
disadvantages of these types of mapping populations.
QTL mapping is based on the association between the phenotype and the genotype.
It is therefore important to attain accurate and precise phenotyping because marker-trait
associations will be concluded from this initial phenotyping of the mapping population,
which will then be subsequently applied across breeding populations.
Collard (2005) explains QTL mapping as dividing the mapping population into
different genotypic groups and finding significant differences in phenotype between
groups. Phenotypic means between groups are compared and significant differences would
indicate that the marker locus being used to partition the population is linked to a QTL
controlling the trait of interest.
Some statistical methods to detect QTLs include single marker analysis, simple
interval mapping and composite interval mapping (Tanksley, 1993). Single marker analysis
can be done by linear regression wherein the coefficient of determination (R2) from the
marker explains the phenotypic variation arising from the QTL linked to the marker
![Page 34: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/34.jpg)
14
(Collard, 2005). Simple interval mapping simultaneously analyzes intervals between
linked markers along chromosomes and is widely considered to be more powerful than
single marker analysis (Liu, 1998). Composite interval mapping combines interval
mapping with linear regression in a statistical model and includes additional markers in
addition to the adjacent markers that define an interval (Jansen, 1993).
Limitations of Traditional MAS
Dekkers and Hospital (2002) described the current methods in MAS as better suited
to genes with major effects than genes with small effects, which is in agreement with the
review of the technology by Xu and Crouch (2008). MAS is not effective for traits
controlled by many genes with small effects. In forward breeding, any addition of loci for
MAS would mean increasing the effective population size to attain the target population
after conducting MAS. For example, a target F2 population size of 100 individuals after
performing MAS translates to an initial 400 individuals if a breeder wants to select one
homozygote class for a single locus (p = 0.25). Selection for homozygotes for two loci (p
= 0.0625) puts the effective population at 1,600 individuals to attain a target population
size of 100 after MAS.
While allele enrichment schemes can manipulate probabilities by including
heterozygotes, these approaches are suitable only for few major genes. Unfortunately,
many traits that are of agronomic importance are controlled by small-effect minor genes
and such traits are important to the success of new crop varieties (Crosbie et al., 2003).
Heffner et al. (2009) cite two primary limitations of MAS: bi-parental populations used in
most QTL studies do not readily translate to breeding applications, and the statistical
![Page 35: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/35.jpg)
15
methods used are inadequate for polygenic traits controlled by numerous small-effect loci.
Collard (2013, pers. comm.) also points out that statistical methods currently in place for
MAS in public rice research institutes are not yet capable enough in resolving polygenic
traits.
The method by which MAS performs QTL mapping may be poorly suited to crop
improvement (Jannink et al., 2010). Bi-parental populations may not represent the level of
allelic diversity and phase of the breeding program and other breeding programs that intend
to use QTL mapping results. MAS partitions QTL mapping into two components: QTL
identification and estimation of effects. This frequently results to bias in estimation of
marker effects (Beavis, 1994; Melchinger et al., 1998), and small-effect QTLs may be
disregarded in the model (Lande and Thompson, 1990) due to the use of stringent
significance thresholds.
Estimation bias has been demonstrated in a simulation by Beavis (1994), showing
the impact of sampling estimated effects of QTLs from a truncated distribution. In the
study, Beavis showed that the average estimates of phenotypic variances associated with
correctly identified QTL were greatly overestimated with smaller population size (n=100),
slightly overestimated with n=500, and fairly close to the actual magnitude when n=1,000.
This phenomenon has been subsequently called the Beavis effect. A theoretical exploration
of the Beavis effect was performed by Xu (2003), and a statistical explanation has been
since put forward that will improve interpretation of QTL mapping results.
Other limitations such as those mentioned by Xu and Crouch (2008) are not
limitations of the technology per se, but on the lack of emphasis on the applied value of the
![Page 36: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/36.jpg)
16
technology in plant breeding, as he argued that logistical constraints in applying MAS are
rarely addressed in scientific publications.
Linkage Disequilibrium-Based Mapping
Linkage disequilibrium refers to a non-random association between alleles at
different loci (Bernardo, 2010). It is a parameter of a population and it is essentially the
ability to predict the allele at one locus based on the allele state at another locus, hence it
is defined in terms of correlation of alleles on two loci. Linkage disequilibrium (LD) is
measured as the difference between observed frequency of a gamete in a population and
the product of the frequencies of the corresponding alleles:
𝐿𝐷 = 𝜌(𝐴𝑖𝐵𝑗) − 𝜌(𝐴𝑖)𝜌(𝐵𝑗)
where 𝜌(𝐴𝑖𝐵𝑗) is the observed frequency of the 𝐴𝑖𝐵𝑗 gamete; 𝜌(𝐴𝑖) is the frequency of 𝐴𝑖
allele; and 𝜌(𝐵𝑗) is the frequency of 𝐵𝑗 allele. By definition, linked loci are in LD with
each other but loci from different chromosomes or linkage groups can be in LD with one
another. LD is influenced by many factors such as recombination events in the pedigree,
selection history, allele frequencies, and random drift among others.
Linkage disequilibrium-based mapping or association mapping addresses the non-
relevance of bi-parental populations by utilizing breeding populations (Rafalski, 2002).
Kraakman and co-workers (2004) demonstrated this approach by identifying QTLs for
yield and yield stability in breeding populations of spring barley.
![Page 37: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/37.jpg)
17
Association mapping allows identification of QTLs on breeding populations
already with existing extensive phenotypic data across locations and years. The obvious
advantage over bi-parental populations is that there is no need of developing such mapping
populations that would impose added investment on the breeding program. Directly using
breeding populations also eliminates the need for costly QTL validation experiments
because QTL values can be directly used in MAS on the breeding population (Breseghello
and Sorrels, 2006).
Recent association mapping studies in various cereal species explored the genetic
architecture of complex traits such as aluminum tolerance (Famoso et al., 2011) and harvest
index (Li et al., 2012) in rice, plant height components and inflorescence architecture in
sorghum (Morris et al., 2013), and Fusarium head blight resistance in wheat (Miedaner et
al., 2011). As with traditional QTL mapping however, association mapping uses arbitrary
significance thresholds that may result to identification of only a few QTLs with
overestimated effects (Beavis, 1998).
Genomic selection
Genomic selection (genome wide prediction or genome wide selection) is a
procedure that uses genotype data from a large set of random markers across the genome
that allows prediction of the phenotype from the marker data alone. It was first proposed
by Meuwissen and co-workers (2001) as an improvement on the two-stage procedure of
Lande and Thompson (1990). While the two-stage procedure requires selection of
significant markers from a large set then combining this information with phenotypic data
to create a selection index, Meuwissen's method uses all available data – locus, haplotype,
![Page 38: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/38.jpg)
18
or marker effects – in a single stage to calculate genome-estimated breeding values
(GEBV).
Genomic selection simultaneously estimates all effects included in the model while
traditional MAS first identifies significant QTLs and subsequently estimates their effects.
Genomic selection captures the total genetic variance through markers by fitting both large
and small effect QTLs without significance testing, and jointly analyzes all markers on a
population to explain the total genetic variance (de los Campos et al., 2009).
Figure 1 illustrates a general breeding scheme with genomic selection on a
population derived from a bi-parental cross. The basic concept is prediction of breeding
values for individuals (prediction set or population) with genotype data alone (hence,
genomic-estimated breeding values) based on a model created or "trained" from a separate
set of individuals (hence training set or population) having both genotype and phenotype
data. A subset of progenies from a bi-parental cross can be genotyped and phenotyped to
compose the training population or training set. The training population is used to estimate
parameters for the prediction model, which is then applied to the remainder of the
population using genotype information.
One of the most revolutionary ideas in genomic selection is that phenotyping is no
longer used as a means to select individuals, but as a means to train the prediction model.
GEBVs are predicted for untested individuals based on SNP profile alone, and selections
are then done on GEBVs. Effects are valid for entire population and are stable over
generations due to small segments of the chromosome represented. Genome-assisted
predictive hybrid breeding is best utilized in well-defined heterotic pools where parental
lines of inbred families share co-ancestry, having been derived from a set of founder lines.
![Page 39: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/39.jpg)
19
Here, the prediction model is based on the information obtained from multiple families (as
opposed to a single bi-parental population).
Figure 1. A general genomic selection scheme applied to a breeding population showing
the partitioning of the population into training and prediction sets. Only the
training set is phenotyped (e.g. yield trials) instead of the whole population.
Statistical Models of Estimating GEBV
Limitations of Stepwise Regression Models
The availability of high-density genotyping panels pushed the development of
genomic selection to predict complex traits (Meuwissen et al., 2001). As discussed,
traditional MAS models arbitrarily set marker effects to zero (not significant) or the full
value (significant), which results to overestimation of marker effects. Meuwissen and co-
workers (2001) attempted to address the bias of overestimated marker effects by avoiding
selection of "significant" markers during estimation of marker effects and calculation of
Training Set Prediction Set
Breeding Population
Prediction Model
Phenotyping and Genotyping
Genotyping
Calculate GEBVs
Make selections
![Page 40: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/40.jpg)
20
genetic values. This resulted to the number of predictor effects (p) to be estimated being
larger than the number of observations (n). Using least squares is not appropriate for
analyzing these "large p, small n" datasets due to insufficient degrees of freedom (reviewed
by Lorenz et al., 2011).
To resolve this, several statistical models for genomic selection have been proposed
and used in other crop species. The general model described by Moser et al. (2009) is as
follows:
𝑌𝑖 = 𝑔(𝑥𝑖) + 𝑒𝑖
where 𝑌𝑖 is the observed value of the phenotype of individual i, xi is a 1 x p vector of SNP
genotypes on individual i, g(xi) is a function relating genotypes to phenotypes or the GEBV,
and ei is the error term. Meuwissen et al. (2001) enumerated several statistical models such
as Ridge Regression Best Linear Unbiased Prediction (RR-BLUP) and Bayesian methods
(BayesA, BayesB , Bayes Cπ).
Ridge Regression BLUP
RR-BLUP is one of the first models proposed for genomic selection in bi-parental
crosses (Whittaker et al., 2000). Compared with stepwise regression models in traditional
MAS for which the number of markers cannot be more than the number of observations,
RR-BLUP is not limited by “large p, small n” problems. The basic model for RR-BLUP is
as follows:
𝑌 = 𝑊𝐺𝑢 + 𝑒
![Page 41: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/41.jpg)
21
where u is a vector of marker effects with a normal distribution, mean of zero and variance
of 𝐼𝜎𝑢2, G is the genotype matrix, and W is the design matrix relating lines to observations
(Y). Marker effect BLUPs can be estimated as:
�̂� = (𝑍′𝑍 + 𝜆𝐼)−1𝑍′𝑌
where Z = WG. The ridge parameter λ is the ratio between the residual and marker
variances, 𝜆 = 𝜎𝑒2 𝜎𝑢
2⁄ (Searle et al., 2006).
RR-BLUP shrinks marker effects toward zero and assumes markers as random
effects with a common variance (Whittaker et al., 2000). Bernardo and Yu (2007) clarified
that common variance does not mean that all markers have the same effects but that the
shrinkage toward zero are equal. This assumption however is not realistic; markers do not
have equal variances. Despite the fact that RR-BLUP incorrectly assumes equal marker
variances, it is superior to traditional MAS models (e.g. stepwise regression) because it can
simultaneously estimate marker effects, avoiding biases associated with selecting markers
in a stepwise regression.
Bayesian Methods
RR-BLUP can have the tendency to overshrink large effects. Bayesian models have
been applied to address the equal variances assumption and account for marker effects of
different sizes (Hayes, 2007), where separate variances are estimated for each marker and
are assumed to follow a specified prior distribution, allowing each marker to be shrunken
toward zero to a different degree. The Bayesian approach to analysis takes into account the
![Page 42: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/42.jpg)
22
following: prior knowledge about the parameters before data are observed, likelihood
probability of observing the data given a certain value of the parameters, and posterior
knowledge about the parameters after the data are observed, and estimates of the
“compromise” between the Data and Prior are derived.
Fernando (2007) describes that prior probabilities quantify beliefs about parameters
before the data are analyzed. Parameters are related to the data through the model or
“likelihood”, which is the conditional probability density for the data given the parameters.
The prior and the likelihood are combined using Bayesian theorem to obtain posterior
probabilities, which are conditional probabilities for the parameters given the data.
Inferences about parameters are based on the posterior. The Bayesian theorem is illustrated
by Fernando (2009) as follows:
Let 𝑓(𝜃) denote the prior probability density for θ;
Let 𝑓(𝑦|𝜃) denote the likelihood;
Then the posterior probability of θ is:
𝑓(𝜃|𝑦) = 𝑓(𝑦|𝜃)𝑓(𝜃)
𝑓(𝑦)
∝ 𝑓(𝑦|𝜃)𝑓(𝜃)
Meuwissen et al. (2001) initially proposed two types of prior distribution of marker
variance. In BayesA, each marker effect k is drawn from a normal distribution with its own
variance: 𝑁(0, 𝜎𝛽𝑘
2 ). BayesA uses an inverted chi-square distribution with degrees of
freedom and scale parameters chosen so that the mean and variance of the distribution
match the expected mean and variance of the marker variances (Heffner et al., 2009).
BayesA however, does not permit the value of zero for marker variances. The second type
![Page 43: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/43.jpg)
23
of prior, BayesB, assigns a probability that a marker has no effect at all, offering a more
realistic model since some regions in the genome will have no QTLs for a particular trait,
and therefore it is expected that markers on these region would have zero effects. The
Bayesian model can be represented as (Lorenz et al., 2011):
𝐺𝐸𝐵𝑉 = 𝑔(𝑋𝑖) = ∑ 𝑥𝑖𝑘𝛽𝑘𝛾𝑘
𝑝
𝑘=1
where 𝑔(𝑋𝑖) or GEBV is the sum of p marker effects, 𝑥𝑖𝑘 represents SNP score for
individual i at marker locus k, 𝛽𝑘 is the effect of marker k, and 𝛾𝑘 is an indicator variable
specifying the presence of marker k in the prediction model. The prior distribution of 𝛽𝑘
variances in BayesB are mixed such that 𝜎𝛽𝑘
2 = 0 with probability π and 𝜎𝛽𝑘
2 ~ 𝛸−2(𝑣, 𝑆)
with probability (1 − 𝜋). If 𝜋 = 0, the model becomes BayesA.
Lorenz et al. (2011) noted that reasonable values for the parameter π will be
unknown in the context of biological organisms. BayesCπ addresses this limitation by
estimating the parameter π itself, setting a uniform prior distribution between 0 and 1 for
the parameter. BayesCπ assumes the marker effect 𝛽𝑘 is zero when the indicator variable
𝛾𝑘 is also zero, and that that the prior variance for the effects of all markers for which 𝛾𝑘 =
1 is equal, or 𝛽𝑘 ~ 𝑁(0, �̂�𝛽2). This approach groups the markers into zero and non-zero
effects, from which estimates for marker effect variances are obtained. Bayesian Lasso
(Legarra et al., 2011) models the SNP effect 𝑎 as:
𝑝(𝑎|𝜎2, 𝛾) =𝛾
2𝑎exp [
−𝛾|𝑎|
𝜎]
![Page 44: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/44.jpg)
24
Kernel and Machine Learning Methods
Gianola and van Kaam (2008) were the first to apply reproducing kernel Hilbert
spaces (RKHS) regression for genomic selection by combining a classical additive genetic
model with a kernel function, shown as follows:
𝑌 = 𝜇 + 𝐾ℎ𝛼 + 𝑒
with prior distributions for marker effects 𝛼 ~ 𝑁(0, 𝐾ℎ𝜎𝛼2) and residuals 𝑒 ~ 𝑁(0, 𝐼𝜎𝑒
2).
The kernel matrix 𝐾ℎ is defined as:
𝐾ℎ(𝑥𝑖, 𝑥𝑗) = exp (−ℎ𝑑𝑖𝑗)
where 𝑑𝑖𝑗 is the squared Euclidean distance between individuals i and j derived from
marker genotypes and h is defined as 2/𝑑∗ where 𝑑∗ is the mean of the Euclidean distances.
This method was used by Neves et al. (2012) on mice populations.
Machine learning is being explored for massive amounts of information wherein
there is a need to mine knowledge from large, noisy, redundant, missing and fuzzy data,
extracting hidden relationships that exist in these huge volumes of data and do not follow
a particular parametric design (Gonzalez-Recio, 2010). An example of a machine learning
method is Random Forest, an ensemble learning method for classification (and regression)
that operates by constructing a multitude of decision trees at training time and outputting
the class that is the mode of the classes output by individual trees (Breiman, 2001).
Ensembles are combinations of different simple methods or models, resulting to very good
![Page 45: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/45.jpg)
25
predictive abilities compared to the individual models if used separately. Ensembles have
known statistics properties and have no prior assumptions similar to Bayesian methods.
Some of the advantages of Random Forest include not requiring specified inheritance
models (e.g. additive, dominance and epistasis), ability to capture more complex
interactions in the data, and reduction of error prediction by a factor of the number of trees
(Breiman, 2001).
Accuracy of Genomic Selection
The accuracy of genomic selection models is usually expressed in terms of Pearson
correlation coefficient of GEBV predicted by the model and the observed (empirical)
phenotypic data ((Storlie and Charmet, 2013), i.e. 𝑟(𝐺𝐸𝐵𝑉: 𝐸𝐵𝑉). Although a great
majority of researchers report genomic selection accuracy as 𝑟(𝐺𝐸𝐵𝑉: 𝐸𝐵𝑉), other
researchers such as Lorenz et al. (2011) argue that correlated error component generated
by 𝐺 × 𝐸 will be obtained for both GEBV and EBV if training and validation data are
collected in the same environment resulting to bias, i.e. overestimated prediction accuracy.
Therefore, correlation with the true breeding value 𝑟(𝐺𝐸𝐵𝑉: 𝑇𝐵𝑉) may be obtained by
using the assumption:
𝑟(𝐺𝐸𝐵𝑉: 𝐸𝐵𝑉) = 𝑟(𝐺𝐸𝐵𝑉: 𝑇𝐵𝑉) × 𝑟(𝐸𝐵𝑉: 𝑇𝐵𝑉)
which is true if the only component common between GEBV and EBV is the TBV.
Specifically, the training and validation data should be obtained from different
environments to satisfy the condition that residuals should be uncorrelated:
![Page 46: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/46.jpg)
26
𝐺𝐸𝐵𝑉 = 𝑇𝐵𝑉 + 𝑒1
𝐸𝐵𝑉 = 𝑇𝐵𝑉 + 𝑒2
The correlation 𝑟(𝐸𝐵𝑉: 𝑇𝐵𝑉) is equal to the square root of heritability within the validation
set.
Research Management of Plant Breeding Programs
Plant breeding research programs are becoming increasingly complex due to the
integration of allied sciences in addressing common social issues such as poverty and food
security. Plant breeders are now working much more closer with pathologists,
physiologists, social scientists, statisticians, geneticists, engineers and many others. It is
therefore crucial for plant breeders to be adept not just in genetics, breeding, statistics and
other technical skills, but also in the areas of leadership and management.
Knowledge, Experience and Skill Requirements from Plant Breeders
Repinski and co-workers (2011) discussed the expectations of various stakeholders
(public and private sectors, and institutes from developing countries) of plant breeders in
terms of critical knowledge, experience and skills. Knowledge in plant breeding, breeding
methodology, quantitative genetics, statistics and experimental design are highly required.
Equally important are knowledge in project management, which includes managing
personnel and budgets, establishing goals and timelines, and maintaining relationships
among multiple support teams within the organization and with external teams. Crucial
experiences include field know-how which includes data collection and analysis, writing
![Page 47: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/47.jpg)
27
scientific reports, mentorship, and oral presentations. Skills identified by the authors as
critical are leadership and teamwork. The multi-dimensional competencies required from
plant breeders have been proposed by Gepts and Hancock (2006) and Applegate (2002),
emphasizing the shift from purely research goals to a more inter-disciplinary approach.
Breeding Programs as Part of Meta-Organizations
Almost all private breeding programs are part of a meta-organization, in this case
the agriculture business organization. Large multinational companies such as Syngenta,
Monsanto, East-West Seeds, DuPont and Bayer all have breeding programs for crops that
fit their strategic direction. One common crop for these companies is hybrid corn.
Companies that also market a wide range of pesticides have breeding programs for rice and
a diverse range of vegetable crops. Monsanto’s seed business is largely based on biotech
traits, hence rice is not a good fit for their overall strategy. Private companies have large
support functions for allied sciences such as pathology, bioinformatics, statistics and other
fields. A substantial part of large projects are also outsourced such as building database
suites and analyses software, which are temporary undertakings. Logistics support such as
greenhouse and nursery teams, finance, purchase and procurement and legal teams provide
service to R&D teams and other teams as well. The core business functions responsible for
delivering products to the market include production, marketing and sales; there may be
variations among companies but the essential tasks are represented by these three business
functions.
![Page 48: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/48.jpg)
28
A typical breeding program consists of breeders and assistant breeders, support
scientists for pathology, molecular markers, doubled haploids and other technical areas,
program support for greenhouse and field management, field trial establishment and
maintenance, and database management. A full-fledged breeding program in the private
sector may have an annual funding of USD 250,000-500,000 (Bliss, 2006).
Structure of Research Organizations
Different research organizations have different approaches in organizing their R&D
function, the most common of which is the matrix organization (Galbraith, 1971). A matrix
organization uses teams of employees to accomplish work, in order to take advantage of
the strengths, as well as make up for the weaknesses, of functional and decentralized forms.
A matrix may exist within the meta-organization by grouping products as projects as shown
in Figure 2, and managing these projects across functions.
Figure 2. A diagram of a matrix organization with product delivery managed as projects
across functions.
![Page 49: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/49.jpg)
29
Within research organizations or R&D, matrix structure may also exist by
managing specific projects across research logistics service groups. For example, a
breeding organization that aims for resistance to bacterial leaf blight (BLB) in rice hybrids
may manage the project as follows:
1. Breeding crosses and maintenance of breeding populations are done through a team
in charge of field nurseries and hybridization work.
2. Sampling of leaf tissues and subsequent genotyping are accomplished through a
genotyping team.
3. Inbreds are screened for resistance to BLB by a pathology team.
4. Inbreds that are selected are submitted to a research seed production team for
creation of hybrids.
5. Hybrid seeds are handed over to a trialing team for evaluation in several locations
and in BLB hotspots.
6. Hybrids are also given to the pathology team for confirmation of BLB resistance.
Introducing Change into Breeding Organizations
Breeding technologies such as marker assisted selection have been introduced into
classical breeding programs in the past (Collard et al., 2008). These changes were largely
brought about by the maturity of the technology and numerous research works that confirm
the usefulness of MAS.
Kotter (2012) proposed the following steps in leading change, which have been
annotated in this work with examples from a plant breeding organization:
![Page 50: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/50.jpg)
30
1. Create a sense of urgency. The need to feed a predicted world population of 9 billion
in 2050 is an urgent scenario that must be addressed by all breeding programs in
public and private sectors. Among breeding programs in the private sector, the
urgency is expressed in preventing lost revenues and market share.
2. Build a guiding coalition. Kotter suggests to mobilize sponsors who are effective
people — coming from its own ranks — to guide, coordinate and communicate the
planned change. In plant breeding programs implementing genomic selection, a
guiding coalition may be composed of senior members of the group.
3. Form a strategic vision and initiatives. Breeding programs may target an additional
percent of increase of genetic gain and identify initiatives on how to attain the
vision such as implementing genomic selection and improving robustness of field
trialing and phenotyping.
4. Enlist a volunteer army. People who are open to new ideas are included in the early
stages of implementing new technologies. These individuals then become
champions for change and their stories serve to inspire buy-in from others.
5. Enable action by removing barriers. Implementation of change requires breeders’
substantial time and effort taken away from routine activities. A genomic selection
proof of concept allows the change being introduced to be managed as a project
separate from the whirlwind of everyday activities and provides focus to the
persons and teams involved.
6. Generate short-term wins. Gain in selection, resources saved and status of proof of
concept experiments must be collected and tracked to energize the team in pushing
the change forward.
![Page 51: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/51.jpg)
31
7. Sustain acceleration. Results from proof of concept experiments should be adapted
by the organization quickly to stay the course toward the vision of why genomic
selection helps attain that vision.
8. Institute change. Connections between adopting the change and organizational
success must be formally communicated and introduced to ensure that new
behaviors are repeated over the long term.
![Page 52: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/52.jpg)
32
CHAPTER 3
MATERIALS AND METHODS
Phenotyping and Phenotypic Analysis
Multi-location yield trials of 510 genotypes (experimental hybrids and checks)
were conducted from July 2013 to May 2015 as part of Syngenta’s trialing program,
consisting of 24,415 plots distributed across 332 locations. Since the trials were
incorporated into an established commercial breeding program, experimental design was
highly unbalanced as hybrids were differentially advanced and rejected during the two-
year trialing duration. Trials were conducted in randomized complete block design
(RCBD) with three replications.
Days to 50% flowering or DTF (in days after sowing) was measured according to
the Standard Evaluation System for rice (IRRI, 1980). Flowering date is a reliable measure
of crop maturity. Flowering usually occurs two weeks after heading, and the crop is ready
to harvest after another four weeks. Plant height (in cm) was measured from the base of
the plant to the tip of the primary panicle (IRRI, 1980) of pre-determined plants in the
harvest area of a plot. Plot yield (YLD) was obtained by harvesting the inner rows and
adjusting the weight to 14% moisture to compute for yield in kg per hectare.
Data were checked for quality by eliminating unexplained outliers in the
distributions visualized by running the datasets in JMP® software (SAS Institute). Cleaned
datasets were analyzed in R Version 3.2.4 (R Development Core Team, 2015) and best
linear unbiased predictors (BLUPs) for the traits including general combining ability
estimates which were computed using the R package lme4 (Bates et al., 2015).
![Page 53: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/53.jpg)
33
Genotyping
Preparation and Processing of Tissue Samples
Of the 214 parental lines of experimental hybrids, 122 were classified as elite lines
based on historical performance and usage in breeding crosses (not included in this study).
Hence, only these 122 lines were included in genotyping.
Leaf samples were collected from 21-d-old plants sown in Syngenta’s breeding
station in General Santos City, Philippines. Samples were freeze-dried for 48 hours using
a Virtis 12ES lyophilizer (SPS SCIENTIFIC, Gardiner, NY, USA), at −50 °C and 30.0
mTorr pressure and shipped to Syngenta’s high-density genotyping facility in Toulouse,
France. DNA was extracted using a sap (or juice) extractor (MEKU Erich Pollähne
G.m.b.H) and genomic fingerprints were generated using Syngenta’s proprietary 60K SNP
chip. The chip was built specifically for Syngenta’s rice germplasm and can assay 60,000
SNP loci.
Quality Filtering and Reformatting of SNP Markers
Quality filtering was applied to all 60,000 SNP markers. Markers with certainty of
<0.9 were not included. Markers with alleles having <0.05 frequency were also removed
from the dataset as they represent rare alleles which are not useful in genomic selection
although they are valuable in other types of analyses such as diversity and haplotype
analyses. About 43,000 high quality SNPs were selected.
Marker dataset was reformatted to a form required by prediction algorithms by
converting the SNP nucleotide calls into {-1,0,1} where “-1” represents the allele with
![Page 54: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/54.jpg)
34
lower frequency in the population, “1” represents the allele with the higher frequency, and
“0” refers to a heterozygote call. Naive imputation was used for missing SNP data for all
lines and markers. Naive imputation takes the mean score of the SNP markers per genotype
as values for the missing scores.
Estimation of Genetic Relationships
Relationships among parental lines were estimated using a realized relationship
matrix proposed by VanRaden (2008) for dairy cattle. A more detailed description on the
use of this matrix is discussed in the Results section. Principal component analysis (PCA)
explaining the genetic relationships was implemented in JMP® to assess the population
structure.
Implementing Genomic Selection Models
Genomic selection was implemented using various statistical models: Ridge
Regression Best Linear Unbiased Prediction (RR-BLUP), Bayesian Ridge Regression
(BRR), BayesCPi and Bayesian LASSO (BL). The rrBLUP package developed by
Endelman (2011) was used to estimate marker effects and breeding values using ridge
regression and genomic BLUPs (GBLUP).
Matrix algebra functions were used to obtain genetic and error variances (𝜎𝑔2 and
𝜎𝑒2) in training populations sampled or simulated from the dataset generated. The shrinkage
parameter 𝜎𝑒2 𝜎𝑔
2⁄ was included in the mixed model to estimate the marker effects.
![Page 55: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/55.jpg)
35
Bayesian methods were run as single chains of 2,000 iterations using BGLR
package (de los Campos and Perez, 2013) with the first 1,000 runs discarded as burn-ins.
Three variations of the models were implemented: Bayesian Ridge Regression (BayesRR),
Bayesian Cπ (BayesCPi) and Bayesian Lasso (BayesL). Descriptions of these models are
discussed in the Results section.
Marker effects computed from the models were used to predict the estimated
genetic values in the validation populations sampled from the dataset. The GEBV
prediction model is
𝐺𝐸𝐵𝑉 = 𝑀�̂�
where 𝑀 is the marker matrix and �̂� are the estimated values for marker effects.
Design of Training and Validation Populations
To validate the accuracy of GEBV, the dataset of phenotypic values were divided
into training sets and validation sets. Factors identified in the design of training and
validation populations are population size and prediction model. Training population size
was varied by performing two different cross validation procedures. A 90% training
population size is where 90% of the population is assigned to the training set and the
remaining 10% is assigned to the validation set.
![Page 56: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/56.jpg)
36
Procedure for Cross Validation
Repeated ten-fold and three-fold cross validation (Fig. 3) were performed on 122
lines with available phenotypic and genotypic data. These folds represent 90% and 67%
training sets, respectively. The parental lines dataset was split into 𝑛 partitions equal to
number of desired cross-validation folds. Statistical models were implemented on 𝑛 − 1
partitions to create the prediction models which were then applied on the remaining 𝑛 −
(𝑛 − 1) partition. Pearson’s correlation was determined between the 𝑛 − 1 training set and
𝑛 − (𝑛 − 1) validation set for each round of cross validation and the average correlations
were determined to obtain the prediction accuracy of the model.
Cross validation was performed for every genomic selection model implemented
in rrBLUP and BGLR packages. The sample R code below performs a ten-fold cross
validation for 122 lines, each partition with 12 lines.
for(i in 1:10)
{
yldTrain <- yldShuff
yldTrain[count, 2] <- NA
modelBRR <- BGLR(y=yldTrain[,2], ETA=ETA, burnIn = 1000,
nIter=2000, verbose=FALSE)
BRRGebvs <- modelBRR$yHat[count]
correl[i] <- cor(BRRGebvs, yldShuff[count, 2])
tf.brr[count,] <- BRRGebvs
if(i<10) count = count + 12 else count = count + 13
print(correl[i])
}
![Page 57: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/57.jpg)
37
Figure 3. Cross validation procedure for ten-fold and three-fold schemes, representing 0.9
and 0.667 proportions of training population size. Each partition was
successively used as validation population from the prediction model derived
from the training population.
Comparison of Prediction Accuracies
As discussed, accuracy may be defined as the correlation between the empirical
breeding values (EBV) or observed values and the genomic estimated breeding values
(GEBV) in the validation set for each training population design. Accuracies were
compared for each set of means for genomic selection model, training population size, and
trait. Comparison of means was implemented in JMP®. Correlations between pairs of
genomic selection models were obtained from multivariate analysis and were also
implemented in JMP®.
![Page 58: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/58.jpg)
38
Optimization of Genomic Selection Parameters
Average correlations per cross validation were taken as genomic selection
accuracies. The lowest and highest trait heritabilities were used in place of nominal traits
so that quantitative optimization can be performed at least for two factors. These accuracies
were obtained from a 2x2x4 full factorial design (Table 1).
Table 1. Full factorial design for optimization of genomic selection parameters.
FACTOR NO. OF LEVELS LEVELS
Heritability
2
0.3130
0.5486
Training population size 2 0.90
0.667
Genomic selection model 4 rrBLUP
BayesRR
BayesCPi
BayesL
Optimizations, construction of model, and determination of contributions of main
effects and interactions were implemented in JMP® using a generalized linear model
(GLM) to predict genomic selection accuracy values with varying heritability, training
population size and genomic selection model. Prediction profiles were generated and
recommendations were generalized for the dataset used in the study.
![Page 59: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/59.jpg)
39
Creating a Genomic Selection Project Proposal
Using a research management perspective, a breeding program using genomic
selection was simulated and proposed as a project for implementation. To compare, a
breeding program without genomic selection was used as baseline. The project proposal
was based on the initial problem of low rate of genetic gain, from which a number of root
causes and ensuing effects were identified. Root causes related to management of a
research organization were identified and addressed using a project approach to integrate
genomic selection.
Declaration of Research Funding and
Non Conflict of Interest
This research study was funded by Syngenta. The author declares no conflict of
interest and that this research was solely for academic purposes in support of Syngenta’s
strategy to develop its employees. While Syngenta data was used, no confidential
information, such as pedigrees, DNA fingerprints, SNP marker names, genomic prediction
models and breeding strategies, was released. This study does not in any way reflect
Syngenta’s breeding strategies, as methods used in this study are publicly available.
![Page 60: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/60.jpg)
40
CHAPTER 4
RESULTS AND DISCUSSION
Quality of Field Trial Data
By-Location Coefficients of Variation
In any field trial, partitioning of the variance will always include residuals as the
error term. Location errors can be minimized by statistical design such as replications and
accounting for spatial trends (Gomez and Gomez, 1984). Extraneous errors, which are due
to procedures associated with conducting the experiment such as fertilizer application,
harvesting method and measurement method, can be controlled by improvement of
experimental protocols. Errors can be further minimized by removing unexplained outliers
from a field trial dataset, a method commonly referred to as data quality control.
The phenotypic dataset included 332 locations after discarding locations with more
than 20% coefficient of variation for yield after cross referencing the location observations
from the researchers involved. Valid conditions for high CV included disease pressure,
drought, lodging and pest damage. A few locations with CV>20% were included after no
apparent factors were validated that would explain such CV.
Figure 4 shows box plots of CVs of the locations grouped into seasons. Dry season
location CVs are slightly lower than CVs of wet season locations as shown in the
comparison of means. This aligns well with experience that dry season trials are less
exposed to diseases and heavy rains that would induce lodging, and hence would have less
experimental errors.
![Page 61: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/61.jpg)
41
Figure 4. Box plots of coefficients of variation of 332 locations grouped into
seasons. Connecting letter report was derived from comparison of
means using Student’s t-test, α=0.05.
Distributions of Yield, Days to 50% Flowering and Plant Height
The identification of best performing genotypes for the population of environments
or groups of environments is not within the scope of this dissertation, hence analyses of
genotype performance in comparison to checks were not performed. Distributions of trait
measurements were obtained for each location and outliers were eliminated before
performing BLUP analysis. A snapshot of trait distributions across 332 locations is
presented in Figure 5. Examination and, if necessary, elimination of outliers were done on
a per location basis due to the varying location means. Figure 6 shows the distribution of
traits per season. Visual examination of the histograms suggests that observations were
drawn from the normal distribution, which is a characteristic of most biological datasets.
![Page 62: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/62.jpg)
42
Figure 5. Trait x location box plots used in checking location data for quality showing the
spread of data points per location. Unexplained data points in each location were
discarded.
Figure 6. Distributions of yield, days to flowering (DTF) and plant height across locations
planted in wet and dry seasons.
![Page 63: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/63.jpg)
43
Distribution of Hybrids Across Locations
Not all 510 hybrids and checks were planted in every location, hence the data is
highly unbalanced, which can be visualized in the heat maps in Figure 7. The heat maps
also show the distribution of male and female parents on how these were used as parental
lines in the hybrids tested in 332 locations. The combined analysis in succeeding
discussions considered the whole set of locations as the subset of the target population of
environments, hence environmental variance with regard to locations was assumed to be
zero. This does not mean however that there are no differences among locations. In terms
of plant breeding practice, the whole set of locations was held as an orthogonal set. This
assumption will be reinforced by Best Linear Unbiased Prediction (BLUP) analysis, which
predicts the performance of hybrids in similar locations not included in the field trials.
Analysis of Multiple Locations
The highly unbalanced data structure requires prediction of genotype performance
to obtain adjusted means. For this requirement, several steps were taken to obtain BLUPs
for yield, days to flowering and plant height. Main effects and interactions (location,
season, location x season, replication, genotype, male parent, female parent, female x
male, genotype x location, genotype x season, and genotype x location x season) were
fitted in a linear mixed model that also includes effects necessary to derive general
combining ability (GCA) for the traits being analyzed.
![Page 64: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/64.jpg)
44
Figure 7. Heat map showing distribution of genotypes across locations. Male and female
parents are also shown, as represented by hybrid progenies and not as tested
genotypes.
Variance Components and Computed Trait Heritabilities
Variance components were assessed to partition genotypic, environmental and
genotype x environment variances from which trait heritabilities were computed (Table 2).
The general mixed model equation fitted by Residual Maximum Likelihood (REML) using
lme4 package (Bates et al., 2015) in R was of the form:
𝑡𝑟𝑎𝑖𝑡 ~ (1 | 𝑙𝑜𝑐𝑎𝑡𝑖𝑜𝑛) + (1 | 𝑠𝑒𝑎𝑠𝑜𝑛) + (1 | 𝑙𝑜𝑐𝑎𝑡𝑖𝑜𝑛: 𝑠𝑒𝑎𝑠𝑜𝑛) + (1 | 𝑟𝑒𝑝) + (1 | 𝑔𝑒𝑛𝑜𝑡𝑦𝑝𝑒)+ (1 | 𝑚𝑎𝑙𝑒) + (1 | 𝑓𝑒𝑚𝑎𝑙𝑒) + (1 | 𝑓𝑒𝑚𝑎𝑙𝑒: 𝑚𝑎𝑙𝑒) + (1 | 𝑔𝑒𝑛𝑜𝑡𝑦𝑝𝑒: 𝑙𝑜𝑐𝑎𝑡𝑖𝑜𝑛) + (1 | 𝑔𝑒𝑛𝑜𝑡𝑦𝑝𝑒: 𝑠𝑒𝑎𝑠𝑜𝑛) + (1 | 𝑔𝑒𝑛𝑜𝑡𝑦𝑝𝑒: 𝑙𝑜𝑐𝑎𝑡𝑖𝑜𝑛: 𝑠𝑒𝑎𝑠𝑜𝑛)
![Page 65: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/65.jpg)
45
Table 2. Variance components of the phenotypes yield, days to 50% flowering and plant
height derived from REML-fitted linear mixed models, and computed
heritabilities. Variance components, except genotypic variance, were divided by
the number of levels per source of variation.
SOURCE OF
VARIATION
YIELD
(H=0.313)
DTF
(H=0.5036)
PLANT HEIGHT
(H=0.5486)
df Variance df Variance df Variance
Environment
Location (L)
Season (S)
L X S
Reps (Environment)
Genotype (G)
Male
Female
Female X Male
G X L
G X S
G X S X L
Pooled Error
331
1
331
2
509
167
47
440
7939
567
7939
1566097.2
747332.3
66760.4
664.8
318181.7
24477.4
10242.9
13662.8
188247.8
9470.3
273579.1
227158.9
327
1
327
2
509
167
47
440
7857
567
7857
25.635899
0.215002
1.416065
0.001114
10.164752
2.196878
2.860957
0.505497
1.924406
2.582941
3.380668
2.129475
330
1
330
2
509
167
47
440
8025
567
8025
40.413
7.648
15.829
0.001
45.501
9.775
18.56
0.001
10.558
3.837
1.775
21.272
The variance components suggest the presence of genotype x environment (GxE)
interaction, the environment being different locations or seasons. Incorporation of GxE into
the prediction model is not within the scope of this study; hence the overall genomic
prediction analysis was done in two stages: obtaining adjusted means for each genotype
across all environments, and fitting the adjusted values into the prediction model.
Within locations, presence of replication variance components in yield variance
suggests the presence of local variation. Experimental errors which are location parameters
can be minimized by blocking (Gomez and Gomez, 1984). Blocking creates homogeneous
partitions in the field in which the nuisance factors are held constant to increase the
detection of variation in the factor of interest. In the design used in this study, blocking was
implemented on paddy fields separated by bunds or levees. Blocking was done primarily
![Page 66: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/66.jpg)
46
to counteract variation on fertilization and irrigation among paddy fields. Rows and ranges
were not recorded hence spatial adjustment cannot be performed. Replication did not
significantly contribute to DTF and plant height variance, hence was not included in the
BLUP model for these traits.
Heritabilities were computed from genotype GxE variances and pooled error
variance. Environmental variances were excluded in the heritability equation because these
are considered orthogonal in the context of plant breeding, i.e. the genotypes are tested in
similar environments. The equation for heritability is
𝐻 = 𝜎𝑔
2
𝜎𝑔2 +
𝜎𝑔𝑙2
𝐿 + 𝜎𝑔𝑠
2
𝑆 + 𝜎𝑔𝑙𝑠
2
𝐿𝑆 + 𝜎𝑒
2
𝑛
where 𝜎𝑔2 is the genotypic variance, 𝜎𝑔𝑙
2 is the genotype x location variance, 𝜎𝑔𝑠2 is the
genotype x season variance, 𝜎𝑔𝑙𝑠2 is the genotype x location x season variance, and 𝜎𝑒
2 is
the pooled error variance.
Deriving BLUPs from Linear Mixed Models
Applications of best linear unbiased predictions have been extensively reviewed by
Robinson (1991), Piepho et al. (2007) and others. In this study, BLUPs are applied to
predict genotype performance as adjusted trait measurements from unbalanced data
(Bernardo, 1995), i.e. genotypes are not planted in all locations, years, seasons and other
combinations of environments.
![Page 67: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/67.jpg)
47
All variables were assigned as random effects. The ranef function (e.g.
yldr<ranef(yldblupmodel) ) of the lme4 package extracts the conditional modes of the
random variables.
Shrinkage Toward the Mean
BLUP adjustments shrinked the yield towards the analysis mean, which is
consistent with the BLUP concept (Robinson, 1991) and empirical results on phenotypic
BLUPs (Bernardo, 1996a and 1996b; Piepho et al., 2007) and marker-based BLUPs
(Crossa et al., 2010).
The shrinkage mean or BLUP mean for yield and other response variables for a
given level of a random factor (years, location, etc.) is a weighted combination of the
analysis mean, based on the fixed effects, and the ordinary mean for the level of the random
factor. The variety means are calculated as:
𝑦�̅� =∑ 𝑌𝑖𝑗
𝑛𝑗=1
𝑛
for 𝑛 replications for a single location. The variety effect can be calculated by subtracting
the overall mean �̅� from the variety mean:
𝜏�̂� =∑ (𝑌𝑖𝑗 − �̅�)𝑛
𝑗=1
𝑛
![Page 68: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/68.jpg)
48
which corresponds to the Best Linear Unbiased Estimator or BLUE (Gilmour, 2010).
BLUEs estimate the posterior conditions of a field trial but it is not the best predictor of
future performance. Location error variance and genotypic variance are taken from
expected mean square computations and can be represented by the ratio:
𝛾 =𝜎𝑔
2
𝜎𝑒2
and this ratio is incorporated into the BLUE equation above to become:
𝜏�̃� =∑ (𝑌𝑖𝑗 − �̅�)𝑛
𝑗=1
𝑛 +1𝛾
=∑ (𝑌𝑖𝑗 − �̅�)𝑛
𝑗=1
𝑛 +𝜎𝑒
2
𝜎𝑔2
The predictor 𝜏�̃� now becomes the BLUP. The variance ratio shrinks the treatment effects
when added into the denominator. BLUPs are more likely to represent future results
(Gilmour, 2010) and are more appropriate for two-stage genomic prediction approaches
such as the method used in this study. Figure 8 exhibits the shrinkage (adjusted) mean
overlaid on the observed mean.
Adjustments are therefore less if the genetic variance is significantly more than the
error variance, which again reinforces the importance of reducing the error term through
appropriate experimental designs, plotsmanship and minimizing introduction of extraneous
variability into the trials. BLUPs can theoretically approximate the BLUE values if the
error variance approaches zero.
![Page 69: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/69.jpg)
49
Figure 8. Observed and adjusted values (BLUPs) for yield, days to 50% flowering and
plant height of 510 genotypes showing shrinkage of adjusted means toward the
analysis mean.
Deriving General Combining Ability
Hybrid crop breeding only makes sense in the context of cycle-over-cycle genetic
gain if general combining ability is the main criterion in using inbreds. In reciprocal
population improvement, new inbreds are testcrossed to testers from a complementing
pool. The testcross performance of the inbreds with the testers is generally interpreted as
the GCA and inbreds with the highest GCAs are advanced or promoted and used in the
next breeding cycle. This procedure is repeated; hence it is also called reciprocal recurrent
selection and inter-population improvement, and is a basic concept in any hybrid breeding
program.
![Page 70: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/70.jpg)
50
This study uses GCA of 122 parental lines for yield, DTF and plant height as
phenotypic values for the prediction models. GCA was derived from the trait BLUP models
using the following scripts:
yldr <- ranef(yldblupmodel)
yldfgca <- yldr$female #(Yield GCAs of females)
yldmgca <- yldr$male #(Yield GCAs of males)
dtfr <- ranef(dtfblupmodel)
dtffgca <- dtfr$female #(DTF GCAs of females)
dtfmgca <- dtfr$male #(DTF GCAs of males)
plthtr <- ranef(plthtblupmodel)
plthtfgca <- plthtr$female #(Plant height GCAs of females)
plthtmgca <- plthtr$male #(Plant height GCAs of males)
Figure 9 plots the GCAs of the parental lines for the three traits as positive or
negative values which are interpreted as the average contribution of the parental line to
hybrid performance. Since the models also specify that the males and females are random
effects, this assumption can be extended to the set inbreds from which the set of 122
parental lines are drawn, i.e. a breeding program. Hence, GCA is useful because it is
predictive of the success of a hybrid breeding program.
If this study is a breeding program, the breeder would typically select those with
positive GCA values for yield from the set of inbreds. Depending on the product profile of
a breeding program’s target environment, the breeder can also select parental lines that can
contribute early or late maturity, and short or tall plant height.
![Page 71: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/71.jpg)
51
Figure 9. General combining ability (GCA) for yield (kg), DTF (days) and
plant height (cm) of 122 parental lines of rice hybrids.
Throughout this manuscript, the terms “yield,” “days to flowering,” and “plant
height” refer to the general combining abilities for these traits.
Marker Coverage and Population Structure
The top 122 parental lines (41 females and 81 males) representing elite lines were
pre-selected for genotyping using a chip-based platform with 60,000 single nucleotide
polymorphisms (SNPs). These lines were represented by several seed sources some of
which have been genotyped previously. There were no lines with significant non-
concordance of allele calls, indicating very low rate of technical error, hence no lines were
![Page 72: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/72.jpg)
52
discarded. Pre-processing of 60,000 SNP marker alleles on the samples resulted to
retention of 43,344 markers which are co-dominant and without rare alleles (<5% allele
frequency).
Descriptive Statistics on SNP Marker Data
The 43,344 SNP loci used in the study are distributed throughout the twelve rice
chromosomes. The marker x genotype dataset wherein genotypes are assigned as column
names were transposed as a requirement of the marker matrix needed in the succeeding
analysis steps, a procedure done in R. Matrices in R can accommodate thousands of
columns and limitations are usually set by the computing power of the computer
performing the calculations. Recoding the allele calls to numerical scores {-1,0,1} was also
implemented.
The distribution of SNP scores {-1,0,1} was taken from the marker matrix to
visualize the frequencies of allele calls. Frequency of heterozygotes and imputed calls is
not significant (Fig. 10). Naive imputation substitutes missing data with the mean value for
the locus. This resulted to marker scores between the {-1,0,1} values, although the imputed
values are negligible in the overall data structure. Accuracy of prediction models can be
negatively impacted by heterozygote calls as well as excessive missing and imputed marker
data. Marker linkage position and LD information, if available, can be used to impute the
actual {-1,0,1} values of missing marker data.
![Page 73: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/73.jpg)
53
Figure 10. Distribution of {-1,0,1} allele calls in the marker matrix derived
from scores from 43,344 SNP loci.
Genomic Relationships and Principal Components
Additive relationship matrices play an important role in the prediction of breeding
values. The genetic merit of additive relationship matrices is in the infinitesimal model
wherein breeding value is considered to be the sum of thousands of allele effects. In the
classic infinitesimal model, Fisher (1918) postulated that a quantitative trait is controlled
by an infinite number of loci and each locus has an infinitely small effect. Large numbers
of markers with whole genome coverage can capture genetic similarity with more accuracy
than pedigree-based relationships because the genetic covariances would be based on the
actual proportion of the genome that is identical by descent between any two individuals
(Van-Arendonk et al., 1994). VanRaden (2008) also proposed that whole genome markers
can estimate the proportion of chromosome segments shared by individuals including
identification of genes that are identical in state.
![Page 74: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/74.jpg)
54
The 𝑔𝑒𝑛𝑜𝑡𝑦𝑝𝑒 𝑥 𝑚𝑎𝑟𝑘𝑒𝑟 matrix M with recoded values {-1,0,1} was transposed
to M’ and the two matrices were multiplied to obtain the MM’ matrix, as illustrated in the
following example:
𝑀𝑀′ = [1 0 −10 0 01 1 −1
] [1 0 10 0 1
−1 0 −1] = [
2 0 20 0 02 0 3
] 𝑓𝑜𝑟 Inbred AInbred BInbred C
In the MM’ product of the two matrices, the diagonal values count the number of
homozygous loci for each inbred. In the example above, Inbred A has two homozygous
loci, Inbred B has none and Inbred C has three homozygous loci. Off-diagonals count the
number of alleles shared by the inbreds. Inbreds A and C share two homozygous loci, while
none is shared between A-B and B-C.
The matrix was then centered and scaled so that rarer alleles are given more weight
and to standardize the mean of the diagonal elements to 1 + 𝑓, where 𝑓 is the inbreeding
coefficient. The rrBLUP function 𝐴. 𝑚𝑎𝑡 returns an additive relationship matrix based on
the above principles. Figure 11 shows a color map of the realized relationship matrix of
122 parental lines.
Eigenvectors of the relationship matrix were calculated in JMP® to generate
principal components. Figure 12 summarizes and plots the principal components that
correspond to the main clusters in the realized relationship matrix heat map.
The first two principal components already explain most of the variance in the
population structure. Population structure agrees with the existing pedigree structure (not
used in this study). The three clusters represent three genotype groups – one group
consisting mostly of CMS lines and two groups which are mostly restorer lines.
![Page 75: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/75.jpg)
55
Figure 11. Heat map of realized relationship matrix of 122 parental lines showing three
main clusters representing one female and two male clusters.
![Page 76: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/76.jpg)
56
Figure 12. Principal component analysis using marker data: (a) highest two principal
components in 2D plot, (b) highest three principal components in 3D plot, (c)
scree plot showing magnitude of eigenvalues and variance explained by the
principal components, and (d) summary table of eigenvalues.
Evaluation of Genomic Prediction Methods
This work is one of the first genomic selection studies in rice along with Spindel et
al. (2015) and Grenier et al. (2015), and arguably the first in hybrid rice. Data collected
from more than 24,000 plots over four seasons of field trialing in more than 300 locations,
![Page 77: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/77.jpg)
57
were used in exploring suitability of genomic selection. Three traits with varying
heritabilities were predicted using four genomic selection models. Training population size
were varied in the cross validation procedure by differential partitioning of the datasets,
i.e. three-fold and ten-fold cross validation procedures correspond to 2/3 and 9/10 training
population sizes, respectively.
Since the study used SNP chips and also realizing that SNP chips and other fixed
platforms are becoming more common and more affordable, marker density was not held
as a variable. Marker density optimizations are applicable only to flexible marker platforms
but these are relatively more expensive than fixed platforms.
Genomic BLUP (GBLUP) and Ridge Regression
Genomic Estimated Breeding Values (GEBVs) can be estimated by ridge
regression directly relating the imputed marker matrix to the phenotype using the mixed
model function in rrBLUP:
ridgeyld<-mixed.solve (y=phenoyldgca$yldgca, Z=genoimputed)
Another method, GBLUP, uses genomic relationship matrix through the kinship
BLUP function (kin.blup) of rrBLUP package, instead of the imputed marker matrix:
gblupyld<-kin.blup(data=phenoyldgca,geno='parent',pheno='yldgca', K=G)
Using RR-BLUP as genomic selection model, the accuracies of the two methods
for three traits were compared (Figure 13; Table 3). Results indicate that the two methods
![Page 78: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/78.jpg)
58
provide almost similar accuracies. It was earlier discussed that the realized genomic
relationships used in GBLUP was directly derived from the marker matrix.
Table 3. LSD threshold matrix comparing prediction accuracy means of GBLUP and
Ridge Regression for GCA of 122 rice parental lines for three traits.
YIELD DTF PLANT HEIGHT
GBLUP Ridge
Regression
GBLUP Ridge
Regression
GBLUP Ridge
Regression
GBLUP
-0.18265
-0.18265
-0.14099
-0.14099
-0.12086
-0.12086
Ridge
Regression
-0.18265 -0.18265 -0.14099 -0.14099 -0.12086 -0.12086
Positive values show pairs of means that are significantly different.
Figure 13. Correlation of GEBVs in RR-BLUP using marker data directly (Ridge
Regression) and genomic relationships (GBLUP).
![Page 79: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/79.jpg)
59
Due to these results, genomic relationships were used in succeeding analyses
instead of marker matrix directly. The realized relationship matrix is a 122 x 122 matrix
corresponding to the number of lines in both the rows and columns, while the marker matrix
is a large 122 x 43,344 matrix corresponding to the number of lines in the rows and the
number of SNP loci in the columns.
Effect of Trait Heritability on Prediction Accuracy
Genomic selection accuracy is affected by the heritability of the trait (Lorenz et al.,
2011; Bernardo, 2010; Asoro et al., 2011). First, low heritability will be reflected by the
data collected from field trials which is used to train the genomic selection model. Second,
heritability is commonly utilized to account for the unknown true breeding value (TBV),
as what is observed in the field is the empirical breeding value (EBV). The correlation of
GEBV to EBV is divided by the square root of the heritability to relate GEBV to TBV:
𝛾𝐺𝐸𝐵𝑉,𝑇𝐵𝑉 =
𝛾𝐺𝐸𝐵𝑉,𝐸𝐵𝑉𝛾𝐸𝐵𝑉,𝑇𝐵𝑉
= 𝛾𝐺𝐸𝐵𝑉,𝐸𝐵𝑉
√𝐻
The effect of predicted trait on genomic selection accuracy is a function of the trait
heritability. The computed heritabilities of the traits are presented in Table 2. Heritability
was computed as genotypic variance divided by the total phenotypic variance, excluding
variance due to location. The yield trial was assumed to be conducted in an orthogonal set
of locations. Figure 14 illustrates the variability chart for the prediction accuracy with the
![Page 80: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/80.jpg)
60
predicted trait as the main variable group, showing that the trait with the lowest heritability,
yield, has the lowest prediction accuracy.
The variability plots of correlations of GEBV and EBV versus cross validation
method across genomic selection models and traits indicate that prediction accuracies are
generally lower in the three-fold method, except for Bayesian LASSO. The variability
graph also indicates that Bayesian LASSO has the lowest correlation except for 10-fold
validation on plant height. Yield again has the lowest prediction accuracy. The preceding
variability charts also show that traits with low heritability are predicted less accurately.
Prediction accuracy means for the three traits were compared using Tukey’s HSD
test (Tukey, 1949) at α=0.05 (Figure 15; Table 4). Prediction accuracies for the traits differ
significantly, although the effect is not contributed exclusively by trait alone. Yield, having
the lowest heritability, has the lowest prediction accuracy. Heritabilities of DTF and plant
height are similar by plant breeding standards, but their prediction accuracies significantly
differ. This suggests differences in trait architecture such as number of QTL involved and
the magnitude of the effects of each QTL.
![Page 81: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/81.jpg)
61
Figure 14. Variability chart of prediction accuracies per trait in 122 rice parental
lines. First and second level factors were interchanged between the two
graphs.
![Page 82: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/82.jpg)
62
Figure 15. Box plots of prediction accuracy per trait in 122 rice parental lines and
comparison circles based on Tukey’s honest significant difference test.
Table 4. HSD threshold matrix of genomic selection accuracy means between all pairs
of traits.
PLANT HEIGHT DTF YIELD
Plant height
-0.07295
0.07926
0.20425
DTF 0.07926 -0.07295 0.05204
Yield
0.20425 0.05204 -0.07295
Positive values show pairs of means that are significantly different.
Effect of Training Population Size on Prediction Accuracy
Training population size was varied by using two cross validation methods. Ten-
fold cross validation means dividing the population into ten parts and using nine parts as
training set and the tenth part as validation set, and performing ten rounds of cross
validation using different parts as validation sets. This therefore corresponds to a training
population size of 108 lines, or 90% of total number of lines. Three-fold cross validation
![Page 83: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/83.jpg)
63
method can also be interpreted as using two-thirds of the total number of lines as training
set. The variability chart in Figure 16 shows that a ten-fold cross validation generally has
greater prediction accuracy than a three-fold method. These results on training population
size are consistent with the findings of other researchers (Asoro et al., 2011; Hickey et al.,
2014).
The training population used so far in this study is composed of mixed
subpopulations, wherein the subpopulations or clusters were described previously. The
presence of subpopulations in a breeding germplasm may occur based on breeding history
i.e. frequency of use of a few elite lines as parents of breeding crosses. This is especially
true in hybrid breeding programs utilizing heterotic pools wherein heterotic pools represent
subpopulations.
Mixed subpopulation training sets have been used in cattle (Hayes et al., 2009)
where mixed-breed Jersey and Holstein populations were used to predict purebred Jersey
or Holstein individuals with similar accuracies to within-breed predictions.
Figure 17 shows the comparison of means between the cross validation methods or
training population size using Tukey’s HSD test at α=0.05. The HSD threshold matrix is
given in Table 5. Across all traits and prediction methods, training population size is
significantly different.
![Page 84: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/84.jpg)
64
Figure 16. Variability chart of prediction accuracy per cross validation method. The
first and second level factors were interchanged between charts.
![Page 85: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/85.jpg)
65
Figure 17. Box plots of prediction accuracy per cross validation method (training
population size) and comparison circles based on Tukey’s honest significant
difference test.
Table 5. HSD threshold matrix of prediction accuracy means between two cross
validation methods or training population size.
10-FOLD 3-FOLD
10-fold
3-fold
-0.04806
0.03558
0.03558
-0.08775
Positive values show pairs of means that are significantly different.
Effect of Genomic Selection Model on Prediction Accuracy
Genomic selection models used in this study were selected based on the reported
accuracies and usefulness of various models in literature. Most studies report the usefulness
of RR-BLUP and Bayesian methods (Heffner et al., 2009; Lorenz et al., 2011). Habier et
al. (2007) showed that RR-BLUP modelled genetic relationships more accurately because
it fitted more markers into the model than Bayesian methods, although Bayesian methods
![Page 86: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/86.jpg)
66
were able to consider marker-QTL association into the model. Hayes et al., (2009),
Lorenzana and Bernardo (2009), Moser et al. (2009) and VanRaden et al. (2009)
demonstrated that prediction models that assume many loci evenly distributed in the
genome (e.g. RR-BLUP) have similar prediction accuracies as methods that assume fewer
loci but with varying effects (e.g. Bayesian). In some cases, RR-BLUP models are even
more accurate than Bayesian models (Lorenzana and Bernardo, 2009). Lorenz et al., (2011)
attributes this to the fact that the genetic architecture of complex traits is more likely
aligned with the infinitesimal model rather than the model of few dozen loci with varying
effects.
Figure 18 illustrates a variability chart of prediction accuracy per prediction model.
It is apparent that there is no general trend among the prediction models. A comparison of
means (Fig. 19; Table 6) confirms this observation.
It should be noted at this point that the means used to compare genomic selection
models are overall means across traits and training population size. Different trait
heritabilities may require different genomic selection models and different training
population sizes. An optimization of genomic selection parameters may be able to provide
settings for desirable accuracy versus genomic selection model, and versus cost which is
largely coming from phenotyping of training population. A large training population size
is one of the resource intensive activities of genomic selection, hence an optimum
prediction accuracy with respect to training population size is recommended.
![Page 87: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/87.jpg)
67
Figure 18. Variability chart of prediction accuracy per genomic selection model. The
first and second level factors were interchanged between charts.
![Page 88: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/88.jpg)
68
Figure 19. Box plots of prediction accuracy per genomic selection model and
comparison circles based on Tukey’s honest significant difference
test.
Table 6. HSD threshold matrix among prediction accuracy means between all pairs of
genomic selection models.
RR-BLUP BAYESCPI BAYESRR BAYESL
RR-BLUP
BayesCPi
BayesRR
BL
-0.11330
-0.10805
-0.10515
-0.03876
-0.10805
-0.11330
-0.11040
-0.04401
-0.10515
-0.11040
-0.11330
-0.04691
-0.03876
-0.04401
-0.04691
-0.11330
Positive values show pairs of means that are significantly different.
![Page 89: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/89.jpg)
69
Correlations of the Different Genomic Selection Models
Almost all studies report varying correlations of genomic selection models (Asoro
et al., 2011; Lorenz et al., 2011; Bernardo and Yu, 2007; Heffner et al., 2009). This is
mainly because of the unique properties of breeding programs and crops on which genomic
selection studies are performed. Breeding programs and consequently populations for
genomic selection and resulting data may differ in trait heritabilities (Bernardo, 2010),
population structure (Hickey et al., 2014), marker density (Miewissen et al., 2001) and
many other factors. However, general trends may be similar.
The correlations among genomic prediction models reported in this study may only
be valid for the dataset used and by extension the larger population from which the dataset
is drawn. Correlations are given in Figure 20. RR-BLUP, BayesCPi and BayesRR are
highly correlated (>0.9). Spearman’s ρ test (Spearman, 1904) was performed and is given
in Table 7. Spearman’s ρ is a nonparametric measure of statistical dependence between
two variables by fitting the relationship in a monotonic function.
Table 7. Spearman’s rank correlation coefficient between pairs of genomic selection
models across all traits.
VARIABLE 1 VARIABLE 2 SPEARMAN’S ρ Prob>|ρ|
BayesRR
BayesCPi
BayesCPi
BayesL
BayesL
RR-BLUP
RR-BLUP
BayesRR
RR-BLUP
BayesRR
0.9757
0.9816
0.9721
0.6844
0.6654
<.0001*
<.0001*
<.0001*
<.0001*
<.0001*
![Page 90: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/90.jpg)
70
Figure 20. Scatterplot matrices of correlations between pairs of genomic selection models
for all traits and each trait individually.
![Page 91: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/91.jpg)
71
Population Structure as Covariate
The presence of population structure may confound prediction accuracies (Hayes
et al., 2009) and genome-wide association mapping or GWAS (Zhao et al., 2011). Several
methods have been able to incorporate population structure in genomic selection. The most
common but least accurate is incorporating the categorical grouping into the first stage
phenotypic linear mixed model as fixed effects as shown in the equation:
𝑌 = (1|𝑔) + (1|𝑙) + (1|𝑠) + (1|𝑟𝑒𝑝) + (1|𝑔: 𝑙) + (1|𝑔: 𝑠) + (1|𝑔: 𝑙: 𝑠) + 𝑠𝑢𝑏𝑝𝑜𝑝
Eigenvectors of the principal components, instead of the categorical groups, can
also be incorporated as fixed effects. Usually, the top principal components that explain
more than half of the total population variance is selected. The advantage of this method is
that the fixed effects due to population structure would have a continuous distribution, as
shown in the mixed model equation:
𝑌 = (1|𝑔) + (1|𝑙) + (1|𝑠) + ⋯ + 𝑃𝐶1 + 𝑃𝐶2 + ⋯ + 𝑃𝐶𝑛
Asoro and co-workers (2011) demonstrated that principal components can account
for population structure in the genomic prediction step by including significant eigenvalues
in the model, as shown in the mixed model equation:
𝑌 = 𝜇 + 𝑄𝑣 + 𝑀𝛼 + 𝑒
![Page 92: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/92.jpg)
72
where 𝑌 is the empirical phenotypic value, 𝜇 is the intercept, 𝑄𝑣 is a fixed effects term
where 𝑄 is a matrix of significant eigenvectors and 𝑣 is a vector of regression coefficients
relating the principal components to the phenotypic values, and 𝑀𝛼 is a random effects
term where 𝑀 is the marker matrix and 𝛼 is a vector of estimated marker effects.
In hybrid breeding programs, there is almost always the presence of significant
population structure in the form of heterotic pools due to the method of breeding used. As
such, genomic prediction is obviously done within subpopulations, i.e. heterotic pools.
In this study, each of the three subpopulations taken individually resulted to
spurious results because of very low sample sizes. The most reasonable strategy is to jointly
take subpopulations 2 and 3 in a prediction model because it aligns well with breeding
information that these are all members of the male subpopulation. Genomic prediction
accuracies were compared within subpopulation and to the full population predictions.
Figure 21 and Table 8 present the comparison of overall means between prediction
models run on whole population and on subpopulations 2 and 3 only, indicating that
prediction methods are generally more accurate when predicting more related individuals.
These results were similar to those obtained in other works by Crossa et al. (2010), Asoro
et al. (2011), Habier et al. (2007) and Hayes et al. (2009).
Trait x GS model treatments seemed to exhibit similar trends (Fig. 22), except for
Bayesian LASSO when used on plant height wherein predictions for the whole population
has a higher mean accuracy than subpopulation prediction. Comparison of means for each
prediction model (Fig. 23; Table 9) suggests no significant differences in all pairs of means.
Although this may imply that any of the genomic selection models can be used, it is
important to consider trait architecture when deciding on what genomic selection model to
![Page 93: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/93.jpg)
73
use. The prediction accuracy means can be used to optimize genomic prediction
parameters.
Figure 21. Box plots of prediction accuracy of overall means using whole
population and subpopulations 2 and 3 jointly, and comparison
circles based on Tukey’s honest significant difference test.
Table 8. HSD threshold matrix of prediction accuracy means between subpopulations
and mixed population.
Subpopulation Mixed population
Subpopulation
-0.05723
0.00164
Mixed population
0.00164 -0.05723
Positive values show pairs of means that are significantly different.
![Page 94: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/94.jpg)
74
Fig
ure
22.
Var
iabil
ity c
har
t fo
r pre
dic
tion a
ccu
racy s
how
ing m
ean
dif
fere
nce
s fo
r m
ixed
po
pula
tion (
All
) an
d
subpopula
tion
(S
ubpop)
pre
dic
tions
for
each
gen
om
ic s
elec
tion m
odel
per
tra
it.
![Page 95: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/95.jpg)
75
Figure 23. Box plots of prediction accuracy of GS model means using
subpopulations 2 and 3 jointly, and comparison circles based on
Tukey’s honest significant difference test.
![Page 96: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/96.jpg)
76
Table 9. HSD threshold matrix of prediction accuracy means between pairs of genomic
selection models using subpopulation prediction.
RR-BLUP BAYESCPI BAYESRR BAYESL
Yield
DTF
Plant height
RR-BLUP
BayesCPi
BayesRR
BayesL
RR-BLUP
BayesCPi
BayesRR
BayesL
RR-BLUP
BayesCPi
BayesRR
BayesL
-0.24388
-0.28552
-0.24931
-0.27533
-0.27918
-0.27811
-0.27772
-0.22064
-0.27349
-0.27029
-0.25630
-0.09530
-0.25407
-0.27533
-0.25950
-0.28552
-0.27811
-0.27918
-0.27878
-0.22171
-0.27029
-0.27349
-0.25310
-0.09209
-0.28010
-0.24931
-0.28552
-0.25950
-0.27772
-0.27878
-0.27918
-0.22210
-0.25630
-0.25310
-0.27349
-0.11248
-0.28552
-0.24388
-0.28010
-0.25407
-0.22064
-0.22171
-0.22210
-0.27918
-0.09530
-0.09209
-0.11248
-0.27349
Positive values show pairs of means that are significantly different.
Optimization of Genomic Selection Parameters
A generalized linear model (GLM) was fitted using maximum likelihood as
estimation method for the subpopulation prediction dataset means. GLM is an ANOVA
procedure that uses least squares regression to determine the statistical relationship
between predictors (i.e. heritability, training population size, genomic selection model) and
a continuous response variable (i.e. prediction accuracy). GLM was used in this study to
predict genomic selection accuracy for new observed trait heritabilities and training
population size and identify the combination of predictor values that jointly optimize fitted
prediction accuracy value. The highest and lowest heritability means were taken as
variables in the model to obtain a 2x2x4 full factorial (Table 10). Center points were not
![Page 97: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/97.jpg)
77
taken because no trait with a heritability that is midpoint of the heritability range was
included in the analysis. Therefore, curvatures of main effects were not detected. The full
factorial that includes the main effects and second degree of interactions is given as
follows:
Prediction Accuracy = Heritability + Training Population + GS model
+ Heritability * Training Population
+ Heritability * GS model
+ Training Population * GS model
Table 10. Full factorial design of genomic selection accuracy means, heritability, training
population size and genomic selection model used in optimization.
RESPONSE:
GS ACCURACY
HERITABILITY TRAINING
POPULATION
GS MODEL
0.180270343
0.256868733
0.178678867
0.354683567
0.437842510
0.416132110
0.434348604
0.381462360
0.752761667
0.706175833
0.750252000
0.686572700
0.717289110
0.682839460
0.533130330
0.467523715
0.3130
0.3130
0.3130
0.3130
0.3130
0.3130
0.3130
0.3130
0.5486
0.5486
0.5486
0.5486
0.5486
0.5486
0.5486
0.5486
0.667
0.667
0.667
0.667
0.9
0.9
0.9
0.9
0.667
0.667
0.667
0.667
0.9
0.9
0.9
0.9
RR-BLUP
BayesRR
BayesCPi
BayesL
RR-BLUP
BayesRR
BayesCPi
BayesL
RR-BLUP
BayesRR
BayesCPi
BayesL
RR-BLUP
BayesRR
BayesCPi
BayesL
![Page 98: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/98.jpg)
78
Whole Model Test for the Generalized Linear Model
The Whole Model Test given in Table 11a indicates that the regression coefficients
of the variables are not equal to zero, hence the model is significant. Deviance and Pearson
values are not significant (Table 11b), and thus do not indicate lack of fit.
Table 11a. Whole model test for the generalized linear model created to optimize genomic
selection.
MODEL -LOG
LIKELIHOOD
L-R CHI
SQUARE
DF PROB>CHI
SQUARE
Difference
31.5483883
63.0968
12
<.0001
Full -35.094803
Reduced
-3.5464143
Table 11b. Goodness of fit test for the generalized linear model created to optimize
genomic selection.
GOODNESS OF
FIT STATISTIC
CHI SQUARE DF PROB>CHI
SQUARE
OVERDISPERSION
Pearson
0.0117
3
0.9997
0.0007
Deviance 0.0117 3 0.9997
Effect Summary and Effect Tests
Table 12 shows the LogWorth, false discovery rate (FDR) LogWorth and FDR p-
values for the main effects and interactions included in the model. LogWorth is defined as
-log10(p-value). A value that exceeds 2 is significant at the 0.01 level. FDR LogWorth is
defined as -log10(FDR p-value). This is the best statistic for plotting and assessing
![Page 99: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/99.jpg)
79
significance (JMP, 2015). The FDR p-value is obtained by using the Benjamini-Hochberg
technique, adjusting the p-values to control the false discovery rate for multiple tests.
Table 12. LogWorth, FDR LogWorth and FDR p-values of main effects and interactions
in the generalized linear model.
SOURCE LOGWORTH FDR LOGWORTH FDR P-VALUE
Heritability
Heritability x Training
Heritability x GS Model
GS Model x Training
GS Model
Training
13.703
8.375
3.394
3.330
1.463
1.141
12.925
7.898
3.154
3.154
1.384
1.141
0.00000
0.00000
0.00070
0.00070
0.04129
0.07221
The Effects Summary is consistent with the Effect Tests (Table 13), which shows
that the main effect heritability and the interaction effects 𝐻𝑒𝑟𝑖𝑡𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑋 𝐺𝑆 𝑚𝑜𝑑𝑒𝑙,
𝐻𝑒𝑟𝑡𝑖𝑡𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑋 𝑇𝑟𝑎𝑖𝑛𝑖𝑛𝑔, 𝐺𝑆 𝑚𝑜𝑑𝑒𝑙 𝑋 𝑇𝑟𝑎𝑖𝑛𝑖𝑛𝑔 are significant at 0.05 level.
Table 13. Effects Test of main effects and interactions in the generalized linear model.
SOURCE DF L-R CHI SQUARE FDR P-VALUE
Heritability
GS Model
Training
Heritability x GS Model
Heritability x Training
GS Model x Training
1
3
1
3
1
3
58.551518
8.6445336
3.2321473
18.181572
34.51959
17.871884
<.0001*
0.0344*
0.0722
0.0004*
<.0001*
0.0005*
*Significant at 0.05 level
![Page 100: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/100.jpg)
80
Prediction Profiles and Application to Breeding Programs
This study was able to simulate prediction accuracy computed from varying
heritabilities, training population sizes and genomic selection models. It should be noted
that the prediction profiles generated by this study are only applicable to the set within the
scope of this study. However, the general trends may be similar with other studies.
Individual breeding programs should perform similar exploratory studies on the usefulness
of genomic selection because heritability of predicted traits will vary among breeding
programs.
Figure 24 shows a contour map predicting genomic selection accuracy based on
heritability and size of training population. The general trend is that the higher spectrum of
genomic selection accuracy is proportional to heritability and training population size.
Figure 24. Contour map showing general trend of relationship between genomic
selection accuracy, heritability and training population size.
![Page 101: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/101.jpg)
81
Figure 25 shows the prediction profiles of selected combinations of heritability,
genomic selection model and training population size. Genomic selection is most accurate
for high heritability traits, which agrees with the works of Asoro et al. (2011), Heffner et
al. (2009), Lorenz et al. (2011) and Lorenzana and Bernardo (2009). One reason lies at the
very beginning of any genomic selection effort, which is phenotyping. Data used in training
the genomic selection model is generated from phenotyping. A robust phenotyping system
is crucial in the successful implementation of any predictive breeding activity.
The prediction profiles in Figure 25 suggest a number of points for the dataset used
in this study. RR-BLUP and BayesRR are most suitable for either high heritability or large
training population size. In genomic selection, high heritability will not need a very large
proportion of training population as reflected in the prediction profiles. Low heritability
can be compensated by larger size of training population. However, this would entail
additional resources for the breeding program. The optimization of genomic selection in a
bigger picture therefore goes into the operational dimensions of running a breeding
program.
Future optimizations using this GLM prediction approach should include a trait
with a heritability that is midpoint of the high and low heritabilities, as well as midpoint
value for training population size. Marker density can be optimized if the breeding program
uses flexible marker platforms. It is recommended to use a feature similar to JMP’s Design
of Experiment (DOE) to create full factorials with center points. Center points are critical
in determining the curvature of main effects as well as establish hidden replications.
![Page 102: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/102.jpg)
82
Figure 25. Prediction profiles of selected combinations of variables in the genomic
selection model: (A) Low heritability and small training population size, (B)
Low heritability and large training population size, (C) High heritability and
large training population size, and (D) High heritability and small training
population size.
![Page 103: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/103.jpg)
83
Integrating Genomic Selection into Hybrid
Rice Breeding Programs
Any hybrid rice breeding program that intends to use genomic selection needs to
assess the impact of the technology on program resources and stakeholders who will use,
implement and benefit from the technology. Genomic selection replaces the cost of
phenotyping a plot by the cost of fingerprinting the line, which usually translates to cost
reduction. This section of the manuscript proposes a genomic selection component of a
hypothetical breeding program.
Assumptions on the Hypothetical Breeding Program
Baseline information about a breeding program to which genomic selection is to be
introduced needs to be generated. Figure 26 illustrates a basic hybrid breeding program
that utilizes reciprocal recurrent selection as described by Bertran and Hallauer (1996).
Figure 26. A scheme for hybrid breeding program using a reciprocal recurrent selection
that creates 10,000 new inbreds and 10,000 new hybrids every breeding cycle.
![Page 104: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/104.jpg)
84
The breeding program being described generates 10,000 new inbreds (5,000 from
each heterotic pool) and by testcrossing these inbreds to a tester, 10,000 testcross hybrids
are produced for field trialing in multiple locations. Hybrid performance is the passport
data that warrants if a new inbred is “advanced” or used in the next breeding cycle. Table
14 summarizes the operational assumptions of the breeding program.
Table 14. Operational considerations of a hypothetical hybrid rice breeding program using
DH as a means of rapid inbred production.
YEAR/
SEASON
STAGE NO. OF ROWS OR
PLOTS
NO. OF
LOCATIONS
1A
1B
2A-2B
3A
3B
Breeding crosses
F1
DH inbreds (marker
pre-screening)
Seed production
Field trialing
100 rows
100 rows
30,000 rows
10,000 rows (inbreds)
1,000 rows (testers)
10,000 plots (hybrids)
500 plots (checks)
1
1
1
1
1
2
2
Total rows = 41,200
Total plots = 10,500
The breeding program being discussed aims to test 10,000 hybrids in two locations,
bringing the total number of plots to more than 20,000, assuming the breeder uses single
replication in augmented randomized complete block design. Augmented designs allow
adjustments of phenotypes by removing the effects of spatial trends, thereby increasing the
detection of the genetic signal (i.e. yield) as demonstrated by Gilmour et al. (1997).
![Page 105: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/105.jpg)
85
Rationale on Increasing the Effectiveness of Breeding Programs
Rice production has been a major source of income of many Filipino farmers. The
average rice yield in the Philippines is 3.8 metric tons per hectare but could be as low as
2.0 t/ha in some farming villages. Planting of hybrids would give a yield of 4.0 t/ha to as
high as 9.0 t/ha. Genetic gain in hybrid rice breeding programs directly contributes to
increased rice production through the products released to the market, and to larger
acceptance of hybrid technology by farmers.
Objectives of the Project Being Proposed
Through a research management perspective, the integration of genomic selection
into an existing breeding program is presented here as a project proposal with the general
objective of increasing rate of genetic gain which will serve the larger outcome of
delivering high-yielding products to farmers. The specific objectives are as follows:
1. Identify the stakeholders who will be impacted by the project and list their
concerns and issues.
2. Understand the causes and effects of low rate of genetic gain in breeding
programs through a problem analysis using a problem tree diagram, and conduct
an objective analysis in response to the identified problem.
3. Design the project on the hypothetical breeding program using the logical
framework.
4. Discuss the management arrangements from planning and implementation to
post-implementation of the project in the hypothetical breeding program.
![Page 106: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/106.jpg)
86
Stakeholder Analysis
A breeding program has a range of stakeholders that will be impacted with its
success or failure, ranging from the business owner to the end user of its products which
are the farmers and consumers. The stakeholder map is presented in Table 15.
Farmers and consumers are served by the outcome of breeding programs which is
to increase crop yields. The breeding program described produces hybrid rice, generally
proven to have an average yield increase of 15% over inbred varieties (Virmani, 1999). For
a farmer getting an average yield of 5,000 kg/ha using inbreds, the yield advantage will
translate to an additional income of PHP 30,000.00 per hectare per year for paddy rice sold
at farm gate prices.
The Department of Agriculture’s (DA) concerns lie on the attainment of rice self-
sufficiency for the country. Currently at 90-95% self-sufficient because of unstable seasons
(typhoons, drought), any significant increase in average rice yields will dramatically
contribute to self-sufficiency. The DA’s scope however starts after product development
as far as hybrid rice business of a private company is concerned. However, the DA can
dictate the criteria on varietal accreditation and recommendation for release, which can
influence the target product profiles of breeding programs.
The DA has an extensive network involved in disseminating new agricultural
technologies. The Hybrid Rice Commercialization Program (HRCP) has been leading the
implementation of strategies for the adoption of hybrid rice seeds since 1998. Within the
DA, the Rice Varietal Improvement Group (RVIG) recommends varieties to be released to
the National Seed Industry Council (NSIC) after a series of multi-location yield trials.
![Page 107: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/107.jpg)
87
Table 15. Stakeholder map identifying concerns of various entities that will be impacted
by the project.
PARTICIPANTS PROBLEMS EXPECTA-
TIONS
WEAKNESSES POTENTIALS CONSEQUENCES
FOR A PROJECT
Farmers
Current yield
levels not
sufficient to
attain acceptable
living
conditions.
High yield from
high quality
rice that will
fetch a high
price.
Cannot address
their problem
apart from
agronomic
management.
Can adopt and
accept new
varieties.
Increased genetic
gain will translate to
increased yield in
farmers’ fields.
DA
Philippines is
not yet fully self-
sufficient in rice.
High yield from
rice farms that
will cover rice
requirement of
the country.
Limited to
recommending
existing
accredited
varieties and
agronomic
practices.
Well-established
network that can
bring products
to farmers.
Increased yield from
farmers’ fields will
result to attainment
of rice self-
sufficiency.
Breeders
Budget
constraints of
breeding
programs limit
genetic gain.
Increased
success rate of
products
advanced.
Lack of training
in modern
quantitative
genetics and
genomics.
Expertise in
development of
new varieties;
familiarity with
germplasm.
Increased efficiency
of breeding
programs will result
to increased rate of
genetic gain.
Trialists
Establishment of
thousands of
trial plots is a
complex job.
Reduction of
trial plots.
Expertise in
establishment
and
management of
trials.
Increased efficiency
of breeding
programs will result
to reduction of trial
plots.
Sales and
marketing
Difficult to sell
products that do
not differ
significantly
from
competitors.
Significant
yield advantage
over competing
products.
Well-established
channels for
selling and
marketing
products.
Significant yield
differentiator from
competitors will
make products easier
to sell.
Business owners/
shareholders
Investment in
R&D is costly.
Products that
can deliver
profit.
Can increase
R&D funding in
response to
successful
products.
More efficient
breeding programs
frees funds for other
R&D investments
and projects.
Environmentalists
Demand for
increased food
supply result to
increasing land
for cultivation
and loss of
biodiversity.
Reduce
conversion of
forests into
farms.
Increased yield can
help reduce
conversion of forests
to farmlands.
![Page 108: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/108.jpg)
88
The breeders, trialists, sales and marketing, and business owner/shareholders are
part of the internal private company setting. Among these, the breeders and trialists are
members of the breeding organization directly implementing the breeding program. The
roles of the breeders and trialists are discussed under the section on management
arrangements. Breeders and trialists will have a stake on the increased efficiency of the
breeding program that would result to reduced use of resources while attaining significantly
higher rate of genetic gain.
Higher rate of genetic gain will eventually result to products that can be
differentiated from the competitors in terms of the target product profile. This will allow
the sales and marketing function to market and sell these products easily. Business owners
and shareholders will always welcome the revenue from selling these products, which will
be channeled back to the breeding program as research funding. With the increased
efficiency of the breeding program, a significant part of the budget previously used for trial
plots can now be used for other activities such as disease resistance screening.
Increase in crop yield will prevent conversion of forest land to farmland and thus
preserve biodiversity, an issue considered very important by environmentalists. Hence, a
high rate of genetic gain will provide a cushion for the increase in yield to be addressed by
genetics instead of by clearing forests.
Problem Analysis
Increasing the rate of genetic gain is the primary goal of any breeding program, as
discussed under Chapter 1 (Introduction). Low rate of genetic gain was identified as the
starter problem (Fig. 27), which is caused by several factors, and in turn causes a range of
![Page 109: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/109.jpg)
89
issues. Factors contributing to low genetic gain can be classified into two categories: those
that are inherent in the genetic gain equation (phenotyping accuracy, phenotypic standard
deviation, heritability, cost and breeding cycle time) and human capacity factors.
The genetic gain equation was reviewed in Chapter 2 (Review of Literature) under
the section “Increasing Genetic Gain.” Some of the most common causes of low genetic
gain are substandard field trialing resulting from poor experimental design and poor choice
of locations, and lack of genetic variability. A robust experimental design will increase the
power of field trials while a good correlation of locations and the target population of
environments will increase phenotyping accuracy. Breeding cycle time in the breeding
program has been addressed with the implementation of DH technology, and will be further
enhanced with genomic selection. Phenotyping cost can also be reduced by genomic
selection by substituting yield plots with cheaper DNA fingerprinting and predicting
breeding values as discussed in this manuscript.
Then there’s the factor on human capacity. Plant breeders and other scientists run
research programs, but most of them are not well trained in managing research. Atlin
(2013) stated that human capacity has the largest potential to contribute to genetic gain in
any breeding program, further elucidating that plant breeders need to be more like
engineers – they need mechanization, computer programming, and higher-level
quantitative skills on top of their expertise in genetics and agronomy. The problem tree
describes unwillingness of researchers to adopt newer approaches to plant breeding and
insufficient use of new technologies, which stem from lack of understanding of these new
approaches. From a research management perspective, these causes will be addressed in a
proposed integration of genomic selection. The focal points of the proposal are logistical
![Page 110: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/110.jpg)
90
considerations in running a basic genomic-enhanced breeding program, and upgrading the
skills of the research team that will implement the project.
The effects of low rate of genetic gain was very evident in the 1950’s when an
impending worldwide famine was too great for the levels of grain yields then. The
Philippines had a constant rice production of 3.7 million tons annually in the 1950’s (FAO,
2011) and the annual yield increase from paddy fields is not sufficient to meet the demands
of the growing population. The establishment of the International Rice Research Institute
in 1960 effectively institutionalized rice breeding in a global scale and genetic gain in rice
in the form of annual yield increases from new varieties has been dramatic. The Philippines
attained 7.7 million tons annual rice production in 1980 (FAO, 2011).
The effects of the starter problem were identified with the ultimate impact of low
yield in farmers’ fields. Within the breeding organization, the effects include increased cost
in field trials and creation of inferior parental lines. These in turn will result to inferior
hybrid products and high price of seeds due to high cost of goods (COGs), finally resulting
to low market share and non-acceptance of hybrid products by farmers.
The proposal identifies these causes of the starter problem that will be targeted by
this project proposal:
1. Insufficient understanding of new breeding approaches (i.e. genomic selection),
insufficient use of technology and unwillingness of researchers to adopt new
technologies will be addressed by genomic selection proof of concepts and
technical training.
2. High cost of nurseries and long breeding cycle time will be addressed by the merits
of the technology itself (genomic selection).
![Page 111: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/111.jpg)
91
Fig
ure
27.
Pro
ble
m tre
e dia
gra
m s
ho
win
g s
om
e ca
use
s an
d e
ffec
ts o
f lo
w r
ate
of
gen
etic
gai
n in b
reed
ing p
rogra
ms.
![Page 112: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/112.jpg)
92
Project Planning Matrix
The project planning matrix is based on a multi-year implementation of the project
and eventual scale-up in farmers’ field. The general assumption in adoption of varieties is
cyclical, as varieties are replaced with newer ones (Fig. 27). The breeding program releases
one product every year as a result of advancement decisions from internal trialing efforts.
Products typically have a life cycle of 5-7 years from initial release to retirement, as shown
in Figure 28. On-farm techno-demos (OFTD) are started as soon as hybrid products are
released as showcase to farmers, coupled with marketing and promotion activities such as
harvest festivals. Yield trends over years can be monitored within internal trialing and
OFTDs as different varieties are advanced and planted. Scale-up into farmers’ fields can
be monitored as soon as the second year of the release of the first product.
Figure 28. Life cycle of products from a breeding program that releases one new hybrid
every year. Monitoring of objectively verifiable indicators as described in the
project planning matrix is shown by the arrows.
![Page 113: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/113.jpg)
93
Table 16. Project planning matrix on increasing genetic gain of breeding programs by
integrating genomic selection.
PROJECT STRATEGY OBJECTIVELY
VERIFIABLE
INDICATORS
METHOD OF
VERIFICATION
IMPORTANT
ASSUMPTIONS
Goal: To increase yields in
farmers’ fields. The
project will directly
address genetic gain in a
breeding program but will
aim to increase rice
production in farmers’
fields in a larger scheme
of things.
Yield trend in farmers’
fields planted with
varieties produced by the
breeding program at scale-
up.
Yield trend in pre-selected
fields planted with
varieties produced by the
breeding program as on-
farm demos.
Yield trend in
experimental yield trials
conducted internally.
Random surveys of
farmers who planted the
hybrid varieties produced
by the breeding program.
Monitoring of on-farm
demos and yields recorded
from harvest festivals.
Monitoring of average
location yields.
Records of purchase of
farmers kept by seed
distributors and disclosed
to seed producers.
Farmers follow
recommended cultural
practices.
Implementation of agreed
protocols.
Implementation of agreed
trial protocols.
Immediate Objective 1:
Conduct experiments that
will serve as proofs of
concept on the benefits of
genomic selection.
Correlation of predicted
breeding values and
empirical breeding values.
Statistical analysis of
results obtained from
experiments.
Immediate Objective 2:
Provide training sessions
to breeders on genomic
selection and associated
fields such as quantitative
genetics and statistics.
Training plan and course
outlines.
Training plan document. Availability of expert
resource persons.
Immediate Objective 3:
Effectively monitor yield
trends in internal trials,
OFTDs and farmers’
fields.
Yield in kg/ha obtained
every year.
Trial and OFTD data and
farmer surveys.
Farmers provide accurate
yield figures.
Output 1.1.
Assessment of accuracy of
genomic predictions.
Output 1.2.
Assessment of resources
saved by implementing
genomic selection.
Scientific report that
evaluates prediction
accuracy of genomic
selection
Feasibility report that
highlights phenotyping
cost saved.
One scientific report.
One feasibility report.
![Page 114: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/114.jpg)
94
Table 16. Continued.
PROJECT STRATEGY OBJECTIVELY
VERIFIABLE
INDICATORS
METHOD OF
VERIFICATION
IMPORTANT
ASSUMPTIONS
Output 2.1.
Introductory training
provided to all
researchers.
Output 2.2.
Advanced training
provided to researchers
identified as subject
matter experts.
Course outline on
genomic selection and
attendance records.
Course outline on
advanced genomic
selection topics and
attendance records.
Goals added to
performance management
of individuals identified as
subject matter experts.
One course report.
One course report.
Revised goals of target
individuals.
Output 3.1.
Year over year yield trend
of average yields from
internal trials.
Output 3.2.
Year over year yield trend
from OFTDs.
Output 3.3.
Year over year yield trend
from farmers’ fields.
Measurement of average
yield from experimental
entries in field trials.
Measurement of harvest
per hectare.
Farmer interviews
regarding previous
cropping season.
Advancement decision
reports.
Record of number of bags
per hectare.
Farmer estimates of yield
from previous cropping
season.
Farmers are fairly accurate
in their estimates.
Activity 1.1.
Initiate and conduct
genomic selection
validation experiments.
Activity 1.2.
Compare cost of project
with and without genomic
selection.
Activity 2.1.
Conduct training on
introduction to genomic
selection for breeders.
Activity 2.2.1.
Conduct advanced training
on identified breeders.
Activity 2.2.2
Identify breeders for
advanced training as
subject matter experts.
Science plans and activity
progress reports.
Cash flow with and
without project.
Number of breeders
trained.
Advanced training
conducted.
Subject matter experts
identified with
performance goals added.
Science plans approved
and progress reports
submitted monthly.
Cash flow document.
Attendance records.
Attendance records.
Performance management
document.
![Page 115: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/115.jpg)
95
Table 16. Continued.
PROJECT STRATEGY OBJECTIVELY
VERIFIABLE
INDICATORS
METHOD OF
VERIFICATION
IMPORTANT
ASSUMPTIONS
Activity 3.1.
Conduct internal yield
trials in the usual manner.
Activity 3.2.
Conduct OFTDs in target
markets.
Activity 3.3.
Identify farmers who
planted product from
breeding program and
conduct survey.
Trial info available in
breeding database.
Harvest festivals held in
OFTDs
Farmer interviews held.
Retrieve trial info and data
from breeding database.
Activity reports.
Interview questionnaires
and farmer responses.
Implementation Schedule
The breeding program takes about three years from the initial breeding cross to
produce hybrids for extensive testing, which will take another two years. The program is
on its steady state, releasing one hybrid product per year. Integration of genomic selection
can reduce the breeding cycle to at most four years and reduce cost by at least 30%.
The implementation schedule is summarized in the Gantt chart in Table 17. Proof
of concept by conducting validation of genomic selection will take two years, while hybrids
coming out of genomic selection activities will take another four years. In this duration,
internal trialing, OFTDs and farmer surveys can be conducted on products not derived from
genomic selection.
![Page 116: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/116.jpg)
96
Table 17. Project Gantt chart showing milestones in project implementation.
Management Arrangements
The project will utilize the existing organizational structure (Fig. 29). The Senior
Breeder drives the overall direction of the program. A number of senior staff work with the
Senior Breeder to deliver the goals of the breeding program, each focusing on specific key
aspect of the program.
YEAR 1 YEAR 2 YEAR 3 YEAR 4 YEAR 5 YEAR 6 YEAR 7 YEAR 8 YEAR 9 YEAR 10
Activity 1.1.
Initiate and conduct genomic
selection validation
experiments.
Activity 1.2.
Compare cost of project with
and without genomic
selection.
Activity 2.1.
Conduct training on
introduction to genomic
selection for breeders.
Activity 2.2.1.
Conduct advanced training on
identified breeders.
Activity 2.2.2
Identify breeders for
advanced training as subject
matter experts.
Activity 3.1.
Conduct internal yield trials in
the usual manner.
Activity 3.2.
Conduct OFTDs in target
markets.
Activity 3.3.
Identify farmers who planted
product from breeding
program and conduct survey.
![Page 117: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/117.jpg)
97
Fig
ure
29.
Org
aniz
atio
n s
truct
ure
of
the
hypoth
etic
al b
reed
ing p
rogra
m in w
hic
h g
eno
mic
sel
ecti
on
is
to b
e ap
pli
ed
in t
he
pro
ject
pro
posa
l.
![Page 118: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/118.jpg)
98
The matrix in Table 18 summarizes the work relationships between the breeders
and the service providers. The breeders will essentially implement the breeding strategies
and coordinate activities with breeding services (DH lab, genotyping, nurseries), data
management, and trialing team.
Table 18. Service provided to breeders by breeding program support staff.
SUPPORT STAFF SUPPORT PROVIDED TO BREEDERS
Breeding Services
Manager
DH Lab Supervisor
Genotyping Supervisor
Nurseries Supervisor
Database Manager
Trialing Manager
Trialing Supervisor
Coordinates the operations in the breeding center and
ensures optimal efficiency on use of resources. Coordinates
the logistics among DH laboratory, genotyping team, and
nursery services team.
This is a critical role that ensures sufficient DH lines are
produced from breeding crosses identified by breeders and
handed back to the breeders.
Coordinates collection of leaf tissues for DNA analysis.
Facilitates shipment of samples to genotyping facilities,
including liaising with the Plant Quarantine Service of BPI
for necessary clearances.
Coordinates all logistics related to establishment of breeding
nurseries, including assigning of manpower to various field
activities. All field operations are handled by the team under
the Nurseries Supervisor, including hybrid seed production.
Maintains phenotype and genotype data and delivers
formatted field books. Also in charge of germplasm and seed
storage, and compliance to material tracking such as bar
codes for plots and seed packets.
Coordinates multi-location trials and hands over compiled
data to the Senior Breeder for analysis. Organizes hybrid
advancement meetings.
Establishes and maintains local trials and provides updates to
breeders. In charge of field operations in trials including data
gathering.
![Page 119: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/119.jpg)
99
Implementation of genomic selection will reduce nursery plots handled by the
Nurseries Supervisor. Trial plots handled by the Trialing Supervisors will also be reduced.
This may result to higher quality trials giving higher estimates of heritability and high
quality data. Genomic selection requires a significant increase in genotyping activities.
Almost all DH lines will be subjected to genomic selection and the breeding program will
achieve record marker data points. This will increase the work load of the Genotyping
Supervisor. To implement the project, Breeders will coordinate with the Genotyping
Supervisor for timely fingerprinting of DH lines as well as timely delivery of genotyping
data. Breeders will also coordinate with the Trialing team in establishment and
maintenance of hybrid trials derived from the training population.
The management arrangements suggested here places the Senior Breeder as the
overall lead in strategic and operational decisions about the breeding program, highlighting
the precedence of the mission over the system. In some organizations, managers who are
in charge of product development teams are not scientists or breeders, putting the bias of
decision-making on what uses resources the least, i.e. bias on the system.
Budgetary Requirements
The project is expected to reduce research cost in phenotyping but increase the cost
in genotyping. The difference resulting from these changes is expected to be an overall
reduction in resource utilization, as shown in Table 19. Cost of producing DH lines is held
constant and not included in the comparison. All nursery costs presented already includes
proportional labor cost.
![Page 120: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/120.jpg)
100
Table 19. Cost comparison between breeding programs with and without genomic
selection.
STAGE
WITHOUT GENOMIC
SELECTION
WITH GENOMIC
SELECTION
Rows/Plots Cost (PHP) Rows/Plots Cost (PHP)
Crossing
F1
DH nursery
Genotyping
Testcrossing
Trialing (2 locs)
Total lines evaluated
(effective)
Total hybrids
evaluated (effective)
Total cost
100
60
10,000
10,000
21,000
10,000
10,000
23,300.00
5,288.00
480,000.00
216,000.00
13,440,000.00
14,164,588.00
100
60
10,000
10,000
5,000
10,500
10,000
10,000
23,300.00
5,288.00
480,000.00
2,250,000.00
108,000.00
6,720,000.00
9,586,588.00
Testcrossing includes all operational aspects in producing experimental hybrid
seeds such as labor and isolation barriers. Trialing cost is the most expensive component
of a breeding program. The cost presented includes all operations expenses in maintaining
trials, as well as the cost associated with the meticulous fertilizer application, and
harvesting and other phenotyping procedures. To compare, genomic selection saves the
breeding program about 32% of its budget.
Table 20 presents summarized project cost per activity. Activities 1.1 through 2.2.1
are focused on genomic selection proof of concept and training for breeding program staff.
![Page 121: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/121.jpg)
101
Activities 3.1 through 3.3 reflect the activities routinely done by the breeding program and
with the commercial team (OFTD, farmer surveys). Genomic selection is projected to be
fully implemented in the sixth year of the project, after the two-year creation of the proof
of concept and the four-year generation of new inbreds and hybrids. The cost reduction in
the breeding program will therefore take effect in the sixth year, as indicated in the table.
The project does not require investment on new equipment. It will utilize the existing
infrastructure, logistic and management arrangements. Genotyping will be outsourced to a
reputable company that provides reliable DNA fingerprinting services.
Table 20. Projected budget of integrating genomic selection over a ten-year period
YEAR 1 YEAR 2 YEAR 3 YEAR 4 YEAR 5 YEAR 6 YEAR 7 YEAR 8 YEAR 9 YEAR 10
Activity 1.1.
Initiate and conduct genomic
selection validation
experiments.
210,000.00 210,000.00
Activity 1.2.
Compare cost of project with
and without genomic
selection.
55,000.00 55,000.00 55,000.00 55,000.00 55,000.00 55,000.00
Activity 2.1.
Conduct training on
introduction to genomic
selection for breeders.
512,000.00
Activity 2.2.1.
Conduct advanced training on
identified breeders.128,000.00
Activity 2.2.2
Identify breeders for
advanced training as subject
matter experts.
Activity 3.1.
Conduct internal yield trials in
the usual manner.14,164,588.00 14,164,588.00 14,164,588.00 14,164,588.00 14,164,588.00 9,586,588.00 9,586,588.00 9,586,588.00 9,586,588.00 9,586,588.00
Activity 3.2.
Conduct OFTDs in target
markets.350,000.00 350,000.00 350,000.00 350,000.00 350,000.00 350,000.00 350,000.00 350,000.00 350,000.00 350,000.00
Activity 3.3.
Identify farmers who planted
product from breeding
program and conduct survey.155,000.00 155,000.00 155,000.00 155,000.00 155,000.00 155,000.00 155,000.00 155,000.00 155,000.00 155,000.00
![Page 122: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/122.jpg)
102
Figure 30 illustrates a breeding program with a fully implemented genomic
selection scheme. Half of the inbreds are not testcrossed, reducing the cost of hybrid
production and field trialing.
Figure 30. A breeding scheme with full integration of genomic selection showing the
porportions of tested and predicted inbred GCAs.
Recommendations for Inbred Rice
Breeding Programs
Most public research programs on rice deal with developing inbred varieties. Inbred
breeding in rice has exhibited tremendous success since the release of IR8. Inbred varieties
such as IR36, IR64, IR72, Ciherang, Swarna and many others have been classified as mega-
varieties because of their extreme popularity with farmers resulting to millions of hectares
coverage in rice producing regions worldwide (Jackson et al., 2014).
![Page 123: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/123.jpg)
103
Genomic selection in inbred rice has some differences from genomic selection in
hybrids. The first rice genomic selection research (Spindel et al., 2015) was done on
inbreds using per se yield performance. In contrast, yield in hybrid breeding is selected
based on inbred GCA which can be deduced from hybrid performance. Inbred breeding
does not undergo the testcross stage to select for GCA, but inbreds per se are evaluated for
yield and other traits. GEBVs in genomic selection in inbred breeding programs are
therefore obtained from inbred per se performance.
Mature hybrid breeding programs make use of heterotic pools. Breeding crosses
are strictly created within defined heterotic pools and such practice usually eliminates
population structure within pools. In inbred breeding, a collection of lines for evaluation
may come from different breeding crosses derived from various sources representing rice
sub-populations, e.g. indica x tropical japonica. It is therefore more common in inbred
breeding programs to have population structure, which must be accounted for in the
prediction model as discussed in this manuscript.
Large amounts of phenotypic data are available from various testing programs from
the past several years. These datasets can be used in conjunction with genotype data to
initiate genomic selection. Top inbreds may be used in breeding, with the resulting inbred
progenies subjected to prediction of breeding values, with a sufficient proportion tested in
the field to validate the prediction model.
Figure 31 outlines breeding schemes for inbreds with and without genomic
selection. Since testcrossing is not required in inbred breeding, R&D cost is generally less
than that of hybrid breeding.
![Page 124: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/124.jpg)
104
Figure 31. Breeding schemes for inbred rice development with and without genomic
selection. Genomic selection can drastically reduce trial plots. In these
schemes, testcrossing is not required.
![Page 125: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/125.jpg)
105
CHAPTER 5
SUMMARY AND CONCLUSION
This study is one of the first genomic selection works in rice and possibly the first
genomic selection application in hybrid rice. This study has successfully shown the genetic
and operational merits of genomic selection. The study was able to accomplish the stated
objectives.
Usefulness of Genomic Selection
Whole genome markers have been demonstrated to be useful in predicting
combining ability in hybrid rice. Contributions of parental lines to yield, days to flowering
and plant height can be predicted with accuracies comparable to published reports. This
study agrees with most research works that genomic selection is generally more accurate
for traits with high heritability. Corollary to this, increasing heritability of a trait by
implementing more robust phenotyping methods can increase prediction accuracy.
Training population size also influences prediction accuracy. The general trend is that
larger training population size increases prediction accuracy which agrees with published
reports.
Population structure can confound prediction accuracy as shown in this study and
in other works in other crop species and livestock. Population structure can be included in
the prediction model but for this study, the most practical and relevant approach to
population structure was to predict within subpopulations. Predicting within
![Page 126: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/126.jpg)
106
subpopulations is similar to predicting within heterotic pools in hybrid breeding programs.
Prediction accuracy increased when predicting within subpopulations.
It is recommended for inbred breeding programs to consider population structure,
for example indica, japonica tropical japonica and admixtures in rice, because inbred
breeding usually does not work within subpopulations unlike heterotic pools in hybrid
breeding programs. The best method to incorporate population structure is to use the
eigenvector matrix as a fixed term in the prediction model.
Optimizing Genomic Selection Procedures
A general linear model was used to create prediction profiles to predict genomic
selection accuracy with different values of heritability, training population size and
genomic selection model, obtained from a full factorial design. RR-BLUP and BayesRR
are most suitable for either high heritability or large training population size. Prediction of
highly heritable traits will not need a very large training population as reflected in the
prediction profiles. Predicting traits with low heritability can be done more accurately by
employing larger training population size. A breeding program based on these 122 inbreds
can therefore utilize the prediction profile created to incorporate genomic selection into the
breeding process.
It is recommended for breeding programs to explore genomic selection in the
manner outlined in this work. There are numerous recommendations from previous works
taking each effect individually. This work however used generalized linear model to
consider the main effects and interaction effects in a quantitative manner.
![Page 127: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/127.jpg)
107
Implementing Genomic Selection through a
Research Management Approach
A strategy in introducing genomic selection into an existing breeding program was
presented from a research management perspective using a project proposal approach. A
hybrid breeding program enhanced by genomic selection can effectively evaluate the full
range of desired lines and testcross hybrids by predicting the phenotype of a significant
portion of the population based on the observed phenotype of the populations actually
planted in the field, and the fingerprint information. This principle was shown here to save
almost 32% of a research organization’s budget, which can be used in other aspects of
breeding such as disease screening.
Implementing genomic selection as a new breeding procedure requires
consideration of the various stakeholders and the impact of the proposed changes. These
changes need to be constantly communicated, tested and implemented in the target program
for these to be effective.
![Page 128: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/128.jpg)
108
LITERATURE CITED
APPLEGATE, J.L. 2002. Engaged Graduate Education: Seeing with New Eyes. Preparing
Future Faculty Occas. Pap. 9. Assoc. of Am. Colleges and Universities and Council
of Graduate Schools, Washington, DC.
ASORO, F.G., M. A. NEWELL, W.D. BEAVIS, M.P. SCOTT and J.L. JANNINK. 2011.
Accuracy and Training Population Design for Genomic Selection on Quantitative
Traits in Elite North American Oats. The Plant Genome 4(2): 132-144.
ATLIN, G. 2013. Applying the Breeding Technology Revolution to the Acceleration of
Genetic Gains for Major Food Crops in the Developing World. Paper W734, Plant
and Animal Genome XXI. January 11 - 16, 2013, San Diego, CA.
BARTLETT, M. S. 1937. Properties of Sufficiency and Statistical Tests. Proceedings of
the Royal Society A: Mathematical, Physical and Engineering Sciences 160 (901):
268.
BASAVARAJ, S. H., V. K. SINGH, A. SINGH, A. SINGH, A. SINGH, D. ANAND and
S. YADAV. 2010. Marker-assisted improvement of bacterial blight resistance in
parental lines of Pusa RH10, a superfine grain aromatic rice hybrid. Molecular
Breeding 26: 293-305.
BATES, D., M. MAECHLER, B. BOLKER and S. WALKER. 2015. Fitting Linear Mixed-
Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-
48. doi:10.18637/jss.v067.i01.
BEAVIS WD. 1994. The power and deceit of QTL experiments: lessons from comparative
QTL studies. In: Wilkinson DB (ed). Proceedings of the 49th Annual Corn and
Sorghum Research Conference. Washington, DC: American Seed Trade
Association, 250–65.
BEAVIS, W.D. 1998. QTL analyses: Power, precision, and accuracy. p. 145–162. In A.H.
Patterson (ed.) Molecular dissection of complex traits. CRC Press, Boca Raton, FL.
BECKMANN, J.S., and M. SOLLER. 1986. Restriction fragment length polymorphisms
in plant genetic improvement. Oxford Surv. Plant Mol. Cell Biol. 3:196–250.
BERNARDO, R. 1995. Genetic models for predicting maize single cross performance in
unbalanced yield trial data. Crop Sci 35:141–147.
BERNARDO, R. 1996a. Best linear unbiased prediction of maize single-cross
performance. Crop Sci 36:50–56.
BERNARDO, R. 1996b. Best linear unbiased prediction of the performance of crosses
between untested maize inbreds. Crop Sci 36:872–876.
![Page 129: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/129.jpg)
109
BERNARDO, R. 2010. Breeding for Quantitative Traits in Plants. 2nd ed. Stemma Press,
Woodbury, MN. (ISBN 978‐0‐9720724‐1‐0).
BERNARDO, R., and J. YU. 2007. Prospects for genomewide selection for quantitative
traits in maize. Crop Sci. 47:1082–1090.
BERTRAN, F.J. and A.R. HALLAUER . 1996. Hybrid improvement after reciprocal
recurrent selection in BSSS and BSCB1 maize populations. Maydica 41:360–367.
BLISS, F. 2006. Plant Breeding in the US Private Sector. Horticultural Science 41 (1): 45-
47.
BREIMAN, L. 2001. Random Forests. Machine Learning 45 (1): 5–32.
BRESEGHELLO, F., and M.E. SORRELLS. 2006. Association mapping of kernel size
and milling quality in wheat (Triticum aestivum L.) cultivars. Genetics 172:1165–
1177.
COLLARD, B.C.Y., C.M. VERA CRUZ, K.L. MCNALLY, P.S. VIRK, and D.J.
MACKILL. 2008. Rice Molecular Breeding Laboratories in the Genomics Era:
Current Status and Future Considerations. International Journal of Plant Genomics,
vol. 2008.
COLLARD, B.C.Y., M.Z.Z. JAHUFER, J.B. BROUWER and E.C.K. PANG. 2005. An
introduction to markers, quantitative trait loci (QTL) mapping and marker-assisted
selection for crop improvement: The basic concepts. Euphytica 142: 169–196.
COMSTOCK, R.E., H.F. ROBINSON and P.H. HARVEY. 1949. A breeding procedure
designed to make maximum use of both general and specific combining ability.
Agron J 41:360–367.
CROSBIE, T.M., S.R. EATHINGTON, G.R. JOHNSON, M. EDWARDS, R. REITER
AND S. STARK. 2003. Plant breeding: Past, present, and future. p. 1–50. In K.R.
Lamkey and M. Lee (ed.) Plant Breeding: The Arnel R. Hallauer Int. Symp.,
Mexico City. 17–23 Aug. 2003. Blackwell, Oxford, UK.
CROSSA, J., G. D. L. CAMPOS, P. PEREZ, D. GIANOLA, J. BURGUEÑO, J.L.
ARAUS, D. MAKUMBI, R.P. SINGH, S. DREISIGACKER, J. YAN, V. ARIEF,
M. BANZIGER and H.J. BRAUN, 2010. Prediction of genetic values of
quantitative traits in plant breeding using pedigree and molecular markers.
Genetics, 186(2):713-24.
DE LOS CAMPOS G., H. NAYA, D. GIANOLA, J. CROSSA, A. LEGARRA, E.
MANFREDI, K. WEIGEL and J. COTES. 2009. Predicting quantitative traits with
regression models for dense molecular markers and pedigree. Genetics 182: 375-
385.
![Page 130: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/130.jpg)
110
DE LOS CAMPOS, G. and P. PEREZ. 2013. BGLR: Bayesian Generalized Regression R
package, version 1.0. R package version 1.0, URL:https://r-forge.r-
project.org/projects/bglr/.
DEKKERS, J.C.M., and F. HOSPITAL. 2002. The use of molecular genetics in the
improvement of agricultural populations. Nat. Rev. Genet. 3:22–32.
ENDELMAN, J.B. 2011. Ridge regression and other kernels for genomic selection with R
package rrBLUP. Plant Genome 4:250-255. doi: 10.3835/plantgenome
2011.08.0024
FALCONER, D.S. 1960. Introduction to Quantitative Genetics. Oliver and Boyd.
Edinburgh, United Kingdom.
FAMOSO, A.N., KE. ZHAO, R.T. CLARK, C.W. TUNG, M.H. WRIGHT, C.
BUSTAMANTE, L.V. KOCHIAN and S.R. MCCOUCH. 2011. Genetic
Architecture of Aluminum Tolerance in Rice (Oryza sativa) Determined through
Genome-Wide Association Analysis and QTL Mapping. PLoS Genet 7(8):
e1002221. doi:10.1371/journal.pgen. 1002221.
FAO. 2009. How to feed the world in 2050. http://www.fao.org. Accessed 02 Nov. 2013.
FAO. 2011. Rice paddies. FAO Fisheries and Agriculture. http://www.fao.org. Accessed 2
May 2016.
FERNANDO, R.L. 2007. Genomic selection. Acta Agric. Scand. Ser. Anim. Sci. 57:192–
195.
FERNANDO, R.L. 2009. Genomic Selection: Bayesian Methods. Available at
http://www.ans.iastate.edu/stud/courses/short/2009/B-Day2-3.pdf (verified 8 Nov.
2013). Iowa State University.
FISHER, R.A. 1918. The correlations between relatives on the supposition of Mendelian
inheritance. Philosophical Transactions of the Royal Society of Edinburgh 52: 399–
433.
FISHER, R.A. 1930. The genetical theory of natural selection. Oxford, England: Clarendon
Press. 272 pp.
FOX, P.N. and A.A. ROSIELLE. 1982. Reducing the influence of environmental main-
effects on pattern analysis of plant breeding environments. Euphytica 31:645–656.
GALBRAITH, J.R. 1971. Matrix Organization Designs: How to combine functional and
project forms. In: Business Horizons, February 1971, 29-40.
GEPTS, P. and J. HANCOCK. 2006. The Future of Plant Breeding. Crop Science 46: 1630-
1634.
![Page 131: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/131.jpg)
111
GIANOLA, D. and J.B. van Kaam. 2008. Reproducing kernel Hilbert spaces regression
methods for genomic assisted prediction of quantitative traits. Genetics 178: 2289–
2303.
GILMOUR, A. R., B. R. CULLIS and A.P. VERBYLA. 1997. Accounting for Natural and
Extraneous Variation in the Analysis of Field Experiments. Journal of Agricultural,
Biological, and Environmental Statistics, 2(3), 269–293.
GILMOUR, A.R. 2010. Why use BLUPs? An introduction to fixed and random effects for
plant breeders. CIMMYT Seminar Series, 17 August 2010.
GOMEZ, K.A. and A.A. GOMEZ. 1984. Statistical procedures for agricultural research
(2nd ed.). John wiley and sons, NewYork, 680p.
GONZALEZ-RECIO O., K.A. WEIGEL, D. GIANOLA, H. NAYA and G.J.M. ROSA.
2010. L2-Boosting algorithm applied to high-dimensional problems in genomic
selection. Genetics Research 92 (3): 227-37.
GOULDEN, C H. 1939. Problems in plant selection. In Proceedings of the Seventh
International Genetics Congress. Cambridge University Press, pp. 132-133.
GRENIER, C., T.V. CAO, Y. OSPINA, C. QUINTERO, M. H. CHÂTEL, J. TOHME, B.
COURTOIS and N. AHMADI. 2015. Accuracy of Genomic Selection in a Rice
Synthetic Population Developed for Recurrent Selection Breeding. PLoS ONE
10(8): e0136594. doi:10.1371/journal.pone.0136594.
HABIER, D., R.L. FERNANDO and J.C.M. DEKKERS. 2007. The impact of genetic
relationship information on genome-assisted breeding values. Genetics 177: 2389-
2397.
HAYES, B. 2007. QTL mapping, MAS, and genomic selection. Available at
http://www.ans.iastate.edu/section/abg/shortcourse/notes.pdf (verified 8 Nov.
2013). Animal Breeding & Genetics, Dep. of Animal Science, Iowa State Univ.,
Ames.
HAYES, B., P.J. BOWMAN, A. C. CHAMBERLAIN, K. VERBYLA and M. E.
GODDARD. 2009. Accuracy of genomic breeding values in multi-breed dairy
cattle populations. Genetics Selection Evolution 41:51. DOI: 10.1186/1297-9686-
41-51.
HEFFNER, E.L., M.E. SORRELLS, and J.L. JANNINK. 2009. Genomic Selection for
Crop Improvement. Crop Sci. 49:1–12.
HENDERSON, C.R. 1949. Estimation of changes in herd environment. J Dairy Sci. 32:
706.
HENDERSON, C.R. 1950. Estimation of genetic parameters. Ann Math Stat. 21: 309-310.
![Page 132: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/132.jpg)
112
HENDERSON, C.R. 1963. Selection index and expected genetic advance. In Statistical
Genetics and Plant Breeding 141-163. NAS-NRC 982, Washington, DC.
HENDERSON, C.R. 1973. Sire evaluation and genetic trends. In Proceedings of the
Animal Breeding and Genetics Symposium in Honour of Dr.Jay L. Lush 10-41.
ASAS and ADSA, Champaign, Ill.
HENDERSON, C.R., O. KEMPTHORNE, S.R. SEARLE, and C.M. VON KROSIGK.
1959. The Estimation of environmental and genetic trends from records subject to
culling. Biometrics 15: 192–218.
HICKEY, J.M, S. DREISIGACKER, J. CROSSA, S. HEARNE, R. BABU, B. M.
PRASANNA, M. GRONDONA, A.S ZAMBELLI, V. S. WINDHAUSEN, K.
MATHEWS and G. GORJANC. 2014. Evaluation of Genomic Selection Training
Population Designs and Genotyping Strategies in Plant Breeding Programs Using
Simulation. Crop Sci. 54:1476–1488. doi: 10.2135/cropsci2013.03.0195.
HILL, R.R. and J.L. ROSENBERGER. 1985. Methods for combining data from
germplasm evaluation trials. Crop Sci 25:467-470.
HORNER, T.W. and K.J. FREY. 1957. Methods for determining natural areas for oat
varietal recommendations. Agron J 49: 313–315.
IKEHASHI, H and D. HILLERISLAMBERS. 1977. Single Seed Descent with the Use of
Rapid Generation Advance. Paper presented at the International Rice Research
Conference, 18-22 April 1977. Los Baños, Laguna, Philippines.
INTERNATIONAL RICE RESEARCH INSTITUTE. 1980. Standard evaluation system
for rice. IRRI: Los Baños, Philippines.
JACKSON, M.T., B.V. FORD-LLOYD and M.L. PARRY. 2014. Plant Genetic Resources
and Climate Change. CAB International.
JANNINK, J.L., A.J. LORENZ and H. IWATA. 2010. Genomic selection in plant
breeding: from theory to practice. Briefings in Functional Genomics 9: 166-177.
JANSEN, R., 1993. Interval mapping of multiple quantitative trait loci. Genetics 135: 205–
211.
JMP®. Online Documentation. SAS Institute Inc., Cary, NC, 1989-2015.
KINDALL, H.W. & D. PIMENTEL. 1994. Constraints on the Expansion of the Global
Food Supply. Ambio. 23 (3).
KOTTER, J.P. 2012. Leading Change. Boston: Harvard Business School Press.
![Page 133: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/133.jpg)
113
KRAAKMAN A.T.W., R.E. NIKS, P.M. VAN DEN BERG, P. STAM and F.A. VAN
EEUWIJK. 2004. Linkage disequilibrium mapping of yield and yield stability in
modern spring barley cultivars. Genetics 2004;168: 435–46.
LANDE, R. and R. THOMPSON. 1990. Efficiency of marker-assisted selection in the
improvement of quantitative traits. Genetics 124: 743–56.
LEGARRA. A.S., C. ROBERT-GRANIE and P. CROISEAU. 2011. Improved Lasso for
genomic selection. Genet. Res., Camb., 93, pp. 77–87.
LI, X., W. YAN, H. AGRAMA, L. JIA, A. JACKSON, K. MOLDENHAUER, K.
YEATER, A. MCCLUNG and D. WU. 2012. Unraveling the Complex Trait of
Harvest Index with Association Mapping in Rice (Oryza sativa L.). PLoS ONE
7(1): e29350. doi:10.1371/journal.pone.0029350.
LIU, B., 1998. Statistical Genomics: Linkage, Mapping and QTL Analysis CRC Press,
Boca Raton.
LORENZ, A.J., S. CHAO, F.G. ASORO, E.L. HEFFNER, T. HAYASHI, H. IWATA,
K.P. SMITH, M.E. SORRELLS, and J.L. JANNINK. 2011. Genomic Selection in
Plant Breeding: Knowledge and Prospects. Advances in Agronomy, Volume 110:
77-123.
LORENZANA, R. and R. BERNARDO. 2009. Accuracy of genotypic value predictions
for marker-based selection in biparental plant populations. Theor. Appl. Genet.
120: 151-161.
LYNCH, M. and B. WALSH. 1998. Genetics and Analysis of Quantitative Traits. Sinauer
Associates. Sunderland, MA, USA.
MACKILL, D.J. 2007. Molecular Markers and Marker-Assisted Selection in Rice. In
“Genomics-Assisted Crop Improvement Vol 2: Genomics Applications in Crops”
by R. K. Varshney and R. Tuberosa (eds.). Springer. Dordrecht, The Netherlands
pp 147-168.
MALUSZINSKI M., K.J. KASHA, B.P. FORSTER and I. SZAREJKO (eds.). 2003.
Doubled Haploid Production in Crop Plants: A Manual. Kluwer Academic
Publishers, Dordrecht, The Netherlands.
MCCOUCH, S.R. and R.W. DOERGE. 1995. QTL mapping in rice. Trends Genet 11:
482–487.
MELCHINGER A.E., H.F. UTZ, C.C. SCHON. 1998. Quantitative trait locus (QTL)
mapping using different testers and independent population samples in maize
reveals low power of QTL detection and large bias in estimates of QTL effects.
Genetics 1998;149:383–403.
![Page 134: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/134.jpg)
114
MEUWISSEN, T.H.E., B.J. HAYES, and M.E. GODDARD. 2001. Prediction of total
genetic value using genome-wide dense marker maps. Genetics 157:1819–1829.
MIEDANER T., T. WÜRSCHUM, H.P. MAURER, V. KORZUN, E. EBMEYER and J.C.
REIF. 2011. Association mapping for Fusarium head blight resistance in European
soft winter wheat. Molecular Breeding Volume 28, Issue 4, pp 647-655.
MOHAN, M., S. NAIR, A. BHAGWAT, T.G. KRISHNA, M. YANO, C.R. BHATIA and
T. SASAKI, 1997. Genome mapping, molecular markers and marker-assisted
selection in crop plants. Mol Breed 3: 87–103.
MORRIS, G.P., P. RAMU, S.P. DESHPANDE, C.T. HASH, T. SHAH, H.D.
UPADHYAYA, O. RIERA-LIZARAZU, P.J. BROWN, C.B. ACHARYA, S.E.
MITCHELL, J.HARRIMAN, J.C. GLAUBITZ, E.S. BUCKLER and
S.KRESOVICH. 2013. Population genomic and genome-wide association studies
of agroclimatic traits in sorghum. PNAS 2013 110: 453-458.
MOSER, G., B. TIER, R.R. CRUMP, M.S. KHATKAR, and H.W. RAADSMA. 2009. A
comparison of five methods to predict genomic breeding values of dairy bulls from
genome-wide SNP markers. Genet. Sel. Evol. 41, 56.
NAS, T.M.S., C.S. CASAL, Jr., Z. LI and S.S. VIRMANI. 2000. Application of Molecular
Markers for Identification of Restorers. Rice Genetics Newsletter, Vol. 20.
International Rice Research Institute.
NAS, T.M.S., D.L. SANCHEZ, G.Q. DIAZ, M.S. MENDIORO, and S.S. VIRMANI.
2005. Pyramiding of thermosensitive genetic male sterility (TGMS) genes and
identification of a candidate tms5 gene in rice. Euphytica 145: 67-75.
NEVES, H.H.R., R. CARVALHEIRO and S.A. QUEIRO. A comparison of statistical
methods for genomic selection in a mice population
PATERSON, A.H. 1996. Making genetic maps. In: A.H. Paterson (Ed.), Genome Mapping
in Plants, pp. 23–39. R. G. Landes Company, San Diego, California; Academic
Press, Austin, Texas.
PATTERSON, H.D. and R. THOMPSON. 1971. Recovery of Inter-Block Information
when Block Sizes are Unequal. Biometrika, Vol. 58, No. 3, pp. 545-554.
PHILIPPINE RICE RESEARCH INSTITUTE. Training on Grain Quality Evaluation.
May 9-10, 2012.
PIEPHO, H. P., J MOHRING, A.E. MELCHINGER, and A. BUCHSE. 2007. BLUP for
phenotypic selection in plant breeding and variety testing. Euphytica, 161(1-
2):209_228, 2007.
![Page 135: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/135.jpg)
115
R DEVELOPMENT CORE TEAM. 2015. R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-
900051-07-0, URL http://www.R-project.org/.
RAFALSKI J.A. 2002. Novel genetic mapping tools in plants: SNPs and LD-based
approaches. Plant Sci. 162: 329–33.
REPINSKI, S.L., K.N. HAYES, J.K. MILLER, C.J. TREXLER and F.A. BLISS. 2011.
Plant Breeding Graduate Education: Opinions about Critical Knowledge,
Experience and Skill Requirements from Public and Private Stakeholders
Worldwide. Crop Science 51: 2325-2336.
ROBINSON, G.K. 1991. That BLUP Is a Good Thing: The Estimation of Random Effects.
Statistical Science,Vol. 6, No.1, 15-61.
SATTARI, M., A. KATHIRESAN, G.B. GREGORIO, J.E. HERNANDEZ, T.M.S. NAS
and S.S. VIRMANI. 2007. Development and use of a two-gene marker-aided
selection system for fertility restorer genes in rice. Euphytica 153: 35-42.
SCHAEFFER, L.R. 2006. Strategy for applying genome-wide selection in dairy cattle J.
Anim. Breed Genet. 123 (4): 218–223.
SEARLE, S.R., G. CASELLA, and C.E. MCCULLOCH. 2006. Variance components.
John Wiley & Sons, Hoboken, NJ.
SEPTININGSIH, E.M., A.M. PAMPLONA, D.L. SANCHEZ, C.N. NEERAJA, G.V.
VERGARA, S. HEUER, A.M. ISMAIl, and D.J. MACKILL. 2009. Development
of submergence-tolerant rice cultivars: the Sub1 locus and beyond. Ann. Bot. 103
(2): 151-160.
SINGH, V. K., A. SINGH, S.P. SINGH, R.K. ELLUR, V. CHOUDHARY, S. SARKEL,
S. DEVINDER, S.G. KRISHNANA, M. NAGARAJAN, K.K. VINOD, U.D.
SINGH, R. RATHORE, S.K. PRASHANTHI, P.K. AGRAWAL, J.C. BHATT, T.
MOHAPATRA, K.V. PRABHU and A.K. SINGH. 2012. Incorporation of blast
resistance into “PRR78”, an elite Basmati rice restorer line, through marker assisted
backcross breeding. Field Crops Research, 128, 8-16.
SPEARMAN, C. 1904. The proof and measurement of association between two things.
American Journal of Psychology 15: 72–101. doi:10.2307/1412159.
SPINDEL J., H. BEGUM, D. AKDEMIR, P. VIRK, B. COLLARD, E. REDOÑA, G.
ATLIN, J.L. JANNINK and S.R. MCCOUCH. 2015. Genomic Selection and
Association Mapping in Rice (Oryza sativa): Effect of Trait Genetic Architecture,
Training Population Composition, Marker Number and Statistical Model on
Accuracy of Rice Genomic Selection in Elite, Tropical Rice Breeding Lines. PLoS
Genet 11(6): e1005350. doi: 10.1371/journal.pgen.1005350.
![Page 136: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/136.jpg)
116
SPRAGUE, G. F. and L.A. TATUM. 1942. General versus specific combining ability in
single crosses of corn. J. Amer. Soc. Agron. 34: 923-32.
STORLIE, E. and G. CHARMET. 2013. Genomic Selection Accuracy using Historical
Data Generated in a Wheat Breeding Program. The Plant Genome, 6(1):1-9.
TANKSLEY, S.D. and C.M. RICK. 1980. Isozymic gene linkage map of the tomato:
Applications in genetics and breeding. Theoretical and Applied Genetics 58(2):
161-170.
TANKSLEY, S.D., 1993. Mapping polygenes. Annu Rev Genet 27: 205–233.
TEICH, A.H. 1984. Heritability of grain yield, plant height and test weight of a population
of winter wheat adapted to Southwestern Ontario. Theor Appl Genet. 1984
May;68(1-2):21-3.
TUKEY, J. 1949. Comparing Individual Means in the Analysis of Variance. Biometrics 5
(2): 99–114.
VAN-ARENDONK, J., B. TIER, and B.P. KINGHORN. 1994. Use of Multiple Genetic
Markers in Prediction of Breeding Values. Genetics, 137(1), 319–329.
VANRADEN, P. M. 2008. Efficient methods to compute genomic predictions. Journal of
dairy science, 91(11):4414.
VANRADEN, P.M., C.P. VAN TASSELL, G.R. WIGGANS, T.S. SONSTEGARD, R.D.
SCHNABEL, J.F. TAYLOR and F.S. SCHENKEL. 2009. Invited review:
Reliability of genomic predictions for North American Holstein Bulls. J. Dairy Sci.
92: 16-24.
VIRK, P.S., FORD-LLOYD, B.V., JACKSON, M.T., POONI, H.S., CLEMENO, T.P. and
NEWBURY, H.J. 1996. Predicting quantitative variation within rice germplasm
using molecular markers. Heredity 76: 296–304.
VIRMANI, S.S. 1999. Exploitation of heterosis for shifting the yield frontier in rice. p.
423-438 in J.G. Coors and S. Pandey (eds.) The egetics and exploitation of heterosis
in crops. Am. Soc. Agron., Crop Sci. Soc. Am., Madison, Wisconsin.
VOLLMANN, J., H. BUERSTMAYR and P. RUCKENBAUER. 1996. Efficient Control
of Spatial Variation in Yield Trials Using Neighbour Plot Residuals. Experimental
Agriculture, 32, pp 185-197.
WHITTAKER, J.C., R. THOMPSON and M.C. Denham. 2000. Marker-assisted selection
using ridge regression. Genet. Res. 75:249–252.
XU, S. 2003. Theoretical basis of the Beavis effect. Genetics 165(4): 2259-2268.
![Page 137: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/137.jpg)
117
XU, Y. and J.H. CROUCH. 2008. Marker-assisted selection in plant breeding: from
publications to practice. Crop Sci. 48: 391–407.
ZHAI, W., W.G. WANG, Y.I. ZHOU, X. LI, X. ZHENG, Q. ZHANG, G. WANG and L.
ZHU. 2002. Breeding bacterial blight-resistant hybrid rice with the cloned bacterial
blight resistance gene Xa21. Molecular Breeding Vol. 8: 285-293
ZHAO, K., TUNG, T.W., EIZENGA, G.C., WRIGHT, M.H., ALI, M.L., PRICE, A.H.,
NORTON, G.J., ISLAM, M.R., REYNOLDS, A., MEZEY, J., MCCLUNG, A.M.,
BUSTAMANTE, C.D. and MCCOUCK, S.R. 2011. Genome-wide association
mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat.
Commun 2:467. doi: 10.1038/ncomms1467.
![Page 138: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/138.jpg)
118
APPENDIX A. Sample script for deriving BLUPs and GCAs implemented in R.
©Mark Nas
Script provided as reference to future students working on genomic selection in crops.
Please cite this manuscript when using this script. Send email to [email protected] for
questions.
library(lme4)
setwd("C:/Mark's Briefcase/Syngenta Thesis/Final analysis")
phenodata <- read.csv("phenodataset.csv", header=T)
attach(phenodata)
#Fit BLUP models, compute combining abilities and computations
#include male and female parents to compute for GCAs
#yield BLUPs and GCAs
yldblupmodel <- lmer(yield~(1|location)+(1|season)
+(1|location:season)+(1|rep)+(1|genotype)+(1|male)+(1|female)
+(1|female:male)+(1|genotype:location)+(1|genotype:season)
+(1|genotype:location:season), data=phenodata)
yldblupsumm <- summary(yldblupmodel) #variance components
capture.output(yldblupsumm, file="yldblupmodel.txt")
yldr <- ranef(yldblupmodel)
yieldblup <- yldr$genotype #(Hybrid yield BLUPs)
write.csv(yieldblup, file="yieldblup.csv")
yldfgca <- yldr$female #(Yield GCAs of female parents)
write.csv(yldfgca, file="yldGCA_female.csv")
yldmgca <- yldr$male #(Yield GCAs of male parents)
write.csv(yldmgca, file="yldGCA_male.csv")
#days to flowering BLUPs and GCAs
dtfblupmodel <- lmer(yield~(1|location)+(1|season)
+(1|location:season)+(1|rep)+(1|genotype)+(1|male)+(1|female)
+(1|female:male)+(1|genotype:location)+(1|genotype:season)
+(1|genotype:location:season), data=phenodata)
dtfblupsumm <- summary(dtfblupmodel) #variance components
capture.output(dtfblupsumm, file="dtfblupmodel.txt")
dtfr <- ranef(dtfblupmodel)
dtfblup <- dtfr$genotype #(Hybrid DTF BLUP)
write.csv(dtfblup, file="dtfblup.csv")
dtffgca <- dtfr$female #(DTF GCAs of female parents)
write.csv(dtffgca, file="dtfGCA_female.csv")
dtfmgca <- dtfr$male #(DTF GCAs of male parents)
write.csv(dtfmgca, file="dtfGCA_male.csv")
![Page 139: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/139.jpg)
119
#plant height BLUPs and GCAs
plthtblupmodel <- lmer(yield~(1|location)+(1|season)
+(1|location:season)+(1|rep)+(1|genotype)+(1|male)+(1|female)
+(1|female:male)+(1|genotype:location)+(1|genotype:season)
+(1|genotype:location:season), data=phenodata)
plthtblupsumm <- summary(plthtblupmodel)
capture.output(plthtblupsumm, file="plthtblupmodel.txt")
plthtr <- ranef(plthtblupmodel)
plthtblup <- plthtr$genotype
write.csv(plthtblup, file="plthtblup.csv")
plthtfgca <- plthtr$female
write.csv(plthtfgca, file="plthtGCA_female.csv")
plthtmgca <- plthtr$male
write.csv(plthtmgca, file="plthtGCA_male.csv")
![Page 140: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/140.jpg)
120
APPENDIX B. Sample script for predicting phenotypes using RR- BLUP implemented
in R.
©Mark Nas, Nonoy Bandillo
Script provided as reference to future students working on genomic selection in crops.
Please cite this manuscript when using this script. Send email to [email protected] for
questions.
#Ridge regression BLUP
library(rrBLUP)
setwd("C:/Mark's Briefcase/Syngenta Thesis/Final analysis")
phenoyldgca <- read.csv('yldcga.csv') #for yield
names(phenoyldgca) <- c('line', 'yldgca')
load(file='genoImputed.rda') #load your marker matrix in {-1,0,1}
#cross validation for 10-fold using GBLUP
G <- A.mat(genoImputed) #calculate additive relationship matrix
gblupyld <- kin.blup(data=phenoyldgca, geno='line', pheno='yldgca',
K=G)
gbyldGEBV <- gblupyld$g
trainsubset <- dim(genoImputed)[1]
set.seed(30109)
xvalgblup <- sample(1:trainsubset, trainsubset)
yldShuff <- phenoyldgca[xvalgblup, ]
gShuff <- G[xvalgblup,xvalgblup]
# Set a 10 fold CV.
count <- 1:12
corVec.gblup <- vector(length=10)
tf.gblup <- matrix(NA,nrow=trainsubset,ncol=1)
for(i in 1:10)
{
yldTrain <- yldShuff
yldTrain[count, 2] <- NA
gValidate <- kin.blup(data=yldTrain, geno='line', pheno='yldgca',
K=gShuff)$g[count]
corVec.gblup[i] <- cor(gValidate, yldShuff[count, 2])
count <- count+12
print(corVec.gblup[i])
}
gblupyldcorrTenfold <- mean(corVec.gblup)
capture.output(gblupyldcorrTenfold, file="subpop1.RRBLUPyldcorr.txt")
![Page 141: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/141.jpg)
121
APPENDIX C. Sample script for predicting phenotypes using Bayesian Ridge
Regression implemented in R.
©Mark Nas, Nonoy Bandillo
Script provided as reference to future students working on genomic selection in crops.
Please cite this manuscript when using this script. Send email to [email protected] for
questions.
#Bayesian Ridge Regression, 10-fold cross-validation
library(BGLR)
setwd("C:/Mark's Briefcase/Syngenta Thesis/Final analysis")
phenoyldgca <- read.csv('yldphenogca.csv') #yield
names(phenoyldgca) <- c('parent', 'yldgca')
load(file='genoImputed.rda')
G <- A.mat(genoImputed) #calculate additive relationship matrix
gblupyld <- kin.blup(data=phenoyldgca, geno='parent', pheno='yldgca',
K=G)
gbyldGEBV <- gblupyld$g
set.seed(30109)
trainsubset <- dim(genoImputed)[1]
xvalBayesRR <- sample(1:trainsubset, trainsubset)
yldShuff <- phenoyldgca[xvalBayesRR, ]
snpShuff <- genoimputed[xvalBayesRR, ]
Gshuff <- G[xvalBayesRR,xvalBayesRR]
count <- 1:12
corVec.brr <- vector(length=10)
tf.brr <- matrix(NA,nrow=trainsubset,ncol=1)
ETA <- list(list(X=snpShuff, model='BRR', probIn=.10))
for(i in 1:10)
{
yldTrain <- yldShuff
yldTrain[count, 2] <- NA
modelBRR <- BGLR(y=yldTrain[,2], ETA=ETA, burnIn = 1000, nIter=2000,
verbose=FALSE)
BRRGebvs <- modelBRR$yHat[count]
corVec.brr[i] <- cor(BRRGebvs, yldShuff[count, 2])
tf.brr[count,] <- BRRGebvs
if(i<10) count = count + 12 else count = count + 13
print(corVec.brr[i])
}
BRRyldcorrTenfold <- mean(corVec.brr) #mean correlation
capture.output(BRRyldcorrTenfold, file = "BRRyldcorr10fold.txt")
![Page 142: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/142.jpg)
122
APPENDIX D. Sample script for predicting phenotypes using Bayesian CPi implemented
in R.
©Mark Nas, Nonoy Bandillo
Script provided as reference to future students working on genomic selection in crops.
Please cite this manuscript when using this script. Send email to [email protected] for
questions.
#Bayesian CPi, 10-fold cross validation
library(BGLR)
setwd("C:/Mark's Briefcase/Syngenta Thesis/Final analysis")
phenoyldgca <- read.csv('yldphenogca.csv') #yield
names(phenoyldgca) <- c('parent', 'yldgca')
load(file='genoImputed.rda')
G <- A.mat(genoImputed) #calculate additive relationship matrix
gblupyld <- kin.blup(data=phenoyldgca, geno='parent', pheno='yldgca',
K=G)
gbyldGEBV <- gblupyld$g
set.seed(30109)
trainsubset <- dim(genoimputed)[1]
xvalBayesCpi <- sample(1:trainsubset, trainsubset)
yldShuff <- phenoyldgca[xvalBayesCpi, ]
snpShuff <- genoimputed[xvalBayesCpi, ]
Gshuff <- G[xvalBayesCpi,xvalBayesCpi]
count <- 1:12
corVec.Cpi <- vector(length=10)
tf.Cpi <- matrix(NA,nrow=trainsubset,ncol=1)
ETA <- list(list(X=snpShuff, model='BayesC', probIn=.10))
for(i in 1:10)
{
yldTrain <- yldShuff
yldTrain[count, 2] <- NA
modelCpi <- BGLR(y=yldTrain[,2], ETA=ETA, burnIn = 1000, nIter=2000,
verbose=FALSE)
CpiGebvs <- modelCpi$yHat[count]
corVec.Cpi[i] <- cor(CpiGebvs, yldShuff[count, 2])
tf.Cpi[count,] <- CpiGebvs
if(i<10) count = count + 12 else count = count + 13
print(corVec.Cpi[i])
}
CPiyldcorrTenfold <- mean(corVec.Cpi) #mean correlation
capture.output(CPiyldcorrTenfold, file = "CPiyldcorr10fold.txt")
![Page 143: ii - Graduate School imgs/SAMPLE MANUSCRIPT.pdfselection and phenotypic analysis. He has also an interest in the application of good ... Team performed a great job in producing the](https://reader031.vdocuments.us/reader031/viewer/2022011816/5e738070c714af6fcf3a6427/html5/thumbnails/143.jpg)
123
APPENDIX E. Sample script for predicting phenotypes using Bayesian Lasso
implemented in R.
©Mark Nas, Nonoy Bandillo
Script provided as reference to future students of UPLB working on genomic selection in
crops. Please cite this manuscript when using this script. Send email to [email protected]
for questions.
#Bayesian Lasso, 10-fold cross validation
library(BGLR)
setwd("C:/Mark's Briefcase/Syngenta Thesis/Final analysis")
phenoyldgca <- read.csv('yldphenogca.csv') #yield
names(phenoyldgca) <- c('parent', 'yldgca')
load(file='genoImputed.rda')
G <- A.mat(genoImputed) #calculate additive relationship matrix
gblupyld <- kin.blup(data=phenoyldgca, geno='parent', pheno='yldgca',
K=G)
gbyldGEBV <- gblupyld$g
set.seed(30109)
trainsubset <- dim(genoimputed)[1]
xvalBayesLas <- sample(1:trainsubset, trainsubset)
yldShuff <- phenoyldgca[xvalBayesLas, ]
snpShuff <- genoimputed[xvalBayesLas, ]
Gshuff <- G[xvalBayesLas,xvalBayesLas]
count <- 1:12
corVec.Las <- vector(length=10)
tf.Las <- matrix(NA,nrow=trainsubset,ncol=1)
ETA <- list(MRK=list(X=snpShuff, type="gamma", lambda=10, shape=1.1,
rate=0.5, model="BL"))
for(i in 1:10)
{
yldTrain <- yldShuff
yldTrain[count, 2] <- NA
modelLas <- BGLR(y=yldTrain[,2], ETA=ETA, burnIn = 1000, nIter=2000,
verbose=FALSE)
LasGebvs <- modelLas$yHat[count]
corVec.Las[i] <- cor(LasGebvs, yldShuff[count, 2])
tf.Las[count,] <- LasGebvs
if(i<10) count = count + 12 else count = count + 13
print(corVec.Las)
}
LasyldcorrTenfold <- mean(corVec.Las) #mean correlation
capture.output(LasyldcorrTenfold, file = "Lasyldcorr10fold.txt")