The statistical models of genomic prediction
John M Hickey, Chris Gaynor, Gregor Gorjanc
www.alphagenes.roslin.ed.ac.uk
@hickeyjohn
Genomic selection
Goddard & Hayes Nat. Rev. Genet. 2009
“GS is the quantitative geneticist's revenge on molecular genetics” - A. Archibald
Relationship within and between training and prediction individuals
Relationships between TP and selection candidates leveraged for prediction
[Figure: heat map of relationships between selection candidates and the training population (Training Pop.), with rows and columns labelled by individual ID; panel titles: Haplotypes, Genomic relationship matrix]
Useful things with matrices
• Counting how many animals passing a scales
• Summing the animals weight
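\[
x = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \qquad
x'x = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = (1 \times 1) + (1 \times 1) + (1 \times 1) = 3
\]
\[
y = \begin{bmatrix} 10 \\ 15 \\ 20 \end{bmatrix}, \qquad
x'y = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 10 \\ 15 \\ 20 \end{bmatrix} = (1 \times 10) + (1 \times 15) + (1 \times 20) = 45
\]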
Useful things with matrices
• Averaging the animals weight
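\[
x'y = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 10 \\ 15 \\ 20 \end{bmatrix} = (1 \times 10) + (1 \times 15) + (1 \times 20) = 45
\]
\[
x'x = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = (1 \times 1) + (1 \times 1) + (1 \times 1) = 3
\]
\[
\frac{x'y}{x'x} = [x'x]^{-1} x'y = b, \qquad \frac{45}{3} = \frac{1}{3} \times 45 = 15
\]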
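The counting, summing and averaging above can be checked in a few lines. This is a minimal sketch using NumPy; the variable names `count`, `total` and `mean` are illustrative and not from the slides.

```python
import numpy as np

x = np.ones(3)                     # one entry of 1 per animal
y = np.array([10.0, 15.0, 20.0])   # the animals' weights

count = x @ x          # x'x: how many animals passed the scales
total = x @ y          # x'y: sum of the weights
mean = total / count   # [x'x]^-1 x'y = b: the average weight

print(count, total, mean)
```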
Useful things with matrices
• Summing total weight in males and females
• Weight of average male and average female
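\[
X = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
y = \begin{bmatrix} 10 \\ 15 \\ 20 \end{bmatrix}
\]
\[
X'y = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 10 \\ 15 \\ 20 \end{bmatrix} = \begin{bmatrix} 25 \\ 20 \end{bmatrix}
\]
\[
\frac{X'y}{X'X} = [X'X]^{-1} X'y = b
\]
\[
\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 25 \\ 20 \end{bmatrix}
= \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{1} \end{bmatrix} \begin{bmatrix} 25 \\ 20 \end{bmatrix}
= \begin{bmatrix} 12.5 \\ 20 \end{bmatrix}
\]
\[
b_{11} = \left( \frac{1}{2} \times 25 \right) + (0 \times 20) = 12.5
\]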
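The same group totals and group means can be reproduced numerically. A minimal sketch with NumPy; it assumes the first column of X marks males and the second marks females, following the order in the slide title.

```python
import numpy as np

# Incidence matrix: column 0 = male, column 1 = female (assumed order).
X = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
y = np.array([10.0, 15.0, 20.0])

XtX = X.T @ X                  # animals per sex on the diagonal
Xty = X.T @ y                  # total weight per sex
b = np.linalg.solve(XtX, Xty)  # [X'X]^-1 X'y: average weight per sex

print(XtX, Xty, b)
```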
Shrinkage – Random Wand
• Ridge regression • BayesA • BayesB • BayesC • BayesLasso • BayesR • FnBayesB
• All differ in the shrinkage parameter – Some measure of our belief
Let's put in a little bit of genetics
• Diploid genomes
– Markers are AA, Aa, aA, or aa
– Label a=0 and A=1
– Thus the dosage is: • AA=2 • Aa=1 • aA=1 • aa=0
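The dosage coding amounts to counting copies of the A allele. A small sketch; the function name `dosage` is hypothetical and only handles the two-letter genotypes listed above.

```python
def dosage(genotype: str) -> int:
    """Count copies of allele 'A' in a two-letter genotype such as 'Aa'."""
    if len(genotype) != 2 or any(allele not in "Aa" for allele in genotype):
        raise ValueError(f"unexpected genotype: {genotype!r}")
    return genotype.count("A")


genotypes = ["AA", "Aa", "aA", "aa"]
dosages = [dosage(g) for g in genotypes]
print(dict(zip(genotypes, dosages)))
```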
Mixed model equations
• Sample mean 0.75 • True intercept is 0.19 • True effect is 0.50
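Without shrinkage:
\[
\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z \end{bmatrix}^{-1}
\begin{bmatrix} X'y \\ Z'y \end{bmatrix}
= \begin{bmatrix} b \\ u \end{bmatrix}
\]
With shrinkage:
\[
\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + I\lambda \end{bmatrix}^{-1}
\begin{bmatrix} X'y \\ Z'y \end{bmatrix}
= \begin{bmatrix} b \\ u \end{bmatrix}
\]
Data:
\[
y = \begin{bmatrix} 0.10 \\ 0.70 \\ 1.30 \\ 0.65 \\ 1.25 \\ 0.12 \\ 0.68 \\ 1.20 \end{bmatrix}, \quad
Z = \begin{bmatrix} 0 \\ 1 \\ 2 \\ 1 \\ 2 \\ 0 \\ 1 \\ 2 \end{bmatrix}, \quad
X = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \quad
TBV = \begin{bmatrix} 0.0 \\ 0.5 \\ 1.0 \\ 0.5 \\ 1.0 \\ 0.0 \\ 0.5 \\ 1.0 \end{bmatrix}
\]
Without shrinkage:
\[
LHS = \begin{bmatrix} 8 & 9 \\ 9 & 15 \end{bmatrix}, \quad
RHS = \begin{bmatrix} 6 \\ 9.53 \end{bmatrix}, \quad
\begin{bmatrix} b \\ u \end{bmatrix} = \begin{bmatrix} 0.11 \\ 0.57 \end{bmatrix}
\]
With \(\lambda = 0.85\):
\[
LHS = \begin{bmatrix} 8 & 9 \\ 9 & 15.85 \end{bmatrix}, \quad
RHS = \begin{bmatrix} 6 \\ 9.53 \end{bmatrix}, \quad
\begin{bmatrix} b \\ u \end{bmatrix} = \begin{bmatrix} 0.20 \\ 0.49 \end{bmatrix}
\]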
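The two sets of solutions can be reproduced from the data vectors. A minimal sketch with NumPy; `solve_mme` is a hypothetical helper, and lambda is added only to the marker (Z'Z) block, as in the equations above.

```python
import numpy as np

y = np.array([0.10, 0.70, 1.30, 0.65, 1.25, 0.12, 0.68, 1.20])
z = np.array([0, 1, 2, 1, 2, 0, 1, 2], dtype=float)  # marker dosages
x = np.ones_like(z)                                  # intercept column


def solve_mme(x, z, y, lam=0.0):
    """Build and solve the 2x2 mixed model equations for one marker."""
    W = np.column_stack([x, z])
    lhs = W.T @ W        # [[X'X, X'Z], [Z'X, Z'Z]]
    lhs[1, 1] += lam     # Z'Z + I*lambda
    rhs = W.T @ y        # [X'y, Z'y]
    return lhs, rhs, np.linalg.solve(lhs, rhs)


lhs0, rhs0, sol0 = solve_mme(x, z, y)            # no shrinkage
lhs1, rhs1, sol1 = solve_mme(x, z, y, lam=0.85)  # lambda = 0.85
print(np.round(sol0, 2), np.round(sol1, 2))
```

With shrinkage the marker effect moves from 0.57 towards the true effect of 0.50, and the intercept moves from 0.11 towards the true intercept of 0.19.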
A range of shrinkage values
• If Lambda =1000 the SNP solution =0.00 • And the solution for the intercept = 0.75
[Figure: BetaHat (y-axis, 0.00 to 0.60) plotted against Lambda (x-axis, 0 to 1.2)]
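The curve can be traced by re-solving the 8-animal equations over a grid of Lambda values. A sketch only; the grid of values is chosen here for illustration and is not taken from the slide.

```python
import numpy as np

lhs = np.array([[8.0, 9.0], [9.0, 15.0]])  # 8-animal LHS without shrinkage
rhs = np.array([6.0, 9.53])


def solve_with_lambda(lam):
    """Return (intercept, SNP effect) for a given shrinkage parameter."""
    shrunk = lhs.copy()
    shrunk[1, 1] += lam
    return np.linalg.solve(shrunk, rhs)


for lam in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1000.0]:
    intercept, snp = solve_with_lambda(lam)
    print(f"lambda={lam:7.1f}  intercept={intercept:.2f}  snp={snp:.2f}")
```

As Lambda grows the SNP solution is pulled to zero and the intercept approaches the sample mean of 0.75.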
Shrinkage versus more data
• Two data sets • One with 8 animals, the other with 80 animals • Compare effect of Lambda in both
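\[
\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + I\lambda \end{bmatrix}^{-1}
\begin{bmatrix} X'y \\ Z'y \end{bmatrix}
= \begin{bmatrix} b \\ u \end{bmatrix}
\]
No Lambda

8 animals:
\[
LHS = \begin{bmatrix} 8 & 9 \\ 9 & 15 \end{bmatrix}, \quad
RHS = \begin{bmatrix} 6 \\ 9.53 \end{bmatrix}, \quad
\begin{bmatrix} b \\ u \end{bmatrix} = \begin{bmatrix} 0.11 \\ 0.57 \end{bmatrix}
\]
80 animals:
\[
LHS = \begin{bmatrix} 80 & 85 \\ 85 & 263 \end{bmatrix}, \quad
RHS = \begin{bmatrix} 57.4 \\ 147.94 \end{bmatrix}, \quad
\begin{bmatrix} b \\ u \end{bmatrix} = \begin{bmatrix} 0.18 \\ 0.50 \end{bmatrix}
\]
Lambda = 5.0 (extremely high value)

8 animals:
\[
LHS = \begin{bmatrix} 8 & 9 \\ 9 & 20 \end{bmatrix}, \quad
RHS = \begin{bmatrix} 6 \\ 9.53 \end{bmatrix}, \quad
\begin{bmatrix} b \\ u \end{bmatrix} = \begin{bmatrix} 0.43 \\ 0.28 \end{bmatrix}
\]
80 animals:
\[
LHS = \begin{bmatrix} 80 & 85 \\ 85 & 268 \end{bmatrix}, \quad
RHS = \begin{bmatrix} 57.4 \\ 147.94 \end{bmatrix}, \quad
\begin{bmatrix} b \\ u \end{bmatrix} = \begin{bmatrix} 0.18 \\ 0.50 \end{bmatrix}
\]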
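The comparison can be rerun from the printed LHS and RHS. This is a sketch, not the original calculation: it starts from the rounded values shown on the slide, so the last digit of a solution may differ slightly from the slide. The helper `solve` and the dictionary `results` are illustrative names.

```python
import numpy as np


def solve(lhs, rhs, lam):
    """Add lambda to the marker diagonal and solve the 2x2 system."""
    lhs = np.array(lhs, dtype=float)
    lhs[1, 1] += lam
    return np.linalg.solve(lhs, np.array(rhs, dtype=float))


datasets = {
    "8 animals": ([[8, 9], [9, 15]], [6, 9.53]),
    "80 animals": ([[80, 85], [85, 263]], [57.4, 147.94]),
}

results = {}
for name, (lhs, rhs) in datasets.items():
    for lam in (0.0, 5.0):
        results[name, lam] = solve(lhs, rhs, lam)
        print(name, lam, np.round(results[name, lam], 2))
```

The same Lambda roughly halves the SNP effect with 8 animals but barely moves it with 80, because Z'Z is much larger relative to Lambda when there is more data.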
Mendelian sampling
[Figure: pedigree diagram of a sire and his progeny, Child 1 to Child 6]
In theory you can have sibs that are genetically unrelated
This is why I am different from my brother
[Figure: distribution of the additive genetic relationship (x-axis, 0 to 1) for half sibs (HS) and full sibs (FS); legend: m=4]
\[
A = \begin{bmatrix}
1.00 & 0.50 & 0.50 & 0.50 & 0.50 \\
0.50 & 1.00 & 0.25 & 0.25 & 0.25 \\
0.50 & 0.25 & 1.00 & 0.25 & 0.25 \\
0.50 & 0.25 & 0.25 & 1.00 & 0.25 \\
0.50 & 0.25 & 0.25 & 0.25 & 1.00
\end{bmatrix}
\]
\[
G = \begin{bmatrix}
1.00 & 0.50 & 0.50 & 0.50 & 0.50 \\
0.50 & 1.00 & 0.20 & 0.30 & 0.20 \\
0.50 & 0.20 & 1.00 & 0.20 & 0.30 \\
0.50 & 0.30 & 0.20 & 1.00 & 0.20 \\
0.50 & 0.20 & 0.30 & 0.20 & 1.00
\end{bmatrix}
\]
And “hidden” relationships
Population version
10 animal example
• Design
– 2 unrelated sire families (FamA and FamB)
– Dams are unrelated
• In each family
– 2 half sibs used in the prediction set
– Sire and 2 half sibs used in the training set
– 5 individuals from the other family used in the training set
• Purpose
– Show prediction due to parent average versus Mendelian sampling (MS)
– Pedigree versus genomics
– Close versus distant relatives
– Show shrinkage
Pedigree
ID  Sire  Dam
1   0     0
2   1     0
3   1     0
4   1     0
5   1     0
6   0     0
7   6     0
8   6     0
9   6     0
10  6     0
Genetic relationships
• Captured in a matrix A – Traditionally built using pedigree – Holds the additive genetic relationship between each pair of individuals
• Values range from 0 to 2
– A fully inbred individual has a relationship with itself of 2
– A pair of completely unrelated individuals has a relationship of 0
– Full sibs have a relationship of 0.5 • If the parents are not related
– Half sibs have a relationship of 0.25 • If the parents are not related
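One common way to build A from a pedigree is the tabular method. The sketch below is a minimal version, not necessarily how the slide values were produced; it assumes IDs run from 1 to n, parents appear before their progeny, and 0 codes an unknown parent. The function name `relationship_matrix` is hypothetical.

```python
import numpy as np


def relationship_matrix(pedigree):
    """Tabular method. pedigree: list of (id, sire, dam), 0 = unknown parent."""
    n = len(pedigree)
    A = np.zeros((n + 1, n + 1))  # row/column 0 stands for the unknown parent
    for i, s, d in pedigree:
        # Diagonal: 1 + inbreeding, i.e. half the relationship between parents.
        A[i, i] = 1.0 + 0.5 * A[s, d]
        # Off-diagonal: average relationship of earlier animal j with i's parents.
        for j in range(1, i):
            A[i, j] = A[j, i] = 0.5 * (A[j, s] + A[j, d])
    return A[1:, 1:]


# The 10 animal pedigree: two sires (1 and 6), each with four half-sib progeny.
pedigree = [(1, 0, 0), (2, 1, 0), (3, 1, 0), (4, 1, 0), (5, 1, 0),
            (6, 0, 0), (7, 6, 0), (8, 6, 0), (9, 6, 0), (10, 6, 0)]
A = relationship_matrix(pedigree)
print(A[:5, :5])
```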
“some animals are more equal than others” ... even if the additive genetic relationship is the same
[Figure: distribution of the additive genetic relationship (x-axis, 0 to 1) for half sibs (HS) and full sibs (FS); legend: m=4]
Genomic Relationships
e.g. actual relationship between HS can vary between 0.2 and 0.3
Let's do 5 animals first

ID  Sire  Dam
1   0     0
2   1     0
3   1     0
4   1     0
5   1     0

\[
A = \begin{bmatrix}
1.00 & 0.50 & 0.50 & 0.50 & 0.50 \\
0.50 & 1.00 & 0.25 & 0.25 & 0.25 \\
0.50 & 0.25 & 1.00 & 0.25 & 0.25 \\
0.50 & 0.25 & 0.25 & 1.00 & 0.25 \\
0.50 & 0.25 & 0.25 & 0.25 & 1.00
\end{bmatrix}
\]
\[
G = \begin{bmatrix}
1.00 & 0.50 & 0.50 & 0.50 & 0.50 \\
0.50 & 1.00 & 0.20 & 0.30 & 0.20 \\
0.50 & 0.20 & 1.00 & 0.20 & 0.30 \\
0.50 & 0.30 & 0.20 & 1.00 & 0.20 \\
0.50 & 0.20 & 0.30 & 0.20 & 1.00
\end{bmatrix}
\]
Pedigree tells:
– Which family you belong to
Genomics tells:
– Which family you belong to
– Which sib you are more closely related to
– And shows “hidden” relationships (we will see the last bit with 10 animals)
Linkage
10 animals for comparison
ID  Sire  Dam
1   0     0
2   1     0
3   1     0
4   1     0
5   1     0
6   0     0
7   6     0
8   6     0
9   6     0
10  6     0

\[
A = \begin{bmatrix}
1.00 & 0.50 & 0.50 & 0.50 & 0.50 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.50 & 1.00 & 0.25 & 0.25 & 0.25 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.50 & 0.25 & 1.00 & 0.25 & 0.25 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.50 & 0.25 & 0.25 & 1.00 & 0.25 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.50 & 0.25 & 0.25 & 0.25 & 1.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 1.00 & 0.50 & 0.50 & 0.50 & 0.50 \\
0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.50 & 1.00 & 0.25 & 0.25 & 0.25 \\
0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.50 & 0.25 & 1.00 & 0.25 & 0.25 \\
0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.50 & 0.25 & 0.25 & 1.00 & 0.25 \\
0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.50 & 0.25 & 0.25 & 0.25 & 1.00
\end{bmatrix}
\]
\[
G = \begin{bmatrix}
1.00 & 0.50 & 0.50 & 0.50 & 0.50 & 0.02 & 0.02 & 0.02 & 0.02 & 0.02 \\
0.50 & 1.00 & 0.20 & 0.30 & 0.20 & 0.02 & 0.01 & 0.03 & 0.01 & 0.03 \\
0.50 & 0.20 & 1.00 & 0.20 & 0.30 & 0.02 & 0.03 & 0.01 & 0.03 & 0.01 \\
0.50 & 0.30 & 0.20 & 1.00 & 0.20 & 0.02 & 0.01 & 0.03 & 0.01 & 0.03 \\
0.50 & 0.20 & 0.30 & 0.20 & 1.00 & 0.02 & 0.03 & 0.01 & 0.03 & 0.01 \\
0.02 & 0.02 & 0.02 & 0.02 & 0.02 & 1.00 & 0.50 & 0.50 & 0.50 & 0.50 \\
0.02 & 0.01 & 0.03 & 0.01 & 0.03 & 0.50 & 1.00 & 0.20 & 0.30 & 0.20 \\
0.02 & 0.03 & 0.01 & 0.03 & 0.01 & 0.50 & 0.20 & 1.00 & 0.20 & 0.30 \\
0.02 & 0.01 & 0.03 & 0.01 & 0.03 & 0.50 & 0.30 & 0.20 & 1.00 & 0.20 \\
0.02 & 0.03 & 0.01 & 0.03 & 0.01 & 0.50 & 0.20 & 0.30 & 0.20 & 1.00
\end{bmatrix}
\]
Labels on the matrices: Family relationships; Segregation within family; Missing pedigree = “Unrelated”; Linkage; Linkage disequilibrium
5 animal example
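Genomic:
\[
\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + G^{-1}\lambda \end{bmatrix}^{-1}
\begin{bmatrix} X'y \\ Z'y \end{bmatrix}
= \begin{bmatrix} b \\ u \end{bmatrix}
\]
Pedigree:
\[
\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + A^{-1}\lambda \end{bmatrix}^{-1}
\begin{bmatrix} X'y \\ Z'y \end{bmatrix}
= \begin{bmatrix} b \\ u \end{bmatrix}
\]
\[
A = \begin{bmatrix}
1.00 & 0.50 & 0.50 & 0.50 & 0.50 \\
0.50 & 1.00 & 0.25 & 0.25 & 0.25 \\
0.50 & 0.25 & 1.00 & 0.25 & 0.25 \\
0.50 & 0.25 & 0.25 & 1.00 & 0.25 \\
0.50 & 0.25 & 0.25 & 0.25 & 1.00
\end{bmatrix}, \quad
G = \begin{bmatrix}
1.00 & 0.50 & 0.50 & 0.50 & 0.50 \\
0.50 & 1.00 & 0.20 & 0.30 & 0.20 \\
0.50 & 0.20 & 1.00 & 0.20 & 0.30 \\
0.50 & 0.30 & 0.20 & 1.00 & 0.20 \\
0.50 & 0.20 & 0.30 & 0.20 & 1.00
\end{bmatrix}
\]
Phenotypes (animals 1 to 5) and right-hand side (Mean, Sire, Son 2, Son 3, Son 4, Son 5); # marks a missing phenotype:
\[
y = \begin{bmatrix} 0.00 \\ -2.00 \\ 2.00 \\ \text{Missing} \\ \text{Missing} \end{bmatrix}, \quad
RHS = \begin{bmatrix} 0.00 \\ 0.00 \\ -2.00 \\ 2.00 \\ \# \\ \# \end{bmatrix}, \quad
TrueValues = \begin{bmatrix} 0.00 \\ 0.00 \\ -2.00 \\ 2.00 \\ -2.00 \\ 2.00 \end{bmatrix}
\]
Pedigree (A):
\[
LHS^{-1} = \begin{bmatrix}
1.55 & -1.27 & -1.18 & -1.18 & -0.64 & -0.64 \\
-1.27 & 1.64 & 1.09 & 1.09 & 0.82 & 0.82 \\
-1.18 & 1.09 & 1.53 & 0.93 & 0.55 & 0.55 \\
-1.18 & 1.09 & 0.93 & 1.53 & 0.55 & 0.55 \\
-0.64 & 0.82 & 0.55 & 0.55 & 1.91 & 0.41 \\
-0.64 & 0.82 & 0.55 & 0.55 & 0.41 & 1.91
\end{bmatrix}, \quad
Solutions = \begin{bmatrix} 0.00 \\ 0.00 \\ -1.20 \\ 1.20 \\ 0.00 \\ 0.00 \end{bmatrix}
\]
Genomic (G):
\[
LHS^{-1} = \begin{bmatrix}
1.52 & -1.26 & -1.15 & -1.15 & -0.63 & -0.63 \\
-1.26 & 1.63 & 1.07 & 1.07 & 0.81 & 0.81 \\
-1.15 & 1.07 & 1.49 & 0.88 & 0.58 & 0.50 \\
-1.15 & 1.07 & 0.88 & 1.49 & 0.50 & 0.58 \\
-0.63 & 0.81 & 0.58 & 0.50 & 1.90 & 0.32 \\
-0.63 & 0.81 & 0.50 & 0.58 & 0.32 & 1.90
\end{bmatrix}, \quad
Solutions = \begin{bmatrix} 0.00 \\ 0.00 \\ -1.23 \\ 1.23 \\ -0.15 \\ 0.15 \end{bmatrix}
\]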
5 animal example
• When we do BLUP we get an estimated breeding value (EBV)
• An EBV is simply a weighted average of all the phenotypic data available – With simultaneous correction for all other effects
• The weightings are determined by the inverse of the LHS – This is primarily driven by the relationship matrix
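The 5 animal example can be reproduced numerically. This is a sketch under one loud assumption: the slides do not state lambda, and `lam = 0.5` is used here only because it reproduces the printed solutions. The helper `blup` is a hypothetical name, and the equations are ordered Mean, Sire, Son 2, Son 3, Son 4, Son 5.

```python
import numpy as np

A = np.array([[1.00, 0.50, 0.50, 0.50, 0.50],
              [0.50, 1.00, 0.25, 0.25, 0.25],
              [0.50, 0.25, 1.00, 0.25, 0.25],
              [0.50, 0.25, 0.25, 1.00, 0.25],
              [0.50, 0.25, 0.25, 0.25, 1.00]])
G = np.array([[1.00, 0.50, 0.50, 0.50, 0.50],
              [0.50, 1.00, 0.20, 0.30, 0.20],
              [0.50, 0.20, 1.00, 0.20, 0.30],
              [0.50, 0.30, 0.20, 1.00, 0.20],
              [0.50, 0.20, 0.30, 0.20, 1.00]])

lam = 0.5                           # ASSUMPTION: not given on the slides
y_obs = np.array([0.0, -2.0, 2.0])  # sire, son 2, son 3 (sons 4 and 5 missing)
X = np.ones((3, 1))                 # mean
Z = np.zeros((3, 5))                # links the 3 records to the 5 animals
Z[[0, 1, 2], [0, 1, 2]] = 1.0


def blup(K):
    """Solve the mixed model equations with relationship matrix K."""
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + np.linalg.inv(K) * lam]])
    rhs = np.concatenate([X.T @ y_obs, Z.T @ y_obs])
    lhs_inv = np.linalg.inv(lhs)
    return lhs_inv, rhs, lhs_inv @ rhs  # each solution = row of LHS^-1 times RHS


inv_A, rhs, sol_A = blup(A)
inv_G, _, sol_G = blup(G)
print(np.round(sol_A, 2))
print(np.round(sol_G, 2))
```

Each EBV is the corresponding row of the inverse LHS multiplied by the RHS, which is the sense in which an EBV is a weighted average of the phenotypes. With A the two unphenotyped sons get the parent average (0.00); with G they are separated because each is more closely related to one of the phenotyped sons.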
5 animal example
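\[
u_{Son4} = \left( LHS^{-1}_{Son4,Mean} \times RHS_{Mean} \right) + \left( LHS^{-1}_{Son4,Sire} \times RHS_{Sire} \right) + \left( LHS^{-1}_{Son4,Son2} \times RHS_{Son2} \right) + \left( LHS^{-1}_{Son4,Son3} \times RHS_{Son3} \right)
\]
Genomic:
\[
u_{Son4} = (-0.63 \times 0.00) + (0.81 \times 0.00) + (0.58 \times -2.00) + (0.50 \times 2.00) = -0.15
\]
Pedigree:
\[
u_{Son4} = (-0.64 \times 0.00) + (0.82 \times 0.00) + (0.55 \times -2.00) + (0.55 \times 2.00) = 0.00
\]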
10 animals
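Pedigree:
\[
\left[ Z'Z + A^{-1}\lambda \right]^{-1} \left[ Z'y \right] = \left[ u \right]
\]
Genomic:
\[
\left[ Z'Z + G^{-1}\lambda \right]^{-1} \left[ Z'y \right] = \left[ u \right]
\]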
10 animal example
A and G reordered so that the training animals (1, 2, 3, 6, 7, 8) come before the prediction animals (4, 5, 9, 10):

A     1     2     3     6     7     8     4     5     9     10
1     1.00  0.50  0.50  0.00  0.00  0.00  0.50  0.50  0.00  0.00
2     0.50  1.00  0.25  0.00  0.00  0.00  0.25  0.25  0.00  0.00
3     0.50  0.25  1.00  0.00  0.00  0.00  0.25  0.25  0.00  0.00
6     0.00  0.00  0.00  1.00  0.50  0.50  0.00  0.00  0.50  0.50
7     0.00  0.00  0.00  0.50  1.00  0.25  0.00  0.00  0.25  0.25
8     0.00  0.00  0.00  0.50  0.25  1.00  0.00  0.00  0.25  0.25
4     0.50  0.25  0.25  0.00  0.00  0.00  1.00  0.25  0.00  0.00
5     0.50  0.25  0.25  0.00  0.00  0.00  0.25  1.00  0.00  0.00
9     0.00  0.00  0.00  0.50  0.25  0.25  0.00  0.00  1.00  0.25
10    0.00  0.00  0.00  0.50  0.25  0.25  0.00  0.00  0.25  1.00

G     1     2     3     6     7     8     4     5     9     10
1     1.00  0.50  0.50  0.02  0.02  0.02  0.50  0.50  0.02  0.02
2     0.50  1.00  0.20  0.02  0.01  0.03  0.30  0.20  0.01  0.03
3     0.50  0.20  1.00  0.02  0.03  0.01  0.20  0.30  0.03  0.01
6     0.02  0.02  0.02  1.00  0.50  0.50  0.02  0.02  0.50  0.50
7     0.02  0.01  0.03  0.50  1.00  0.20  0.01  0.03  0.30  0.20
8     0.02  0.03  0.01  0.50  0.20  1.00  0.03  0.01  0.20  0.30
4     0.50  0.30  0.20  0.02  0.01  0.03  1.00  0.20  0.02  0.02
5     0.50  0.20  0.30  0.02  0.03  0.01  0.20  1.00  0.01  0.03
9     0.02  0.01  0.03  0.50  0.30  0.20  0.02  0.01  1.00  0.20
10    0.02  0.03  0.01  0.50  0.20  0.30  0.02  0.03  0.20  1.00

Right-hand side and true breeding values in the same order (# marks a missing phenotype):
\[
RHS = \begin{bmatrix} 0 \\ -2 \\ 2 \\ 20 \\ 18 \\ 22 \\ \# \\ \# \\ \# \\ \# \end{bmatrix}, \quad
TrueBreedingValues = \begin{bmatrix} 0 \\ -2 \\ 2 \\ 20 \\ 18 \\ 22 \\ -2 \\ 2 \\ 18 \\ 22 \end{bmatrix}
\]
Pedigree (A):
\[
LHS^{-1} = \begin{bmatrix}
0.59 & 0.12 & 0.12 & 0.00 & 0.00 & 0.00 & 0.29 & 0.29 & 0.00 & 0.00 \\
0.12 & 0.62 & 0.02 & 0.00 & 0.00 & 0.00 & 0.06 & 0.06 & 0.00 & 0.00 \\
0.12 & 0.02 & 0.62 & 0.00 & 0.00 & 0.00 & 0.06 & 0.06 & 0.00 & 0.00 \\
0.00 & 0.00 & 0.00 & 0.59 & 0.12 & 0.12 & 0.00 & 0.00 & 0.29 & 0.29 \\
0.00 & 0.00 & 0.00 & 0.12 & 0.62 & 0.02 & 0.00 & 0.00 & 0.06 & 0.06 \\
0.00 & 0.00 & 0.00 & 0.12 & 0.02 & 0.62 & 0.00 & 0.00 & 0.06 & 0.06 \\
0.29 & 0.06 & 0.06 & 0.00 & 0.00 & 0.00 & 1.65 & 0.15 & 0.00 & 0.00 \\
0.29 & 0.06 & 0.06 & 0.00 & 0.00 & 0.00 & 0.15 & 1.65 & 0.00 & 0.00 \\
0.00 & 0.00 & 0.00 & 0.29 & 0.06 & 0.06 & 0.00 & 0.00 & 1.65 & 0.15 \\
0.00 & 0.00 & 0.00 & 0.29 & 0.06 & 0.06 & 0.00 & 0.00 & 0.15 & 1.65
\end{bmatrix}, \quad
Solutions = \begin{bmatrix} 0.00 \\ -1.20 \\ 1.20 \\ 16.47 \\ 14.09 \\ 16.49 \\ 0.00 \\ 0.00 \\ 8.24 \\ 8.24 \end{bmatrix}
\]
Genomic (G):
\[
LHS^{-1} = \begin{bmatrix}
0.5853 & 0.1219 & 0.1219 & 0.0012 & 0.0017 & 0.0017 & 0.2926 & 0.2926 & 0.0040 & 0.0040 \\
0.1219 & 0.6247 & 0.0094 & 0.0017 & -0.0006 & 0.0053 & 0.0992 & 0.0225 & -0.0014 & 0.0128 \\
0.1219 & 0.0094 & 0.6247 & 0.0017 & 0.0053 & -0.0006 & 0.0225 & 0.0992 & 0.0128 & -0.0014 \\
0.0012 & 0.0017 & 0.0017 & 0.5853 & 0.1219 & 0.1219 & 0.0040 & 0.0040 & 0.2926 & 0.2926 \\
0.0017 & -0.0006 & 0.0053 & 0.1219 & 0.6247 & 0.0094 & -0.0014 & 0.0128 & 0.0992 & 0.0225 \\
0.0017 & 0.0053 & -0.0006 & 0.1219 & 0.0094 & 0.6247 & 0.0128 & -0.0014 & 0.0225 & 0.0992 \\
0.2926 & 0.0992 & 0.0225 & 0.0040 & -0.0014 & 0.0128 & 1.6380 & 0.0539 & 0.0167 & 0.0108 \\
0.2926 & 0.0225 & 0.0992 & 0.0040 & 0.0128 & -0.0014 & 0.0539 & 1.6380 & -0.0092 & 0.0367 \\
0.0040 & -0.0014 & 0.0128 & 0.2926 & 0.0992 & 0.0225 & 0.0167 & -0.0092 & 1.6380 & 0.0539 \\
0.0040 & 0.0128 & -0.0014 & 0.2926 & 0.0225 & 0.0992 & 0.0108 & 0.0367 & 0.0539 & 1.6380
\end{bmatrix}, \quad
Solutions = \begin{bmatrix} 0.09 \\ -1.09 \\ 1.35 \\ 16.58 \\ 13.90 \\ 16.34 \\ 0.25 \\ 0.37 \\ 8.16 \\ 8.41 \end{bmatrix}
\]
10 animal example
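Pedigree:
\[
u_{SonA4} = \left( LHS^{-1}_{SonA4,SireA} \times RHS_{SireA} \right) + \left( LHS^{-1}_{SonA4,SonA2} \times RHS_{SonA2} \right) + \left( LHS^{-1}_{SonA4,SonA3} \times RHS_{SonA3} \right)
\]
\[
u_{Son4} = (0.29 \times 0.00) + (0.06 \times -2.00) + (0.06 \times 2.00) = 0.00
\]
Genomic:
\[
u_{SonA4} = \left( LHS^{-1}_{SonA4,SireA} \times RHS_{SireA} \right) + \left( LHS^{-1}_{SonA4,SonA2} \times RHS_{SonA2} \right) + \left( LHS^{-1}_{SonA4,SonA3} \times RHS_{SonA3} \right) + \left( LHS^{-1}_{SonA4,SireB} \times RHS_{SireB} \right) + \left( LHS^{-1}_{SonA4,SonB6} \times RHS_{SonB6} \right) + \left( LHS^{-1}_{SonA4,SonB7} \times RHS_{SonB7} \right)
\]
\[
u_{Son4} = (0.2926 \times 0.00) + (0.0992 \times -2.00) + (0.0225 \times 2.00) + (0.0040 \times 20.00) + (-0.0014 \times 18.00) + (0.0128 \times 22.00) = 0.18
\]
Phenotypes are missing for the animals with the other non-zero coefficients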
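The two weighted sums for Son 4 of family A can be checked directly. A small sketch; the weights are the printed inverse-LHS coefficients for the six phenotyped animals (sire A, two sons of A, sire B, two sons of B), and the array names are illustrative.

```python
import numpy as np

# Inverse-LHS coefficients linking Son A4 to the six phenotyped animals.
w_pedigree = np.array([0.29, 0.06, 0.06, 0.00, 0.00, 0.00])
w_genomic = np.array([0.2926, 0.0992, 0.0225, 0.0040, -0.0014, 0.0128])

# RHS (phenotypes) of the same six animals.
rhs = np.array([0.0, -2.0, 2.0, 20.0, 18.0, 22.0])

ebv_pedigree = w_pedigree @ rhs  # only family A contributes, and it cancels
ebv_genomic = w_genomic @ rhs    # family B contributes through small weights

print(round(ebv_pedigree, 2), round(ebv_genomic, 2))
```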
Matrix Inversion
• http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/lecture-3-multiplication-and-inverse-matrices/
• Minute 37 of this video from Gilbert Strang
• Gauss-Jordan Elimination
Inversion by Gauss-Jordan
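\[
A = \begin{bmatrix} 1 & 3 \\ 2 & 7 \end{bmatrix}, \quad
A^{-1} = \begin{bmatrix} 7 & -3 \\ -2 & 1 \end{bmatrix}, \quad
I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
A^{-1}A = I = AA^{-1}
\]
Start with A augmented by I:
\[
\left[ \begin{array}{cc|cc} 1 & 3 & 1 & 0 \\ 2 & 7 & 0 & 1 \end{array} \right]
\]
Move 1: Subtract 2 × Row 1 from Row 2
\[
\left[ \begin{array}{cc|cc} 1 & 3 & 1 & 0 \\ 0 & 1 & -2 & 1 \end{array} \right]
\]
Move 2: Subtract 3 × Row 2 from Row 1
\[
\left[ \begin{array}{cc|cc} 1 & 0 & 7 & -3 \\ 0 & 1 & -2 & 1 \end{array} \right]
\]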
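The two moves generalise to a short routine. This is a teaching sketch that mirrors the hand calculation: it does no row swapping (pivoting), so it assumes every pivot is non-zero, and production code would normally call a library routine such as `numpy.linalg.inv` instead. The function name `gauss_jordan_inverse` is hypothetical.

```python
import numpy as np


def gauss_jordan_inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I]."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    aug = np.hstack([A, np.eye(n)])  # augmented matrix [A | I]
    for col in range(n):
        if np.isclose(aug[col, col], 0.0):
            raise ValueError("zero pivot: this sketch does not swap rows")
        aug[col] /= aug[col, col]    # scale the pivot row so the pivot is 1
        for row in range(n):
            if row != col:
                # Subtract a multiple of the pivot row to clear this column.
                aug[row] -= aug[row, col] * aug[col]
    return aug[:, n:]                # right half now holds the inverse


A = [[1, 3], [2, 7]]
A_inv = gauss_jordan_inverse(A)
print(A_inv)
```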
Gauss Seidel Residual Update
• Easy efficient way to solve and understand genomic prediction equations
• Steps
– Form X’X (diagonal)
– Form X’y
– Initialize values for the betas
– Assume the current values of the other betas are correct
– Form a new y vector (called e) based on the residuals
– Estimate a new solution for beta_i (X_i’e divided by X_i’X_i)
– Repeat until convergence
• Simple extension to Bayesian model
Legarra and Misztal (JDS 2008)
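The steps above can be sketched as follows. This is an illustrative version, not the code from Legarra and Misztal: the function name and the per-effect shrinkage vector `lam` (added to the diagonal, 0 for the intercept) are assumptions, and the data are the 8-animal example with lambda = 0.85.

```python
import numpy as np


def gauss_seidel_residual_update(W, y, lam, n_iter=500, tol=1e-12):
    """Solve (W'W + diag(lam)) beta = W'y one effect at a time."""
    n, p = W.shape
    beta = np.zeros(p)                  # initial values for the betas
    wtw = np.einsum("ij,ij->j", W, W)   # diagonal of W'W
    e = y - W @ beta                    # residuals given the current betas
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            w = W[:, j]
            # w'(e + w*beta_j): residuals with effect j added back in.
            new = (w @ e + wtw[j] * beta[j]) / (wtw[j] + lam[j])
            e -= w * (new - beta[j])    # update residuals for the new beta_j
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:            # repeat until convergence
            break
    return beta


y = np.array([0.10, 0.70, 1.30, 0.65, 1.25, 0.12, 0.68, 1.20])
z = np.array([0, 1, 2, 1, 2, 0, 1, 2], dtype=float)
W = np.column_stack([np.ones_like(z), z])
lam = np.array([0.0, 0.85])             # no shrinkage on the intercept

beta = gauss_seidel_residual_update(W, y, lam)
print(np.round(beta, 2))
```

Only the diagonal of W'W and the residual vector are kept in memory, which is why the approach scales to many markers.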
Excel Sheet
• Lots of little examples with Excel
GBLUP versus other methods
• Genomic BLUP is the simplest method to do genomic evaluations
• Algebraically identical to ridge regression
• Ridge regression treats each marker as a random effect
• Ridge regression has the same shrinkage parameter for each marker
• Other methods allow heterogeneous shrinkage parameters
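The equivalence between ridge regression on markers and GBLUP can be checked numerically. A minimal sketch on simulated data, with several simplifications that are assumptions of this example: no intercept, an unscaled and uncentred relationship matrix MM', and an arbitrary lambda of 2.0.

```python
import numpy as np

rng = np.random.default_rng(1)
n_animals, n_markers = 6, 20
M = rng.integers(0, 3, size=(n_animals, n_markers)).astype(float)  # dosages 0/1/2
y = rng.normal(size=n_animals)                                     # simulated phenotypes
lam = 2.0

# Ridge regression: estimate marker effects, then sum them per animal.
a_hat = np.linalg.solve(M.T @ M + lam * np.eye(n_markers), M.T @ y)
gv_ridge = M @ a_hat

# GBLUP: work with the animal-by-animal matrix MM' instead of marker effects.
K = M @ M.T
gv_gblup = K @ np.linalg.solve(K + lam * np.eye(n_animals), y)

print(np.max(np.abs(gv_ridge - gv_gblup)))
```

Both routes give the same genetic values; the choice between them is mostly about whether there are fewer animals or fewer markers.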
Brief description of SNP models
• Genomic selection prediction models treat markers as random effects
• MARS treats markers as fixed effects
• Random effects have two benefits – A trick that allows all markers to be fitted simultaneously – Shrinkage
• Fixed effect models overestimate marker effects
• Random effects models correct for this overestimation by shrinking marker effects back towards the mean of all marker effects
• Shrinkage is proportional to the uncertainty in the marker effect (and a statistical prior)
– More uncertainty = more shrinkage towards the mean – More information to estimate effect = less shrinkage
• Ridge regression – All SNPs in the model – All have an equal shrinkage parameter – Shrinkage parameter is estimated or set a priori
• BayesA – All SNPs in the model – Each SNP has a unique shrinkage parameter – Each shrinkage parameter is estimated – Shape and scale parameters are fixed – Problem is that it cannot shrink effects to zero
Brief description of SNP models
• BayesLasso – Similar to BayesA – Each SNP has a unique shrinkage parameter – Each shrinkage parameter is estimated – Shape and scale parameters are estimated – Uses an inverse Gaussian distribution instead of an inverse chi-square, which allows greater shrinkage towards zero
• BayesB – Similar to BayesA except that only a proportion 1 - π of the SNPs are in the model – Thus can shrink SNPs to zero – Mixture model
• BayesCpi – Has similarities to SNP-BLUP and BayesB – Estimates π – All SNPs have an equal shrinkage parameter – Estimates the shrinkage parameter – Mixture model
Summary of SNP models
• Ridge regression: all SNPs have an EQUAL shrinkage parameter
• BayesA/BayesLasso achieve shrinkage with an UNEQUAL shrinkage parameter for each SNP
• BayesB achieves shrinkage by – Including only a proportion of the SNPs in each round – Giving these SNPs UNEQUAL shrinkage parameters
• BayesCpi achieves shrinkage by – Including only a proportion of the SNPs in each round – Giving these SNPs an EQUAL shrinkage parameter
• Models have an equivalence to genomic relationship matrix G = MHM’
Non-linear models
• Reproducing kernel Hilbert space
• Neural networks
• Basically these bend the relationships in the genomic relationship matrix
• Close relatives get more weight
• Distant relatives less weight
• Capture epistatic interactions that may be shared by close relatives but not by distant relatives
• Perhaps useful for advanced yield trials
• I favour simpler models – Prevents a pointless debate about which model – Genetic improvement is an additive thing