The statistical models of genomic prediction

John M Hickey, Chris Gaynor, Gregor Gorjanc

www.alphagenes.roslin.ed.ac.uk

@hickeyjohn

Genomic selection

Goddard & Hayes Nat. Rev. Genet. 2009

“GS is the quantitative geneticists' revenge on molecular genetics” - A. Archibald


Relationship within and between training and prediction individuals

Relationships between the training population (TP) and the selection candidates are leveraged for prediction

[Figure: two panels, "Haplotypes" and "Genomic relationship matrix"; the matrix axes list individual IDs grouped into "Selection candidates" and "Training Pop."]

Useful things with matrices

•  Counting how many animals pass over a scale

•  Summing the animals' weights

$$x = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \qquad x'x = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = (1 \times 1) + (1 \times 1) + (1 \times 1) = 3$$

$$y = \begin{bmatrix} 10 \\ 15 \\ 20 \end{bmatrix}, \qquad x'y = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 10 \\ 15 \\ 20 \end{bmatrix} = (1 \times 10) + (1 \times 15) + (1 \times 20) = 45$$

Useful things with matrices

•  Averaging the animals' weights

$$x'y = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 10 \\ 15 \\ 20 \end{bmatrix} = (1 \times 10) + (1 \times 15) + (1 \times 20) = 45$$

$$x'x = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = (1 \times 1) + (1 \times 1) + (1 \times 1) = 3$$

$$\frac{x'y}{x'x} = [x'x]^{-1} x'y = b, \qquad \frac{45}{3} = \frac{1}{3} \times 45 = 15$$

Useful things with matrices

•  Summing total weight in males and females

•  Weight of average male and average female

$$X = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad y = \begin{bmatrix} 10 \\ 15 \\ 20 \end{bmatrix}$$

$$X'y = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 10 \\ 15 \\ 20 \end{bmatrix} = \begin{bmatrix} 25 \\ 20 \end{bmatrix}, \qquad X'X = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$$

$$\frac{X'y}{X'X} = [X'X]^{-1} X'y = b = \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{1} \end{bmatrix} \begin{bmatrix} 25 \\ 20 \end{bmatrix} = \begin{bmatrix} 12.5 \\ 20 \end{bmatrix}$$

$$b_{1} = \left(\tfrac{1}{2} \times 25\right) + (0 \times 20) = 12.5$$
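The counting, summing and averaging steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the variable names (`XtX`, `Xty`, `overall_mean`) are my own, and the assignment of the first two animals to the male column follows the X matrix shown on the slide.

```python
import numpy as np

# Weights of three animals (from the slides).
y = np.array([10.0, 15.0, 20.0])

# Design matrix: column 0 = male, column 1 = female.
# Animals 1 and 2 are male, animal 3 is female.
X = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

XtX = X.T @ X                    # counts per sex: diag(2, 1)
Xty = X.T @ y                    # total weight per sex: [25, 20]
b = np.linalg.solve(XtX, Xty)    # average weight per sex

# Overall mean with a single column of ones.
x = np.ones(3)
overall_mean = (x @ y) / (x @ x)

print("X'X =\n", XtX)
print("X'y =", Xty)
print("b   =", b)
print("mean =", overall_mean)
```

Solving with `np.linalg.solve` rather than forming the inverse explicitly gives the same answer here and is the usual habit for larger systems.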

Shrinkage – Random Wand

•  Ridge regression
•  BayesA
•  BayesB
•  BayesC
•  BayesLasso
•  BayesR
•  FnBayesB

•  All differ in the shrinkage parameter
–  Some measure of our belief

Let's put in a little bit of genetics

•  Diploid genomes
–  Markers are AA, Aa, aA, or aa
–  Label a=0 and A=1
–  Thus the dosage is:
   •  AA=2
   •  Aa=1
   •  aA=1
   •  aa=0
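As a small illustration of the coding above, the dosage is simply the number of A alleles in the genotype. The helper name `dosage` and the dictionary `ALLELE_CODE` are hypothetical names chosen for this sketch.

```python
# Allele coding from the slide: a = 0, A = 1.
ALLELE_CODE = {"a": 0, "A": 1}


def dosage(genotype: str) -> int:
    """Return the dosage (0, 1 or 2) of a two-letter genotype such as 'Aa'."""
    if len(genotype) != 2:
        raise ValueError("expected a two-letter diploid genotype")
    # Sum the codes of the two alleles, so Aa and aA both give 1.
    return sum(ALLELE_CODE[allele] for allele in genotype)


genotypes = ["AA", "Aa", "aA", "aa"]
dosages = [dosage(g) for g in genotypes]
print(dict(zip(genotypes, dosages)))
```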

Mixed model equations

•  Sample mean 0.75
•  True intercept is 0.19
•  True effect is 0.50

$$y = \begin{bmatrix} 0.10 & 0.70 & 1.30 & 0.65 & 1.25 & 0.12 & 0.68 & 1.20 \end{bmatrix}'$$

$$Z = \begin{bmatrix} 0 & 1 & 2 & 1 & 2 & 0 & 1 & 2 \end{bmatrix}', \qquad X = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}'$$

$$TBV = \begin{bmatrix} 0.0 & 0.5 & 1.0 & 0.5 & 1.0 & 0.0 & 0.5 & 1.0 \end{bmatrix}'$$

Without shrinkage:

$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z \end{bmatrix}^{-1} \begin{bmatrix} X'y \\ Z'y \end{bmatrix} = \begin{bmatrix} b \\ u \end{bmatrix}$$

$$LHS = \begin{bmatrix} 8 & 9 \\ 9 & 15 \end{bmatrix}, \qquad RHS = \begin{bmatrix} 6 \\ 9.53 \end{bmatrix}, \qquad \begin{bmatrix} b \\ u \end{bmatrix} = \begin{bmatrix} 0.11 \\ 0.57 \end{bmatrix}$$

With shrinkage, λ = 0.85:

$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + I\lambda \end{bmatrix}^{-1} \begin{bmatrix} X'y \\ Z'y \end{bmatrix} = \begin{bmatrix} b \\ u \end{bmatrix}$$

$$LHS = \begin{bmatrix} 8 & 9 \\ 9 & 15.85 \end{bmatrix}, \qquad RHS = \begin{bmatrix} 6 \\ 9.53 \end{bmatrix}, \qquad \begin{bmatrix} b \\ u \end{bmatrix} = \begin{bmatrix} 0.20 \\ 0.49 \end{bmatrix}$$
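The two sets of equations above can be checked numerically. The sketch below builds the left- and right-hand sides from the y, Z and X vectors on the slide and solves them with and without λ. The function name `solve_mme` is my own; it is an illustration under the slide's one-marker setup, not a general mixed model solver.

```python
import numpy as np

# Data from the slide: 8 phenotypes, one marker (dosage), and an intercept.
y = np.array([0.10, 0.70, 1.30, 0.65, 1.25, 0.12, 0.68, 1.20])
z = np.array([0, 1, 2, 1, 2, 0, 1, 2], dtype=float)
X = np.ones((8, 1))      # intercept column
Z = z.reshape(-1, 1)     # marker dosage column


def solve_mme(X, Z, y, lam):
    """Build and solve the mixed model equations with shrinkage lam on Z."""
    n_random = Z.shape[1]
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * np.eye(n_random)]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    return lhs, rhs, np.linalg.solve(lhs, rhs)


lhs0, rhs0, sol0 = solve_mme(X, Z, y, lam=0.0)       # no shrinkage
lhs1, rhs1, sol1 = solve_mme(X, Z, y, lam=0.85)      # lambda from the slide

print("no shrinkage  :", np.round(sol0, 2))
print("lambda = 0.85 :", np.round(sol1, 2))
```

With λ = 0.85 the marker solution moves from about 0.57 towards the true effect of 0.50, and the intercept moves from about 0.11 towards 0.19.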

A range of shrinkage values

•  If Lambda = 1000 the SNP solution = 0.00
•  And the solution for the intercept = 0.75

[Figure: BetaHat (y-axis, 0.00 to 0.60) plotted against Lambda (x-axis, 0 to 1.2)]
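A sketch of how the curve above could be reproduced: the same 8-animal data are solved for a grid of Lambda values, with Lambda = 1000 added as the extreme case. The grid and the helper name `ridge_solution` are assumptions made for illustration.

```python
import numpy as np

# Same 8-animal data as in the mixed model equations slide.
y = np.array([0.10, 0.70, 1.30, 0.65, 1.25, 0.12, 0.68, 1.20])
z = np.array([0, 1, 2, 1, 2, 0, 1, 2], dtype=float)


def ridge_solution(lam):
    """Return (intercept, SNP effect) for a given shrinkage value lam."""
    lhs = np.array([[len(y), z.sum()],
                    [z.sum(), z @ z + lam]])
    rhs = np.array([y.sum(), z @ y])
    return np.linalg.solve(lhs, rhs)


lambdas = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1000.0]
solutions = [ridge_solution(lam) for lam in lambdas]

for lam, (intercept, snp) in zip(lambdas, solutions):
    print(f"lambda = {lam:7.1f}  intercept = {intercept:.2f}  SNP = {snp:.2f}")
```

As Lambda grows, the SNP solution is pulled towards zero and the intercept approaches the sample mean of 0.75.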

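Shrinkage versus more data

•  Two data sets
•  One with 8 animals, the other with 80 animals
•  Compare effect of Lambda in both

$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + I\lambda \end{bmatrix}^{-1} \begin{bmatrix} X'y \\ Z'y \end{bmatrix} = \begin{bmatrix} b \\ u \end{bmatrix}$$

No Lambda

8 animals:
$$LHS = \begin{bmatrix} 8 & 9 \\ 9 & 15 \end{bmatrix}, \qquad RHS = \begin{bmatrix} 6 \\ 9.53 \end{bmatrix}, \qquad \begin{bmatrix} b \\ u \end{bmatrix} = \begin{bmatrix} 0.11 \\ 0.57 \end{bmatrix}$$

80 animals:
$$LHS = \begin{bmatrix} 80 & 85 \\ 85 & 263 \end{bmatrix}, \qquad RHS = \begin{bmatrix} 57.4 \\ 147.94 \end{bmatrix}, \qquad \begin{bmatrix} b \\ u \end{bmatrix} = \begin{bmatrix} 0.18 \\ 0.50 \end{bmatrix}$$

Lambda = 5.0 (extremely high value)

8 animals:
$$LHS = \begin{bmatrix} 8 & 9 \\ 9 & 20 \end{bmatrix}, \qquad RHS = \begin{bmatrix} 6 \\ 9.53 \end{bmatrix}, \qquad \begin{bmatrix} b \\ u \end{bmatrix} = \begin{bmatrix} 0.43 \\ 0.28 \end{bmatrix}$$

80 animals:
$$LHS = \begin{bmatrix} 80 & 85 \\ 85 & 268 \end{bmatrix}, \qquad RHS = \begin{bmatrix} 57.4 \\ 147.94 \end{bmatrix}, \qquad \begin{bmatrix} b \\ u \end{bmatrix} = \begin{bmatrix} 0.20 \\ 0.49 \end{bmatrix}$$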
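The comparison can be sketched directly from the LHS and RHS values printed on the slide, adding Lambda to the marker diagonal. The dictionary layout and the function name `solve_with_lambda` are illustrative choices, not part of the original material.

```python
import numpy as np

# LHS (without Lambda) and RHS for both data sets, as printed on the slide.
datasets = {
    "8 animals": (np.array([[8.0, 9.0], [9.0, 15.0]]),
                  np.array([6.0, 9.53])),
    "80 animals": (np.array([[80.0, 85.0], [85.0, 263.0]]),
                   np.array([57.4, 147.94])),
}


def solve_with_lambda(lhs, rhs, lam):
    """Add lam to the marker diagonal of the LHS and solve."""
    shrunk = lhs.copy()
    shrunk[1, 1] += lam
    return np.linalg.solve(shrunk, rhs)


results = {}
for name, (lhs, rhs) in datasets.items():
    for lam in (0.0, 5.0):
        results[(name, lam)] = solve_with_lambda(lhs, rhs, lam)
        print(name, "lambda =", lam, np.round(results[(name, lam)], 2))
```

With 8 animals the marker solution drops from about 0.57 to about 0.28, whereas with 80 animals it barely moves (about 0.50 to 0.49): the more data there are, the less the same Lambda matters.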

Mendelian sampling

[Figure: pedigree diagram of a Sire with Child 1 to Child 6]

In theory you can have sibs that are genetically unrelated

This is why I am different from my brother

[Figure: distribution of the additive genetic relation (x-axis, 0 to 1) for HS and FS, m=4]

$$A = \begin{bmatrix} 1.00 & 0.50 & 0.50 & 0.50 & 0.50 \\ 0.50 & 1.00 & 0.25 & 0.25 & 0.25 \\ 0.50 & 0.25 & 1.00 & 0.25 & 0.25 \\ 0.50 & 0.25 & 0.25 & 1.00 & 0.25 \\ 0.50 & 0.25 & 0.25 & 0.25 & 1.00 \end{bmatrix}$$

$$G = \begin{bmatrix} 1.00 & 0.50 & 0.50 & 0.50 & 0.50 \\ 0.50 & 1.00 & 0.20 & 0.30 & 0.20 \\ 0.50 & 0.20 & 1.00 & 0.20 & 0.30 \\ 0.50 & 0.30 & 0.20 & 1.00 & 0.20 \\ 0.50 & 0.20 & 0.30 & 0.20 & 1.00 \end{bmatrix}$$

And “hidden” relationships

Population version


10 animal example

•  10 animal example
–  2 unrelated sire families (FamA and FamB)
–  Dams are unrelated

•  In each family
–  2 half sibs used in prediction set
–  Sire and 2 half sibs used in training set
–  5 individuals from other family used in training set

•  Purpose
–  Show prediction due to parent average versus Mendelian sampling (MS)
–  Pedigree versus genomics
–  Close versus distant relatives
–  Show shrinkage

Pedigree

ID   Sire   Dam
 1     0     0
 2     1     0
 3     1     0
 4     1     0
 5     1     0
 6     0     0
 7     6     0
 8     6     0
 9     6     0
10     6     0

Genetic relationships

•  Captured in a matrix A, traditionally built using pedigree
–  Relationship between each pair of individuals

•  Range from 0 to 2
–  Fully inbred individuals have a relationship with themselves of 2
–  A pair of completely unrelated individuals has a coefficient of relationship of 0
–  Full sibs have a relationship of 0.5
   •  If parents are not related
–  Half sibs have a relationship of 0.25
   •  If parents are not related
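One common way to build A from a pedigree is the tabular method. The sketch below is an illustration rather than the authors' code: it assumes animals are numbered 1..n with parents listed before their offspring and 0 standing for an unknown parent, and the function name `relationship_matrix` is my own. It is applied to the 10-animal pedigree of the slides.

```python
import numpy as np


def relationship_matrix(pedigree):
    """Tabular method for the additive relationship matrix A.

    pedigree: list of (id, sire, dam), ids 1..n in order, 0 = unknown parent,
    parents appearing before their offspring (an assumption of this sketch).
    """
    n = len(pedigree)
    # Row/column 0 represents the unknown parent and stays all zero.
    A = np.zeros((n + 1, n + 1))
    for i, sire, dam in pedigree:
        # Diagonal: 1 + inbreeding = 1 + half the relationship of the parents.
        A[i, i] = 1.0 + 0.5 * A[sire, dam]
        # Off-diagonal: average relationship of j with the parents of i.
        for j in range(1, i):
            A[i, j] = A[j, i] = 0.5 * (A[j, sire] + A[j, dam])
    return A[1:, 1:]


pedigree = [(1, 0, 0), (2, 1, 0), (3, 1, 0), (4, 1, 0), (5, 1, 0),
            (6, 0, 0), (7, 6, 0), (8, 6, 0), (9, 6, 0), (10, 6, 0)]
A = relationship_matrix(pedigree)
print(np.round(A, 2))
```

For this pedigree the result has 0.50 between each sire and his offspring, 0.25 between half sibs, and 0.00 between the two families.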

“Some animals are more equal than others” ... even if the additive genetic relationship is the same

[Figure: distribution of the additive genetic relation (x-axis, 0 to 1) for HS and FS, m=4]

Genomic Relationships

e.g. the actual relationship between half sibs (HS) can vary between 0.2 and 0.3

Let's do 5 animals first

ID   Sire   Dam
 1     0     0
 2     1     0
 3     1     0
 4     1     0
 5     1     0

$$A = \begin{bmatrix} 1.00 & 0.50 & 0.50 & 0.50 & 0.50 \\ 0.50 & 1.00 & 0.25 & 0.25 & 0.25 \\ 0.50 & 0.25 & 1.00 & 0.25 & 0.25 \\ 0.50 & 0.25 & 0.25 & 1.00 & 0.25 \\ 0.50 & 0.25 & 0.25 & 0.25 & 1.00 \end{bmatrix}$$

$$G = \begin{bmatrix} 1.00 & 0.50 & 0.50 & 0.50 & 0.50 \\ 0.50 & 1.00 & 0.20 & 0.30 & 0.20 \\ 0.50 & 0.20 & 1.00 & 0.20 & 0.30 \\ 0.50 & 0.30 & 0.20 & 1.00 & 0.20 \\ 0.50 & 0.20 & 0.30 & 0.20 & 1.00 \end{bmatrix}$$

Pedigree tells:
•  Which family you belong to

Genomics tells:
•  Which family you belong to
•  Which sib you are more closely related to
•  And shows “hidden” relationships (we will see the last bit with 10 animals)

Label on G: Linkage

10 animals for comparison

ID   Sire   Dam
 1     0     0
 2     1     0
 3     1     0
 4     1     0
 5     1     0
 6     0     0
 7     6     0
 8     6     0
 9     6     0
10     6     0

$$A = \begin{bmatrix}
1.00 & 0.50 & 0.50 & 0.50 & 0.50 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.50 & 1.00 & 0.25 & 0.25 & 0.25 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.50 & 0.25 & 1.00 & 0.25 & 0.25 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.50 & 0.25 & 0.25 & 1.00 & 0.25 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.50 & 0.25 & 0.25 & 0.25 & 1.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 1.00 & 0.50 & 0.50 & 0.50 & 0.50 \\
0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.50 & 1.00 & 0.25 & 0.25 & 0.25 \\
0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.50 & 0.25 & 1.00 & 0.25 & 0.25 \\
0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.50 & 0.25 & 0.25 & 1.00 & 0.25 \\
0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.50 & 0.25 & 0.25 & 0.25 & 1.00
\end{bmatrix}$$

$$G = \begin{bmatrix}
1.00 & 0.50 & 0.50 & 0.50 & 0.50 & 0.02 & 0.02 & 0.02 & 0.02 & 0.02 \\
0.50 & 1.00 & 0.20 & 0.30 & 0.20 & 0.02 & 0.01 & 0.03 & 0.01 & 0.03 \\
0.50 & 0.20 & 1.00 & 0.20 & 0.30 & 0.02 & 0.03 & 0.01 & 0.03 & 0.01 \\
0.50 & 0.30 & 0.20 & 1.00 & 0.20 & 0.02 & 0.01 & 0.03 & 0.01 & 0.03 \\
0.50 & 0.20 & 0.30 & 0.20 & 1.00 & 0.02 & 0.03 & 0.01 & 0.03 & 0.01 \\
0.02 & 0.02 & 0.02 & 0.02 & 0.02 & 1.00 & 0.50 & 0.50 & 0.50 & 0.50 \\
0.02 & 0.01 & 0.03 & 0.01 & 0.03 & 0.50 & 1.00 & 0.20 & 0.30 & 0.20 \\
0.02 & 0.03 & 0.01 & 0.03 & 0.01 & 0.50 & 0.20 & 1.00 & 0.20 & 0.30 \\
0.02 & 0.01 & 0.03 & 0.01 & 0.03 & 0.50 & 0.30 & 0.20 & 1.00 & 0.20 \\
0.02 & 0.03 & 0.01 & 0.03 & 0.01 & 0.50 & 0.20 & 0.30 & 0.20 & 1.00
\end{bmatrix}$$

Labels on A:
•  Within-family blocks: Family relationships
•  Between-family blocks: Missing pedigree = “Unrelated”

Labels on G:
•  Within-family blocks: Family relationships, Segregation within family (Linkage)
•  Between-family blocks: Linkage disequilibrium

5 animal example

Genomic:
$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + G^{-1}\lambda \end{bmatrix}^{-1} \begin{bmatrix} X'y \\ Z'y \end{bmatrix} = \begin{bmatrix} b \\ u \end{bmatrix}$$

Pedigree:
$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + A^{-1}\lambda \end{bmatrix}^{-1} \begin{bmatrix} X'y \\ Z'y \end{bmatrix} = \begin{bmatrix} b \\ u \end{bmatrix}$$

$$A = \begin{bmatrix} 1.00 & 0.50 & 0.50 & 0.50 & 0.50 \\ 0.50 & 1.00 & 0.25 & 0.25 & 0.25 \\ 0.50 & 0.25 & 1.00 & 0.25 & 0.25 \\ 0.50 & 0.25 & 0.25 & 1.00 & 0.25 \\ 0.50 & 0.25 & 0.25 & 0.25 & 1.00 \end{bmatrix}, \qquad G = \begin{bmatrix} 1.00 & 0.50 & 0.50 & 0.50 & 0.50 \\ 0.50 & 1.00 & 0.20 & 0.30 & 0.20 \\ 0.50 & 0.20 & 1.00 & 0.20 & 0.30 \\ 0.50 & 0.30 & 0.20 & 1.00 & 0.20 \\ 0.50 & 0.20 & 0.30 & 0.20 & 1.00 \end{bmatrix}$$

$$y = \begin{bmatrix} 0.00 & -2.00 & 2.00 & \text{Missing} & \text{Missing} \end{bmatrix}', \qquad RHS = \begin{bmatrix} 0.00 & 0.00 & -2.00 & 2.00 & \# & \# \end{bmatrix}'$$

Pedigree:
$$LHS^{-1} = \begin{bmatrix}
1.55 & -1.27 & -1.18 & -1.18 & -0.64 & -0.64 \\
-1.27 & 1.64 & 1.09 & 1.09 & 0.82 & 0.82 \\
-1.18 & 1.09 & 1.53 & 0.93 & 0.55 & 0.55 \\
-1.18 & 1.09 & 0.93 & 1.53 & 0.55 & 0.55 \\
-0.64 & 0.82 & 0.55 & 0.55 & 1.91 & 0.41 \\
-0.64 & 0.82 & 0.55 & 0.55 & 0.41 & 1.91
\end{bmatrix}, \qquad Solutions = \begin{bmatrix} 0.00 & 0.00 & -1.20 & 1.20 & 0.00 & 0.00 \end{bmatrix}'$$

Genomic:
$$LHS^{-1} = \begin{bmatrix}
1.52 & -1.26 & -1.15 & -1.15 & -0.63 & -0.63 \\
-1.26 & 1.63 & 1.07 & 1.07 & 0.81 & 0.81 \\
-1.15 & 1.07 & 1.49 & 0.88 & 0.58 & 0.50 \\
-1.15 & 1.07 & 0.88 & 1.49 & 0.50 & 0.58 \\
-0.63 & 0.81 & 0.58 & 0.50 & 1.90 & 0.32 \\
-0.63 & 0.81 & 0.50 & 0.58 & 0.32 & 1.90
\end{bmatrix}, \qquad Solutions = \begin{bmatrix} 0.00 & 0.00 & -1.23 & 1.23 & -0.15 & 0.15 \end{bmatrix}'$$

$$TrueValues = \begin{bmatrix} 0.00 & 0.00 & -2.00 & 2.00 & -2.00 & 2.00 \end{bmatrix}'$$

5 animal example

•  When we do BLUP we get an estimated breeding value (EBV)

•  An EBV is simply a weighted average of all the phenotypic data available
–  With simultaneous correction for all other effects

•  The weightings are determined by the inverse of the LHS
–  This is primarily driven by the relationship matrix
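The 5-animal example can be sketched numerically as follows. The slides do not state the value of λ used; λ = 0.5 is an assumption on my part (it reproduces the pedigree solutions of -1.20 and 1.20 for the two phenotyped sons). The layout, with a mean plus five animal effects and records on the sire, son 2 and son 3 only, follows the y and RHS vectors of the example. The function name `solve_mme` is hypothetical.

```python
import numpy as np

A = np.array([[1.00, 0.50, 0.50, 0.50, 0.50],
              [0.50, 1.00, 0.25, 0.25, 0.25],
              [0.50, 0.25, 1.00, 0.25, 0.25],
              [0.50, 0.25, 0.25, 1.00, 0.25],
              [0.50, 0.25, 0.25, 0.25, 1.00]])
G = np.array([[1.00, 0.50, 0.50, 0.50, 0.50],
              [0.50, 1.00, 0.20, 0.30, 0.20],
              [0.50, 0.20, 1.00, 0.20, 0.30],
              [0.50, 0.30, 0.20, 1.00, 0.20],
              [0.50, 0.20, 0.30, 0.20, 1.00]])

y = np.array([0.0, -2.0, 2.0])   # records on sire, son 2, son 3
X = np.ones((3, 1))              # overall mean
Z = np.eye(5)[:3]                # links the 3 records to animals 1, 2, 3
lam = 0.5                        # ASSUMED value, not given on the slides


def solve_mme(K, lam):
    """Solve the MME with relationship matrix K; return (LHS inverse, solutions)."""
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * np.linalg.inv(K)]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    return np.linalg.inv(lhs), np.linalg.solve(lhs, rhs)


lhs_inv_ped, sol_ped = solve_mme(A, lam)
lhs_inv_gen, sol_gen = solve_mme(G, lam)

# Order of solutions: mean, sire, son 2, son 3, son 4, son 5.
print("pedigree:", np.round(sol_ped, 2))
print("genomic :", np.round(sol_gen, 2))
```

Under this assumption, the pedigree model gives the two sons without records an EBV of zero, because they are equally related to both phenotyped sibs, while the genomic model separates them because G says which sib each one resembles more.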


5 animal example

$$u_{\text{Son4}} = \left(LHS^{-1}_{\text{Son4,Mean}} \times RHS_{\text{Mean}}\right) + \left(LHS^{-1}_{\text{Son4,Sire}} \times RHS_{\text{Sire}}\right) + \left(LHS^{-1}_{\text{Son4,Son2}} \times RHS_{\text{Son2}}\right) + \left(LHS^{-1}_{\text{Son4,Son3}} \times RHS_{\text{Son3}}\right)$$

Genomic:
$$u_{\text{Son4}} = (-0.63 \times 0.00) + (0.81 \times 0.00) + (0.58 \times -2.00) + (0.50 \times 2.00) \approx -0.15$$

Pedigree:
$$u_{\text{Son4}} = (-0.64 \times 0.00) + (0.82 \times 0.00) + (0.55 \times -2.00) + (0.55 \times 2.00) = 0.00$$

The coefficients are the Son 4 row of the genomic and pedigree LHS inverses, and the RHS is (0.00, 0.00, -2.00, 2.00, #, #)' with # marking the sons without a phenotype.

10 animals

The pedigree, A and G are the same as on the "10 animals for comparison" slide. The equations solved are:

Pedigree:
$$\left[ Z'Z + A^{-1}\lambda \right]^{-1} \left[ Z'y \right] = \left[ u \right]$$

Genomic:
$$\left[ Z'Z + G^{-1}\lambda \right]^{-1} \left[ Z'y \right] = \left[ u \right]$$

10 animal example

Animals are ordered 1, 2, 3, 6, 7, 8 (with phenotypes) followed by 4, 5, 9, 10 (without phenotypes).

A
       1     2     3     6     7     8     4     5     9    10
 1   1.00  0.50  0.50  0.00  0.00  0.00  0.50  0.50  0.00  0.00
 2   0.50  1.00  0.25  0.00  0.00  0.00  0.25  0.25  0.00  0.00
 3   0.50  0.25  1.00  0.00  0.00  0.00  0.25  0.25  0.00  0.00
 6   0.00  0.00  0.00  1.00  0.50  0.50  0.00  0.00  0.50  0.50
 7   0.00  0.00  0.00  0.50  1.00  0.25  0.00  0.00  0.25  0.25
 8   0.00  0.00  0.00  0.50  0.25  1.00  0.00  0.00  0.25  0.25
 4   0.50  0.25  0.25  0.00  0.00  0.00  1.00  0.25  0.00  0.00
 5   0.50  0.25  0.25  0.00  0.00  0.00  0.25  1.00  0.00  0.00
 9   0.00  0.00  0.00  0.50  0.25  0.25  0.00  0.00  1.00  0.25
10   0.00  0.00  0.00  0.50  0.25  0.25  0.00  0.00  0.25  1.00

G
       1     2     3     6     7     8     4     5     9    10
 1   1.00  0.50  0.50  0.02  0.02  0.02  0.50  0.50  0.02  0.02
 2   0.50  1.00  0.20  0.02  0.01  0.03  0.30  0.20  0.01  0.03
 3   0.50  0.20  1.00  0.02  0.03  0.01  0.20  0.30  0.03  0.01
 6   0.02  0.02  0.02  1.00  0.50  0.50  0.02  0.02  0.50  0.50
 7   0.02  0.01  0.03  0.50  1.00  0.20  0.01  0.03  0.30  0.20
 8   0.02  0.03  0.01  0.50  0.20  1.00  0.03  0.01  0.20  0.30
 4   0.50  0.30  0.20  0.02  0.01  0.03  1.00  0.20  0.02  0.02
 5   0.50  0.20  0.30  0.02  0.03  0.01  0.20  1.00  0.01  0.03
 9   0.02  0.01  0.03  0.50  0.30  0.20  0.02  0.01  1.00  0.20
10   0.02  0.03  0.01  0.50  0.20  0.30  0.02  0.03  0.20  1.00

$$RHS = \begin{bmatrix} 0 & -2 & 2 & 20 & 18 & 22 & \# & \# & \# & \# \end{bmatrix}'$$

$$TrueBreedingValues = \begin{bmatrix} 0 & -2 & 2 & 20 & 18 & 22 & -2 & 2 & 18 & 22 \end{bmatrix}'$$

Pedigree:
$$LHS^{-1} = \begin{bmatrix}
0.59 & 0.12 & 0.12 & 0.00 & 0.00 & 0.00 & 0.29 & 0.29 & 0.00 & 0.00 \\
0.12 & 0.62 & 0.02 & 0.00 & 0.00 & 0.00 & 0.06 & 0.06 & 0.00 & 0.00 \\
0.12 & 0.02 & 0.62 & 0.00 & 0.00 & 0.00 & 0.06 & 0.06 & 0.00 & 0.00 \\
0.00 & 0.00 & 0.00 & 0.59 & 0.12 & 0.12 & 0.00 & 0.00 & 0.29 & 0.29 \\
0.00 & 0.00 & 0.00 & 0.12 & 0.62 & 0.02 & 0.00 & 0.00 & 0.06 & 0.06 \\
0.00 & 0.00 & 0.00 & 0.12 & 0.02 & 0.62 & 0.00 & 0.00 & 0.06 & 0.06 \\
0.29 & 0.06 & 0.06 & 0.00 & 0.00 & 0.00 & 1.65 & 0.15 & 0.00 & 0.00 \\
0.29 & 0.06 & 0.06 & 0.00 & 0.00 & 0.00 & 0.15 & 1.65 & 0.00 & 0.00 \\
0.00 & 0.00 & 0.00 & 0.29 & 0.06 & 0.06 & 0.00 & 0.00 & 1.65 & 0.15 \\
0.00 & 0.00 & 0.00 & 0.29 & 0.06 & 0.06 & 0.00 & 0.00 & 0.15 & 1.65
\end{bmatrix}$$

$$Solutions = \begin{bmatrix} 0.00 & -1.20 & 1.20 & 16.47 & 14.09 & 16.49 & 0.00 & 0.00 & 8.24 & 8.24 \end{bmatrix}'$$

Genomic:
$$LHS^{-1} = \begin{bmatrix}
0.5853 & 0.1219 & 0.1219 & 0.0012 & 0.0017 & 0.0017 & 0.2926 & 0.2926 & 0.0040 & 0.0040 \\
0.1219 & 0.6247 & 0.0094 & 0.0017 & -0.0006 & 0.0053 & 0.0992 & 0.0225 & -0.0014 & 0.0128 \\
0.1219 & 0.0094 & 0.6247 & 0.0017 & 0.0053 & -0.0006 & 0.0225 & 0.0992 & 0.0128 & -0.0014 \\
0.0012 & 0.0017 & 0.0017 & 0.5853 & 0.1219 & 0.1219 & 0.0040 & 0.0040 & 0.2926 & 0.2926 \\
0.0017 & -0.0006 & 0.0053 & 0.1219 & 0.6247 & 0.0094 & -0.0014 & 0.0128 & 0.0992 & 0.0225 \\
0.0017 & 0.0053 & -0.0006 & 0.1219 & 0.0094 & 0.6247 & 0.0128 & -0.0014 & 0.0225 & 0.0992 \\
0.2926 & 0.0992 & 0.0225 & 0.0040 & -0.0014 & 0.0128 & 1.6380 & 0.0539 & 0.0167 & 0.0108 \\
0.2926 & 0.0225 & 0.0992 & 0.0040 & 0.0128 & -0.0014 & 0.0539 & 1.6380 & -0.0092 & 0.0367 \\
0.0040 & -0.0014 & 0.0128 & 0.2926 & 0.0992 & 0.0225 & 0.0167 & -0.0092 & 1.6380 & 0.0539 \\
0.0040 & 0.0128 & -0.0014 & 0.2926 & 0.0225 & 0.0992 & 0.0108 & 0.0367 & 0.0539 & 1.6380
\end{bmatrix}$$

$$Solutions = \begin{bmatrix} 0.09 & -1.09 & 1.35 & 16.58 & 13.90 & 16.34 & 0.25 & 0.37 & 8.16 & 8.41 \end{bmatrix}'$$

10 animal example

Pedigree:
$$u_{\text{SonA4}} = \left(LHS^{-1}_{\text{SonA4,SireA}} \times RHS_{\text{SireA}}\right) + \left(LHS^{-1}_{\text{SonA4,SonA2}} \times RHS_{\text{SonA2}}\right) + \left(LHS^{-1}_{\text{SonA4,SonA3}} \times RHS_{\text{SonA3}}\right)$$

$$u_{\text{Son4}} = (0.29 \times 0.00) + (0.06 \times -2.00) + (0.06 \times 2.00) = 0.00$$

Genomic:
$$u_{\text{SonA4}} = \left(LHS^{-1}_{\text{SonA4,SireA}} \times RHS_{\text{SireA}}\right) + \left(LHS^{-1}_{\text{SonA4,SonA2}} \times RHS_{\text{SonA2}}\right) + \left(LHS^{-1}_{\text{SonA4,SonA3}} \times RHS_{\text{SonA3}}\right) + \left(LHS^{-1}_{\text{SonA4,SireB}} \times RHS_{\text{SireB}}\right) + \left(LHS^{-1}_{\text{SonA4,SonB6}} \times RHS_{\text{SonB6}}\right) + \left(LHS^{-1}_{\text{SonA4,SonB7}} \times RHS_{\text{SonB7}}\right)$$

$$u_{\text{Son4}} = (0.2926 \times 0.00) + (0.0992 \times -2.00) + (0.0225 \times 2.00) + (0.0040 \times 20.00) + (-0.0014 \times 18.00) + (0.0128 \times 22.00) = 0.18$$

Phenotypes are missing for the animals with the other non-zero coefficients.
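A sketch of the 10-animal comparison, with several assumptions stated up front: λ = 0.5 is my inference (it reproduces the pedigree solutions printed in the example), there is no fixed effect in the model (as in the equations with only Z'Z and the relationship matrix), and G is taken from the "10 animals for comparison" slide. Animals are kept in their original order 1..10 here, not the reordered layout. Because the printed G matrices and genomic solutions are rounded and not fully consistent between slides, the genomic output of this sketch need not match the printed values exactly.

```python
import numpy as np

A5 = np.array([[1.00, 0.50, 0.50, 0.50, 0.50],
               [0.50, 1.00, 0.25, 0.25, 0.25],
               [0.50, 0.25, 1.00, 0.25, 0.25],
               [0.50, 0.25, 0.25, 1.00, 0.25],
               [0.50, 0.25, 0.25, 0.25, 1.00]])
G5 = np.array([[1.00, 0.50, 0.50, 0.50, 0.50],
               [0.50, 1.00, 0.20, 0.30, 0.20],
               [0.50, 0.20, 1.00, 0.20, 0.30],
               [0.50, 0.30, 0.20, 1.00, 0.20],
               [0.50, 0.20, 0.30, 0.20, 1.00]])
# Between-family genomic relationships ("hidden" relationships).
C = np.array([[0.02, 0.02, 0.02, 0.02, 0.02],
              [0.02, 0.01, 0.03, 0.01, 0.03],
              [0.02, 0.03, 0.01, 0.03, 0.01],
              [0.02, 0.01, 0.03, 0.01, 0.03],
              [0.02, 0.03, 0.01, 0.03, 0.01]])

A = np.kron(np.eye(2), A5)              # two unrelated families
G = np.block([[G5, C], [C.T, G5]])

phenotyped = [0, 1, 2, 5, 6, 7]         # animals 1, 2, 3, 6, 7, 8 (zero-based)
y = np.array([0.0, -2.0, 2.0, 20.0, 18.0, 22.0])
Z = np.eye(10)[phenotyped]              # links records to animals
lam = 0.5                               # ASSUMED value, not given on the slides


def solve(K):
    """Solve [Z'Z + K^-1 * lam] u = Z'y for the breeding values u."""
    lhs = Z.T @ Z + lam * np.linalg.inv(K)
    return np.linalg.solve(lhs, Z.T @ y)


u_ped = solve(A)
u_gen = solve(G)
print("pedigree:", np.round(u_ped, 2))
print("genomic :", np.round(u_gen, 2))
```

With pedigree, son 4 of family A gets an EBV of zero because family B contributes nothing. With genomics, the small between-family relationships let the family B phenotypes contribute as well.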

Matrix Inversion

•  http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/lecture-3-multiplication-and-inverse-matrices/

•  Minute 37 of this video from Gilbert Strang

•  Gauss-Jordan Elimination


Inversion by Gauss-Jordan

$$A = \begin{bmatrix} 1 & 3 \\ 2 & 7 \end{bmatrix}, \qquad I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad A^{-1}A = I = AA^{-1}$$

Start with A and I side by side:

$$\left[\begin{array}{cc|cc} 1 & 3 & 1 & 0 \\ 2 & 7 & 0 & 1 \end{array}\right]$$

Move 1: Subtract 2 of Row 1 from Row 2

$$\left[\begin{array}{cc|cc} 1 & 3 & 1 & 0 \\ 0 & 1 & -2 & 1 \end{array}\right]$$

Move 2: Subtract 3 of Row 2 from Row 1

$$\left[\begin{array}{cc|cc} 1 & 0 & 7 & -3 \\ 0 & 1 & -2 & 1 \end{array}\right]$$

$$A^{-1} = \begin{bmatrix} 7 & -3 \\ -2 & 1 \end{bmatrix}$$
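A minimal sketch of Gauss-Jordan inversion, applied to the 2 x 2 matrix above. The function name `gauss_jordan_inverse` is hypothetical, and the row swap (partial pivoting) is an addition for numerical safety that the hand-worked moves do not need, so the intermediate steps differ from the slide even though the final inverse is the same.

```python
import numpy as np


def gauss_jordan_inverse(A):
    """Invert a square matrix by row-reducing the augmented matrix [A | I]."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    aug = np.hstack([A, np.eye(n)])
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = col + int(np.argmax(np.abs(aug[col:, col])))
        if np.isclose(aug[pivot, col], 0.0):
            raise ValueError("matrix is singular")
        aug[[col, pivot]] = aug[[pivot, col]]
        # Scale the pivot row so the diagonal entry becomes 1.
        aug[col] /= aug[col, col]
        # Subtract multiples of the pivot row from every other row.
        for row in range(n):
            if row != col:
                aug[row] -= aug[row, col] * aug[col]
    # The right half of the augmented matrix now holds the inverse.
    return aug[:, n:]


A = np.array([[1.0, 3.0],
              [2.0, 7.0]])
A_inv = gauss_jordan_inverse(A)
print(A_inv)
```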

Useful things with matrices

•  Averaging the animals' weights

$$x'y = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 10 \\ 15 \\ 20 \end{bmatrix} = (1 \times 10) + (1 \times 15) + (1 \times 20) = 45$$

$$x'x = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = (1 \times 1) + (1 \times 1) + (1 \times 1) = 3$$

$$\frac{x'y}{x'x} = [x'x]^{-1} x'y = b, \qquad \frac{45}{3} = \frac{1}{3} \times 45 = 15$$

Gauss Seidel Residual Update

•  Easy, efficient way to solve and understand genomic prediction equations

•  Form X'X (diagonal)
–  Form X'y
–  Initialize values for the betas
–  Assume current values of the beta-i's are correct
–  Form new y vector (called e) based on the residuals
–  Estimate new solution for beta-i (Xi'e divided by Xi'Xi)
–  Repeat until convergence

•  Simple extension to Bayesian model
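The steps above can be sketched as follows. This is an illustrative implementation under my own assumptions: a per-effect shrinkage value `lam[i]` is added to Xi'Xi in the denominator (with 0 for the intercept), which makes it solve the ridge-type equations from earlier in the deck, and the function name and arguments are hypothetical. X'y is never formed explicitly because the residual vector e carries the same information.

```python
import numpy as np


def gauss_seidel_residual_update(X, y, lam, n_iter=200, tol=1e-10):
    """Solve (X'X + diag(lam)) beta = X'y one effect at a time."""
    n, p = X.shape
    beta = np.zeros(p)                        # initial values for the betas
    xtx = np.einsum("ij,ij->j", X, X)         # diagonal of X'X
    e = y - X @ beta                          # residuals given current betas
    for _ in range(n_iter):
        max_change = 0.0
        for i in range(p):
            e += X[:, i] * beta[i]            # e = y corrected for all other effects
            new = (X[:, i] @ e) / (xtx[i] + lam[i])
            e -= X[:, i] * new                # update residuals with the new solution
            max_change = max(max_change, abs(new - beta[i]))
            beta[i] = new
        if max_change < tol:                  # converged
            break
    return beta


# 8-animal example: intercept (no shrinkage) and one marker (lambda = 0.85).
y = np.array([0.10, 0.70, 1.30, 0.65, 1.25, 0.12, 0.68, 1.20])
z = np.array([0, 1, 2, 1, 2, 0, 1, 2], dtype=float)
X = np.column_stack([np.ones(8), z])
lam = np.array([0.0, 0.85])

beta = gauss_seidel_residual_update(X, y, lam)
direct = np.linalg.solve(X.T @ X + np.diag(lam), X.T @ y)
print("Gauss-Seidel:", np.round(beta, 4))
print("direct solve:", np.round(direct, 4))
```

Each update only needs one column of X and the residual vector, which is why this scheme scales to many markers without ever building the full left-hand side.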

Legarra and Misztal (JDS 2008)


Excel Sheet

•  Lots of little examples with Excel


GBLUP versus other methods

•  Genomic BLUP is the simplest method to do genomic evaluations

•  Algebraically identical to ridge regression

•  Ridge regression treats each marker as a random effect

•  Ridge regression has the same shrinkage parameter for each marker

•  Other methods allow heterogeneous shrinkage parameters

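The equivalence between GBLUP and ridge regression can be illustrated numerically. The sketch below makes several simplifying assumptions: random marker dosages and phenotypes, no fixed effects, and an unscaled G = MM' so that the same λ applies to both models. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_animals, n_markers = 6, 20

# Random dosages (0, 1 or 2) and random phenotypes, purely for illustration.
M = rng.integers(0, 3, size=(n_animals, n_markers)).astype(float)
y = rng.normal(size=n_animals)
lam = 2.0

# Ridge regression: one effect per marker, same shrinkage for every marker.
a_hat = np.linalg.solve(M.T @ M + lam * np.eye(n_markers), M.T @ y)
gebv_ridge = M @ a_hat                       # sum of marker effects per animal

# GBLUP: one effect per animal, with G = MM' (unscaled, an assumption here).
G = M @ M.T
gebv_gblup = G @ np.linalg.solve(G + lam * np.eye(n_animals), y)

print(np.round(gebv_ridge, 4))
print(np.round(gebv_gblup, 4))
```

The GBLUP line uses the form G(G + λI)^-1 y, which equals (I + G^-1 λ)^-1 y whenever G is invertible and avoids inverting G. If G is scaled, for example divided by the number of markers, λ has to be rescaled accordingly for the two answers to agree.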

Brief description of SNP models

•  Genomic selection prediction models treat markers as random effects

•  MARS treats markers as fixed effects

•  Random effects have two benefits
–  Trick to allow all markers to be fitted simultaneously
–  Shrinkage

•  Fixed effect models overestimate marker effects

•  Random effects models correct for this overestimation by shrinking marker effects back towards the mean of all marker effects

•  Shrinkage is proportional to the uncertainty in the marker effect (and a statistical prior)
–  More uncertainty = more shrinkage towards the mean
–  More information to estimate effect = less shrinkage
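For a single marker with genotype vector x and no other effects in the model (a simplifying assumption made only for this illustration), the link between the fixed and random estimates can be written out explicitly:

$$\hat{\beta}_{\text{fixed}} = \frac{x'y}{x'x}, \qquad \hat{\beta}_{\text{random}} = \frac{x'y}{x'x + \lambda} = \frac{x'x}{x'x + \lambda} \, \hat{\beta}_{\text{fixed}}$$

The factor x'x / (x'x + λ) lies between 0 and 1. With little information (small x'x) it is close to 0 and the estimate is shrunk strongly. With a lot of information (large x'x) it approaches 1 and the random estimate approaches the fixed one.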

Brief description of SNP models

•  Ridge regression
–  All SNPs in the model
–  All have equal shrinkage parameter
–  Shrinkage parameter is estimated or set a priori

•  BayesA
–  All SNPs in the model
–  Each SNP has unique shrinkage parameter
–  Each shrinkage parameter estimated
–  Shape and scale parameters are fixed
–  Problem is it cannot shrink to zero

Brief description of SNP models

•  BayesLasso
–  Similar to BayesA
–  Each SNP has unique shrinkage parameter
–  Each shrinkage parameter estimated
–  Shape and scale parameters are estimated
–  Uses inverse Gaussian distribution instead of inverse chi square, which allows greater shrinkage towards zero

Brief description of SNP models

•  BayesB
–  Similar to BayesA except that a proportion 1 - π of SNPs are in the model
–  Thus can shrink SNPs to zero
–  Mixture model

•  BayesCpi
–  Has similarities to SnpBlup and BayesB
–  Estimates π
–  All SNPs have equal shrinkage parameter
–  Estimates shrinkage parameter
–  Mixture model

Summary of SNP models

•  Ridge regression: SNPs have EQUAL shrinkage parameter

•  BayesA/BayesLasso achieve shrinkage
–  UNEQUAL shrinkage parameter for all SNPs

•  BayesB achieves shrinkage
–  Only a proportion of the SNPs is included in each round
–  These SNPs have UNEQUAL shrinkage parameter

•  BayesCpi achieves shrinkage
–  Only a proportion of the SNPs is included in each round
–  These SNPs have EQUAL shrinkage parameter

•  Models have an equivalence to genomic relationship matrix G = MHM'

Non-linear models

•  Reproducing kernel Hilbert space

•  Neural networks

•  Basically these bend the relationships in the genomic relationship matrix

•  Close relatives get more weight

•  Distant relatives less weight

•  Capture epistatic interactions that may be shared by close relatives but not by distant relatives

•  Perhaps useful for advanced yield trials

•  I favour simpler models
–  Prevents a pointless debate about which model
–  Genetic improvement is an additive thing