VISG – LARGE DATASETS
Literature Review
Introduction – Genome Wide Selection
Aka Genomic Selection
Set of Markers
• 10,000s: enough to capture most genetic information
‘Training set’ of animals
• phenotyped and genotyped
• representative of industry
Predictor
• Over-specified, e.g. 10,000 variables, 1,000 individuals
• Robust model selection required
Application
• Predict in selection candidates
– Maybe no phenotypes
– Maybe no pedigrees
Introduction – Genome Wide Selection
Prediction Methods
• Stepwise regression
• gBLUP
– Fit all markers as a random effect
– $g_i \sim N(0, \sigma^2_g)$
• BayesA
– $g_i \sim N(0, \sigma^2_{g_i})$
– Prior: $\sigma^2_{g_i} \sim S/\chi^2_\nu$ (choose $S$ and $\nu$)
• BayesB
– Similar to BayesA, except that a proportion of the effects are zero
Most investigations compare these; there are many variations (sometimes with the same name)
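The gBLUP bullet fits every marker as a random effect with a common variance, which amounts to ridge regression on the genotypes. A minimal sketch, assuming the variance ratio $\lambda = \sigma^2_e/\sigma^2_g$ is known: it solves the mixed-model equations $(X'X + \lambda I)\hat g = X'(y - 1\hat\mu)$, together with the equation for the overall mean, by Gauss-Seidel iteration with residual updating. The function name `snp_blup` and its arguments are hypothetical. Because $\lambda > 0$, the equations remain solvable even with more markers than individuals, the over-specified case from the introduction.

```python
def snp_blup(y, X, lam, n_iter=200, tol=1e-10):
    """Gauss-Seidel solver for the SNP-BLUP (ridge) mixed-model equations.

    y   : list of n phenotypes
    X   : list of n rows, each a list of m genotype codes
    lam : variance ratio sigma2_e / sigma2_g, assumed known
    Returns (mu, g): overall mean and list of m marker-effect estimates.
    """
    n, m = len(y), len(X[0])
    cols = [[row[j] for row in X] for j in range(m)]   # marker columns
    xtx = [sum(v * v for v in col) for col in cols]    # diagonal of X'X
    mu, g = 0.0, [0.0] * m
    resid = list(y)                                    # y - mu - Xg
    for _ in range(n_iter):
        # Overall mean: fixed effect, not shrunk
        delta = sum(resid) / n
        mu += delta
        resid = [r - delta for r in resid]
        change = abs(delta)
        # Marker effects: random, shrunk towards zero by lam
        for j, col in enumerate(cols):
            rhs = sum(x * r for x, r in zip(col, resid)) + xtx[j] * g[j]
            new = rhs / (xtx[j] + lam)
            delta = new - g[j]
            resid = [r - x * delta for x, r in zip(col, resid)]
            g[j] = new
            change = max(change, abs(delta))
        if change < tol:
            break
    return mu, g
```

Updating the residual vector after each effect keeps every step at O(n), which is why this style of solver is common for large marker sets.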
Literature
• Dairy applications review (Hayes et al., 2009)
• GWS in crops (Heffner, Sorrells & Jannink, 2009)
• Prediction in unrelateds (Meuwissen, 2009)
• Marker panels (Habier, 2009)
• Phenotypes (Harris & Johnson, unpub.)
• + ...
Issues
• National evaluations
• Long-term gains
• LD or relationship tracking
• Multiple breeds
• Distance from training to application
• Marker panels (subsets)
• Phenotypes (EBV-based)
• Non-additive effects
• Computing requirements
Methods
gBLUP almost as good as BayesA (dairy)
• Interpretation (?): many genes of small effect
Bayes methods better at using real LD (vs. relatedness)
BayesB advantage greater with
• Higher marker density
• Greater distance from training set to application
• Smaller training set
Mixture of 2 normals (similar to BayesB)
Partial least squares
Machine learning
Haplotype methods not used in practice yet
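The contrast between BayesA, BayesB and a mixture of two normals is easiest to see in the priors themselves. This sketch draws single marker effects from each prior as specified on the prediction-methods slide; the helper names and every numeric setting (`S`, `nu`, `pi_zero`, and the two mixture variances) are illustrative assumptions, not values taken from the studies reviewed.

```python
import random

def draw_bayes_a_effect(rng, S, nu):
    """BayesA prior: sigma2_i ~ S / chi2_nu, then g_i ~ N(0, sigma2_i)."""
    chi2 = rng.gammavariate(nu / 2.0, 2.0)    # chi-square with nu d.f.
    return rng.gauss(0.0, (S / chi2) ** 0.5)

def draw_bayes_b_effect(rng, S, nu, pi_zero):
    """BayesB prior: exactly zero with probability pi_zero, else as BayesA."""
    if rng.random() < pi_zero:
        return 0.0
    return draw_bayes_a_effect(rng, S, nu)

def draw_two_normal_mixture(rng, var_small, var_large, pi_small):
    """Mixture of two normals (illustrative): most effects tiny, a few large."""
    var = var_small if rng.random() < pi_small else var_large
    return rng.gauss(0.0, var ** 0.5)
```

Drawing many BayesB effects with `pi_zero = 0.9` gives roughly 90% exact zeros plus a heavy-tailed remainder (a scaled inverse chi-square mixture of normals is a t distribution), whereas the two-normal mixture replaces the exact zeros with very small, nonzero effects.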
Marker Panels
Evenly spaced panels
• Track inheritance from parents (both SNP-chipped)
• Will work with new traits
Lasso methods popular
• Shrink small effects to zero
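The "shrink small effects to zero" behaviour comes from the soft-thresholding step of the lasso. A minimal cyclic coordinate-descent sketch, assuming centred phenotypes and genotype columns (so no intercept) and a fixed penalty `lam`, which in practice would typically be tuned by cross-validation; the function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values in [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(y, X, lam, n_iter=500, tol=1e-10):
    """Minimise 0.5 * ||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.
    Assumes y and the columns of X are already centred (no intercept)."""
    n, m = len(y), len(X[0])
    cols = [[row[j] for row in X] for j in range(m)]
    xtx = [sum(v * v for v in col) for col in cols]
    b = [0.0] * m
    resid = list(y)                                 # y - Xb
    for _ in range(n_iter):
        change = 0.0
        for j, col in enumerate(cols):
            if xtx[j] == 0.0:
                continue                            # monomorphic marker: no information
            rho = sum(x * r for x, r in zip(col, resid)) + xtx[j] * b[j]
            new = soft_threshold(rho, lam) / xtx[j]
            delta = new - b[j]
            if delta != 0.0:
                resid = [r - x * delta for x, r in zip(col, resid)]
                b[j] = new
            change = max(change, abs(delta))
        if change < tol:
            break
    return b
```

Unlike the ridge penalty used by gBLUP, which only scales effects down, the soft threshold sets every effect whose partial correlation with the residual falls below `lam` to exactly zero, which is what makes the lasso usable for choosing marker subsets.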
Other
Combining marker and other information
• Phenotype info, parent info
• Index methods; ‘blending’
• Important for seamless national evaluations
Computing strategies
• Tricks to reduce computation
• Approximation rather than iterative (MCMC) methods
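One classical way to implement the "index methods; blending" bullet is selection-index weighting: sources of information are combined as $I = b_1 x_1 + b_2 x_2$ with $b = P^{-1}G$, where $P$ is the (co)variance matrix of the sources and $G$ holds their covariances with the true breeding value. The sketch below assumes two hypothetical sources, a direct genomic value and a parent average, with $P$ and $G$ supplied from elsewhere (for example derived from reliabilities). It illustrates the index idea and is not the blending procedure of any particular national evaluation.

```python
def index_weights(P, G):
    """Selection-index weights b = P^{-1} G for two information sources.

    P : 2x2 (co)variance matrix of the sources, [[p11, p12], [p21, p22]]
    G : covariances of each source with the true breeding value, [g1, g2]
    """
    (p11, p12), (p21, p22) = P
    det = p11 * p22 - p12 * p21
    if det == 0.0:
        raise ValueError("sources are perfectly collinear")
    # Explicit 2x2 inverse applied to G
    b1 = (p22 * G[0] - p12 * G[1]) / det
    b2 = (-p21 * G[0] + p11 * G[1]) / det
    return b1, b2

def blended_ebv(dgv, pa, weights):
    """Combine a (hypothetical) direct genomic value and parent average."""
    return weights[0] * dgv + weights[1] * pa
```

The off-diagonal term of `P` is what prevents double counting: when the genomic value and the parent average share information, both weights are reduced relative to using either source alone.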
Online resources
Conferences
• Statistical Genetics of Livestock for the Post-Genomic Era. UW-Madison, May 2009. http://dysci.wisc.edu/sglpge/index.html
• QTL/MAS Workshops.
– 2008: http://www.computationalgenetics.se/QTLMAS08
– 2009: http://www.qtlmas2009.wur.nl/UK/
Courses
• Whole Genome Association and Genomic Selection. September 1-8, 2008, Salzburg, Austria. http://www.nas.boku.ac.at/12100.html?&L=0
• Use of High-density SNP Genotyping for Genetic Improvement of Livestock. Iowa State, June 2009. http://www.ans.iastate.edu/stud/courses/short/
Toy example
• 5 SNPs / 1000 individuals
• y = mu + SNP1 + e
– mu = 10
– SNP1 substitution effect = 10; p = 0.5
– Var(e) = 1
• 1 block / 1000 iterations
• Runs in ~5 s
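A sketch of how the toy example could be reproduced, under stated assumptions. Genotypes are allele counts at frequency 0.5 (assuming the slide's p = 0.5 is the allele frequency), centred to -1/0/1 so that mu remains the population mean. Each SNP is either excluded (effect exactly zero, z = 0) or included with a normal effect (z = 1). This two-component spike-and-slab sampler is a simplified stand-in and not necessarily the model behind the plots, whose indicators appear to take more than two values and whose proportions p appear to be sampled. Here the prior inclusion probability `pi_in` is held fixed, mu has a flat prior, and the var.b prior settings (`prior_df`, `prior_ss`) are arbitrary; all function names are hypothetical.

```python
import math
import random

def simulate_toy(rng, n=1000, m=5, mu=10.0, effect=10.0, p=0.5, var_e=1.0):
    """Toy data y = mu + SNP1 + e (hypothetical coding: allele count - 2p)."""
    X = [[sum(rng.random() < p for _ in range(2)) - 2 * p for _ in range(m)]
         for _ in range(n)]
    y = [mu + effect * row[0] + rng.gauss(0.0, math.sqrt(var_e)) for row in X]
    return y, X

def inclusion_prob(log_odds):
    """Numerically stable logistic function."""
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)

def scaled_inv_chi2(rng, df, ss):
    """Draw sigma2 = ss / chi2_df (chi-square drawn as Gamma(df/2, scale 2))."""
    return ss / rng.gammavariate(df / 2.0, 2.0)

def gibbs_spike_slab(y, X, n_iter, rng, pi_in=0.5, prior_df=4.0, prior_ss=0.04):
    """Single-site Gibbs sampler: b_j = 0 (z_j = 0) or b_j ~ N(0, var_b) (z_j = 1).
    pi_in, prior_df and prior_ss are illustrative, fixed prior settings."""
    n, m = len(y), len(X[0])
    cols = [[row[j] for row in X] for j in range(m)]
    xtx = [sum(v * v for v in col) for col in cols]
    mu, b, z = sum(y) / n, [0.0] * m, [0] * m
    resid = [v - mu for v in y]                     # y - mu - Xb
    var_b, var_e = 1.0, 1.0
    samples = []
    for _ in range(n_iter):
        # Overall mean given everything else (flat prior)
        new_mu = rng.gauss(mu + sum(resid) / n, math.sqrt(var_e / n))
        resid = [r - (new_mu - mu) for r in resid]
        mu = new_mu
        for j, col in enumerate(cols):
            rhs = sum(x * r for x, r in zip(col, resid)) + xtx[j] * b[j]
            c = xtx[j] + var_e / var_b
            # Log odds of z_j = 1 versus z_j = 0, with b_j integrated out
            log_odds = (0.5 * (rhs * rhs / (var_e * c) - math.log(var_b * c / var_e))
                        + math.log(pi_in / (1.0 - pi_in)))
            z[j] = int(rng.random() < inclusion_prob(log_odds))
            new_b = rng.gauss(rhs / c, math.sqrt(var_e / c)) if z[j] else 0.0
            resid = [r - x * (new_b - b[j]) for x, r in zip(col, resid)]
            b[j] = new_b
        # Variances from their scaled inverse chi-square full conditionals
        k = sum(z)
        var_b = scaled_inv_chi2(rng, k + prior_df, sum(v * v for v in b) + prior_ss)
        var_e = scaled_inv_chi2(rng, n, sum(r * r for r in resid))
        samples.append({"mu": mu, "b": list(b), "z": list(z),
                        "var_b": var_b, "var_e": var_e})
    return samples
```

For the full-size example, `rng = random.Random(2009)`, `y, X = simulate_toy(rng)` and `samples = gibbs_spike_slab(y, X, 1000, rng)`; keeping `samples[100:]` leaves 900 draws per parameter, the same N as reported in the density panels.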
[Figure: trace plots of b.samp, z.samp, mu.samp and var.b.samp against iteration (1000 iterations)]
[Figure: posterior densities of b 1 to b 5 (N = 900 each)]
[Figure: histograms of z 1 to z 5]
[Figure: posterior densities of mu and var.b 2 (N = 900)]
[Figure: trace plot of p against iteration; posterior densities of p 1, p 2 and p 3 (N = 900)]
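The density panels are annotated "N = 900 Bandwidth = ...", which matches the default x-axis label of R's `plot.density`. Keeping 900 of 1000 iterations is consistent with discarding the first 100 as burn-in, although the slides do not state this. The sketch below computes the same two quantities in Python: a burn-in split with posterior mean and SD, and the Silverman rule-of-thumb bandwidth that R's `density()` uses by default (`bw.nrd0`). The function names and the `burn_in=100` default are assumptions.

```python
import statistics

def posterior_summary(draws, burn_in=100):
    """Discard burn-in draws; return (N kept, posterior mean, posterior SD)."""
    kept = draws[burn_in:]
    return len(kept), statistics.fmean(kept), statistics.stdev(kept)

def bw_nrd0(x):
    """Silverman's rule of thumb, mirroring R's bw.nrd0:
    0.9 * min(sd, IQR / 1.34) * N^(-1/5), with R's fallbacks when that is 0."""
    sd = statistics.stdev(x)
    # method="inclusive" gives the same quartiles as R's default quantile type 7
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    lo = min(sd, (q3 - q1) / 1.34)
    if lo == 0.0:
        lo = sd or abs(x[0]) or 1.0
    return 0.9 * lo * len(x) ** -0.2
```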