Genomic analyses with emphasis on single-step

Condensed notes for short-course taught in 2015 in Poznan, Poland
Ignacy Misztal et al., 11/20/14 - 11/29/17

Introduction

In the last couple of years, animal breeding research has focused on genomic selection. Early studies focused on the design of SNP chips for many species and developed basic methodologies based either on SNP estimation or GBLUP. Successive studies looked at the number of genotyped animals and size of SNP chip for a reasonable accuracy, optimal SNP selection, and SNP weighting. Later studies pondered imputation with low density chips and gains with high density chips, best choice of animals for genotyping, validation methods, etc. Now the most trending topic is the use of sequence data.

The animal-breeding group at the University of Georgia (UGA) has long been active in tools for commercial genetic evaluation across species. When genomic selection appeared, the group focused on development of technology that would be suitable for a commercial application. The method was single-step GBLUP (ssGBLUP), a BLUP with a different relationship matrix. While the basic theory for ssGBLUP is simple, there are many details. In particular, details that seem to be unimportant for one species become paramount in another species. The last major steps in the development of ssGBLUP were algorithms for inversion of the genomic relationship matrix and of the pedigree relationship matrix for a subset of genotyped animals. These developments removed limits to the number of genotyped animals. The success of the first algorithm raises doubts about meaningful accuracy gains with high-density chips (dimensionality of genomic information less than 15,000) and paves the way for genomic selection becoming a routine procedure like BLUP.

Single-step GBLUP has been applied by some of the largest companies/institutions in the US. As a result, UGA has access to perhaps the most comprehensive data sets anywhere. With access to data sets and practical problems, the group solved many puzzles, developed many tools, and discovered many solutions.

The purpose of this material is to present an outline of developments in genomic selection from a practical point of view, for use in a new short course taught by Misztal, Aguilar and Lourenco in 2015 in Poznan, Poland. Due to scarcity of time, nearly all references in this draft are to papers associated with UGA and collaborators. See those papers for references and names of the “UGA team.” For additional information, please see the 2016 course notes as well as the BLUPF90 manual available at nce.ads.uga.edu.


Page 1: Genomic analyses with emphasis on single-stepnce.ads.uga.edu/wiki/lib/exe/fetch.php?media=ss_new_text_dl1_.pdfa regular animal model. Analyzing the output of BAGS is a necessity. An



Models, Methods, and Programs

Good genomic predictions require a few components:

1. Appropriate models for traits of study
2. Appropriate methods for parameter estimation
3. Appropriate methods for genetic evaluation
4. Software that can implement mixed models for both parameter estimation and genetic prediction

Here we assume that all models are analyzed by mixed model methodology. Good models are a prerequisite for any genetic evaluation including genomic. Bad models = bad GEBVs. All models are approximations. Many effects influence the traits of interest. Given large data sets, any effect is likely to be statistically significant although many may be unimportant in practice. We can define practically equivalent models as those where correlations of (G)EBV for selection candidates are at least 0.98. Then, our desired model is the simplest (most parsimonious) one that gives essentially the same predictions as more complicated models. Please note that very complex models may give bad predictions due to traits not following the model assumptions or due to computing instability. Also, commonly used statistical criteria for model selection such as BIC often lead to complex models and not to the best predictions as determined by some sort of cross-validation.

Comments on Models

Single-trait (repeatability with repeated records) animal model
Basic model when other traits are not available or are weakly correlated.

Multiple trait model
A model that accounts for correlations among traits (genetic and environmental). Important when some traits are missing, when the recording is sequential and when genetic correlations are not close to 0. Inclusion of traits under selection in the model helps to remove biases from analyses of all traits.

Random regression model (RRM)
Useful for analyses of traits that are recorded continuously over time (e.g., milk production or body weight) or over other variables like temperature-humidity index or herd level. The key part of RRM is the use of several parameters to describe variability of continuous traits. Functions that use such parameters may be regular polynomials (simple but with poor numerical stability), Legendre orthogonal polynomials (better numerical properties), or splines (less variability at extremes). Type and order of functions determine the shape of the covariance trajectories. In particular, too high a polynomial order in RRM produces modeling artifacts where (co)variance curves have large fluctuations that are not supported by biological or management reasons. One way of simplifying RRM is the use of linear splines, as they are equivalent to multiple-trait models with traits corresponding to points at knots (Misztal, 2006) and their cross-validation properties are good (Bohmanova et al., 2008).
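The two families of RRM covariates mentioned above can be illustrated with a minimal sketch. It assumes NumPy, a hypothetical lactation running from day 5 to day 305, and illustrative knot positions; the function names are made up for this example and are not part of any BLUPF90 program.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_covariates(t, t_min, t_max, order):
    """Normalized Legendre covariates, one column per polynomial (0..order)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    x = 2.0 * (t - t_min) / (t_max - t_min) - 1.0    # map time to [-1, 1]
    P = legendre.legvander(x, order)                 # columns P_0(x) .. P_order(x)
    norm = np.sqrt((2.0 * np.arange(order + 1) + 1.0) / 2.0)
    return P * norm


def linear_spline_covariates(t, knots):
    """Linear-spline covariates: each record loads on its two neighboring knots."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    knots = np.asarray(knots, dtype=float)
    # index of the knot to the left of each time point
    left = np.clip(np.searchsorted(knots, t, side="right") - 1, 0, knots.size - 2)
    frac = (t - knots[left]) / (knots[left + 1] - knots[left])
    W = np.zeros((t.size, knots.size))
    rows = np.arange(t.size)
    W[rows, left] = 1.0 - frac
    W[rows, left + 1] = frac
    return W


days = np.array([5.0, 27.5, 50.0, 305.0])            # hypothetical test days
Phi = legendre_covariates(days, 5.0, 305.0, order=2)
W = linear_spline_covariates(days, knots=[5.0, 50.0, 200.0, 305.0])
```

A record taken exactly at a knot gets a covariate of 1 for that knot and 0 elsewhere, which is why a linear-spline RRM behaves like a multiple-trait model with traits defined at the knots.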


Unknown parent groups (UPG)
If the base population is heterogeneous or under selection, the merit of missing parents may be uneven. Then, pedigrees can be augmented by “phantom parents” via unknown parent groups. Groups can model the merit of unknown parents by year of birth, sex, path of selection, genetic origin or breed. Use of UPG is dangerous due to the possibility of confounding or large SE (UPG are fixed effects). Therefore, when UPG are used, the solutions of UPG should be examined for excessive fluctuations and trends, and checked against common sense. One way of reducing the confounding/fluctuations is treating UPG as random. In the simplest case, this is done by treating UPG as animals (var(UPG) = additive variance).

UPG and historical data
In many analyses, using more than 2-3 historical generations does not influence GEBV of the youngest generation (selection candidates), and truncating older data may even improve them (Lourenco et al., 2014). This is due to a number of reasons including 1) decay of genomic predictions, 2) changes in traits over time, 3) imperfect modeling of fixed effects over time. So, one of the simplest ways to limit computations is by truncating phenotypes and pedigrees. Truncating data also limits or even eliminates the need for UPG.

Common features of software
Software for mixed models should generally support:
- missing traits
- different models per trait
- heterogeneous residual variance

Approaches for variance component estimation (VCE)
VCE is usually accomplished by REML or by Bayesian analyses via Gibbs sampling. For general properties, see the paper “Reliable computing in estimation of variance components.” Below are some of the more notable features of common algorithms.

General REML programs rely on sparse matrix software. Their cost is approximately quadratic in the number of animals and cubic in the number of traits. They are likely to fail when the number of traits is too large or the resulting covariance matrices are close to singularity. AI-REML is fast with simpler models but often crashes with models close to the boundary of the parameter space. EM-REML is more stable but very slow, and there are no good estimates of SE. EM-REML is faster with less “missing information.” EM-REML gets “stuck” with RRM if the starting parameters are small.


A special version of REML (canonical transformation) can be very stable and inexpensive with a large number of traits but adds model limitations. See Karin Meyer’s page for many options in REML for complicated models.

Bayesian methods via Gibbs sampling (BAGS) may be slow or fast depending on optimizations. Usually the number of required samples increases with model complexity and with the number of parameters. For instance, a reduced animal model may require 10 times fewer samples than a regular animal model. Analyzing the output of BAGS is a necessity. An optimized BAGS (e.g., gibbs1f90 and later versions) is resistant to non-positive definite matrices and can be very fast with a large number of traits if models per trait are not too different. Complex models (e.g., threshold) are much easier to implement via BAGS than via REML.

Approaches for national genetic evaluation
For national evaluations, the number of effects in the model can be so large that the mixed model equations would not fit in memory even on the largest computer available. Consequently, solutions are computed by “matrix free” or “iteration on data” approaches, where the data are read in every round of iteration and coefficients of the mixed model equations are recreated. Instead of being stored, these coefficients are used immediately to create quantities used by iteration methods. A particularly efficient yet simple to implement iteration method is preconditioned conjugate gradient (Tsuruta et al., 2001). This method can use coefficients of mixed model equations in any order.

Why single-step for genomic evaluation?
Current methods require deregressions and creation of an index combining different sources of information (e.g., parent average, direct genomic value, pedigree prediction). As both rely on accuracies, when accuracies are approximated both the deregressions and the index are approximate. Additionally, deregression on a mix of high and low accuracy animals can create double-counting (Legarra et al., 2014; Lourenco et al., 2015).

Derivation of H
The derivations follow Legarra et al. (2009) and Aguilar et al. (2010). Let A be a numerator relationship matrix and let the genetic variance be set to 1.0. Let index 1 refer to ungenotyped and index 2 to genotyped animals.

$$\mathrm{var}(u) = A\sigma_u^2, \qquad A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

Based on conditional distributions:

$$u_1 \mid u_2 \sim N\left(A_{12}A_{22}^{-1}u_2,\; A_{11} - A_{12}A_{22}^{-1}A_{21}\right)$$


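$$u_2 \sim N(0, G)$$

$$u_1 = E(u_1 \mid u_2) + \varepsilon = A_{12}A_{22}^{-1}u_2 + \varepsilon$$

Calculate variances and covariances:

$$\mathrm{var}(u_1) = \mathrm{var}(A_{12}A_{22}^{-1}u_2 + \varepsilon) = \mathrm{var}(A_{12}A_{22}^{-1}u_2) + \mathrm{var}(\varepsilon) = A_{12}A_{22}^{-1}GA_{22}^{-1}A_{21} + A_{11} - A_{12}A_{22}^{-1}A_{21}$$

Rearranging:

$$\begin{aligned}
\mathrm{var}(u_1) &= A_{11} + A_{12}A_{22}^{-1}GA_{22}^{-1}A_{21} - A_{12}A_{22}^{-1}A_{21} \\
&= A_{11} + A_{12}A_{22}^{-1}GA_{22}^{-1}A_{21} - A_{12}A_{22}^{-1}IA_{21} \\
&= A_{11} + A_{12}A_{22}^{-1}GA_{22}^{-1}A_{21} - A_{12}A_{22}^{-1}A_{22}A_{22}^{-1}A_{21}
\end{aligned}$$

Therefore,

$$\mathrm{var}(u_1) = A_{11} + A_{12}A_{22}^{-1}(G - A_{22})A_{22}^{-1}A_{21}$$

$$\mathrm{var}(u_2) = \mathrm{var}(Za) = G$$

$$\mathrm{cov}(u_1, u_2) = \mathrm{cov}(A_{12}A_{22}^{-1}u_2, u_2) = A_{12}A_{22}^{-1}\mathrm{var}(u_2) = A_{12}A_{22}^{-1}G$$

Finally:

$$H = \begin{bmatrix} \mathrm{var}(u_1) & \mathrm{cov}(u_1, u_2) \\ \mathrm{cov}(u_2, u_1) & \mathrm{var}(u_2) \end{bmatrix}
= \begin{bmatrix} A_{11} + A_{12}A_{22}^{-1}(G - A_{22})A_{22}^{-1}A_{21} & A_{12}A_{22}^{-1}G \\ GA_{22}^{-1}A_{21} & G \end{bmatrix}$$

$$= A + \begin{bmatrix} A_{12}A_{22}^{-1}(G - A_{22})A_{22}^{-1}A_{21} & A_{12}A_{22}^{-1}(G - A_{22}) \\ (G - A_{22})A_{22}^{-1}A_{21} & G - A_{22} \end{bmatrix}$$

Which can be simplified to:

$$H = A + \begin{bmatrix} A_{12}A_{22}^{-1} & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} I \\ I \end{bmatrix} (G - A_{22}) \begin{bmatrix} I & I \end{bmatrix} \begin{bmatrix} A_{22}^{-1}A_{21} & 0 \\ 0 & I \end{bmatrix}$$

The inverse can be derived from distributions knowing that:

$$u_1 \mid u_2 \sim N\left(A_{12}A_{22}^{-1}u_2,\; A_{11} - A_{12}A_{22}^{-1}A_{21}\right) = N\left(A_{12}A_{22}^{-1}u_2,\; (A^{11})^{-1}\right)$$

where superscripts denote blocks of $A^{-1}$.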
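As a numerical sanity check of the formulas for H and for its inverse, H⁻¹ = A⁻¹ + [0 0; 0 G⁻¹ − A22⁻¹], the sketch below builds both for a tiny example. The seven-animal pedigree, the choice of genotyped animals, and the way G is perturbed away from A22 are all assumptions made for illustration; `tabular_A` is a helper written for this example, not a BLUPF90 routine.

```python
import numpy as np


def tabular_A(sires, dams):
    """Numerator relationship matrix by the tabular method.
    Parents must precede progeny; -1 marks an unknown parent."""
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        for j in range(i):
            a_ij = 0.0
            if s >= 0:
                a_ij += 0.5 * A[j, s]
            if d >= 0:
                a_ij += 0.5 * A[j, d]
            A[i, j] = A[j, i] = a_ij
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
    return A


# Hypothetical pedigree: animals 0-2 are founders
sires = [-1, -1, -1, 0, 0, 3, 3]
dams = [-1, -1, -1, 1, 2, 4, 2]
ungen, gen = [0, 1, 2, 3], [4, 5, 6]          # index 1 = ungenotyped, 2 = genotyped

A = tabular_A(sires, dams)
A11, A12 = A[np.ix_(ungen, ungen)], A[np.ix_(ungen, gen)]
A22 = A[np.ix_(gen, gen)]

# Illustrative G: A22 perturbed by a positive definite matrix (not real genotypes)
rng = np.random.default_rng(1)
B = rng.normal(size=(len(gen), 20))
G = 0.9 * A22 + 0.1 * (B @ B.T / 20)

# H from the derivation above
T = A12 @ np.linalg.inv(A22)
H = np.block([[A11 + T @ (G - A22) @ T.T, T @ G],
              [G @ T.T, G]])

# H^{-1} = A^{-1} + [0 0; 0 G^{-1} - A22^{-1}], animals ordered ungenotyped first
order = ungen + gen
H_inv = np.linalg.inv(A[np.ix_(order, order)])
H_inv[len(ungen):, len(ungen):] += np.linalg.inv(G) - np.linalg.inv(A22)

max_abs_diff = np.max(np.abs(np.linalg.inv(H) - H_inv))
```

The two routes agree to rounding error for any positive definite G, which is the practical appeal of the inverse form: only G⁻¹ and A22⁻¹ are needed in addition to the usual sparse A⁻¹.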


The joint density of $u_1$ and $u_2$ is:

$$p(u_1, u_2) = p(u_1 \mid u_2)\, p(u_2)$$

$$\propto \exp\left[-0.5\,(u_1 - A_{12}A_{22}^{-1}u_2)'\, A^{11}\, (u_1 - A_{12}A_{22}^{-1}u_2)\right] \exp\left[-0.5\, u_2' G^{-1} u_2\right]$$

$$= \exp\left(-0.5 \begin{bmatrix} u_1' & u_2' \end{bmatrix} \begin{bmatrix} A^{11} & -A^{11}A_{12}A_{22}^{-1} \\ -A_{22}^{-1}A_{21}A^{11} & G^{-1} + A_{22}^{-1}A_{21}A^{11}A_{12}A_{22}^{-1} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}\right)$$

$$= \exp\left(-0.5 \begin{bmatrix} u_1' & u_2' \end{bmatrix} \begin{bmatrix} A^{11} & A^{12} \\ A^{21} & G^{-1} + A^{22} - A_{22}^{-1} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}\right)$$

Therefore:

$$H^{-1} = \begin{bmatrix} A^{11} & A^{12} \\ A^{21} & G^{-1} + A^{22} - A_{22}^{-1} \end{bmatrix} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} \end{bmatrix}$$

Types of genomic relationship matrix
The most popular matrix was defined by VanRaden (2008). It is based on a SNP model:

$$y = \mu + Za + e, \qquad \mathrm{var}(a) = I\sigma_a^2, \qquad \mathrm{var}(e) = I\sigma_e^2$$

where Z is a matrix of gene content adjusted for allele frequencies and $a$ is a vector of SNP effects. When u is a vector of breeding values:

$$u = Za, \qquad \mathrm{var}(u) = G\sigma_u^2, \qquad G = ZZ'\,\sigma_a^2/\sigma_u^2$$

Alternately,

$$G = \frac{ZZ'}{2\sum_i p_i(1 - p_i)}$$

Many other G have been proposed but none seems to provide more accurate solutions. Please note that Z adjusted for allele frequencies is

$$Z = M - 2P$$

where elements of M are 0, 1 or 2 and P is a matrix of allele frequencies. The SNP model and GBLUP have the same solutions regardless of gene frequencies (Stranden and Christensen, 2011) but ssGBLUP is affected by P. The properties of G depend on gene frequencies. With current allele frequencies, the mean of the diagonal elements of G is 1.0 and that of off-diagonals is 0. With equal allele frequencies (0.5), the means of diagonal and off-diagonal elements can be as high as 1.0 and 1.5.

Decomposition of GEBV
By analyzing one line of MME we can find the decomposition of GEBV:

$$u_i = w_1 PA + w_2 YD + w_3 PC + w_4 DGV + w_5 PP, \qquad \sum_k w_k = 1$$

where PA is Parent Average, YD is Yield Deviation (phenotypes adjusted for model effects other than additive genetic and error), PC is Progeny Contribution, DGV is the genomic prediction (coming from $G^{-1}$ or indirectly from SNP effects), and PP is the pedigree prediction based on the subset of genotyped animals from A (coming from $A_{22}^{-1}$). When both parents are known and each progeny has a known mate, the weights are:

$$w_1 = \frac{2}{den}; \quad w_2 = \frac{n_r/\alpha}{den}; \quad w_3 = \frac{n_p/2}{den}; \quad w_4 = \frac{g^{ii}}{den}; \quad w_5 = \frac{-a_{22}^{ii}}{den}$$

$$den = 2 + n_r/\alpha + n_p/2 + g^{ii} - a_{22}^{ii}$$

where $n_r$ is the number of records, $\alpha$ is the variance ratio (residual over genetic variance), and $n_p$ is progeny size, with values for DGV and PP:

$$DGV_i = \frac{-\sum_{j, j \neq i} g^{ij} u_j}{g^{ii}}; \qquad PP_i = \frac{-\sum_{j, j \neq i} a_{22}^{ij} u_j}{a_{22}^{ii}}$$

The decomposition for young animals ($w_2 = w_3 = 0$) is the same as in VanRaden et al. (2009b) except that the effects are estimated jointly. PP accounts for the part of PA explained by DGV. In particular, when $A = A_{22}$, PA and PP cancel out. When one or two parents are genotyped, PP will include a fraction of PA. When a genotyped animal is unrelated to the genotyped population, PP = 0. Alternately, DGV and PP can be combined into a genomic index (GI) as in VanRaden and Wright (2013):

$$w_4 DGV + w_5 PP = w_6 GI$$

$$GI = \frac{-\sum_{j, j \neq i} (g^{ij} - a_{22}^{ij})\, u_j}{g^{ii} - a_{22}^{ii}}; \qquad w_6 = \frac{g^{ii} - a_{22}^{ii}}{den}$$

GI excludes the effect of PA from DGV. See Lourenco et al. (2015b) for more details. Assuming that deviations (GEBVs - EBVs) and EBVs are uncorrelated leads to the following approximation:

$$g^{ii} - a_{22}^{ii} \approx \sum_{j, j \neq i} (g_{ij} - a_{22,ij})^2 \approx (n_g - 1)\, \overline{\Delta_i^2}$$

where $n_g$ is the number of genotyped animals and $\overline{\Delta_i^2}$ is the average of the squared difference between genomic and pedigree relationships of individual i with all other individuals (Misztal et al., 2012). In practice, $\overline{\Delta_i^2}$ is about $0.04^2$ (Wang et al., 2014).
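The VanRaden (2008) matrix described under "Types of genomic relationship matrix" can be sketched in a few lines. The genotypes here are simulated independently for each animal (an assumption that ignores pedigree structure), and allele frequencies are estimated from the genotyped animals themselves, which corresponds to "current allele frequencies" in the text.

```python
import numpy as np

rng = np.random.default_rng(42)
n_animals, n_snp = 200, 2000

# Simulated gene content (0, 1, 2); real data would come from a SNP chip
p_true = rng.uniform(0.1, 0.9, size=n_snp)
M = rng.binomial(2, p_true, size=(n_animals, n_snp)).astype(float)


def vanraden_G(M):
    """G = ZZ' / (2 * sum p_i (1 - p_i)) with Z = M - 2P, observed frequencies."""
    p = M.mean(axis=0) / 2.0                 # observed allele frequencies
    Z = M - 2.0 * p                          # gene content centered by 2p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p))), p


G, p = vanraden_G(M)
diag_mean = np.mean(np.diag(G))
offdiag_mean = (G.sum() - np.trace(G)) / (n_animals * (n_animals - 1))
```

With observed frequencies every column of Z sums to zero, so all elements of G sum to zero; the diagonal mean stays close to 1 and the off-diagonal mean close to 0, as stated in the text.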


Quality control for G
Quality control of G is a critical part of genomic selection as poor relationships could lead to many types of biases and losses of accuracy.

Calling rate for SNP and animals
Raw genotypes include either the correct SNP or an error code stating that an SNP could not be reliably read (or typed). Calling rate for SNP removes specific SNP markers from all genotyped animals where that SNP could not be reliably typed in a large number of cases. Calling rate for animals removes genotypes of animals for which fewer than a threshold proportion of SNPs are correctly typed. Usually the threshold is 90-95% correct typing.

Parental exclusions and parent identification
Parent-progeny genotypes should be compatible. In practice, this means that such pairs should not have exclusive genotypes (aa in parent and bb in progeny). For unrelated animals, the number of SNP with opposite genotypes is up to 12.5% (Tsuruta et al., 2009). For parent-progeny pairs, the number is usually less than 3% with raw genotypes but can drop below 0.1% (a few SNP with a 50k chip) after imposing the calling rate edits. If there is exclusion for any parent-progeny pair (>3% exclusions), possibilities include pedigree or genotyping error. In case of wrong paternity, one solution is to find a compatible parent (can be done by software like SEEKPARENTF90).

Distributions of diagonals of G
Under correct scaling and with current allele frequencies, diagonals of G should have a quasi-normal distribution with a mean of 1.0 and SD < 0.1. Too small or too large diagonals (<0.6 or >1.5) indicate either poor genotyping quality or an animal from another line or breed (Simeone et al., 2011).

Differences between matched G and A22
For properly matched G and A22, differences are usually <0.1. Large differences indicate pedigree or genotyping problems, or insufficient pedigree. In populations with missing pedigrees, there is an extra variation of the difference due to unequal pedigree length (Wang et al., 2014).

Chromosome editing
SNPs in the sex chromosomes and SNPs that are in unmapped positions (usually called chromosome 0) contribute little if anything to GEBV and are routinely eliminated.

Heritability of gene content
With correct genotypes, heritability of each SNP as a dependent variable (y) in an animal model is 100% (Forneris et al., 2015). Under special genotyping errors, including those from bad imputation, the heritability is lower. In practice, less than 0.99 average heritability for all SNP indicates problems.

Population stratification using principal components
Genotypes of a population can be shown graphically as a plot of the two largest principal components. While a single population cluster has an oval shape, separate clusters indicate separate lines or breeds.

[Figure: Principal component plot for a simulated population composed of two purebreds (AA and BB) and one crossbred line (F1).]

Matching G and A22
The normal procedure for matching G and A22 includes only adjustments to G for the same averages of diagonal and off-diagonal elements.

$$G = (1 - \lambda/2)\, G^* + 11'\lambda$$

where G is the adjusted genomic relationship matrix, $G^*$ is the unadjusted one, 1 is a vector of ones, and $\lambda$ is Wright’s $F_{ST}$ that is calculated as (Vitezica et al., 2011):

$$\lambda = \frac{1}{n^2}\left(\sum_i \sum_j A_{22,ij} - \sum_i \sum_j G^*_{ij}\right)$$

where n is the number of genotyped animals, so that $n^2$ is the number of elements in A22 and G.
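The adjustment above amounts to a few lines of arithmetic. In this sketch the unadjusted G* is built from simulated genotypes with observed allele frequencies, and A22 is a toy matrix with all off-diagonals set to 0.1; both are assumptions for illustration only, and `tune_G` is a hypothetical helper name.

```python
import numpy as np


def tune_G(G_raw, A22):
    """Shift and scale G_raw: G = (1 - lam/2) G_raw + 11' lam."""
    n = G_raw.shape[0]
    lam = (A22.sum() - G_raw.sum()) / n**2   # difference of overall means
    G_adj = (1.0 - lam / 2.0) * G_raw + lam * np.ones((n, n))
    return G_adj, lam


rng = np.random.default_rng(3)
n, m = 50, 1000
M = rng.binomial(2, rng.uniform(0.1, 0.9, size=m), size=(n, m)).astype(float)
p = M.mean(axis=0) / 2.0
Z = M - 2.0 * p
G_raw = Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))   # elements sum to ~0

A22 = 0.9 * np.eye(n) + 0.1 * np.ones((n, n))     # toy pedigree relationships
G_adj, lam = tune_G(G_raw, A22)
```

Because G* built with observed frequencies has elements that sum to zero, λ equals the mean of A22 here and the adjusted G has the same overall mean as A22.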

When the base population is heterogeneous (pedigrees missing across generations), adjustments are done to an average pedigree length, and G is too big for animals with short pedigree and too small for animals with long pedigree (Chen et al., 2011; Vitezica et al., 2011). This results in a poor convergence rate when solving ssGBLUP by iteration and in upward/downward biases or inflation (Misztal et al., 2012). Several options exist for matching G and A22. One option is to modify the $H^{-1}$ matrix by using a weight for $A_{22}^{-1}$ (omega) smaller than 1.0.

$$H^{-1} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - \omega A_{22}^{-1} \end{bmatrix}$$

Another option is to introduce nonzero relationships between unknown parents as in VanRaden or via the metafounders concept (Legarra et al., 2015). The last choice is to cut phenotypes and pedigrees to 2-3 generations. Cutting data to a few generations did not decrease accuracy (Lourenco et al., 2014) and reduces the impact of old pedigrees.

Correlations between diagonal and off-diagonal elements of G and A22
In typical analyses, the correlation between G and A22 is about 0.3 ± 0.1 for diagonals and 0.6 ± 0.15 for off-diagonals. The correlations depend on the depth of pedigree. The correlations between the diagonals are treated as those between pedigree and genomic inbreeding. While pedigree-based inbreeding depends on the depth of pedigree, the genomic inbreeding indirectly includes all past inbreeding. In general, new inbreeding is more detrimental than old inbreeding, where effects of deleterious genes are mostly removed by selection. Recent inbreeding with the genomic information can be assessed by runs of homozygosity.

Matching of G and A22 in crossbred/multibreed populations
Under crossbreeding, ideally G would be matched to A22 for all breed combinations. In particular, blocks of G due to purebred breed combinations would be 0. However, manual manipulation of blocks of matrices often makes the matrix non-positive definite. In a terminal cross model in pigs, the best approach for G was to use average gene frequencies across all animals, without additional steps (Lourenco et al., 2016). In a multibreed evaluation of sheep, where phenotypes were available on crossbreds but not on every purebred, G was generated using an average allele frequency but modeling included explicit unknown parent groups (UPG).

Importance of matching of G and A22 for different species and genotyping scenarios
The impact of incorrect matching is large in simulated populations, smaller in pigs and chickens, and was almost none in dairy when the reference population included high accuracy bulls only. The impact of the genomic information is negligible on high-accuracy animals and some scaling cancels out for young animals. Also, the impact of scaling is stronger under strong selection. Simulated populations usually include many genotyped generations, each with a relatively small number of low-accuracy animals, and with selection on a single trait. Therefore, any problems with scaling will influence the older generations and subsequently the young generation. In real populations, selection is usually multitrait and not very strong on any single trait.

Validation

Reliability (R2) of GEBV with older data
In populations with high accuracy males and a large number of progeny, validation includes partial and complete data comparisons. A partial data set excludes progeny of the youngest males. Comparisons are made between GEBV or EBV obtained with partial data and EBV or DYD of a male obtained with complete data. When the number of progeny is large, the comparison involves PA + DGV - PP and PC. The decomposition of GEBV indicates when such an approach is appropriate.


Predictability
When youngest animals have mainly own records, the realized accuracies can be computed as Corr(gebv, y-Xb)/h, where gebv is the prediction for a given animal obtained without its phenotype (partial data), y-Xb is the phenotype of that animal adjusted for fixed effects (based on complete data), and h is the square root of heritability (Legarra et al., 2008). Predictability seems very objective but is hard to apply for complex models (e.g., maternal effects). When heritability is low, accuracies can exceed the upper limit for traditional BLUP (0.70) and ssGBLUP (1.0); in this case, it is recommended to use predictivity (Corr(gebv, y-Xb)) instead of accuracy. Negative or very low values can be observed for small genotyped populations.

X-fold cross-validations
When the number of genotyped animals is small, cross-validation is done by creating multiple subsets of genotyped animals (folds), and comparing predictions obtained with phenotypes for all but one subset against y-Xb for that subset. E.g., see papers by Saatchi et al. (2011) or Kachman et al. (2013). The cross-validation can generate very high correlations due to strong relationships across samples. The correlations are a more realistic estimate of accuracy when subsets are chosen to be less related, e.g., by K-means clustering.

SNP weighting and GWAS in single-step
The G matrix used in single-step can be weighted, with weights obtained elsewhere, e.g., by Bayesian or Lasso models. Weights and subsequently Manhattan plots can be obtained from single-step directly by backsolving DGV to SNP effects (Wang et al., 2012; Wang et al., 2014). Following Wang et al. (2012), the idea that GBLUP is equivalent to SNP-BLUP was extended to the ssGBLUP approach. VanRaden (2008) showed that from the selection index equation for GBLUP:

$$\hat{u} = \mathbf{G}\left[\mathbf{G} + \mathbf{R}\frac{\sigma_e^2}{\sigma_u^2}\right]^{-1}(y - \mathbf{X}\boldsymbol{\beta})$$

where R is a diagonal matrix accounting for heterogeneous residual variance. If $u \mid a = \mathbf{Z}a$, replacing the first G by $\mathbf{Z}'$, weighted by the ratio of SNP to additive direct variances, would allow the calculation of SNP effects (a):

$$\hat{a} = k\mathbf{Z}'\left[\mathbf{G} + \mathbf{R}\frac{\sigma_e^2}{\sigma_u^2}\right]^{-1}(y - \mathbf{X}\boldsymbol{\beta})$$

where k is $\sigma_a^2/\sigma_u^2$, and $\sigma_a^2$ is the SNP variance. It can be assumed that $\sigma_a^2 = \sigma_u^2 / \left(2\sum_i p_i(1 - p_i)\right)$. Therefore, k can be reduced to $1 / \left(2\sum_i p_i(1 - p_i)\right)$.

Assuming that $w = \left[\mathbf{G} + \mathbf{R}\frac{\sigma_e^2}{\sigma_u^2}\right]^{-1}(y - \mathbf{X}\boldsymbol{\beta})$, $\hat{a} = k\mathbf{Z}'w$, and therefore $\hat{u} = \mathbf{G}w$. In this way, $w = \mathbf{G}^{-1}\hat{u}$, and finally, the SNP effects can be calculated as:

$$\hat{a} = k\mathbf{Z}'\mathbf{G}^{-1}\hat{u}$$

As $Var(a) = \mathbf{D}$, the conditional mean of SNP effects given the GEBV is:

$$\hat{a} \mid \hat{u} = k\mathbf{D}\mathbf{Z}'\mathbf{G}^{-1}\hat{u}$$

Thus, given that GEBV from ssGBLUP are available, SNP effects are calculated as (Wang et al., 2012):


$$\hat{a} = k\mathbf{D}\mathbf{Z}'\mathbf{G}^{-1}\hat{u}$$

Assuming that D is a diagonal matrix with weights for SNP, the iterative process is:

1. Set the weight matrix D = I
2. G = ZDZ'/k (here k is a scaling factor; details omitted)
3. Run single-step
4. From GEBV, obtain DGV as $DGV_i = -\sum_{j, j \neq i} g^{ij} u_j / g^{ii}$
5. Convert DGV to SNP effects: a = DZ' inv(ZDZ') DGV
6. Calculate SNP variance and use it as weight: $d_i = 2 p_i q_i a_i^2$
7. Normalize D for the same sum(D)
8. Go to 2

Usually the best weights are obtained after 1-2 rounds. Step 3 can be done once (if DGV change little across rounds) or every round. The formula in step 6 can be different. SNP variances can be plotted as Manhattan plots.

In tests using simulated data sets, the estimates of SNP effects were similar to those by BayesB. However, the best estimates were not for the SNP effect of a QTL but for a cluster of nearby SNPs. This illustrates the limited resolution of GWAS in populations with small effective population size. GWAS by single-step allows for analysis in models where obtaining deregressed proofs is difficult (e.g., maternal, random regression or reaction norm). There are no known methods at this time to assess SNP significance under single-step. Weighting seems to improve the accuracy for data sets with a small number of genotyped animals, but marginal or no improvement is observed for large genotyped populations (>10k; Lourenco et al., 2017).

Indirect prediction in single-step via SNP effects
When many young animals are genotyped and fast prediction is essential, their prediction can be obtained indirectly:

1. Run single-step using only animals with information
2. Predict SNP effects from DGV as above
3. Use SNP effects for predictions, possibly adding parent average.

When the number of genotyped animals in the reference population is large, parent average adds very little to predictions and can be omitted (Lourenco et al., 2015).
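The back-solving of SNP effects from breeding values can be checked on a small GBLUP example, where the result should coincide with SNP-BLUP. This sketch assumes simulated genotypes and phenotypes, D = I, all animals genotyped (so plain GBLUP stands in for single-step), allele frequencies treated as known, and a variance ratio σe²/σu² of 1; none of these values come from the notes.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 30, 200                                   # animals, SNPs

p = rng.uniform(0.2, 0.8, size=m)                # allele frequencies, assumed known
M = rng.binomial(2, p, size=(n, m)).astype(float)
Z = M - 2.0 * p
k = 1.0 / (2.0 * np.sum(p * (1.0 - p)))          # k = sigma_a^2 / sigma_u^2
G = k * (Z @ Z.T)

# Simulated phenotypes already adjusted for fixed effects (y - Xb)
a_true = rng.normal(scale=np.sqrt(k), size=m)
y_c = Z @ a_true + rng.normal(size=n)
ratio = 1.0                                      # sigma_e^2 / sigma_u^2, assumed

# GBLUP: u = G [G + I * ratio]^{-1} (y - Xb), with R = I
u_hat = G @ np.linalg.solve(G + ratio * np.eye(n), y_c)

# Back-solve SNP effects: a = k D Z' G^{-1} u, with D = I
a_hat = k * Z.T @ np.linalg.solve(G, u_hat)

# Direct SNP-BLUP for comparison (ridge parameter sigma_e^2 / sigma_a^2)
a_snp = np.linalg.solve(Z.T @ Z + (ratio / k) * np.eye(m), Z.T @ y_c)

# Step 6-7 of the iterative process: new weights, normalized to keep sum(D) = m
d = 2.0 * p * (1.0 - p) * a_hat**2
d *= m / d.sum()

# Indirect prediction for new genotyped animals would be Z_new @ a_hat
```

Agreement between `a_hat` and `a_snp` illustrates why indirect predictions via SNP effects reproduce the genomic part of the evaluation for animals that were not in the main run.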


Approximations of accuracies
Accuracies in genomic models can be approximated by calculating the amount of information due to genomics (Misztal et al., 2013). In general, the information $d_i$ for animal i is composed of information due to phenotypes, pedigrees, and genomic information

$$d_i = d_i^{ph} + d_i^{ped} + d_i^{gen}$$

with reliabilities calculated as

$$rel_i = \frac{d_i}{\alpha + d_i}$$

where alpha is the variance ratio for the animal model if the information is in terms of effective number of records, or for the sire model if the information is in terms of effective daughters. Approximately, the genomic information is:

$$d_i^{gen} \sim \sum_{j, j \neq i} (g_{ij} - a_{22,ij})^2\, acc_j^2$$

where $acc_j$ is the “independent” accuracy of animal j. Independent means not shared with other animals. One proposed way of obtaining accuracy is by approximating the left hand side of MME by the following and subsequent inversion:

$$LHS^{gen} = D^{ph} + D^{ped} + I + G^{-1} - A_{22}^{-1}$$

Linear adjustments to the D’s may improve the approximation.

Computing single-step with a large number of genotyped animals

REML and YAMS
Efficiency of REML computations in animal models is based on MME being sparse and the subsequent use of sparse matrix software. With genomics, MME contain large dense blocks due to G, and sparse matrix computations are less efficient with dense matrices than dense matrix techniques. Dense blocks also occur with large multiple-trait and random-regression models. One solution is the use of a computational package that recognizes dense blocks and rearranges computations accordingly, allowing parallel computations for large dense blocks. A package that implements this technique is called YAMS and was developed by Yutaka Masuda. When applied to (AI)REMLF90 for complex models, the programs run up to 20 times faster with individual computations and up to 100 times faster with sparse inversion. With efficient matrix factorization and inversion, other operations become bottlenecks. There is little improvement for very simple single-trait models with no dense blocks.

Structure of genotyped populations
Initial genotyping focuses on high accuracy animals. When all high accuracy animals are genotyped, genotyping moves to lower-accuracy animals that provide much less information. With regard to young animals, many are genotyped but few are selected. For instance, over 1.8 million Holsteins had been genotyped by 2017, but only 25k are high accuracy bulls, <50k are cows with phenotypes eligible for a national evaluation, and the rest are animals unimportant for genomic selection, except for a possible reduction in bias due to pre-selection. Consequently, computing choices may either use all genotypes in analyses or select only the important ones for the main analysis and use indirect predictions for the rest.

Efficient inversion of G by APY algorithm
The rank of G for large populations, as measured by the number of largest eigenvalues explaining 99% of the variation, is limited due to small effective population size (Pocrnic et al., 2016). Such a rank was found to be around 10k in Holsteins and Angus, 6k in pigs, and 4k in broiler chickens. This is due to a limited number of independent chromosome segments (ICS), or a limited number of independent SNP clusters. Such a fact can be used to derive an inverse of G efficiently (Misztal et al., 2014; Fragomeni et al., 2015; Misztal et al., 2015; Lourenco et al., 2015; Misztal 2016; Masuda et al., 2016; Bradford et al., 2017).

Let a be a vector of effects of ICS (or uncorrelated SNP segments). Divide individuals into core individuals denoted as c and other (non-core) individuals denoted as n. Then $u_c = Z_c a + \varepsilon_c$ and $u_n = Z_n a + \varepsilon_n$. When the number of core animals is the same as the number of ICS: $a \approx Z_c^{-1} u_c$ and $u_n = P u_c + \varepsilon_n$, where P is a matrix that relates BVs of non-core to core individuals. Denote:

$$\mathrm{var}\begin{bmatrix} u_c \\ u_n \end{bmatrix} = G\sigma_u^2 = \begin{bmatrix} G_{cc} & G_{cn} \\ G_{nc} & G_{nn} \end{bmatrix}\sigma_u^2$$

Using the recursion equations,

$$G^{-1} = \begin{bmatrix} G_{cc}^{-1} & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} -G_{cc}^{-1}G_{cn} \\ I \end{bmatrix} M^{-1} \begin{bmatrix} -G_{nc}G_{cc}^{-1} & I \end{bmatrix}$$

where M is a diagonal matrix with elements $m_i = g_{ii} - g_{ic}G_{cc}^{-1}g_{ci}$ for each non-core animal i (Misztal, 2016).

The above inverse has cubic cost (and quadratic memory) for core animals and linear cost (and linear memory) for noncore animals. This inverse is also sparse. In tests, APY inversion of a 570k x 570k matrix for Holsteins took <2 h of computing time and used less than 100 Gbytes of memory (Masuda et al., 2015). The APY inverse is the real inverse, and simulations showed slightly higher accuracy of GEBV compared to using the regular inversion of G (with blending), which can be partly due to elimination of noise contained in the regular $G^{-1}$.
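The APY inverse above can be written compactly with dense NumPy arrays. This is only a sketch for a toy matrix: the random positive definite G and the core set are illustrative assumptions, `apy_inverse` is a hypothetical helper, and a production implementation would keep the non-core blocks sparse instead of forming a full n x n array.

```python
import numpy as np


def apy_inverse(G, core):
    """APY inverse of G for a given list of core animals (dense toy version)."""
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)
    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    Gcn = G[np.ix_(core, noncore)]
    P = Gcn.T @ Gcc_inv                       # relates non-core to core animals
    # m_i = g_ii - g_ic Gcc^{-1} g_ci for every non-core animal
    m = np.diag(G)[noncore] - np.sum(P * Gcn.T, axis=1)
    G_inv = np.zeros((n, n))
    G_inv[np.ix_(core, core)] = Gcc_inv + P.T @ (P / m[:, None])
    G_inv[np.ix_(core, noncore)] = -P.T / m
    G_inv[np.ix_(noncore, core)] = -(P / m[:, None])
    G_inv[noncore, noncore] = 1.0 / m         # diagonal block for non-core animals
    return G_inv


rng = np.random.default_rng(0)
B = rng.normal(size=(8, 40))
G_demo = B @ B.T / 40 + 0.01 * np.eye(8)      # toy positive definite "G"
G_inv_apy = apy_inverse(G_demo, core=[0, 2, 5])
G_inv_one_noncore = apy_inverse(G_demo, core=np.arange(7))
```

With a single non-core animal, M reduces to the exact Schur complement and the APY inverse equals the regular inverse; with more non-core animals the two differ because covariances among non-core animals are assumed to be fully explained by the core animals.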

While $G^{-1}$ for a large number of animals is sparse, $A_{22}^{-1}$ can be dense (Faux and Gengler, 2013). This matrix can be computed indirectly (Stranden and Mantysaari, 2014):

$$A_{22}^{-1} = A^{22} - (A^{12})'\,(A^{11})^{-1}\,A^{12}$$

where all matrices on the right hand side are submatrices of the sparse $A^{-1}$ and can be stored explicitly. In the PCG algorithm, only a product of a matrix by a vector is needed. Then $A_{22}^{-1}q$ can be computed as

$$A_{22}^{-1}q = A^{22}q - (A^{12})'\left[(A^{11})^{-1}\left[A^{12}q\right]\right]$$

With some exceptions (the solving step with $A^{11}$), computations include only products of a matrix by a vector. See Masuda et al. (2016) for details.
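The matrix-by-vector form can be verified on a small pedigree. In this sketch A⁻¹ is obtained by dense inversion purely for illustration (in practice it is built directly from the pedigree and stored sparse); the pedigree, the genotyped subset, and the helper names are assumptions for the example.

```python
import numpy as np


def tabular_A(sires, dams):
    """Numerator relationship matrix by the tabular method.
    Parents must precede progeny; -1 marks an unknown parent."""
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        for j in range(i):
            a_ij = 0.0
            if s >= 0:
                a_ij += 0.5 * A[j, s]
            if d >= 0:
                a_ij += 0.5 * A[j, d]
            A[i, j] = A[j, i] = a_ij
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
    return A


sires = [-1, -1, -1, 0, 0, 3, 3]              # hypothetical pedigree
dams = [-1, -1, -1, 1, 2, 4, 2]
ungen, gen = [0, 1, 2, 3], [4, 5, 6]          # 1 = ungenotyped, 2 = genotyped

A = tabular_A(sires, dams)
A_inv = np.linalg.inv(A)                      # dense only for this toy example
A11_sup = A_inv[np.ix_(ungen, ungen)]         # block A^{11} of A^{-1}
A12_sup = A_inv[np.ix_(ungen, gen)]           # block A^{12} of A^{-1}
A22_sup = A_inv[np.ix_(gen, gen)]             # block A^{22} of A^{-1}


def a22_inverse_times(q):
    """A22^{-1} q using only blocks of A^{-1}: products plus one solve with A^{11}."""
    t = A12_sup @ q
    s = np.linalg.solve(A11_sup, t)           # the single solving step
    return A22_sup @ q - A12_sup.T @ s


q = np.array([1.0, -2.0, 0.5])
indirect = a22_inverse_times(q)
direct = np.linalg.solve(A[np.ix_(gen, gen)], q)
```

Both routes give the same vector; the indirect one never forms the dense A22⁻¹, which is what makes it suitable inside a PCG iteration.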

ReferencesandselectedpublicationsNonUGApapersWiggansGR,CooperTA,VanRadenPM,ColeJB:Technicalnote:Adjustmentoftraditionalcowevaluationstoimproveaccuracyofgenomicpredictions.JDairySci2011,94:6188–6193.LegarraA,Robert-GranieC,ManfrediE,ElsenJM:Performanceofgenomicselectioninmice.Genetics2008,180:611–618.VanRadenPM:Efficientmethodstocomputegenomicpredictions.JDairySci2008,91:4414–4423.BijmaP:Accuraciesofestimatedbreedingvaluesfromordinarygeneticevaluationsdonotreflectthecorrelationbetweentrueandestimatedbreedingvaluesinselectedpopulations.JAnimBreedGenet2012,129:345–358.VanRadenPM,WrightJR.Measuringgenomicpre-selectionintheoryandinpractice.InterbullBull.2013;47:147-50.StrandenI,ChristensenOF:Allelecodingingenomicevaluation.GenSelEvol2011,43:25.ChristensenOF:Compatibilityofpedigree-basedandmarker-basedrelationshipmatricesforsingle-stepgeneticevaluation.GenetSelEvol2012,44:37.GarrickDJ,TaylorJF,FernandoRL:Deregressingestimatedbreedingvaluesandweightinginformationforgenomicregressionanalyses.GenetSelEvol2009,41:55.EarlypapersfromUniversityofGeorgiaandAssociatesTsuruta.S.,I.MisztalandI.Strandén.2001.Useofthepreconditionedconjugategradientalgorithmasagenericsolverformixed-modelequationsinanimalbreedingapplications.J.Anim.Sci.1166:1172.Misztal,I.2006.Propertiesofrandomregressionmodelusinglinearsplines.J.Anim.Breed.Genet.123(2):73-144.

Misztal, I. 2008. Reliable Computing in Estimation of Variance Components. J. Anim. Breed. Genet. 125:363-370.

Bohmanova, J., F. Miglior, J. Jamrozik, I. Misztal, and P. G. Sullivan. 2008. Comparison of random regression models with Legendre polynomials and linear splines for production traits and somatic cell score of Canadian Holstein cows. J. Dairy Sci. 91:3627-3638.

Papers on genomics from University of Georgia and Associates

Tsuruta, S., I. Misztal, and T. J. Lawlor. 2009. Use of low density SNP chip for parental verification in US Holsteins. J. Dairy Sci. 92(Suppl. 1):W52.

Misztal, I., A. Legarra, and I. Aguilar. 2009. Computing procedures for genetic evaluation including phenotypic, full pedigree and genomic information. J. Dairy Sci. 92:4648-4655.

Legarra, A., I. Aguilar, and I. Misztal. 2009. A relationship matrix including full pedigree and genomic information. J. Dairy Sci. 92:4656-4663.

Aguilar, I., I. Misztal, D. L. Johnson, A. Legarra, S. Tsuruta, and T. J. Lawlor. 2010. A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J. Dairy Sci. 93:743-752.

Chen, C. Y., I. Misztal, I. Aguilar, S. Tsuruta, T. H. E. Meuwissen, S. E. Aggrey, T. Wing, and W. M. Muir. 2011. Genome-wide marker-assisted selection combining all pedigree phenotypic information with genotypic data in one step: an example using broiler chickens. J. Anim. Sci. 89:23-28.

Aguilar, I., I. Misztal, A. Legarra, and S. Tsuruta. 2011. Efficient computation of genomic relationship matrix and other matrices used in single-step evaluation. J. Anim. Breed. Genet. 128(6):422-428.

Forni, S., I. Aguilar, and I. Misztal. 2011. Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information. Genet. Sel. Evol. 43:1.

Aguilar, I., I. Misztal, S. Tsuruta, G. R. Wiggans, and T. J. Lawlor. 2011. Multiple trait genomic evaluation of conception rate in Holsteins. J. Dairy Sci. 94:2621-2624.

Simeone, R., I. Misztal, I. Aguilar, and A. Legarra. 2011. Evaluation of the utility of genomic relationship matrix as a diagnostic tool to detect mislabeled genotyped animals in a broiler chicken population. J. Anim. Breed. Genet. 128(5):386-393.

Chen, C. Y., I. Misztal, I. Aguilar, A. Legarra, and B. Muir. 2011. Effect of different genomic relationship matrix on accuracy and scale. J. Anim. Sci. 89:2673-2679.

Tsuruta, S., I. Aguilar, I. Misztal, and T. J. Lawlor. 2011. Multiple-trait genomic evaluation of linear type traits using genomic and phenotypic data in US Holsteins. J. Dairy Sci. 94:4198-4204.

Vitezica, Z. G., I. Aguilar, I. Misztal, and A. Legarra. 2011. Bias in Genomic Predictions for Populations Under Selection. Genet. Res. Camb. 93:357–366.

Simeone, R., I. Misztal, I. Aguilar, and Z. Vitezica. 2012. Evaluation of a multi-line broiler chicken population using a single-step genomic evaluation procedure. J. Anim. Breed. Genet. 129(1):3–10.

Wang, H., I. Misztal, I. Aguilar, A. Legarra, and W. M. Muir. 2012. Genome-wide association mapping including phenotypes from relatives without genotypes. Genet. Res. 94(2):73-83.

Faux, P., N. Gengler, and I. Misztal. 2012. A recursive algorithm for decomposition and creation of the inverse of the genomic relationship matrix. J. Dairy Sci. 95:6093-6102.

Misztal, I., S. Tsuruta, I. Aguilar, A. Legarra, P. M. VanRaden, and T. J. Lawlor. 2013. Methods to Approximate Reliabilities in Single-Step Genomic Evaluation. J. Dairy Sci. 96:647–654.

Misztal, I., Z. G. Vitezica, A. Legarra, I. Aguilar, and A. A. Swan. 2013. Unknown-parent groups in single-step genomic evaluation. J. Anim. Breed. Genet. 130:252–258.

Tsuruta, S., I. Misztal, and T. J. Lawlor. 2013. Genomic evaluations of final score for US Holsteins benefit from the inclusion of genotypes on cows. J. Dairy Sci. 96:3332-3335.

Elzo, M. A., C. A. Martinez, G. C. Lamb, D. D. Johnson, M. G. Thomas, I. Misztal, D. O. Rae, J. G. Wasdin, and J. D. Driver. 2013. Genomic-polygenic evaluation for ultrasound and weight traits in Angus–Brahman multibreed cattle with the Illumina3k chip. Livest. Sci. 153:39-49.

Lourenco, D. A. L., I. Misztal, H. Wang, I. Aguilar, S. Tsuruta, and J. K. Bertrand. 2013. Prediction accuracy for a simulated maternally affected trait of beef cattle using different genomic evaluation models. J. Anim. Sci. 4090-4098.

Lourenco, D. A. L., I. Misztal, S. Tsuruta, I. Aguilar, E. Ezra, M. Ron, A. Shirak, and J. I. Weller. 2014. Methods for genomic evaluation of a relatively small genotyped dairy population and effect of genotyped cow information in multiparity analyses. J. Dairy Sci. 97:1742-1752.

Misztal, I., A. Legarra, and I. Aguilar. 2014. Using recursion to compute the inverse of the genomic relationship matrix. J. Dairy Sci. 97:3943-3952.

Lourenco, D. A. L., I. Misztal, S. Tsuruta, I. Aguilar, T. J. Lawlor, S. Forni, and J. I. Weller. 2014. Are evaluations on young genotyped animals benefiting from the past generations? J. Dairy Sci. 97:3930-3942.

Tsuruta, S., I. Misztal, D. A. L. Lourenco, and T. J. Lawlor. 2014. Assigning unknown parent groups to reduce bias in genomic evaluations of final score in US Holsteins. J. Dairy Sci. 97:5814-5821.

Wang, H., I. Misztal, I. Aguilar, A. Legarra, R. L. Fernando, Z. Vitezica, R. Okimoto, T. Wing, R. Hawken, and W. M. Muir. 2014. Genome-wide association mapping including phenotypes from relatives without genotypes in a single-step (ssGWAS) for 6-week body weight in broiler chickens. Frontiers Genet. doi:10.3389/fgene.2014.00134.

Legarra, A., O. F. Christensen, I. Aguilar, and I. Misztal. 2014. Single Step, A General Approach For Genomic Selection. Livest. Sci. 166:54-65.

Dufrasne, M., I. Misztal, S. Tsuruta, N. Gengler, and K. A. Gray. 2014. Genetic analysis of pig survival up to commercial weight in a crossbred population. Livest. Sci. 167:19-24.

Wang, H., I. Misztal, and A. Legarra. 2014. Differences between genomic-based and pedigree-based relationships in a chicken population, as a function of quality control and pedigree links among individuals. J. Anim. Breed. Genet. doi:10.1111/jbg.12109.

Fragomeni, B., I. Misztal, D. Lourenco, I. Aguilar, R. Okimoto, and W. Muir. 2014. Changes in variance of top SNP windows over generations for three traits in broiler chicken. Frontiers Genet. doi:10.3389/fgene.2014.00332.

Zhang, X., I. Misztal, M. Heidaritabar, J. W. M. Bastiaansen, R. Borg, R. L. Sapp, T. Wing, R. R. Hawken, D. A. L. Lourenco, and Z. G. Vitezica. 2015. Prior genetic architecture impacting genomic regions under selection: an example using genomic selection in two poultry breeds. Livest. Sci. 171:1-11.

Forneris, N. S., A. Legarra, Z. G. Vitezica, S. Tsuruta, I. Aguilar, I. Misztal, and R. J. C. Cantet. 2015. Quality control of genotypes using heritability estimates of gene content at the marker. Genetics. doi:10.1534/genetics.114.173559.

Fragomeni, B. O., D. A. L. Lourenco, S. Tsuruta, Y. Masuda, I. Aguilar, A. Legarra, T. J. Lawlor, and I. Misztal. 2015. Use of genomic recursions in single-step genomic BLUP with a large number of genotypes. J. Dairy Sci. 98:4090–4094.

Fragomeni, B. O., D. A. L. Lourenco, S. Tsuruta, Y. Masuda, I. Aguilar, and I. Misztal. 2015. Use of Genomic Recursions and Algorithm for Proven and Young Animals for Single-Step Genomic BLUP Analyses: A Simulation Study. J. Anim. Breed. Genet. 132:340-5.

Legarra, A., O. F. Christensen, Z. G. Vitezica, I. Aguilar, and I. Misztal. 2015. Ancestral relationships using metafounders: finite ancestral populations and across population relationships. Genetics. doi:10.1534/genetics.115.177014.

Lourenco, D. A. L., S. Tsuruta, B. O. Fragomeni, Y. Masuda, I. Aguilar, A. Legarra, J. K. Bertrand, T. S. Amen, L. Wang, D. W. Moser, and I. Misztal. 2015. Genetic evaluation using single-step genomic BLUP in American Angus. J. Anim. Sci. doi:10.2527/jas2014-8836.

Lourenco, D. A. L., B. O. Fragomeni, S. Tsuruta, I. Aguilar, B. Zumbach, R. J. Hawken, A. Legarra, and I. Misztal. 2015b. Accuracy of estimated breeding values for males and females with genomic information on males, females, or both: a broiler chicken example. Genet. Sel. Evol. 47:56.

Masuda, Y., S. Tsuruta, I. Aguilar, and I. Misztal. 2015. Technical note: Acceleration of sparse operations for average-information REML analyses with supernodal methods and sparse-storage refinements. J. Dairy Sci. 93:4670-4674.

Misztal, I., B. O. Fragomeni, D. A. L. Lourenco, S. Tsuruta, Y. Masuda, I. Aguilar, A. Legarra, and T. J. Lawlor. 2015. Efficient inversion of genomic relationship matrix by the algorithm for proven and young (APY). Interbull Bull. 49.

Masuda, Y., I. Misztal, S. Tsuruta, D. A. L. Lourenco, B. O. Fragomeni, A. Legarra, I. Aguilar, and T. J. Lawlor. 2015. Single-step genomic evaluations with 570K genotyped animals in US Holsteins. Interbull Bull. 49:85-89.

Fragomeni, B. O., D. A. L. Lourenco, S. Tsuruta, K. Gray, Y. Huang, and I. Misztal. 2016. Using single step genomic BLUP to enhance the mitigation of seasonal losses due to heat stress in pigs. J. Anim. Sci. 94:5004-5013. https://doi.org/10.2527/jas.2016-0820.

Lourenco, D. A. L., S. Tsuruta, B. O. Fragomeni, C. Y. Chen, and I. Misztal. 2016. Crossbred evaluations in single-step genomic BLUP using adjusted realized relationship matrices. J. Anim. Sci. 94:909-919. https://doi.org/10.2527/jas.2015-9748.

Masuda, Y., I. Misztal, S. Tsuruta, A. Legarra, I. Aguilar, D. Lourenco, B. Fragomeni, and T. J. Lawlor. 2016. Implementation of genomic recursions in single-step genomic BLUP for US Holsteins with a large number of genotyped animals. J. Dairy Sci. 99:1968-1974. https://doi.org/10.3168/jds.2015-10540.

Misztal, I. 2016. Inexpensive computation of the inverse of the genomic relationship matrix in populations with small effective population size. Genetics 202:401-409. https://doi.org/10.1534/genetics.115.182089.

Pocrnic, I., D. A. L. Lourenco, Y. Masuda, A. Legarra, and I. Misztal. 2016. The dimensionality of genomic information and its effect on genomic prediction. Genetics 203:573-581. https://doi.org/10.1534/genetics.116.187013.

Pocrnic, I., D. A. L. Lourenco, Y. Masuda, and I. Misztal. 2016. Dimensionality of genomic information and performance of the Algorithm for Proven and Young for different livestock species. Genet. Sel. Evol. 48:82. https://doi.org/10.1186/s12711-016-0261-6.

Vitezica, Z., L. Varona, J. M. Elsen, I. Misztal, W. Herring, and A. Legarra. 2016. Genomic BLUP including additive and dominant variation in purebreds and F1 crossbreds, with an application in pigs. Genet. Sel. Evol. 48:6. https://doi.org/10.1186/s12711-016-0185-1.

Zhang, X., D. A. L. Lourenco, I. Aguilar, A. Legarra, and I. Misztal. 2016. Weighting strategies for single-step genomic BLUP: an iterative approach for accurate calculation of GEBV and GWAS. Frontiers in Genet. 7:151. https://doi.org/10.3389/fgene.2016.00151.

Lourenco, D. A. L., B. O. Fragomeni, H. L. Bradford, I. R. Menezes, J. B. S. Ferraz, S. Tsuruta, I. Aguilar, and I. Misztal. 2017. Implications of SNP weighting on single-step genomic predictions for different reference population sizes. J. Anim. Breed. Genet. (in press; doi:10.1111/jbg.12288).

Andonov, S., D. A. L. Lourenco, B. O. Fragomeni, Y. Masuda, and I. Misztal. 2017. Accuracy of breeding values in small genotyped populations using different sources of external information: A simulation study. J. Dairy Sci. 100:395–401. https://doi.org/10.3168/jds.2016-11335.

Bradford, H. L., I. Pocrnić, B. O. Fragomeni, D. A. L. Lourenco, and I. Misztal. 2017. Selection of core animals in the Algorithm for Proven and Young using a simulation model. J. Anim. Breed. Genet. In press. https://doi.org/10.1111/jbg.12276.

Fragomeni, B. O., D. A. L. Lourenco, Y. Masuda, A. Legarra, and I. Misztal. 2017. Incorporation of causative quantitative trait nucleotides in single-step GBLUP. Genet. Sel. Evol. 49:59. https://doi.org/10.1186/s12711-017-0335-0.

Masuda, Y., I. Misztal, A. Legarra, S. Tsuruta, D. A. L. Lourenco, B. O. Fragomeni, and I. Aguilar. 2017. Technical note: Avoiding the direct inversion of the numerator relationship matrix for genotyped animals in single-step genomic BLUP solved with preconditioned conjugate gradient. J. Anim. Sci. 95:49-52. https://doi.org/10.2527/jas.2016.0699.

Misztal, I., and A. Legarra. 2017. Invited review: efficient computation strategies in genomic selection. Animal 11:731-736. https://doi.org/10.1017/S1751731116002366.

Pocrnic, I., D. A. L. Lourenco, H. Bradford, C. Y. Chen, and I. Misztal. 2017. Technical note: Impact of pedigree depth on convergence of single-step genomic BLUP in a purebred swine population. J. Anim. Sci. In press. https://doi.org/10.2527/jas2017.1581.

Tsuruta, S., D. A. L. Lourenco, I. Misztal, and T. J. Lawlor. 2017. Genomic analysis of cow mortality and milk production using a threshold-linear model. J. Dairy Sci. In press. https://doi.org/10.3168/jds.2017-12665.