high-throughput sequencing technologies and genomics...
TRANSCRIPT
UNIVERSITAgrave CATTOLICA DEL SACRO CUORE Sede di Piacenza
Scuola di Dottorato per il Sistema Agro-alimentare
Doctoral School on the Agro-Food System
cycle XXVII
SSD AGR17
High-throughput sequencing technologies and genomics insights into the bovine species
Candidate Marco Milanesi Matr n 4011148
Academic Year 20132014
Scuola di Dottorato per il Sistema Agro-alimentare
Doctoral School on the Agro-Food System
cycle XXVII
SSD AGR17
High-throughput sequencing technologies and genomics insights into the bovine species
Coordinator Chmo Prof Antonio Albanese
_______________________________________
Candidate Marco Milanesi Matriculation n 4011148
Tutor Prof Paolo AjmoneMarsan PhD Lorenzo Bomba PhD Stefano Capomaccio PhD Ezequiel Louis Nicolazzi
Academic Year 20132014
Alla$mia$famiglia$
A$mio$nonno$Mario$
INDEXamp
Chapteramp1ndashIntroduction 3
Chapteramp2Relativeextendedhaplotypehomozygosity(rEHH)signalsacrossbreedsrevealdairyandbeefspecificsignaturesofselection
11
Chapteramp3SearchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisinthreeItaliancattlebreeds
49
Chapteramp4ampMUGBASaspeciesfreegenebasedassociationsuite
81
Chapteramp5Imputationaccuracyisrobusttocattlereferencegenomeupdates
89
Chapteramp6QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis
99
Chapteramp7ndashConclusion 167
amp
Acknowledgmentsampamp 173
CHAPTER1
Introduction
Background
The bovine species counts almost 15 billion of heads classified in some 800
breeds worldwide (httpfaostat3faoorg) The species is adapted to a wide
varietyofdifferentenvironmentandhusbandrysystemsandisoftenrearedfor
multiple purposes (Taberlet et al 2008) draft power milk meat leather
productionandotheruses(Elsiketal2009)
European cattle (Bos$ taurus) derive from a single domestication event of the
auroch (Bos$primigenius)occurred in theFertileCrescent around10000years
ago(Taberletetal2011)FollowingdomesticationcattlefollowedtheNeolithic
expansionofagricultureandcolonisedEurope(seeAjmoneMMarsanetal2010
for a review) Many thousand years of natural and artificial selection have
shaped cattle phenotype and genotype to meet environmental conditions and
human needs and together with demographic events and genetic drift have
started thedifferentiation of the bovine species in diverse and locally adapted
populations This differentiation process accelerated in the last two centuries
whenthebreedconceptwasinventedsothatnowthisspeciescountshundreds
of different breeds with high diversity in coat colour body size production
aptitudeadaptationtoclimate tolerancetodiseaseetc(Taberletetal2008
Elsiketal2009TheBovineHapMapConsortium2009)
Thisdiversityisnowbeingprogressivelylostwiththeextinctionofmanylocal
populationsmainlyforeconomicreasonsSelectioneffortshavethereforebeen
focussedonafewoutperformingindustrialbreedsthatarepresentlyselectedfor
specialiseddairyorbeefproductionandsometimesdoublepurpose
In these breeds breeding schemes are well developed and have been
traditionally based on the genetic evaluation of sires and dams using data
collectedfromperformanceandprogenytestingandunbiasedlinearprediction
modelsThesearefoundedontheFisherinfinitesimalmodellingofquantitative
trait variation and need no genomic information to estimate breeding values
(EBVs) The genetic progress recorded using this approach has been relevant
particularly for traits having medium to high heritability In Italian Holstein
genetic trends for themain production traits (milk protein and fat yield) are
positiveHoweveraccurateestimateofbreedingvalueshasahighcostinterms
4
of investment and time so that 5 to 6 years are needed to have a dairy bullprogenytested(wwwanafiit)Theadventofgenomicselection(Meuwissenetal2001)markedamilestoneindairycattlebreedingTheideawasdevelopedbeforethemoleculartoolswereinplaceforimplementitinpracticeandhasbecomearealityonlyrecentlyInthisapproach progeny tested bulls are genotyped at many thousand loci and thevalue of their EBVs used to estimate the value of each allele at each locusgenotyped These values are then used to predict the genetic value of younganimals on the basis of their genotype As a result animals from the newgeneration receive an EBV with the same accuracy of estimates obtained byprogenytestingbutmuchearlierThislowerthegenerationintervalandgreatlyincreasestherateofgeneticprogress indairypopulationsThe initial ideawasso good that nowadays genomic selection is widely applied in industrialcountriestothemajordairycattlebreedsWiththedecreaseofgenotypingcostit is likely that its use will be soon expanded to animals selected for otherpurposes (eg beef cattle) and other species (eg pigs) However genomicselection is an extension of traditional selection It is extremely useful inbreedingforcomplextraitsbutgivesnobiologicalinformationonthegenesthatcontrolthemIfthetraditionalselectionviewsasiregenomeasabigblackboxhaving a value but no clue why genomic selection views the same genomedisassembled inmany small boxes but still black and stillwithno clueon thereasonoftheirvalueIt is reasonable to think that a deeper understanding of the biologicalmechanisms controlling traits may further increase the accuracy of genomicevaluationsIndeedthenewtechnologiesoffernewopportunitiesinthissensebyloweringcostsandincreasingprecisionofdataproductionThis thesis is exploring the use of these technologies for retrieving biologicalinformationfromgenomicdataItusesmethodsnowconsideredldquotraditionalrdquoingenomics (GWAS and analysis of selection sweeps) and less traditional ones(gene centered GWAS and exome analysis) All data used here have beenproducedwithin twonationalprojectson farmanimalgenomics fundedby theItalianMinistryofAgriculture
5
M SELMOL ldquoRicerca e innovazione nelle attivitagrave di miglioramento genetico
animalemediante tecniche di geneticamolecolare per la competitivitagrave del
sistemazootecniconazionaleIrdquo(ResearchandInnovationinanimalbreeding
sheepgoatbuffalohorseanddonkeyI)
M INNOVAGENldquoRicercaeinnovazionenelleattivitagravedimiglioramentogenetico
animalemediante tecniche di geneticamolecolare per la competitivitagrave del
sistema zootecnico nazionale IIrdquo (Research and Innovation in animal
breedingsheepgoatbuffalohorseanddonkeyII)
In these projects the research group I work with M Istituto di Zootecnica at
Universitagrave Cattolica del Sacro Cuore di Piacenza M coordinated all the research
activitiesconcerningtheapplicationofgenomicstechniquesindairycattle
Contentsofthethesis
The thesis is structured into a short introduction (this first chapter) 5 core
chapters(Chapter2to6)andashortConclusion(Chapter7)Withtheexception
of Chapter 6 still in preparation the other core chapters containmanuscripts
that at time of writing are under review for their publication in international
peerMreviewedjournalsormanuscript
InChapter2selectionsignatureindairyandbeefcattleareinvestigatedwiththe
Illumina BovineSNP50 Genotyping BeadChip Five Italian cattle breeds (Italian
Holstein Italian Brown Italian Simmental Marchigiana and Piedmontese) are
analyzedwiththerEHHmethod(Sabetietal2002)tofindsignaturesofrecent
selectionCandidategenesinthevicinityofsignaturessharedbyeitherdairyor
beefbreeds are identified and theirpossible role indairy andbeefproduction
discussed
In Chapter 3 a GWAS analysis is run to investigate production traits in three
Italian dairy cattle breeds (Italian Holstein Italian Brown Italian Simmental)
Data are analysed with a well established method (Amin et al 2007)
6
implemented in theGenABELRpackage (Aulchenkoet al 2007) andwith the
newgeneMbasedmethodAssociationisinvestigatedbetween50KSNPsandmilk
production(milkyield)andqualitytraits(percentageofmilkproteinandfat)
InChapter4thegeneMbasedmethodusedinthepreviouschapter3inspiredby
theVEGASapproach(Liuetal2010)isdescribedThesoftwarewasdeveloped
andreMcodedinourlaboratoryinordertopermittheanalysisofanyspeciesin
additiontohumanMultiMtestingcorrectionandplotrepresentationcompletethe
toolsmadeavailable
InChapter5wetesttheimpactoftheuseofdifferentreferencesequencesinthe
imputation step Italian Simmental data are used to test variation in genomeM
wide inputation accuracy using different cattle reference sequences (BTAU42
UMD31 and BTAU46) To increase the analyses reliability four different
inputationsoftwaresaretested
InChapter6Holsteindataareanalysedcombiningexomesequencesand800K
SNPs data to seek deleterious recessive variantsWhile in natural populations
the destiny of these variants is to disappear in livestock breeds they can be
maintained and disseminated if carried by a highly productive sire The
intersection of deleterious mutations identified by sequencing and lack of
homozygousregionsidentifiedbytheanalysisofthepopulationwiththeHDSNP
chipidentifiedgenescandidatetocarryrecessivedeleteriousvariants
Objective
Themainobjectiveofthethesiswastoinvestigatethebovinegenomewiththe
aid new highMthroughput technology and to explore new strategies for linking
biology (genes and their function) to traits This linkage once validated may
increase the accuracy of genomic prediction and find application in selection
schemesegineradicatingthemostdeleteriousvariantsandorintheplanning
ofmatings
7
ReferencesAjmoneMMarsanPGarciaJFLenstraJA2010OntheoriginofcattleHow
aurochsbecamecattleandcolonizedtheworldEvolAnthropol19148ndash157doi101002evan20267
AminNvanDuijnCMAulchenkoYS2007AGenomicBackgroundBasedMethodforAssociationAnalysisinRelatedIndividualsPLoSONE2e1274doi101371journalpone0001274
AulchenkoYSKoningDMJdeHaleyC2007GenomewideRapidAssociationUsingMixedModelandRegressionAFastandSimpleMethodForGenomewidePedigreeMBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614
ElsikCGTellamRLWorleyKC2009TheGenomeSequenceofTaurineCattleAWindowtoRuminantBiologyandEvolutionScience324522ndash528doi101126science1169588
LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKMHaywardNKMontgomeryGWVisscherPMMartinNGMacgregorS2010AVersatileGeneMBasedTestforGenomeMwideAssociationStudiesTheAmericanJournalofHumanGenetics87139ndash145doi101016jajhg201006009
MeuwissenTHHayesBJGoddardME2001PredictionoftotalgeneticvalueusinggenomeMwidedensemarkermapsGenetics1571819ndash1829
SabetiPCReichDEHigginsJMLevineHZPRichterDJSchaffnerSFGabrielSBPlatkoJVPattersonNJMcDonaldGJAckermanHCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash837doi101038nature01140
TaberletPCoissacEPansuJPompanonF2011ConservationgeneticsofcattlesheepandgoatsCRBiol334247ndash254doi101016jcrvi201012007
8
TaberletPValentiniARezaeiHRNaderiSPompanonFNegriniR
AjmoneMMarsanP2008Arecattlesheepandgoatsendangered
speciesMolEcol17275ndash284doi101111j1365M294X200703475x
TheBovineHapMapConsortium2009GenomeMWideSurveyofSNPVariation
UncoverstheGeneticStructureofCattleBreedsScience324528ndash532
doi101126science1167936
9
CHAPTER2
Relative(extendedamphaplotype(homozygosity)(rEHH)ampsignalsampacrossampbreedsamprevealampdairyampand$beef$specificampsignaturesampofampselection
LBomba1ELNicolazzi2MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly2FondazioneParcoTecnologicoPadanoViaEinsteinLocCascinaCodazza26900LodiItalyTheseauthorscontributedequallytothiswork
____________________SubmittedtoGeneticSelectionEvolution
Abstract
Background
Nowadays many methods have been implemented to scan for signatures of
selection at genomescale level within and across breeds Extended haplotype
homozygosity is a proficient approach to detect genome regions under recent
selective pressure within breeds The objective of this study is to identify
common regions under positive recent selection shared by most represented
dairyandbeefItaliancattlebreeds
Results
A total of 3220 animals from Italian Holstein (2179) Italian Brown (775)
Simmental (493) Marchigiana (485) and Piedmontese (379) breeds were
genotyped with the Illumina BovineSNP50 BeadChip v1 in the frame of the
ItalianlivestockgenomicprojectsldquoSelMolrdquoandldquoProzoordquoAfterstandardcleaning
proceduresgenotypeswerephasedandcorehaplotypesidentifiedThedecayof
Linkage Disequilibrium (LD) for each core haplotype was assessedmeasuring
the extended haplotype homozygosity (EHH) To correct for the lack of good
estimatesof local recombinationrates therelativeEHH(rEHH)wascalculated
for each corehaplotypeGenomic regionsunder recentpositive selectionwere
identifiedasthosehighlyfrequentcorehaplotypeswithsignificantrEHHvalues
Significant independentcorehaplotypeswerealignedacrossbreeds to identify
signalsofrecentselectionsharedbydairyandsharedbybeefbreedsOverall82
and 87 common regions under selectionwere detected among dairy and beef
cattlebreedsrespectivelyBioinformaticsanalysisidentified244and232genes
mapping in these common genomic regions Pathway and network analysis of
significant genes revealed molecular functions biologically related to signal of
selectionsdetectedinmilkormeatproductiontypes
Conclusions
Resultssuggest that themultibreedapproach isamethodto identifyselection
signatures in cattle breeds under selection for the same production goal
12
allowing to better pinpoint the genomic regions of interest in dairy and beef
production
Background
The advent of genomic technologies and the consequent availability of single
nucleotidepolymorphism(SNP)datahaveenabledthestudyofselectionevents
on the cattle genome (Hayes et al 2008 Qanbari et al 2010) Such selection
signals arise from environmental or anthropogenic pressures and help to
understand the causes that led to breed formation These studies are usually
addressedinaldquotopdownrdquoapproach(ShahzadandLoor2012)fromgenotypeto
phenotypeinwhichgenomicdataisstatisticallyevaluatedtodetectdirectional
selectionThephenotypeneededtorunselectionsignaturesisintendedinavery
generalsenseabreedaproductionaptitudeorevenageographicareaoforigin
Thisapproachholdsthepotentialto investigatetraitsasadaptationtoextreme
climatesanddifferentfeedingandhusbandrysystemsresiliencetodiseasesetc
thatareveryexpensivedifficultandsometimesimpossibletostudywithclassic
GWAS (GenomeWide Association Study) approaches Therefore it is expected
thatgenomic informationretrievedinthesestudiesonlypartiallyoverlapsthat
identified by GWAS on specific traits Searching for these ldquosignatures of
selectionrdquo is of interest for understanding molecular mechanisms underlying
important biological processes and providing new insights to functional
biologicalinformation(egdiseaseandorimportanteconomictraits(Nielsenet
al 2007)) To date many methods have been implemented to scan these
selectionsignaturesatthegenomiclevelMethodsdifferwithrespecttoseveral
properties (Lenstra et al 2012) there are methods that perform analyses
withinbreedoracrossbreedsthatcomparealleleorhaplotypefrequenciesand
size that detect recent or ancient selection events either still segregating or
fixatedintopopulations(Lenstraetal2012OrsquoBrienetal2014Utsunomiyaet
al2013)Differentmethodshavedifferentsensitivityandrobustnesseg they
maybeinfluencedatadifferentextentbymarkerascertainmentbiasanduneven
distributionofrecombinationhotspotsalongthegenome
13
Extended haplotype homozygosity (EHH) is a method that has been used in
many animal species including cattle (Pan et al 2013 Qanbari et al 2010
Zhangetal2006)TheEHHmethodwasfirstdevelopedbySabetietal(2002)
for human population analysis The main objective of this methodology is to
identifylongrangehaplotypesinheritedwithoutrecombinationeventsMorein
detailunderaneutralevolutionmodelchangesinallelefrequenciesaredriven
onlybygeneticdriftInthisscenarioanewvariantwouldrequirealongtimeto
reach high frequency in the population and the Linkage Disequilibrium (LD)
surroundingitwoulddecayduetorecombinationevents(Kimura1984)Inthe
caseofpositiveselectionaquickriseinfrequencyofabeneficialmutationina
relatively short time will preserve the original haplotype structure as the
number of recombination events would be limited Therefore a signature of
positiveselectionisdefinedasaregionwithanalleleinlongrangeLDlocatedin
anuncommonlyhighfrequencyhaplotype
One of themost interesting features of thismethod is that it is able to detect
genomic regions that are under recent selection These recent selection
signatures can be therefore used to identify the genomic regions involved in
breedformationInadditionEHHmethoddoesnotrequireanydefinitionofan
ancestralallele(asneededintheintegratedhaplotypescoreiHS(Voightetal
2006) and is suited to be applied to SNP data because it is less sensitive to
ascertainmentbiasthanothermethods(Nielsenetal2007)HowevertheEHH
methodgeneratesa largenumberof falsepositiveandnegative resultsdue to
heterogeneous recombination rates along the genome (Qanbari et al 2010)
Moreover another drawback not specific of EHH but shared by all selection
signaturemethod is the lack of robust inferences able to discern true signals
fromthosegeneratedbychance(Kemperetal2014)
To partially account for these limitations Sabeti et al (2002) developed the
relative extended haplotype homozygosity (rEHH) including an empirical
methodologytoobtainsignificantsignalsTherEHHofaputativecorehaplotype
compares its original EHH valuewith that of other haplotypes at that specific
locus using all other haplotypes as control for local variation in the
recombination rate Therefore it identifies genomic regions carrying variants
14
under selection that are still segregating in the population analysed Although
thesemethodswere conceived for human populations theywere successfully
appliedto livestockspeciessuchaspig(Maetal2012)andcattle(Qanbariet
al2010)
After domestication about 10000 years ago in the fertile crescent taurine
cattle have colonized Europe and Africa and have been selected for different
human needs (Taberlet et al 2011) Particularly in the last centuries the
anthropic pressure led to the formation of hundreds of specialized breeds
adapted to different environmental conditions and linked to local tradition
constitutingagenepooldeservingattentionforconservation(AjmoneMarsanet
al2010)Someofthesebreedshavebeenselectedspecificallyfordairybeefor
bothproductiontypesfollowingstrongartificialselectionforthesetraits(Hayes
etal2009)Thepresentstudyaimsat identifyingsignalsof recentdirectional
selectionusingrEHHmethodindairyandbeefproductiontypesusingdatafrom
five Italian dairy beef and dual purpose cattle breeds We focused on the
significant core haplotypes scanned by rEHH that are shared among breeds of
thesameproductiontypeThenweusedabioinformaticsapproachto identify
positionalcandidategeneslyingwithinthegenomicregionsunderselectionand
investigatetheirbiologicalrole
Methods
SampleAnimalsandGenotyping
A total of 4311 bulls of five Italian dairy beef anddual purpose breedswere
genotyped with the Illumina BovineSNP50 BeadChip v1 (Illumina San Diego
CA)joiningthegenotypingeffortsoftwoItalianprojects(namelyldquoSelMolrdquoand
ldquoProzoordquo)Thedatasetincluded101replicatesand773siresonspairsusedfor
downstreamqualitycheckofdataproducedIndetailgenotypesof2954dairy
(2179 ItalianHolsteinand775 ItalianBrown)864beef (485Marchigianaand
379 Piedmontese) and 493 dual purpose (Italian Simmental) bulls were
availableDataqualitycontrol(QC)wasperformedintwostagesafirststageon
15
animals independently for each breed applying the same methods and
thresholdsandasecondstageappliedonmarkersacrossall theindividualsof
thedatasetThefirststageexcludedindividualswithunexpectedlyhigh(gt02)
Mendelian errors for fatherson couples andwith low call rates (lt95)The
secondstageexcludedi)SNPswithle25missingvaluesinthewholedataset
orcompletelymissing inonebreed ii)SNPswithminorallele frequencyle5
and iii) SNPs in sexual chromosomes and with unassigned chromosome or
physicalposition
EstimationofrEHH
HaplotypeswereobtainedusingfastPHASEsoftwareconsideringdefaultoptions
(ScheetandStephens2006)andwererunbreedwiseandchromosomewisein
eachbreedPedigreeinformationforallbullswereprovidedbytheirrespective
breederassociations andwereused to filteroutdirect relatives (in fatherson
pairsthesonwasmaintainedinthedatasetandthefatherdiscarded)andover
representedfamilies(amaximumof5randomlychosenindividualsperhalfsib
familywasallowed)Thefinaldatasetcontainingtheseldquolessrelatedrdquoanimalswill
behenceforthcalled ldquononredundantrdquodatasetThisnonredundantdatasetwas
used also to calculate within breed pairwise wholegenome Linkage
Disequilibrium(LD)Ther2statisticforallpairsofmarkerswasobtainedusing
PLINK software v107 (Purcell et al 2007)Thedecayof LDwas assessedby
averagingr2valuesupto1Mbdistance
To test if the population structure influenced the detection of rEHH we
replicated all analyses on the whole Italian Holstein dataset without any
population correction (ie ldquoredundantrdquo dataset) focusing on genes or gene
clustersthatarewellknowntobeunderrecentselectionincattle(ieldquocandidate
regionsrdquo) In particularwe focused on the casein polled gene cluster and coat
colourgenes(MC1RandKIT(Kemperetal2014Qanbarietal2010))
The EHH and rEHH calculations were performed using Sweep v11 software
(Sabetietal2002)Somedefaultprogramsettingshadtobemodifiedtoadapt
theseanalyses to thebovinegenomeas thesoftwarewasoriginallydeveloped
forhumangeneticanalysesSpecificallylocalrecombinationratesbetweenSNPs
16
wereapproximatedto1cMperMbEHHandrEHHcalculationswereperformed
breedwise and chromosomewise using an automatic core selection with
defaultoptions(ieconsideringlongestnonoverlappingcoresandlimitingcores
toatleast3andnomorethan20SNPs)asinQanbarietal(2010)AlthoughEHH
and rEHH values were obtained for all core haplotypes only those with
frequency ge 25 were retained for further analyses The (empirical) rEHH
significancethresholdwasobtainedbydividingtherEHHvaluesinto20binsof
5 frequency range each logtransforming withinbin values to achieve
normality Core haplotypes with pvalues lt 005 were considered significant
DetectionofcommonselectionsignatureswasobtainedcomparingrEHHvalues
acrossgenomicregionsacrossbreeds
Breedgroupingaccordingtoproductiontype
Toexamine thesignals common toeachof the twoproduction types (iedairy
andbeef) corehaplotypes thatsharedoneormoreSNP inat least twobreeds
from the same production type were selected The dual purpose Italian
Simmentalwas included inbothdairy (ItalianHolsteinand ItalianBrown)and
beef groups (Piedmontese and Marchigiana) since potentially it possesses
haplotypes selected for both production types All downstream analyses were
performedseparatelyforthedairyandthebeefproductiontypes
Detectionandannotationofcandidategenes
Genome annotation was performed using Biomart
(httpwwwbiomartorgindexhtml) a comprehensive source of gene
annotationformanyspeciesprovidedbyEnsembl(wwwensemblorg)Thelist
ofsharedhaplotypesregions(inbp)fordairyandbeefwasusedasinputfilein
theBiomartwebinterfaceThesetofgenesretrievedbyBiomartwasthenused
as input for theCanonicalpathwayandregulatorynetworkanalyses Ingenuity
PathwayAnalysis toolversion80(IPA IngenuityregSystems IncRedwoodCity
CA httpwwwingenuitycom) and a substantial examination of published
literaturewasusedtoexaminethefunctionalrelationshipsamongtheresulting
genes The IPA operates with a proprietary knowledge database providing
complementarypathwayanalysisforseveralspeciesincludingcattleIntheIPA
17
analyses the significance of each biological function identified was estimated
using Fisherrsquos exact test A Benjamini and Hochberg correction for multiple
testingwas calculated for function and canonical pathways For eachNetwork
IPAcomputedascore(pscore=log10(pvalue))fittingthesetstudiedgenesand
a list of biological functionspresent in the IPAknowledgedatabase The score
considered thenumberofgenes in thenetworkand thesizeof thenetwork to
estimatehowrelevantthisnetworkwasaccordingtothelistofgenesprovided
Thenthescorewasusedtorankthenetworks
Results
DatasetQC
Therepeatabilityobtainedfromthe101replicatespresentinthewholedataset
washigherthan998AfterthetwoQCstages105individualsand9730SNPs
were discarded Moreover after phasing 1292 individuals were removed to
reducethe largenumberofsibfamiliespresent intheredundantdatasetAfter
quality control procedures the final dataset had 44271 SNPs and 1132 514
393 410 and 364 individuals from Italian Holstein Italian Brown Italian
SimmentalMarchigianaandPiedmontesebullsrespectively(Table1)
18
Table1JNumberofanimalsgenotypedbeforeandafterQCTotalnumberofanimalgenotypedandrelativenumberofanimaldiscardedafterQCanalysis
Detectionofselectionsignatures
Sweepv11 softwaredetectedrEHHsignalswitha frequencyhigher than25overlapping the test candidate regions in both datasets corrected or not forpopulation structure In the redundant dataset we found only one significantrEHHsignal(caseincluster)whereasinthenonredundantdatasetweidentified5significantrEHHsignalsoverlappingallthetestcandidateregionsconsidered(caseinclusterpolledclusterMC1R)(Table2)
BreedTotal
Genotyped
ED15
misAN
ED1
REPL
ED1
MENDCleaned
HOL 2179 40 31 5 2093
BRW 775 6 16 4 749
SIM 493 6 6 2 479
MAR 485 37 38 410
PIE 379 5 10 364
19
Table2JComparisonofcandidateregionrEHHinnonJredundantandredundantdatasetrEHH overlapping candidate regions in both corrected (nonredundant) and not corrected
(redundant)datasetsforpopulation
Redundant NonRedundant
Candidate
geneBTA1
Closest
SNP(bp)CH2range CH2Frequency
rEHHJ
log(pval)CH2Frequency
rEHHJ
log(pval)
POLLED
Cluster1 1981154 18974181981154 H1079 093037 H1078 167174
MC1R 18 13657912 1331772014007505 H1069 120096 H1069 077167
KIT_BOVIN 6 72821175 7250492172821175 H1054 030030 H1054 033048
Casein
Cluster6 88427760 8835009888452835 H1047 094139 H1046 148133
Casein
Cluster6 88427760 8835009888452835 H2032 002003 H2033 002003
In total5526567847724388and4049corehaplotypeswitha frequency
higher than 25 were detected on Italian Holstein Italian Brown Italian
Simmental Marchigiana and Piedmontese breeds respectively A total of 838
866 740 692 and 613 core haplotypes were found significant outliers (see
Materials and Methods) in the aforementioned breeds Table 3 shows the
distribution of the total and significant core haplotypes per chromosome and
breed
20
Tableamp3amp(ampCoreamphaplotypesampdistributionampDistributionoftotalandsignificantcorehaplotypesperchromosomeandbreed
ampamp ItalianampampHolsteinamp
ItalianampampBrownamp
ItalianampSimmentalamp Marchigianaamp Piedmonteseamp
BTA1amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
1amp 389 66 389 68 335 49 310 51 288 46
2amp 312 47 314 52 267 44 257 42 238 33
3amp 292 48 313 62 264 46 236 34 227 34
4amp 268 46 279 50 245 41 237 39 229 39
5amp 223 32 231 29 200 30 183 27 171 22
6amp 291 51 307 40 265 45 252 38 233 42
7amp 249 37 253 46 214 39 195 33 204 27
8amp 268 39 270 38 235 35 223 32 209 31
9amp 227 37 228 41 210 37 165 22 187 30
10amp 219 36 234 34 192 25 196 26 168 25
11amp 248 39 246 34 209 30 205 33 187 25
12amp 162 24 176 18 148 21 148 25 135 22
13amp 205 32 200 19 155 18 178 35 135 19
14amp 175 17 189 25 168 29 143 22 129 19
15amp 180 31 185 29 148 23 139 13 130 19
16amp 176 28 192 29 160 25 149 19 127 15
17amp 170 27 183 29 168 31 135 22 123 16
18amp 133 18 153 19 119 14 108 10 82 12
19amp 137 22 130 23 118 16 99 18 92 15
20amp 184 35 172 33 145 23 124 23 124 17
21amp 145 23 147 30 113 18 110 21 94 13
22amp 140 24 145 19 115 21 112 18 104 13
23amp 106 13 100 16 67 10 64 9 57 8
24amp 129 11 140 20 124 17 97 23 101 16
25amp 91 12 98 11 68 10 77 12 70 6
26amp 125 7 114 14 93 12 96 15 90 16
27amp 91 12 91 15 75 11 84 12 60 10
28amp 88 9 96 10 70 10 66 9 55 11
29amp 103 15 103 13 82 10 72 9 67 12
TOTamp 5526 838 5678 866 4772 740 4388 692 4049 613
1BostaurusAutosomes2Detectedcorehaplotypeswithafrequencyinthebreedhigherthan253Significantcorehaplotypesatplt005
Comparisonamptoamppreviouslyampreportedampdataamp
AnumberofstudieshavesearchedselectionsweepsinHolstein(Qanbarietal
2010) Brown (Qanbari et al 2011) and Simmental (Fan et al 2014) breeds
Since different methods are expected to identify signatures having different
characteristicsthecomparisonwithpreviousstudiesislimitedtothosestudies
using thesamemethodand thesamebreed(s)analysed inourstudyOnlyone
study analysed (German) HolsteinRFriesian cattle with rEHH (Qanbari et al
2010)Qanbarietal(2010)reportedcandidategenesandgeneclustersunder
recent positive selection that we compared with our rEHH in dairy breeds
dataset(Table4)
Table4JCorehaplotypesincandidategenesfollowingQanbarietal2011
1BosTaurusChromosome2Corehaplotype
Thenumberofcorehaplotypesfoundinourstudyinthecandidateregionswas
lowerpreviouslyreportedWhenconsideringHolsteinsthetwomostsignificant
candidateregionsinbothstudiesmatch(caseinclusterandthesomatostatinSST
gene)althoughitisimpossibletodetermineifthehaplotypeunderselectionis
the same as this information was not provided in Qanbari et al (2010)
However other genes considered significant in Qanbari et al (with pvalues
ranging from 004 to 010) were not significant in our studyWhen using the
same loose significant threshold (pvalues le 010) ldquosignificantldquo signals were
foundalsoinItalianBrownfortheCaseinCluster(log10pvalue=109)andin
ItalianSimmentalfortheSSTgene(log10pvalue=126)
HOLSTEIN BROWN SIMMENTAL
Candidate
geneBTA1
Closest
SNP
(bp)
CH2rangeCH2
Frequency
rEHH
Hlog(p)CH2range
CH2
Frequency
rEHH
Hlog(p)CH2range
CH2
Frequency
rEHH
Hlog(p)
DGAT1 14 444963236653443936
H1057 011443936763332
H1042 0130007
Casein
Cluster6 88391612
8835009888452835
H1046 1481338832601288452835
H2068 0801098835009888452835
H1044 037022
8835009888452835
H2033 018030
8835009888452835
H3034 022045
GH 19 49652377 GHR 20 33908597
SST 1 813769568128358581376961
H1036 200189 8131845181376961
H1042 126053
8128358581376961
H2029 00630084 8131845181376961
H2042 006027
IGFH1 5 71169823
ABCG2 6 37374911 3731702038256889
H1044 031027
3731702038256889
H1040 029031
Leptin 4 95715500
LPR 3 855692038549710885594551
H1047 0910728549710885794693
H1068 0800638549710885794693
H1063 002005
8549710885594551
H2041 027025
PITH1 1 35756434
22
Signaturessharedbydairyandbybeefbreeds
Significantcorehaplotypeswerealignedacrossbreedsto identifythoseshared
among dairy or beef production types Since breeds can be considered
independentsetsofobservationssharedsignaturesaremorelikelytorepresent
real effects rather than false positives A total of 123 and 142 significant core
haplotypesweresharedbetweenatleasttwodairyorbeefbreedsrespectively
(File S1) Only 82 and 87 of those significant core haplotypes contained genes
considered positional candidates under positive selection In total 22 and
17 of the genomewas under selection for dairy and beef production types
respectivelyTherEHHaveragelengthswere216932and190994bpfordairy
andbeefrespectively
Genesetannotationandnetworkpathwayanalysis
Atotalof244and232annotatedgenesfellwithintheselectedregionsfordairy
andbeefproductiontypesrespectively(Table5)
23
Table5JStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreedsStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreeds
DairyCattle BeefCattle
BTA SelSig
n
Genes1 Sum
SelSignsize
(bp)
Mean
SelSign
size(bp)
SelSi
gn
Genes1 Sum
SelSignsize
(bp)
Mean
SelSign
size(bp)
1 7 11 1521608 138328 7 13 2263213 174093
2 5 9 2216685 246298 8 11 2658549 2416863 2 5 1619336 323867 7 21 3755847 1788504 7 21 5956755 283655 6 11 1761288 1601175 4 17 4506119 265066 5 11 1251127 1137396 2 4 1467738 366934 5 12 5149692 4291417 6 30 4842969 161432 3 28 11889191 4246148 0 0 0 0 4 12 3512215 2926859 4 7 1554863 222123 2 6 1106768 18446110 4 17 8839762 519986 2 3 573138 19104611 6 8 1841802 230225 1 1 100439 10043912 2 8 4244133 530517 1 1 536461 53646113 3 8 1933026 241628 2 8 985376 12317214 0 0 0 0 3 9 692397 7693315 5 9 1415405 157267 2 2 189094 9454716 3 6 1302506 217084 4 6 905719 15095317 3 4 471046 117762 2 9 1551010 17233418 0 0 0 0 3 23 2961718 12877019 3 23 5744143 249745 2 2 116938 5846920 1 2 148886 74443 2 4 627971 15699321 3 7 1930206 275744 1 5 1150760 23015222 3 5 620674 124135 3 10 2411751 24117523 0 0 0 0 1 1 68421 6842124 2 18 9166004 509222 2 4 803770 20094225 2 11 2016602 183327 1 3 329199 10973326 2 4 466134 116534 4 7 776080 11086927 1 1 631792 631792 2 3 1004717 33490628 0 0 0 0 1 1 93464 9346429 2 9 935209 103912 1 5 798300 159660TOT 82 244 65393403 216932 87 232 50024613 190994
1GenesidentifiedGenesintheregionssharedbyallthreedairybreeds(8genes)orallthreebeefbreeds(11genes)foramoredetailedanalysiswereselected(Fig1)
24
Figure1JGenomiclocationoftheselectionsignaturessharedamongstudiedbreedsa)GenesinensembletracksaredisplayedasaredboxcorehaplotypesandSNPsarecolouredinorange(MarchigianaMAR)inviolet(PiedmontesePIE)andpink(SimmentalSIM)b)Genesinensemble tracks are displayed as a red box core haplotypes and SNPs are coloured in blue(HolsteinHOL)ingreen(ItalianBrownBRW)andpink(SimmentalSIM)
All the genes identified were submitted to downstream pathwaynetwork
analyses(Table5Figure1FileS1)Interestinglygenesintheregionssharedby
allthreedairyorbeefbreedsweremanifestedinthemostlikelynetworksforthe
two production aptitudes The IPA analysis detected a total of 12 and 15
networksindairyandbeefbreedsrespectively
Indairybreeds network scores ranged from46 to2 (File S2) Eightnetworks
obtainedscoreshigherthan20andincludedmorethan14moleculeseachThey
wereassociatedtoi)geneexpressionRNAdamageandrepairproteinsynthesis
andposttranslationalmodificationii)cellassemblyorganizationfunctionand
maintenance iii) cell cycle proliferation and cancer iv) carbohydrate
metabolism v) gastrointestinal and neurological disease developmental and
endocrine system disorders A total of 23moleculeswere associatedwith the
highest ranked network (score = 46) In this the immunoglobulins mitogen
25
activatedproteinkinasesandNFkBcomplexesformacentralhublinkinggenes
involved in celltocell signaling and interaction hematological system
developmentandfunctionandimmunecelltraffickingfunctions(Figure2)
Figure2JThemostlikelyIPAnetworkindairycattlebreedsThefunctionsassociatedtothenetworkarecelltocellsignallingandinteractionhematologicalsystem development and function immune cell trafficking Nodes in red correspond to genesidentifiedincorehaplotypesoverlappinginallthethreebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Breast cancer antiNestrogen resistance gene 3 (BARC3) andpituitary glutaminyl
cyclase gene (QPCT) genes were common to all three dairy breeds and are
directlyconnectedwithmammaryglandmetabolisms(Ezuraetal2004GSun
et al 2012) The solute carrier family 2 member 5 (SLC2A5) gene facilitates
glucosefructose transport (Zhao et al 1998) and the zetaNchain (TCR)
26
associated protein kinase 70kDa (ZAP70) gene plays a critical role in Tcell
signaling (Bonnefont et al 2011) Calpain is another important complex that
togetherwithcalpainN3(CAPN3)mediatesepithelialcelldeathduringmammary
gland involution (Wilde et al 1997) Furthermore RAS guanyl nucleotide
releasingprotein(RASGRP1)activatestheErkMAPkinasecascaderegulatesT
cellsandBcellsdevelopmenthomeostasisanddifferentiationandisinvolvedin
the regulation of breast cancer cell (Madani et al 2004 Yasuda et al 2007
Wickramasingheetal2009)Genesinvolvedintheothernetworksarelistedin
FileS2
The scoreof thebeef cattlebreedsnetwork rangedbetween46and2 and the
numberofmoleculespresentineachnetworkrangedfrom23to1(FileS3)Asin
dairy breeds eight networks obtained a score higher than 20 and 13 ormore
molecules each These networkswere associated to i) carbohydrate lipid and
nucleic acidsmetabolism energyproduction and smallmoleculebiochemistry
ii) cardiovascular system development and function iii) cellular and tissue
developmentcellassemblyandorganizationfunctionandmaintenanceandcell
and organmorphology iv) cell cycle celltocell signaling and interaction v)
DNA replication recombination and repair and RNA posttranscriptional
modification vi) dermatological diseases and conditions infectious disease
organismal injury and abnormalities In the network with the highest score
(score=46)chondroitinsulfateproteoglycan4(CSPG4)andsnurportinN1(SNUPN)
genes were shared among all beef cattle breeds investigated CSPG4 gene is
relatedtomeattendernesswhileSNUPN isanimprintedgeneimportantinthe
embryodevelopmentandinvolvedinhumanmuscleatrophy(Narayananetal
2002) (Figure 3) All the genes included in this network were involved in
carbohydratemetabolismsmallmoleculesbiochemistryandcellcycleTheother
networkswere related to important signaling andmetabolic process essential
foranimalgrowth(FileS3)
27
Figure3JThemostlikelyIPAnetworkinbeefcattlebreedsThe functions associated to the network are carbohydrate metabolism small molecule
biochemistry and cell cycle Nodes in red correspond to genes identified in core haplotypes
overlapping in all the three breeds of each production type whereas those in green depict
overlappingcorehaplotypesinatleasttwoofthosebreeds
A total of 6 and 9 statistically significant Canonical Pathways (FDR le 005
log10(FDR) ge 13) were identified using IPA for dairy and beef breeds
respectively(Figure4)
28
Figure4JBarplotofstatisticallysignificantCanonicalPathwaysPvalues were corrected for multiple testing using BenjaminiHochberg method and arepresented in thegraphas log(pvalue)Thebar represents thepercentageof genes in a givenpathway that meet cutoff criteria within the total number of molecules that belong to thefunctiona)BarplotofstatisticallysignificantCanonicalPathways indairycattlebreedsb)BarplotofstatisticallysignificantCanonicalPathwaysinbeefcattlebreeds
Themost significant Canonical Pathway in dairywas for purinemetabolism (
log10(FDR) = 26) supporting the highly synthetic processes in the mammary
epithelium(Figure5)
29
Figure5JGenesdetectedunderrecentpositiveselectionindairycattleinvolvedinpurinemetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Inbeefproductiontheephrinreceptorsignal(log10(FDR)=27)wasthemost
significant Canonical Pathway (Figure 6) Among other functions ephrin is
knowntopromotemuscleprogenitorcellmigrationbeforemitoticactivation(Li
andJohnson2013)AlltheotherCanonicalPathwaysarereportedinFileS4
30
Figure6JGenesdetectedunderrecentpositiveselectioninbeefcattleinvolvedinEphrinmetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Discussion
In this work more than 4000 genotyped bulls of five dairy beef and dualpurposeItalianbreedswerescreenedsearchingforselectionsignaturesindairyand beef production types A strict data quality controlwas applied to reducepossiblesourcesofbiassuchastechnicalgenotypingproblemsandpopulationstructurethatcouldhavelargeimpactonrEHHresultsInfactsincetheimpactofpopulationstructureonrEHHvaluesisstillunexploredwereplicatedpartofouranalyseswithandwithoutcloserelativesinanattempttostudytheeffectofpopulationstructureonthisselectionsignaturemethodInprinciplepopulationstratificationshouldleadtoanoverrepresentationofcertainhaplotypessimplyduetothepresenceoftightpedigreelinks(egsirespassinghalfoftheirgeneticmaterialtotheirsons)ratherthanactualselectionprocessesForthisreasonfor
31
the full analyses all siresons couples were removed after haplotype phasing(retaining only the younger animal) and halfsib familieswere restricted to amaximum of five randomly chosen individuals in an attempt to reduce overrepresentation This assessment was restricted to four regions known to beunderselectioninItalianHolsteinasthecaseinclusterthepolledgeneclusterMC1R and KIT Italian Holstein was selected for two reasons i) it is a highlystructuredbreedaccordingtoourdata(abouthalfofindividualsremovedwereclose relatives) ii) allowed us to compare our results with a previous studyAlthoughtheanalysesconductedonbothredundantandnonredundantdatasetswereable to identify rEHHsignals in these regions thenonredundantdatasetproduced five significant rEHH signals rather than just one in the redundantdataset(Table2)Thisresulthighlightstheconfoundingeffectofthepresenceofcloserelativesinthedatasetandconsequentlytheimprovedabilitytodetectedsignificant selection signature while correcting for population structureInterestingly our results only partially overlap those found by Quanbari et al(2010) These authors detected significant signals at the casein cluster andsomatostatinSSTaswedid but also at ahighernumberof candidate regionsThese inconsistencies may be the effect of different sires included in theanalyses of the different dataset size and to the close relative trimmingprocedure we adopted to decrease the effect of population structure andconsequent bias (Table 4) However poor correlation across investigations isoftenobservedalsointhedeeplyinvestigatedhumanspeciesThisisduetotheuse of different within and between breed statistics that identify selectionsignatureshavingdifferent characteristics (ancientrecent segregatingfixatedunder directionalbalancing selection) to the likely high rate of falsepositivesnegativesresultsandalsotothedifferentconsiderationofpopulationstructureandbackgroundselection(Enardetal2014)The number of total and significant core haplotypes identified by the SweepsoftwarewasthehighestinItalianBrownandthelowestinPiedmontesebreedSinceEHHisheavilybasedonpopulationLDtheaverageLDleveloverdistancewascalculatedineachbreed(Figure7)
32
Figure7JMultiJbreedaveragelinkagedisequilibriumagainstphysicaldistance(inkb)Marchigiana Piedmontese and Italian Simmental breeds present a lower persistence of LD allargerdistancethanItalianHolsteinandItalianBrownbreedsThisanalysishighlightedageneralpositivecorrelationbetweenthe levelofLDover distance and the number of total and significant core haplotypes foundHoweverconsideringthatrEHHisarelativemeasure it ismorelikelythatthehighernumberofsignificantcorehaplotypesidentifiedindairybreedswasduemostlytoahigherselectivepressureindairycomparedtobeefbreedsSignificance tests used in detecting selection signatures should measure theprobability of a statistic to be anoutlier compared to its expecteddistributionunder a neutral model However no reliable neutral model has so far beendevelopedincattlebecauseofthecomplexityofthedemographichistoryofthisspecies (Kemper et al 2014) As a consequence empiric rather than modelbasedsignificancetestaregenerallyusedinthedetectionofselectionsignaturesAccordinglyweconsideredasoutliersvaluesas thosesimply falling in the5plusvarianttailoftherEHHdistributionWekeptthewithinbreedsignificancethresholdloosenoncorrectingitformultipletestingbutconsideredonlysignals
33
eitherexceedingthisthresholdbyatleasttwoordersofmagnitudeorsharedby
twoormoreindependentbreeds
Parallel comparison of results from independent analyses of different breeds
allowed the achievement of a double objective i) to identify putative regions
under(recent)selectioninbreedsbelongingtotwoproductiontypeswhichwas
the main objective of this study and ii) reducing false positives since multi
breed analyses served as internal control Since the rEHH method does not
considerphenotypicinformationasignificantsignalmightarisebecausei)the
core haplotype is actually under a selective pressure ii) the result is a false
positive ie causedby chance population structureor anyotherdriving force
However even considering an unrealistic scenario with no false positives a
proportionofthetotalamountofsignalswouldactuallybeselectionsignatures
due to a selectionpressurenot ondairy or beef traits This becausedairy and
beefbreedsinvestigatedareonlyafewandshareanumberoftraitsnotdirectly
related to dairybeef production Even considering this limitation to our
knowledge this is the first dairy and beef multibreed study applying this
strategytoreducefalsepositivesatthecostofpossiblelossofinformationdue
tohighfalsenegativeratesSignificantsignalsharedbydairyandbybeefbreeds
wereused fordownstreamnetworkanalysesonpositional candidategenes to
search forabiological justification towhatwasobtainedstudying thegenomic
information Only the top network for dairy and the top network for beef
productionarediscussedindetail
Dairybreeds
ThemostsignificantnetworkdetectedindairybreedsisassociatedwithCellTo
CellSignalingandInteractionHematologicalSystemDevelopmentandFunction
andImmuneCellTraffickingfunctions(Figure2)ImmunoglobulinandNFkBact
aspivotalgeneswithinthisnetworkInterestinglyBARC3andQPCTgeneswere
sharedamongallthethreedairybreedsBothgeneshavenotyetbeenstudiedin
cattlealthoughinhumanstudiesthesegenesarewellknowntobelinkedtothe
mammaryglandmetobolismandthecalciumregulationTheformerisinvolved
in integrinmediated cell adhesions and signaling which is required for
mammary gland development and functions (G Sun et al 2012) The latter is
34
associatedwithlowradialbodymineraldensity(BMD)inadultwomen(Ezuraet
al2004)AnotherimportantplayeronthisnetworkisSLC2A5genealsoknown
asGLUT5 SLC2A5 gene acts as fructose transporter in the intestine and has a
significantroleinenergybalanceofdairycows(Burantetal1992)Findingthis
gene in adairy cattle specificnetwork seems surprising since there shouldbe
little need for transporting glucose andor fructose in the ruminant intestinal
tractassimplecarbohydratesaredegradedintovolatilefattyacids(VFA)inthe
rumen (Iqbal et al 2009) However it is often the case that large amount of
starchbypasstherumenincowsfedwithdietsrichincerealgrains(Zhaoetal
1998)Thisbypassedstarchneedstobedigestedinthesmallintestinetoavoid
highlevelsofglucoseconcentrationinthelargeintestine
WedetectedcalpaincomplexandcalpainN3(CAPN3)genesunderpositiverecent
selection in the dairy group as reported also by Utsunomiya et al (2013)
AltoughCalpainisknowntobeinvolvedinpostmortemmeattendernizationitis
also related to dairy metabolism as the muscle breakdown promoted by the
calpain gene provides an energy source for milk production especially at the
beginningoflactation(Kuhlaetal2011)InadditionasreportedbyWildeetal
(1997) calpainNcalpastatin system is related to programmed cell death of
alveolar secretory epithelial cells during lactation Zap70 gene is a cytoplasmic
proteintyrosinekinaseItisrelatedtotheimmunesystemasacomponentofthe
Tcell receptor playing a central role in Tcell responses (Wang et al 2010)
Bonnefontet al (2011) found thatZap70 genewasup regulated (FoldChange
(FC)=82FDR=277E12)onmilksomaticcellsofsheepinfectedbySaureus
andSepidermidisshowinganassociationwithmastitisresistance
Purinemetabolismwasthemostsignificantcanonicalpathwayindairybreeds
In a study conducted in human mammary epthelium Maningat et al (2008)
conductedageneexpressionanalysisonbreastmilkfatglobulesidentifyingthe
PurinemetabolismasthemostsignificantpathwaySynthesisandbreakdownof
purine is essential in the metabolism of the tissues of many organisms
particularlyofthemammaryglandduringlactation
Another interesting canonical pathway was endothelin signaling (Figure 5)
endothelin functions as vasocontrictor and is secreted by endothelial cells
35
(Nussdorferetal1999)Acostaetal(1999)reportedthatincattleendothelinsareinvolvedinthefollicularproductionofprostaglandinsandtheregulationofsteroidogenesis in the mature follicle In a recent study Puglisi et al (2013)confirmed the implication of endothelin in particular EDNRA and endothelinconvertin enzyme 1 in reproductive disorder in bovine They classified theEDNRAaspotentialbiomarkerforfertilityincow
Beefbreeds
Inbeefcattlebreedsthefocaladhesionkinase(FAK)togetherwithintegrinandERK12genesplayacentralroleashubsinthemostsignificantnetwork(Figure3)Theyareinvolvedincellularadhesionandcellmotilityprocessesparticularlyduringskeletalmyogenesis(Nguyenetal2014)Moreovertheoverexpressionof FAK can determine a reduced apoptosis and an increase risk of metastaticcancers(MehlenandPuisieux2006)Thesehubsarenotdirectlyinvolvedinourstudybut sharemany interactors thatare located in region found tobeunderselction such as CSPG4 RB1CC1 MGAT3 CIRPB CSPG4 gene belongs tochondroitin sulfate proteoglycans (CSPGs) family CSPGs are proteoglycansconsistingofaproteincoreandachondroitinsulfatesidechainTheyareknownto be structural components of a variety of tissues includingmuscle andplaykey roles inneuraldevelopmentandglial scar formationTheyare involved incellular processes such as cell adhesion cell growth receptor binding cellmigrationandinteractionswithotherextracellularmatrixconstituents
Manystudiesreporttheroleofproteoglycansinmeattextureofseveralmusclesfromthebovinespecies(Nishimuraetal1996)Dubostetal(2013)highlightadirect involvement of proteoglycans with respect to cooked meat juicinessRB1CC1geneplaysacrucialroleinmusculardifferentiationanditsactivationisa essential for myogenic differentiation (Watanabe et al 2005 p 1)Monoacylglycerol acyltransferase (MGAT3) gene cause the synthesis ofdiacylglycerol (DAG)using2monoacylglyceroland fattyacyl coenzymeAThisenzymaticreactionis fundamental fortheabsorptionofdietaryfat inthesmallintestineSunetal(2012p3)reportedinastudyonfiveChinesecattlebreedsthatMGAT3gene isassociatedwithgrowthtraitsCIRPBgenemaybepartofacompensatorymechanism inmuscles undergoing atrophy It preservesmuscle
36
tissuemassduringcoldshockresponsesaginganddisuse(DupontVersteegden
et al 2008) SNUPN is an imprinted gene and is expressed monoallelically
dependingonitsparentaloriginSNUPNplayimportantrolesinembryosurvival
andpostnatal growth regulation (Smithet al 2012Wanget al 2012)Ephrin
receptorsignalingwasthetopcanonicalpathwayproducebyIPAandhasalsoan
interestingbiologicalinterpretationformeatproduction(Figure6)Indeedthis
pathaway is important for the muscle tissue growth and regeneration
participating in correct positioning and formation of the neural muscular
junction(LiandJohnson2013)
Conclusions
In this study we analysed the selection signatures at genomewide level in 5
Italian cattle breeds Then we used a multibreed approach to identify the
genomicregionssharedamongcattlebreedsselectedfordairyorbeefpurpose
Thisapproachincreasedthepowertopinpointregionsofthegenomethatplay
important roles in economically relevant traits in both dairy and beef cattle
Moreover network and pathways analyses were used to display a detailed
functional characterization of genes mapping in the regions detected under
recentpositiveselection
Particularly in dairy cattle genes under directional selection are related to
feeding adaptation (increasing level of starch in the diet) mammary gland
metabolismandresistencetomastitisThebiologicalfeaturesunderselectionin
beef cattle breeds are involved in animal growth meat texture and juiceness
These results have a logical biological explanation However the frequent
approach of interpreting selection signatures on the basis of the function of
geneslocatednearbythepeaksignalofthestatisticusedispresentlychallenged
Novelinformationinhumanssuggeststhatmostselectedvariationisnotlocated
within genes and coding regions but in regulatory sites identified by the
ENCODEproject(Enardetal2014)Thesemaycontroltheexpressionofentire
genomicregionsorofgeneslocatedatarelavantdistancefromtheselectedsite
makingbiologicalinterpretationmorecomplex
37
InanycasefurtherstudiesusingdenserSNPchipsorwholegenomesequencing
producinginformationnotsubjectedtoascertainmentbias(Qanbarietal2014)
may increase the resolution of our analysis and togetherwith new knowledge
arisingonthecontrolofgeneexpressionvalidateorcorrectourresults
Acknowledgements
ANAFI (Italian Association of Holstein Friesian Breeders) ANAPRI (Italian
Association Simmental Breeders) ANABORAPI (National Association of
PiedmonteseBreeders)ANABIC(AssociationofItalianBeefCattleBreeders)and
ANARB (Italian Brown Cattle Breeders Association) are acknowledged for
providing the biological samples and the necessary information for this work
The Authors are also grateful to SelMol project (MiPAAF Ministero delle
PoliticheAgricoleAlimentarieFortestali)andbytheProZooproject(Regione
LombardiandashDGAgricoltura Fondazione Cariplo Fondazione Banca Popolare di
Lodi)forfunding
Thefundershadnoroleinstudydesigndatacollectionandanalysisdecisionto
publishorpreparationofthemanuscript
References
Acosta TJ Berisha B Ozawa T Sato K Schams D Miyamoto A 1999
Evidence for a Local EndothelinAngiotensinAtrial Natriuretic Peptide
SysteminBovineMature Follicles In Vitro Effects on SteroidHormones
and Prostaglandin Secretion Biol Reprod 61 1419ndash1425
doi101095biolreprod6161419
AjmoneMarsan P Garcia JF Lenstra JA 2010 On the origin of cattle how
aurochs became cattle and colonized theworld Evol Anthropol Issues
NewsRev19148ndash157
38
BonnefontCToufeerMCaubetCFoulonETascaCAurelMRBergonier
D Boullier S RobertGranieacute C Foucras G 2011 Transcriptomic
analysisofmilksomaticcells inmastitisresistantandsusceptiblesheep
upon challenge with Staphylococcus epidermidis and Staphylococcus
aureusBMCGenomics12208
BurantCFTakedaJBrotLarocheEBellGIDavidsonNO1992Fructose
transporter inhumanspermatozoaandsmall intestine isGLUT5 JBiol
Chem26714523ndash14526
DubostAMicolDPicardBLethiasCAnduezaDBauchartDListratA
2013Structuralandbiochemicalcharacteristicsofbovineintramuscular
connective tissue and beef quality Meat Sci 95 555ndash561
doi101016jmeatsci201305040
DupontVersteegden EE Nagarajan R Beggs ML Bearden ED Simpson
PMPetersonCA2008IdentificationofcoldshockproteinRBM3asa
possible regulator of skeletal muscle size through expression profiling
AJP Regul Integr Comp Physiol 295 R1263ndashR1273
doi101152ajpregu904552008
Enard D Messer PW Petrov DA 2014 Genomewide signals of positive
selection in human evolution Genome Res gr164822113
doi101101gr164822113
EzuraYKajitaMIshidaRYoshidaSYoshidaHSuzukiTHosoiTInoue
SShirakiMOrimoHEmiM2004Associationofmultiplenucleotide
variationsinthepituitaryglutaminylcyclasegene(QPCT)withlowradial
BMDinadultwomenJBoneMinRes191296ndash301
Fan H Wu Y Qi X Zhang J Li J Gao X Zhang L Li J Gao H 2014
Genomewide detection of selective signatures in Simmental cattle J
ApplGenet1ndash9doi101007s1335301402006
HayesBJChamberlainAJMaceachernSSavinKMcPartlanHMacLeodI
SethuramanLGoddardME2009Agenomemapofdivergentartificial
selectionbetweenBostaurusdairycattleandBostaurusbeefcattleAnim
Genet40176ndash184doi101111j13652052200801815x
Hayes BJ Lien S Nilsen H Olsen HG Berg P Maceachern S Potter S
Meuwissen THE 2008 The origin of selection signatures on bovine
39
chromosome 6 Anim Genet 39 105ndash111 doi101111j1365
2052200701683x
IqbalSZebeliQMazzolariABertoniGDunnSMYangWZAmetajBN
2009 Feeding barley grain steeped in lactic acid modulates rumen
fermentation patterns and increases milk fat content in dairy cows J
DairySci926023ndash6032doi103168jds20092380
KemperKESaxtonSJBolormaaSHayesBJGoddardME2014Selection
for complex traits leaves littleornoclassic signaturesof selectionBMC
Genomics15246doi1011861471216415246
KimuraM1984Theneutraltheoryofmolecularevolution
KuhlaBNuumlrnbergGAlbrechtDGoumlrsSHammonHMMetgesCC2011InvolvementofSkeletalMuscleProteinGlycogenandFatMetabolismin
the Adaptation on Early Lactation of Dairy Cows J Proteome Res 10
4252ndash4262doi101021pr200425h
Lenstra JAGroeneveld LF EdingHKantanen JWilliams JL Taberlet P
NicolazziELSolknerJSimianerHCianiEGarciaJFBrufordMW
AjmoneMarsan P Weigend S 2012 Molecular tools and analytical
approaches for the characterization of farm animal genetic diversity
AnimGenet43483ndash502
Li J Johnson SE 2013 EphrinA5 promotes bovine muscle progenitor cell
migration before mitotic activation J Anim Sci 91 1086ndash1093
doi102527jas20125728
Ma YL Zhang Q Ding XD 2012 [Detecting selection signatures on X
chromosome in pig through high density SNPs] Yi Chuan Hered
ZhongguoYiChuanXueHuiBianJi341251ndash1260
Madani S Hichami A CharkaouiMalki M Khan NA 2004 Diacylglycerols
Containing Omega 3 and Omega 6 Fatty Acids Bind to RasGRP and
Modulate MAP Kinase Activation J Biol Chem 279 1176ndash1183
doi101074jbcM306252200
Maningat PD Sen P Rijnkels M Sunehag AL Hadsell DL Bray M
Haymond MW 2008 Gene expression in the human mammary
epitheliumduring lactation themilk fat globule transcriptome Physiol
Genomics3712ndash22doi101152physiolgenomics903412008
40
Mehlen P Puisieux A 2006Metastasis a question of life or deathNat Rev
Cancer6449ndash458doi101038nrc1886
NarayananUOspinaJKFreyMRHebertMDMateraAG2002SMNthe
Spinal Muscular Atrophy Protein Forms a PreImport Snrnp Complex
withSnurportin1andImportin HumMolGenet111785ndash1795
Nguyen N Yi JS Park H Lee JS Ko YG 2014 Mitsugumin 53 (MG53)
LigaseUbiquitinatesFocalAdhesionKinaseduringSkeletalMyogenesisJ
BiolChem2893209ndash3216doi101074jbcM113525154
NielsenRHellmannIHubiszMBustamanteCClarkAG2007Recentand
ongoing selection in the human genome Nat Rev Genet 8 857ndash868
doi101038nrg2187
NishimuraTHattoriATakahashiK1996Relationshipbetweendegradation
of proteoglycans andweakening of the intramuscular connective tissue
duringpostmortemageingofbeefMeatSci42251ndash260
Nussdorfer GG Rossi GPMalendowicz LKMazzocchi G 1999Autocrine
ParacrineEndothelinSysteminthePhysiologyandPathologyofSteroid
SecretingTissuesPharmacolRev51403ndash438
OrsquoBrienAMPUtsunomiyaYTMeacuteszaacuterosGBickhartDM LiuGE Tassell
CPV Sonstegard TS Silva MVD Garcia JF Soumllkner J 2014
Assessing signatures of selection through variation in linkage
disequilibrium between taurine and indicine cattle Genet Sel Evol 46
19doi101186129796864619
Pan D Zhang S Jiang J Jiang L Zhang Q Liu J 2013 GenomeWide
DetectionofSelectiveSignatureinChineseHolsteinPLoSONE8e60440
doi101371journalpone0060440
PuglisiRCambuliCCapoferriRGianninoLLukajADuchiRLazzariG
Galli C Feligini M Galli A Bongioni G 2013 Differential gene
expressionincumulusoocytecomplexescollectedbyovumpickupfrom
repeat breeder and normally fertile Holstein Friesian heifers Anim
ReprodSci14126ndash33doi101016janireprosci201307003
Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender D
MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKa
41
tool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795
Qanbari S Gianola D Hayes B Schenkel F Miller S Moore S Thaller GSimianer H 2011 Application of site and haplotypefrequency basedapproachesfordetectingselectionsignaturesincattleBMCGenomics12318doi1011861471216412318
Qanbari S Pausch H Jansen S Somel M Strom TM Fries R Nielsen RSimianer H 2014 Classic Selective Sweeps Revealed by MassiveSequencinginCattlePLoSGenet10doi101371journalpgen1004148
Qanbari S Pimentel ECG Tetens J Thaller G Lichtner P Sharifi ARSimianerH2010AgenomewidescanforsignaturesofrecentselectioninHolsteincattleAnimGenetdoi101111j13652052200902016x
Sabeti PC Reich DE Higgins JM Levine HZ Richter DJ Schaffner SFGabriel SB Platko JV Patterson NJ McDonald GJ Ackerman HCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash7
ScheetPStephensM2006Afastandflexiblestatisticalmodelforlargescalepopulation genotype data applications to inferring missing genotypesandhaplotypicphaseAmJHumGenet78629ndash44
Shahzad K Loor JJ 2012 Application of topdown and bottomup systemsapproaches in ruminantphysiologyandmetabolismCurrGenomics13379
SmithLSuzuki JGoffAFilionFTherrien JMurphyBKohanGhadrHLefebvreRBrisvilleABuczinskiSFecteauGPerecinFMeirellesF2012DevelopmentalandEpigeneticAnomaliesinClonedCattleReprodDomestAnim47107ndash114doi101111j14390531201202063x
Sun G Cheng SYS Chen M Lim CJ Pallen CJ 2012 Protein TyrosinePhosphatase Phosphotyrosyl789 Binds BCAR3 To Position Cas forActivationatIntegrinMediatedFocalAdhesionsMolCellBiol323776ndash3789doi101128MCB0021412
Sun J Zhang C Lan X Lei C ChenH 2012 Exploring polymorphisms andassociationsofthebovineMOGAT3genewithgrowthtraitsGenomeNatl
42
Res Counc Can Geacutenome Cons Natl Rech Can 55 56ndash62doi101139g11077
TaberletPCoissacEPansu J PompanonF 2011Conservationgeneticsofcattle sheep and goats C R Biol On the trail of domesticationsmigrationsandinvasionsinagricultureDominiqueJobGeorgesPelletierJeanClaudePernollet334247ndash254doi101016jcrvi201012007
Utsunomiya YT Perez OrsquoBrien AM Sonstegard TS Van Tassell CP doCarmo AS Meszaros G Solkner J Garcia JF 2013 Detecting lociunder recent positive selection in dairy and beef cattle by combiningdifferentgenomewidescanmethodsPLoSOne8e64280
Voight BF Kudaravalli S Wen X Pritchard JK 2006 A Map of RecentPositive Selection in the Human Genome PLoS Biol 4 e72doi101371journalpbio0040072
WangHKadlecekTAAuYeungBBGoodfellowHESHsuLYFreedmanTSWeissA2010ZAP70AnEssentialKinaseinTcellSignalingColdSpringHarbPerspectBiol2doi101101cshperspecta002279
WangMZhangXKangLJiangCJiangY2012MolecularcharacterizationofporcineNECD SNRPNandUBE3Agenesand imprinting status in theskeletal muscle of neonate pigs Mol Biol Rep 39 9415ndash9422doi101007s1103301218066
WatanabeRChanoTInoueHIsonoTKoiwaiOOkabeH2005Rb1cc1iscritical for myoblast differentiation through Rb1 regulation VirchowsArchIntJPathol447643ndash648doi101007s0042800411831
WickramasingheNSManavalanTTDoughertySMRiggsKALiYKlingeCM 2009 Estradiol downregulates miR21 expression and increasesmiR21targetgeneexpressioninMCF7breastcancercellsNucleicAcidsRes372584ndash2595doi101093nargkp117
WildeCJAddeyCVLiPFernigDG1997Programmedcelldeathinbovinemammary tissue during lactation and involution Exp Physiol 82 943ndash953
YasudaSStevensRLTeradaTTakedaMHashimotoTFukaeJHoritaTKataoka H Atsumi T Koike T 2007 Defective Expression of Ras
43
Guanyl NucleotideReleasing Protein 1 in a Subset of Patients with
SystemicLupusErythematosusJImmunol1794890ndash4900
ZhangCBaileyDKAwadTLiuGXingGCaoMValmeekamVRetiefJ
Matsuzaki H Taub M Seielstad M Kennedy GC 2006 A whole
genome longrange haplotype (WGLRH) test for detecting imprints of
positiveselectioninhumanpopulationsBioinformatics222122ndash8
ZhaoFQOkineEKCheesemanCIShiraziBeecheySPKennellyJJ1998
Glucose transporter gene expression in lactating bovine gastrointestinal
tractJAnimSci762921ndash2929
44
Supplementaryfiles
YoucanfindChapterʹsupplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter02)orscantheQRcode
SupplementaryfileS1ndashSignificantcorehaplotypesandgenessharedin
dairyandbeef
Significantcorehaplotypes(pvaluele005haplotypefrequencyge025)shared
indairyandbeefcattlebreedsandrelativegenesintersectingwiththemThefile
isdividedinto4tablesThefirst2sheets(sheet1and2)shownthesignificant
corehaplotypesfordairyandbeefThelast2sheets(sheet3and4)showngenes
intersectingwiththesignificantcorehaplotypesfordairyandbeef
SupplementaryfileS2ndashNetworksrankingindairybreeds
Networksrankingindairybreedswithrelativemoleculessymbolslistscore
valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop
functionforeachnetwork
SupplementaryfileS3ndashNetworksrankinginbeefbreeds
Networksrankinginbeefbreedswithrelativemoleculessymbolslistscore
valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop
functionforeachnetwork
45
SupplementaryfileS4ndashCanonicalpathwaysrankingindairyorbeefcattle
breeds
Canonicalpathwaysrankingindairy(sheet1)orbeef(sheet2)cattlebreedswithrelativemoleculessymbolslistratioandndashlog10ofthepvaluesforeachcanonicalpathway
46
CHAPTER3Searchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisin
threeItaliancattlebreedsSCapomaccio1MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItalyTheseauthorscontributedequallytothiswork
____________________SubmittedtoAnimalGenetics
Abstract
Genome wide association studies (GWAS) have been widely applied to
disentanglethegeneticbasisofcomplextraits IncattlebreedsclassicalGWAS
approach with medium density markers panels are far to be conclusive
especiallyforcomplextraitsThisisduetotheintrinsiclimitationsofGWASand
to the choices that are made to step from the associationrsquos signals to the
functionalunits
Here we applied a geneHbased association strategy to prioritize genotypeH
phenotype associations found with classical approaches in three Italian dairy
cattlebreedswithdifferentsamplesizes(ItalianHolsteinn=2058ItalianBrown
n=745ItalianSimmentaln=477)formilkproductionandqualitytraits
WhileclassicalsinglemarkerregressionfailedtorevealgenomeHwidesignificant
genotypeHphenotype association except for Italian Holstein the gene based
approach found specific genes in each breed that are associated with milk
physiologyandmammaryglanddevelopment
Since no established method is yet implemented to step from variation to
functional units our strategy may contribute to reveal new genes that play
significant roles in complex traits like thosehere investigated condensing low
signalsofassociationinageneHcentricfashionapproach
Background
Modern cattle breeds have nomore than 200 years of history (Taberlet et al
2008) a time spanwhere selection reproductive isolationand the consequent
geneticdriftshapedallelefrequenciesandlinkagedisequilibriumthroughoutthe
genome This is particularly true for industrial dairy breeds where the joint
application of reproductive technologies and modern genetic evaluation
methodshave imposedahigh selectionpressureona limitednumberof traits
(Taberletetal2008)involvedinmilkquantityandquality
Indeed the milk production industry has been historically ready to acquire
technicalimprovementsfromdifferentfieldsstatisticsbiologyandengineering
50
ForexampleBLUPbasedstrategiesandhighHdensitygenotyping throughSNPHchipsarenowroutinelyincludedinbullevaluationshavingsignificantimpactonselectionefficiencyandpopulationstructureThese techniques have dramatically improved milk yield and compositionwithout knowing the biological basis of the traits under selection To datedespite many efforts and many QTL mapped only a few genes have beeneffectivelyassociatedwithmilkyieldandcomposition(Grisartetal2002Blottetal2003CohenHZinderetal2005)In thiscontextGenomeWideAssociationStudies(GWAS) is themostcommonmethodfordissectingthebiologythatunderliescomplextraits(McCarthyetal2008)TheunderlyingideaofGWASissimplefindmarkerHtraitassociationsexploitingthe linkage disequilibrium (LD) that exists between the causative mutation ndashwhichweignorendashandoneormoreanalyzedmarkerswhichwewellknowInthiswayonecanpinpointgenomicregionscarryingcausalvariantsforanytrait
Although a number of GWAS have been conducted in cattle productive andmorphological traits (some examples (Jiang et al 2010 Cole et al 2011Meredith et al 2012)) and found several genomic regions associated to thesetraits none focused their attention directly on genes involved inmilk biologyMoreoveroneof themajordrawbacksofGWASisrelatedtothescarceresultsportability across populations due to a variety of confounding factorspopulationstructuredifferentialLD levelsbreedspecific selection targetsandeven SNPs ascertainment bias (Clark et al 2005) For instance significantassociationsonfatpercentagewereidentifiedinmanyregionsoftheBostaurusgenome in different breeds Except for genes of major effect there is littleagreement on which regions and H more importantly H genes are actuallyinvolvedinthecontrolofthetraitinvestigatedThisissomehownotsurprisingone favorable allele may segregate in one breed and be fixated in a differentbreed the same allele segregates in both breeds but allelesmay differ or thesegregates in both breeds but the genetic background masks the effect ofsegregation
51
AnotherlimitationinGWASisthestringentsignificancethresholdoftenapplied
tocorrectformultipletestingAsaresulta largeproportionofgeneswithlow
effectaredisregardedwithconsequentoverestimationof thehigheffectgenes
variance(Xu2003)Thisbiasisparticularlyworryingwheninvestigatingtraits
controlledbya largenumberofgeneswithsmalleffectand inhighLDasmilk
yield(GIANOLAETAL2013)
To overcome this kind of limitation different strategies have been invented
usingprior knowledge Interesting ldquocandidatepathwayrdquo approacheshavebeen
conducted(Ravenetal2013)testingforassociationbetweenaphenotypeand
genes inpathwaysthatare likelytobe involvedinthecontrolofthetraitThis
approach reduces thenumberofmarkers tobe tested increases the statistical
poweroftheexperimentrestrictingtheanalysistoexistingknowledge
Other methods based on statistical manipulations or particular experimental
designs have been developed to prioritize or validate genotypeHphenotype
associations For example geneHbased association strategies while restricting
GWA study only to genes and neighboring genomic regionsmay identify new
candidategenesdeservingpostHGWASanalysis(Akulaetal2011Cantoretal
2010 Liu et al 2010) Moreover it is true that classical GWAS downstream
bioinformatics analyses concentrate their efforts in findingmeaningful results
usinggenesasfinaltarget
The aim of ourworkwas to find new candidate genes and novel information
associated to complex traits as milk production (yiled) and quality (fat and
protein percentage) using a geneHcentric genome wide association in three
differentdairybreedsItalianHolsteinItalianBrownandItalianSimmental
Methods
BiologicalSamples
Bulls from the three most represented dairy breeds in Italy (Italian Holstein
ItalianBrownandItalianSimmental)wereselectedfromArtificialInsemination
(AI) bulls progeny tested in Italy until 2010 and forwhich biological samples
52
wereavailableItalianHolsteinbullswerechosenaccordingtofollowingcriteria
i) having the national Selection Index namely production functionality type
(PFT)withreliabilitygt75andii)havingthelowestpossiblerelationshipwith
theothersiresinthesetThiswasachievedbymaximizingthevariabilitywithin
the group On the contrary considering the low number of available Italian
BrownandItalianSimmentalbullstheywereallincludedintheanalysesAfter
thisprocedureatotalof2139ItalianHolstein775ItalianBrownand486Italian
Simmental samples (including quality control replicates)were genotypedwith
theBovineSNP50BeadChipv1(IlluminaSanDiegoCA)
Traitsconsideredintheanalysis
Italian Holstein (ANAFI) Italian Brown (ANARB) and Italian Simmental
(ANAPRI)breedersassociationsprovidedderegressedproofs(DRP)ordaughter
yielddeviations (DYD)hereafter referredasphenotype forall traitsanalyzed
Traitsanalyzedweremilkyieldfatpercentageandproteinpercentage
Genotypingandqualitycontrol
Quality controls (QC) were run independently for each breed using the same
pipelineandthresholds IndetailQCexcluded i)thereplicatesamplewiththe
highernumberofmissinggenotypesii)sampleswithunexpectedlyhigh(ge100)
mendelianerrors(forfatherHsonpairs)iii)sampleswithmorethan5missing
genotypes iv) SNPs with ge 25 missing data v) SNPs with minor allele
frequency(MAF)le5vi)SNPswithmendelianerrorsge25andvii)SNPsout
ofHardyHWeinbergequilibriumtestedbyFisherexacttest(ple0001Bonferroni
corrected)
Analysisofpopulationstructure
MultiDimensionScaling(MDS)wasperformedusingPLINK107(Purcelletal
2007)andresultsplottedwithRscriptsThisanalysisshowninsupplementary
material (FigureS1) exploredpopulationsubstructureandverified thegenetic
homogeneityofthedatasetpriortofurtheranalyses
53
PairwiseLDwascalculatedusingPLINK107(Purcelletal2007)LDdecayand
LD structure on specific regions were calculated with in house R scripts and
snpplotterRpackage(LunaandNicodemus2007)
GWAS
GenomeHwide association analysis was made with the GenABEL package in R
(Team 2014) using GRAMMARHCG (Genome wide Association using Mixed
Model and Regression H Genomic Control) approach (Amin et al 2007
Aulchenkoetal2007)
Basingonpriorknowledge(Minozzietal2013) in ItalianHolsteintheDGAT1
regionwas also treated as fixed effect inGenABEL analysis of the three traits
ResultsusingthismodelarereferredtoasldquoHolsteinDGATrdquo
Manhattan andquantileHquantileplots for each traitwereproduced to allowa
thoroughevaluationoftheGWASresultsAllcalculatedpHvaluesfromtheGWAS
analysiswere retained for the downstream prioritizationwith the geneHbased
association
Workingongenomecoordinates
Coordinatesof themarkerswere liftedover fromBTA40(Elsiketal2009)to
UMD31assembly(Ziminetal2009)usingSnpChimpv1(Nicolazzietal2014)
andgenecoordinateswereretrievedusingBioMartEnsembl
QTLcoordinatesweredownloadedfromtheAnimalQTLdb(Huetal2013) in
bedformat
Operations on genomic intervalswereperformedusing thebedtools suite ver
2162(QuinlanandHall2010)
GeneBasedAssociationanalysis
Toreducetheantagonismbetweensignificancethresholdandfalsepositiverate
of a classical GWA study (Gianola et al 2013) and to facilitate selection
signature comparison across the three breeds investigated we used a geneH
54
centricstatisticthatcollapsesinasinglevalueallthepHvaluesfromSNPsrelated
toagenecorrectingforLDinformation
Our approachderivesdirectly from the softwareVEGAS (VersatileGeneHBased
TestforGenomeHwideAssociationStudies)(Liuetal2010)adaptedtopursue
meaningfulresultsinanyspeciesbeyondhumans
Briefly the geneHwise test statistic is the sum of the n upper tail chiHsquared
valuesofaSNPsubset withonedegreeof freedom(wheren is thenumberof
SNPmappinginthegivengene)WithperfectlinkageequilibriumthegeneHwise
pHvaluewouldbetheoneHsidepHvalueofachiHsquaredistributionwithndegrees
of freedom Otherwise more likely the distribution of the pHvalues has to be
evaluated(withsimulations)andweighted(withLDvalues)(Liuetal2010)
The VEGAS speciesHfree procedure was implemented in a UNIX environment
usingBashandRlanguagesusingEnsemblannotationasinputfilesPLINKdata
typeandpHvaluesforeachSNPfromthepreviouslycitedGRAMMARprocedure
WemappedSNPsonBostaurusgenes(UMD31)andtheirboundaries(100000
bpupHanddownHstream)toaccountforregulatoryregionsOnlygeneswithat
least 2 SNPs mapped were retained Significant entries from the geneHbased
associationweremappedinthecattleQTLdatabasewithbedtools
ResultsandDiscussion
In this study a gene based association strategy was applied to identify new
genomicregionsboostingGWASanalysisresultsformilkproductionandquality
fromthreeItaliandairybreeds
Complextraitslikethosehereevaluatedareaffectedbyafewmajorgeneswith
large effects andmany otherswithmoderate to low effects the latter are not
easily identifiedby genomewide scans inmodern cattle breedsduemainly to
samplesizelimitationwhilesignalfromtheformersarelostingenomescansH
withsomeexceptionsHduerapidfixations
InthisscenariowithmediumdensitySNPpanelsliketheoneusedinthisstudy
associationbasedonsingleSNPregressionorotherldquoclassicalrdquomethodscouldbe
inconclusiveintermsofresults
55
AftertheQCprocess7452058and477samplesand3575938535and38574SNPs were retained for Italian Brown Italian Holstein and Italian Simmentalbulls respectively (Table S1) HolsteinDGAT analysis retained 38534markersoneSNPlessthantheotherHolsteinpanelDescriptive statistics on genotypes for each breed (histogram and barplot forheterozygosis and call rate per individual Figure S7 and multi dimensionalscalingFigureS1)revealedabsenceofconfoundingeffectsWiththesedatasetsatthenominalgenomeHwidepHvaluesignificancethresholdof 005 (ple125 x 10H6 after Bonferroni correction (Burton et al 2007Dudbridge and Gusnanto 2008)) significant associations were found only inItalianHolsteinbetweenmilkyieldandfatpercentageand26SNPsmappingintheproximalregionofBTA14nearbyDGAT1(Table1)
56
Table1PHvaluesoftheSNPsovercomingthegenomewidesignificancethresholdintheHolsteinbreedformilkyieldandfatpercentage(notaccountingforDGAT1effect)SNPsaresortedbypositionsintheUMD31assembly
SNPID BTA Position(Bp) POvalue(Fat) POValue(Milk)
Hapmap30383OBTCO005848 14 1489496 112EH12 361EH12
ARSOBFGLONGSO57820 14 1651311 247EH46 161EH16
ARSOBFGLONGSO34135 14 1675278 857EH17
ARSOBFGLONGSO94706 14 1696470 665EH16
ARSOBFGLONGSO4939 14 1801116 534EH49 106EH20
Hapmap52798Oss46526455 14 1923292 915EH21
UAOIFASAO6878 14 2002873 382EH24
ARSOBFGLONGSO107379 14 2054457 118EH33 664EH15
ARSOBFGLONGSO18365 14 2117455 250EH16
Hapmap30922OBTCO002021 14 2138926 439EH13
Hapmap25384OBTCO001997 14 2217163 390EH10 697EH09
Hapmap24715OBTCO001973 14 2239085 108EH09 369EH09
BTAO35941OnoOrs 14 2276443 402EH13
Hapmap30374OBTCO002159 14 2468020 528EH12
Hapmap30086OBTCO002066 14 2524432 798EH11
ARSOBFGLONGSO74378 14 3640788 161EH08
ARSOBFGLOBACO1511 14 3765019 375EH09
UAOIFASAO9288 14 3956956 811EH09
ARSOBFGLONGSO100480 14 4364952 133EH09
UAOIFASAO5306 14 4468478 458EH08
ThisislikelyduetotheeffectoftheK232ADGAT1mutationthatissegregatingin
thisbreed(Fontanesietal2014)NootherSNPwassignificantlyassociatedto
anyothertraitinthe3breedsinvestigated
SomeSNPswereunderasuggestivesignificancethresholdofple5x10H5Details
on these markers are in supplementary material (Table S3) but they are not
discussedherePlotsdescribingatglancetheassociationresultsrevealingalow
signaltonoiseratioareavailableintheonlinesupplementarymaterial(Figure
S2S3S4andS5respectively for ItalianBrown ItalianHolsteinHolsteinDGAT
andItalianSimmental)
57
Asperthegenecentricanalysiswemappedatotalof192371989919894and
19844 genes on 24616 available on Ensembl (release version 73) in Italian
BrownItalianHolsteinHolsteinDGATandItalianSimmentalrespectivelyGenes
with a geneHwisepHvalue at least 10H5were considered significantly associated
withthestudiedtraitThesignificanceappearedtobeadequatetothehighrate
ofredundancyof theannotationfeaturesandto figureoutthetopassociations
with a reasonable level of confidence Results on these associations per breed
andpertraitaresummarizedinTable2Table3andinsupplementarymaterials
(TableS2andTableS4)ThemajorityofsignificantgenesfoundwiththegeneH
centricanalysismapped inpreviouslycharacterizedQTL thatareassociated to
milkproductionorqualityThesearesummarizedintheTableS5
Table2Numberofsignificantlyassociatedgenesforeachbreedineachtraitinthegenebasedassociationtest
Fat Milkyield Protein
Brown 3 1
Holstein 92 80 1
HolsteinDGAT 1 4 1
Simmental 1 13
58
Tableamp3FeaturessignificantlyassociatedperbreedtraitHolsteinanalysisnotaccountingforDGAT1isshow
ninTableS3
Breedamp
Traitamp
EnsemblampIDamp
GeneWiseampPValueamp
SNPsInGeneamp
BestSNPamp
BestSNP_PValueamp
BTAamp
Startamp(bp)amp
Endamp(bp)amp
GeneampNam
eamp
BROWNamp
FAT
Nosignificantassociationfound
MILK
ENSBTAG00000019237
500EG05
4Hapmap53770Gss46526325
204EG04
23
47333403
47348212
TXNDC5
ENSBTAG00000043918
500EG05
4Hapmap53770Gss46526325
204EG04
23
47337650
47337787
5S_rRNA
ENSBTAG00000046839
500EG05
4Hapmap53770Gss46526325
204EG04
23
47328131
47352621
UncharacterizedProtein
PROTEIN
ENSBTAG00000001260
700EG05
3ARSGBFGLGNGSG116222
441EG03
18
52120387
52124418
PINLYP
amp
HOLSTEINDGATamp
FAT
ENSBTAG00000000369
100EG05
3Hapmap53294Grs29016908
333EG05
594673755
94749093
EPS8
MILK
ENSBTAG00000026896
130EG05
5ARSGBFGLGNGSG104949
385EG04
23
51549721
51553563
FOXF2
ENSBTAG00000005260
400EG05
5Hapmap33288GBTCG033751
120EG03
638120578
38127577
SPP1
ENSBTAG00000006579
500EG05
4ARSGBFGLGBACG2207
632EG05
15
54418559
54459500
P4HA3
ENSBTAG00000007013
700EG05
6UAGIFASAG7665
396EG05
19
5221507
5283308
TOM1L1
PROTEIN
ENSBTAG00000006638
600EG05
3ARSGBFGLGNGSG104774
176EG04
18
56523439
56530499
BCL2L12
amp
SIMMENTHALamp
FAT
ENSBTAG00000037649
900EG05
6Hapmap30001GBTAG142716
529EG05
4120427915
120494453
VIPR2
MILK
ENSBTAG00000017538
100EG05
6ARSGBFGLGBACG13093
587EG06
12
85630428
85630912
Pseudogene
ENSBTAG00000000856
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1766767
1769754
FBXL6
ENSBTAG00000000857
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1763994
1766621
GPR172B
ENSBTAG00000026356
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1795351
1804562
DGAT1
ENSBTAG00000035158
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1770660
1772329
TMEM
249
ENSBTAG00000046208
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1782901
1788087
SCRT1
ENSBTAG00000009811
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1825982
1844587
UncharacterizedProtein
ENSBTAG00000009816
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1838135
1840081
UncharacterizedProtein
ENSBTAG00000014458
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1844664
1894424
MROH1
ENSBTAG00000020751
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1806081
1825793
HSF1
ENSBTAG00000044406
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1883849
1883971
SCARNA15
ENSBTAG00000046889
400EG05
5ARSGBFGLGBACG13093
587EG06
12
85610279
85610762
UncharacterizedProtein
ENSBTAG00000020117
800EG05
3BTAG78494GnoGrs
188EG04
721189879
21196263
ZBTB7A
PROTEIN
Nosignificantassociationfound
amp
59
1 ItalianHolstein
The$ large$ number$ of$ significant$ signals$ was$ observed$ in$ the$ Italian$ Holstein$
especially$ for$ milk$ yield$ and$ fat$ percentage$ As$ shown$ in$ Table$ 2$ the$ strong$
effect$ of$ DGAT1$ in$ Italian$ Holsteins$ observed$ with$ individual$ SNPs$ was$
confirmed$by$the$geneBcentric$approach$In$this$region$almost$the$totality$of$the$
geneBwise$pBvalues$under$the$chosen$threshold$appeared$to$be$influenced$by$this$
major$ gene$ and$ by$ the$ high$ level$ of$ linkage$ disequilibrium$ that$ characterizes$
cosmopolitan$bovine$breeds$(see$Figure$S6)$Within$this$pool$of$genes$it$is$worth$
to$mention$PLEC$ (plectin)$ a$ cell$ surface$ receptor$ gene$ in$ the$RNASE5$ pathway$
(Angiogenin$ encoded$ by$ angiogenin$ ribonuclease$ RNase$ A$ family$ 5)$ Proteins$
encoded$ by$ these$ genes$ that$ are$ present$ in$ milk$ are$ involved$ in$ protein$
synthesis$ and$ act$ as$ growth$ factors$ in$ vitro$Moreover$RNASE5$ seems$ to$ have$
other$ functions$ such$ as$ regulation$ of$ stress$ response$ and$ apoptosis$ and$ even$
angiogenesis$ through$ which$ it$ may$ also$ contribute$ to$ the$ regulate$ milk$
production$(Raven$et$al$2013)$
The$ exclusion$ of$DGAT1$ effect$ (HolsteinDGAT$ results)$ highlighted$ that$ all$ the$
significant$ effects$ on$ fat$ percentage$ found$ for$ genes$ located$ within$
approximately$25$Mb$up$or$downstream$from$the$excluded$SNP$(ARS5BFGL5NGS5
4939)$were$ redundant$ signals$ of$DGAT1$ (Table3$ and$Table$ S2$ and$ S3)$ This$ is$
also$observed$for$milk$yield$due$to$the$genetic$correlations$between$fat$and$milk$$
Also$ PLEC$ was$ not$ significant$ in$ our$ analyses$ when$ taking$ into$ account$ the$
DGAT1$effect$contradicting$previous$findings$in$Holstein$that$report$association$
of$PLEC$gene$with$production$traits$either$accounting$or$not$for$the$effect$of$the$
diglyceride$ acyltransferase$ (Raven$ et$ al$ 2013)$ A$ possible$ explanation$ for$ the$
different$results$between$these$studies$is$a$difference$between$genetic$structures$
of$the$populations$analyzed$To$reduce$false$positive$signals$we$only$evaluated$
the$genes$based$on$HolsteinDGAT$results$for$the$Italian$Holstein$$
11 Milkyield
We$found$genes$associated$with$this$trait$in$Holstein$in$particular$SPP1$P4HA3=
FOXF2$and$TOM1L1$genes$
Among$ these$ the$ osteopontin$ SPP1$ (secreted$ phosphoprotein$ 1)$ is$ directly$
related$to$milk$production$(de$Mello$et$al$2012)$ It$ is$ located$ in$BTA6$nearby$
60
$
ABCG2$ and$ there$ is$ evidence$ that$ polymorphisms$ in$ this$ region$ are$ strongly$
correlated$with$high$production$ levels$of$milk$ in$various$breeds$(CohenBZinder$
et$ al$ 2005)$Osteopontin$SPP1$ is$ a$ secreted$glycophosphoprotein$ that$plays$ a$
crucial$ role$ in$ mammary$ gland$ development$ having$ an$ essential$ role$ also$ in$
mammary$gland$differentiation$and$branching$of$the$mammary$epithelial$ductal$
system$ This$ gene$has$ also$ been$ implicated$ in$many$ types$ of$ cancer$ including$
breast$cancer$ for$ the$potentiality$of$proliferation$and$alveologenesis$ (Hubbard$
et$ al$ 2013)$ It$ is$worth$mentioning$ that$ the$most$ significant$ SNPs$within$ this$
gene$was$Hapmap332885BTC5033751=with=a$pBvalue$of$00012$ largely$over$ the$
genomeBwide$significance$ threshold$This$signal$would$be$ ignored$ in$ the$single$
SNP$approach$
Another$gene$revealed$by$the$geneBbased$association$is$P4HA3$encoding$for$the$
prolyl$4Bhydroxylase$α$polypeptide$III$involved$in$the$correct$folding$of$nascent$
procollagen$ chains$ It$ is$ known$ that$ stromalBepithelial$ interactions$ are$ critical$
for$mammary$gland$development$and$defective$deposition$or$reorganization$of$
extracellular$ matrix$ can$ lead$ to$ incorrect$ mammary$ epithelial$ morphogenesis$
(Gillette$ et$ al$ 2013)$ highlighting$ the$ role$ of$ this$ gene$ in$ the$ healthy$ breast$
tissue=
FOXF2=gene$ (Forkhead$Box$F2)$ is$ a$member$of$ the$ forkhead$box$ transcription$
factors$ known$ to$ be$ involved$ in$ tissue$ development$ as$ part$ of$ the$ ldquoHedgehog$
signaling$pathwayrdquo$This$key$cascade$is$essential$for$the$correct$segmentation$of$
tissues$during$embryonic$development$but$has$been$recently$demonstrated$that$
is$active$in$adult$stem$cells$proliferation$in$mammary$tissue$(Liu$et$al$2006)$In$
particular$ FOXF2$ plays$ a$ key$ role$ in$ extracellular$ matrix$ synthesis$ and$
epithelialBmesenchymal$ interactions$ and$ could$ therefore$ affect$ the$ proper$
development$of$the$mammary$stroma$In$addition$low$levels$of$transcription$of$
this$gene$are$correlated$ to$early$onset$metastasis$ in$breast$cancer$ (Kong$et$al$
2013)$ underlining$ a$ specific$ role$ in$ the$ mammary$ gland$
Lastly$ TOM1L1$ (Target$ Of$ Myb1$ ChickenBLike$ 1)$ is$ another$ gene$ potentially$
involved$ with$ the$ milk$ physiology$ as$ is$ known$ its$ implication$ with$ the$ early$
endosome$ sorting$ coupled$ with$ EGFR= (epidermal$ growth$ factor$ receptor)$
another$ important$player$ in$ lactation$(Fowler$et$al$1995)$TOM1L1$belongs$to$
61
$
the$ VHS$ domain$ containing$ proteins$ that$ are$ involved$ in$ the$ postBGolgi$
trafficking$and$signaling$(Wang$et$al$2010)$
12 Fatpercentage
In$ Italian$ Holstein$ breed$ we$ found$ a$ significant$ association$ with$ or$ without$
DGAT1$ effect$ correction$ to$ the$EPS8$ gene$ located$on$BTA5$at$around$95$Mbp$
This$gene$encodes$for$a$substrate$of$the$already$cited$EGFR$and$is$involved$in$the$
fatty$acid$metabolism$(Fazioli$et$al$1993)$A$recent$study$on$Holstein$Friesian$
(Wang$et$al$2012)$found$strong$association$with$a$variant$located$in$the$second$
intron$of$the$epidermal$growth$factor$receptor$pathway$substrate$enforcing$the$
hypothesis$of$a$true$involvement$of$this$gene$on$fat$percentage$
13 Proteinpercentage
The$BCL2L12$gene$(BclB2Blike$protein$12)$associated$with$protein$percentage$in$
Italian$Holstein$breed$represents$another$interesting$result$The$product$of$this$
gene= is$ an$ important$ oncoprotein$ in$ two$ fundamental$ cell$ cycle$ pathways$
namely$ p53$ and$ cytoplasmic$ caspase$ signaling$ BCL2L12= resides$ in$ both$
cytoplasm$and$nucleus$inhibiting$caspases$3$and$7$and$forming$a$complex$with$
p53$thereby$ inhibiting$p53Bdirected$transcriptomic$changes$upon$DNA$damage$
(Stegh$ and$DePinho$ 2011)$ The$members$ of$ BCL2$ family$ are$ considered$ antiB
apoptotic$genes$and$it$is$obvious$that$dysfunctional$programmed$cell$death$plays$
an$ important$ role$ in$ the$ pathogenesis$ and$ in$ the$ progression$ of$ cancer$Many$
members$of$this$family$were$found$to$be$modulated$in$various$malignancies$and$
BCL2L12= in$ particular$ was$ expressed$ in$ mammary$ gland$ and$ proposed$ as$
prognostic$ cancer$ biomarker$ (Thomadaki$ et$ al$ 2007)$ In$ dairy$ cows$ the$
members$of$BCL2$family$are$involved$in$the$key$events$ in$the$mammary$tissue$
development$ and$ remodeling$ during$ lactation$ and$ involution$ (Stefanon$ et$ al$
2002$p$200)$Members$of$BCL2$family$regulate$the$turnover$of$cell$population$
in$the$mammary$gland$during$lactation$and$its$overexpression$was$found$to$be$
linearly$related$to$milk$production$in$Holstein$cow$(Colitti$et$al$2004)$
$
62
$
2 ItalianSimmental
In$ the$ Italian$ Simmental$ breed$ we$ found$ one$ association$ on$ BTA4$ for$ fat$
percentage$(VIPR2)$and$two$associations$ for$milk$yield$one$ in$BTA7$(ZBTB7A)$
and$ the$ other$ in$ BTA14$ related$ to$ the$ DGAT1$ No$ association$ was$ found$ for$
protein$percentage$
21 Fatpercentage
Among$the$genes$found$significantly$associated$with$production$traits$in$Italian$
Simmental$VIPR2=a$ gene$ encoding$ a$ receptor$ for$ a$ small$ vasoactive$ intestinal$
neuropeptide$ was$ the$ single$ signal$ in$ BTA4$ for$ fat$ percentage$ Vasoactive$
intestinal$ peptide$ (VIP)$ is$ synthesized$ in$ the$pituitary$ gland$ and$ among$other$
functions$ stimulates$ prolactin$ secretion$ and$ is$ involved$ in$ proliferation$
survival$and$differentiation$in$human$breast$cancer$cells$(Valdehita$et$al$2010)$$
An$interesting$novel$finding$is$the$effect$of$immunosuppression$(through$direct$
effect$of$IL56)$on$prolactin$cells$and$its$ability$to$regulate$prolactin$by$acting$on$
pituitary$ VIP$ through$ VIP$ receptors$ (Blanco$ et$ al$ 2013)$ These$ evidences$
suggest$ that$VIPR2$ could$ be$ an$ important$ candidate$ for$ further$ investigations$
through$post$GWAS$approaches$such$as$reBsequencing$or$fine$mapping$
22 Milkyield
Surprisingly$the$DGAT1$gene$was$significantly$associated$with$milk$yield$but$not$
with$fat$percentage$in$Italian$Simmental$breed$Indeed$the$most$significant$SNP$
for$the$milk$yield$trait$in$this$breed$was$the$ARS5BFGL5NGS54939$with$a$pBvalue$
of$ 413EB06$ The$ importance$ of$ this$ gene$ in$ the$ lactation$ is$ largely$ known$
(Grisart$et$al$2002)$$
To$ investigate$DGAT1$ signal$ in$ Simmental$ we$ analyzed$ the$ fine$ LD$ structure$
genomeBwide$ and$ in$ BTA14$ proximal$ region$ in$ all$ the$ three$ breeds$ revealing$
different$patterns$
Observed$LD$decay$on$ the$ three$breeds$genomes$revealed$similar$behavior$ for$
Italian$Holstein$and$Italian$Brown$with$r2$gt$010$$until$ around$ 250$ Kbp$ while$
Italian$Simmental$showed$lower$LD$Distances$higher$that$400$Kbp$revealed$half$
level$of$LD$in$Simmental$(Figure$S6)$
63
$
In$DGAT1$region$LD$behavior$is$very$similar$between$Italian$Holstein$and$Italian$
Brown$ (Figure$ S8)$ while$ association$ values$ are$ really$ different$ because$ ARS5
BFGL5NGS54939=is$fixed$in$Italian$Brown$and$purged$by$QC$process$Interestingly$
association$ patterns$ become$ similar$ when$ DGAT1$ effect$ is$ removed$ from$ the$
Holstein$ panel$ (HolsteinDGAT$ Figure$ S8)$ In$ Simmental$ we$ found$ a$ different$
situation$ very$ low$ LD$ level$ and$ significant$ association$ levels$ with$ ARS5BFGL5
NGS54939$with$ the$milk$ trait$This$ is$probably$due$ to$ the$ low$ frequency$of$ the$
p232K$ allele$ in$ Italian$ Simmental$ (Scotti$ et$ al$ 2010)$ and$ to$ the$ different$
selection$strategies$applied$in$these$breeds$
The$ relatively$ high$ pBvalue$ of$ this$ SNP$ is$ responsible$ for$ the$ statistical$
significance$of$many$of$the$13$genes$found$significant$in$this$breed$for$this$trait$
as$ shown$ in$ Table$ 3$ therefore$ reducing$ the$ gene$ set$ to$ three$ locations$ two$
overlapping$features$indicating$a$pseudogene$in$BTA12$and$a$zinc$finger$protein$
ZBTB7A$in$BTA7$DGAT1$role$and$function$will$be$not$discussed$here$as$already$
done$elsewhere$(Grisart$et$al$2002)$
ZBTB7A=is$part$of$zinc$finger$and$BTB$proteins$(Broad$complex$Tramtrack$and$
Bricabrac)$ transcription$ factors$ with$ emerging$ importance$ for$ their$
involvement$ in$ oncogenesis$ and$ cell$ differentiation$by$ repressing$key$ genes$of$
cell$cycle$regulation$as$retinoblastoma$(Rb)$and=p53=(ldquoJeon$et$al$B$2008$B$ProtoB
oncogene$ FBIB1$ (PokemonZBTB7A)$ Represses$ Trpdfrdquo$ nd)= ZBTB7A$ over$
expression$ is$ shown$ in$primary$ and$ recurrent$ breast$ cancer$ tissues$ leading$ to$
disease$progression$and$poor$survival$(Zu$et$al$2011)$Link$between$this$gene$
polymorphism$and$milk$production$might$be$in$the$proper$cellular$development$
of$ the$ mammary$ gland$ This$ transcription$ factor$ is$ also$ known$ as$ a$ master$
regulator$ of$ B$ and$ TBcell$ lineage$ (Lee$ and$ Maeda$ 2012)$ enforcing$ the$ link$
between$ milk$ production$ and$ immune$ protection$ as$ already$ outlined$ by$ of$
Lemay$ and$ colleagues$ with$ proteomic$ studies$ where$ the$ ldquoactivation$ of$ the$
innate$ immune$systemrdquo$resulted$as$one$of$ the$most$represented$gene$ontology$
categories$(Lemay$et$al$2009)$
$
64
$
3 ItalianBrown
In$ the$ Italian$Brown$breed$we$ found$ one$ association$ on$BTA23$ (TXNDC5)$ for$
milk$yield$and$one$in$BTA18$for$protein$percentage$(PINLYP)$The$three$features$
listed$in$Table$3$for$milk$yield$in$Italian$Brown$were$collapsed$due$to$overlap$$
31 Milkyield
Thioredoxin$ domainBcontaining$ protein$ 5$ (TXNDC5)$ belongs$ to$ thioredoxin$
family$small$ubiquitous$proteins$containing$disulphide$domain$affecting$protein$
folding$with$chaperone$activity$preventing$protein$misBfolding$within$the$lumen$
of$ the$endoplasmic$reticulum$This$multifunctional$organelle$have$a$key$role$ in$
several$ processes$ including$ lipid$ catabolism$ and$ protein$ synthesis$ (RamiacuterezB
Torres$ et$ al$ 2012)$ Because$ of$ its$ disulphide$ bonds$ thioredoxin$ facilitate$ the$
reduction$of$other$proteins$by$cysteine$thiol$disulfide$exchange$This$mechanism$
is$essential$for$detossification$
During$ lactation$ indeed$ an$ enhanced$ detoxification$ level$ is$ functional$ to$
preserve$ milk$ quality$ against$ increased$ metabolic$ activities$ Higher$ metabolic$
levels$ lead$ to$ free$ radicals$ production$ making$ necessary$ the$ maintenance$ of$
redox$ homeostasis$ that$ include$ among$ others$ the$ reduction$ and$ oxidation$ of$
peroxiredoxin$and$thioredoxin$(Lu$and$Holmgren$2014)$
32 Proteinpercentage
For$ protein$ percentage$ in$ Italian$ Brown$ there$ is$ only$ one$ associated$ gene$
PINLYP=is$encoding$a$phospholipase$A2$inhibitor$but$this$protein$product$is$not$
yet$well$studied$ $Many$types$of$phospholipase$A2$were$characterized$and$ it$ is$
known$ that$ they$ are$ mediator$ of$ inflammation$ atherosclerosis$ and$ cancer$ in$
mammals$and$that$plays$an$important$role$in$the$innate$immune$response$(Qiu$
and$Lai$2013)$It$may$be$reasonable$to$imagine$a$role$for$PINLYP$in$the$lactation$
in$light$of$previously$discussed$results$that$link$milk$to$innate$immune$response$
(Lemay$et$al$2009)$
$
Although$ limited$ by$ the$ size$ of$ the$ study$ and$ the$ number$ of$ the$ included$
markers$these$results$represent$both$a$confirmation$and$a$step$forward$towards$
the$comprehension$of$the$biological$bases$of$milk$yield$and$quality$$
65
$
New$ genes$ potentially$ involved$ with$ production$ traits$ in$ dairy$ cows$ were$
tracked$when$GWAS$signals$resulted$too$weak$to$be$considered$in$a$classical$SNP$
based$mapping$with$the$ldquoclosest$gene$to$peakrdquo$or$the$windowBbased$strategy$
It$ is$ worth$ mentioning$ that$ none$ of$ the$ analyzed$ SNPs$ (not$ considering$ the$
Italian$Holstein$with$ARS5BFGL5NGS54939= included$ in$ the$ analysis)$were$ above$
the$ genomeBwide$ significance$ threshold$ in$ single$ SNP$ analysis$ meaning$ that$
important$information$is$often$lost$in$such$studies$$
In$addition$ to$ the$newly$ identified$ loci$we$observed$a$confirmation$of$existing$
associations$ with$ genes$ related$ to$ milk$ production$ not$ only$ dissecting$ gene$
specific$ functions$ but$ from$ mapping$ these$ genes$ in$ previously$ characterized$
regions$(Table$S5)$This$ is$not$trivial$when$using$new$data$analysis$techniques$
especially$ in$ high$ noise$ to$ signal$ ratio$ datasets$ like$ those$ here$ presented$We$
believe$ that$ this$ approach$will$ benefit$ of$data$ from$denser$marker$panels$ (eg$
BovineHD)$ to$ exploit$ local$ linkage$ disequilibrium$ and$ increase$ the$ number$ of$
mapped$genes$with$more$than$two$SNPs$
Results$ obtained$ with$ the$ gene$ centric$ approach$ although$ having$ intrinsic$
limitations$are$not$ influenced$by$a$number$of$subjective$choices$often$made$in$
GWAS$ For$ instance$when$ ldquowindow$mappingrdquo$ is$ used$ to$decrease$ the$noise$ to$
signal$ ratio$ eg$ due$ to$ nearby$ markers$ having$ different$ allele$ frequencies$
windows$size$ is$ to$be$decided$This$can$be$based$on$equal$number$of$markers$
equal$number$of$base$pairs$or$equal$map$distance$in$cM$per$window$The$three$
choices$ do$ not$ coincide$ because$ markers$ are$ not$ equally$ spread$ along$ the$
chromosomes$ and$ recombination$ differs$ in$ different$ genomic$ regions$ In$
addition$ windows$ can$ be$ chosen$ nonBoverlapping$ or$ with$ different$ degree$ of$
overlap$Finally$when$stepping$ from$significant$SNPs$or$windows$ to$candidate$
genes$the$size$of$the$flanking$region$from$which$picking$up$candidate$genes$is$to$
be$decided$as$well$Sometimes$the$gene$closer$to$the$signal$peak$is$selected$ in$
other$cases$all$genes$within$certain$boundaries$are$included$in$further$analyses$
All$ these$ choices$ may$ generate$ misleading$ results$ For$ example$ not$ truly$
associated$genes$may$be$retained$for$further$analyses$disregarding$the$correct$
ones$ In$addition$ a$ large$number$of$ false$positives$can$be$ included$ to$be$ later$
interpreted$with$ the$help$of$network$and$pathway$analysis$As$ a$ consequence$
these$decisions$have$a$direct$impact$in$terms$of$production$and$interpretation$of$
66
$
results$leading$to$conclusions$heavily$based$on$the$enrichment$analysis$and$preB
existing$knowledge$$
With$ the$ gene$ based$ association$ we$ also$ noticed$ some$ advantages$ in$ the$
downstream$data$mining$First$genes$are$functional$units$and$the$golden$targets$
of$ the$ ldquoclassical$ postBGWASrdquo$ bioinformatics$ pipelines$ Hence$ the$ gene$ based$
association$leads$to$genes$with$an$embedded$statistically$robust$framework$This$
is$ a$ ldquoplusrdquo$ in$ studying$ complex$ traits$ because$ one$ can$ concentrate$ on$ specific$
signals$ rather$ than$ cherry$ picking$meaningful$ features$ from$ a$ list$ constructed$
with$the$previously$cited$approaches$$
Moreover$it$is$easier$to$track$back$gene$clusters$to$major$gene$effects$B$like$for$
example$DGAT1$in$a$gene$rich$genome$chunk$many$genes$may$result$significant$
due$to$the$same$SNP$(Table$S2$and$Table$3)$With$the$VEGAS$method$finding$the$
ldquotruerdquo$association$is$facilitated$by$following$the$ldquobest$SNPrdquo$in$the$region$
In$addition$we$found$phenotype$coherent$signals$in$such$situation$where$only$a$
few$SNPs$B$ if$none$ndash$are$above$an$acceptable$statistical$genomeBwide$threshold$
using$a$single$SNP$regression$method$
The$gene$centric$approach$ is$a$ first$and$slight$ improvement$ in$ livestock$GWAS$
analyses$since$we$still$have$a$partial$genome$annotation$ In$addition$ it$ is$now$
known$that$ the$definition$of$gene$ is$ to$be$revised$many$genes$do$not$code$ for$
proteins$ have$ alternative$ starting$ sites$ have$ antisense$ transcripts$ host$
regulatory$ RNA$ are$ subjected$ to$ alternative$ splicing$ and$may$ be$ regulated$ by$
sequences$quite$ far$ away$ from$ their$ coding$ regions$ (Consortium$2012)$Other$
limitations$are$the$need$for$larger$number$of$animals$in$these$type$of$studies$as$
in$this$study$and$the$still$relatively$high$cost$of$sequencing$that$cannot$be$used$
routinely$in$large$experiments$at$least$not$yet$$
$
$
Conclusions
This$study$underlines$the$importance$of$the$implementation$of$new$methods$to$
analyze$complex$traits$with$genomic$data$No$golden$path(s)$is$discovered$yet$to$
walk$from$variation$to$functional$units$The$geneBcentric$analysis$may$contribute$
67
$
to$ reveal$ new$ signals$ throughout$ the$ bovine$ genome$ condensing$ weak$
association$signals$in$chromosome$bits$having$functional$significance$
$
$
Acknowledgements
The$Authors$would$ like$ to$ thank$Dr$ Jimmy$ Liu$ for$ his$ advices$ in$ building$ the$
software$AIA$ANAFI$ANARB$and$ANAPRI$breedersrsquo$associations$ for$ their$help$
and$ support$during$ the$data$analysis$ SelMol$ and$ProZoo$projects$ that$provide$
samples$ The$ Authors$ are$ also$ grateful$ to$MIPAAF$ and$ Cariplo$ Foundation$ for$
funding$
$
$
$
References
Akula$ N$ Baranova$ A$ Seto$ D$ Solka$ J$ Nalls$MA$ Singleton$ A$ Ferrucci$ L$
Tanaka$T$Bandinelli$ S$Cho$YS$Kim$YJ$Lee$ JBY$Han$BBG$Bipolar$
Disorder$ Genome$ Study$ (BiGS)$ Consortium$ The$ Wellcome$ Trust$ CaseB
Control$Consortium$McMahon$FJ$2011$A$NetworkBBased$Approach$to$
Prioritize$ Results$ from$ GenomeBWide$ Association$ Studies$ PLoS$ ONE$ 6$
e24220$doi101371journalpone0024220$
Amin$N$ van$Duijn$ CM$ Aulchenko$ YS$ 2007$ A$Genomic$Background$Based$
Method$ for$ Association$ Analysis$ in$ Related$ Individuals$ PLoS$ ONE$ 2$
e1274$doi101371journalpone0001274$
Aulchenko$YS$de$Koning$DBJ$Haley$C$2007$Genomewide$Rapid$Association$
Using$ Mixed$ Model$ and$ Regression$ A$ Fast$ and$ Simple$ Method$ For$
Genomewide$PedigreeBBased$Quantitative$Trait$Loci$Association$Analysis$
Genetics$177$577ndash585$doi101534genetics107075614$
Blanco$ EJ$ CarreteroBHernaacutendez$ M$ GarciacuteaBBarrado$ J$ IglesiasBOsma$ MC$
Carretero$M$Herrero$JJ$Rubio$M$Riesco$JM$Carretero$J$2013$The$
activity$and$proliferation$of$pituitary$prolactinBpositive$cells$and$pituitary$
68
$
VIPBpositive$ cells$ are$ regulated$by$ interleukin$6$Histol$Histopathol$ 28$
1595ndash1604$
Blott$S$Kim$JBJ$Moisio$S$SchmidtBKuntzel$A$Cornet$A$Berzi$P$Cambisano$
N$Ford$C$Grisart$B$Johnson$D$Karim$L$Simon$P$Snell$R$Spelman$
R$ Wong$ J$ Vilkki$ J$ Georges$ M$ Farnir$ F$ Coppieters$ W$ 2003$
Molecular$ dissection$ of$ a$ quantitative$ trait$ locus$ a$ phenylalanineBtoB
tyrosine$substitution$in$the$transmembrane$domain$of$the$bovine$growth$
hormone$ receptor$ is$ associated$ with$ a$ major$ effect$ on$ milk$ yield$ and$
composition$Genetics$163$253ndash266$
Burton$PR$Clayton$DG$Cardon$LR$Craddock$N$Deloukas$P$Duncanson$A$
Kwiatkowski$ DP$ McCarthy$ MI$ Ouwehand$ WH$ Samani$ NJ$ Todd$
JA$ Donnelly$ P$ Barrett$ JC$ Burton$ PR$ Davison$ D$ Donnelly$ P$
Easton$ D$ Evans$ D$ Leung$ HBT$ Marchini$ JL$ Morris$ AP$ Spencer$
CCA$Tobin$MD$Cardon$LR$Clayton$DG$Attwood$AP$Boorman$JP$
Cant$B$Everson$U$Hussey$JM$Jolley$JD$Knight$AS$Koch$K$Meech$
E$ Nutland$ S$ Prowse$ CV$ Stevens$ HE$ Taylor$ NC$ Walters$ GR$
Walker$NM$Watkins$NA$Winzer$T$Todd$JA$Ouwehand$WH$Jones$
RW$ McArdle$ WL$ Ring$ SM$ Strachan$ DP$ Pembrey$ M$ Breen$ G$
Clair$ DS$ Caesar$ S$ GordonBSmith$ K$ Jones$ L$ Fraser$ C$ Green$ EK$
Grozeva$D$Hamshere$ML$Holmans$PA$Jones$IR$Kirov$G$Moskvina$
V$ Nikolov$ I$ OrsquoDonovan$ MC$ Owen$ MJ$ Craddock$ N$ Collier$ DA$
Elkin$A$Farmer$A$Williamson$R$McGuffin$P$Young$AH$Ferrier$IN$
Ball$SG$Balmforth$AJ$Barrett$JH$Bishop$DT$Iles$MM$Maqbool$A$
Yuldasheva$N$Hall$AS$Braund$PS$Burton$PR$Dixon$RJ$Mangino$
M$ Stevens$ S$ Tobin$ MD$ Thompson$ JR$ Samani$ NJ$ Bredin$ F$
Tremelling$ M$ Parkes$ M$ Drummond$ H$ Lees$ CW$ Nimmo$ ER$
Satsangi$ J$Fisher$SA$Forbes$A$Lewis$CM$Onnie$CM$Prescott$NJ$
Sanderson$J$Mathew$CG$Barbour$J$Mohiuddin$MK$Todhunter$CE$
Mansfield$JC$Ahmad$T$Cummings$FR$Jewell$DP$Webster$J$Brown$
MJ$ Clayton$DG$ Lathrop$GM$ Connell$ J$Dominiczak$A$ Samani$NJ$
Marcano$CAB$Burke$B$Dobson$R$Gungadoo$J$Lee$KL$Munroe$PB$
Newhouse$SJ$Onipinla$A$Wallace$C$Xue$M$Caulfield$M$Farrall$M$
Barton$A$ (braggs)$TB$ in$RG$and$G$Bruce$ IN$Donovan$H$Eyre$S$
69
$
Gilbert$ PD$ Hider$ SL$ Hinks$ AM$ John$ SL$ Potter$ C$ Silman$ AJ$Symmons$ DPM$ Thomson$ W$ Worthington$ J$ Clayton$ DG$ Dunger$DB$ Nutland$ S$ Stevens$ HE$ Walker$ NM$ Widmer$ B$ Todd$ JA$Frayling$TM$Freathy$RM$Lango$H$Perry$JRB$Shields$BM$Weedon$MN$ Hattersley$ AT$ Hitman$ GA$ Walker$ M$ Elliott$ KS$ Groves$ CJ$Lindgren$ CM$ Rayner$ NW$ Timpson$ NJ$ Zeggini$ E$ McCarthy$ MI$Newport$M$Sirugo$G$Lyons$E$Vannberg$F$Hill$AVS$Bradbury$LA$Farrar$ C$ Pointon$ JJ$ Wordsworth$ P$ Brown$ MA$ Franklyn$ JA$Heward$JM$Simmonds$MJ$Gough$SCL$Seal$S$(uk)$BCSC$Stratton$MR$Rahman$N$Ban$M$Goris$A$Sawcer$SJ$Compston$A$Conway$D$Jallow$ M$ Newport$ M$ Sirugo$ G$ Rockett$ KA$ Kwiatkowski$ DP$Bumpstead$ SJ$ Chaney$A$Downes$K$ Ghori$MJR$ Gwilliam$R$Hunt$SE$Inouye$M$Keniry$A$King$E$McGinnis$R$Potter$S$Ravindrarajah$R$ Whittaker$ P$ Widden$ C$ Withers$ D$ Deloukas$ P$ Leung$ HBT$Nutland$S$Stevens$HE$Walker$NM$Todd$JA$Easton$D$Clayton$DG$Burton$PR$Tobin$MD$Barrett$JC$Evans$D$Morris$AP$Cardon$LR$Cardin$NJ$Davison$D$Ferreira$T$PereiraBGale$ J$Hallgrimsdoacutettir$ IB$Howie$ BN$Marchini$ JL$ Spencer$ CCA$ Su$ Z$ Teo$ YY$ Vukcevic$ D$Donnelly$P$Bentley$D$Brown$MA$Cardon$LR$Caulfield$M$Clayton$DG$ Compston$ A$ Craddock$ N$ Deloukas$ P$ Donnelly$ P$ Farrall$ M$Gough$ SCL$ Hall$ AS$ Hattersley$ AT$ Hill$ AVS$ Kwiatkowski$ DP$Mathew$ CG$McCarthy$MI$ Ouwehand$WH$ Parkes$M$ Pembrey$M$Rahman$N$Samani$NJ$Stratton$MR$Todd$ JA$Worthington$ J$2007$GenomeBwide$ association$ study$ of$ 14000$ cases$ of$ seven$ common$diseases$ and$ 3000$ shared$ controls$ Nature$ 447$ 661ndash678$doi101038nature05911$
Cantor$ RM$ Lange$ K$ Sinsheimer$ JS$ 2010$ Prioritizing$ GWAS$ Results$ A$Review$ of$ Statistical$ Methods$ and$ Recommendations$ for$ Their$Application$Am$J$Hum$Genet$86$6ndash22$doi101016jajhg200911017$
Clark$ AG$ Hubisz$ MJ$ Bustamante$ CD$ Williamson$ SH$ Nielsen$ R$ 2005$Ascertainment$ bias$ in$ studies$ of$ human$ genomeBwide$ polymorphism$Genome$Res$15$1496ndash1502$doi101101gr4107905$
70
$
CohenBZinder$M$ Seroussi$ E$ Larkin$ DM$ Loor$ JJ$Wind$ AE$ der$ Lee$ JBH$Drackley$ JK$Band$MR$Hernandez$AG$Shani$M$Lewin$HA$Weller$JI$ Ron$ M$ 2005$ Identification$ of$ a$ missense$ mutation$ in$ the$ bovine$ABCG2$gene$with$a$major$effect$on$ the$QTL$on$ chromosome$6$affecting$milk$yield$and$composition$in$Holstein$cattle$Genome$Res$15$936ndash944$doi101101gr3806705$
Cole$JB$Wiggans$GR$Ma$L$Sonstegard$TS$Lawlor$TJ$Crooker$BA$Tassell$CPV$ Yang$ J$ Wang$ S$ Matukumalli$ LK$ Da$ Y$ 2011$ GenomeBwide$association$ analysis$ of$ thirty$ one$ production$ health$ reproduction$ and$body$ conformation$ traits$ in$ contemporary$ US$ Holstein$ cows$ BMC$Genomics$12$408$doi1011861471B2164B12B408$
Colitti$M$Wilde$CJ$Stefanon$B$2004$Functional$expression$of$bclB2$protein$family$and$AIF$in$bovine$mammary$tissue$in$early$lactation$J$Dairy$Res$71$20ndash27$
Consortium$ TEP$ 2012$ An$ integrated$ encyclopedia$ of$ DNA$ elements$ in$ the$human$genome$Nature$489$57ndash74$doi101038nature11247$
De$Mello$F$Cobuci$JA$Martins$MF$Silva$MVGB$Neto$JB$2012$Association$of$the$polymorphism$g8514CgtT$in$the$osteopontin$gene$(SPP1)$with$milk$yield$ in$ the$ dairy$ cattle$ breed$ Girolando$ Anim$ Genet$ 43$ 647ndash648$doi101111j1365B2052201102312x$
Dudbridge$ F$ Gusnanto$ A$ 2008$ Estimation$ of$ significance$ thresholds$ for$genomewide$ association$ scans$ Genet$ Epidemiol$ 32$ 227ndash234$doi101002gepi20297$
Elsik$ CG$ Tellam$ RL$ Worley$ KC$ 2009$ The$ Genome$ Sequence$ of$ Taurine$Cattle$ A$window$ to$ ruminant$ biology$ and$ evolution$ Science$ 324$ 522ndash528$doi101126science1169588$
Fazioli$ F$ Minichiello$ L$ Matoska$ V$ Castagnino$ P$ Miki$ T$ Wong$ WT$ Di$Fiore$ PP$ 1993$ Eps8$ a$ substrate$ for$ the$ epidermal$ growth$ factor$receptor$kinase$enhances$EGFBdependent$mitogenic$signals$EMBO$J$12$3799ndash3808$
Fontanesi$ L$ Calograve$ DG$ Galimberti$ G$ Negrini$ R$ Marino$ R$ Nardone$ A$AjmoneBMarsan$ P$ Russo$ V$ 2014$ A$ candidate$ gene$ association$ study$
71
$
for$ nine$ economically$ important$ traits$ in$ Italian$ Holstein$ cattle$ Anim$
Genet$nandashna$doi101111age12164$
Fowler$KJ$Walker$F$Alexander$W$Hibbs$ML$Nice$EC$Bohmer$RM$Mann$
GB$ Thumwood$ C$ Maglitto$ R$ Danks$ JA$ 1995$ A$ mutation$ in$ the$
epidermal$growth$factor$receptor$in$wavedB2$mice$has$a$profound$effect$
on$ receptor$ biochemistry$ that$ results$ in$ impaired$ lactation$ Proc$ Natl$
Acad$Sci$92$1465ndash1469$doi101073pnas9251465$
Gianola$ D$ Hospital$ F$ Verrier$ E$ 2013$ Contribution$ of$ an$ additive$ locus$ to$
genetic$variance$when$inheritance$is$multiBfactorial$with$implications$on$
interpretation$ of$ GWAS$ Theor$ Appl$ Genet$ 126$ 1457ndash1472$
doi101007s00122B013B2064B2$
Gillette$ M$ Bray$ K$ Blumenthaler$ A$ VargoBGogola$ T$ 2013$ P190B$ RhoGAP$
Overexpression$ in$ the$ Developing$Mammary$ Epithelium$ Induces$ TGFβB
dependent$ Fibroblast$ Activation$ PLoS$ ONE$ 8$ e65105$
doi101371journalpone0065105$
Grisart$B$Coppieters$W$Farnir$F$Karim$L$Ford$C$Berzi$P$Cambisano$N$
Mni$ M$ Reid$ S$ Simon$ P$ Spelman$ R$ Georges$ M$ Snell$ R$ 2002$
Positional$Candidate$Cloning$of$a$QTL$ in$Dairy$Cattle$ Identification$of$a$
Missense$Mutation$ in$the$Bovine$DGAT1$Gene$with$Major$Effect$on$Milk$
Yield$ and$ Composition$ Genome$ Res$ 12$ 222ndash231$
doi101101gr224202$
Hubbard$NE$Chen$QJ$Sickafoose$LK$Wood$MB$Gregg$ JP$Abrahamsson$
NM$ Engelberg$ JA$ Walls$ JE$ Borowsky$ AD$ 2013$ Transgenic$
Mammary$Epithelial$Osteopontin$(Spp1)$Expression$Induces$Proliferation$
and$ Alveologenesis$ Genes$ Cancer$ 4$ 201ndash212$
doi1011771947601913496813$
Hu$ ZBL$ Park$ CA$ Wu$ XBL$ Reecy$ JM$ 2013$ Animal$ QTLdb$ an$ improved$
database$tool$for$livestock$animal$QTLassociation$data$dissemination$in$
the$ postBgenome$ era$ Nucleic$ Acids$ Res$ 41$ D871ndashD879$
doi101093nargks1150$
Jeon$et$al$ B$2008$ B$ProtoBoncogene$FBIB1$ (PokemonZBTB7A)$Represses$Trpdf$
nd$
72
$
Jiang$ L$ Liu$ J$ Sun$ D$ Ma$ P$ Ding$ X$ Yu$ Y$ Zhang$ Q$ 2010$ Genome$Wide$
Association$ Studies$ for$ Milk$ Production$ Traits$ in$ Chinese$ Holstein$
Population$PLoS$ONE$5$e13661$doi101371journalpone0013661$
Kong$ PBZ$ Yang$ F$ Li$ L$ Li$ XBQ$ Feng$ YBM$ 2013$Decreased$FOXF2$mRNA$
Expression$ Indicates$ EarlyBOnset$ Metastasis$ and$ Poor$ Prognosis$ for$
Breast$ Cancer$ Patients$ with$ Histological$ Grade$ II$ Tumor$ PLoS$ ONE$ 8$
e61591$doi101371journalpone0061591$
Lee$SBU$Maeda$T$2012$POKZBTB$proteins$an$emerging$ family$of$proteins$
that$ regulate$ lymphoid$ development$ and$ function$ Immunol$ Rev$ 247$
107ndash119$
Lemay$ DG$ Lynn$ DJ$ Martin$ WF$ Neville$ MC$ Casey$ TM$ Rincon$ G$
Kriventseva$ EV$ Barris$ WC$ Hinrichs$ AS$ Molenaar$ AJ$ 2009$ The$
bovine$lactation$genome$ insights$ into$the$evolution$of$mammalian$milk$
Genome$Biol$10$R43$
Liu$JZ$Mcrae$AF$Nyholt$DR$Medland$SE$Wray$NR$Brown$KM$2010$A$
versatile$ geneBbased$ test$ for$ genomeBwide$ association$ studies$ Am$ J$
Hum$Genet$87$139$
Liu$S$Dontu$G$Mantle$ID$Patel$S$Ahn$N$Jackson$KW$Suri$P$Wicha$MS$
2006$Hedgehog$Signaling$and$BmiB1$Regulate$SelfBrenewal$of$Normal$and$
Malignant$ Human$ Mammary$ Stem$ Cells$ Cancer$ Res$ 66$ 6063ndash6071$
doi1011580008B5472CANB06B0054$
Lu$ J$ Holmgren$ A$ 2014$ The$ thioredoxin$ superfamily$ in$ oxidative$ protein$
folding$ Antioxid$ Redox$ Signal$ 140202091634000$
doi101089ars20145849$
Luna$ A$ Nicodemus$ KK$ 2007$ snpplotter$ an$ RBbased$ SNPhaplotype$
association$and$linkage$disequilibrium$plotting$package$Bioinforma$Oxf$
Engl$23$774ndash776$doi101093bioinformaticsbtl657$
McCarthy$ MI$ Abecasis$ GR$ Cardon$ LR$ Goldstein$ DB$ Little$ J$ Ioannidis$
JPA$ Hirschhorn$ JN$ 2008$ GenomeBwide$ association$ studies$ for$
complex$traits$consensus$uncertainty$and$challenges$Nat$Rev$Genet$9$
356ndash369$doi101038nrg2344$
Meredith$ BK$ Kearney$ FJ$ Finlay$ EK$ Bradley$ DG$ Fahey$ AG$ Berry$ DP$
Lynn$ DJ$ 2012$ GenomeBwide$ associations$ for$ milk$ production$ and$
73
$
somatic$cell$score$in$HolsteinBFriesian$cattle$in$Ireland$BMC$Genet$13$21$doi1011861471B2156B13B21$
Minozzi$ G$Nicolazzi$ EL$ Stella$ A$ Biffani$ S$Negrini$ R$ Lazzari$ B$ AjmoneBMarsan$ P$Williams$ JL$ 2013$ Genome$Wide$ Analysis$ of$ Fertility$ and$Production$ Traits$ in$ Italian$ Holstein$ Cattle$ PLoS$ ONE$ 8$ e80219$doi101371journalpone0080219$
Nicolazzi$EL$Picciolini$M$Strozzi$F$Schnabel$RD$Lawley$C$Pirani$A$Brew$F$ Stella$ A$ 2014$ SNPchiMp$ a$ database$ to$ disentangle$ the$ SNPchip$jungle$ in$ bovine$ livestock$ BMC$ Genomics$ 15$ 123$ doi1011861471B2164B15B123$
Purcell$ S$ Neale$ B$ ToddBBrown$ K$ Thomas$ L$ Ferreira$ MAR$ Bender$ D$Maller$J$Sklar$P$de$Bakker$PIW$Daly$MJ$Sham$PC$2007$PLINK$a$tool$ set$ for$ wholeBgenome$ association$ and$ populationBbased$ linkage$analyses$Am$J$Hum$Genet$81$559ndash575$doi101086519795$
Qiu$ S$ Lai$ L$ 2013$ Antibacterial$ properties$ of$ recombinant$ human$ nonBpancreatic$secretory$phospholipase$A2$Biochem$Biophys$Res$Commun$441$453ndash456$doi101016jbbrc201310092$
Quinlan$AR$Hall$IM$2010$BEDTools$a$flexible$suite$of$utilities$for$comparing$genomic$ features$ Bioinformatics$ 26$ 841ndash842$doi101093bioinformaticsbtq033$
RamiacuterezBTorres$ A$ BarceloacuteBBatllori$ S$ MartiacutenezBBeamonte$ R$ Navarro$ MA$Surra$ JC$ Arnal$ C$ Guilleacuten$N$ Aciacuten$ S$ Osada$ J$ 2012$ Proteomics$ and$gene$ expression$ analyses$ of$ squaleneBsupplemented$ mice$ identify$microsomal$thioredoxin$domainBcontaining$protein$5$changes$associated$with$ hepatic$ steatosis$ J$ Proteomics$ 77$ 27ndash39$doi101016jjprot201207001$
Raven$ LBA$ Cocks$ BG$ Pryce$ JE$ Cottrell$ JJ$ Hayes$ BJ$ 2013$ Genes$ of$ the$RNASE5$pathway$ contain$ SNP$ associated$with$milk$ production$ traits$ in$dairy$cattle$Genet$Sel$Evol$45$25$doi1011861297B9686B45B25$
Scotti$E$Fontanesi$L$Schiavini$F$La$Mattina$V$Bagnato$A$Russo$V$2010$DGAT1$ pK232A$ polymorphism$ in$ dairy$ and$ dual$ purpose$ Italian$ cattle$breeds$Ital$J$Anim$Sci$9$doi104081ijas2010e16$
74
$
Stefanon$ B$ Colitti$ M$ Gabai$ G$ Knight$ CH$ Wilde$ CJ$ 2002$ Mammary$
apoptosis$and$lactation$persistency$in$dairy$animals$J$Dairy$Res$69$37ndash
52$
Stegh$ AH$ DePinho$ RA$ 2011$ Beyond$ effector$ caspase$ inhibition$ Bcl2L12$
neutralizes$ p53$ signaling$ in$ glioblastoma$ Cell$ Cycle$ 10$ 33ndash38$
doi104161cc10114365$
Taberlet$ P$ Valentini$ A$ Rezaei$ HR$ Naderi$ S$ Pompanon$ F$ Negrini$ R$
AjmoneBMarsan$ P$ 2008$ Are$ cattle$ sheep$ and$ goats$ endangered$
species$Mol$Ecol$17$275ndash284$doi101111j1365B294X200703475x$
Team$RC$ 2014$ R$ A$ Language$ and$Environment$ for$ Statistical$ Computing$ R$
Foundation$for$Statistical$Computing$Vienna$Austria$
Thomadaki$H$ Talieri$M$ Scorilas$ A$ 2007$ Prognostic$ value$ of$ the$ apoptosis$
related$genes$BCL2$and$BCL2L12$in$breast$cancer$Cancer$Lett$247$48ndash
55$doi101016jcanlet200603016$
Valdehita$ A$ Bajo$ AM$ FernaacutendezBMartiacutenez$ AB$ Arenas$ MI$ Vacas$ E$
Valenzuela$ P$ RuiacutezBVillaespesa$ A$ Prieto$ JC$ Carmena$ MJ$ 2010$
Nuclear$ localization$ of$ vasoactive$ intestinal$ peptide$ (VIP)$ receptors$ in$
human$ breast$ cancer$ Peptides$ 31$ 2035ndash2045$
doi101016jpeptides201007024$
Wang$T$Liu$NS$Seet$LBF$Hong$W$2010$The$Emerging$Role$of$VHS$DomainB
Containing$Tom1$Tom1L1$and$Tom1L2$in$Membrane$Trafficking$Traffic$
11$1119ndash1128$doi101111j1600B0854201001098x$
Wang$ X$Wurmser$ C$ Pausch$ H$ Jung$ S$ Reinhardt$ F$ Tetens$ J$ Thaller$ G$
Fries$R$2012$Identification$and$Dissection$of$Four$Major$QTL$Affecting$
Milk$Fat$Content$in$the$German$HolsteinBFriesian$Population$PLoS$ONE$7$
e40711$doi101371journalpone0040711$
Xu$S$2003$Theoretical$Basis$of$the$Beavis$Effect$Genetics$165$2259ndash2268$
Zimin$AV$Delcher$AL$Florea$L$Kelley$DR$Schatz$MC$Puiu$D$Hanrahan$
F$ Pertea$ G$ Tassell$ CPV$ Sonstegard$ TS$ Marccedilais$ G$ Roberts$ M$
Subramanian$ P$ Yorke$ JA$ Salzberg$ SL$ 2009$ A$ wholeBgenome$
assembly$ of$ the$ domestic$ cow$ Bos$ taurus$ Genome$ Biol$ 10$ R42$
doi101186gbB2009B10B4Br42$
75
$
Zu$X$Ma$J$Liu$H$Liu$F$Tan$C$Yu$L$Wang$J$Xie$Z$Cao$D$Jiang$Y$2011$ProBoncogene$ Pokemon$ promotes$ breast$ cancer$ progression$ by$upregulating$ survivin$ expression$ Breast$ Cancer$ Res$ 13$ R26$doi101186bcr2843$ $
76
$
Supplementaryfiles$
You$can$find$Chapter$͵$supplementary$files$at$
- DocTa$(Doctoral$Thesis$Archive)$(httptesionlineunicattit)$
- Google$Drive$(httptinyurlcomPhDMMChapter03)$or$scan$the$QR$code$
$
$
SupplementaryFigureS1Multidimensionalscaling
Spatial$distribution$of$genetic$structure$in$the$analyzed$breeds$$
$
Supplementary$FigureS2GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Brown$
$
Supplementary$FigureS3GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Holstein$
$
Supplementary$FigureS4GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$HolsteinDGAT$
$
Supplementary$FigureS5GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Simmental$
77
$
$Supplementary$FigureS6LDdecayGenomeBwide$LD$decay$for$the$three$breeds$(Holstein$Simmental$Brown)$$Supplementary$FigureS7DescriptiveplotsDescriptive$statistics$for$the$Italian$Brown$Italian$Holstein$and$Italian$Simmental$$Supplementary$FigureS8DGAT1LDFocus$on$the$LD$of$the$DGAT1$region$in$all$the$datasets$(Brown$Holstein$HolsteinDGAT$Simmental)$$Supplementary$TableS1SNPfactNumber$of$subjects$and$SNPs$after$QC$$Supplementary$TableS2GeneBasedAssociationresultsFull$results$of$the$significant$genes$associated$with$the$traits$in$all$the$datasets$$ Sheet$1$Brown$ndash$fat$percentage$$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS3SingleSNPAssociationresultsFull$results$of$the$SNP$that$overcome$the$suggestive$threshold$$$ Sheet$1$Brown$ndash$fat$percentage$
78
$
$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS4GenesandtraitsGeneWise$pBvalues$of$the$discussed$genes$in$all$the$analysis$In$grey$are$evidenced$the$pBvalues$that$overcome$the$significance$threshold$Genes$are$alphabetically$listed$$Supplementary$TableS5GenesinQTLsGenes$from$the$geneBbased$association$mapping$into$known$QTLs$$$$$
79
CHAPTER(4(
MUGBAS(a(species(free(gene(based(association(suite(
S(Capomaccio1(M(Milanesi1(L(Bomba1(et(al(1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122ItalyTheseauthorscontributedequallytothiswork
____________________ApplicationnotesubmittedtoBioinformatics
Abstract(Theassociationstudiesbetweengenomewidemarkersandphenotypes(GWAS)
arenowwidelyusedinmodelandnonmodelspeciesHowevertoolsthatmerge
singlemarkerresultsandtheirprobabilityofbeingassociatedtofunctionalunits
(typicallygenes)areseldomdevelopedforspeciesotherthanhuman
Herewe presentMUGBAS a suite that calculates a generegionbased pvalue
fromagivenannotationsinglemarkerGWAresultandgenotypedatawiththe
objective to ease postGWAS analysis The software is species and annotation
independentfasthighlyparallelizedandreadyforhighdensitymarkerstudies
Availabilityandimplementationhttpsbitbucketorgcapemastermugbas
Introduction(With the availability of highdensity SNP panels Genome Wide Association
Studies (GWAS) have become the gold standard for dissecting the biology of
complextraits(McCarthyetal2008)Thisistrueforhumansandotherspecies
TheunderlyingideaofGWASissimplefindmarkertraitassociationsexploiting
the linkage disequilibrium (LD) that exists between the causative mutation ndash
whichweignorendashandoneormoremarkersofknownchromosomalposition
Whileanumberofwellestablishedapproachescanbeeasilyusedinastandard
GWAS (Aulchenko et al 2007 Kang et al 2010 Purcell et al 2007) to find
marker(s) associated to a particular phenotype several issues have to be
accountedforduringtheupanddownstreamanalyses
Firstly genome wide studies have to take into account multiple testing If
stringent statistical thresholds are applied a proportion ofmoderate or small
effectsactuallyassociatedwiththetraitwillbedisregardedThereforereducing
the number of tests while maintaining all the information provided by high
densitypanelswouldbeaclearadvantageInadditiononceasignificantsignalis
detected different strategiesmaybeused to tag the candidate causative gene
Sometimesthegeneclosertothepeaksignalisselectedinothercasesallgenes
withinaflankingregionofanarbitrarysizeofthesignificantSNP(s)areincluded
82
infurtheranalysesThesechoicesmayendupinbiasresultseitherretaininga
largenumberofnottrulyassociatedgenesordisregardingthecorrectones
Somemethodshaverecentlybeenproposedtoovercometheselimitationsusing
aprioriknowledgeastheldquocandidatepathwayrdquoanalysis(Ravenetal2013)and
the ldquogenebasedrdquo association strategies (Akula et al 2011 Cantor et al 2010
Liuetal2010)TheseapproachesrestrictGWAstudiesonlytoknowngenesor
genesubsetsresizingthemultipletestingproblemandincreasingthepowerof
theanalyses
Usually bioinformatics pipelines are developed for the analysis of human and
modelorganismsandarenotadaptedtononmodelorganismThisisthecaseof
the VEGAS software (Liu et al 2010 Yang et al 2014) developed only for
humansandonlyforgenebasedanalyses
Here we introduce MUGBAS [MUlti species GeneBased Association Suite] a
softwarebasedonVEGASthatusesGWASdataandagivenannotation(typically
genesbutanyregionmaybesuitable)toestimategenebasedandregionbased
associationpvaluesThesuite is speciesandannotation free ready toanalyze
highdensitydatafromanyexperimentaldesigns
Methods(MUGBASisasuiteofscriptsbuiltinBashRPythonandPerlThesuiterequires
several prerequisites that can be easily fulfilled in a LinuxUnixMac
environmentwith a few command lines Further information on how tomeet
theserequirementsaredetailedintheonlinerepository
MUGBAS suite can be invoked launching the Bash wrapper that checks
dependenciesandstoresuserchoices
Required input files are MAPPED files in PLINK format with genotype data
from(atleast200individualsarerecommendedforaproperLDcalculation)an
annotation file inBED format (chromosome startingposition endingposition
nameandusercustomfields)ofthedesiredgenomefeatureandtheGWASresult
file with mandatory headers ldquoSNPNAMErdquo and ldquoPVALUErdquo Multiple association
resultscanbetestedatthesametime ifprovidedinseparatefiles inthesame
directoryAtrialdatasetisdownloadablealongwiththesoftwarereleasePlease
83
notethatforsimplicityweprovideageneannotationbutanyBEDfilewithany
desiredfeaturecanbeused
The program starts the analysis assigning SNPs to any feature of the given
annotationusingbedtools(QuinlanandHall2010)Thisstepiscustomizableby
user defined boundaries (up and downstream) in order to catch regulatory
regionsthatcouldbeinhighLDwiththecurrentgeneregion
Inordertospeeduptheanalysistheprogramsubsamples200individualsfrom
theinputdatasetWethenimplementedtwolevelsofparallelizationoneforthe
LDandonefortheldquogenewiserdquoorldquoregionwiserdquopvaluecalculationThelatteris
particularlyusefulanalyzingmorethanoneGWAresultsfromthesamedataset
BrieflyMUGBASfirstsplitstheLDcalculationinncores(asdefinedbytheuser)
using the foreach doParallel packages (Analytics andWeston 2014a 2014b)
and then distributes traits across cores using GNU Parallel and foreach
doParallel
SinceLDcalculationisaslowprocessVEGASbenefitsofHapMapphase2datato
speed up the processwhile preserving the option for a user defined (human)
population using PLINK In order to expand these analysis to any species LD
calculationinMUGBASisimplementedwithther2fast()functionfromGenABEL
package(Aulchenkoetal2007)whilethegenewisepvalueiscalculatedasin
theVEGASapproachBrieflythegenewiseteststatisticisthesumofthenupper
tailchisquaredvaluesofaSNPsubsetwithonedegreeoffreedom(wheren is
thenumberofSNPmappinginthegivengene)Withperfectlinkageequilibrium
the genewise pvalue would be the onetailed pvalue of a chisquare
distributionwithndegreesof freedomOtherwisemorelikelythedistribution
of the pvalues has to be evaluated (with simulations) andweighted (with LD
values)(Liuetal2010)
Once every generegion has its ldquogenewiserdquo pvalue a FDR qvalue (False
Discovery Rate) statistics is calculated with the base R function padjust()
Results are stored and enrichedwith ldquomanhattanrdquo and ldquoundergroundrdquo plots of
theldquogenewiserdquopvalueandtheqvaluerespectively
84
Results(MUGBAS output is generegionoriented with useful information about the
location of the analyzed feature the ldquoBest SNPrdquo within that feature with its
genomiccoordinatesandtheoriginalpvaluethegeneorregionwisestatistics
(pvalue number of simulations and qvalue) the custom annotation of that
particularentryandthepositionoftheSNPwithrespecttothefeatureandthe
decidedboundariesAll this informationareuseful to track theeffectofhighly
significant markers that could map into multiple region of interest and
effectively pinpoint the most probable location to focus on with data mining
processes
A typical analysis performed with a highdensity panel (gt600K markers)
distributed in 10 midperformance cores (Intel Xeon X5675 307 GHz) on
threetraitssimultaneouslytakesaboutfourhours
Discussion(Generegionbased methods are now widely used and accepted in genomic
research However non modelspecies suffer for the lack of availability of
specifictoolstoenhancediscoveryfromgenomewidedataMUGBASprovidesa
robust and accepted statistics to researchers dealing with species other than
humantoeasilyjumpfromassociatedmarkerstothedesiredfunctionalunits
Acknowledgements(((WewouldliketothankDrJimmyLiuforhisadvicesinbuildingthesuite
Reference(Akula N Baranova A Seto D Solka J NallsMA Singleton A Ferrucci L
TanakaTBandinelli SChoYSKimYJLee JYHanBGBipolar
Disorder Genome Study (BiGS) Consortium The Wellcome Trust Case
85
ControlConsortiumMcMahonFJ2011ANetworkBasedApproachtoPrioritize Results from GenomeWide Association Studies PLoS ONE 6e24220doi101371journalpone0024220
Analytics R Weston S 2014a doParallel Foreach parallel adaptor for theparallelpackage
AnalyticsRWestonS2014bforeachForeachloopingconstructforRAulchenkoYSdeKoningDJHaleyC2007GenomewideRapidAssociation
Using Mixed Model and Regression A Fast and Simple Method ForGenomewidePedigreeBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614
Cantor RM Lange K Sinsheimer JS 2010 Prioritizing GWAS Results AReview of Statistical Methods and Recommendations for TheirApplicationAmJHumGenet866ndash22doi101016jajhg200911017
KangHMSulJHServiceSKZaitlenNAKongSFreimerNBSabattiCEskin E 2010 Variance component model to account for samplestructure in genomewide association studies Nat Genet 42 348ndash354doi101038ng548
LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKM2010Aversatile genebased test for genomewide association studies Am JHumGenet87139
McCarthy MI Abecasis GR Cardon LR Goldstein DB Little J IoannidisJPA Hirschhorn JN 2008 Genomewide association studies forcomplextraitsconsensusuncertaintyandchallengesNatRevGenet9356ndash369doi101038nrg2344
Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender DMallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKatool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795
QuinlanARHallIM2010BEDToolsaflexiblesuiteofutilitiesforcomparinggenomic features Bioinformatics 26 841ndash842doi101093bioinformaticsbtq033
86
Raven LA Cocks BG Pryce JE Cottrell JJ Hayes BJ 2013 Genes of the
RNASE5pathway contain SNP associatedwithmilk production traits in
dairycattleGenetSelEvol4525doi101186129796864525
87
CHAPTER5
Imputationaccuracyisrobusttocattlereferencegenomeupdates
MMilanesi1etal1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122Italy
____________________ShortCommunicationacceptedinAnimalGenetics
SummaryGenotype imputation is routinely applied in a large number of cattle breeds Imputation
hasbecomeaneedduetothelargenumberofSNParrayswithvariabledensity(currently
from2900to777962SNPs)Althoughmanyauthorshavestudiedtheeffectofdifferent
statisticalmethodsonimputationaccuracytheimpactofa(likely)changeinthereference
genomeassemblyonimputationfromlowertohigherdensityhasnotbeendeterminedso
far In this work 1021 Italian Simmental SNP genotypes were reJmapped on the three
mostrecentreferencegenomeassembliesFour imputationmethodswereusedtoassess
theimpactofanupdateinthereferencegenomeAsexpectedthefourmethodsbehaved
differentlywith largedifferences in termsof accuracyUpdatingSNPcoordinateson the
three tested cattle reference genome assemblies determined only a slight variation on
imputationresultswithinmethod
KeywordsImputationreferencegenomeassemblySNPbovine
MaintextFromthepublicationof the firstbovinesinglenucleotidepolymorphism(SNP)array the
wholegenomicscenarioofthisspeciesevolvedrapidlyNowadaysseveralSNPpanelshave
been designed with a range of densities mapping on different reference genome
assemblies and produced by different manufacturers When running analyses with SNP
arrays of different densities or from different commercial companies (eg Illumina and
Affymetrix) an imputation step is often required The impact of different statistical
methods on imputation accuracy has already been assessed (Pei et al 2008) (Marchini
andHowie2010)(Maetal2013)Surprisinglyuptonowtherearenotavailablestudies
investigating the impact of an update on the reference genome assembly on imputation
accuracy This assessment would be of fundamental importance considering that reJ
mappingSNPsdeterminesarearrangementofthehaplotypestructureofthegenotypes(in
somecasesSNPsarereJmappedtoadifferentchromosome)andthatthebovinereference
assembly is likely toberepeatedlyupdatedover timeThisworkwasaimedatassessing
the effect of different reference genome assemblies in cattle As test casewe compared
four different imputation methods from low (Illumina BovineLDv10 Beadchip) to
90
mediumhigh (Illumina BovineSNP50v2 Beadchip) density in the Italian Simmental
populationForeachmethodweassessedthe impacton imputationaccuracyofmapping
SNPs on the BTAU42 BTAU46 (wwwhgscbcmeduotherJmammalsbovineJgenomeJ
project)orUMD31(Ziminetal2009)cattlereferencegenomeassemblies
Atotalof900and121SimmentalbullsweregenotypedwiththeIlluminaBovineSNP50v1
and v2 Beadchips (Illumina Inc San Diego CA) respectively Illumina BovineSNP50v2
Beadchip (hereinafter named 54k) the official reference chip for the Italian Simmental
breedersassociation(ANAPRI)wasconsideredasreferenceSNPpanelTheSNPchiMpv1
tool(Nicolazzietal2014)wasusedtoobtainSNPcoordinatesonBTAU42UMD31and
BTAU46genomeassembliesForeachassemblySNPsnotcontainedinthe54kunmapped
orinthesexualchromosomeswerediscardedThefinaldatasetcontained5118952886
and51290SNPsinBTAU42UMD31andBTAU46respectively
FourimputationmethodswereconsideredPedImpute(Nicolazzietal2013)FindHapv2
(VanRadenetal2011)FImpute(Sargolzaeietal2014)andBeaglev332(Browningand
Browning2007)Thefirstthreeusepedigreeandpopulation(ie linkagedisequilibrium
LD) informationwhereas the fourth uses only LD information Performance of the four
methods testedacrossgenomicassemblieswas comparedusingwrongly imputedalleles
(ldquoerrorsrdquo)andallelicsquaredcorrelation(ldquoallelicJR2rdquo)statisticsAllelicJR2wasdefined
asthesquaredcorrelationbetweenimputedandtrue(eggenotyped)allelesmultipliedby
100 as in Browning amp Browning (2009) The 878 oldest sires were used as training
population whereas the 143 youngest bulls were used as test population SNPs not in
commonwith the lowdensity panelwere set tomissing in the test population The test
population was split in four scenarios individuals with sire and maternal grandsire
genotypedinthereferencepopulation(scenarioAJ84animals)individualswithonlythe
sire genotyped (scenario B J 34 animals) individuals with only the maternal grandsire
genotyped(scenarioCJ13animals)ornoneofcloserelativesgenotyped(scenarioDJ12
animals)Thepopulationusedwashighlystructuredwithnearlyhalfofthebullspresent
in the dataset (490) sired by 115 genotyped sires In addition 35 out of these 115
genotypedsiresweresiredby22bulls(eggrandsires)alsopresentinthedataset
DifferencesacrossassembliesarenothomogeneousFromBTAU42toBTAU4646SNPs
weremapped to different chromosomes and the average difference in physical position
withinchromosomeswasnearly025MbpHowevertherewasnosubstantialdifferencein
the order of markers In fact not considering unmapped SNPs on either of the
aforementionedassembliesadifferenceinthesequentialorderwasobservedforonly134
91
single (ie spread throughout the genome) SNPs On the contrary from UMD31 to
BTAU46 222 SNPsweremapped to different chromosomes and amuch larger average
differenceinSNPsphysicalpositionwasobserved(iemorethan2Mbp)Alsothenumber
ofSNPswithdifferentsequentialorderwasmuchlarger(ie1893SNPs)Howevermore
than50oftherearrangementsinvolvedblocksfrom3tonearly100markersAsaresult
slight differences in overall imputation accuracy from BTA46 to UMD31 rather than
between the two BTAU assemblies were expected across methods On the contrary if
rearrangements are due to issues in the assembly of scaffolds theymight have a direct
(negative) impact on local imputation performances In fact a preliminary analysis on
BTA1consideringlocalerroracrossassembliesandmethodssuggestatleasttwosuch
SNPsclustersat44and138MbponUMD31(FiguresS1S2andS3)
Considering allmethods and scenarios amaximum average variation of 02 inerrors
(Table1)andof09inallelicJR2(Table2)wasobservedacrossreferenceassembliesThe
standarddeviation(SD)oftheimputationaccuracywasstableacrossreferenceassemblies
withinmethodsOnthecontraryalargervariabilitywasobservedacrossmethods
PedImpute and FindHap showed a relatively large average allelicJR2 variability across
scenarios evaluated over the same reference assembly Only PedImpute showed a large
variation in SD across scenarios in errors and allelicJR2 with high variability on
scenarios C and D showing a low performance of thismethodwhen there are no close
relatives(egsiredam)genotypedinthereferencedatasetFImputeinsteadachievedthe
highestaccuracyacrossallreferencegenomesandscenariossurprisinglyobtainingitsbest
resultsJerrorsof14(06SD)andallelicJR2of955(26SD)JinscenarioCwhereonly
onerelativelycloserelativeisgenotyped(maternalgrandJsire)FImputeappeartobeable
todetectshorterhaplotypesfromdistantrelativesmoreaccuratelythananyothermethod
tested in this study thus it is lessdependenton theavailabilityofpedigree information
thantheothertwopedigreeJbasedmethods(Sargolzaeietal2010)
The large variability across methods within assembly was expected although the
differences obtained among PedImpute FindHap and Beagle were larger in this study
comparedtoasimilarstudyinItalianHolstein(Nicolazzietal2013)Thiscoulddependon
a number of factors First the sample size of the test population is small (especially for
groups C and D where less than 15 individuals were present) Second the interaction
between population structure and imputationmethodsmay have a role PedImpute and
FindHap rely much more heavily on pedigree information than FImpute and Beagle
assumesall individualsareunrelatedHowever the lattertwomethods indirectlybenefit
92
greatly of from highly structured populations with large number of relatives ndash close or
distantndashpresent inthereferencepopulation(Sargolzaeietal2014)Anotherinteresting
factor that influences the imputation accuracy may be the persistence of LD at long
distances this is much lower in the Italian Simmental than in the Italian Holstein
population(FigureS4)Informationonrelatives(eitherpedigreeinformationorgenotyped
relativesinthereferencepopulation)isexpectedtoplayabiggerroleifLDisconservedat
shorter distances This hypothesis is supported by the fact that although there is high
variability acrossmethods better performances of the pedigreeJbasedmethods are still
observedinscenariosAandB(andCforFImpute)
Theslightdifferencesinimputationaccuracyobservedacrossreferenceassembliesmaybe
explained by the aforementioned small variation in the SNP blocks across reference
assemblies irrespectively of the rearrangements of single SNPs Considering that SNP
blocksaremostlymaintainedweexpectthatbreedswithlargerLDpersistenceatlonger
distances (ie Holstein Brown) will obtain even better imputation accuracy across
assembliescomparedtowhatreportedinthisstudyinItalianSimmentalHoweverastill
relativelyunexploredaspectof imputationperformance is thedistributionof imputation
errors across the genomeNot considering genotyping errors (which are expected to be
randomly distributed in the genome) a cluster of consecutive markers with high
percentage of imputation error across methods are most probably identifying missJ
assembledregions in thegenome(FiguresS1S2andS3)Althoughthe impactonoverall
imputation accuracy is limited as results in this study show these errorsmight have a
large impact on any analyses relying on specific genomic coordinates (eg genomeJwide
associationstudies)Morecomprehensiveresultsincludingannotationstudiesandwhole
genomesequencingdataarerequiredtoconfirmandinvestigatetheseobservations
Toconcludeonlyasmalleffectofupdatingthereferencegenomeassemblyonimputation
accuracywas observed in Italian Simmental On the other hand as already reported by
other studies the variability of imputation accuracy estimates is much larger across
imputation methods The choice of the most suitable imputation method based on the
breedgeneticbackgrounditscharacteristicsandontheavailabledataisthereforecritical
AcknowledgementsThe research leading to these results has received funding from the European Unions
Seventh Framework Programme (FP72007J2013) under grant agreement ndeg 289592 ndash
93
Gene2FarmPartofthegenotypicdatausedinthisstudywasprovidedbyprojectldquoSelMolrdquo
fundedbytheItalianMinistryofAgriculture(MiPAAF)MMwassupportedbytheDoctoral
SchoolontheAgroJFoodSystem(Agrisystem)of theUniversitagraveCattolicadelSacroCuore
(Italy) FB was supported by the Marie Curie European Reintegration Grant
ldquoNEUTRADAPTrdquo(FP7JPEOPLEJ2009JRG)
ReferencesBrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingandMissingJ
Data Inference for WholeJGenome Association Studies By Use of Localized
HaplotypeClusteringAmJHumGenet811084ndash1097
MaPBroslashndumRFZhangQLundMSSuG2013Comparisonofdifferentmethods
forimputinggenomeJwidemarkergenotypesinSwedishandFinnishRedCattleJ
DairySci964666ndash4677doi103168jds2012J6316
Marchini J Howie B 2010 Genotype imputation for genomeJwide association studies
NatRevGenet11499ndash511doi101038nrg2796
NicolazziELBiffaniSJansenG2013ShortcommunicationImputinggenotypesusing
PedImputefastalgorithmcombiningpedigreeandpopulationinformationJournal
ofDairyScience962649ndash2653doi103168jds2012J6062
NicolazziELPiccioliniMStrozziFSchnabelRDLawleyCPiraniABrewFStella
A 2014 SNPchiMp a database to disentangle the SNPchip jungle in bovine
livestockBMCGenomics15123doi1011861471J2164J15J123
PeiYLiJZhangLPapasianCJDengH2008Analysesandcomparisonofaccuracyof
differentgenotypeimputationmethodsPLoSONE3551
VanRadenPMOrsquoConnellJRWiggansGRWeigelKA2011Genomicevaluationswith
manymoregenotypesGenetSelEvol43
94
Tableamp1ampIm
putationampaccuracyampacrossampassem
bliesampforampPedImputeampFindH
apampFImputeampandampBeagleamppercentageampofampwronglyampim
putedamp
allelesamp
PedImputeamp
FindHapamp
FImputeamp
Beagleamp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
Scenarioamp
Namp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
A1amp
84
20
07
21
07
21
07
32
09
32
09
32
09
15
09
15
09
15
09
24
11
25
11
25
11
B2amp
34
23
08
23
09
22
08
36
11
36
12
36
11
15
07
15
09
14
06
23
08
23
08
23
08
C3amp
13
98
41
98
41
98
41
51
10
52
10
51
10
14
06
14
06
14
06
23
06
23
06
24
06
D4 amp
12
88
49
89
49
88
49
54
11
56
12
54
11
16
11
17
12
16
11
26
15
26
15
26
14
1 Sireandm
aternalgrandsiregenotyped
2 Siregenotypedmate
rnalgrandsireno
tgenotyped
3 Sireno
tgenotypedmate
rnalgrandsiregenotyped
4 Sireandm
aternalgrandsireno
tgenotyped
Tableamp2ampIm
putationampaccuracyampacrossampassem
bliesampforampPedImputeampFindH
apampFImputeampandampBeagleampallelicJR2amp
PedImputeamp
FindHapamp
FImputeamp
Beagleamp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
Scenarioamp
Namp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
A1amp
84
941
20
940
20
941
20
907
24
908
25
907
24
952
26
952
27
952
27
925
33
923
33
925
33
B2amp
34
935
24
934
25
935
24
896
30
896
34
895
32
954
20
954
21
954
20
929
24
928
25
928
24
C3amp
13
759
89
761
88
758
87
848
33
845
30
848
32
955
18
955
19
955
19
931
19
928
19
927
19
D4 amp
12
782
113
777
113
781
112
841
32
832
35
839
33
949
34
947
38
949
35
921
44
919
45
921
44
1 Sireandm
aternalgrandsiregenotyped
2 Siregenotypedmate
rnalgrandsireno
tgenotyped
3 Sireno
tgenotypedmate
rnalgrandsiregenotyped
4 Sireandm
aternalgrandsireno
tgenotyped
95
Supplementaryfiles
YoucanfindChapterͷsupplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter05)orscantheQRcode
SupplementaryFigureS1ErrorpercentageacrossmethodsforBTAU42onBTA1
SupplementaryFigureS2ErrorpercentageacrossmethodsforUMD31onBTA1
SupplementaryFigureS3ErrorpercentageacrossmethodsforBTAU46onBTA1
SupplementaryFigureS4Linkagedisequilibrium(LD)decayoverdistanceinItalian
SimmentalandHolstein
ThedottedlinewithblackdotsindicatesLDdecayoverdistanceinItalianHolstein
whereasthedashedlinewithwhitediamondsindicatesthesamevaluesforItalian
SimmentalItalianHolsteingenotypeswereasubsetofthedatausedinNicolazzietal
(2013)
96
CHAPTER6
QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis
MMilanesi1etal
1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly
____________________Manuscript
Abstract
Outbreed species carry a genetic load of deleterious recessive variants Those
that are lethal survive only in the heterozygous state in the population while
otherssimplyreducethefitnessofindividualshomozygousforthevariantThe
long=termdestinyofthesemutationsistodecreaseinfrequencyduetonegative
selectionandeventuallydisappearbecauseofgeneticdriftHowever theymay
bemaintainedinlivestockpopulationsandevenincreaseinfrequencywhenin
linkage with favourable alleles for traits under selection or occurring in the
genomeofsireshavinghighgeneticvalueforproductivetraitsWesearchedfor
these variants combining exome sequence data from 20 Italian Holstein bulls
sampledfromoppositetailsofthemaleandfemalefertilityEBVdistributionthe
HighDensity(HD)SNPgenotypingof1009progenytestedItalianHolsteinbulls
and information existing in public databases Exome sequencing detected
202956 variants Among these 5167 were classified as ldquodeleteriousrdquo when
annotatedwithSIFTandAnnovarThesewerefilteredbypickingthosemapping
in regions out of equilibrium and lack of homozygotes in the progeny tested
Holstein bulls or having an asymmetric distribution in bulls plus=variant and
minus=variant for fertility EBV The resulting 193 variants in 163 genes were
furtherassessedbycomparativegenomicshaplotypeanalysisandfunctionFor
eachassessmentapositiveornegativescorewasgiventoevidenceinfavouror
against thevariant tobedeleteriousA totalof35variantshadapositive total
score These genes are involved in development cell motility and immune
responseSomeof themareknowntobeexpressed in the testiclesandappear
importantforspermfertilityinmodelspeciesothersareresponsibleforgenetic
defectsinhumanandmousecarryingmutationssimilartotheonesdetectedin
cattleThesegenevariantsarecandidatetobeamongthosethatconstitutethe
ldquogeneticloadrdquoofdeleteriousrecessivegenesintheHolsteinpopulation
100
Introduction
The presence of deleterious recessive variants is well tolerated by outbred
speciesWhenthesevariantsappeartheymayspreadandsurvivealongtimein
the population in the heterozygous state (Simmons and Crow 1977) It is
estimatedthatoutbredindividualshumanincludedcarryonaverageagenetic
loadofmorethan100loss=of=functionvariantsintheirgenome(MacArthuretal
2012)
Theeffectofdeleterious recessivevariantsappearswhendemographicevents
eggeneticbottlenecksreducethegeneticvariabilityofthepopulationorwhen
relatedindividualsareintentionallymatedforbreedingpurposesInthesecases
recessive deleterious reduce the fitness of individuals carrying them in the
homozygous state and in extreme case induce very severe genetic defects
premature death and abortion (McClure et al 2014 Sahana et al 2013
VanRadenetal2011)Theythereforecontributetoexplainthephenomenonof
inbreeding depression and its negative effect on fertility (Charlesworth and
Willis2009)
The search for deleterious recessive variants is particularly interesting in
Holstein cattle Thisbreed is experiencing anegative genetic trend for fertility
traits (Jamrozik et al 2005) it is worth noting that this is occurring while
inbreedingisunderstrictmonitoringandistheoreticallyincreasingataverylow
pace(about005peryear=Mrodeetal2009)Inadditionmatingisplannedto
minimise relationship between animals and the additive genetic variance for
mosttraitsndashincludingmilkproduction=isapparentlynotdecreasinginspiteof
the selection pressure applied by modern schemes (Biffani et al 2002)
indicatingthatoveralldiversityisonlyslightlydecreasing
SelectiontoincreasefertilityischallengingSelectioncriteriagenerallyadopted
to improve this trait are far from the biological mechanisms that govern
reproduction Inaddition thesemechanismsarecomplexandverysensitive to
environmental(management)conditions(Laneetal2013Wathesetal2014)
Asa result fertility traitshave loworvery lowheritability (Pryceetal1997)
andthereforelowresponsetoselectionandslowgeneticprogress
101
Forthesereasons thequest formolecularvariants influencing fertility traits is
on=goingsincetheavailabilityofmoleculartoolsforDNAanalysisAnumberof
investigations tested the effect of functional candidate genes running genome
wide association analyses to find QTLs for fertility traits using hundreds
microsatellite markers in structured families (Ashwell et al 2004) or tens of
thousand SNP panels in whole populations (Aston and Carrell 2009) The
availability of medium density SNP panels allowed the search for deleterious
recessivesevenwithouttheuseofphenotypictraitsbyscanningthegenomein
searchofregionslackingoneofthetwohomozygousclasses(Sahanaetal2013
VanRaden et al 2011) This approach has highlighted a number of genomic
regions likely carrying deleterious recessives This information has immediate
application inmolecular assisted breeding and a relevant scientific interest in
thesearchofcausativevariantsandassessmentoftheirfunctions
Thedevelopmentofnewsequencingtechnologieshasgreatlydecreasedthecost
of sequencing so that nowadays the sequencing of few target individuals is
becoming the gold standard to seek for causal mutations for rare monogenic
defectsinhumansandotherspecies(ChunandFay2009)Oftenonlytheexome
the protein coding part of the genome is sequenced searching for those
mutationsthatseverelyalterproteinstructureandorfunction(Bamshadetal
2011 Li et al 2013) A typical mammalian exome consists in a few tens of
millionnucleotides(around2ofthereferencegenome)comparedtothefew
billionofthewholegenomeThereforewiththesamenumberofreadsexome
sequencinghastheadvantageofamoreaccurategenotypingdueto thehigher
sequencing depth an easier data management lower computation load and
easier interpretation (Bieseckeretal2011Majewskietal2011) Indeed the
numbervariants found inexomesareone=twoordersofmagnitude lowerthan
thatdetectedinwholegenomesandmutationsaredetectedinannotatedgenes
orimmediatelynearbyOntheotherhandtheexomeapproachcanonlyidentify
variantson annotatedgenesVariants inunknownexonsor that occur innon=
coding regulatory features will be missed (Bamshad et al 2011) In humans
these are turning out to be of outmost importance for gene regulation and
downstreamphenotypic effects (ENCODEConsortium2012) but they are still
verypoorlyannotatedinlivestock
102
Herewesequencedtheexomesof20Holsteinbullsprogenytestedselectedfrom
the tails of male and female fertility EBVs (Estimated Breeding Value) Since
livestockfertilityisacomplextraitandcausativemutationscannotbesoughtby
the approach used in monogenic diseases we integrated the sequence
information with high density SNP chip data obtained on a much larger
populationofItalianHolsteinbullstodetectgenescarryingallelescandidateto
be deleterious recessives and confirm their likely deleterious effect by
comparativegenomics
Materialsandmethods
1SamplingandDNAextraction
Semen(straws)andbloodsampleswereprovidedbyItaliancertificatedartificial
insemination centres (UE Directive 88407CEE) thus ethics committee
approvalforthisstudyisnotrequired
A total of 20 ItalianHolstein progeny tested bullswere sampled for complete
exome sequencing Among them 19were chosen from the tails of female and
male fertilityEBV(EstimatedBreedingValue)distribution (Table1) tohave5
animals fromeach tail One animalwas to be in theminus=variant tail of both
traitsBullswereselectedasunrelatedaspossibleSamplingwaspossiblethanks
tothecollaborationwithANAFI(ItalianHolsteinBreederAssociation)
ANAFIdevelopedageneticevaluationfor fertilitycombiningdifferenttraitsas
explained by Biffani and colleagues (Biffani et al 2005) The traits are ldquodays
from calving to first inseminationrdquo ldquocalving intervalrdquo ldquofirst=service non return
rateto56daysrdquoldquoangularityrdquoandldquomatureequivalentmilkyieldat305daysrdquo
MalefertilityisexpressedasanEstimatedRelativeConceptionRate(ERCR)and
isameasureofconceptionrateofaservicesirerelativetoservicesiresofherd
mates(Soggiuetal2013)
103
Table1FemaleandmalefertilityEBVsofthe20animalssequenced
Animal EBVFemaleFertility EBVMaleFertility GroupfriA01 97 =111 NAfriA02 98 =166 MalLfriA03 115 0 FemHfriA04 114 0 FemHfriA05 114 0 FemHfriA06 86 0 FemLfriA07 98 195 MalHfriA08 105 213 MalHfriA09 95 =154 MalLfriA10 95 149 MalHfriA11 102 264 MalHfriA12 91 =475 FemLMalLfriA13 105 =179 MalLfriA14 102 207 MalHfriA15 99 =405 MalLfriA16 85 0 FemLfriA17 87 0 FemLfriA18 121 0 FemHfriA19 117 0 FemHfriA20 88 0 FemL
A total of 1009 Italian Holstein bulls were chosen from the entire population
balancing theirpresence in the tailsof theEBV forkgprotein (PROT) somatic
cell count (SCC) fertility (FERT) and functional longevity (LONG) for high
densitygenotyping(TableS1)
DNAwasisolatedfromwholebloodandsemenstrawssamplesinLGSlaboratory
(CremonaItalywwwlgscrit)usingtheGenomixkitaccordingtoLGSinternal
protocolDNAsforexomecaptureandsequencingwereextractedindependently
tosatisfy therequirement indicatedby IGA technologies (Udine Italy)at least
12mgofDNAataconcentration100ngulwithR260280higherthan18and
R260230near19
2Moleculardataproductionandqualitychecking
21Exomecaptureandsequencing
104
Exomes were captured using the ldquoSureSelected All exon Bovinerdquo (Agilent
Technologies) kit in early version Probes coordinates are based on UMD31
bovinereferencesequence(Ziminetal2009)anddesignedagainstNCBIexons
as well as predicted exons UTRs and miRNAs Number and distribution of
probesperchromosomearedetailedinTableS2
rdquoOvation Ultralow Library Systemsrdquo protocol was followed for library
preparation After quantification with Qubit (Life technologies) and quality
controlwith2100Bioanalyzer(AgilentTechnologies)DNAwassonicatedwith
Bioruptor (Diagenode) to obtain fragments between 150 to 200 bp in length
blunt end=repaired adenylated ligated with paired=end adaptors and purified
followingtheprotocolprovidedbythemanufacturer
LibrarieswerehybridizedwithldquoSureSelectedAllexonBovinerdquoprobesusingthe
Encore Target Capture Module (Nugen) SureSelect Target Enrichment and
Herculase II FusionEnzymewithdNTPCombo (AgilentTechnology) kit Then
capturedDNAwasamplifiedandtaggedwithlibraryspecificindexesQualityof
capturedsampleswastestedwithQubitand2100Bioanalyzer
Paired=end 2X100bp sequencing was performed on Illumina HiSeq2500
(IlluminaSanDiegoCA)atanexpectedcoverageof50X
22Sequencealignment
Adapters and reads shorter than 36bp were trimmed using Trimmomatic
(version032)(Bolgeretal2014)ForeachFastqfilethealignmentandpairing
of the reads to theUMD31 reference sequencewas performedwithBurrows=
WheelerAligner(BWA)(version077=r441)(LiandDurbin2009)applyingalso
trimmingoption (=q5) SAM files resultedwere converted intoBAM file using
SAMtools (version0118) (Lietal2009)Foreach individualBAM fileswere
merged in a single one with Picard (version 1111)
(httpsgithubcombroadinstitutepicard)Duplicatereadswereidentifiedand
markedwithPicardFinallyindividualBAMfileswereindexedwithSAMtools
105
Sequencedepthwasevaluatedsimultaneouslyonall20sequencedanimalsand
onlyonregionstargetedbyprobesusingDepthOfCoverageofGATK(McKennaet
al2010)
23ExomeVariantcallingandqualitycheck
Singlenucleotidepolymorphisms(SNPs)insertionsanddeletions(InDels)were
calledsimultaneouslyonall20sequencedanimalsandonlyonregionstargeted
byprobesusingUnifiedGenotyperofGATK(DePristoetal2011)
All variantswere filteredusingVariantFiltration ofGATK to removeoneswith
lowmappingandorsequencingqualityThechoiceoffiltersandthresholdswas
made in accordance to ldquoGATK best practicerdquo for exome sequencing and
distributionoffiltervalues
bull ldquoSNPclusterrdquoexcludedvariantifmorethan3variantswerefoundin10
bp
bull ldquoBaseQualityRankSumTestrdquoexcludedifitrsquoslowerthen=5ormorethan
5
bull ldquoMappingQualityRankSumTestrdquoexcludedif itrsquoslowerthen=7ormore
than6
bull ldquoRead Position Rank Sum Testrdquo excluded if itrsquos lower then =4 or more
than4
bull ldquoFisherStrandrdquoexcludedifitrsquosmorethan30
bull ldquoDepthrdquoexcluded if itrsquos lower then60X(ieaminimumvalueof3Xper
animals)ormorethan3500X
bull ldquoQualityrdquoexcludedifitrsquoslowerthen40
bull ldquoMappingqualityrdquoexcludedifitrsquoslowerthen30
Filters were applied to all variants but only bi=allelic SNPs were used in the
following analyses Genotypes with ldquoGenotype Depthrdquo lower than 5 and
ldquoGenotype Qualityrdquo lower than 20 were discarded The remaining ones were
converted intoldquoABrdquocode(A=referencealleleandB=alternativeallele)Then
data were transformed in PLINK (Purcell et al 2007) format using a home=
made python script To prepare the working dataset the final quality control
106
removedvariantswithmore than25missingdata (that iswithmore than5
animals out of 20 without genotypes) monomorphic (minor allele frequency
MAFlt1)ornotmappinginautosomesBasicpopulationgeneticsparameters
werecalculatedtoevaluatethepresenceofcrypticstructureinthedatathatmay
influencetheoutcomeofthedownstreamanalyses
In total 24390 SNPs identified in the exomes were also included in the HD
SNPchip panel HD genotype from nineteen out to twenty of the sequenced
animals sequenced was used to compare technologies Genotypes at common
locationswerecomparedacrosstechnologies toevaluatesequencequalityand
theeffectofsequencedepthongenotypecalling
24BovineHDGenotypingBeadChipdataandqualitycheck
AllanimalsweregenotypedusingtheBovineHDGenotypingBeadChip(Illumina
San Diego CA) Genotypes were quality controlled and filtered with standard
threshold Markers with more than 5 missing data and MAF lt 1 were
excluded Animals with more than 5missing data were also excluded Only
autosomal SNPs were retained Thereafter genotypes were phased and the
remainingmissingdata imputedusingBeaglev332(BrowningandBrowning
2007)
Ofthe20animalssequenced12hadalsoBovineHDGenotypingBeadChipand7
BovineSNP50 Genotyping BeadChip version 2 genotypes available For
concordance and haplotype analysis we imputed the 50K data to 800K as
previouslydescribedabove
3Strategyfordetectingcandidatedeleteriousvariants
Wesearchedforcandidatedeleteriousvariantsfollowinganintegratedapproach
that joined the analysis of i) exomes of a few Italian Holstein bulls having
plusvariant and minusvariant EBVs for a target trait ii) HD genotyping of a
population of 1009 progeny tested Italian Holstein bulls iii) the comparative
genomic analysis of 100 vertebrate species The strategy followed for the
107
analysisisshowninFigure1Inthefirststep(Search)welookedforcandidate
deleterious variants in the exomes In step2 (Filter)we filtered candidatesby
assessingwhichof thesevariantseitherhadanon=randomdistributionamong
animals having extreme EBVs or co=mapped with genomic regions carrying
outliersignals inthelargepopulationofprogenytestedbulls Instep3(Asses)
candidateswere further evaluated by i) assessing their conservation across a
panel of 100 vertebrate species ii) searching for associated haplotypes and
evaluating their frequencies in the population of progeny tested bulls iii)
evaluatingtheirassociationwithgeneticdefectsinhumanandmodelspeciesiv)
examining their expression profile and function when known Methods are
belowdescribedindetail
Figure1Strategyfollowedtoidentifydeleteriousvariants
108
31Step1Searchforcandidatedeleteriousvariants
Quest for candidate deleterious variants in exome sequence data by
variantannotation
WeannotatedallvariantsusingANNOVAR(Wangetal2010)andVEP(Variant
Effect Predictor) (McClure et al 2014) software They use the Ensembl gene
annotation database (UMD31 reference sequence release 74) to investigate
locationandpotentialeffectoncodingsequencesofallvariantsVEPintegrates
the SIFT algorithm (Kumar et al 2009) to predict the effect of amino acid
substitutiononproteinfunctionForeachsequencechangesequencehomology
and physic=chemical similarity between the reference amino acid and the
alternative is considered providing a score that evaluates the mutation
tolerabilityVariantswithscorelowerthan005wereclassifiedasldquodeleteriousrdquo
inaccordingtosoftwarerecommendations
SIFTalsocomparedallvariants found in theexomeswith thosepresent in the
dbSNPdatabase(version140)toidentifythosethatwerenovel
32Step2Filteringofcandidatedeleteriousvariants
2a Analysis of deleterious variant distribution among animals having
extremesEBVvaluesformaleandfemalefertilitytraits
Association between variants and the classification in the tails of the EBV
distributionwas estimated byFisherexact testwhich compares frequencies of
alleles (a vs A) in the two tails not assumingH=W equilibriumgenotypic test
that compares the frequency of genotypes (aa vs Aa vs AA) and the recessive
geneactiontestwhichtestsforthedistributionofonehomozygousclassversus
theothers(aavsaAAA)AlltestsareimplementedinPLINKfunctionsThe5
significance threshold was set empirically by permutation (N=100000) either
point=wise that is not corrected for multiple testing (EMP1 = Empirical
Probability1)andthereafterfamily=wisecorrectedformultipletesting(EMP2=
EmpiricalProbability2)
109
GenesexceedingEMP1were considered suggestive and thoseexceedingEMP2
significant candidates to be deleterious recessives and their characteristics
conservationacrossspeciesandfunctionfurtherinvestigated
2b Quest for regions candidate to carry deleterious variants in the
populationanalysedwiththeHDBeadChip
Basic population genetics parameters and Multi Dimensional Scaling were
calculated with GenABEL R package (Aulchenko et al 2007) to evaluate the
presenceofcrypticstructure inthedatathatmayinfluencetheoutcomeofthe
downstreamanalyses
ExpectedandobservedallelicandgenotypicfrequenciesandFisherexacttestof
Hardy=Weinberg proportions (Janis E Wigginton 2005) were calculated with
PLINKon theHDSNPdatasetValueswere averaged in slidingwindowsof10
markersThisvalue(10markers)waschosenaftertestingdifferentwindowsize
as represent a good compromise between noise reduction and resolution (not
shown)
Candidate regions containing deleterious mutations were identified following
twodifferentapproaches
In one approach hereafter refereed to as ldquoOut of Hardy=Weinbergrdquo approach
(OHW)markerssignificantlydeviatingfromofHardy=WeinbergequilibriumatP
le84710=8 (nominalvaluePle005Bonferronicorrected formultiple testing)
werefirstidentifiedWindowswithatleastthreesignificantmarkersbelowthis
thresholdandhavingahomozygotedeficitwereconsideredcandidatestocarry
deleterious mutations In this OHW approach we applied very stringent
constraints to reduce false positive discoveries due to the Wahlund effect
inducedbypopulationsubstructure(seeresultsandFigureS5)
In thesecondapproachhereafterreferred toas ldquoLackOfHomozygotesrdquo (LOH)
approach the differences between observed and expected frequencies of
homozygotesfortheminorallelewerecomputedandaveragedacrossmarkers
included in each sliding window Values were standardized and only regions
carrying three overlapping windows below =3Z score value were considered
110
candidates to carry deleterious mutations Thereafter significant windows
carryingatleastonemarkerincommonwerejoinedtodefineregionboundaries
ExomesequencingknowledgeandHDBeadChipdatawerecrossed inorder to
identifygenescarryingseverevariantslocatedwithinsignificantregionsortheir
proximity (50Kb upstream or downstream) These genes were considered
candidates to be deleterious recessives and their characteristics conservation
acrossspeciesandfunctionwasfurtherinvestigated
33Step3Assessmentofcandidatedeleteriousvariants
Genes carrying variations classified as deleterious by SIFT or ANNOVAR and
eitherassuggestivebyfertilityEBVtailsanalysisorassignificantbyOHWeLOH
analyses of the Holstein bull population were submitted to comparative
genomics population genetics and functional analyses In order to prioritize
genesvariants discoveredwe scored each evaluation using a subjective scale
ranging from =3 to 3 to measuring the evidence in favour (scores 1 to 3) or
against(scores=1to=3)thedeleteriouseffectoftheSNPsThescore0inallcases
indicated the absence of evidence or that a specific assessment failed The
completescorescalethereforerangedfrom=9to9
3aComparativegenomics
For comparative genomics purposes the position of eachmutationwas ldquolifted
overrdquo from UMD31 bovine coordinates to hg19 human coordinates using the
specifictoolatUCSC(Kentetal2002=httpgenomeucscedu)Orthologous
proteins were aligned using the UCSC genome browser and amino acid
conservation assessed in a set of 100 vertebrate species that included human
primates (11 species) euarchontoglires (egmouse14 species) laurasiatheria
(egsheep25species)afrotheira(egelephant6species)othermammals(eg
platypus5species)birds(egchicken14species)sarcopterygii(egturtle8
species) fish (eg zebrafish 16 species) Results of amino acid conservation
analysiswererecordedonlyifhomologousproteinfromatleast50speciescould
beretrievedandalignedThereafteraminoacidconservationwasscored3ifthe
111
aminoacidwasconservedinallvertebrates2ifconservedinallmammalsand1
ifconservedinthelargemajority(atleast90)ofmammalsAscoreof=1was
given when the amino acid was not conserved among species and a score =3
when both variants observed in cattle were carried by other species Finally
wheneithertheliftoverprocessortheproteinalignmentfailedthecomparison
wasconsiderednoninformativeandgivenascore0
3bSearchofhaplotypesassociatedtodeleteriousvariantsandanalysisof
HDpopulationdata
To acquire further evidence we searched for the haplotypes that carry the
deleterious variations and estimated their frequency in the
homozygousheterozygousstatuswithinthebullpopulationgenotypedwiththe
HDBeadChip
Weusedamultistepapproachtofindhaplotypescarryingdeleteriousvariantsi)
from the phased HD BeadChip data we extracted the haplotypes of the 20
sequencedanimalsintheregionofthevariant(100markersoneachside)ii)in
these 20 animals we searched the shorter haplotype able to distinguish
homozygous andor heterozygous animals for the deleterious variant from all
other haplotypes iii) we assessed the frequency of homozygotes and
heterozygotes for thehaplotype in thewholepopulationofbulls characterised
withtheHDBeadChip
Scoresinthiscasewere=3inthecaseofuniquehaplotypesshowinganumberof
homozygotesnonsignificantlydifferentfromthenumberexpectedfromHardy=
Weinbergproportions3inthecaseofuniquehaplotypesshowinganumberof
homozygotes significantly lower than expected 1 in the case of unique
haplotypesneverhomozygousandrare0whenevernouniquehaplotypecould
befound
3cFunctionalanalyses
With PantherDB (Mi et al 2010) resources we classified our gene list into
functionalcategoriesaccordingtotheGeneOntologyclassificationAllsuggestive
112
andcandidateEnsemblGeneIDsubmittedtothebulkanalysiswererecognizedand annotated Beyond the GO term characterization in order to understandpotentialdetrimentaleffectsofmutationsandalsopinpointpotentialcandidatesfor fertility traits we retrieved information for each gene according to thefollowing parameters i) the involvement in known Mendelian or complexdiseases fromOMIMdatabase ii) theavailabilityofknock=outmice iii) and theeventual expression in reproductive tissues from the Gene Expression Atlas(httpwwwebiacukgxa)(Table4)
Alsoforthesedataweusedthesamescaleattributingarangefrom=3pointstogenes having no relation with fertility viability and genetic defects and norelevant phenotypic effects in knock=out mouse models to 3 to gene closelyrelated to fertility and viability associated to genetic defects in human andaffectingfertilityorsurvivalwhenknocked=outInparticularascoreranging=1to+1wasattributedtogenesassociatedtohumansyndromes(=1=noassociationrecordedinOMIMdatabase0=noinformationinOMIM1=associationinOMIM)A score with same range to information collected from knock=out mousephenotypes(=1=knock=outmouseshowsnoimpairedphenotype0=noknockoutmouse has been produced 1= knock=out mouse shows impaired phenotype)Thesamescorewasattributedtomolecularfunctions(=1=functionunrelatedtofertilityandviability0=functionindirectlyrelatedasimmunityandmetabolism1=functiondirectlyrelatedasfertilityanddevelopment)
Results
Exomecaptureandsequencing
A totalof225414probeswereused for capturing thecattleexome (5355Mbabout 2 of UMD31 reference genomes) Mitochondrial (MT) and Ychromosome were not included in probe design The number of probes perchromosomespannedfrom2500onBTA27toalmost12000inBTA3(TableS2)
113
The average exome sequencing depth was 4058 slightly lower than the
expected50XAllanimalshadatleast24Xaveragegenomessequencedbutina
same animal the sequencing depth varied across chromosomes and regions
withinchromosomesThelowestchromosomeaveragevalueobservedwas10X
inBTA25oftheshallowersequencedanimalAveragecoverageperanimaland
chromosomeandoverallmeansareshowninFiguresS1
In total1613461402 raw100bppair=ends sequence readswereproducedA
total of 1425514112 100bp reads were mapped and properly paired to the
reference genome Paired reads per animal ranged between 41973510 and
138840392
Insynthesissequencingdepthwasvariableasexpectedandalmostallanimals
hadsomeregionsofscarceornosequencedepthbutoverallallexomeregions
were sufficiently covered to permit a comprehensive analysis of molecular
variantsoftheanimalsinvestigated
Step1Searchforcandidatedeleteriousvariants
Variantdetectionandannotation
Thesequencingdepthofallvariantscalledonthealignmentofallreadsacross
animalsfollowedaChi=squaredistributionAveragedepthwas79694spanning
from2Xto5000Xwithpeakfrequencyclassat150=250X(FigureS2)
A multistep approach was followed to clean the exome sequence dataset
Cleaning process steps and the number onmarkers retained in each step are
describedinTable2
114
Table2Typeandnumberofmarkersdetectedandretainedafterdatasetcleaning
TypeofvariationNumberofvariations(N)
Rawdata
Aftervariantcleaning
Aftergenotypecleaning(onlyautosomal)
Complex 1213 963 0Deletion 6938 5031 0Insertion 5510 3656 0MultiallelicComplex 5710 3995 0MultiallelicSNP 803 512 0SNP 242988 203079 164277Total 263162 217236 164277whencomparedtothereferencegenome
Atotalof263162variantswereidentifiedAmongthese217236(8255)wereretainedafterapplyingthevariantfilters(seematerialsandmethodssection)
According to GATK classification the vast majority (9348) of variationsidentified were biallelic SNPs followed by insertions and deletions (InDels40) more complex variations and multiallelic SNPs (252) Indels andcomplexvariationsalthoughlessfrequentthanSNPsaffectedarelevantportionoftheexome(49589bp)Thenumberofvariantsdetectedperchromosomewassignificantly correlated to the number of probes used (r=08761P=228x10=10)andofbasescaptured(r=08424P=53x10=19)perchromosomehoweverBTA23BTA27 BTAX had an outlier behaviour the former two containing highervariation(450and445variantsperKb)andthelatterlower(084variantsperKb)thantheaverage(322variantsperKb)
Bialleic SNPswere further filtered for genotype quality (sequence depth gt 5Xandsequencequalitygt20)andBTAXmarkersThefinalautosomalSNPdatasetcounted164277SNPsAmongthese17829biallelicSNPs(109)describedforthefirsttimehavebeensubmittedtodbSNP
Variantsrsquo annotation is shown in Table 3a Variantswere detected in differentgenedomains(exonsintrons3rsquountranslatedregions)aswellasupstreamanddownstreamgenesMorethan31ofthevariationsweredetectedinexonsandamong them more than 41 have the potential to have an effect on genefunctionbyaffectingsplicingoralterproteinstructure(Table3b)
115
Table 3 A) number of autosomal SNPs annotated in each gene region B) number and
classificationofautosomalSNPsalteringgenefunction
A)
Generegion NofSNPs(autosomes)Downstream 1170Exonic 51293Intergenic 49901Intronic 58116ncRNA_exonic 160ncRNA_intronic 3ncRNA_splicing 2Splicing 184Upstream 1445Upstreamdownstream 54UTR3 1345UTR5 604Total 164277
B)
EffectofexonicSNPs NofSNPs(autosomes)Nonsynonymoussinglenucleotidevariation 20839Stopgainsinglenucleotidevariation 213Stoplosssinglenucleotidevariation 10Synonymoussinglenucleotidevariation 30226Unknown 5Total 51293
whencomparedtothereferencegenome
Amongthe51293exonicSNPs4166in2978geneswereclassifiedldquodeleteriousrdquo
by SIFT Among these 3210 in 2356 genes were observed at least twice
ANNOVARidentified223stopgain(213)orloss(10)variantsin217genes
Theaverageconcordanceat24390SNPs in commonbetweenexomesand the
HD Beadchip was 9716 (Table S3) Discordance was due to either alleles
detected by sequencing andnot detected by genotyping (exome=heterozygote
Beadchip=homozygote) and vice=versa (exome=homozygote
Beadchip=heterozygote) Discordances were randomly distributed among
markers and inversely correlated to sequencing depth (Figure S3) indicating
that most errors are likely occurring during sequencing even if occasional
116
Beadchip failure indetectingsomeallelecannotbeexcludedTwoanimalshad
anoutlierbehaviouronehaving846andthesecond719concordanceBoth
animalswereretainedforvariantdiscoverywhiletheywereexcludedfromthe
analysesofEBVtailsWithoutthesetwooutlieranimalsthemeanconcordance
increasesto9938
Multidimensional scaling confirmed the absence of a substructure in animals
sequenced(FigureS4)
Step2Filteringofcandidatedeleteriousvariants
AnalysisofEBVtails
Fisherexact test isgenerallyused to test for theassociationbetweenavariant
and a Mendelian disease Samples are divided in cases and controls and the
association between alleles or genotypes and the disease is tested under
different inheritance models Here we used the same method to test for
association between allelesgenotypes and animals from opposite tails of the
EBVdistributionfori)malefertility(N=4high+4low)ii)femalefertility(N=5
high+4low)andiii)maleandfemalefertility(N=9high+8low)
Genotypes were tested under all inheritance models implemented in PLINK
Giventhelownumberofsamplesnovariantwassignificantaftercorrectionfor
multiple comparisons (EMP2) and only suggestive variants could be found
(significant at 5 at EMP1)We considered suggestive of carrying deleterious
allelesalistof101genescarrying112variants(TableS4)
Detectionofregionscarryingdeleteriousmutationfrom800KSNPdata
After applying quality control parameters 26653 and 146991 markers were
removed from the original dataset for excessive missing data and MAF
respectivelyInaddition17individualsremovedforlowgenotypingrateAswith
exomes only autosomal SNPswere retained for the analyses thus obtaining a
workingdatasetof590680SNPsand992animals
117
MDS(FigureS5)analysis indicates that thedatasethassomesubstructure that
mightaffectdownstreamanalysesWeaccountedfortheriskoffindingmarkers
outofHardy=WeinbergproportionsduetotheWahlundeffectbychoosingvery
stringentsignificancethresholdintheOHWanalyses
To detect regions candidate to carry deleterious recessivemutations we used
two approaches the ldquoOutside Hardy Weinbergrdquo (OHW) and the ldquoLack Of
Homozygotesrdquo (LOH) approaches (see materials and methods) The two
approaches complement each other in the detection of regions carrying less
homozygotes than expected OHW identifies regions in which a few markers
have extreme homozygote deficit while LOH detects regions in which many
markershaveamoderatedeficit
In the SNP dataset 1884 markers were significantly out of Hardy=Weinberg
equilibrium(Ple84710=8)Atotalof485windowswereidentifiedwithatleast
3 markers out of 10 exceeding this threshold Among these 84 contained
markerssignificantbecauseofhomozygotedeficitThese84windowsidentified
11OHWcandidateregionsin9chromosomes(TableS5)
TheLOHapproachidentified2335windowsbelow=3Zscoreofthestandardised
differencebetweenobservedandexpected frequencyofhomozygotes forMAF
Theseidentified326LOHcandidateregionsspreadonallautosomes(TableS6)
Onlyregionsformedbyatleast3singlewindowswereretainedreducingLOH
candidateregionsto240
Deleterious variants (SIFT deleterious and stop codon gainedlost) were
intersectedwithsignificantOHWandLOHregionSevenofthesemappedwithin
OHW sliding windows 36 within LOH sliding windows Among these 5 were
located in regions significant in both OHW and LOH Other 51 deleterious
variants mapped in the immediate vicinity (+=50Kb) of significant windows
These 83 variants affect 66 unique genes candidate to carry deleterious
recessivealleles(TableS7)
Five regionswere common toOHWandLOHoneonBTA4 (LOH=74959610=
75151417 OHW=74970653=75151417) two on BTA7 (LOH=9626435=
10099512 OHW=9628735=10099512 LOH=10575691=11171132
118
OHW=10486424=11087441) one on BTA15 (LOH=80299924=80928311
OHW=80299924=80921274) and one on BTA18 (LOH=28122373=
28185525OHW=28106022=28164693)
Step3Assessingcandidatedeleteriousvariants
After the two filtering step previously described a total of 191 SNPs were
retained out of the 3433 classified as deleterious by SIFT and Annovar and
analysed further These are carried by 163 genes Eighteen genes carried 2
candidateSNPsthreegenes3SNPsandonegene5SNPsInaddition4SNPsin
four geneswere identified as potential candidates in both the analysis of EBV
tailsandofbullpopulation
A totalof12SNPs introducedstopcodonswhile the remaining inducedamino
acid substitutionsor splicing variants In the filtereddataset theproportionof
stop codons on total variation (63) was only slightly higher than the
proportion found in the entire dataset (51 225 stop codon out of 4391
deleteriousSNPs)
Comparativegenomics
Proteinmultiplealignmentwassuccessfulin146outof191comparisonsIn26
casesthemostfrequentaminoacidincattlewasveryhighlyconservedeitherin
allvertebrates(N=9)orinallmammals(N=17)Inother23casestheaminoacid
was conserved in at least 90 of mammalian species In the last class were
generally included amino acids that changed in mammalian species
phylogenetically verydistant from cattle (eg platypus and related species) In
97 instances the amino acid could change quite freely and occasionally both
variantsobservedincattlewerecarriedbyotherspecies
HaplotypeanalysisofHDpopulationdata
The search for haplotypes associated to the191 variants for further testing in
the population identified 101 unique and invariant haplotypes common to all
119
carriersofatargetvariantanddifferentfromallthosefoundinnoncarriersIn
additional36casesanoncompletelyinvarianthaplotypecouldbeassociatedto
themutationConverselyin54instancesareliablehaplotypecouldnotbefound
(table S8) In total 38 haplotypes had no homozygous individuals in the
population Thiswas expected in the case of 33 rather rare haplotypes (fewer
than 100 heterozygotes observed in the population) Conversely 5 haplotypes
were rather frequent (from 170 to almost 300 heterozygotes observed in the
population)andtheexpectationwastofind78to222homozygousindividuals
in thepopulation insteadof thenone foundAnothervarianthadaremarkable
high frequency (35) and a clear outlier behaviour Only 11 homozygous
individualswereobservedoutofthe121expected
Functionalanalysisofcandidategenes
To assess the potential deleterious effects of the variants identified we also
retrieved functional information for each gene according to the following
parametersi)associationtoknownMendelianorcomplexdiseaseslistedinthe
OMIM database ii) availability and effect of gene knock=out in mice iii)
expression profileswith particular attention in reproductive tissues from the
GeneExpressionAtlas(Table4)
Twenty genes out of 163 have a recognized phenotype in humans These are
mostlyseveresyndromesUnfortunatelyonlyafewgeneshaveanull=miceline
producedwithphenotypicdataavailable
According to the expression data available from Ensembl these genes are
generally (with someexceptions)expressed inavarietyof tissueswitha clear
tendencytobehighlyexpressedintestis
120
Tableamp4Informationon163genesanalyzedconcerningeventualphenotypesinmanorknockoutsinmouseHeaderampcodesAbrainBcolonCheartDkidney
EliverFlungGskeletalmuscleHspleenItestisColoursrepresentthegenelevelofexpressionindifferenttissuefromlow(whitegrey)tohigh(blue)NCin
FunctioncolumnmeansldquoNotClassifiedrdquo
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000000079
CCSAP
Cells
9
1
03
1
07
2
06
2
2
NC
ENSBTAG00000000149
USP40
Cells
6
7
2
7
5
7
2
8
9
fertility
ENSBTAG00000000414
FUT5
Nomatch
37
1
cellular
process
ENSBTAG00000000918
ERMARD
Periventricular
nodularheterotopia
6Xlinked
periventricular
heterotopia6q
terminaldeletion
syndrome
Periventricular
nodularheterotopia
Nomatch
5
4
4
9
6
7
4
9
17
cellular
process
ENSBTAG00000000998
SLC39A6
Cells
6
9
2
7
4
10
1
10
9
cellular
process
ENSBTAG00000001133
VWA8
Mice
2
3
6
7
10
3
3
4
4
blood
metabolism
ENSBTAG00000001140
IAH1
httpswwwmousephenotypeor
gdatagenesMGI1914982sect
ionassociations
BOTHSEXadiposetissue
behaviourneurological
growthsizebodyFEMALE
homeostasismetabolismMALE
limbsdigitstailskeleton
9
18
30
60
23
19
12
17
30
cellular
process
ENSBTAG00000001262
IRGQ
Cells
6
1
5
3
06
2
5
04
2
immunity
ENSBTAG00000001286
ELMO3
httpswwwmousephenotypeor
gdatagenesMGI2679007sect
ionassociations
Decreasedstartlereflex
02
27
14
7
7
02
1
fertility
ENSBTAG00000001291
OR8H3
Nomatch
fertility
ENSBTAG00000001500
FIGNL1
Cells
08
2
01
05
07
08
04
2
3
cellular
process
ENSBTAG00000001505
GRK4
Cells
10
7
09
4
6
3
2
6
218
signalling
ENSBTAG00000002220
SPTLC1
Hereditarysensory
andautonomic
neuropathytype1
No
17
27
3
28
12
28
5
19
6
metabolism
ENSBTAG00000002288
NT5DC4
Nomatch
02
cellular
process
ENSBTAG00000002289
NLRP14
No
04
cellular
process
ENSBTAG00000002660
CCDC11
Situsinversustotalis
Nomatch
3
1
06
2
07
3
03
2
54
fertility
121
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000002809
CAPN10
AutosomalRecessive
MentalRetardation
DIABETESMELLITUS
NONINSULIN
DEPEND
ENT1
Cells
2
13
41
33
29
immunity
ENSBTAG00000002966
DNAJC13
Youngadultonset
Parkinsonism
No
5
52
94
62
84
fertility
ENSBTAG00000003231
LIM2
Pulverulentcataract
Cells
01
02
01
02
cellular
process
ENSBTAG00000003508
CFAP69
httpswwwmousephenotypeor
gdatagenesMGI2443778sect
ionassociations
Abnormalcirculatinginsulinlevel
MaleInfertilityDecreasedmean
plateletvolum
eIncreasedleanbody
mass
203
07
1
01
7NC
ENSBTAG00000003523
UGT2B15
Nomatch
34
metabolism
ENSBTAG00000004081
FAT3
No
2
03
01
01
05
signalling
ENSBTAG00000004555
LRP2
DONN
AIBARROW
SYND
ROME
Intellectualdisability
Cells
56
3
01
disease
ENSBTAG00000004772
THEM
4
Cells
8
16
295
27
52
63
metabolism
ENSBTAG00000004860
SLC27A6
Cells
6
02
10
602
207
01
immunity
ENSBTAG00000005021
SEMA5A
Monosom
y5p
Cells
9
04
89
13
08
08
6immunity
ENSBTAG00000005170
GPR114
Mice
101
02
01
09
6
01
fertility
ENSBTAG00000005248
OFCC1
No
01
01
03
01
01
01
03
NC
ENSBTAG00000005340
SERGEF
Cells
7
10
35
15
58
23
cellular
process
ENSBTAG00000005357
VPS11
Mice
14
13
619
717
815
15
cellular
process
ENSBTAG00000005466
C8H8orf58
Nomatch
05
01
08
02
103
02
04
NC
ENSBTAG00000005514
TRIO
Cells
17
66
10
210
88
7immunity
ENSBTAG00000005633
RGNEF
httpswwwmousephenotypeor
gdatagenesMGI1346016sect
ionassociations
Vertebralfusion
36
214
05
801
08
1cellular
process
ENSBTAG00000006021
CEP250
Mice
3
22
43
41
88
metabolism
ENSBTAG00000006027
USP34
Mice
11
63
74
88
912
fertility
ENSBTAG00000006532
CDK5RAP2
Autosomalrecessive
primary
microcephaly
httpswwwmousephenotypeor
gdatagenesMGI2384875sect
ionassociations
skeletonimmunesystem
growthsizebodyhem
atopoietic
system
5
34
92
74
784
developm
ent
ENSBTAG00000007001
SLC26A1
No
03
01
02
32
903
01
07
metabolism
ENSBTAG00000007340
LOC101906349
Nomatch
NC
ENSBTAG00000007568
MEPE
Cells
cellular
process
ENSBTAG00000007910
EXOC7
Cells
13
24
23
22
12
22
15
22
28
cellular
process
122
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000008650
CAMK1D
Cells
24
2
01
06
1
2
03
7
3
signalling
ENSBTAG00000008797
Uncharacterized
Nomatch
09
04
07
04
NC
ENSBTAG00000008871
DDX4
Cells
02
01
01
132
cellular
process
ENSBTAG00000008996
SFI1
Mice
2
1
06
3
04
3
2
5
NC
ENSBTAG00000009014
UPK1B
Mice
2
2
2
5
4
4
2
7
3
fertility
ENSBTAG00000009030
NOX5
Nomatch
04
01
8
01
immunity
ENSBTAG00000009033
TEX37
httpswwwmousephenotypeor
gdatagenesMGI1921471sect
ionassociations
Abnormalstartlereflex
03
83
NC
ENSBTAG00000009064
CSF2RB
Congenitalpulmonary
alveolarproteinosis
Mice
01
2
04
06
2
23
03
17
05
signalling
ENSBTAG00000009144
BPIFA2A
Nomatch
6
06
05
metabolism
ENSBTAG00000009337
PNMAL1
Cells
8
05
01
3
02
03
03
04
3
Immunity
ENSBTAG00000009444
SLC15A5
Cells
01
cellular
process
ENSBTAG00000009568
KANK2
Cells
1
17
30
9
3
31
9
10
3
cellular
process
ENSBTAG00000009569
DOCK6
AdamsOliver
syndrome
Cells
2
5
10
12
5
15
5
6
21
metabolism
ENSBTAG00000009836
CHGA
Mice
140
555
01
2
02
01
07
1
metabolism
ENSBTAG00000010522
Uncharacterized
Nomatch
01
02
02
03
2
metabolism
ENSBTAG00000010601
DOLK
Familialisolated
dilated
cardiomyopathy
No
4
8
6
22
9
6
5
5
12
metabolism
ENSBTAG00000010847
FOXN4
Cells
01
01
metabolism
ENSBTAG00000010954
ART3
Cells
02
1
55
01
08
2
10
06
71
metabolism
ENSBTAG00000011011
SSH2
No
3
09
5
4
2
2
6
7
10
cellular
process
ENSBTAG00000011387
NAT9
Cells
6
7
3
8
6
8
4
6
6
metabolism
ENSBTAG00000011420
CA9
Cells
04
3
02
2
03
1
01
2
56
metabolism
ENSBTAG00000011433
RGP1
httpswwwmousephenotypeor
gdatagenesMGI1915956sect
ionassociations
Completepreweaninglethality
7
19
11
18
8
12
9
8
25
cellular
process
ENSBTAG00000011622
C18orf21
Mice
9
10
6
6
2
10
5
10
34
NC
ENSBTAG00000011683
NUMB
Cells
21
25
8
35
17
47
6
8
19
immunity
ENSBTAG00000011684
RAB11FIP1
Cells
6
01
8
09
11
02
1
4
NC
ENSBTAG00000012053
ACCSL
No
03
01
cellular
process
ENSBTAG00000012314
LDLR
Homozygousfamilial
hypercholesterolemia
Cells
8
33
11
21
17
24
2
15
2
cellular
process
123
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000012326
LOC101906218
Nomatch
11
01
01
09
01
NC
ENSBTAG00000012458
PSAPL1
Cells
01
NC
ENSBTAG00000012565
PCNX
No
3
32
32
33
48
developm
ent
ENSBTAG00000012833
ATG2B
No
4
21
21
21
29
NC
ENSBTAG00000012921
KIF24
Mice
05
07
03
08
02
202
214
metabolism
ENSBTAG00000013568
UEVLD
Cells
5
41
53
42
214
metabolism
ENSBTAG00000013685
KRT31
Mice
cellular
process
ENSBTAG00000013753
DDRGK1
Mice
22
32
18
56
55
25
14
22
37
NC
ENSBTAG00000013789
OR6K6
No
NC
ENSBTAG00000013838
ALLC
Mice
02
01
78
metabolism
ENSBTAG00000013916
HERC2
Mentalretardation
autosomalrecessive
38SKINHAIREYE
PIGM
ENTATION
VARIATIONIN1
Developm
entaldelay
withautismspectrum
disorderandgait
instability
No
8
74
83
74
65
metabolism
ENSBTAG00000014001
MAP1A
Cells
136
23
22
202
43
352
cellular
process
ENSBTAG00000014058
LDB2
No
28
33
12
825
410
3cellular
process
ENSBTAG00000014284
ALPK2
No
01
10
01
304
02
cellular
process
ENSBTAG00000014750
EPB41L4A
httpswwwmousephenotypeor
gdatagenesMGI103007secti
onassociations
Decreasedcirculatingglucoselevel
Abnormalconeelectrophysiology
Immunesystem
s1
22
20
07
12
13
43
NC
ENSBTAG00000014752
AKAP11
httpswwwmousephenotypeor
gdatagenesMGI2684060sect
ionassociations
Nopheno
24
72
63
306
68
cellular
process
ENSBTAG00000014768
ZNF786
Cells
3
23
42
48
33
cellular
process
ENSBTAG00000014794
SCYL3
Cells
6
10
59
613
516
26
cellular
process
ENSBTAG00000015027
Clusterin
httpswwwmousephenotypeor
gdatagenesMGI88423sectio
nassociations
Aggressionvertebralfusion
05
03
03
02
05
3NC
124
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000015210
LOXHD1
DEAFNESS
AUTOSOMAL
RECESSIVE77
Autosomalrecessive
nonsyndromic
sensorineuDominant
lateonsetFuchs
cornealdystrophy
deafnessautosomal
recessivetype77
No
3
03
NC
ENSBTAG00000015294
COX10
Leighsyndrome
Mitochondrial
complexIVdeficiency
(MTC4D)
Cells
4
6
24
5
2
3
14
3
16
cellular
process
ENSBTAG00000015335
BAI3
No
38
03
8
01
01
2
signalling
ENSBTAG00000015490
HS1BP3
Cells
1
10
5
9
4
8
2
1
37
cellular
process
ENSBTAG00000015602
C7H5orf45
Nomatch
10
cellular
process
ENSBTAG00000015839
MAP4
No
27
85
104
35
11
100
82
46
61
cellular
process
ENSBTAG00000015912
DMKN
No
01
04
03
NC
ENSBTAG00000015980
FASN
AutosomalRecessive
MentalRetardation
Cells
19
29
7
12
6
39
6
7
22
metabolism
ENSBTAG00000015985
GPR20
httpswwwmousephenotypeor
gdatagenesMGI2441803sect
ionassociations
Anatomicaldefectsvarioustissues
03
02
08
01
09
signalling
ENSBTAG00000016495
LINS
AutosomalRecessive
MentalRetardation
Cells
6
3
1
5
3
3
1
5
12
NC
ENSBTAG00000016691
LACC1
Mice
2
9
1
9
9
8
08
7
27
NC
ENSBTAG00000017096
ERICH1
Cells
6
4
2
4
2
5
2
5
13
metabolism
ENSBTAG00000017165
MATN2
No
2
7
2
20
08
3
2
2
7
cellular
process
ENSBTAG00000017418
RRP1B
Cells
6
4
4
5
3
7
3
12
159
cellular
process
ENSBTAG00000017436
ALS2CR11
Cells
01
37
disease
ENSBTAG00000017505
PAXIP1
Cells
2
3
2
4
2
4
2
3
12
signalling
ENSBTAG00000017576
GPR144
Nomatch
02
signalling
ENSBTAG00000018528
USP45
No
1
1
04
1
05
06
02
2
04
cellular
process
ENSBTAG00000018810
THBS2
INTERVERTEBRAL
DISCDISEASE
Cells
1
2
04
09
1
09
04
1
2
fertility
ENSBTAG00000019072
PSD4
Cells
7
8
1
1
7
02
6
4
cellular
process
125
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000019265
AGK
Sengerssyndrom
e
Congenitalcataract
hypertrophic
cardiomyopathy
mitochondrial
myopathy
Cells
7
36
43
52
65
metabolism
ENSBTAG00000019752
BPIFA2B
Nomatch
01
1metabolism
ENSBTAG00000019799
FCAM
R
Cells
07
01
02
01
2
7immunity
ENSBTAG00000019961
ATP4B
Cells
12
03
1
05
cellular
process
ENSBTAG00000020414
DHX37
No
3
33
73
73
413
cellular
process
ENSBTAG00000021073
KIAA1549
PILOCYTIC
ASTROCYTOM
ASOMATIC
httpswwwmousephenotypeor
gdatagenesMGI2669829sect
ionassociations
behaviourneurological
homeostasism
etabolism
hematopoieticsystem
3
04
23
01
21
04
3NC
ENSBTAG00000021233
OR5B17
No
fertility
ENSBTAG00000021375
OR10Q1
No
fertility
ENSBTAG00000021643
EFCAB12
Cells
01
1
9cellular
process
ENSBTAG00000021645
MBD4
Cells
8
43
82
51
10
6cellular
process
ENSBTAG00000022509
DNAH
9
No
02
5
05
5metabolism
ENSBTAG00000022858
LOC783561
Nomatch
fertility
ENSBTAG00000026247
PKHD1L1
Cells
02
01
cellular
process
ENSBTAG00000026323
LYSB
Nomatch
68
metabolism
ENSBTAG00000026350
SPATC1
Cells
05
06
01
01
01
39
NC
ENSBTAG00000027390
RTFDC1
Mice
28
19
21
20
922
13
20
49
NC
ENSBTAG00000031115
Uncharacterized
Nomatch
01
cellular
process
ENSBTAG00000031718
OGFR
Mice
6
17
612
517
39
1fertility
ENSBTAG00000032264
Uncharacterized
Nomatch
52
4
22
NC
ENSBTAG00000032481
DAPL1
No
1
6
07
01
13
10
cellular
process
126
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000032527
ERCC6
CEREBROOCULOFACI
OSKELETAL
SYND
ROME1
COCKAYNE
SYND
ROMEBDe
SanctisCacchione
syndromeMACULAR
DEGENERATION
AGERELATED5UV
SENSITIVE
SYND
ROME1COFS
syndromeCockayne
syndrometype1
Cockaynesyndrome
type2Cockayne
syndrometype3UV
sensitivesyndrome
Cockaynesyndrome
typeB(CSB)De
SanctisCacchione
syndrome(DSC)UV
sensitivesyndrome
(UVS)cerebrooculo
facioskeletal
syndrometype1
(COFS1)
Mice
3
205
22
207
42
cellular
process
ENSBTAG00000032821
SCEL
No
01
02
17
04
NC
ENSBTAG00000033954
PRPS1L1
No
35
metabolism
ENSBTAG00000034991
SLX4IP
Cells
6
21
32
207
33
NC
ENSBTAG00000035226
TOR1AIP1
Cells
9
12
513
816
716
83
NC
ENSBTAG00000035309
OR4F15
No
fertility
ENSBTAG00000035708
LOC527795
Nomatch
01
01
23
metabolism
ENSBTAG00000035985
LOC101909743
OR8K1
No
fertility
ENSBTAG00000036019
RIPPLY3
httpswwwmousephenotypeor
gdatagenesMGI2181192sect
ionassociations
hemopoieticsystem
01
04
01
9
01
malformation
ENSBTAG00000037399
Uncharacterized
Nomatch
2
92
4
24
05
5signalling
ENSBTAG00000038606
DBDR
No
02
01
signalling
ENSBTAG00000038831
PHYHD1
No
2
18
21
23
30
18
16
15
1metabolism
ENSBTAG00000039110
OR8U1
No
fertility
ENSBTAG00000039117
FAM179B
Cells
14
206
31
308
33
cellular
process
ENSBTAG00000039520
SIRPB1
Cells
4
08
32
516
05
12
01
signalling
127
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000040568
Uncharacterized
Nomatch
4
307
54
33
25
cellular
process
ENSBTAG00000045558
LOC100299084
Nomatch
1
22
110
10
09
52
fertility
ENSBTAG00000045588
LOC100298356
Nomatch
metabolism
ENSBTAG00000045709
LOC514057
Nomatch
fertility
ENSBTAG00000046175
Uncharacterized
Nomatch
01
01
cellular
process
ENSBTAG00000046228
Olfactory
receptor
Nomatch
fertility
ENSBTAG00000046239
C10orf12
No
03
104
07
108
06
209
cellular
process
ENSBTAG00000046285
OR8I2
No
fertility
ENSBTAG00000046360
Uncharacterized
Nomatch
NC
ENSBTAG00000046423
LOC618007
Nomatch
NC
ENSBTAG00000046527
LOC100139830
Nomatch
fertility
ENSBTAG00000046590
FAM181A
Cells
09
44
NC
ENSBTAG00000046593
Uncharacterized
Nomatch
20
metabolism
ENSBTAG00000046727
Olfactory
receptor
Nomatch
04
fertility
ENSBTAG00000047150
RTEL
Cells
4
23
64
55
616
cellular
process
ENSBTAG00000047166
SH2D7
Mice
03
101
01
1
02
02
2immunity
ENSBTAG00000047174
Uncharacterized
Nomatch
03
7105
02
01
271
05
03
NC
ENSBTAG00000047317
CL43
Nomatch
01
120
01
immunity
ENSBTAG00000047587
Uncharacterized
Nomatch
22
09
09
35
03
4
06
2cellular
process
ENSBTAG00000047661
FABP12
httpswwwmousephenotypeor
gdatagenesMGI1922747sect
ionassociations
Growthsizebody
01
02
69
metabolism
ENSBTAG00000048021
PNMAL2
No
04
03
2
03
03
02
NC
ENSBTAG00000048153
Uncharacterized
Nomatch
2
05
01
02
01
cellular
process
128
Discussion(
Deleterious recessive variants arenaturally present in outbred animal species
Theydonotcreateproblemsaslongastheyremainatlowfrequencyinrandom
mating populations Under these conditions these variants remain in the
heterozygousstateanddonotproducephenotypiceffectsHoweverassoonas
their frequency increases andmatingoccursbetween related individuals their
deleteriouseffectsbecomeapparentandhighlyaffectthewelfareandviabilityof
animalscarryingtheminthehomozygousstate
In industrialdairy cattlebreedscrossbreeding isgenerallyavoidedandbreeds
areinpracticegeneticisolatesinwhichartificialinsemination(AI)iswidelyused
for rapidly spreading thegeneticprogress in thepopulationAsa consequence
elitesireshavemanythousandandsometimeshundredsofthousanddaughters
Under these conditions mating between relatives becomes practically
unavoidable in the longterm In fact inspiteof theclosecontrolof inbreeding
andofthematingsystemgeneticdefectsregularlyappeartothesurfaceSome
examples are the Complex Vertebral Malformation (CVM) and the Bovine
Leucocyte Adhesion Deficiency (BLAD) in Holstein (Schuumltz et al 2008) and
WeaverdiseaseinBrown(McClureetal2013)Inthesecasestheeffectsofthe
recessivedeleteriousallelesareclearandmoleculardiagnostictoolshavebeen
developedtoavoidtheirfurtherspreadingandtoinitiatetheeradicationprocess
However recessive deleterious alleles may produce their effects in a more
deceitful way for example by affecting fertility inducing early lethality or
influence disease susceptibility All these effects are difficult to trackwith the
current methods of phenotype measurements but have an impact on the
economyofthelivestocksectorandonanimalwelfare
InthispaperwesoughtfordeleteriousrecessiveallelesinItalianHolsteincattle
This breed is recently experiencing a decrease in the genetic trend of fertility
traits (Pryce et al 2014) therefore we started our search from the exome
sequences of 20 sires 19 of which having extreme EBV values for male and
femalefertility
130
The choiceof sequencingexomeswasa compromisebetween completenessof
informationaccuracyofvariantandgenotypedetectionandcostExomesclearly
give incomplete information since they miss all variation in regulatory sites
outside gene boundaries (ENCODE Consortium 2012) These may be very
relevantforthecontroloftraitvariationbuttheyarestillverypoorlyannotated
in livestock On the other hand sequencing exomes yields millions instead of
billionsbasepairsadeepersequencingoftargetregionsandaconsequentmore
reliablevariantandgenotypecalling(FigureS1andS2)Theanalysisofexome
data is also less demanding in terms of computer time (thousands instead of
millions variants are to be analysed) and easier in terms of biological
interpretation Exome sequencing revealed to be very effective in the
identificationofanumberofmutationsaffectinghumanhealth(Bamshadetal
2011 Yang et al 2013) All thesewere defects caused bymutations in single
genes producing clear phenotypic effects havingMendelian inheritance These
conditionsdonotholdinourinvestigationthereforewehadtodevisestrategies
to integrate exome data with other information collected from the Holstein
populationandotherspecies
Theoverallprocedureforidentifythecandidatedeleteriousrecessivewasbased
on the rationale that deleterious recessive alleles in coding sequences are
expectedtobei)causedbyseverealterationoftheproteinsencodedbygenes
ii)mostlyifnotexclusivelycarriedattheheterozygousstateinthepopulation
iii)unevenlydistributedinanimalsofhighandlowfertilityiv)likelyconserved
indifferentspeciesbecauseofselectionconstraintv)likelyassociatedtogenetic
defectsinanyspecieswhenseverelydamaged
A stepwise procedure was set up for reducing candidates to a number
manageable for functional analysis (Figure1) In the first stepmutationswere
discoveredinthesecondfilteredandinthirdassessed
Thisprocedureidentifiesonlysomeofthedeleteriousrecessivesexistinginthe
populationanalysedSincewesequencedonlyexomesandnotwholegenomes
ouranalysesislimitedtocodingsequencesandatthemomenttoSNPvariation
Wethereforemissregulatoryvariationsoutsidegenesand theeffectsof indels
andmorecomplexvariations
131
ThenumberofanimalssequencedislimitedDeleteriousrecessivesareexpected
toberareinthepopulationandwehavesampledonly20animalsfromasubset
ofsiresauthorizedtoAI inItalyAlthoughwehavetakenamongthemthose in
theminus tail of fertility trait thatmighthave ahigher genetic load itrsquos likely
thatwehavemissedothervariantsexistinginthepopulationOntheotherhand
variantscarriedbyAIauthorizedbullsaretheonesthatwillhavealargerimpact
onthepopulationifwidelyspreadbyelitebullsandthereforetheyarethemost
interestingfromabreedingpointofview
Most deleterious variants are expected to be very rare and therefore not
detected by the approach here adopted or detected and afterwards discarded
duringthefilteringprocesses
Sequencing is prone to errors Some real variants may have been discarded
duringQCofsequencedataduetopoorsequencequalityorpoorcoverageofthe
region
Someofthevariationsclassifiedasldquonondeleteriousrdquobythesoftwareusedmay
indeedhave an effect In fact synonymousmutation arenon`neutralwhenever
thedifferenttRNAsforasameaminoacidhavedifferentrelativeabundanceand
translationrates(Goymer2007)
1(Searching(candidate(deleterious(variants(in(exomes(
ExomesproducedafairamountofdataAlltogether5355Mbweresequenced
These represent 9998 of the exome sequences targeted by probes Among
theseonly35hadasequencingdepthacrossanimalslt60XSequencingdepth
is a key factor for variant detection and even more so when the genotype of
sequencedanimalsistobeinferredInvestigatingtheconcordancebetweenthe
genotypes detected at more than 20K SNP by both Beadchip and exome
sequencingwe observed a clear inverse relationship between sequence depth
andnumberofdiscordances(TableS3andFigureS3)Theaveragesequencing
depth expected (50X) and realised in practice (4058) in this investigation
yieldedratheraccurategenotypessincetheaverageconcordanceobservedwas
over97and increased toover99whentwooutlieranimalswereremoved
132
from the dataset The reason for the outlier behaviour of these samples waslikely a suboptimal quality of DNA submitted to sequencing Overall almost215000 variants passed quality controls and among them 192440 autosomalSNPs have been investigated in detail As found in other investigations mostvariants were rare (Peloso et al 2014) For example the MAF distributionobservedin46904SNPsclassifiedneutralhasameanof0197andamedianof0167values(FigureS6)
ThefirststepinourresearchwastoidentifySNPscausingseverealterationsinthe proteins coded by genes These were relevant changes in conformationalterationsofsplicingsitesortruncationsofproteinsToaccomplishthistaskwereliedontwosoftwareSIFTandANNOVARthatidentified4902suchvariationsin3423genes Interestingly the subsetofdeleteriousvariantshas significantlylowermean (0156)andmedian (010) compared toneutral variations (FigureS6) This subset is therefore at least enriched in variants under purifyingselectionhavingadeleteriouseffectatthepopulationlevel
Thenumberofdeleteriousallelesdetectedishighanditisunlikelytheyallhavea relevant phenotypic effect even if most variants are rare In fact variantsdeleterious for a protein are not automatically deleterious for an organism asthe protein change induced by the variant although relevant may not affectproteinfunctionvariationmayoccurinonememberofagenefamilysothatitsfunction may be compensated by other family members the protein functionmay be compensated by other proteins or the affected metabolic pathwaycompensated by alternative pathways The challenge was therefore to filter asubsetofvariantslikelyhavingaphenotypiceffecttobeproposedforvalidationatalargerscaleToaccomplishthistaskwesoughtforadditionalinformationinexomesequencesandinalargepopulationofHolsteinsiresgenotypedwiththebovineHDBeadchip
( (
133
2(Filtering(deleterious(variants(
The second step consisted in filtering the several thousand deleterious SNPsidentified in exomes to search for SNPs and genes that worth deeperinvestigationWeusedtwostrategiestoaccomplishthistask
21$ Distribution$ of$ deleterious$ variants$ in$ fertility$ EBV$ plus9variant$ and$
minus9variant$animals$
Fertility traits are controlled by a high number of genes and are highlygenetically heterogeneous Even using a selected subset of variants identifiedfrom exome sequencing classic association analyses would need sample sizesmuch larger that the one available in this investigation to have a reasonablestatistical power (Stitziel et al 2011) The first approachwe appliedwas theanalysisof SNPallelic andgenotypicdistributionbetween theminusandplus`variant tails of male and female fertility EBVs Given the very low number ofanimals sequencedwedecided to use a simplemodel to exclude from furtheranalyses all markers showing no evidence of association with the EBV tailsratherthantofindsignificantassociations
We divided samples in cases (minus`variant animals for fertility EBV) andcontrols (plus`variant animals) and tested all possible models of inheritanceimplemented in thePlinkpackage for tackling simple genetic defectsWe thenused an empirical point`wise significant threshold based on permutations tohighlight SNPs having alleles or genotypes showing an uneven distributionbetweencasesandcontrolsandconsideredtheseasworthfurtherinvestigationAtotalof112SNPsin101genespassedthisfilteringprocess
$
22$Detection$of$regions$carrying$deleterious$mutation$from$800K$SNP$data
The analysis of exomes alone was therefore insufficient to tag recessivedeleteriousgenesinItalianHolsteinandwesoughtforadditionalinformationin1009ItalianHolsteinbullsgenotypedathighdensitythatarepartofthegenomicselection program run by ANAFI The rationale of the approach is that
134
deleterious recessives are expected to be absent or rare in the homozygous
status in the population Hence they may be tagged searching for genome
regions having markers or haplotypes lacking or having significantly less
homozygotesthanexpectedunderaneutralmodel
We used two complementary approaches to identify these candidate regions
One(OHW)searchedforverystrongsignalsproducedbysinglemarkersoutof
Hardy`Weinberg equilibrium at Ple847x10`8 A minimum of three significant
markersinaslidingwindowsof10wassettoreachsignificancetocapturemore
reliablesignalsavoidexcessivesinglemarkernoiseandaccount forapossible
Wahlund effect due to presence of the genetic structure in the population
difficulttoavoidinthesetofprogenytestedbullsanalysedThesecondapproach
(LOH) captured more moderate signals but from multiple markers in larger
regions Both approaches were designed to detect signals in regions in which
deleteriousvariantsarestillinsegregationandarenotexcessivelyrare
Asexpectedthetwoapproachesidentifiedregionsthatonlypartiallyoverlapped
(TablesS5S6eS7)Inparticular5regionsoutofthe11identifiedbyOHWwere
in common to the 240 identified by LOH Two of these regions overlap with
previous investigations that used the BovineSNP50 Beadchip Sahana et al
(2013) found a region candidate to carrypotential lethal haplotypes inNordic
Holstein on BTA7 in the region 6708007`10907840 This interval overlaps
withbothBTA7regionsdetectedinthisstudyVanRadenetal(2011)detected
inUSHolsteinalargeregiononBTA15spanning76Mb`82M(onBTA30)that
includedLRP4 gene responsible for themulefootdefectOuranalyseswith the
HDBeadchipindicatesontheonesidethatthelargeregiondetectedbySahana
etal(2013)mayconsistintwonearbyregionsOntheothersidethesignalwe
detectedonBTA15doesnot includetheLRP4 locusthat iscompletely fixed in
theItalianbullspopulationanalysed
In total83mutations in66genes fell in eitherorbothOHWandLOHregions
These were joined to the SNPs output of the EBV tail analyses for further
assessment
(
135
3(Assessing(filtered(deleterious(variants(
Different characteristics of the filtered deleterious variants were assessed to
identify among them strong candidates to submit to a large`scale validation
studyTheassessmentphasepermittedtogainadditionalinformationinfavour
or against SNP and gene candidature to be deleterious We quantified this
evidenceusingasubjectivescore
31$Comparative$genomics$analysis$
In the case of comparative genomics the score quantified the level of
conservationof thealternativeaminoacidresidues inducedbySNPvariants in
ItalianHolsteincattleComparisonwasrunagainstaset100vertebratespecies
that included 62 mammals The level of conservation is a good proxy for the
expected severity of a mutation In the case of the 46 SNPs that induced the
change of an amino acid highly conserved in the species investigatedwemay
hypothesize a strong selection pressure in favour of the maintenance or the
invariant amino acid at that position Therefore the amino acid change or the
stopcodonweobserved incattlemayaffectnotonlytheproteinstructureand
function but also the phenotype of homozygous animals Interestingly in all
thesecasestheSNPallelecodingforthenonconservedvariantswastheminor
allele
Converselyinallcasesofhighvariabilityofthetargetaminoacidacrossspecies
(asmanyassevendifferentaminoacidshavesometimesbeenobserved)poorly
supports the presence of phenotypic effects due to its change Sometimes the
informationcollectedisagainstthepresenceofadeleteriousmutationasinthe
case of the presence in different species of both alleles identified by exome
sequencingThis suggests thatneither of these is greatly affecting the viability
andfitnessofanimalscarryingthem
Wealsoobservedanumberoffailures(N=45)intheliftoverofSNPcoordinates
fromcattletohumanorinproteinalignmentThishappenedforexampleinthe
caseofuncharacterized cattleproteins thathavenoorthologous in thehuman
genome and of someolfactory receptors that belong to a highly variable gene
136
family Even if the failure of alignment is an indication of poor proteinconservationacrossspeciestheconservationofthetargetaminoacidcouldnotbeassessesandinthesecaseswepreferredtoconsidertheinformationmissingandtoscoreitaccordingly
32$Haplotype$analysis$
Asecondassessmentofthefilteredcandidateswasontheirbehaviourinthesirepopulation The ideal test would be the direct genotyping of the SNPs in thepopulationand theevaluationof theirallelicandgenotypicproportions In theabsenceofthisinformationwefirstinferredtheHDhaplotypeassociatedtoSNPallelesandassessedthebehaviourofthehaplotypeinthepopulationassumingthisasasurrogateoftheSNPbehaviourSincedeleteriousallelesareexpectedtoberareandthis isalsowhatweobserved fromcomparativegenomicanalysesweinvestigatethefrequencyonlyofthehaplotypecarryingtheminorSNPalleleagainsttheexpectationto findthecorrespondinghaplotypeneverhomozygousor a number of times much lower than expected from its frequency in thepopulation The strategy yielded some interesting confirmation but was onlypartiallysuccessful
Firstofalltheidentificationofauniquehaplotypecarryingtherarevariantwasnot always possible or straightforward In some cases carriers of a same rarevariantdidnotshareexactlythesamehaplotypeWeacceptedsomevariationifthe haplotype remained clearly distinguishable from those of animals notcarryingtherarevariantbutatthesametimereducedthestringencyofthetestInquiteanumberofcasesasamehaplotypecarriedbothcandidatedeleteriousSNP variants This may occur for example in the case of recent mutations IntheselastcasesthetestcouldnotbecarriedoutFinallymostrarevariantswereassociatedtorarehaplotypesthatarenotlikelytohavehomozygousindividualsinthepopulationindependentlyoftheSNPeffect
Thisassessmentwas thereforeonly significant fora fewhaplotypes thathadalarge number of heterozygous individuals and no or only a very fewhomozygotesandforthosethathadalargenumberofhomozygotes
137
33$Functional$analysis$
The final assessment was the analysis of the function of genes carrying the
candidate deleterious SNP This has some limitation since knowledge of gene
function is only partial and mainly gained from investigations on species
different from cattle In any case we considered the association to genetic
defectsinhumanandtheinductionofseverephenotypesinmouseknockoutas
a positive indication of the likely phenotypic effect of the severe protein
alterationweobservedincattleInadditiongenefunctionandexpressionprofile
was evaluated in search for evidence of gene involvement in fertility
developmentorprocessesfundamentalforcellandorganismviability
Amonggenesidentified22areassociatedtohumansyndromes(Table4)Eleven
syndromesareneurologicaldiseasesaffectingbraindevelopmentand inducing
mental retardation (eg Donnay`Barrow Syndrome and Autosomal recessive
primary microcephaly) the others influence development (eg
Cerebrooculofacioskeletal Syndrome 1 and Adams`Oliver syndrome) or organ
function(egCongenitalpulmonaryalveolarproteinosis)
Most of the same defects translated into Holstein would induce physical and
behavioural anomalies likely causingyoungbulls tobe excluded fromprogeny
testing Itshould in factbeconsideredthattheanimalsweanalysedareonthe
onesidethosethatwillmostinfluencethegeneticmakeoffuturegenerationbut
are not a random sample of Holstein animals These bulls have been progeny
testedandauthorisedtobeusedinartificialinseminationinItalyCandidatesto
progenytestingderivefromthematingofthebest1siresandbest2damsof
the Italian Holstein population At the age of 4 months they are taken to the
Italian Holstein breeder association (ANAFI) test station where they are
subjected to performance test sanitary controls and behavioural tests
Thereafterthequalityoftheirsemenischeckedbeforeauthorizationisgranted
toenterinprogenytestInthissampleofanimalsitisthereforeexpectedtofind
never at the homozygous state mutations causing lethality physical
malformation abnormal behaviour and semen infertility Also those having a
138
majoreffectonsemenqualityandhighsusceptibilitytodiseasesareexpectedto
berarelycarriedatthehomozygousstatus
Association with human syndromes is a strong evidence of the potential
phenotypiceffectofmutationsalteringthegenestructureHoweveritshouldbe
considered that variants found in humans are different fromvariants found in
cattle and that the severity of the phenotype induced by a deleterious gene is
often variable even within species Therefore we evaluated this information
togetherwiththeotherindirectevidencesofvariantstobedeleterious
WealsoenquiredthewebsiteoftheInternationalMousePhenotypeConsortium
(httpswwwmousephenotypeorg) to search for the effects of null alleles in
mice The few genes forwhichnull`mice line is available are involved in basic
functionsfortheorganismthatspanfromvision(IAH1andEPB41L4A)toaging
(RGP1)andtissuesdevelopment(FABP12RIPPLY3RGNEF)orfertility(CFAP96)
Aswithgenesassociatedtohumansyndromesmostoftheabnormalphenotypes
observedinknockmousewouldhamperyoungbullstoenterprogenytesting
Gene Ontology (GO) was evaluated on the entire gene set No enrichment on
particularcategoriesGOtermswasobservedThemostrepresentedcategories
in Biological Processes were as expected ldquometabolic processrdquo and ldquocellular
processrdquo however lower level terms ldquoreproductionrdquo and ldquodevelopmental
processrdquo are represented with meaningful hits The analysis of Molecular
Function indicates thatmany of the hits are enzymes (ldquocatalytic activityrdquo) and
haveregulatoryroles(ldquobindingrdquoandldquoenzymeregulatoractivityrdquo)Thisresultcan
be interpreted indifferentwaysOn theone sidedeleterious allelesmay exert
their deleterious effect in a number of different cellular and developmental
processes the correct process is one while there is many options to make it
wrongAlsothenumberofgenesinvestigatedisrelativelylowandformanyof
them there is little or no information available (eg 11 genes code for a
completelyuncharacterizedprotein)Inadditionsomeofthemmayturnoutnot
tobereallydeleteriousandaddnoisetotheGOanalysis
As a final step all the previous information (GO expression patterns known
function)havebeenused to estimate a scoremeasuring the importanceof the
gene function forcattle survivalandreproductionThescoringof functionwas
139
particularlydifficultGOcategoriesweregeneralandallfunctionstaggedhavearelevant importance for animal survival andproductionTherefore in this casewedidnrsquotusethewholescorerangeplanned(from`1to1)andgavenonegativescoresApositivescoreof1wasattributedonly togenesplayingan importantroleinfertilityanddevelopment
$
34$Strong$candidate$deleterious$variants$
Thescoringprocessyielded21variantswithatotalscore=035variantswithpositive and 135 with negative values (Figure 2 and Table 5) Although theprocesswasnoterrorfreeasitreliedonincompleteinformationandhadsomedegreeofsubjectivityparticularlyinthechoiceofrelativeweightsitidentifiedaset of variants that may be considered strong candidates to have deleteriouseffects in the Italian Holstein breed Variants were evaluated individually andthose in a same genes had sometimes different scores For example the fivevariantscarriedbyALPK2hadscoresrangingfrom`7to`2
140
Figure2Graphicalrepresentationoftotalscoreforeachvariants
minus505
Variation
Score
1_645923001_645985561_1381698111_1464490851_150972144
2_7478962_270362092_270455392_375933832_555609862_904645543_75418123_110403993_110404003_188945763_1137877703_1203356984_53767714_265405974_265406204_749514994_1033779064_1057246734_1130233264_1178417155_445557055_445557425_757447955_940308955_940309556_207762566_382803486_863518016_927092906_1075769256_1078851146_1090916426_1166055226_1186534167_13339757_56553577_97149147_167941797_168358027_168621947_197404097_197409057_263293538_602606098_603324848_658345648_704101558_760297748_770028908_872798628_1117618168_1128416959_83888819_83889249_237928389_510133929_1047396559_10515702910_171251510_1584300310_2802235210_2802311010_8290312110_8517371111_4630576711_4675390311_4740957711_5979141011_7832201011_8794387511_9548610911_9945389411_9946100112_1190618112_1247843312_1248084512_1413689712_5304908112_9076120913_381566413_1178793413_1178793813_5247168413_5393493013_5393494713_5393985313_5454042513_5504774013_5998820513_6304575013_6316961013_6539670814_197749414_364069514_3556894514_4688575014_5718402214_6863536315_18267915_18331715_18340415_248692
15_3018920915_3504878815_4630194115_7503314015_7503764515_8024448915_8025571015_8028875915_8038974515_8048559315_8055593115_8094968915_8275455615_8292533216_456201716_3831963616_6257435517_5309539517_5311265517_6609153117_7237753218_2562355018_3495651318_4637204118_5215480618_5408127718_5413095518_5781976819_2144410319_3107215219_3111397219_3279084319_4225178519_5139519419_5619117519_5726625120_763980120_2341043120_5888423620_6417173921_623911921_2034587821_2034628921_2683773321_3100359921_5527419521_5582646321_5817338421_5912071921_6048248421_6048251021_6284314222_5241595922_5242173422_5242204622_5242610422_5242666822_5695165722_5697484722_5779767423_4588505224_2132460424_2145336424_4676490924_5032719524_5815817824_5815862024_5815878424_5820400324_5820447925_3743318026_1811648726_2571599626_2576457027_503107827_3283255728_28422828_169040328_3571880728_4408176529_198588129_753052329_753066329_26397931
141
Table5SingleanalysesscoreandtotalforeachcandidatevariantsChrchrom
osom
ePosphysicalposition
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
164592300
ENSBTAG00000009014
UPK1B
02
I1
01
2
164598556
ENSBTAG00000009014
UPK1B
0I1
I1
00
I2
1138169811
ENSBTAG00000002966
DNAJC13
03
10
15
1146449085
ENSBTAG00000017418
RRP1B
I3
I1
I1
00
I5
1150972144
ENSBTAG00000036019
RIPPLY3
I3
I1
I1
I1
1I5
2747896
ENSBTAG00000013916
HERC2
30
10
04
227036209
ENSBTAG00000004555
LRP2
I3
I1
10
0I3
227045539
ENSBTAG00000004555
LRP2
I3
31
00
1
237593383
ENSBTAG00000032481
DAPL1
00
I1
00
I1
255560986
ENSBTAG00000032264
Uncharacterized
3I3
I1
00
I1
290464554
ENSBTAG00000017436
ALS2CR11
0I1
I1
00
I2
37541812
ENSBTAG00000046175
Uncharacterized
00
00
00
311040399
ENSBTAG00000013789
OR6K6
I3
I3
I1
00
I7
311040400
ENSBTAG00000013789
OR6K6
I3
I3
I1
00
I7
318894576
ENSBTAG00000004772
THEM
40
I3
I1
00
I4
3113787770
ENSBTAG00000000149
USP40
11
I1
01
2
3120335698
ENSBTAG00000002809
CAPN10
I3
21
00
0
45376771
ENSBTAG00000001500
FIGNL1
01
I1
00
0
426540597
ENSBTAG00000033954
PRPS1L1
1I3
I1
00
I3
426540620
ENSBTAG00000033954
PRPS1L1
1I1
I1
00
I1
474951499
ENSBTAG00000003508
CFAP69
I3
I3
I1
10
I6
4103377906
ENSBTAG00000021073
KIAA1549
0I1
11
01
4105724673
ENSBTAG00000019265
AGK
01
10
02
142
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
4113023326
ENSBTAG00000014768
ZNF786
I3
I3
I1
00
I7
4117841715
ENSBTAG00000017505
PAXIP1
0I1
I1
00
I2
544555705
ENSBTAG00000026323
LYSB
I3
3I1
00
I1
544555742
ENSBTAG00000026323
LYSB
I3
0I1
00
I4
575744795
ENSBTAG00000009064
CSF2RB
I3
I3
10
0I5
594030895
ENSBTAG00000009444
SLC15A5
I3
I1
I1
00
I5
594030955
ENSBTAG00000009444
SLC15A5
I3
I3
I1
00
I7
620776256
ENSBTAG00000008797
Uncharacterized
I3
I3
00
0I6
638280348
ENSBTAG00000007568
MEPE
0I3
I1
00
I4
686351801
ENSBTAG00000003523
UGT2B15
1I1
I1
00
I1
692709290
ENSBTAG00000010954
ART3
I3
2I1
00
I2
6107576925
ENSBTAG00000038606
DBDR
0I3
I1
00
I4
6107885114
ENSBTAG00000001505
GRK4
I3
1I1
00
I3
6109091642
ENSBTAG00000007001
SLC26A1
1I3
I1
00
I3
6116605522
ENSBTAG00000014058
LDB2
01
I1
00
0
6118653416
ENSBTAG00000012458
PSAPL1
I3
I1
I1
00
I5
71333975
ENSBTAG00000015602
C7H5orf45
I3
I3
I1
00
I7
75655357
ENSBTAG00000045588
LOC100298356
1I3
I1
00
I3
79714914
ENSBTAG00000046423
LOC618007
00
I1
00
I1
716794179
ENSBTAG00000012314
LDLR
02
10
03
716835802
ENSBTAG00000009568
KANK2
01
I1
00
0
716862194
ENSBTAG00000009569
DOCK6
01
10
02
719740409
ENSBTAG00000000414
FUT5
I3
I3
I1
00
I7
719740905
ENSBTAG00000000414
FUT5
I3
I3
I1
00
I7
143
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
726329353
ENSBTAG00000004860
SLC27A6
0I1
I1
00
I2
860260609
ENSBTAG00000011420
CA9
I3
I1
I1
00
I5
860332484
ENSBTAG00000011433
RGP1
I3
0I1
10
I3
865834564
ENSBTAG00000035708
LOC527795
00
I1
00
I1
870410155
ENSBTAG00000005466
C8H8orf58
I3
I3
I1
00
I7
876029774
ENSBTAG00000015027
Clusterin
00
I1
10
0
877002890
ENSBTAG00000012921
KIF24
12
I1
00
2
887279862
ENSBTAG00000002220
SPTLC1
00
10
01
8111761816
ENSBTAG00000006532
CDK5RAP2
I3
I3
11
1I3
8112841695
ENSBTAG00000013838
ALLC
01
I1
00
0
98388881
ENSBTAG00000015335
BAI3
00
I1
00
I1
98388924
ENSBTAG00000015335
BAI3
00
I1
00
I1
923792838
ENSBTAG00000046593
Uncharacterized
03
00
03
951013392
ENSBTAG00000018528
USP45
02
I1
00
1
9104739655
ENSBTAG00000018810
THBS2
03
10
15
9105157029
ENSBTAG00000000918
ERMARD
I3
21
00
0
10
1712515
ENSBTAG00000014750
EPB41L4A
0I1
I1
I1
0I3
10
15843003
ENSBTAG00000009030
NOX5
I3
I3
I1
00
I7
10
28022352
ENSBTAG00000035309
OR4F15
I3
1I1
00
I3
10
28023110
ENSBTAG00000035309
OR4F15
I3
I1
I1
00
I5
10
82903121
ENSBTAG00000012565
PCNX
I3
2I1
01
I1
10
85173711
ENSBTAG00000011683
NUM
BI3
0I1
00
I4
11
46305767
ENSBTAG00000002288
NT5DC4
0I3
I1
00
I4
11
46753903
ENSBTAG00000019072
PSD4
0I1
I1
00
I2
144
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
11
47409577
ENSBTAG00000009033
TEX37
I3
I2
I1
10
I5
11
59791410
ENSBTAG00000006027
USP34
02
I1
01
211
78322010
ENSBTAG00000015490
HS1BP3
00
I1
00
I1
11
87943875
ENSBTAG00000001140
IAH1
I3
3I1
10
011
95486109
ENSBTAG00000017576
GPR144
I3
I3
I1
00
I7
11
99453894
ENSBTAG00000038831
PHYHD1
11
I1
00
111
99461001
ENSBTAG00000010601
DOLK
12
10
04
12
11906181
ENSBTAG00000001133
VWA8
I3
1I1
00
I3
12
12478433
ENSBTAG00000014752
AKAP11
1I3
I1
I1
0I4
12
12480845
ENSBTAG00000014752
AKAP11
1I1
I1
I1
0I2
12
14136897
ENSBTAG00000016691
LACC1
I3
I3
I1
00
I7
12
53049081
ENSBTAG00000032821
SCEL
I3
I3
I1
00
I7
12
90761209
ENSBTAG00000019961
ATP4B
I3
I3
I1
00
I7
13
3815664
ENSBTAG00000034991
SLX4IP
00
I1
00
I1
13
11787934
ENSBTAG00000008650
CAMK
1D
I3
0I1
00
I4
13
11787938
ENSBTAG00000008650
CAMK
1D
I3
0I1
00
I4
13
52471684
ENSBTAG00000013753
DDRGK1
02
I1
00
113
53934930
ENSBTAG00000039520
SIRPB1
0I3
I1
00
I4
13
53934947
ENSBTAG00000039520
SIRPB1
00
I1
00
I1
13
53939853
ENSBTAG00000039520
SIRPB1
0I3
I1
00
I4
13
54540425
ENSBTAG00000047150
RTEL
I3
0I1
00
I4
13
55047740
ENSBTAG00000031718
OGFR
01
I1
00
013
59988205
ENSBTAG00000027390
RTFDC1
01
I1
00
013
63045750
ENSBTAG00000009144
BPIFA2A
0I3
I1
00
I4
145
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
13
63169610
ENSBTAG00000019752
BPIFA2B
0I1
I1
00
I2
13
65396708
ENSBTAG00000006021
CEP250
0I3
I1
00
I4
14
1977494
ENSBTAG00000026350
SPATC1
I3
I3
I1
00
I7
14
3640695
ENSBTAG00000015985
GPR20
00
I1
00
I1
14
35568945
ENSBTAG00000037399
Uncharacterized
30
I1
00
2
14
46885750
ENSBTAG00000047661
FABP12
0I1
I1
00
I2
14
57184022
ENSBTAG00000026247
PKHD1L1
I3
3I1
00
I1
14
68635363
ENSBTAG00000017165
MATN2
01
I1
00
0
15
182679
ENSBTAG00000046228
Olfactoryreceptor
00
I1
0o
I1
15
183317
ENSBTAG00000046228
Olfactoryreceptor
00
I1
00
I1
15
183404
ENSBTAG00000046228
Olfactoryreceptor
00
I1
00
I1
15
248692
ENSBTAG00000045558
LOC100299084
00
I1
0o
I1
15
30189209
ENSBTAG00000005357
VPS11
02
I1
00
1
15
35048788
ENSBTAG00000005340
SERGEF
0I3
I1
00
I4
15
46301941
ENSBTAG00000002289
NLRP14
I3
I3
I1
00
I7
15
75033140
ENSBTAG00000012053
ACCSL
1I3
I1
00
I3
15
75037645
ENSBTAG00000012053
ACCSL
I3
I3
I1
00
I7
15
80244489
ENSBTAG00000046527
LOC100139830
I3
0I1
00
I4
15
80255710
ENSBTAG00000046285
OR8I2
10
I1
00
0
15
80288759
ENSBTAG00000001291
OR8H3
1I1
I1
00
I1
15
80389745
ENSBTAG00000045709
LOC514057
13
I1
00
3
15
80485593
ENSBTAG00000035985
LOC101909743OR8K1
1I1
I1
00
I1
15
80555931
ENSBTAG00000022858
LOC783561
I3
0I1
00
I4
15
80949689
ENSBTAG00000039110
OR8U1
1I3
I1
00
I3
146
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
15
82754556
ENSBTAG00000021375
OR10Q1
01
I1
00
015
82925332
ENSBTAG00000021233
OR5B17
I3
0I1
00
I4
16
4562017
ENSBTAG00000019799
FCAM
RI3
I3
I1
00
I7
16
38319636
ENSBTAG00000014794
SCYL3
I3
0I1
00
I4
16
62574355
ENSBTAG00000035226
TOR1AIP1
00
I1
00
I1
17
53095395
ENSBTAG00000020414
DHX37
1I1
I1
00
I1
17
53112655
ENSBTAG00000020414
DHX37
0I3
I1
00
I4
17
66091531
ENSBTAG00000010847
FOXN4
0I1
I1
00
I2
17
72377532
ENSBTAG00000008996
SFI1
0I3
I1
00
I4
18
25623550
ENSBTAG00000005170
GPR114
0I3
I1
01
I3
18
34956513
ENSBTAG00000001286
ELMO
33
3I1
11
718
46372041
ENSBTAG00000015912
DMKN
0
1I1
00
018
52154806
ENSBTAG00000001262
IRGQ
I3
1I1
00
I3
18
54081277
ENSBTAG00000009337
PNMA
L1
0I3
I1
00
I4
18
54130955
ENSBTAG00000048021
PNMA
L2
I3
1I1
00
I3
18
57819768
ENSBTAG00000003231
LIM2
1
I3
10
0I1
19
21444103
ENSBTAG00000011011
SSH2
I3
I3
I1
00
I7
19
31072152
ENSBTAG00000022509
DNAH
9I3
0I1
00
I4
19
31113972
ENSBTAG00000022509
DNAH
90
0I1
00
I1
19
32790843
ENSBTAG00000015294
COX10
12
10
04
19
42251785
ENSBTAG00000013685
KRT31
I3
0I1
00
I4
19
51395194
ENSBTAG00000015980
FASN
1I1
10
01
19
56191175
ENSBTAG00000007910
EXOC7
3I3
I1
00
I1
19
57266251
ENSBTAG00000011387
NAT9
12
I1
00
2
147
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
20
7639801
ENSBTAG00000005633
RGNEF
1I1
I1
10
0
20
23410431
ENSBTAG00000008871
DDX4
1I3
I1
00
I3
20
58884236
ENSBTAG00000005514
TRIO
I3
0I1
00
I4
20
64171739
ENSBTAG00000005021
SEMA5A
01
10
02
21
6239119
ENSBTAG00000016495
LINS
02
10
03
21
20345878
ENSBTAG00000012326
LOC101906218
00
I1
00
I1
21
20346289
ENSBTAG00000012326
LOC101906218
I3
0I1
00
I4
21
26837733
ENSBTAG00000047587
Uncharacterized
10
00
01
21
31003599
ENSBTAG00000047166
SH2D7
0I3
I1
00
I4
21
55274195
ENSBTAG00000039117
FAM179B
0I3
I1
00
I4
21
55826463
ENSBTAG00000014001
MAP1A
0I3
I1
00
I4
21
58173384
ENSBTAG00000009836
CHGA
I3
I1
I1
00
I5
21
59120719
ENSBTAG00000046590
FAM181A
01
I1
00
0
21
60482484
ENSBTAG00000017096
ERICH1
0I3
I1
00
I4
21
60482510
ENSBTAG00000017096
ERICH1
01
I1
00
0
21
62843142
ENSBTAG00000012833
ATG2B
0I3
I1
00
I4
22
52415959
ENSBTAG00000015839
MAP4
00
I1
00
I1
22
52421734
ENSBTAG00000015839
MAP4
1I1
I1
00
I1
22
52422046
ENSBTAG00000015839
MAP4
I3
I3
I1
00
I7
22
52426104
ENSBTAG00000047174
Uncharacterized
I3
00
00
I3
22
52426668
ENSBTAG00000047174
Uncharacterized
10
00
01
22
56951657
ENSBTAG00000021645
MBD4
1I1
I1
00
I1
22
56974847
ENSBTAG00000021643
EFCAB12
0I1
I1
00
I2
22
57797674
ENSBTAG00000031115
Uncharacterized
01
00
01
148
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
23
45885052
ENSBTAG00000005248
OFCC1
00
I1
00
I1
24
21324604
ENSBTAG00000000998
SLC39A6
02
I1
00
1
24
21453364
ENSBTAG00000011622
C18orf21
I3
I3
I1
00
I7
24
46764909
ENSBTAG00000015210
LOXHD1
0I3
10
0I2
24
50327195
ENSBTAG00000002660
CCDC11
I3
11
01
0
24
58158178
ENSBTAG00000014284
ALPK2
I3
I1
I1
00
I5
24
58158620
ENSBTAG00000014284
ALPK2
0I1
I1
00
I2
24
58158784
ENSBTAG00000014284
ALPK2
I3
I3
I1
00
I7
24
58204003
ENSBTAG00000014284
ALPK2
I3
I3
I1
00
I7
24
58204479
ENSBTAG00000014284
ALPK2
I3
2I1
00
I2
25
37433180
ENSBTAG00000040568
Uncharacterized
00
I1
00
I1
26
18116487
ENSBTAG00000046239
C10orf12
0I1
I1
00
I2
26
25715996
ENSBTAG00000010522
Uncharacterized
10
00
01
26
25764570
ENSBTAG00000046727
Olfactoryreceptor
00
I1
00
I1
27
5031078
ENSBTAG00000007340
LOC101906349
0I1
I1
00
I2
27
32832557
ENSBTAG00000011684
RAB11FIP1
10
I1
00
0
28
284228
ENSBTAG00000000079
CCSAP
0I1
I1
00
I2
28
1690403
ENSBTAG00000048153
Uncharacterized
10
00
01
28
35718807
ENSBTAG00000047317
CL43
1I1
I1
00
I1
28
44081765
ENSBTAG00000032527
ERCC6
I3
01
00
I2
29
1985881
ENSBTAG00000004081
FAT3
30
I1
00
2
29
7530523
ENSBTAG00000046360
Uncharacterized
01
00
01
29
7530663
ENSBTAG00000046360
Uncharacterized
1I1
00
00
29
26397931
ENSBTAG00000013568
UEVLD
I3
I3
I1
00
I7
149
Interestingly13ofthe22genesassociatedtohumansyndromescarryvariants
appearing in the short list of the 35 positives and only 6 in that of 135 with
negative values This in spite of the low weight (from 1 to 1) given to this
parameterEightofthe35positivevariantsareinproteinsstilluncharacterized
Sixvariantshadascoreof4orhigherThehighestscore(totalscore=7)wasfor
ELMO3 (Engulfment and cell motility gene 3) This gene is involved in cell
motility and migration ELMO3 is also a homolog of C( elegans Ced12 that is
required for many developmental processes included gonadal morphogenesis
Mutant for this gene in C( elegans suffer from abnormal development usually
with lethal consequences or sublethal developmental defects including small
embryosdelayeddevelopmentandreducedfertility(Gumiennyetal2001)In
human it is expressed in testis prostate and ovary
(httpwwwgenecardsorgcgibincarddispplgene=ELMO3)
ThegeneDNAJC13(DnaJ(Hsp40)homologsubfamilyCmember13)hadatotal
score of 5 This gene is involved in the onset of Parkinson disease DNAJC13
regulates the dynamics of clathrin coats on early endosomes Cellular analysis
shows that the mutation Arg855Ser identified by VilarintildeoGuumlell et al (2014)
confers a gainoffunction that impairs endosomal transportApossible roleof
DNAJC13geneinTourettesyndromechronicticdisorderisunderinvestigation
(Sundarametal2011)
A second genehad a scoreof 5THBS2 (Thrombospondin2) It belongs to the
thrombospondin family and is a powerful inhibitor of tumour growth and
angiogenesis It is highly expressed in testis in fetal Leydig cell Hirose et al
(2008) found a significant association between an intronic SNP in the THBS2
gene and lumbar disc herniation Thrombospondin2 is also a matricellular
proteinfoundinhumanserumDeletionofTSP2causesagedependentdilated
cardiomyopathy (Hanatani et al 2014) Mutant mice for this gene display a
variety of mutant phenotype Kyriakides et al 1998 suggest that THBS2
modulates the cell surface properties of mesenchymal cells affecting cell
functionssuchasadhesionandmigration
150
Three genes had a score of 4 all three are associated to human syndromes
Mutation in HERC2 (HECT and RLD domain containing E3 ubiquitin protein
ligase2)isinvolvedinautosomalrecessivementalretardation38(Puffenberger
etal2012)Jietal(2000)identifiedinmutantmouseneuromuscularsecretory
vesicledefectsandspermacrosomedefectsMoreovergeneticvariationsinthis
gene are associated with skinhaireye pigmentation variability Mutations in
DOLK gene (dolichol kinase) are associated with dolichol kinase deficiency
(Lefeberetal2011)COX10(cytochromecoxidaseassemblyhomolog10)gene
is associated to mitochondrial complex IV deficiency causing Leigh syndrome
(Antonickaetal2003)
Conclusions)
Exomesequencingthereforeprovidedvaluableinformationoncodingsequence
variants and on their potential effect on protein function As a group variants
classifiedasdeleteriouswererarerthanvariantsclassifiedasneutralSincethis
isanexpectedeffectofpurifyingselectionthissetofvariantwaslikelyenriched
forvariantshavingaphenotypiceffect
Theinclusionintheanalysesofplusandminusvariantanimalsforfertilitytraits
likelypermittedtosequencethemostimportantvariantsinfluencingthesetraits
butthelownumberofanimalssequencedcombinedwiththecomplexityofthe
traitpermittedtofindonlyasuggestiveassociationbetweenSNPsandtraitsItrsquos
likely that amuch larger number of individuals is to be sequenced if genome
wideassociationisthestrategyoneswanttopursue
We explored an alternative strategy to confirm suggestivedeleterious variants
based on indirect evidence of the variant effect by assessing i) evolutionary
constraintsassociatedtotheaminoacidresiduecodedbythevariantii)variant
behaviour in a large population In the absence of direct genotyping of the
candidatevariantsweusedhaplotypeinformationassurrogateiii)functionand
effectsinotherspecies
151
We attempted to quantify these criteria through a score subjective and only
indicativebutpermitted tohighlighted35genes so farneglected in cattlebut
deservingfurtherattentionandpotentiallyhavingdeleteriouseffectsinHolstein
andothercattlebreedsThesemaybepartoftherecessivedeleterioussetthat
constitutethegeneticloadofItalianHolstein
References)AntonickaHLearySCGuercinGHAgarJNHorvathRKennawayNG
HardingCOJakschMShoubridgeEA2003MutationsinCOX10
resultinadefectinmitochondrialhemeAbiosynthesisandaccountfor
multipleearlyonsetclinicalphenotypesassociatedwithisolatedCOX
deficiencyHumMolGenet122693ndash2702doi101093hmgddg284
AshwellMSHeyenDWSonstegardTSVanTassellCPDaYVanRaden
PMRonMWellerJILewinHA2004DetectionofQuantitativeTrait
LociAffectingMilkProductionHealthandReproductiveTraitsin
HolsteinCattleJournalofDairyScience87468ndash475
doi103168jdsS00220302(04)731860
AstonKICarrellDT2009GenomeWideStudyofSingleNucleotide
PolymorphismsAssociatedWithAzoospermiaandSevere
OligozoospermiaJournalofAndrology30711ndash725
doi102164jandrol109007971
AulchenkoYSRipkeSIsaacsADuijnCMvan2007GenABELanRlibrary
forgenomewideassociationanalysisBioinformatics231294ndash1296
doi101093bioinformaticsbtm108
BamshadMJNgSBBighamAWTaborHKEmondMJNickersonDA
ShendureJ2011ExomesequencingasatoolforMendeliandisease
genediscoveryNatRevGenet12745ndash755doi101038nrg3031
BieseckerLGShiannaKVMullikinJC2011Exomesequencingtheexpert
viewGenomeBiology12128doi101186gb2011129128
152
BiffaniSMarusiMBiscariniFCanavesiF2005Developingagenetic
evaluationforfertilityusingangularityandmilkyieldascorrelatedtraits
InterbullBulletin063
BiffaniSSamoreacuteABCanavesiF2002Inbreedingdepressionforproduction
reproductionandfunctionaltraitsinItalianHolsteincattleInstitut
NationaldelaRechercheAgronomique(INRA)pp0ndash4
BolgerAMLohseMUsadelB2014TrimmomaticAflexibletrimmerfor
IlluminaSequenceDataBioinformaticsbtu170
doi101093bioinformaticsbtu170
BrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingand
MissingDataInferenceforWholeGenomeAssociationStudiesByUseof
LocalizedHaplotypeClusteringAmJHumGenet811084ndash1097
CharlesworthDWillisJH2009ThegeneticsofinbreedingdepressionNat
RevGenet10783ndash796doi101038nrg2664
ChunSFayJC2009Identificationofdeleteriousmutationswithinthree
humangenomesGenomeRes191553ndash1561
doi101101gr092619109
ConsortiumTEP2012AnintegratedencyclopediaofDNAelementsinthe
humangenomeNature48957ndash74doi101038nature11247
DePristoMABanksEPoplinRGarimellaKVMaguireJRHartlC
PhilippakisAAdelAngelGRivasMAHannaMMcKennaA
FennellTJKernytskyAMSivachenkoAYCibulskisKGabrielSB
AltshulerDDalyMJ2011Aframeworkforvariationdiscoveryand
genotypingusingnextgenerationDNAsequencingdataNatGenet43
491ndash498doi101038ng806
GoymerP2007SynonymousmutationsbreaktheirsilenceNatRevGenet8
92ndash92doi101038nrg2056
GumiennyTLBrugneraEToselloTrampontACKinchenJMHaneyLB
NishiwakiKWalkSFNemergutMEMacaraIGFrancisRSchedl
TQinYVanAelstLHengartnerMORavichandranKS2001CED
12ELMOaNovelMemberoftheCrkIIDock180RacPathwayIs
RequiredforPhagocytosisandCellMigrationCell10727ndash41
doi101016S00928674(01)005207
153
HanataniSIzumiyaYTakashioSKimuraYArakiSRokutandaTTsujita
KYamamotoETanakaTYamamuroMKojimaSTayamaS
KaikitaKHokimotoSOgawaH2014Circulatingthrombospondin2
reflectsdiseaseseverityandpredictsoutcomeofheartfailurewith
reducedejectionfractionCircJ78903ndash910
HiroseYChibaKKarasugiTNakajimaMKawaguchiYMikamiY
FuruichiTMioFMiyakeAMiyamotoTOzakiKTakahashiA
MizutaHKuboTKimuraTTanakaTToyamaYIkegawaS2008
AfunctionalpolymorphisminTHBS2thataffectsalternativesplicingand
MMPbindingisassociatedwithlumbardischerniationAmJHum
Genet821122ndash1129doi101016jajhg200803013
httpsgenomeucsceduindexhtml[WWWDocument]nd
JamrozikJFatehiJKistemakerGJSchaefferLR2005EstimatesofGenetic
ParametersforCanadianHolsteinFemaleReproductionTraitsJournalof
DairyScience882199ndash2208doi103168jdsS00220302(05)728952
JanisEWiggintonDJC2005AnoteonexacttestsofHardyWeinberg
equilibriumAmericanjournalofhumangenetics76887ndash93
doi101086429864
JiYRebertNAJoslinJMHigginsMJSchultzRANichollsRD2000
StructureoftheHighlyConservedHERC2GeneandofMultiplePartially
DuplicatedParalogsinHumanGenomeRes10319ndash329
KentWJSugnetCWFureyTSRoskinKMPringleTHZahlerAM
HausslerandD2002TheHumanGenomeBrowseratUCSCGenome
Res12996ndash1006doi101101gr229102
KumarPHenikoffSNgPC2009Predictingtheeffectsofcodingnon
synonymousvariantsonproteinfunctionusingtheSIFTalgorithmNat
Protoc41073ndash1081doi101038nprot200986
KyriakidesTRZhuYHSmithLTBainSDYangZLinMTDanielson
KGIozzoRVLaMarcaMMcKinneyCEGinnsEIBornsteinP
1998Micethatlackthrombospondin2displayconnectivetissue
abnormalitiesthatareassociatedwithdisorderedcollagenfibrillogenesis
anincreasedvasculardensityandableedingdiathesisJCellBiol140
419ndash430
154
LaneEACroweMABeltmanMEMoreSJ2013Theinfluenceofcowand
managementfactorsonreproductiveperformanceofIrishseasonal
calvingdairycowsAnimReprodSci14134ndash41
doi101016janireprosci201306019
LefeberDJdeBrouwerAPMMoravaERiemersmaMSchuurs
HoeijmakersJHMAbsmannerBVerrijpKvandenAkkerWMR
HuijbenKSteenbergenGvanReeuwijkJJozwiakAZuckerN
LorberALammensMKnopfCvanBokhovenHGruumlnewaldSLehle
LKapustaLMandelHWeversRA2011AutosomalRecessive
DilatedCardiomyopathyduetoDOLKMutationsResultsfromAbnormal
DystroglycanOMannosylationPLoSGenet7e1002427
doi101371journalpgen1002427
LiHDurbinR2009FastandaccurateshortreadalignmentwithBurrows
WheelertransformBioinformatics251754ndash1760
doi101093bioinformaticsbtp324
LiHHandsakerBWysokerAFennellTRuanJHomerNMarthG
AbecasisGDurbinR1000GenomeProjectDataProcessingSubgroup
2009TheSequenceAlignmentMapformatandSAMtoolsBioinformatics
252078ndash2079doi101093bioinformaticsbtp352
LiMXKwanJSHBaoSYYangWHoSLSongYQShamPC2013
PredictingMendelianDiseaseCausingNonSynonymousSingle
NucleotideVariantsinExomeSequencingStudiesPLoSGenet9
e1003143doi101371journalpgen1003143
MacArthurDGBalasubramanianSFrankishAHuangNMorrisJWalter
KJostinsLHabeggerLPickrellJKMontgomerySBAlbersCA
ZhangZConradDFLunterGZhengHAyubQDePristoMA
BanksEHuMHandsakerRERosenfeldJFromerMJinMMuXJ
KhuranaEYeKKayMSaundersGISunerMMHuntTBarnes
IHAAmidCCarvalhoSilvaDRBignellAHSnowCYngvadottirB
BumpsteadSCooperDNXueYRomeroIGWangJLiYGibbs
RAMcCarrollSADermitzakisETPritchardJKBarrettJCHarrow
JHurlesMEGersteinMBTylerSmithC2012Asystematicsurvey
155
oflossoffunctionvariantsinhumanproteincodinggenesScience335
823ndash828doi101126science1215040
MajewskiJSchwartzentruberJLalondeEMontpetitAJabadoN2011
WhatcanexomesequencingdoforyouJMedGenet48580ndash589
doi101136jmedgenet2011100223
McClureMCBickhartDNullDVanRadenPXuLWiggansGLiuG
SchroederSGlasscockJArmstrongJColeJBVanTassellCP
SonstegardTS2014BovineExomeSequenceAnalysisandTargeted
SNPGenotypingofRecessiveFertilityDefectsBH1HH2andHH3Reveal
aPutativeCausativeMutationinSMC2forHH3PLoSONE9e92769
doi101371journalpone0092769
McClureMKimEBickhartDNullDCooperTColeJWiggansGAjmone
MarsanPColliLSantusELiuGESchroederSMatukumalliLVan
TassellCSonstegardT2013FineMappingforWeaverSyndromein
BrownSwissCattleandtheIdentificationof41ConcordantMutations
acrossNRCAMPNPLA8andCTTNBP2PLoSONE8e59251
doi101371journalpone0059251
McKennaAHannaMBanksESivachenkoACibulskisKKernytskyA
GarimellaKAltshulerDGabrielSDalyMDePristoMA2010The
GenomeAnalysisToolkitAMapReduceframeworkforanalyzingnext
generationDNAsequencingdataGenomeRes201297ndash1303
doi101101gr107524110
MiHDongQMuruganujanAGaudetPLewisSThomasPD2010
PANTHERversion7improvedphylogenetictreesorthologsand
collaborationwiththeGeneOntologyConsortiumNuclAcidsRes38
D204ndashD210doi101093nargkp1019
MrodeRKearneyJFBiffaniSCoffeyMCanavesiF2009Short
communicationGeneticrelationshipsbetweentheHolsteincow
populationsofthreeEuropeandairycountriesJournalofDairyScience
925760ndash5764doi103168jds20081931
PelosoGMAuerPLBisJCVoormanAMorrisonACStitzielNOBrody
JAKhetarpalSACrosbyJRFornageMIsaacsAJakobsdottirJ
FeitosaMFDaviesGHuffmanJEManichaikulADavisBLohman
156
KJoonAYSmithAVGroveMLZanoniPRedonVDemissieS
LawsonKPetersUCarlsonCJacksonRDRyckmanKKMackey
RHRobinsonJGSiscovickDSSchreinerPJMychaleckyjJC
PankowJSHofmanAUitterlindenAGHarrisTBTaylorKD
StaffordJMReynoldsLMMarioniREDehghanAFrancoOHPatel
APLuYHindyGGottesmanOBottingerEPMelanderOOrho
MelanderMLoosRJFDugaSMerliniPAFarrallMGoelA
AsseltaRGirelliDMartinelliNShahSHKrausWELiMRader
DJReillyMPMcPhersonRWatkinsHArdissinoDZhangQWang
JTsaiMYTaylorHACorreaAGriswoldMELangeLAStarrJM
RudanIEiriksdottirGLaunerLJOrdovasJMLevyDChenYDI
ReinerAPHaywardCPolasekODearyIJBoreckiIBLiuY
GudnasonVWilsonJGvanDuijnCMKooperbergCRichSSPsaty
BMRotterJIOrsquoDonnellCJRiceKBoerwinkleEKathiresanS
CupplesLA2014AssociationofLowFrequencyandRareCoding
SequenceVariantswithBloodLipidsandCoronaryHeartDiseasein
56000WhitesandBlacksTheAmericanJournalofHumanGenetics94
223ndash232doi101016jajhg201401009
PryceJEVeerkampRFThompsonRHillWGSimmG1997Genetic
aspectsofcommonhealthdisordersandmeasuresoffertilityinHolstein
FriesiandairycattleAnimalScience65353ndash360
doi101017S1357729800008559
PryceJEWoolastonRBerryDPWallEWintersMButlerRShafferM
2014WorldTrendsinDairyCowFertilityin10ThWorldCongressof
GeneticsAppliedtoLivestockProduction
PuffenbergerEGJinksRNWangHXinBFiorentiniCShermanEA
DegrazioDShawCSougnezCCibulskisKGabrielSKelleyRI
MortonDHStraussKA2012Ahomozygousmissensemutationin
HERC2associatedwithglobaldevelopmentaldelayandautismspectrum
disorderHumMutat331639ndash1646doi101002humu22237
PurcellSNealeBToddBrownKThomasLFerreiraMARBenderD
MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKA
157
ToolSetforWholeGenomeAssociationandPopulationBasedLinkage
AnalysesAmJHumGenet81559ndash575
SahanaGNielsenUSAamandGPLundMSGuldbrandtsenB2013Novel
HarmfulRecessiveHaplotypesIdentifiedforFertilityTraitsinNordic
HolsteinCattlePLoSONE8e82909doi101371journalpone0082909
SchuumltzEScharfensteinMBrenigB2008Implicationofcomplexvertebral
malformationandbovineleukocyteadhesiondeficiencyDNAbased
testingondiseasefrequencyintheHolsteinpopulationJDairySci91
4854ndash4859doi103168jds20081154
SimmonsMJCrowJF1977MutationsAffectingFitnessinDrosophila
PopulationsAnnualReviewofGenetics1149ndash78
doi101146annurevge11120177000405
SoggiuAPirasCHusseinHACanioMDGaviraghiAGalliAUrbaniA
BonizziLRoncadaP2013UnravellingthebullfertilityproteomeMol
BioSyst91188ndash1195doi101039C3MB25494A
StitzielNOKiezunASunyaevS2011Computationalandstatistical
approachestoanalyzingvariantsidentifiedbyexomesequencing
GenomeBiology12227doi101186gb2011129227
SundaramSKHuqAMSunZYuWBennettLWilsonBJBehenME
ChuganiHT2011ExomesequencingofapedigreewithTourette
syndromeorchronicticdisorderAnnNeurol69901ndash904
doi101002ana22398
VanRadenPMOlsonKMNullDJHutchisonJL2011Harmfulrecessive
effectsonfertilitydetectedbyabsenceofhomozygoushaplotypesJournal
ofDairyScience946153ndash6161doi103168jds20114624
VilarintildeoGuumlellCRajputAMilnerwoodAJShahBSzuTuCTrinhJYuI
EncarnacionMMunsieLNTapiaLGustavssonEKChouP
TatarnikovIEvansDMPishottaFTVoltaMBeccanoKellyD
ThompsonCLinMKShermanHEHanHJGuentherBL
WassermanWWBernardVRossCJAppelCresswellSStoesslAJ
RobinsonCADicksonDWRossOAWszolekZKAaslyJOWuR
MHentatiFGibsonRAMcPhersonPSGirardMRajputMRajput
158
AHFarrerMJ2014DNAJC13mutationsinParkinsondiseaseHumMolGenet231794ndash1801doi101093hmgddt570
WangKLiMHakonarsonH2010ANNOVARfunctionalannotationofgeneticvariantsfromhighthroughputsequencingdataNuclAcidsRes38e164ndashe164doi101093nargkq603
WathesDCPollottGEJohnsonKFRichardsonHCookeJS2014HeiferfertilityandcarryoverconsequencesforlifetimeproductionindairyandbeefcattleAnimal8Suppl191ndash104doi101017S1751731114000755
YangYMuznyDMReidJGBainbridgeMNWillisAWardPABraxtonABeutenJXiaFNiuZHardisonMPersonRBekheirniaMRLeducMSKirbyAPhamPScullJWangMDingYPlonSELupskiJRBeaudetALGibbsRAEngCM2013ClinicalWhole
ExomeSequencingfortheDiagnosisofMendelianDisordersNewEnglandJournalofMedicine3691502ndash1511doi101056NEJMoa1306555
ZiminAVDelcherALFloreaLKelleyDRSchatzMCPuiuDHanrahanFPerteaGTassellCPVSonstegardTSMarccedilaisGRobertsMSubramanianPYorkeJASalzbergSL2009Awholegenome
assemblyofthedomesticcowBostaurusGenomeBiology10R42doi101186gb2009104r42
159
Supplementary)files)
YoucanfindChapter6supplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter06)orscantheQRcode
)
Supplementary)Figure)S1)Coverage)analyses)
A)Averagecoverageperanimals
B)Boxplotofchromosomescoverageperanimal
C)BoxplotofanimalcoverageperchromosomeBluelinerepresenttheexpected
50Xcoverage
Supplementary)Figure)S2)Distribution)of)variant)sequencing)depth)
)
Supplementary) Figure) S3) Correlation) between) sequencing) depth) and)
number) of) discordant) genotypes) In) red) values) for) two) outlier) animals)
having)concordance)lt846))
)
Supplementary)Figure)S4)Multidimensional)scaling)of)Italian)Holstein)bulls)
sequenced))
)
Supplementary)Figure)S5)Multidimensional)scaling)of)Italian)Holstein)bulls)
analysed)with)the)800K)SNPchip))
160
Supplementary) Figure) S6) MAF) distribution) on) neutral) (blue)) and)
deleterious)(red))variants))
)
Supplementary) Table) S1) EBV) distribution) on) Italian) Holstein) population)
and)SNPchip)dataset)
Sheet1EBVthresholdvaluetodefineplusandminustailforPROT(kgprotein)
SCC(somaticcellcount)FERT(fertility)andLONG(functional longevity) All
ItalianHolsteinFAbullswereincludeinthisanalysis
Sheet2Numberofdatasetanimalsineachtailforeachtrait
Supplementary)Table)S2)Exome)probes)statistics)number)and)bp)covered)
per)chromosomes)
Supplementary) Table) S3) Concordance) between) animal) genotypes) in)
sequencing)and)SNPchip)data)
Error1 HD data homozygotes ndash Exome data heterozygotes Error2 HD data
heterozygotesndashExomedatahomozygotes
Supplementary) Table) S4) List) of) suggestive) genes) deriving) from) the)
analysis)of)exomes)of)animals) in)the)tails)of)male)and)female)fertility)EBV)
distributions))
Header(code
bull Bfreq_totFrequencyofBalleleinentiredataset
bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset
bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset
bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset
bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset
bull CallRate_totSNPcallratendashentiredataset
bull CallRate_MHSNPcallratendashhighmalefertilitydataset
bull CallRate_MLSNPcallratendashlowmalefertilitydataset
bull CallRate_FHSNPcallratendashhighfemalefertilitydataset
bull CallRate_FLSNPcallratendashhighmalefertilitydataset
bull HWeqHardyWeinbergequilibriumpvalue
bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele
bull Geno_HetNumberofanimalswithheterozygotegenotype
161
bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele
bull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)dataset
bull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)dataset
bull LOW_CRSNPcallratefromlowfertility(maleandfemale)dataset
bull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallele
bull LOW_Het Number of animals from low fertility (male and female) dataset with
heterozygotegenotype
bull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallele
bull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and
female)dataset
bull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)dataset
bull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)dataset
bull HIGH_CRSNPcallratefromhighfertility(maleandfemale)dataset
bull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallele
bull HIGH_Het Number of animals from high fertility (male and female) dataset with
heterozygotegenotype
bull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallele
bull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and
female)dataset
bull Male_TREND inheritance models implemented in PLINK trend test on male fertility
dataset
bull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale
fertilitydataset
bull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male
fertilitydataset
bull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility
dataset
bull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on
femalefertilitydataset
bull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale
fertilitydataset
bull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male
andfemalefertilitydataset
bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test
onmaleandfemalefertilitydataset
162
bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston
maleandfemalefertilitydataset
bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset
bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset
bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset
Supplementary)Table) S5) Regions) candidate) to) carry) deleterious) variants)
identified)by)OHW)approach))
Supplementary)Table) S6) Regions) candidate) to) carry) deleterious) variants)
identify)by)LOH)approach))
Ingreyregionsremovedinfurtheranalysis
Supplementary)Table)S7)List)of)candidate)deleterious)variants)mapping)in)
OHW)and)LOH)regions))Header(code
bull Bfreq_totFrequencyofBalleleinentiredataset
bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset
bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset
bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset
bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset
bull CallRate_totSNPcallratendashentiredataset
bull CallRate_MHSNPcallratendashhighmalefertilitydataset
bull CallRate_MLSNPcallratendashlowmalefertilitydataset
bull CallRate_FHSNPcallratendashhighfemalefertilitydataset
bull CallRate_FLSNPcallratendashhighmalefertilitydataset
bull HWeqHardyWeinbergequilibriumpvalue
bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele
bull Geno_HetNumberofanimalswithheterozygotegenotype
bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele
bull HW_insideDeleteriousmutationsinsideldquoOutofHardyWeinbergrdquoregion
bull HW_outsideDeleteriousmutationsoutsideldquoOutofHardyWeinbergrdquoregion
bull DH_insideDeleteriousmutationsinsideldquoLackofHomozygosityrdquoregion
bull DH_outsideDeleteriousmutationsoutsideldquoLackofHomozygosityrdquoregion
163
bull GeneClass Classifications of gene 1 deleterious mutations inside OHW and LOHregion2deleteriousmutationsinsideOHWorLOHregion
bull 2deleteriousmutationsoutsideOHWandorLOHregionbull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)datasetbull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)datasetbull LOW_CRSNPcallratefromlowfertility(maleandfemale)datasetbull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallelebull LOW_Het Number of animals from low fertility (male and female) dataset with
heterozygotegenotypebull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallelebull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and
female)datasetbull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)datasetbull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)datasetbull HIGH_CRSNPcallratefromhighfertility(maleandfemale)datasetbull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallelebull HIGH_Het Number of animals from high fertility (male and female) dataset with
heterozygotegenotypebull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallelebull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and
female)datasetbull Male_TREND inheritance models implemented in PLINK trend test on male fertility
datasetbull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale
fertilitydatasetbull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male
fertilitydatasetbull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility
datasetbull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on
femalefertilitydatasetbull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale
fertilitydatasetbull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male
andfemalefertilitydataset
164
bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test
onmaleandfemalefertilitydataset
bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston
maleandfemalefertilitydataset
bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset
bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset
bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset
Supplementary)Table)S8)Results)of)haplotype)analysis)))Header(code
bull Chromchromosome
bull PosphysicalpositiononUMD31genome
bull Exome_homonumberofhomozygotefordeleteriousmutationsequencedanimals
bull Exome_hetnumberofheterozygotesequencedanimalsfordeleteriousmutation
bull Outcome NotFound It was impossible to associated a haplotype to the mutation
NotOpt Non completely invariant haplotype could be associated to themutationOK
Putativehaplotype associate tomutationwas found In few case alternativehaplotype
wasreported
bull LenlengthofhaplotypeinnumberofSNPs
bull Pop_numnumbersofselectedhaplotypedetectedinHolsteinpopulation
bull Pop_homo numbers of homozygote animals in Holstein population for selected
haplotype
bull Pop_het numbers of heterozygotes animals in Holstein population for selected
haplotype
bull ExpExpectednumberofhomozygoteanimal
bull Alternative if only one heterozygote sequenced animals for deleteriousmutationwas
foundalltwopossiblehaplotypeswheretested
Supplementary)Table)S9)Descriptive)results)for)the)Biological)Process)(BP)))
NumberofgenesbelongingtoBiologicalProcessGOterms)
Supplementary) Table) S10) Descriptive) results) for) the)Molecular) Function)
(MF))
NumberofgenesbelongingtoMolecularFunctionGOterms
165
CHAPTER7
Conclusion
Since the domestication event occurred in the Fertile Crescent around 10000yearsagothebovinespecieswasselectedbyhumankindtofulfilhisneedsThecreationofbreedsoccurredaround200yearsagointensifiedthedifferentiationbetween populations and promoted the development of intensive selectionschemesTraditionalAandnowadaysgenomicAselectionsareproducingpositivegenetictrendsinmanyproductivetraitsdespitethestillpoorknowledgeaboutphenotype biology A deeper understanding of genome organization and genebiology would increase the accuracy of genomic evaluation by incorporatingpriorknowledgeThisthesisexploresthebovinegenomebyhighAthroughputtechnologiesAsuchasNextGenerationSequencingandHighADensityGenotypingAwithestablishedand innovative procedures to investigate the biology of complex traits and toprovidenewdataandmoleculartoolstobovinebreedingThese goals have been achieved by complementary approaches and differentmethodsInChapter2selectionsignatureswereinvestigatedindependentlyintofive Italian breeds using Illumina BovineSNP50 BeadChip Then amultiAbreedapproach was applied clustering breeds into dairy (Italian Holstein ItalianBrownand Italian Simmental) andbeef (Marchigiana Piedmontese and ItalianSimmental) groups Regions under recent positive selection shared by breedswithasameproductiveaptitudewereinvestigatedinfurtherdetailbyretrievinggenes in the region and analysing their function by ontology and pathwayanalysis In the dairy group selection signals appear nearby genes related tomammarygland(metabolismandresistancetomastitis)andfeedingadaptationIn thebeefgroupgenes inselectedregionsare involved inanimalgrowthandmeatquality(textureandjuiciness)Hencecandidategenesidentifiedbelongtonetworks and pathways important in directing a breed towards either beef ordairyproductionIn Chapter 3 a GWAS approach was used in dairy cattle to correlate SNP tocomplex phenotypes having an economic value Three independent ldquoclassicalrdquosingleAmarkerregressionswereruninthreeItaliandairycattle(ItalianHolsteinItalianBrownandItalianSimmental)UsingthetraditionalsingleSNPapproach
168
only few SNPs were above the nominal genomeAwide significance thresholdwhile applying a geneAbased method identified significant candidate genesassociatedwithmilkphysiology andmammary glanddevelopment in all threepopulationsInterestinglygenesfoundindifferentbreedshavesimilarfunctionbut are not the same suggesting that breed history demography selectioncriteriaandstochasticeventsasgeneticdrifthaveanimpactonthedefinitionofthemainmoleculartargetofselectionschemesCurrent intensive selection strategies rapidly spread the genetic gain in thepopulationbutholdtheassociatedriskofdisseminatingdefectivegenescausinggenetic defects (eg mulefoot complex vertebral malformation and others) ordecreasing fertility In Chapter 6 different technologies and approaches werecombined to obtain filter and prioritize genes candidate to be deleteriousExome sequence datawere exploited to identify deleterious variants affectingItalianHolsteincattleinasetof20animalsextremeforfertilitytraitsMorethan5000 candidate deleterious SNP variants were identified To bypass the smallnumberlimitationandthelackofpowerofanystatisticanalysison20animalsastepwise approachwas used to first filter and then prioritize these candidatedeleterious variants Polymorphisms were retained if having asymmetricdistribution in bulls from opposite tails of fertility EBV distributions andormapping in genomic regions outlier for lack of homozygosity in a 1009 bullpopulation genotyped with the Illumina BovineHD BeadChip A total of 191variants in163genespassed theprocessesThesewereassessedby searchingadditional evidences in favour or against their deleterious effect throughdifferentapproachesi)checkingthevariantconservationacross100vertebratespecies by comparative genomics ii) finding haplotypes associated withcandidatemutationsandevaluatingtheirfrequencyinHolsteinsiii)integratinghumanandmouse functionalannotationsA scorewasproposed toweight theresult of these assessments and rank the variants 35 of them had positivescores These occurred in genes involved in basic biological mechanisms asfertilitydevelopmentand immunityandresultedenriched ingenesassociatedtogeneticdefectsreportedinhumanThesemutationsarestrongcandidatestobe recessive deleterious variants worth to be further investigated in a larger
169
populationtobetterunderstandtheirbiologyimpactandpossibleapplicationinbreedingTo increase the reliability of results based on genomic data and to reduce theimpact of subjective choices in data analysis new toolsapproaches weredeveloped(iethemultiAbreedapproachinSelectionSignaturendashChapter2thepostAGWAS geneAbased approach ndash Chapters 3 and 4) and the existent onescriticallyevaluated(imputationmethodsndashChapter5)In Chapter 4 the postAGWAS geneAbased method used in Chapter 3 wasimplemented in the MUGBAS (MUlti species GeneABased Association Suite)software In contrast to existing software MUGBAS is species and annotationfree AmultipleAtesting correctionwas included in the package in addition tocomputing parallelization for faster processing and plot representation forbetterunderstandingThesoftwareusessingleAmarkerGWASdataandagivenannotation to estimate geneAregionAwise association pAvalues As previouslymentioned in Chapter 3 the ldquoclassicalrdquo GWAS approach (single SNP) foundsignals only in ItalianHolsteinwhileMUGBAS geneAbased approach identifiedsignificantsignalsinallbreedsinvestigatedIn Chapter 5 the impact of different bovine reference sequences on genotypeimputation was investigated In this work Illumina BovineSNP50 BeadChipgenotypesof ItalianSimmentalbullswerereAmappedonthethreemostrecentreferencegenomeassembliesWecomparedfourdifferent imputationmethodsfromlow(IlluminaBovineLDv10Beadchip)tohighdensityThetestpopulationwasdividedintofoursubpopulationsonthebasisofrelationshipandgenotypedataavailabilityUpdatingSNPcoordinatesonthe three testedcattlereferencegenome assemblies determined only a slight variation on imputation resultswithinmethodsOntheotherhandlargedifferencesintermsofaccuracywereobservedamongthefourimputationmethodstestedGenetic structure of industrial breeds was evaluated assessing LinkageDisequilibrium (LD) Since LD decay is correlated to intensity of selection weexpected different behaviours in the breeds analysed Indeed as evident inChapter 2 ndash Figure 7 breeds subjected to lower selection intensity (ie
170
MarchigianaPiedmonteseandItalianSimmental)displayalowerpersistenceof
LDatgreaterdistancescomparedtothosesubjectedtohigherselectionpressure
(ieItalianHolsteinandItalianBrown)Chapter3investigatesLDintheDGAT1
region of dairy cattle breeds In Italian Simmental LD is lower compared to
Italian Holstein and Italian Brown In spite of this significant association
betweentheDGAT1regionandmilktraitswasfoundbothinItalianSimmental
andinItalianHolsteinInItalianBrownnoassociationwasfoundbecauseDGAT1
isfixed
Data and results presented in this thesis represent a clear example of the
progress of molecular technologies occurred in last few years spanning from
mediumdensitygenotypingtonextgenerationsequencing
Thebiologylandscapesofsomecomplextraits(iemilkproductionandquality
fertility)were improved finding candidate genes andpathwaysnotpreviously
associated to the phenotype in the bovine species Moreover insights on
deleterious mutations were inferred through a deep and integrated analysis
usingNGSdata
This new knowledge could be used to improve cattle breeding For example
deleteriousrecessivevariantscouldbeincludedinbullevaluationsbybreedersrsquo
associations to reduce the genetic load of deleterious genes in parallel to
improvingproductivity
While technological progress has been impressivewe are still only scratching
thesurfaceofanimalbiologyNewfrontiersarebeingopenedbywholegenome
sequencing of many animals (eg in the 1000 bull genome project) and by
stepping beyond sequencing (a number of epigenomic projects in animals are
following the footprints of the human ENCODE project) The trend is towards
unravelling complexity and data production ability is to be flanked by data
storingandprocessingandaboveallbynewmethodsandtoolsfordataanalysis
171
ACKNOWLEDGMENTS
ldquoNon egrave la destinazione ma il viaggio che contardquo Questi tre anni sono stati un gran bel viaggio di crescita personale e ricchi diesperienze Non egrave stato un viaggio solitarioma condiviso con vecchie e nuoveconoscenzechemisembradoverosomenzionareInnanzitutto vorrei ringraziare il prof Paolo AjmoneMarsan chemi ha dato lapossibilitagrave di fare questa esperienza Grazie per gli insegnamenti e per leopportunitagravechemihaoffertoSperodinonavertraditolesueaspettativeAgli attuali colleghi Lorenzo Stefano Licia Elisa Riccardo ed Elia e a quellipassati Nicola Fatima Raffaele Ezequiel e Barbara va il mio piugrave sincero edaffettuoso ringraziamento per aver condiviso anche solo in parte questomiopercorso Grazie per avermi sopportato (e non egrave poco) guidato (non solo nelcampolavorativo)efattovivereinunmodospecialelrsquoambientedilavoroUnsincerograzievaatuttiimieicolleghidelXXVIIcicloAgrisystemRicorderogravesempretuttiimomentichehopotutocondividereconvoi UngrazieancheachihafattopartedellamiaesperienzasvedeseGraziedicuoreaCarlFlavioChristianEricaatuttiiragazziitalianiatuttiicolleghildquosvedesirdquoea tutte le altre persone che ho conosciuto Mi avete fatto vivere appienolrsquoesperienzaesteraefattotornareacasaconqualcheconsapevolezzainpiugrave UnsentitoringraziamentoatuttiglialtricolleghiconcuihocollaboratoinquestianniRingrazio tutti gli amici e conoscenti che mi sono stati vicino per avermisupportatoesopportatoognunoamodoproprio Ultimomanonperimportanza ilringraziamentochevaatuttalamiafamigliaper avermi aiutato spronato sostenuto sempre e in particolare in questaesperienzaSemplicementegrazie PS Adirlatuttaancheladestinazionenonegravepoicosigravemale
173
- INDEX
- CHAPTER 1 - Introduction
- CHAPTER 2 - Relative extended haplotype homozygosity (rEHH) signals across breeds reveal dairy and beef specific signatures of selection
- CHAPTER 3 - Searching new signals for productive traits in three Italian cattle breeds
- CHAPTER 4 - MUGBAS a species free gene based ased association suite
- CHAPTER 5 - Imputation accuracy is robust to cattle reference genome updates
- CHAPTER 6 - Quest for deleterious recessive variants in Italian Holstein bulls combining exome sequencing and high density SNP analysis
- CHAPTER 7 - Conclusion
- ACKNOWLEDGMENTS
-
Scuola di Dottorato per il Sistema Agro-alimentare
Doctoral School on the Agro-Food System
cycle XXVII
SSD AGR17
High-throughput sequencing technologies and genomics insights into the bovine species
Coordinator Chmo Prof Antonio Albanese
_______________________________________
Candidate Marco Milanesi Matriculation n 4011148
Tutor Prof Paolo AjmoneMarsan PhD Lorenzo Bomba PhD Stefano Capomaccio PhD Ezequiel Louis Nicolazzi
Academic Year 20132014
Alla$mia$famiglia$
A$mio$nonno$Mario$
INDEXamp
Chapteramp1ndashIntroduction 3
Chapteramp2Relativeextendedhaplotypehomozygosity(rEHH)signalsacrossbreedsrevealdairyandbeefspecificsignaturesofselection
11
Chapteramp3SearchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisinthreeItaliancattlebreeds
49
Chapteramp4ampMUGBASaspeciesfreegenebasedassociationsuite
81
Chapteramp5Imputationaccuracyisrobusttocattlereferencegenomeupdates
89
Chapteramp6QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis
99
Chapteramp7ndashConclusion 167
amp
Acknowledgmentsampamp 173
CHAPTER1
Introduction
Background
The bovine species counts almost 15 billion of heads classified in some 800
breeds worldwide (httpfaostat3faoorg) The species is adapted to a wide
varietyofdifferentenvironmentandhusbandrysystemsandisoftenrearedfor
multiple purposes (Taberlet et al 2008) draft power milk meat leather
productionandotheruses(Elsiketal2009)
European cattle (Bos$ taurus) derive from a single domestication event of the
auroch (Bos$primigenius)occurred in theFertileCrescent around10000years
ago(Taberletetal2011)FollowingdomesticationcattlefollowedtheNeolithic
expansionofagricultureandcolonisedEurope(seeAjmoneMMarsanetal2010
for a review) Many thousand years of natural and artificial selection have
shaped cattle phenotype and genotype to meet environmental conditions and
human needs and together with demographic events and genetic drift have
started thedifferentiation of the bovine species in diverse and locally adapted
populations This differentiation process accelerated in the last two centuries
whenthebreedconceptwasinventedsothatnowthisspeciescountshundreds
of different breeds with high diversity in coat colour body size production
aptitudeadaptationtoclimate tolerancetodiseaseetc(Taberletetal2008
Elsiketal2009TheBovineHapMapConsortium2009)
Thisdiversityisnowbeingprogressivelylostwiththeextinctionofmanylocal
populationsmainlyforeconomicreasonsSelectioneffortshavethereforebeen
focussedonafewoutperformingindustrialbreedsthatarepresentlyselectedfor
specialiseddairyorbeefproductionandsometimesdoublepurpose
In these breeds breeding schemes are well developed and have been
traditionally based on the genetic evaluation of sires and dams using data
collectedfromperformanceandprogenytestingandunbiasedlinearprediction
modelsThesearefoundedontheFisherinfinitesimalmodellingofquantitative
trait variation and need no genomic information to estimate breeding values
(EBVs) The genetic progress recorded using this approach has been relevant
particularly for traits having medium to high heritability In Italian Holstein
genetic trends for themain production traits (milk protein and fat yield) are
positiveHoweveraccurateestimateofbreedingvalueshasahighcostinterms
4
of investment and time so that 5 to 6 years are needed to have a dairy bullprogenytested(wwwanafiit)Theadventofgenomicselection(Meuwissenetal2001)markedamilestoneindairycattlebreedingTheideawasdevelopedbeforethemoleculartoolswereinplaceforimplementitinpracticeandhasbecomearealityonlyrecentlyInthisapproach progeny tested bulls are genotyped at many thousand loci and thevalue of their EBVs used to estimate the value of each allele at each locusgenotyped These values are then used to predict the genetic value of younganimals on the basis of their genotype As a result animals from the newgeneration receive an EBV with the same accuracy of estimates obtained byprogenytestingbutmuchearlierThislowerthegenerationintervalandgreatlyincreasestherateofgeneticprogress indairypopulationsThe initial ideawasso good that nowadays genomic selection is widely applied in industrialcountriestothemajordairycattlebreedsWiththedecreaseofgenotypingcostit is likely that its use will be soon expanded to animals selected for otherpurposes (eg beef cattle) and other species (eg pigs) However genomicselection is an extension of traditional selection It is extremely useful inbreedingforcomplextraitsbutgivesnobiologicalinformationonthegenesthatcontrolthemIfthetraditionalselectionviewsasiregenomeasabigblackboxhaving a value but no clue why genomic selection views the same genomedisassembled inmany small boxes but still black and stillwithno clueon thereasonoftheirvalueIt is reasonable to think that a deeper understanding of the biologicalmechanisms controlling traits may further increase the accuracy of genomicevaluationsIndeedthenewtechnologiesoffernewopportunitiesinthissensebyloweringcostsandincreasingprecisionofdataproductionThis thesis is exploring the use of these technologies for retrieving biologicalinformationfromgenomicdataItusesmethodsnowconsideredldquotraditionalrdquoingenomics (GWAS and analysis of selection sweeps) and less traditional ones(gene centered GWAS and exome analysis) All data used here have beenproducedwithin twonationalprojectson farmanimalgenomics fundedby theItalianMinistryofAgriculture
5
M SELMOL ldquoRicerca e innovazione nelle attivitagrave di miglioramento genetico
animalemediante tecniche di geneticamolecolare per la competitivitagrave del
sistemazootecniconazionaleIrdquo(ResearchandInnovationinanimalbreeding
sheepgoatbuffalohorseanddonkeyI)
M INNOVAGENldquoRicercaeinnovazionenelleattivitagravedimiglioramentogenetico
animalemediante tecniche di geneticamolecolare per la competitivitagrave del
sistema zootecnico nazionale IIrdquo (Research and Innovation in animal
breedingsheepgoatbuffalohorseanddonkeyII)
In these projects the research group I work with M Istituto di Zootecnica at
Universitagrave Cattolica del Sacro Cuore di Piacenza M coordinated all the research
activitiesconcerningtheapplicationofgenomicstechniquesindairycattle
Contentsofthethesis
The thesis is structured into a short introduction (this first chapter) 5 core
chapters(Chapter2to6)andashortConclusion(Chapter7)Withtheexception
of Chapter 6 still in preparation the other core chapters containmanuscripts
that at time of writing are under review for their publication in international
peerMreviewedjournalsormanuscript
InChapter2selectionsignatureindairyandbeefcattleareinvestigatedwiththe
Illumina BovineSNP50 Genotyping BeadChip Five Italian cattle breeds (Italian
Holstein Italian Brown Italian Simmental Marchigiana and Piedmontese) are
analyzedwiththerEHHmethod(Sabetietal2002)tofindsignaturesofrecent
selectionCandidategenesinthevicinityofsignaturessharedbyeitherdairyor
beefbreeds are identified and theirpossible role indairy andbeefproduction
discussed
In Chapter 3 a GWAS analysis is run to investigate production traits in three
Italian dairy cattle breeds (Italian Holstein Italian Brown Italian Simmental)
Data are analysed with a well established method (Amin et al 2007)
6
implemented in theGenABELRpackage (Aulchenkoet al 2007) andwith the
newgeneMbasedmethodAssociationisinvestigatedbetween50KSNPsandmilk
production(milkyield)andqualitytraits(percentageofmilkproteinandfat)
InChapter4thegeneMbasedmethodusedinthepreviouschapter3inspiredby
theVEGASapproach(Liuetal2010)isdescribedThesoftwarewasdeveloped
andreMcodedinourlaboratoryinordertopermittheanalysisofanyspeciesin
additiontohumanMultiMtestingcorrectionandplotrepresentationcompletethe
toolsmadeavailable
InChapter5wetesttheimpactoftheuseofdifferentreferencesequencesinthe
imputation step Italian Simmental data are used to test variation in genomeM
wide inputation accuracy using different cattle reference sequences (BTAU42
UMD31 and BTAU46) To increase the analyses reliability four different
inputationsoftwaresaretested
InChapter6Holsteindataareanalysedcombiningexomesequencesand800K
SNPs data to seek deleterious recessive variantsWhile in natural populations
the destiny of these variants is to disappear in livestock breeds they can be
maintained and disseminated if carried by a highly productive sire The
intersection of deleterious mutations identified by sequencing and lack of
homozygousregionsidentifiedbytheanalysisofthepopulationwiththeHDSNP
chipidentifiedgenescandidatetocarryrecessivedeleteriousvariants
Objective
Themainobjectiveofthethesiswastoinvestigatethebovinegenomewiththe
aid new highMthroughput technology and to explore new strategies for linking
biology (genes and their function) to traits This linkage once validated may
increase the accuracy of genomic prediction and find application in selection
schemesegineradicatingthemostdeleteriousvariantsandorintheplanning
ofmatings
7
ReferencesAjmoneMMarsanPGarciaJFLenstraJA2010OntheoriginofcattleHow
aurochsbecamecattleandcolonizedtheworldEvolAnthropol19148ndash157doi101002evan20267
AminNvanDuijnCMAulchenkoYS2007AGenomicBackgroundBasedMethodforAssociationAnalysisinRelatedIndividualsPLoSONE2e1274doi101371journalpone0001274
AulchenkoYSKoningDMJdeHaleyC2007GenomewideRapidAssociationUsingMixedModelandRegressionAFastandSimpleMethodForGenomewidePedigreeMBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614
ElsikCGTellamRLWorleyKC2009TheGenomeSequenceofTaurineCattleAWindowtoRuminantBiologyandEvolutionScience324522ndash528doi101126science1169588
LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKMHaywardNKMontgomeryGWVisscherPMMartinNGMacgregorS2010AVersatileGeneMBasedTestforGenomeMwideAssociationStudiesTheAmericanJournalofHumanGenetics87139ndash145doi101016jajhg201006009
MeuwissenTHHayesBJGoddardME2001PredictionoftotalgeneticvalueusinggenomeMwidedensemarkermapsGenetics1571819ndash1829
SabetiPCReichDEHigginsJMLevineHZPRichterDJSchaffnerSFGabrielSBPlatkoJVPattersonNJMcDonaldGJAckermanHCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash837doi101038nature01140
TaberletPCoissacEPansuJPompanonF2011ConservationgeneticsofcattlesheepandgoatsCRBiol334247ndash254doi101016jcrvi201012007
8
TaberletPValentiniARezaeiHRNaderiSPompanonFNegriniR
AjmoneMMarsanP2008Arecattlesheepandgoatsendangered
speciesMolEcol17275ndash284doi101111j1365M294X200703475x
TheBovineHapMapConsortium2009GenomeMWideSurveyofSNPVariation
UncoverstheGeneticStructureofCattleBreedsScience324528ndash532
doi101126science1167936
9
CHAPTER2
Relative(extendedamphaplotype(homozygosity)(rEHH)ampsignalsampacrossampbreedsamprevealampdairyampand$beef$specificampsignaturesampofampselection
LBomba1ELNicolazzi2MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly2FondazioneParcoTecnologicoPadanoViaEinsteinLocCascinaCodazza26900LodiItalyTheseauthorscontributedequallytothiswork
____________________SubmittedtoGeneticSelectionEvolution
Abstract
Background
Nowadays many methods have been implemented to scan for signatures of
selection at genomescale level within and across breeds Extended haplotype
homozygosity is a proficient approach to detect genome regions under recent
selective pressure within breeds The objective of this study is to identify
common regions under positive recent selection shared by most represented
dairyandbeefItaliancattlebreeds
Results
A total of 3220 animals from Italian Holstein (2179) Italian Brown (775)
Simmental (493) Marchigiana (485) and Piedmontese (379) breeds were
genotyped with the Illumina BovineSNP50 BeadChip v1 in the frame of the
ItalianlivestockgenomicprojectsldquoSelMolrdquoandldquoProzoordquoAfterstandardcleaning
proceduresgenotypeswerephasedandcorehaplotypesidentifiedThedecayof
Linkage Disequilibrium (LD) for each core haplotype was assessedmeasuring
the extended haplotype homozygosity (EHH) To correct for the lack of good
estimatesof local recombinationrates therelativeEHH(rEHH)wascalculated
for each corehaplotypeGenomic regionsunder recentpositive selectionwere
identifiedasthosehighlyfrequentcorehaplotypeswithsignificantrEHHvalues
Significant independentcorehaplotypeswerealignedacrossbreeds to identify
signalsofrecentselectionsharedbydairyandsharedbybeefbreedsOverall82
and 87 common regions under selectionwere detected among dairy and beef
cattlebreedsrespectivelyBioinformaticsanalysisidentified244and232genes
mapping in these common genomic regions Pathway and network analysis of
significant genes revealed molecular functions biologically related to signal of
selectionsdetectedinmilkormeatproductiontypes
Conclusions
Resultssuggest that themultibreedapproach isamethodto identifyselection
signatures in cattle breeds under selection for the same production goal
12
allowing to better pinpoint the genomic regions of interest in dairy and beef
production
Background
The advent of genomic technologies and the consequent availability of single
nucleotidepolymorphism(SNP)datahaveenabledthestudyofselectionevents
on the cattle genome (Hayes et al 2008 Qanbari et al 2010) Such selection
signals arise from environmental or anthropogenic pressures and help to
understand the causes that led to breed formation These studies are usually
addressedinaldquotopdownrdquoapproach(ShahzadandLoor2012)fromgenotypeto
phenotypeinwhichgenomicdataisstatisticallyevaluatedtodetectdirectional
selectionThephenotypeneededtorunselectionsignaturesisintendedinavery
generalsenseabreedaproductionaptitudeorevenageographicareaoforigin
Thisapproachholdsthepotentialto investigatetraitsasadaptationtoextreme
climatesanddifferentfeedingandhusbandrysystemsresiliencetodiseasesetc
thatareveryexpensivedifficultandsometimesimpossibletostudywithclassic
GWAS (GenomeWide Association Study) approaches Therefore it is expected
thatgenomic informationretrievedinthesestudiesonlypartiallyoverlapsthat
identified by GWAS on specific traits Searching for these ldquosignatures of
selectionrdquo is of interest for understanding molecular mechanisms underlying
important biological processes and providing new insights to functional
biologicalinformation(egdiseaseandorimportanteconomictraits(Nielsenet
al 2007)) To date many methods have been implemented to scan these
selectionsignaturesatthegenomiclevelMethodsdifferwithrespecttoseveral
properties (Lenstra et al 2012) there are methods that perform analyses
withinbreedoracrossbreedsthatcomparealleleorhaplotypefrequenciesand
size that detect recent or ancient selection events either still segregating or
fixatedintopopulations(Lenstraetal2012OrsquoBrienetal2014Utsunomiyaet
al2013)Differentmethodshavedifferentsensitivityandrobustnesseg they
maybeinfluencedatadifferentextentbymarkerascertainmentbiasanduneven
distributionofrecombinationhotspotsalongthegenome
13
Extended haplotype homozygosity (EHH) is a method that has been used in
many animal species including cattle (Pan et al 2013 Qanbari et al 2010
Zhangetal2006)TheEHHmethodwasfirstdevelopedbySabetietal(2002)
for human population analysis The main objective of this methodology is to
identifylongrangehaplotypesinheritedwithoutrecombinationeventsMorein
detailunderaneutralevolutionmodelchangesinallelefrequenciesaredriven
onlybygeneticdriftInthisscenarioanewvariantwouldrequirealongtimeto
reach high frequency in the population and the Linkage Disequilibrium (LD)
surroundingitwoulddecayduetorecombinationevents(Kimura1984)Inthe
caseofpositiveselectionaquickriseinfrequencyofabeneficialmutationina
relatively short time will preserve the original haplotype structure as the
number of recombination events would be limited Therefore a signature of
positiveselectionisdefinedasaregionwithanalleleinlongrangeLDlocatedin
anuncommonlyhighfrequencyhaplotype
One of themost interesting features of thismethod is that it is able to detect
genomic regions that are under recent selection These recent selection
signatures can be therefore used to identify the genomic regions involved in
breedformationInadditionEHHmethoddoesnotrequireanydefinitionofan
ancestralallele(asneededintheintegratedhaplotypescoreiHS(Voightetal
2006) and is suited to be applied to SNP data because it is less sensitive to
ascertainmentbiasthanothermethods(Nielsenetal2007)HowevertheEHH
methodgeneratesa largenumberof falsepositiveandnegative resultsdue to
heterogeneous recombination rates along the genome (Qanbari et al 2010)
Moreover another drawback not specific of EHH but shared by all selection
signaturemethod is the lack of robust inferences able to discern true signals
fromthosegeneratedbychance(Kemperetal2014)
To partially account for these limitations Sabeti et al (2002) developed the
relative extended haplotype homozygosity (rEHH) including an empirical
methodologytoobtainsignificantsignalsTherEHHofaputativecorehaplotype
compares its original EHH valuewith that of other haplotypes at that specific
locus using all other haplotypes as control for local variation in the
recombination rate Therefore it identifies genomic regions carrying variants
14
under selection that are still segregating in the population analysed Although
thesemethodswere conceived for human populations theywere successfully
appliedto livestockspeciessuchaspig(Maetal2012)andcattle(Qanbariet
al2010)
After domestication about 10000 years ago in the fertile crescent taurine
cattle have colonized Europe and Africa and have been selected for different
human needs (Taberlet et al 2011) Particularly in the last centuries the
anthropic pressure led to the formation of hundreds of specialized breeds
adapted to different environmental conditions and linked to local tradition
constitutingagenepooldeservingattentionforconservation(AjmoneMarsanet
al2010)Someofthesebreedshavebeenselectedspecificallyfordairybeefor
bothproductiontypesfollowingstrongartificialselectionforthesetraits(Hayes
etal2009)Thepresentstudyaimsat identifyingsignalsof recentdirectional
selectionusingrEHHmethodindairyandbeefproductiontypesusingdatafrom
five Italian dairy beef and dual purpose cattle breeds We focused on the
significant core haplotypes scanned by rEHH that are shared among breeds of
thesameproductiontypeThenweusedabioinformaticsapproachto identify
positionalcandidategeneslyingwithinthegenomicregionsunderselectionand
investigatetheirbiologicalrole
Methods
SampleAnimalsandGenotyping
A total of 4311 bulls of five Italian dairy beef anddual purpose breedswere
genotyped with the Illumina BovineSNP50 BeadChip v1 (Illumina San Diego
CA)joiningthegenotypingeffortsoftwoItalianprojects(namelyldquoSelMolrdquoand
ldquoProzoordquo)Thedatasetincluded101replicatesand773siresonspairsusedfor
downstreamqualitycheckofdataproducedIndetailgenotypesof2954dairy
(2179 ItalianHolsteinand775 ItalianBrown)864beef (485Marchigianaand
379 Piedmontese) and 493 dual purpose (Italian Simmental) bulls were
availableDataqualitycontrol(QC)wasperformedintwostagesafirststageon
15
animals independently for each breed applying the same methods and
thresholdsandasecondstageappliedonmarkersacrossall theindividualsof
thedatasetThefirststageexcludedindividualswithunexpectedlyhigh(gt02)
Mendelian errors for fatherson couples andwith low call rates (lt95)The
secondstageexcludedi)SNPswithle25missingvaluesinthewholedataset
orcompletelymissing inonebreed ii)SNPswithminorallele frequencyle5
and iii) SNPs in sexual chromosomes and with unassigned chromosome or
physicalposition
EstimationofrEHH
HaplotypeswereobtainedusingfastPHASEsoftwareconsideringdefaultoptions
(ScheetandStephens2006)andwererunbreedwiseandchromosomewisein
eachbreedPedigreeinformationforallbullswereprovidedbytheirrespective
breederassociations andwereused to filteroutdirect relatives (in fatherson
pairsthesonwasmaintainedinthedatasetandthefatherdiscarded)andover
representedfamilies(amaximumof5randomlychosenindividualsperhalfsib
familywasallowed)Thefinaldatasetcontainingtheseldquolessrelatedrdquoanimalswill
behenceforthcalled ldquononredundantrdquodatasetThisnonredundantdatasetwas
used also to calculate within breed pairwise wholegenome Linkage
Disequilibrium(LD)Ther2statisticforallpairsofmarkerswasobtainedusing
PLINK software v107 (Purcell et al 2007)Thedecayof LDwas assessedby
averagingr2valuesupto1Mbdistance
To test if the population structure influenced the detection of rEHH we
replicated all analyses on the whole Italian Holstein dataset without any
population correction (ie ldquoredundantrdquo dataset) focusing on genes or gene
clustersthatarewellknowntobeunderrecentselectionincattle(ieldquocandidate
regionsrdquo) In particularwe focused on the casein polled gene cluster and coat
colourgenes(MC1RandKIT(Kemperetal2014Qanbarietal2010))
The EHH and rEHH calculations were performed using Sweep v11 software
(Sabetietal2002)Somedefaultprogramsettingshadtobemodifiedtoadapt
theseanalyses to thebovinegenomeas thesoftwarewasoriginallydeveloped
forhumangeneticanalysesSpecificallylocalrecombinationratesbetweenSNPs
16
wereapproximatedto1cMperMbEHHandrEHHcalculationswereperformed
breedwise and chromosomewise using an automatic core selection with
defaultoptions(ieconsideringlongestnonoverlappingcoresandlimitingcores
toatleast3andnomorethan20SNPs)asinQanbarietal(2010)AlthoughEHH
and rEHH values were obtained for all core haplotypes only those with
frequency ge 25 were retained for further analyses The (empirical) rEHH
significancethresholdwasobtainedbydividingtherEHHvaluesinto20binsof
5 frequency range each logtransforming withinbin values to achieve
normality Core haplotypes with pvalues lt 005 were considered significant
DetectionofcommonselectionsignatureswasobtainedcomparingrEHHvalues
acrossgenomicregionsacrossbreeds
Breedgroupingaccordingtoproductiontype
Toexamine thesignals common toeachof the twoproduction types (iedairy
andbeef) corehaplotypes thatsharedoneormoreSNP inat least twobreeds
from the same production type were selected The dual purpose Italian
Simmentalwas included inbothdairy (ItalianHolsteinand ItalianBrown)and
beef groups (Piedmontese and Marchigiana) since potentially it possesses
haplotypes selected for both production types All downstream analyses were
performedseparatelyforthedairyandthebeefproductiontypes
Detectionandannotationofcandidategenes
Genome annotation was performed using Biomart
(httpwwwbiomartorgindexhtml) a comprehensive source of gene
annotationformanyspeciesprovidedbyEnsembl(wwwensemblorg)Thelist
ofsharedhaplotypesregions(inbp)fordairyandbeefwasusedasinputfilein
theBiomartwebinterfaceThesetofgenesretrievedbyBiomartwasthenused
as input for theCanonicalpathwayandregulatorynetworkanalyses Ingenuity
PathwayAnalysis toolversion80(IPA IngenuityregSystems IncRedwoodCity
CA httpwwwingenuitycom) and a substantial examination of published
literaturewasusedtoexaminethefunctionalrelationshipsamongtheresulting
genes The IPA operates with a proprietary knowledge database providing
complementarypathwayanalysisforseveralspeciesincludingcattleIntheIPA
17
analyses the significance of each biological function identified was estimated
using Fisherrsquos exact test A Benjamini and Hochberg correction for multiple
testingwas calculated for function and canonical pathways For eachNetwork
IPAcomputedascore(pscore=log10(pvalue))fittingthesetstudiedgenesand
a list of biological functionspresent in the IPAknowledgedatabase The score
considered thenumberofgenes in thenetworkand thesizeof thenetwork to
estimatehowrelevantthisnetworkwasaccordingtothelistofgenesprovided
Thenthescorewasusedtorankthenetworks
Results
DatasetQC
Therepeatabilityobtainedfromthe101replicatespresentinthewholedataset
washigherthan998AfterthetwoQCstages105individualsand9730SNPs
were discarded Moreover after phasing 1292 individuals were removed to
reducethe largenumberofsibfamiliespresent intheredundantdatasetAfter
quality control procedures the final dataset had 44271 SNPs and 1132 514
393 410 and 364 individuals from Italian Holstein Italian Brown Italian
SimmentalMarchigianaandPiedmontesebullsrespectively(Table1)
18
Table1JNumberofanimalsgenotypedbeforeandafterQCTotalnumberofanimalgenotypedandrelativenumberofanimaldiscardedafterQCanalysis
Detectionofselectionsignatures
Sweepv11 softwaredetectedrEHHsignalswitha frequencyhigher than25overlapping the test candidate regions in both datasets corrected or not forpopulation structure In the redundant dataset we found only one significantrEHHsignal(caseincluster)whereasinthenonredundantdatasetweidentified5significantrEHHsignalsoverlappingallthetestcandidateregionsconsidered(caseinclusterpolledclusterMC1R)(Table2)
BreedTotal
Genotyped
ED15
misAN
ED1
REPL
ED1
MENDCleaned
HOL 2179 40 31 5 2093
BRW 775 6 16 4 749
SIM 493 6 6 2 479
MAR 485 37 38 410
PIE 379 5 10 364
19
Table2JComparisonofcandidateregionrEHHinnonJredundantandredundantdatasetrEHH overlapping candidate regions in both corrected (nonredundant) and not corrected
(redundant)datasetsforpopulation
Redundant NonRedundant
Candidate
geneBTA1
Closest
SNP(bp)CH2range CH2Frequency
rEHHJ
log(pval)CH2Frequency
rEHHJ
log(pval)
POLLED
Cluster1 1981154 18974181981154 H1079 093037 H1078 167174
MC1R 18 13657912 1331772014007505 H1069 120096 H1069 077167
KIT_BOVIN 6 72821175 7250492172821175 H1054 030030 H1054 033048
Casein
Cluster6 88427760 8835009888452835 H1047 094139 H1046 148133
Casein
Cluster6 88427760 8835009888452835 H2032 002003 H2033 002003
In total5526567847724388and4049corehaplotypeswitha frequency
higher than 25 were detected on Italian Holstein Italian Brown Italian
Simmental Marchigiana and Piedmontese breeds respectively A total of 838
866 740 692 and 613 core haplotypes were found significant outliers (see
Materials and Methods) in the aforementioned breeds Table 3 shows the
distribution of the total and significant core haplotypes per chromosome and
breed
20
Tableamp3amp(ampCoreamphaplotypesampdistributionampDistributionoftotalandsignificantcorehaplotypesperchromosomeandbreed
ampamp ItalianampampHolsteinamp
ItalianampampBrownamp
ItalianampSimmentalamp Marchigianaamp Piedmonteseamp
BTA1amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
1amp 389 66 389 68 335 49 310 51 288 46
2amp 312 47 314 52 267 44 257 42 238 33
3amp 292 48 313 62 264 46 236 34 227 34
4amp 268 46 279 50 245 41 237 39 229 39
5amp 223 32 231 29 200 30 183 27 171 22
6amp 291 51 307 40 265 45 252 38 233 42
7amp 249 37 253 46 214 39 195 33 204 27
8amp 268 39 270 38 235 35 223 32 209 31
9amp 227 37 228 41 210 37 165 22 187 30
10amp 219 36 234 34 192 25 196 26 168 25
11amp 248 39 246 34 209 30 205 33 187 25
12amp 162 24 176 18 148 21 148 25 135 22
13amp 205 32 200 19 155 18 178 35 135 19
14amp 175 17 189 25 168 29 143 22 129 19
15amp 180 31 185 29 148 23 139 13 130 19
16amp 176 28 192 29 160 25 149 19 127 15
17amp 170 27 183 29 168 31 135 22 123 16
18amp 133 18 153 19 119 14 108 10 82 12
19amp 137 22 130 23 118 16 99 18 92 15
20amp 184 35 172 33 145 23 124 23 124 17
21amp 145 23 147 30 113 18 110 21 94 13
22amp 140 24 145 19 115 21 112 18 104 13
23amp 106 13 100 16 67 10 64 9 57 8
24amp 129 11 140 20 124 17 97 23 101 16
25amp 91 12 98 11 68 10 77 12 70 6
26amp 125 7 114 14 93 12 96 15 90 16
27amp 91 12 91 15 75 11 84 12 60 10
28amp 88 9 96 10 70 10 66 9 55 11
29amp 103 15 103 13 82 10 72 9 67 12
TOTamp 5526 838 5678 866 4772 740 4388 692 4049 613
1BostaurusAutosomes2Detectedcorehaplotypeswithafrequencyinthebreedhigherthan253Significantcorehaplotypesatplt005
Comparisonamptoamppreviouslyampreportedampdataamp
AnumberofstudieshavesearchedselectionsweepsinHolstein(Qanbarietal
2010) Brown (Qanbari et al 2011) and Simmental (Fan et al 2014) breeds
Since different methods are expected to identify signatures having different
characteristicsthecomparisonwithpreviousstudiesislimitedtothosestudies
using thesamemethodand thesamebreed(s)analysed inourstudyOnlyone
study analysed (German) HolsteinRFriesian cattle with rEHH (Qanbari et al
2010)Qanbarietal(2010)reportedcandidategenesandgeneclustersunder
recent positive selection that we compared with our rEHH in dairy breeds
dataset(Table4)
Table4JCorehaplotypesincandidategenesfollowingQanbarietal2011
1BosTaurusChromosome2Corehaplotype
Thenumberofcorehaplotypesfoundinourstudyinthecandidateregionswas
lowerpreviouslyreportedWhenconsideringHolsteinsthetwomostsignificant
candidateregionsinbothstudiesmatch(caseinclusterandthesomatostatinSST
gene)althoughitisimpossibletodetermineifthehaplotypeunderselectionis
the same as this information was not provided in Qanbari et al (2010)
However other genes considered significant in Qanbari et al (with pvalues
ranging from 004 to 010) were not significant in our studyWhen using the
same loose significant threshold (pvalues le 010) ldquosignificantldquo signals were
foundalsoinItalianBrownfortheCaseinCluster(log10pvalue=109)andin
ItalianSimmentalfortheSSTgene(log10pvalue=126)
HOLSTEIN BROWN SIMMENTAL
Candidate
geneBTA1
Closest
SNP
(bp)
CH2rangeCH2
Frequency
rEHH
Hlog(p)CH2range
CH2
Frequency
rEHH
Hlog(p)CH2range
CH2
Frequency
rEHH
Hlog(p)
DGAT1 14 444963236653443936
H1057 011443936763332
H1042 0130007
Casein
Cluster6 88391612
8835009888452835
H1046 1481338832601288452835
H2068 0801098835009888452835
H1044 037022
8835009888452835
H2033 018030
8835009888452835
H3034 022045
GH 19 49652377 GHR 20 33908597
SST 1 813769568128358581376961
H1036 200189 8131845181376961
H1042 126053
8128358581376961
H2029 00630084 8131845181376961
H2042 006027
IGFH1 5 71169823
ABCG2 6 37374911 3731702038256889
H1044 031027
3731702038256889
H1040 029031
Leptin 4 95715500
LPR 3 855692038549710885594551
H1047 0910728549710885794693
H1068 0800638549710885794693
H1063 002005
8549710885594551
H2041 027025
PITH1 1 35756434
22
Signaturessharedbydairyandbybeefbreeds
Significantcorehaplotypeswerealignedacrossbreedsto identifythoseshared
among dairy or beef production types Since breeds can be considered
independentsetsofobservationssharedsignaturesaremorelikelytorepresent
real effects rather than false positives A total of 123 and 142 significant core
haplotypesweresharedbetweenatleasttwodairyorbeefbreedsrespectively
(File S1) Only 82 and 87 of those significant core haplotypes contained genes
considered positional candidates under positive selection In total 22 and
17 of the genomewas under selection for dairy and beef production types
respectivelyTherEHHaveragelengthswere216932and190994bpfordairy
andbeefrespectively
Genesetannotationandnetworkpathwayanalysis
Atotalof244and232annotatedgenesfellwithintheselectedregionsfordairy
andbeefproductiontypesrespectively(Table5)
23
Table5JStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreedsStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreeds
DairyCattle BeefCattle
BTA SelSig
n
Genes1 Sum
SelSignsize
(bp)
Mean
SelSign
size(bp)
SelSi
gn
Genes1 Sum
SelSignsize
(bp)
Mean
SelSign
size(bp)
1 7 11 1521608 138328 7 13 2263213 174093
2 5 9 2216685 246298 8 11 2658549 2416863 2 5 1619336 323867 7 21 3755847 1788504 7 21 5956755 283655 6 11 1761288 1601175 4 17 4506119 265066 5 11 1251127 1137396 2 4 1467738 366934 5 12 5149692 4291417 6 30 4842969 161432 3 28 11889191 4246148 0 0 0 0 4 12 3512215 2926859 4 7 1554863 222123 2 6 1106768 18446110 4 17 8839762 519986 2 3 573138 19104611 6 8 1841802 230225 1 1 100439 10043912 2 8 4244133 530517 1 1 536461 53646113 3 8 1933026 241628 2 8 985376 12317214 0 0 0 0 3 9 692397 7693315 5 9 1415405 157267 2 2 189094 9454716 3 6 1302506 217084 4 6 905719 15095317 3 4 471046 117762 2 9 1551010 17233418 0 0 0 0 3 23 2961718 12877019 3 23 5744143 249745 2 2 116938 5846920 1 2 148886 74443 2 4 627971 15699321 3 7 1930206 275744 1 5 1150760 23015222 3 5 620674 124135 3 10 2411751 24117523 0 0 0 0 1 1 68421 6842124 2 18 9166004 509222 2 4 803770 20094225 2 11 2016602 183327 1 3 329199 10973326 2 4 466134 116534 4 7 776080 11086927 1 1 631792 631792 2 3 1004717 33490628 0 0 0 0 1 1 93464 9346429 2 9 935209 103912 1 5 798300 159660TOT 82 244 65393403 216932 87 232 50024613 190994
1GenesidentifiedGenesintheregionssharedbyallthreedairybreeds(8genes)orallthreebeefbreeds(11genes)foramoredetailedanalysiswereselected(Fig1)
24
Figure1JGenomiclocationoftheselectionsignaturessharedamongstudiedbreedsa)GenesinensembletracksaredisplayedasaredboxcorehaplotypesandSNPsarecolouredinorange(MarchigianaMAR)inviolet(PiedmontesePIE)andpink(SimmentalSIM)b)Genesinensemble tracks are displayed as a red box core haplotypes and SNPs are coloured in blue(HolsteinHOL)ingreen(ItalianBrownBRW)andpink(SimmentalSIM)
All the genes identified were submitted to downstream pathwaynetwork
analyses(Table5Figure1FileS1)Interestinglygenesintheregionssharedby
allthreedairyorbeefbreedsweremanifestedinthemostlikelynetworksforthe
two production aptitudes The IPA analysis detected a total of 12 and 15
networksindairyandbeefbreedsrespectively
Indairybreeds network scores ranged from46 to2 (File S2) Eightnetworks
obtainedscoreshigherthan20andincludedmorethan14moleculeseachThey
wereassociatedtoi)geneexpressionRNAdamageandrepairproteinsynthesis
andposttranslationalmodificationii)cellassemblyorganizationfunctionand
maintenance iii) cell cycle proliferation and cancer iv) carbohydrate
metabolism v) gastrointestinal and neurological disease developmental and
endocrine system disorders A total of 23moleculeswere associatedwith the
highest ranked network (score = 46) In this the immunoglobulins mitogen
25
activatedproteinkinasesandNFkBcomplexesformacentralhublinkinggenes
involved in celltocell signaling and interaction hematological system
developmentandfunctionandimmunecelltraffickingfunctions(Figure2)
Figure2JThemostlikelyIPAnetworkindairycattlebreedsThefunctionsassociatedtothenetworkarecelltocellsignallingandinteractionhematologicalsystem development and function immune cell trafficking Nodes in red correspond to genesidentifiedincorehaplotypesoverlappinginallthethreebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Breast cancer antiNestrogen resistance gene 3 (BARC3) andpituitary glutaminyl
cyclase gene (QPCT) genes were common to all three dairy breeds and are
directlyconnectedwithmammaryglandmetabolisms(Ezuraetal2004GSun
et al 2012) The solute carrier family 2 member 5 (SLC2A5) gene facilitates
glucosefructose transport (Zhao et al 1998) and the zetaNchain (TCR)
26
associated protein kinase 70kDa (ZAP70) gene plays a critical role in Tcell
signaling (Bonnefont et al 2011) Calpain is another important complex that
togetherwithcalpainN3(CAPN3)mediatesepithelialcelldeathduringmammary
gland involution (Wilde et al 1997) Furthermore RAS guanyl nucleotide
releasingprotein(RASGRP1)activatestheErkMAPkinasecascaderegulatesT
cellsandBcellsdevelopmenthomeostasisanddifferentiationandisinvolvedin
the regulation of breast cancer cell (Madani et al 2004 Yasuda et al 2007
Wickramasingheetal2009)Genesinvolvedintheothernetworksarelistedin
FileS2
The scoreof thebeef cattlebreedsnetwork rangedbetween46and2 and the
numberofmoleculespresentineachnetworkrangedfrom23to1(FileS3)Asin
dairy breeds eight networks obtained a score higher than 20 and 13 ormore
molecules each These networkswere associated to i) carbohydrate lipid and
nucleic acidsmetabolism energyproduction and smallmoleculebiochemistry
ii) cardiovascular system development and function iii) cellular and tissue
developmentcellassemblyandorganizationfunctionandmaintenanceandcell
and organmorphology iv) cell cycle celltocell signaling and interaction v)
DNA replication recombination and repair and RNA posttranscriptional
modification vi) dermatological diseases and conditions infectious disease
organismal injury and abnormalities In the network with the highest score
(score=46)chondroitinsulfateproteoglycan4(CSPG4)andsnurportinN1(SNUPN)
genes were shared among all beef cattle breeds investigated CSPG4 gene is
relatedtomeattendernesswhileSNUPN isanimprintedgeneimportantinthe
embryodevelopmentandinvolvedinhumanmuscleatrophy(Narayananetal
2002) (Figure 3) All the genes included in this network were involved in
carbohydratemetabolismsmallmoleculesbiochemistryandcellcycleTheother
networkswere related to important signaling andmetabolic process essential
foranimalgrowth(FileS3)
27
Figure3JThemostlikelyIPAnetworkinbeefcattlebreedsThe functions associated to the network are carbohydrate metabolism small molecule
biochemistry and cell cycle Nodes in red correspond to genes identified in core haplotypes
overlapping in all the three breeds of each production type whereas those in green depict
overlappingcorehaplotypesinatleasttwoofthosebreeds
A total of 6 and 9 statistically significant Canonical Pathways (FDR le 005
log10(FDR) ge 13) were identified using IPA for dairy and beef breeds
respectively(Figure4)
28
Figure4JBarplotofstatisticallysignificantCanonicalPathwaysPvalues were corrected for multiple testing using BenjaminiHochberg method and arepresented in thegraphas log(pvalue)Thebar represents thepercentageof genes in a givenpathway that meet cutoff criteria within the total number of molecules that belong to thefunctiona)BarplotofstatisticallysignificantCanonicalPathways indairycattlebreedsb)BarplotofstatisticallysignificantCanonicalPathwaysinbeefcattlebreeds
Themost significant Canonical Pathway in dairywas for purinemetabolism (
log10(FDR) = 26) supporting the highly synthetic processes in the mammary
epithelium(Figure5)
29
Figure5JGenesdetectedunderrecentpositiveselectionindairycattleinvolvedinpurinemetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Inbeefproductiontheephrinreceptorsignal(log10(FDR)=27)wasthemost
significant Canonical Pathway (Figure 6) Among other functions ephrin is
knowntopromotemuscleprogenitorcellmigrationbeforemitoticactivation(Li
andJohnson2013)AlltheotherCanonicalPathwaysarereportedinFileS4
30
Figure6JGenesdetectedunderrecentpositiveselectioninbeefcattleinvolvedinEphrinmetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Discussion
In this work more than 4000 genotyped bulls of five dairy beef and dualpurposeItalianbreedswerescreenedsearchingforselectionsignaturesindairyand beef production types A strict data quality controlwas applied to reducepossiblesourcesofbiassuchastechnicalgenotypingproblemsandpopulationstructurethatcouldhavelargeimpactonrEHHresultsInfactsincetheimpactofpopulationstructureonrEHHvaluesisstillunexploredwereplicatedpartofouranalyseswithandwithoutcloserelativesinanattempttostudytheeffectofpopulationstructureonthisselectionsignaturemethodInprinciplepopulationstratificationshouldleadtoanoverrepresentationofcertainhaplotypessimplyduetothepresenceoftightpedigreelinks(egsirespassinghalfoftheirgeneticmaterialtotheirsons)ratherthanactualselectionprocessesForthisreasonfor
31
the full analyses all siresons couples were removed after haplotype phasing(retaining only the younger animal) and halfsib familieswere restricted to amaximum of five randomly chosen individuals in an attempt to reduce overrepresentation This assessment was restricted to four regions known to beunderselectioninItalianHolsteinasthecaseinclusterthepolledgeneclusterMC1R and KIT Italian Holstein was selected for two reasons i) it is a highlystructuredbreedaccordingtoourdata(abouthalfofindividualsremovedwereclose relatives) ii) allowed us to compare our results with a previous studyAlthoughtheanalysesconductedonbothredundantandnonredundantdatasetswereable to identify rEHHsignals in these regions thenonredundantdatasetproduced five significant rEHH signals rather than just one in the redundantdataset(Table2)Thisresulthighlightstheconfoundingeffectofthepresenceofcloserelativesinthedatasetandconsequentlytheimprovedabilitytodetectedsignificant selection signature while correcting for population structureInterestingly our results only partially overlap those found by Quanbari et al(2010) These authors detected significant signals at the casein cluster andsomatostatinSSTaswedid but also at ahighernumberof candidate regionsThese inconsistencies may be the effect of different sires included in theanalyses of the different dataset size and to the close relative trimmingprocedure we adopted to decrease the effect of population structure andconsequent bias (Table 4) However poor correlation across investigations isoftenobservedalsointhedeeplyinvestigatedhumanspeciesThisisduetotheuse of different within and between breed statistics that identify selectionsignatureshavingdifferent characteristics (ancientrecent segregatingfixatedunder directionalbalancing selection) to the likely high rate of falsepositivesnegativesresultsandalsotothedifferentconsiderationofpopulationstructureandbackgroundselection(Enardetal2014)The number of total and significant core haplotypes identified by the SweepsoftwarewasthehighestinItalianBrownandthelowestinPiedmontesebreedSinceEHHisheavilybasedonpopulationLDtheaverageLDleveloverdistancewascalculatedineachbreed(Figure7)
32
Figure7JMultiJbreedaveragelinkagedisequilibriumagainstphysicaldistance(inkb)Marchigiana Piedmontese and Italian Simmental breeds present a lower persistence of LD allargerdistancethanItalianHolsteinandItalianBrownbreedsThisanalysishighlightedageneralpositivecorrelationbetweenthe levelofLDover distance and the number of total and significant core haplotypes foundHoweverconsideringthatrEHHisarelativemeasure it ismorelikelythatthehighernumberofsignificantcorehaplotypesidentifiedindairybreedswasduemostlytoahigherselectivepressureindairycomparedtobeefbreedsSignificance tests used in detecting selection signatures should measure theprobability of a statistic to be anoutlier compared to its expecteddistributionunder a neutral model However no reliable neutral model has so far beendevelopedincattlebecauseofthecomplexityofthedemographichistoryofthisspecies (Kemper et al 2014) As a consequence empiric rather than modelbasedsignificancetestaregenerallyusedinthedetectionofselectionsignaturesAccordinglyweconsideredasoutliersvaluesas thosesimply falling in the5plusvarianttailoftherEHHdistributionWekeptthewithinbreedsignificancethresholdloosenoncorrectingitformultipletestingbutconsideredonlysignals
33
eitherexceedingthisthresholdbyatleasttwoordersofmagnitudeorsharedby
twoormoreindependentbreeds
Parallel comparison of results from independent analyses of different breeds
allowed the achievement of a double objective i) to identify putative regions
under(recent)selectioninbreedsbelongingtotwoproductiontypeswhichwas
the main objective of this study and ii) reducing false positives since multi
breed analyses served as internal control Since the rEHH method does not
considerphenotypicinformationasignificantsignalmightarisebecausei)the
core haplotype is actually under a selective pressure ii) the result is a false
positive ie causedby chance population structureor anyotherdriving force
However even considering an unrealistic scenario with no false positives a
proportionofthetotalamountofsignalswouldactuallybeselectionsignatures
due to a selectionpressurenot ondairy or beef traits This becausedairy and
beefbreedsinvestigatedareonlyafewandshareanumberoftraitsnotdirectly
related to dairybeef production Even considering this limitation to our
knowledge this is the first dairy and beef multibreed study applying this
strategytoreducefalsepositivesatthecostofpossiblelossofinformationdue
tohighfalsenegativeratesSignificantsignalsharedbydairyandbybeefbreeds
wereused fordownstreamnetworkanalysesonpositional candidategenes to
search forabiological justification towhatwasobtainedstudying thegenomic
information Only the top network for dairy and the top network for beef
productionarediscussedindetail
Dairybreeds
ThemostsignificantnetworkdetectedindairybreedsisassociatedwithCellTo
CellSignalingandInteractionHematologicalSystemDevelopmentandFunction
andImmuneCellTraffickingfunctions(Figure2)ImmunoglobulinandNFkBact
aspivotalgeneswithinthisnetworkInterestinglyBARC3andQPCTgeneswere
sharedamongallthethreedairybreedsBothgeneshavenotyetbeenstudiedin
cattlealthoughinhumanstudiesthesegenesarewellknowntobelinkedtothe
mammaryglandmetobolismandthecalciumregulationTheformerisinvolved
in integrinmediated cell adhesions and signaling which is required for
mammary gland development and functions (G Sun et al 2012) The latter is
34
associatedwithlowradialbodymineraldensity(BMD)inadultwomen(Ezuraet
al2004)AnotherimportantplayeronthisnetworkisSLC2A5genealsoknown
asGLUT5 SLC2A5 gene acts as fructose transporter in the intestine and has a
significantroleinenergybalanceofdairycows(Burantetal1992)Findingthis
gene in adairy cattle specificnetwork seems surprising since there shouldbe
little need for transporting glucose andor fructose in the ruminant intestinal
tractassimplecarbohydratesaredegradedintovolatilefattyacids(VFA)inthe
rumen (Iqbal et al 2009) However it is often the case that large amount of
starchbypasstherumenincowsfedwithdietsrichincerealgrains(Zhaoetal
1998)Thisbypassedstarchneedstobedigestedinthesmallintestinetoavoid
highlevelsofglucoseconcentrationinthelargeintestine
WedetectedcalpaincomplexandcalpainN3(CAPN3)genesunderpositiverecent
selection in the dairy group as reported also by Utsunomiya et al (2013)
AltoughCalpainisknowntobeinvolvedinpostmortemmeattendernizationitis
also related to dairy metabolism as the muscle breakdown promoted by the
calpain gene provides an energy source for milk production especially at the
beginningoflactation(Kuhlaetal2011)InadditionasreportedbyWildeetal
(1997) calpainNcalpastatin system is related to programmed cell death of
alveolar secretory epithelial cells during lactation Zap70 gene is a cytoplasmic
proteintyrosinekinaseItisrelatedtotheimmunesystemasacomponentofthe
Tcell receptor playing a central role in Tcell responses (Wang et al 2010)
Bonnefontet al (2011) found thatZap70 genewasup regulated (FoldChange
(FC)=82FDR=277E12)onmilksomaticcellsofsheepinfectedbySaureus
andSepidermidisshowinganassociationwithmastitisresistance
Purinemetabolismwasthemostsignificantcanonicalpathwayindairybreeds
In a study conducted in human mammary epthelium Maningat et al (2008)
conductedageneexpressionanalysisonbreastmilkfatglobulesidentifyingthe
PurinemetabolismasthemostsignificantpathwaySynthesisandbreakdownof
purine is essential in the metabolism of the tissues of many organisms
particularlyofthemammaryglandduringlactation
Another interesting canonical pathway was endothelin signaling (Figure 5)
endothelin functions as vasocontrictor and is secreted by endothelial cells
35
(Nussdorferetal1999)Acostaetal(1999)reportedthatincattleendothelinsareinvolvedinthefollicularproductionofprostaglandinsandtheregulationofsteroidogenesis in the mature follicle In a recent study Puglisi et al (2013)confirmed the implication of endothelin in particular EDNRA and endothelinconvertin enzyme 1 in reproductive disorder in bovine They classified theEDNRAaspotentialbiomarkerforfertilityincow
Beefbreeds
Inbeefcattlebreedsthefocaladhesionkinase(FAK)togetherwithintegrinandERK12genesplayacentralroleashubsinthemostsignificantnetwork(Figure3)Theyareinvolvedincellularadhesionandcellmotilityprocessesparticularlyduringskeletalmyogenesis(Nguyenetal2014)Moreovertheoverexpressionof FAK can determine a reduced apoptosis and an increase risk of metastaticcancers(MehlenandPuisieux2006)Thesehubsarenotdirectlyinvolvedinourstudybut sharemany interactors thatare located in region found tobeunderselction such as CSPG4 RB1CC1 MGAT3 CIRPB CSPG4 gene belongs tochondroitin sulfate proteoglycans (CSPGs) family CSPGs are proteoglycansconsistingofaproteincoreandachondroitinsulfatesidechainTheyareknownto be structural components of a variety of tissues includingmuscle andplaykey roles inneuraldevelopmentandglial scar formationTheyare involved incellular processes such as cell adhesion cell growth receptor binding cellmigrationandinteractionswithotherextracellularmatrixconstituents
Manystudiesreporttheroleofproteoglycansinmeattextureofseveralmusclesfromthebovinespecies(Nishimuraetal1996)Dubostetal(2013)highlightadirect involvement of proteoglycans with respect to cooked meat juicinessRB1CC1geneplaysacrucialroleinmusculardifferentiationanditsactivationisa essential for myogenic differentiation (Watanabe et al 2005 p 1)Monoacylglycerol acyltransferase (MGAT3) gene cause the synthesis ofdiacylglycerol (DAG)using2monoacylglyceroland fattyacyl coenzymeAThisenzymaticreactionis fundamental fortheabsorptionofdietaryfat inthesmallintestineSunetal(2012p3)reportedinastudyonfiveChinesecattlebreedsthatMGAT3gene isassociatedwithgrowthtraitsCIRPBgenemaybepartofacompensatorymechanism inmuscles undergoing atrophy It preservesmuscle
36
tissuemassduringcoldshockresponsesaginganddisuse(DupontVersteegden
et al 2008) SNUPN is an imprinted gene and is expressed monoallelically
dependingonitsparentaloriginSNUPNplayimportantrolesinembryosurvival
andpostnatal growth regulation (Smithet al 2012Wanget al 2012)Ephrin
receptorsignalingwasthetopcanonicalpathwayproducebyIPAandhasalsoan
interestingbiologicalinterpretationformeatproduction(Figure6)Indeedthis
pathaway is important for the muscle tissue growth and regeneration
participating in correct positioning and formation of the neural muscular
junction(LiandJohnson2013)
Conclusions
In this study we analysed the selection signatures at genomewide level in 5
Italian cattle breeds Then we used a multibreed approach to identify the
genomicregionssharedamongcattlebreedsselectedfordairyorbeefpurpose
Thisapproachincreasedthepowertopinpointregionsofthegenomethatplay
important roles in economically relevant traits in both dairy and beef cattle
Moreover network and pathways analyses were used to display a detailed
functional characterization of genes mapping in the regions detected under
recentpositiveselection
Particularly in dairy cattle genes under directional selection are related to
feeding adaptation (increasing level of starch in the diet) mammary gland
metabolismandresistencetomastitisThebiologicalfeaturesunderselectionin
beef cattle breeds are involved in animal growth meat texture and juiceness
These results have a logical biological explanation However the frequent
approach of interpreting selection signatures on the basis of the function of
geneslocatednearbythepeaksignalofthestatisticusedispresentlychallenged
Novelinformationinhumanssuggeststhatmostselectedvariationisnotlocated
within genes and coding regions but in regulatory sites identified by the
ENCODEproject(Enardetal2014)Thesemaycontroltheexpressionofentire
genomicregionsorofgeneslocatedatarelavantdistancefromtheselectedsite
makingbiologicalinterpretationmorecomplex
37
InanycasefurtherstudiesusingdenserSNPchipsorwholegenomesequencing
producinginformationnotsubjectedtoascertainmentbias(Qanbarietal2014)
may increase the resolution of our analysis and togetherwith new knowledge
arisingonthecontrolofgeneexpressionvalidateorcorrectourresults
Acknowledgements
ANAFI (Italian Association of Holstein Friesian Breeders) ANAPRI (Italian
Association Simmental Breeders) ANABORAPI (National Association of
PiedmonteseBreeders)ANABIC(AssociationofItalianBeefCattleBreeders)and
ANARB (Italian Brown Cattle Breeders Association) are acknowledged for
providing the biological samples and the necessary information for this work
The Authors are also grateful to SelMol project (MiPAAF Ministero delle
PoliticheAgricoleAlimentarieFortestali)andbytheProZooproject(Regione
LombardiandashDGAgricoltura Fondazione Cariplo Fondazione Banca Popolare di
Lodi)forfunding
Thefundershadnoroleinstudydesigndatacollectionandanalysisdecisionto
publishorpreparationofthemanuscript
References
Acosta TJ Berisha B Ozawa T Sato K Schams D Miyamoto A 1999
Evidence for a Local EndothelinAngiotensinAtrial Natriuretic Peptide
SysteminBovineMature Follicles In Vitro Effects on SteroidHormones
and Prostaglandin Secretion Biol Reprod 61 1419ndash1425
doi101095biolreprod6161419
AjmoneMarsan P Garcia JF Lenstra JA 2010 On the origin of cattle how
aurochs became cattle and colonized theworld Evol Anthropol Issues
NewsRev19148ndash157
38
BonnefontCToufeerMCaubetCFoulonETascaCAurelMRBergonier
D Boullier S RobertGranieacute C Foucras G 2011 Transcriptomic
analysisofmilksomaticcells inmastitisresistantandsusceptiblesheep
upon challenge with Staphylococcus epidermidis and Staphylococcus
aureusBMCGenomics12208
BurantCFTakedaJBrotLarocheEBellGIDavidsonNO1992Fructose
transporter inhumanspermatozoaandsmall intestine isGLUT5 JBiol
Chem26714523ndash14526
DubostAMicolDPicardBLethiasCAnduezaDBauchartDListratA
2013Structuralandbiochemicalcharacteristicsofbovineintramuscular
connective tissue and beef quality Meat Sci 95 555ndash561
doi101016jmeatsci201305040
DupontVersteegden EE Nagarajan R Beggs ML Bearden ED Simpson
PMPetersonCA2008IdentificationofcoldshockproteinRBM3asa
possible regulator of skeletal muscle size through expression profiling
AJP Regul Integr Comp Physiol 295 R1263ndashR1273
doi101152ajpregu904552008
Enard D Messer PW Petrov DA 2014 Genomewide signals of positive
selection in human evolution Genome Res gr164822113
doi101101gr164822113
EzuraYKajitaMIshidaRYoshidaSYoshidaHSuzukiTHosoiTInoue
SShirakiMOrimoHEmiM2004Associationofmultiplenucleotide
variationsinthepituitaryglutaminylcyclasegene(QPCT)withlowradial
BMDinadultwomenJBoneMinRes191296ndash301
Fan H Wu Y Qi X Zhang J Li J Gao X Zhang L Li J Gao H 2014
Genomewide detection of selective signatures in Simmental cattle J
ApplGenet1ndash9doi101007s1335301402006
HayesBJChamberlainAJMaceachernSSavinKMcPartlanHMacLeodI
SethuramanLGoddardME2009Agenomemapofdivergentartificial
selectionbetweenBostaurusdairycattleandBostaurusbeefcattleAnim
Genet40176ndash184doi101111j13652052200801815x
Hayes BJ Lien S Nilsen H Olsen HG Berg P Maceachern S Potter S
Meuwissen THE 2008 The origin of selection signatures on bovine
39
chromosome 6 Anim Genet 39 105ndash111 doi101111j1365
2052200701683x
IqbalSZebeliQMazzolariABertoniGDunnSMYangWZAmetajBN
2009 Feeding barley grain steeped in lactic acid modulates rumen
fermentation patterns and increases milk fat content in dairy cows J
DairySci926023ndash6032doi103168jds20092380
KemperKESaxtonSJBolormaaSHayesBJGoddardME2014Selection
for complex traits leaves littleornoclassic signaturesof selectionBMC
Genomics15246doi1011861471216415246
KimuraM1984Theneutraltheoryofmolecularevolution
KuhlaBNuumlrnbergGAlbrechtDGoumlrsSHammonHMMetgesCC2011InvolvementofSkeletalMuscleProteinGlycogenandFatMetabolismin
the Adaptation on Early Lactation of Dairy Cows J Proteome Res 10
4252ndash4262doi101021pr200425h
Lenstra JAGroeneveld LF EdingHKantanen JWilliams JL Taberlet P
NicolazziELSolknerJSimianerHCianiEGarciaJFBrufordMW
AjmoneMarsan P Weigend S 2012 Molecular tools and analytical
approaches for the characterization of farm animal genetic diversity
AnimGenet43483ndash502
Li J Johnson SE 2013 EphrinA5 promotes bovine muscle progenitor cell
migration before mitotic activation J Anim Sci 91 1086ndash1093
doi102527jas20125728
Ma YL Zhang Q Ding XD 2012 [Detecting selection signatures on X
chromosome in pig through high density SNPs] Yi Chuan Hered
ZhongguoYiChuanXueHuiBianJi341251ndash1260
Madani S Hichami A CharkaouiMalki M Khan NA 2004 Diacylglycerols
Containing Omega 3 and Omega 6 Fatty Acids Bind to RasGRP and
Modulate MAP Kinase Activation J Biol Chem 279 1176ndash1183
doi101074jbcM306252200
Maningat PD Sen P Rijnkels M Sunehag AL Hadsell DL Bray M
Haymond MW 2008 Gene expression in the human mammary
epitheliumduring lactation themilk fat globule transcriptome Physiol
Genomics3712ndash22doi101152physiolgenomics903412008
40
Mehlen P Puisieux A 2006Metastasis a question of life or deathNat Rev
Cancer6449ndash458doi101038nrc1886
NarayananUOspinaJKFreyMRHebertMDMateraAG2002SMNthe
Spinal Muscular Atrophy Protein Forms a PreImport Snrnp Complex
withSnurportin1andImportin HumMolGenet111785ndash1795
Nguyen N Yi JS Park H Lee JS Ko YG 2014 Mitsugumin 53 (MG53)
LigaseUbiquitinatesFocalAdhesionKinaseduringSkeletalMyogenesisJ
BiolChem2893209ndash3216doi101074jbcM113525154
NielsenRHellmannIHubiszMBustamanteCClarkAG2007Recentand
ongoing selection in the human genome Nat Rev Genet 8 857ndash868
doi101038nrg2187
NishimuraTHattoriATakahashiK1996Relationshipbetweendegradation
of proteoglycans andweakening of the intramuscular connective tissue
duringpostmortemageingofbeefMeatSci42251ndash260
Nussdorfer GG Rossi GPMalendowicz LKMazzocchi G 1999Autocrine
ParacrineEndothelinSysteminthePhysiologyandPathologyofSteroid
SecretingTissuesPharmacolRev51403ndash438
OrsquoBrienAMPUtsunomiyaYTMeacuteszaacuterosGBickhartDM LiuGE Tassell
CPV Sonstegard TS Silva MVD Garcia JF Soumllkner J 2014
Assessing signatures of selection through variation in linkage
disequilibrium between taurine and indicine cattle Genet Sel Evol 46
19doi101186129796864619
Pan D Zhang S Jiang J Jiang L Zhang Q Liu J 2013 GenomeWide
DetectionofSelectiveSignatureinChineseHolsteinPLoSONE8e60440
doi101371journalpone0060440
PuglisiRCambuliCCapoferriRGianninoLLukajADuchiRLazzariG
Galli C Feligini M Galli A Bongioni G 2013 Differential gene
expressionincumulusoocytecomplexescollectedbyovumpickupfrom
repeat breeder and normally fertile Holstein Friesian heifers Anim
ReprodSci14126ndash33doi101016janireprosci201307003
Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender D
MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKa
41
tool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795
Qanbari S Gianola D Hayes B Schenkel F Miller S Moore S Thaller GSimianer H 2011 Application of site and haplotypefrequency basedapproachesfordetectingselectionsignaturesincattleBMCGenomics12318doi1011861471216412318
Qanbari S Pausch H Jansen S Somel M Strom TM Fries R Nielsen RSimianer H 2014 Classic Selective Sweeps Revealed by MassiveSequencinginCattlePLoSGenet10doi101371journalpgen1004148
Qanbari S Pimentel ECG Tetens J Thaller G Lichtner P Sharifi ARSimianerH2010AgenomewidescanforsignaturesofrecentselectioninHolsteincattleAnimGenetdoi101111j13652052200902016x
Sabeti PC Reich DE Higgins JM Levine HZ Richter DJ Schaffner SFGabriel SB Platko JV Patterson NJ McDonald GJ Ackerman HCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash7
ScheetPStephensM2006Afastandflexiblestatisticalmodelforlargescalepopulation genotype data applications to inferring missing genotypesandhaplotypicphaseAmJHumGenet78629ndash44
Shahzad K Loor JJ 2012 Application of topdown and bottomup systemsapproaches in ruminantphysiologyandmetabolismCurrGenomics13379
SmithLSuzuki JGoffAFilionFTherrien JMurphyBKohanGhadrHLefebvreRBrisvilleABuczinskiSFecteauGPerecinFMeirellesF2012DevelopmentalandEpigeneticAnomaliesinClonedCattleReprodDomestAnim47107ndash114doi101111j14390531201202063x
Sun G Cheng SYS Chen M Lim CJ Pallen CJ 2012 Protein TyrosinePhosphatase Phosphotyrosyl789 Binds BCAR3 To Position Cas forActivationatIntegrinMediatedFocalAdhesionsMolCellBiol323776ndash3789doi101128MCB0021412
Sun J Zhang C Lan X Lei C ChenH 2012 Exploring polymorphisms andassociationsofthebovineMOGAT3genewithgrowthtraitsGenomeNatl
42
Res Counc Can Geacutenome Cons Natl Rech Can 55 56ndash62doi101139g11077
TaberletPCoissacEPansu J PompanonF 2011Conservationgeneticsofcattle sheep and goats C R Biol On the trail of domesticationsmigrationsandinvasionsinagricultureDominiqueJobGeorgesPelletierJeanClaudePernollet334247ndash254doi101016jcrvi201012007
Utsunomiya YT Perez OrsquoBrien AM Sonstegard TS Van Tassell CP doCarmo AS Meszaros G Solkner J Garcia JF 2013 Detecting lociunder recent positive selection in dairy and beef cattle by combiningdifferentgenomewidescanmethodsPLoSOne8e64280
Voight BF Kudaravalli S Wen X Pritchard JK 2006 A Map of RecentPositive Selection in the Human Genome PLoS Biol 4 e72doi101371journalpbio0040072
WangHKadlecekTAAuYeungBBGoodfellowHESHsuLYFreedmanTSWeissA2010ZAP70AnEssentialKinaseinTcellSignalingColdSpringHarbPerspectBiol2doi101101cshperspecta002279
WangMZhangXKangLJiangCJiangY2012MolecularcharacterizationofporcineNECD SNRPNandUBE3Agenesand imprinting status in theskeletal muscle of neonate pigs Mol Biol Rep 39 9415ndash9422doi101007s1103301218066
WatanabeRChanoTInoueHIsonoTKoiwaiOOkabeH2005Rb1cc1iscritical for myoblast differentiation through Rb1 regulation VirchowsArchIntJPathol447643ndash648doi101007s0042800411831
WickramasingheNSManavalanTTDoughertySMRiggsKALiYKlingeCM 2009 Estradiol downregulates miR21 expression and increasesmiR21targetgeneexpressioninMCF7breastcancercellsNucleicAcidsRes372584ndash2595doi101093nargkp117
WildeCJAddeyCVLiPFernigDG1997Programmedcelldeathinbovinemammary tissue during lactation and involution Exp Physiol 82 943ndash953
YasudaSStevensRLTeradaTTakedaMHashimotoTFukaeJHoritaTKataoka H Atsumi T Koike T 2007 Defective Expression of Ras
43
Guanyl NucleotideReleasing Protein 1 in a Subset of Patients with
SystemicLupusErythematosusJImmunol1794890ndash4900
ZhangCBaileyDKAwadTLiuGXingGCaoMValmeekamVRetiefJ
Matsuzaki H Taub M Seielstad M Kennedy GC 2006 A whole
genome longrange haplotype (WGLRH) test for detecting imprints of
positiveselectioninhumanpopulationsBioinformatics222122ndash8
ZhaoFQOkineEKCheesemanCIShiraziBeecheySPKennellyJJ1998
Glucose transporter gene expression in lactating bovine gastrointestinal
tractJAnimSci762921ndash2929
44
Supplementaryfiles
YoucanfindChapterʹsupplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter02)orscantheQRcode
SupplementaryfileS1ndashSignificantcorehaplotypesandgenessharedin
dairyandbeef
Significantcorehaplotypes(pvaluele005haplotypefrequencyge025)shared
indairyandbeefcattlebreedsandrelativegenesintersectingwiththemThefile
isdividedinto4tablesThefirst2sheets(sheet1and2)shownthesignificant
corehaplotypesfordairyandbeefThelast2sheets(sheet3and4)showngenes
intersectingwiththesignificantcorehaplotypesfordairyandbeef
SupplementaryfileS2ndashNetworksrankingindairybreeds
Networksrankingindairybreedswithrelativemoleculessymbolslistscore
valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop
functionforeachnetwork
SupplementaryfileS3ndashNetworksrankinginbeefbreeds
Networksrankinginbeefbreedswithrelativemoleculessymbolslistscore
valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop
functionforeachnetwork
45
SupplementaryfileS4ndashCanonicalpathwaysrankingindairyorbeefcattle
breeds
Canonicalpathwaysrankingindairy(sheet1)orbeef(sheet2)cattlebreedswithrelativemoleculessymbolslistratioandndashlog10ofthepvaluesforeachcanonicalpathway
46
CHAPTER3Searchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisin
threeItaliancattlebreedsSCapomaccio1MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItalyTheseauthorscontributedequallytothiswork
____________________SubmittedtoAnimalGenetics
Abstract
Genome wide association studies (GWAS) have been widely applied to
disentanglethegeneticbasisofcomplextraits IncattlebreedsclassicalGWAS
approach with medium density markers panels are far to be conclusive
especiallyforcomplextraitsThisisduetotheintrinsiclimitationsofGWASand
to the choices that are made to step from the associationrsquos signals to the
functionalunits
Here we applied a geneHbased association strategy to prioritize genotypeH
phenotype associations found with classical approaches in three Italian dairy
cattlebreedswithdifferentsamplesizes(ItalianHolsteinn=2058ItalianBrown
n=745ItalianSimmentaln=477)formilkproductionandqualitytraits
WhileclassicalsinglemarkerregressionfailedtorevealgenomeHwidesignificant
genotypeHphenotype association except for Italian Holstein the gene based
approach found specific genes in each breed that are associated with milk
physiologyandmammaryglanddevelopment
Since no established method is yet implemented to step from variation to
functional units our strategy may contribute to reveal new genes that play
significant roles in complex traits like thosehere investigated condensing low
signalsofassociationinageneHcentricfashionapproach
Background
Modern cattle breeds have nomore than 200 years of history (Taberlet et al
2008) a time spanwhere selection reproductive isolationand the consequent
geneticdriftshapedallelefrequenciesandlinkagedisequilibriumthroughoutthe
genome This is particularly true for industrial dairy breeds where the joint
application of reproductive technologies and modern genetic evaluation
methodshave imposedahigh selectionpressureona limitednumberof traits
(Taberletetal2008)involvedinmilkquantityandquality
Indeed the milk production industry has been historically ready to acquire
technicalimprovementsfromdifferentfieldsstatisticsbiologyandengineering
50
ForexampleBLUPbasedstrategiesandhighHdensitygenotyping throughSNPHchipsarenowroutinelyincludedinbullevaluationshavingsignificantimpactonselectionefficiencyandpopulationstructureThese techniques have dramatically improved milk yield and compositionwithout knowing the biological basis of the traits under selection To datedespite many efforts and many QTL mapped only a few genes have beeneffectivelyassociatedwithmilkyieldandcomposition(Grisartetal2002Blottetal2003CohenHZinderetal2005)In thiscontextGenomeWideAssociationStudies(GWAS) is themostcommonmethodfordissectingthebiologythatunderliescomplextraits(McCarthyetal2008)TheunderlyingideaofGWASissimplefindmarkerHtraitassociationsexploitingthe linkage disequilibrium (LD) that exists between the causative mutation ndashwhichweignorendashandoneormoreanalyzedmarkerswhichwewellknowInthiswayonecanpinpointgenomicregionscarryingcausalvariantsforanytrait
Although a number of GWAS have been conducted in cattle productive andmorphological traits (some examples (Jiang et al 2010 Cole et al 2011Meredith et al 2012)) and found several genomic regions associated to thesetraits none focused their attention directly on genes involved inmilk biologyMoreoveroneof themajordrawbacksofGWASisrelatedtothescarceresultsportability across populations due to a variety of confounding factorspopulationstructuredifferentialLD levelsbreedspecific selection targetsandeven SNPs ascertainment bias (Clark et al 2005) For instance significantassociationsonfatpercentagewereidentifiedinmanyregionsoftheBostaurusgenome in different breeds Except for genes of major effect there is littleagreement on which regions and H more importantly H genes are actuallyinvolvedinthecontrolofthetraitinvestigatedThisissomehownotsurprisingone favorable allele may segregate in one breed and be fixated in a differentbreed the same allele segregates in both breeds but allelesmay differ or thesegregates in both breeds but the genetic background masks the effect ofsegregation
51
AnotherlimitationinGWASisthestringentsignificancethresholdoftenapplied
tocorrectformultipletestingAsaresulta largeproportionofgeneswithlow
effectaredisregardedwithconsequentoverestimationof thehigheffectgenes
variance(Xu2003)Thisbiasisparticularlyworryingwheninvestigatingtraits
controlledbya largenumberofgeneswithsmalleffectand inhighLDasmilk
yield(GIANOLAETAL2013)
To overcome this kind of limitation different strategies have been invented
usingprior knowledge Interesting ldquocandidatepathwayrdquo approacheshavebeen
conducted(Ravenetal2013)testingforassociationbetweenaphenotypeand
genes inpathwaysthatare likelytobe involvedinthecontrolofthetraitThis
approach reduces thenumberofmarkers tobe tested increases the statistical
poweroftheexperimentrestrictingtheanalysistoexistingknowledge
Other methods based on statistical manipulations or particular experimental
designs have been developed to prioritize or validate genotypeHphenotype
associations For example geneHbased association strategies while restricting
GWA study only to genes and neighboring genomic regionsmay identify new
candidategenesdeservingpostHGWASanalysis(Akulaetal2011Cantoretal
2010 Liu et al 2010) Moreover it is true that classical GWAS downstream
bioinformatics analyses concentrate their efforts in findingmeaningful results
usinggenesasfinaltarget
The aim of ourworkwas to find new candidate genes and novel information
associated to complex traits as milk production (yiled) and quality (fat and
protein percentage) using a geneHcentric genome wide association in three
differentdairybreedsItalianHolsteinItalianBrownandItalianSimmental
Methods
BiologicalSamples
Bulls from the three most represented dairy breeds in Italy (Italian Holstein
ItalianBrownandItalianSimmental)wereselectedfromArtificialInsemination
(AI) bulls progeny tested in Italy until 2010 and forwhich biological samples
52
wereavailableItalianHolsteinbullswerechosenaccordingtofollowingcriteria
i) having the national Selection Index namely production functionality type
(PFT)withreliabilitygt75andii)havingthelowestpossiblerelationshipwith
theothersiresinthesetThiswasachievedbymaximizingthevariabilitywithin
the group On the contrary considering the low number of available Italian
BrownandItalianSimmentalbullstheywereallincludedintheanalysesAfter
thisprocedureatotalof2139ItalianHolstein775ItalianBrownand486Italian
Simmental samples (including quality control replicates)were genotypedwith
theBovineSNP50BeadChipv1(IlluminaSanDiegoCA)
Traitsconsideredintheanalysis
Italian Holstein (ANAFI) Italian Brown (ANARB) and Italian Simmental
(ANAPRI)breedersassociationsprovidedderegressedproofs(DRP)ordaughter
yielddeviations (DYD)hereafter referredasphenotype forall traitsanalyzed
Traitsanalyzedweremilkyieldfatpercentageandproteinpercentage
Genotypingandqualitycontrol
Quality controls (QC) were run independently for each breed using the same
pipelineandthresholds IndetailQCexcluded i)thereplicatesamplewiththe
highernumberofmissinggenotypesii)sampleswithunexpectedlyhigh(ge100)
mendelianerrors(forfatherHsonpairs)iii)sampleswithmorethan5missing
genotypes iv) SNPs with ge 25 missing data v) SNPs with minor allele
frequency(MAF)le5vi)SNPswithmendelianerrorsge25andvii)SNPsout
ofHardyHWeinbergequilibriumtestedbyFisherexacttest(ple0001Bonferroni
corrected)
Analysisofpopulationstructure
MultiDimensionScaling(MDS)wasperformedusingPLINK107(Purcelletal
2007)andresultsplottedwithRscriptsThisanalysisshowninsupplementary
material (FigureS1) exploredpopulationsubstructureandverified thegenetic
homogeneityofthedatasetpriortofurtheranalyses
53
PairwiseLDwascalculatedusingPLINK107(Purcelletal2007)LDdecayand
LD structure on specific regions were calculated with in house R scripts and
snpplotterRpackage(LunaandNicodemus2007)
GWAS
GenomeHwide association analysis was made with the GenABEL package in R
(Team 2014) using GRAMMARHCG (Genome wide Association using Mixed
Model and Regression H Genomic Control) approach (Amin et al 2007
Aulchenkoetal2007)
Basingonpriorknowledge(Minozzietal2013) in ItalianHolsteintheDGAT1
regionwas also treated as fixed effect inGenABEL analysis of the three traits
ResultsusingthismodelarereferredtoasldquoHolsteinDGATrdquo
Manhattan andquantileHquantileplots for each traitwereproduced to allowa
thoroughevaluationoftheGWASresultsAllcalculatedpHvaluesfromtheGWAS
analysiswere retained for the downstream prioritizationwith the geneHbased
association
Workingongenomecoordinates
Coordinatesof themarkerswere liftedover fromBTA40(Elsiketal2009)to
UMD31assembly(Ziminetal2009)usingSnpChimpv1(Nicolazzietal2014)
andgenecoordinateswereretrievedusingBioMartEnsembl
QTLcoordinatesweredownloadedfromtheAnimalQTLdb(Huetal2013) in
bedformat
Operations on genomic intervalswereperformedusing thebedtools suite ver
2162(QuinlanandHall2010)
GeneBasedAssociationanalysis
Toreducetheantagonismbetweensignificancethresholdandfalsepositiverate
of a classical GWA study (Gianola et al 2013) and to facilitate selection
signature comparison across the three breeds investigated we used a geneH
54
centricstatisticthatcollapsesinasinglevalueallthepHvaluesfromSNPsrelated
toagenecorrectingforLDinformation
Our approachderivesdirectly from the softwareVEGAS (VersatileGeneHBased
TestforGenomeHwideAssociationStudies)(Liuetal2010)adaptedtopursue
meaningfulresultsinanyspeciesbeyondhumans
Briefly the geneHwise test statistic is the sum of the n upper tail chiHsquared
valuesofaSNPsubset withonedegreeof freedom(wheren is thenumberof
SNPmappinginthegivengene)WithperfectlinkageequilibriumthegeneHwise
pHvaluewouldbetheoneHsidepHvalueofachiHsquaredistributionwithndegrees
of freedom Otherwise more likely the distribution of the pHvalues has to be
evaluated(withsimulations)andweighted(withLDvalues)(Liuetal2010)
The VEGAS speciesHfree procedure was implemented in a UNIX environment
usingBashandRlanguagesusingEnsemblannotationasinputfilesPLINKdata
typeandpHvaluesforeachSNPfromthepreviouslycitedGRAMMARprocedure
WemappedSNPsonBostaurusgenes(UMD31)andtheirboundaries(100000
bpupHanddownHstream)toaccountforregulatoryregionsOnlygeneswithat
least 2 SNPs mapped were retained Significant entries from the geneHbased
associationweremappedinthecattleQTLdatabasewithbedtools
ResultsandDiscussion
In this study a gene based association strategy was applied to identify new
genomicregionsboostingGWASanalysisresultsformilkproductionandquality
fromthreeItaliandairybreeds
Complextraitslikethosehereevaluatedareaffectedbyafewmajorgeneswith
large effects andmany otherswithmoderate to low effects the latter are not
easily identifiedby genomewide scans inmodern cattle breedsduemainly to
samplesizelimitationwhilesignalfromtheformersarelostingenomescansH
withsomeexceptionsHduerapidfixations
InthisscenariowithmediumdensitySNPpanelsliketheoneusedinthisstudy
associationbasedonsingleSNPregressionorotherldquoclassicalrdquomethodscouldbe
inconclusiveintermsofresults
55
AftertheQCprocess7452058and477samplesand3575938535and38574SNPs were retained for Italian Brown Italian Holstein and Italian Simmentalbulls respectively (Table S1) HolsteinDGAT analysis retained 38534markersoneSNPlessthantheotherHolsteinpanelDescriptive statistics on genotypes for each breed (histogram and barplot forheterozygosis and call rate per individual Figure S7 and multi dimensionalscalingFigureS1)revealedabsenceofconfoundingeffectsWiththesedatasetsatthenominalgenomeHwidepHvaluesignificancethresholdof 005 (ple125 x 10H6 after Bonferroni correction (Burton et al 2007Dudbridge and Gusnanto 2008)) significant associations were found only inItalianHolsteinbetweenmilkyieldandfatpercentageand26SNPsmappingintheproximalregionofBTA14nearbyDGAT1(Table1)
56
Table1PHvaluesoftheSNPsovercomingthegenomewidesignificancethresholdintheHolsteinbreedformilkyieldandfatpercentage(notaccountingforDGAT1effect)SNPsaresortedbypositionsintheUMD31assembly
SNPID BTA Position(Bp) POvalue(Fat) POValue(Milk)
Hapmap30383OBTCO005848 14 1489496 112EH12 361EH12
ARSOBFGLONGSO57820 14 1651311 247EH46 161EH16
ARSOBFGLONGSO34135 14 1675278 857EH17
ARSOBFGLONGSO94706 14 1696470 665EH16
ARSOBFGLONGSO4939 14 1801116 534EH49 106EH20
Hapmap52798Oss46526455 14 1923292 915EH21
UAOIFASAO6878 14 2002873 382EH24
ARSOBFGLONGSO107379 14 2054457 118EH33 664EH15
ARSOBFGLONGSO18365 14 2117455 250EH16
Hapmap30922OBTCO002021 14 2138926 439EH13
Hapmap25384OBTCO001997 14 2217163 390EH10 697EH09
Hapmap24715OBTCO001973 14 2239085 108EH09 369EH09
BTAO35941OnoOrs 14 2276443 402EH13
Hapmap30374OBTCO002159 14 2468020 528EH12
Hapmap30086OBTCO002066 14 2524432 798EH11
ARSOBFGLONGSO74378 14 3640788 161EH08
ARSOBFGLOBACO1511 14 3765019 375EH09
UAOIFASAO9288 14 3956956 811EH09
ARSOBFGLONGSO100480 14 4364952 133EH09
UAOIFASAO5306 14 4468478 458EH08
ThisislikelyduetotheeffectoftheK232ADGAT1mutationthatissegregatingin
thisbreed(Fontanesietal2014)NootherSNPwassignificantlyassociatedto
anyothertraitinthe3breedsinvestigated
SomeSNPswereunderasuggestivesignificancethresholdofple5x10H5Details
on these markers are in supplementary material (Table S3) but they are not
discussedherePlotsdescribingatglancetheassociationresultsrevealingalow
signaltonoiseratioareavailableintheonlinesupplementarymaterial(Figure
S2S3S4andS5respectively for ItalianBrown ItalianHolsteinHolsteinDGAT
andItalianSimmental)
57
Asperthegenecentricanalysiswemappedatotalof192371989919894and
19844 genes on 24616 available on Ensembl (release version 73) in Italian
BrownItalianHolsteinHolsteinDGATandItalianSimmentalrespectivelyGenes
with a geneHwisepHvalue at least 10H5were considered significantly associated
withthestudiedtraitThesignificanceappearedtobeadequatetothehighrate
ofredundancyof theannotationfeaturesandto figureoutthetopassociations
with a reasonable level of confidence Results on these associations per breed
andpertraitaresummarizedinTable2Table3andinsupplementarymaterials
(TableS2andTableS4)ThemajorityofsignificantgenesfoundwiththegeneH
centricanalysismapped inpreviouslycharacterizedQTL thatareassociated to
milkproductionorqualityThesearesummarizedintheTableS5
Table2Numberofsignificantlyassociatedgenesforeachbreedineachtraitinthegenebasedassociationtest
Fat Milkyield Protein
Brown 3 1
Holstein 92 80 1
HolsteinDGAT 1 4 1
Simmental 1 13
58
Tableamp3FeaturessignificantlyassociatedperbreedtraitHolsteinanalysisnotaccountingforDGAT1isshow
ninTableS3
Breedamp
Traitamp
EnsemblampIDamp
GeneWiseampPValueamp
SNPsInGeneamp
BestSNPamp
BestSNP_PValueamp
BTAamp
Startamp(bp)amp
Endamp(bp)amp
GeneampNam
eamp
BROWNamp
FAT
Nosignificantassociationfound
MILK
ENSBTAG00000019237
500EG05
4Hapmap53770Gss46526325
204EG04
23
47333403
47348212
TXNDC5
ENSBTAG00000043918
500EG05
4Hapmap53770Gss46526325
204EG04
23
47337650
47337787
5S_rRNA
ENSBTAG00000046839
500EG05
4Hapmap53770Gss46526325
204EG04
23
47328131
47352621
UncharacterizedProtein
PROTEIN
ENSBTAG00000001260
700EG05
3ARSGBFGLGNGSG116222
441EG03
18
52120387
52124418
PINLYP
amp
HOLSTEINDGATamp
FAT
ENSBTAG00000000369
100EG05
3Hapmap53294Grs29016908
333EG05
594673755
94749093
EPS8
MILK
ENSBTAG00000026896
130EG05
5ARSGBFGLGNGSG104949
385EG04
23
51549721
51553563
FOXF2
ENSBTAG00000005260
400EG05
5Hapmap33288GBTCG033751
120EG03
638120578
38127577
SPP1
ENSBTAG00000006579
500EG05
4ARSGBFGLGBACG2207
632EG05
15
54418559
54459500
P4HA3
ENSBTAG00000007013
700EG05
6UAGIFASAG7665
396EG05
19
5221507
5283308
TOM1L1
PROTEIN
ENSBTAG00000006638
600EG05
3ARSGBFGLGNGSG104774
176EG04
18
56523439
56530499
BCL2L12
amp
SIMMENTHALamp
FAT
ENSBTAG00000037649
900EG05
6Hapmap30001GBTAG142716
529EG05
4120427915
120494453
VIPR2
MILK
ENSBTAG00000017538
100EG05
6ARSGBFGLGBACG13093
587EG06
12
85630428
85630912
Pseudogene
ENSBTAG00000000856
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1766767
1769754
FBXL6
ENSBTAG00000000857
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1763994
1766621
GPR172B
ENSBTAG00000026356
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1795351
1804562
DGAT1
ENSBTAG00000035158
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1770660
1772329
TMEM
249
ENSBTAG00000046208
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1782901
1788087
SCRT1
ENSBTAG00000009811
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1825982
1844587
UncharacterizedProtein
ENSBTAG00000009816
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1838135
1840081
UncharacterizedProtein
ENSBTAG00000014458
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1844664
1894424
MROH1
ENSBTAG00000020751
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1806081
1825793
HSF1
ENSBTAG00000044406
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1883849
1883971
SCARNA15
ENSBTAG00000046889
400EG05
5ARSGBFGLGBACG13093
587EG06
12
85610279
85610762
UncharacterizedProtein
ENSBTAG00000020117
800EG05
3BTAG78494GnoGrs
188EG04
721189879
21196263
ZBTB7A
PROTEIN
Nosignificantassociationfound
amp
59
1 ItalianHolstein
The$ large$ number$ of$ significant$ signals$ was$ observed$ in$ the$ Italian$ Holstein$
especially$ for$ milk$ yield$ and$ fat$ percentage$ As$ shown$ in$ Table$ 2$ the$ strong$
effect$ of$ DGAT1$ in$ Italian$ Holsteins$ observed$ with$ individual$ SNPs$ was$
confirmed$by$the$geneBcentric$approach$In$this$region$almost$the$totality$of$the$
geneBwise$pBvalues$under$the$chosen$threshold$appeared$to$be$influenced$by$this$
major$ gene$ and$ by$ the$ high$ level$ of$ linkage$ disequilibrium$ that$ characterizes$
cosmopolitan$bovine$breeds$(see$Figure$S6)$Within$this$pool$of$genes$it$is$worth$
to$mention$PLEC$ (plectin)$ a$ cell$ surface$ receptor$ gene$ in$ the$RNASE5$ pathway$
(Angiogenin$ encoded$ by$ angiogenin$ ribonuclease$ RNase$ A$ family$ 5)$ Proteins$
encoded$ by$ these$ genes$ that$ are$ present$ in$ milk$ are$ involved$ in$ protein$
synthesis$ and$ act$ as$ growth$ factors$ in$ vitro$Moreover$RNASE5$ seems$ to$ have$
other$ functions$ such$ as$ regulation$ of$ stress$ response$ and$ apoptosis$ and$ even$
angiogenesis$ through$ which$ it$ may$ also$ contribute$ to$ the$ regulate$ milk$
production$(Raven$et$al$2013)$
The$ exclusion$ of$DGAT1$ effect$ (HolsteinDGAT$ results)$ highlighted$ that$ all$ the$
significant$ effects$ on$ fat$ percentage$ found$ for$ genes$ located$ within$
approximately$25$Mb$up$or$downstream$from$the$excluded$SNP$(ARS5BFGL5NGS5
4939)$were$ redundant$ signals$ of$DGAT1$ (Table3$ and$Table$ S2$ and$ S3)$ This$ is$
also$observed$for$milk$yield$due$to$the$genetic$correlations$between$fat$and$milk$$
Also$ PLEC$ was$ not$ significant$ in$ our$ analyses$ when$ taking$ into$ account$ the$
DGAT1$effect$contradicting$previous$findings$in$Holstein$that$report$association$
of$PLEC$gene$with$production$traits$either$accounting$or$not$for$the$effect$of$the$
diglyceride$ acyltransferase$ (Raven$ et$ al$ 2013)$ A$ possible$ explanation$ for$ the$
different$results$between$these$studies$is$a$difference$between$genetic$structures$
of$the$populations$analyzed$To$reduce$false$positive$signals$we$only$evaluated$
the$genes$based$on$HolsteinDGAT$results$for$the$Italian$Holstein$$
11 Milkyield
We$found$genes$associated$with$this$trait$in$Holstein$in$particular$SPP1$P4HA3=
FOXF2$and$TOM1L1$genes$
Among$ these$ the$ osteopontin$ SPP1$ (secreted$ phosphoprotein$ 1)$ is$ directly$
related$to$milk$production$(de$Mello$et$al$2012)$ It$ is$ located$ in$BTA6$nearby$
60
$
ABCG2$ and$ there$ is$ evidence$ that$ polymorphisms$ in$ this$ region$ are$ strongly$
correlated$with$high$production$ levels$of$milk$ in$various$breeds$(CohenBZinder$
et$ al$ 2005)$Osteopontin$SPP1$ is$ a$ secreted$glycophosphoprotein$ that$plays$ a$
crucial$ role$ in$ mammary$ gland$ development$ having$ an$ essential$ role$ also$ in$
mammary$gland$differentiation$and$branching$of$the$mammary$epithelial$ductal$
system$ This$ gene$has$ also$ been$ implicated$ in$many$ types$ of$ cancer$ including$
breast$cancer$ for$ the$potentiality$of$proliferation$and$alveologenesis$ (Hubbard$
et$ al$ 2013)$ It$ is$worth$mentioning$ that$ the$most$ significant$ SNPs$within$ this$
gene$was$Hapmap332885BTC5033751=with=a$pBvalue$of$00012$ largely$over$ the$
genomeBwide$significance$ threshold$This$signal$would$be$ ignored$ in$ the$single$
SNP$approach$
Another$gene$revealed$by$the$geneBbased$association$is$P4HA3$encoding$for$the$
prolyl$4Bhydroxylase$α$polypeptide$III$involved$in$the$correct$folding$of$nascent$
procollagen$ chains$ It$ is$ known$ that$ stromalBepithelial$ interactions$ are$ critical$
for$mammary$gland$development$and$defective$deposition$or$reorganization$of$
extracellular$ matrix$ can$ lead$ to$ incorrect$ mammary$ epithelial$ morphogenesis$
(Gillette$ et$ al$ 2013)$ highlighting$ the$ role$ of$ this$ gene$ in$ the$ healthy$ breast$
tissue=
FOXF2=gene$ (Forkhead$Box$F2)$ is$ a$member$of$ the$ forkhead$box$ transcription$
factors$ known$ to$ be$ involved$ in$ tissue$ development$ as$ part$ of$ the$ ldquoHedgehog$
signaling$pathwayrdquo$This$key$cascade$is$essential$for$the$correct$segmentation$of$
tissues$during$embryonic$development$but$has$been$recently$demonstrated$that$
is$active$in$adult$stem$cells$proliferation$in$mammary$tissue$(Liu$et$al$2006)$In$
particular$ FOXF2$ plays$ a$ key$ role$ in$ extracellular$ matrix$ synthesis$ and$
epithelialBmesenchymal$ interactions$ and$ could$ therefore$ affect$ the$ proper$
development$of$the$mammary$stroma$In$addition$low$levels$of$transcription$of$
this$gene$are$correlated$ to$early$onset$metastasis$ in$breast$cancer$ (Kong$et$al$
2013)$ underlining$ a$ specific$ role$ in$ the$ mammary$ gland$
Lastly$ TOM1L1$ (Target$ Of$ Myb1$ ChickenBLike$ 1)$ is$ another$ gene$ potentially$
involved$ with$ the$ milk$ physiology$ as$ is$ known$ its$ implication$ with$ the$ early$
endosome$ sorting$ coupled$ with$ EGFR= (epidermal$ growth$ factor$ receptor)$
another$ important$player$ in$ lactation$(Fowler$et$al$1995)$TOM1L1$belongs$to$
61
$
the$ VHS$ domain$ containing$ proteins$ that$ are$ involved$ in$ the$ postBGolgi$
trafficking$and$signaling$(Wang$et$al$2010)$
12 Fatpercentage
In$ Italian$ Holstein$ breed$ we$ found$ a$ significant$ association$ with$ or$ without$
DGAT1$ effect$ correction$ to$ the$EPS8$ gene$ located$on$BTA5$at$around$95$Mbp$
This$gene$encodes$for$a$substrate$of$the$already$cited$EGFR$and$is$involved$in$the$
fatty$acid$metabolism$(Fazioli$et$al$1993)$A$recent$study$on$Holstein$Friesian$
(Wang$et$al$2012)$found$strong$association$with$a$variant$located$in$the$second$
intron$of$the$epidermal$growth$factor$receptor$pathway$substrate$enforcing$the$
hypothesis$of$a$true$involvement$of$this$gene$on$fat$percentage$
13 Proteinpercentage
The$BCL2L12$gene$(BclB2Blike$protein$12)$associated$with$protein$percentage$in$
Italian$Holstein$breed$represents$another$interesting$result$The$product$of$this$
gene= is$ an$ important$ oncoprotein$ in$ two$ fundamental$ cell$ cycle$ pathways$
namely$ p53$ and$ cytoplasmic$ caspase$ signaling$ BCL2L12= resides$ in$ both$
cytoplasm$and$nucleus$inhibiting$caspases$3$and$7$and$forming$a$complex$with$
p53$thereby$ inhibiting$p53Bdirected$transcriptomic$changes$upon$DNA$damage$
(Stegh$ and$DePinho$ 2011)$ The$members$ of$ BCL2$ family$ are$ considered$ antiB
apoptotic$genes$and$it$is$obvious$that$dysfunctional$programmed$cell$death$plays$
an$ important$ role$ in$ the$ pathogenesis$ and$ in$ the$ progression$ of$ cancer$Many$
members$of$this$family$were$found$to$be$modulated$in$various$malignancies$and$
BCL2L12= in$ particular$ was$ expressed$ in$ mammary$ gland$ and$ proposed$ as$
prognostic$ cancer$ biomarker$ (Thomadaki$ et$ al$ 2007)$ In$ dairy$ cows$ the$
members$of$BCL2$family$are$involved$in$the$key$events$ in$the$mammary$tissue$
development$ and$ remodeling$ during$ lactation$ and$ involution$ (Stefanon$ et$ al$
2002$p$200)$Members$of$BCL2$family$regulate$the$turnover$of$cell$population$
in$the$mammary$gland$during$lactation$and$its$overexpression$was$found$to$be$
linearly$related$to$milk$production$in$Holstein$cow$(Colitti$et$al$2004)$
$
62
$
2 ItalianSimmental
In$ the$ Italian$ Simmental$ breed$ we$ found$ one$ association$ on$ BTA4$ for$ fat$
percentage$(VIPR2)$and$two$associations$ for$milk$yield$one$ in$BTA7$(ZBTB7A)$
and$ the$ other$ in$ BTA14$ related$ to$ the$ DGAT1$ No$ association$ was$ found$ for$
protein$percentage$
21 Fatpercentage
Among$the$genes$found$significantly$associated$with$production$traits$in$Italian$
Simmental$VIPR2=a$ gene$ encoding$ a$ receptor$ for$ a$ small$ vasoactive$ intestinal$
neuropeptide$ was$ the$ single$ signal$ in$ BTA4$ for$ fat$ percentage$ Vasoactive$
intestinal$ peptide$ (VIP)$ is$ synthesized$ in$ the$pituitary$ gland$ and$ among$other$
functions$ stimulates$ prolactin$ secretion$ and$ is$ involved$ in$ proliferation$
survival$and$differentiation$in$human$breast$cancer$cells$(Valdehita$et$al$2010)$$
An$interesting$novel$finding$is$the$effect$of$immunosuppression$(through$direct$
effect$of$IL56)$on$prolactin$cells$and$its$ability$to$regulate$prolactin$by$acting$on$
pituitary$ VIP$ through$ VIP$ receptors$ (Blanco$ et$ al$ 2013)$ These$ evidences$
suggest$ that$VIPR2$ could$ be$ an$ important$ candidate$ for$ further$ investigations$
through$post$GWAS$approaches$such$as$reBsequencing$or$fine$mapping$
22 Milkyield
Surprisingly$the$DGAT1$gene$was$significantly$associated$with$milk$yield$but$not$
with$fat$percentage$in$Italian$Simmental$breed$Indeed$the$most$significant$SNP$
for$the$milk$yield$trait$in$this$breed$was$the$ARS5BFGL5NGS54939$with$a$pBvalue$
of$ 413EB06$ The$ importance$ of$ this$ gene$ in$ the$ lactation$ is$ largely$ known$
(Grisart$et$al$2002)$$
To$ investigate$DGAT1$ signal$ in$ Simmental$ we$ analyzed$ the$ fine$ LD$ structure$
genomeBwide$ and$ in$ BTA14$ proximal$ region$ in$ all$ the$ three$ breeds$ revealing$
different$patterns$
Observed$LD$decay$on$ the$ three$breeds$genomes$revealed$similar$behavior$ for$
Italian$Holstein$and$Italian$Brown$with$r2$gt$010$$until$ around$ 250$ Kbp$ while$
Italian$Simmental$showed$lower$LD$Distances$higher$that$400$Kbp$revealed$half$
level$of$LD$in$Simmental$(Figure$S6)$
63
$
In$DGAT1$region$LD$behavior$is$very$similar$between$Italian$Holstein$and$Italian$
Brown$ (Figure$ S8)$ while$ association$ values$ are$ really$ different$ because$ ARS5
BFGL5NGS54939=is$fixed$in$Italian$Brown$and$purged$by$QC$process$Interestingly$
association$ patterns$ become$ similar$ when$ DGAT1$ effect$ is$ removed$ from$ the$
Holstein$ panel$ (HolsteinDGAT$ Figure$ S8)$ In$ Simmental$ we$ found$ a$ different$
situation$ very$ low$ LD$ level$ and$ significant$ association$ levels$ with$ ARS5BFGL5
NGS54939$with$ the$milk$ trait$This$ is$probably$due$ to$ the$ low$ frequency$of$ the$
p232K$ allele$ in$ Italian$ Simmental$ (Scotti$ et$ al$ 2010)$ and$ to$ the$ different$
selection$strategies$applied$in$these$breeds$
The$ relatively$ high$ pBvalue$ of$ this$ SNP$ is$ responsible$ for$ the$ statistical$
significance$of$many$of$the$13$genes$found$significant$in$this$breed$for$this$trait$
as$ shown$ in$ Table$ 3$ therefore$ reducing$ the$ gene$ set$ to$ three$ locations$ two$
overlapping$features$indicating$a$pseudogene$in$BTA12$and$a$zinc$finger$protein$
ZBTB7A$in$BTA7$DGAT1$role$and$function$will$be$not$discussed$here$as$already$
done$elsewhere$(Grisart$et$al$2002)$
ZBTB7A=is$part$of$zinc$finger$and$BTB$proteins$(Broad$complex$Tramtrack$and$
Bricabrac)$ transcription$ factors$ with$ emerging$ importance$ for$ their$
involvement$ in$ oncogenesis$ and$ cell$ differentiation$by$ repressing$key$ genes$of$
cell$cycle$regulation$as$retinoblastoma$(Rb)$and=p53=(ldquoJeon$et$al$B$2008$B$ProtoB
oncogene$ FBIB1$ (PokemonZBTB7A)$ Represses$ Trpdfrdquo$ nd)= ZBTB7A$ over$
expression$ is$ shown$ in$primary$ and$ recurrent$ breast$ cancer$ tissues$ leading$ to$
disease$progression$and$poor$survival$(Zu$et$al$2011)$Link$between$this$gene$
polymorphism$and$milk$production$might$be$in$the$proper$cellular$development$
of$ the$ mammary$ gland$ This$ transcription$ factor$ is$ also$ known$ as$ a$ master$
regulator$ of$ B$ and$ TBcell$ lineage$ (Lee$ and$ Maeda$ 2012)$ enforcing$ the$ link$
between$ milk$ production$ and$ immune$ protection$ as$ already$ outlined$ by$ of$
Lemay$ and$ colleagues$ with$ proteomic$ studies$ where$ the$ ldquoactivation$ of$ the$
innate$ immune$systemrdquo$resulted$as$one$of$ the$most$represented$gene$ontology$
categories$(Lemay$et$al$2009)$
$
64
$
3 ItalianBrown
In$ the$ Italian$Brown$breed$we$ found$ one$ association$ on$BTA23$ (TXNDC5)$ for$
milk$yield$and$one$in$BTA18$for$protein$percentage$(PINLYP)$The$three$features$
listed$in$Table$3$for$milk$yield$in$Italian$Brown$were$collapsed$due$to$overlap$$
31 Milkyield
Thioredoxin$ domainBcontaining$ protein$ 5$ (TXNDC5)$ belongs$ to$ thioredoxin$
family$small$ubiquitous$proteins$containing$disulphide$domain$affecting$protein$
folding$with$chaperone$activity$preventing$protein$misBfolding$within$the$lumen$
of$ the$endoplasmic$reticulum$This$multifunctional$organelle$have$a$key$role$ in$
several$ processes$ including$ lipid$ catabolism$ and$ protein$ synthesis$ (RamiacuterezB
Torres$ et$ al$ 2012)$ Because$ of$ its$ disulphide$ bonds$ thioredoxin$ facilitate$ the$
reduction$of$other$proteins$by$cysteine$thiol$disulfide$exchange$This$mechanism$
is$essential$for$detossification$
During$ lactation$ indeed$ an$ enhanced$ detoxification$ level$ is$ functional$ to$
preserve$ milk$ quality$ against$ increased$ metabolic$ activities$ Higher$ metabolic$
levels$ lead$ to$ free$ radicals$ production$ making$ necessary$ the$ maintenance$ of$
redox$ homeostasis$ that$ include$ among$ others$ the$ reduction$ and$ oxidation$ of$
peroxiredoxin$and$thioredoxin$(Lu$and$Holmgren$2014)$
32 Proteinpercentage
For$ protein$ percentage$ in$ Italian$ Brown$ there$ is$ only$ one$ associated$ gene$
PINLYP=is$encoding$a$phospholipase$A2$inhibitor$but$this$protein$product$is$not$
yet$well$studied$ $Many$types$of$phospholipase$A2$were$characterized$and$ it$ is$
known$ that$ they$ are$ mediator$ of$ inflammation$ atherosclerosis$ and$ cancer$ in$
mammals$and$that$plays$an$important$role$in$the$innate$immune$response$(Qiu$
and$Lai$2013)$It$may$be$reasonable$to$imagine$a$role$for$PINLYP$in$the$lactation$
in$light$of$previously$discussed$results$that$link$milk$to$innate$immune$response$
(Lemay$et$al$2009)$
$
Although$ limited$ by$ the$ size$ of$ the$ study$ and$ the$ number$ of$ the$ included$
markers$these$results$represent$both$a$confirmation$and$a$step$forward$towards$
the$comprehension$of$the$biological$bases$of$milk$yield$and$quality$$
65
$
New$ genes$ potentially$ involved$ with$ production$ traits$ in$ dairy$ cows$ were$
tracked$when$GWAS$signals$resulted$too$weak$to$be$considered$in$a$classical$SNP$
based$mapping$with$the$ldquoclosest$gene$to$peakrdquo$or$the$windowBbased$strategy$
It$ is$ worth$ mentioning$ that$ none$ of$ the$ analyzed$ SNPs$ (not$ considering$ the$
Italian$Holstein$with$ARS5BFGL5NGS54939= included$ in$ the$ analysis)$were$ above$
the$ genomeBwide$ significance$ threshold$ in$ single$ SNP$ analysis$ meaning$ that$
important$information$is$often$lost$in$such$studies$$
In$addition$ to$ the$newly$ identified$ loci$we$observed$a$confirmation$of$existing$
associations$ with$ genes$ related$ to$ milk$ production$ not$ only$ dissecting$ gene$
specific$ functions$ but$ from$ mapping$ these$ genes$ in$ previously$ characterized$
regions$(Table$S5)$This$ is$not$trivial$when$using$new$data$analysis$techniques$
especially$ in$ high$ noise$ to$ signal$ ratio$ datasets$ like$ those$ here$ presented$We$
believe$ that$ this$ approach$will$ benefit$ of$data$ from$denser$marker$panels$ (eg$
BovineHD)$ to$ exploit$ local$ linkage$ disequilibrium$ and$ increase$ the$ number$ of$
mapped$genes$with$more$than$two$SNPs$
Results$ obtained$ with$ the$ gene$ centric$ approach$ although$ having$ intrinsic$
limitations$are$not$ influenced$by$a$number$of$subjective$choices$often$made$in$
GWAS$ For$ instance$when$ ldquowindow$mappingrdquo$ is$ used$ to$decrease$ the$noise$ to$
signal$ ratio$ eg$ due$ to$ nearby$ markers$ having$ different$ allele$ frequencies$
windows$size$ is$ to$be$decided$This$can$be$based$on$equal$number$of$markers$
equal$number$of$base$pairs$or$equal$map$distance$in$cM$per$window$The$three$
choices$ do$ not$ coincide$ because$ markers$ are$ not$ equally$ spread$ along$ the$
chromosomes$ and$ recombination$ differs$ in$ different$ genomic$ regions$ In$
addition$ windows$ can$ be$ chosen$ nonBoverlapping$ or$ with$ different$ degree$ of$
overlap$Finally$when$stepping$ from$significant$SNPs$or$windows$ to$candidate$
genes$the$size$of$the$flanking$region$from$which$picking$up$candidate$genes$is$to$
be$decided$as$well$Sometimes$the$gene$closer$to$the$signal$peak$is$selected$ in$
other$cases$all$genes$within$certain$boundaries$are$included$in$further$analyses$
All$ these$ choices$ may$ generate$ misleading$ results$ For$ example$ not$ truly$
associated$genes$may$be$retained$for$further$analyses$disregarding$the$correct$
ones$ In$addition$ a$ large$number$of$ false$positives$can$be$ included$ to$be$ later$
interpreted$with$ the$help$of$network$and$pathway$analysis$As$ a$ consequence$
these$decisions$have$a$direct$impact$in$terms$of$production$and$interpretation$of$
66
$
results$leading$to$conclusions$heavily$based$on$the$enrichment$analysis$and$preB
existing$knowledge$$
With$ the$ gene$ based$ association$ we$ also$ noticed$ some$ advantages$ in$ the$
downstream$data$mining$First$genes$are$functional$units$and$the$golden$targets$
of$ the$ ldquoclassical$ postBGWASrdquo$ bioinformatics$ pipelines$ Hence$ the$ gene$ based$
association$leads$to$genes$with$an$embedded$statistically$robust$framework$This$
is$ a$ ldquoplusrdquo$ in$ studying$ complex$ traits$ because$ one$ can$ concentrate$ on$ specific$
signals$ rather$ than$ cherry$ picking$meaningful$ features$ from$ a$ list$ constructed$
with$the$previously$cited$approaches$$
Moreover$it$is$easier$to$track$back$gene$clusters$to$major$gene$effects$B$like$for$
example$DGAT1$in$a$gene$rich$genome$chunk$many$genes$may$result$significant$
due$to$the$same$SNP$(Table$S2$and$Table$3)$With$the$VEGAS$method$finding$the$
ldquotruerdquo$association$is$facilitated$by$following$the$ldquobest$SNPrdquo$in$the$region$
In$addition$we$found$phenotype$coherent$signals$in$such$situation$where$only$a$
few$SNPs$B$ if$none$ndash$are$above$an$acceptable$statistical$genomeBwide$threshold$
using$a$single$SNP$regression$method$
The$gene$centric$approach$ is$a$ first$and$slight$ improvement$ in$ livestock$GWAS$
analyses$since$we$still$have$a$partial$genome$annotation$ In$addition$ it$ is$now$
known$that$ the$definition$of$gene$ is$ to$be$revised$many$genes$do$not$code$ for$
proteins$ have$ alternative$ starting$ sites$ have$ antisense$ transcripts$ host$
regulatory$ RNA$ are$ subjected$ to$ alternative$ splicing$ and$may$ be$ regulated$ by$
sequences$quite$ far$ away$ from$ their$ coding$ regions$ (Consortium$2012)$Other$
limitations$are$the$need$for$larger$number$of$animals$in$these$type$of$studies$as$
in$this$study$and$the$still$relatively$high$cost$of$sequencing$that$cannot$be$used$
routinely$in$large$experiments$at$least$not$yet$$
$
$
Conclusions
This$study$underlines$the$importance$of$the$implementation$of$new$methods$to$
analyze$complex$traits$with$genomic$data$No$golden$path(s)$is$discovered$yet$to$
walk$from$variation$to$functional$units$The$geneBcentric$analysis$may$contribute$
67
$
to$ reveal$ new$ signals$ throughout$ the$ bovine$ genome$ condensing$ weak$
association$signals$in$chromosome$bits$having$functional$significance$
$
$
Acknowledgements
The$Authors$would$ like$ to$ thank$Dr$ Jimmy$ Liu$ for$ his$ advices$ in$ building$ the$
software$AIA$ANAFI$ANARB$and$ANAPRI$breedersrsquo$associations$ for$ their$help$
and$ support$during$ the$data$analysis$ SelMol$ and$ProZoo$projects$ that$provide$
samples$ The$ Authors$ are$ also$ grateful$ to$MIPAAF$ and$ Cariplo$ Foundation$ for$
funding$
$
$
$
References
Akula$ N$ Baranova$ A$ Seto$ D$ Solka$ J$ Nalls$MA$ Singleton$ A$ Ferrucci$ L$
Tanaka$T$Bandinelli$ S$Cho$YS$Kim$YJ$Lee$ JBY$Han$BBG$Bipolar$
Disorder$ Genome$ Study$ (BiGS)$ Consortium$ The$ Wellcome$ Trust$ CaseB
Control$Consortium$McMahon$FJ$2011$A$NetworkBBased$Approach$to$
Prioritize$ Results$ from$ GenomeBWide$ Association$ Studies$ PLoS$ ONE$ 6$
e24220$doi101371journalpone0024220$
Amin$N$ van$Duijn$ CM$ Aulchenko$ YS$ 2007$ A$Genomic$Background$Based$
Method$ for$ Association$ Analysis$ in$ Related$ Individuals$ PLoS$ ONE$ 2$
e1274$doi101371journalpone0001274$
Aulchenko$YS$de$Koning$DBJ$Haley$C$2007$Genomewide$Rapid$Association$
Using$ Mixed$ Model$ and$ Regression$ A$ Fast$ and$ Simple$ Method$ For$
Genomewide$PedigreeBBased$Quantitative$Trait$Loci$Association$Analysis$
Genetics$177$577ndash585$doi101534genetics107075614$
Blanco$ EJ$ CarreteroBHernaacutendez$ M$ GarciacuteaBBarrado$ J$ IglesiasBOsma$ MC$
Carretero$M$Herrero$JJ$Rubio$M$Riesco$JM$Carretero$J$2013$The$
activity$and$proliferation$of$pituitary$prolactinBpositive$cells$and$pituitary$
68
$
VIPBpositive$ cells$ are$ regulated$by$ interleukin$6$Histol$Histopathol$ 28$
1595ndash1604$
Blott$S$Kim$JBJ$Moisio$S$SchmidtBKuntzel$A$Cornet$A$Berzi$P$Cambisano$
N$Ford$C$Grisart$B$Johnson$D$Karim$L$Simon$P$Snell$R$Spelman$
R$ Wong$ J$ Vilkki$ J$ Georges$ M$ Farnir$ F$ Coppieters$ W$ 2003$
Molecular$ dissection$ of$ a$ quantitative$ trait$ locus$ a$ phenylalanineBtoB
tyrosine$substitution$in$the$transmembrane$domain$of$the$bovine$growth$
hormone$ receptor$ is$ associated$ with$ a$ major$ effect$ on$ milk$ yield$ and$
composition$Genetics$163$253ndash266$
Burton$PR$Clayton$DG$Cardon$LR$Craddock$N$Deloukas$P$Duncanson$A$
Kwiatkowski$ DP$ McCarthy$ MI$ Ouwehand$ WH$ Samani$ NJ$ Todd$
JA$ Donnelly$ P$ Barrett$ JC$ Burton$ PR$ Davison$ D$ Donnelly$ P$
Easton$ D$ Evans$ D$ Leung$ HBT$ Marchini$ JL$ Morris$ AP$ Spencer$
CCA$Tobin$MD$Cardon$LR$Clayton$DG$Attwood$AP$Boorman$JP$
Cant$B$Everson$U$Hussey$JM$Jolley$JD$Knight$AS$Koch$K$Meech$
E$ Nutland$ S$ Prowse$ CV$ Stevens$ HE$ Taylor$ NC$ Walters$ GR$
Walker$NM$Watkins$NA$Winzer$T$Todd$JA$Ouwehand$WH$Jones$
RW$ McArdle$ WL$ Ring$ SM$ Strachan$ DP$ Pembrey$ M$ Breen$ G$
Clair$ DS$ Caesar$ S$ GordonBSmith$ K$ Jones$ L$ Fraser$ C$ Green$ EK$
Grozeva$D$Hamshere$ML$Holmans$PA$Jones$IR$Kirov$G$Moskvina$
V$ Nikolov$ I$ OrsquoDonovan$ MC$ Owen$ MJ$ Craddock$ N$ Collier$ DA$
Elkin$A$Farmer$A$Williamson$R$McGuffin$P$Young$AH$Ferrier$IN$
Ball$SG$Balmforth$AJ$Barrett$JH$Bishop$DT$Iles$MM$Maqbool$A$
Yuldasheva$N$Hall$AS$Braund$PS$Burton$PR$Dixon$RJ$Mangino$
M$ Stevens$ S$ Tobin$ MD$ Thompson$ JR$ Samani$ NJ$ Bredin$ F$
Tremelling$ M$ Parkes$ M$ Drummond$ H$ Lees$ CW$ Nimmo$ ER$
Satsangi$ J$Fisher$SA$Forbes$A$Lewis$CM$Onnie$CM$Prescott$NJ$
Sanderson$J$Mathew$CG$Barbour$J$Mohiuddin$MK$Todhunter$CE$
Mansfield$JC$Ahmad$T$Cummings$FR$Jewell$DP$Webster$J$Brown$
MJ$ Clayton$DG$ Lathrop$GM$ Connell$ J$Dominiczak$A$ Samani$NJ$
Marcano$CAB$Burke$B$Dobson$R$Gungadoo$J$Lee$KL$Munroe$PB$
Newhouse$SJ$Onipinla$A$Wallace$C$Xue$M$Caulfield$M$Farrall$M$
Barton$A$ (braggs)$TB$ in$RG$and$G$Bruce$ IN$Donovan$H$Eyre$S$
69
$
Gilbert$ PD$ Hider$ SL$ Hinks$ AM$ John$ SL$ Potter$ C$ Silman$ AJ$Symmons$ DPM$ Thomson$ W$ Worthington$ J$ Clayton$ DG$ Dunger$DB$ Nutland$ S$ Stevens$ HE$ Walker$ NM$ Widmer$ B$ Todd$ JA$Frayling$TM$Freathy$RM$Lango$H$Perry$JRB$Shields$BM$Weedon$MN$ Hattersley$ AT$ Hitman$ GA$ Walker$ M$ Elliott$ KS$ Groves$ CJ$Lindgren$ CM$ Rayner$ NW$ Timpson$ NJ$ Zeggini$ E$ McCarthy$ MI$Newport$M$Sirugo$G$Lyons$E$Vannberg$F$Hill$AVS$Bradbury$LA$Farrar$ C$ Pointon$ JJ$ Wordsworth$ P$ Brown$ MA$ Franklyn$ JA$Heward$JM$Simmonds$MJ$Gough$SCL$Seal$S$(uk)$BCSC$Stratton$MR$Rahman$N$Ban$M$Goris$A$Sawcer$SJ$Compston$A$Conway$D$Jallow$ M$ Newport$ M$ Sirugo$ G$ Rockett$ KA$ Kwiatkowski$ DP$Bumpstead$ SJ$ Chaney$A$Downes$K$ Ghori$MJR$ Gwilliam$R$Hunt$SE$Inouye$M$Keniry$A$King$E$McGinnis$R$Potter$S$Ravindrarajah$R$ Whittaker$ P$ Widden$ C$ Withers$ D$ Deloukas$ P$ Leung$ HBT$Nutland$S$Stevens$HE$Walker$NM$Todd$JA$Easton$D$Clayton$DG$Burton$PR$Tobin$MD$Barrett$JC$Evans$D$Morris$AP$Cardon$LR$Cardin$NJ$Davison$D$Ferreira$T$PereiraBGale$ J$Hallgrimsdoacutettir$ IB$Howie$ BN$Marchini$ JL$ Spencer$ CCA$ Su$ Z$ Teo$ YY$ Vukcevic$ D$Donnelly$P$Bentley$D$Brown$MA$Cardon$LR$Caulfield$M$Clayton$DG$ Compston$ A$ Craddock$ N$ Deloukas$ P$ Donnelly$ P$ Farrall$ M$Gough$ SCL$ Hall$ AS$ Hattersley$ AT$ Hill$ AVS$ Kwiatkowski$ DP$Mathew$ CG$McCarthy$MI$ Ouwehand$WH$ Parkes$M$ Pembrey$M$Rahman$N$Samani$NJ$Stratton$MR$Todd$ JA$Worthington$ J$2007$GenomeBwide$ association$ study$ of$ 14000$ cases$ of$ seven$ common$diseases$ and$ 3000$ shared$ controls$ Nature$ 447$ 661ndash678$doi101038nature05911$
Cantor$ RM$ Lange$ K$ Sinsheimer$ JS$ 2010$ Prioritizing$ GWAS$ Results$ A$Review$ of$ Statistical$ Methods$ and$ Recommendations$ for$ Their$Application$Am$J$Hum$Genet$86$6ndash22$doi101016jajhg200911017$
Clark$ AG$ Hubisz$ MJ$ Bustamante$ CD$ Williamson$ SH$ Nielsen$ R$ 2005$Ascertainment$ bias$ in$ studies$ of$ human$ genomeBwide$ polymorphism$Genome$Res$15$1496ndash1502$doi101101gr4107905$
70
$
CohenBZinder$M$ Seroussi$ E$ Larkin$ DM$ Loor$ JJ$Wind$ AE$ der$ Lee$ JBH$Drackley$ JK$Band$MR$Hernandez$AG$Shani$M$Lewin$HA$Weller$JI$ Ron$ M$ 2005$ Identification$ of$ a$ missense$ mutation$ in$ the$ bovine$ABCG2$gene$with$a$major$effect$on$ the$QTL$on$ chromosome$6$affecting$milk$yield$and$composition$in$Holstein$cattle$Genome$Res$15$936ndash944$doi101101gr3806705$
Cole$JB$Wiggans$GR$Ma$L$Sonstegard$TS$Lawlor$TJ$Crooker$BA$Tassell$CPV$ Yang$ J$ Wang$ S$ Matukumalli$ LK$ Da$ Y$ 2011$ GenomeBwide$association$ analysis$ of$ thirty$ one$ production$ health$ reproduction$ and$body$ conformation$ traits$ in$ contemporary$ US$ Holstein$ cows$ BMC$Genomics$12$408$doi1011861471B2164B12B408$
Colitti$M$Wilde$CJ$Stefanon$B$2004$Functional$expression$of$bclB2$protein$family$and$AIF$in$bovine$mammary$tissue$in$early$lactation$J$Dairy$Res$71$20ndash27$
Consortium$ TEP$ 2012$ An$ integrated$ encyclopedia$ of$ DNA$ elements$ in$ the$human$genome$Nature$489$57ndash74$doi101038nature11247$
De$Mello$F$Cobuci$JA$Martins$MF$Silva$MVGB$Neto$JB$2012$Association$of$the$polymorphism$g8514CgtT$in$the$osteopontin$gene$(SPP1)$with$milk$yield$ in$ the$ dairy$ cattle$ breed$ Girolando$ Anim$ Genet$ 43$ 647ndash648$doi101111j1365B2052201102312x$
Dudbridge$ F$ Gusnanto$ A$ 2008$ Estimation$ of$ significance$ thresholds$ for$genomewide$ association$ scans$ Genet$ Epidemiol$ 32$ 227ndash234$doi101002gepi20297$
Elsik$ CG$ Tellam$ RL$ Worley$ KC$ 2009$ The$ Genome$ Sequence$ of$ Taurine$Cattle$ A$window$ to$ ruminant$ biology$ and$ evolution$ Science$ 324$ 522ndash528$doi101126science1169588$
Fazioli$ F$ Minichiello$ L$ Matoska$ V$ Castagnino$ P$ Miki$ T$ Wong$ WT$ Di$Fiore$ PP$ 1993$ Eps8$ a$ substrate$ for$ the$ epidermal$ growth$ factor$receptor$kinase$enhances$EGFBdependent$mitogenic$signals$EMBO$J$12$3799ndash3808$
Fontanesi$ L$ Calograve$ DG$ Galimberti$ G$ Negrini$ R$ Marino$ R$ Nardone$ A$AjmoneBMarsan$ P$ Russo$ V$ 2014$ A$ candidate$ gene$ association$ study$
71
$
for$ nine$ economically$ important$ traits$ in$ Italian$ Holstein$ cattle$ Anim$
Genet$nandashna$doi101111age12164$
Fowler$KJ$Walker$F$Alexander$W$Hibbs$ML$Nice$EC$Bohmer$RM$Mann$
GB$ Thumwood$ C$ Maglitto$ R$ Danks$ JA$ 1995$ A$ mutation$ in$ the$
epidermal$growth$factor$receptor$in$wavedB2$mice$has$a$profound$effect$
on$ receptor$ biochemistry$ that$ results$ in$ impaired$ lactation$ Proc$ Natl$
Acad$Sci$92$1465ndash1469$doi101073pnas9251465$
Gianola$ D$ Hospital$ F$ Verrier$ E$ 2013$ Contribution$ of$ an$ additive$ locus$ to$
genetic$variance$when$inheritance$is$multiBfactorial$with$implications$on$
interpretation$ of$ GWAS$ Theor$ Appl$ Genet$ 126$ 1457ndash1472$
doi101007s00122B013B2064B2$
Gillette$ M$ Bray$ K$ Blumenthaler$ A$ VargoBGogola$ T$ 2013$ P190B$ RhoGAP$
Overexpression$ in$ the$ Developing$Mammary$ Epithelium$ Induces$ TGFβB
dependent$ Fibroblast$ Activation$ PLoS$ ONE$ 8$ e65105$
doi101371journalpone0065105$
Grisart$B$Coppieters$W$Farnir$F$Karim$L$Ford$C$Berzi$P$Cambisano$N$
Mni$ M$ Reid$ S$ Simon$ P$ Spelman$ R$ Georges$ M$ Snell$ R$ 2002$
Positional$Candidate$Cloning$of$a$QTL$ in$Dairy$Cattle$ Identification$of$a$
Missense$Mutation$ in$the$Bovine$DGAT1$Gene$with$Major$Effect$on$Milk$
Yield$ and$ Composition$ Genome$ Res$ 12$ 222ndash231$
doi101101gr224202$
Hubbard$NE$Chen$QJ$Sickafoose$LK$Wood$MB$Gregg$ JP$Abrahamsson$
NM$ Engelberg$ JA$ Walls$ JE$ Borowsky$ AD$ 2013$ Transgenic$
Mammary$Epithelial$Osteopontin$(Spp1)$Expression$Induces$Proliferation$
and$ Alveologenesis$ Genes$ Cancer$ 4$ 201ndash212$
doi1011771947601913496813$
Hu$ ZBL$ Park$ CA$ Wu$ XBL$ Reecy$ JM$ 2013$ Animal$ QTLdb$ an$ improved$
database$tool$for$livestock$animal$QTLassociation$data$dissemination$in$
the$ postBgenome$ era$ Nucleic$ Acids$ Res$ 41$ D871ndashD879$
doi101093nargks1150$
Jeon$et$al$ B$2008$ B$ProtoBoncogene$FBIB1$ (PokemonZBTB7A)$Represses$Trpdf$
nd$
72
$
Jiang$ L$ Liu$ J$ Sun$ D$ Ma$ P$ Ding$ X$ Yu$ Y$ Zhang$ Q$ 2010$ Genome$Wide$
Association$ Studies$ for$ Milk$ Production$ Traits$ in$ Chinese$ Holstein$
Population$PLoS$ONE$5$e13661$doi101371journalpone0013661$
Kong$ PBZ$ Yang$ F$ Li$ L$ Li$ XBQ$ Feng$ YBM$ 2013$Decreased$FOXF2$mRNA$
Expression$ Indicates$ EarlyBOnset$ Metastasis$ and$ Poor$ Prognosis$ for$
Breast$ Cancer$ Patients$ with$ Histological$ Grade$ II$ Tumor$ PLoS$ ONE$ 8$
e61591$doi101371journalpone0061591$
Lee$SBU$Maeda$T$2012$POKZBTB$proteins$an$emerging$ family$of$proteins$
that$ regulate$ lymphoid$ development$ and$ function$ Immunol$ Rev$ 247$
107ndash119$
Lemay$ DG$ Lynn$ DJ$ Martin$ WF$ Neville$ MC$ Casey$ TM$ Rincon$ G$
Kriventseva$ EV$ Barris$ WC$ Hinrichs$ AS$ Molenaar$ AJ$ 2009$ The$
bovine$lactation$genome$ insights$ into$the$evolution$of$mammalian$milk$
Genome$Biol$10$R43$
Liu$JZ$Mcrae$AF$Nyholt$DR$Medland$SE$Wray$NR$Brown$KM$2010$A$
versatile$ geneBbased$ test$ for$ genomeBwide$ association$ studies$ Am$ J$
Hum$Genet$87$139$
Liu$S$Dontu$G$Mantle$ID$Patel$S$Ahn$N$Jackson$KW$Suri$P$Wicha$MS$
2006$Hedgehog$Signaling$and$BmiB1$Regulate$SelfBrenewal$of$Normal$and$
Malignant$ Human$ Mammary$ Stem$ Cells$ Cancer$ Res$ 66$ 6063ndash6071$
doi1011580008B5472CANB06B0054$
Lu$ J$ Holmgren$ A$ 2014$ The$ thioredoxin$ superfamily$ in$ oxidative$ protein$
folding$ Antioxid$ Redox$ Signal$ 140202091634000$
doi101089ars20145849$
Luna$ A$ Nicodemus$ KK$ 2007$ snpplotter$ an$ RBbased$ SNPhaplotype$
association$and$linkage$disequilibrium$plotting$package$Bioinforma$Oxf$
Engl$23$774ndash776$doi101093bioinformaticsbtl657$
McCarthy$ MI$ Abecasis$ GR$ Cardon$ LR$ Goldstein$ DB$ Little$ J$ Ioannidis$
JPA$ Hirschhorn$ JN$ 2008$ GenomeBwide$ association$ studies$ for$
complex$traits$consensus$uncertainty$and$challenges$Nat$Rev$Genet$9$
356ndash369$doi101038nrg2344$
Meredith$ BK$ Kearney$ FJ$ Finlay$ EK$ Bradley$ DG$ Fahey$ AG$ Berry$ DP$
Lynn$ DJ$ 2012$ GenomeBwide$ associations$ for$ milk$ production$ and$
73
$
somatic$cell$score$in$HolsteinBFriesian$cattle$in$Ireland$BMC$Genet$13$21$doi1011861471B2156B13B21$
Minozzi$ G$Nicolazzi$ EL$ Stella$ A$ Biffani$ S$Negrini$ R$ Lazzari$ B$ AjmoneBMarsan$ P$Williams$ JL$ 2013$ Genome$Wide$ Analysis$ of$ Fertility$ and$Production$ Traits$ in$ Italian$ Holstein$ Cattle$ PLoS$ ONE$ 8$ e80219$doi101371journalpone0080219$
Nicolazzi$EL$Picciolini$M$Strozzi$F$Schnabel$RD$Lawley$C$Pirani$A$Brew$F$ Stella$ A$ 2014$ SNPchiMp$ a$ database$ to$ disentangle$ the$ SNPchip$jungle$ in$ bovine$ livestock$ BMC$ Genomics$ 15$ 123$ doi1011861471B2164B15B123$
Purcell$ S$ Neale$ B$ ToddBBrown$ K$ Thomas$ L$ Ferreira$ MAR$ Bender$ D$Maller$J$Sklar$P$de$Bakker$PIW$Daly$MJ$Sham$PC$2007$PLINK$a$tool$ set$ for$ wholeBgenome$ association$ and$ populationBbased$ linkage$analyses$Am$J$Hum$Genet$81$559ndash575$doi101086519795$
Qiu$ S$ Lai$ L$ 2013$ Antibacterial$ properties$ of$ recombinant$ human$ nonBpancreatic$secretory$phospholipase$A2$Biochem$Biophys$Res$Commun$441$453ndash456$doi101016jbbrc201310092$
Quinlan$AR$Hall$IM$2010$BEDTools$a$flexible$suite$of$utilities$for$comparing$genomic$ features$ Bioinformatics$ 26$ 841ndash842$doi101093bioinformaticsbtq033$
RamiacuterezBTorres$ A$ BarceloacuteBBatllori$ S$ MartiacutenezBBeamonte$ R$ Navarro$ MA$Surra$ JC$ Arnal$ C$ Guilleacuten$N$ Aciacuten$ S$ Osada$ J$ 2012$ Proteomics$ and$gene$ expression$ analyses$ of$ squaleneBsupplemented$ mice$ identify$microsomal$thioredoxin$domainBcontaining$protein$5$changes$associated$with$ hepatic$ steatosis$ J$ Proteomics$ 77$ 27ndash39$doi101016jjprot201207001$
Raven$ LBA$ Cocks$ BG$ Pryce$ JE$ Cottrell$ JJ$ Hayes$ BJ$ 2013$ Genes$ of$ the$RNASE5$pathway$ contain$ SNP$ associated$with$milk$ production$ traits$ in$dairy$cattle$Genet$Sel$Evol$45$25$doi1011861297B9686B45B25$
Scotti$E$Fontanesi$L$Schiavini$F$La$Mattina$V$Bagnato$A$Russo$V$2010$DGAT1$ pK232A$ polymorphism$ in$ dairy$ and$ dual$ purpose$ Italian$ cattle$breeds$Ital$J$Anim$Sci$9$doi104081ijas2010e16$
74
$
Stefanon$ B$ Colitti$ M$ Gabai$ G$ Knight$ CH$ Wilde$ CJ$ 2002$ Mammary$
apoptosis$and$lactation$persistency$in$dairy$animals$J$Dairy$Res$69$37ndash
52$
Stegh$ AH$ DePinho$ RA$ 2011$ Beyond$ effector$ caspase$ inhibition$ Bcl2L12$
neutralizes$ p53$ signaling$ in$ glioblastoma$ Cell$ Cycle$ 10$ 33ndash38$
doi104161cc10114365$
Taberlet$ P$ Valentini$ A$ Rezaei$ HR$ Naderi$ S$ Pompanon$ F$ Negrini$ R$
AjmoneBMarsan$ P$ 2008$ Are$ cattle$ sheep$ and$ goats$ endangered$
species$Mol$Ecol$17$275ndash284$doi101111j1365B294X200703475x$
Team$RC$ 2014$ R$ A$ Language$ and$Environment$ for$ Statistical$ Computing$ R$
Foundation$for$Statistical$Computing$Vienna$Austria$
Thomadaki$H$ Talieri$M$ Scorilas$ A$ 2007$ Prognostic$ value$ of$ the$ apoptosis$
related$genes$BCL2$and$BCL2L12$in$breast$cancer$Cancer$Lett$247$48ndash
55$doi101016jcanlet200603016$
Valdehita$ A$ Bajo$ AM$ FernaacutendezBMartiacutenez$ AB$ Arenas$ MI$ Vacas$ E$
Valenzuela$ P$ RuiacutezBVillaespesa$ A$ Prieto$ JC$ Carmena$ MJ$ 2010$
Nuclear$ localization$ of$ vasoactive$ intestinal$ peptide$ (VIP)$ receptors$ in$
human$ breast$ cancer$ Peptides$ 31$ 2035ndash2045$
doi101016jpeptides201007024$
Wang$T$Liu$NS$Seet$LBF$Hong$W$2010$The$Emerging$Role$of$VHS$DomainB
Containing$Tom1$Tom1L1$and$Tom1L2$in$Membrane$Trafficking$Traffic$
11$1119ndash1128$doi101111j1600B0854201001098x$
Wang$ X$Wurmser$ C$ Pausch$ H$ Jung$ S$ Reinhardt$ F$ Tetens$ J$ Thaller$ G$
Fries$R$2012$Identification$and$Dissection$of$Four$Major$QTL$Affecting$
Milk$Fat$Content$in$the$German$HolsteinBFriesian$Population$PLoS$ONE$7$
e40711$doi101371journalpone0040711$
Xu$S$2003$Theoretical$Basis$of$the$Beavis$Effect$Genetics$165$2259ndash2268$
Zimin$AV$Delcher$AL$Florea$L$Kelley$DR$Schatz$MC$Puiu$D$Hanrahan$
F$ Pertea$ G$ Tassell$ CPV$ Sonstegard$ TS$ Marccedilais$ G$ Roberts$ M$
Subramanian$ P$ Yorke$ JA$ Salzberg$ SL$ 2009$ A$ wholeBgenome$
assembly$ of$ the$ domestic$ cow$ Bos$ taurus$ Genome$ Biol$ 10$ R42$
doi101186gbB2009B10B4Br42$
75
$
Zu$X$Ma$J$Liu$H$Liu$F$Tan$C$Yu$L$Wang$J$Xie$Z$Cao$D$Jiang$Y$2011$ProBoncogene$ Pokemon$ promotes$ breast$ cancer$ progression$ by$upregulating$ survivin$ expression$ Breast$ Cancer$ Res$ 13$ R26$doi101186bcr2843$ $
76
$
Supplementaryfiles$
You$can$find$Chapter$͵$supplementary$files$at$
- DocTa$(Doctoral$Thesis$Archive)$(httptesionlineunicattit)$
- Google$Drive$(httptinyurlcomPhDMMChapter03)$or$scan$the$QR$code$
$
$
SupplementaryFigureS1Multidimensionalscaling
Spatial$distribution$of$genetic$structure$in$the$analyzed$breeds$$
$
Supplementary$FigureS2GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Brown$
$
Supplementary$FigureS3GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Holstein$
$
Supplementary$FigureS4GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$HolsteinDGAT$
$
Supplementary$FigureS5GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Simmental$
77
$
$Supplementary$FigureS6LDdecayGenomeBwide$LD$decay$for$the$three$breeds$(Holstein$Simmental$Brown)$$Supplementary$FigureS7DescriptiveplotsDescriptive$statistics$for$the$Italian$Brown$Italian$Holstein$and$Italian$Simmental$$Supplementary$FigureS8DGAT1LDFocus$on$the$LD$of$the$DGAT1$region$in$all$the$datasets$(Brown$Holstein$HolsteinDGAT$Simmental)$$Supplementary$TableS1SNPfactNumber$of$subjects$and$SNPs$after$QC$$Supplementary$TableS2GeneBasedAssociationresultsFull$results$of$the$significant$genes$associated$with$the$traits$in$all$the$datasets$$ Sheet$1$Brown$ndash$fat$percentage$$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS3SingleSNPAssociationresultsFull$results$of$the$SNP$that$overcome$the$suggestive$threshold$$$ Sheet$1$Brown$ndash$fat$percentage$
78
$
$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS4GenesandtraitsGeneWise$pBvalues$of$the$discussed$genes$in$all$the$analysis$In$grey$are$evidenced$the$pBvalues$that$overcome$the$significance$threshold$Genes$are$alphabetically$listed$$Supplementary$TableS5GenesinQTLsGenes$from$the$geneBbased$association$mapping$into$known$QTLs$$$$$
79
CHAPTER(4(
MUGBAS(a(species(free(gene(based(association(suite(
S(Capomaccio1(M(Milanesi1(L(Bomba1(et(al(1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122ItalyTheseauthorscontributedequallytothiswork
____________________ApplicationnotesubmittedtoBioinformatics
Abstract(Theassociationstudiesbetweengenomewidemarkersandphenotypes(GWAS)
arenowwidelyusedinmodelandnonmodelspeciesHowevertoolsthatmerge
singlemarkerresultsandtheirprobabilityofbeingassociatedtofunctionalunits
(typicallygenes)areseldomdevelopedforspeciesotherthanhuman
Herewe presentMUGBAS a suite that calculates a generegionbased pvalue
fromagivenannotationsinglemarkerGWAresultandgenotypedatawiththe
objective to ease postGWAS analysis The software is species and annotation
independentfasthighlyparallelizedandreadyforhighdensitymarkerstudies
Availabilityandimplementationhttpsbitbucketorgcapemastermugbas
Introduction(With the availability of highdensity SNP panels Genome Wide Association
Studies (GWAS) have become the gold standard for dissecting the biology of
complextraits(McCarthyetal2008)Thisistrueforhumansandotherspecies
TheunderlyingideaofGWASissimplefindmarkertraitassociationsexploiting
the linkage disequilibrium (LD) that exists between the causative mutation ndash
whichweignorendashandoneormoremarkersofknownchromosomalposition
Whileanumberofwellestablishedapproachescanbeeasilyusedinastandard
GWAS (Aulchenko et al 2007 Kang et al 2010 Purcell et al 2007) to find
marker(s) associated to a particular phenotype several issues have to be
accountedforduringtheupanddownstreamanalyses
Firstly genome wide studies have to take into account multiple testing If
stringent statistical thresholds are applied a proportion ofmoderate or small
effectsactuallyassociatedwiththetraitwillbedisregardedThereforereducing
the number of tests while maintaining all the information provided by high
densitypanelswouldbeaclearadvantageInadditiononceasignificantsignalis
detected different strategiesmaybeused to tag the candidate causative gene
Sometimesthegeneclosertothepeaksignalisselectedinothercasesallgenes
withinaflankingregionofanarbitrarysizeofthesignificantSNP(s)areincluded
82
infurtheranalysesThesechoicesmayendupinbiasresultseitherretaininga
largenumberofnottrulyassociatedgenesordisregardingthecorrectones
Somemethodshaverecentlybeenproposedtoovercometheselimitationsusing
aprioriknowledgeastheldquocandidatepathwayrdquoanalysis(Ravenetal2013)and
the ldquogenebasedrdquo association strategies (Akula et al 2011 Cantor et al 2010
Liuetal2010)TheseapproachesrestrictGWAstudiesonlytoknowngenesor
genesubsetsresizingthemultipletestingproblemandincreasingthepowerof
theanalyses
Usually bioinformatics pipelines are developed for the analysis of human and
modelorganismsandarenotadaptedtononmodelorganismThisisthecaseof
the VEGAS software (Liu et al 2010 Yang et al 2014) developed only for
humansandonlyforgenebasedanalyses
Here we introduce MUGBAS [MUlti species GeneBased Association Suite] a
softwarebasedonVEGASthatusesGWASdataandagivenannotation(typically
genesbutanyregionmaybesuitable)toestimategenebasedandregionbased
associationpvaluesThesuite is speciesandannotation free ready toanalyze
highdensitydatafromanyexperimentaldesigns
Methods(MUGBASisasuiteofscriptsbuiltinBashRPythonandPerlThesuiterequires
several prerequisites that can be easily fulfilled in a LinuxUnixMac
environmentwith a few command lines Further information on how tomeet
theserequirementsaredetailedintheonlinerepository
MUGBAS suite can be invoked launching the Bash wrapper that checks
dependenciesandstoresuserchoices
Required input files are MAPPED files in PLINK format with genotype data
from(atleast200individualsarerecommendedforaproperLDcalculation)an
annotation file inBED format (chromosome startingposition endingposition
nameandusercustomfields)ofthedesiredgenomefeatureandtheGWASresult
file with mandatory headers ldquoSNPNAMErdquo and ldquoPVALUErdquo Multiple association
resultscanbetestedatthesametime ifprovidedinseparatefiles inthesame
directoryAtrialdatasetisdownloadablealongwiththesoftwarereleasePlease
83
notethatforsimplicityweprovideageneannotationbutanyBEDfilewithany
desiredfeaturecanbeused
The program starts the analysis assigning SNPs to any feature of the given
annotationusingbedtools(QuinlanandHall2010)Thisstepiscustomizableby
user defined boundaries (up and downstream) in order to catch regulatory
regionsthatcouldbeinhighLDwiththecurrentgeneregion
Inordertospeeduptheanalysistheprogramsubsamples200individualsfrom
theinputdatasetWethenimplementedtwolevelsofparallelizationoneforthe
LDandonefortheldquogenewiserdquoorldquoregionwiserdquopvaluecalculationThelatteris
particularlyusefulanalyzingmorethanoneGWAresultsfromthesamedataset
BrieflyMUGBASfirstsplitstheLDcalculationinncores(asdefinedbytheuser)
using the foreach doParallel packages (Analytics andWeston 2014a 2014b)
and then distributes traits across cores using GNU Parallel and foreach
doParallel
SinceLDcalculationisaslowprocessVEGASbenefitsofHapMapphase2datato
speed up the processwhile preserving the option for a user defined (human)
population using PLINK In order to expand these analysis to any species LD
calculationinMUGBASisimplementedwithther2fast()functionfromGenABEL
package(Aulchenkoetal2007)whilethegenewisepvalueiscalculatedasin
theVEGASapproachBrieflythegenewiseteststatisticisthesumofthenupper
tailchisquaredvaluesofaSNPsubsetwithonedegreeoffreedom(wheren is
thenumberofSNPmappinginthegivengene)Withperfectlinkageequilibrium
the genewise pvalue would be the onetailed pvalue of a chisquare
distributionwithndegreesof freedomOtherwisemorelikelythedistribution
of the pvalues has to be evaluated (with simulations) andweighted (with LD
values)(Liuetal2010)
Once every generegion has its ldquogenewiserdquo pvalue a FDR qvalue (False
Discovery Rate) statistics is calculated with the base R function padjust()
Results are stored and enrichedwith ldquomanhattanrdquo and ldquoundergroundrdquo plots of
theldquogenewiserdquopvalueandtheqvaluerespectively
84
Results(MUGBAS output is generegionoriented with useful information about the
location of the analyzed feature the ldquoBest SNPrdquo within that feature with its
genomiccoordinatesandtheoriginalpvaluethegeneorregionwisestatistics
(pvalue number of simulations and qvalue) the custom annotation of that
particularentryandthepositionoftheSNPwithrespecttothefeatureandthe
decidedboundariesAll this informationareuseful to track theeffectofhighly
significant markers that could map into multiple region of interest and
effectively pinpoint the most probable location to focus on with data mining
processes
A typical analysis performed with a highdensity panel (gt600K markers)
distributed in 10 midperformance cores (Intel Xeon X5675 307 GHz) on
threetraitssimultaneouslytakesaboutfourhours
Discussion(Generegionbased methods are now widely used and accepted in genomic
research However non modelspecies suffer for the lack of availability of
specifictoolstoenhancediscoveryfromgenomewidedataMUGBASprovidesa
robust and accepted statistics to researchers dealing with species other than
humantoeasilyjumpfromassociatedmarkerstothedesiredfunctionalunits
Acknowledgements(((WewouldliketothankDrJimmyLiuforhisadvicesinbuildingthesuite
Reference(Akula N Baranova A Seto D Solka J NallsMA Singleton A Ferrucci L
TanakaTBandinelli SChoYSKimYJLee JYHanBGBipolar
Disorder Genome Study (BiGS) Consortium The Wellcome Trust Case
85
ControlConsortiumMcMahonFJ2011ANetworkBasedApproachtoPrioritize Results from GenomeWide Association Studies PLoS ONE 6e24220doi101371journalpone0024220
Analytics R Weston S 2014a doParallel Foreach parallel adaptor for theparallelpackage
AnalyticsRWestonS2014bforeachForeachloopingconstructforRAulchenkoYSdeKoningDJHaleyC2007GenomewideRapidAssociation
Using Mixed Model and Regression A Fast and Simple Method ForGenomewidePedigreeBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614
Cantor RM Lange K Sinsheimer JS 2010 Prioritizing GWAS Results AReview of Statistical Methods and Recommendations for TheirApplicationAmJHumGenet866ndash22doi101016jajhg200911017
KangHMSulJHServiceSKZaitlenNAKongSFreimerNBSabattiCEskin E 2010 Variance component model to account for samplestructure in genomewide association studies Nat Genet 42 348ndash354doi101038ng548
LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKM2010Aversatile genebased test for genomewide association studies Am JHumGenet87139
McCarthy MI Abecasis GR Cardon LR Goldstein DB Little J IoannidisJPA Hirschhorn JN 2008 Genomewide association studies forcomplextraitsconsensusuncertaintyandchallengesNatRevGenet9356ndash369doi101038nrg2344
Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender DMallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKatool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795
QuinlanARHallIM2010BEDToolsaflexiblesuiteofutilitiesforcomparinggenomic features Bioinformatics 26 841ndash842doi101093bioinformaticsbtq033
86
Raven LA Cocks BG Pryce JE Cottrell JJ Hayes BJ 2013 Genes of the
RNASE5pathway contain SNP associatedwithmilk production traits in
dairycattleGenetSelEvol4525doi101186129796864525
87
CHAPTER5
Imputationaccuracyisrobusttocattlereferencegenomeupdates
MMilanesi1etal1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122Italy
____________________ShortCommunicationacceptedinAnimalGenetics
SummaryGenotype imputation is routinely applied in a large number of cattle breeds Imputation
hasbecomeaneedduetothelargenumberofSNParrayswithvariabledensity(currently
from2900to777962SNPs)Althoughmanyauthorshavestudiedtheeffectofdifferent
statisticalmethodsonimputationaccuracytheimpactofa(likely)changeinthereference
genomeassemblyonimputationfromlowertohigherdensityhasnotbeendeterminedso
far In this work 1021 Italian Simmental SNP genotypes were reJmapped on the three
mostrecentreferencegenomeassembliesFour imputationmethodswereusedtoassess
theimpactofanupdateinthereferencegenomeAsexpectedthefourmethodsbehaved
differentlywith largedifferences in termsof accuracyUpdatingSNPcoordinateson the
three tested cattle reference genome assemblies determined only a slight variation on
imputationresultswithinmethod
KeywordsImputationreferencegenomeassemblySNPbovine
MaintextFromthepublicationof the firstbovinesinglenucleotidepolymorphism(SNP)array the
wholegenomicscenarioofthisspeciesevolvedrapidlyNowadaysseveralSNPpanelshave
been designed with a range of densities mapping on different reference genome
assemblies and produced by different manufacturers When running analyses with SNP
arrays of different densities or from different commercial companies (eg Illumina and
Affymetrix) an imputation step is often required The impact of different statistical
methods on imputation accuracy has already been assessed (Pei et al 2008) (Marchini
andHowie2010)(Maetal2013)Surprisinglyuptonowtherearenotavailablestudies
investigating the impact of an update on the reference genome assembly on imputation
accuracy This assessment would be of fundamental importance considering that reJ
mappingSNPsdeterminesarearrangementofthehaplotypestructureofthegenotypes(in
somecasesSNPsarereJmappedtoadifferentchromosome)andthatthebovinereference
assembly is likely toberepeatedlyupdatedover timeThisworkwasaimedatassessing
the effect of different reference genome assemblies in cattle As test casewe compared
four different imputation methods from low (Illumina BovineLDv10 Beadchip) to
90
mediumhigh (Illumina BovineSNP50v2 Beadchip) density in the Italian Simmental
populationForeachmethodweassessedthe impacton imputationaccuracyofmapping
SNPs on the BTAU42 BTAU46 (wwwhgscbcmeduotherJmammalsbovineJgenomeJ
project)orUMD31(Ziminetal2009)cattlereferencegenomeassemblies
Atotalof900and121SimmentalbullsweregenotypedwiththeIlluminaBovineSNP50v1
and v2 Beadchips (Illumina Inc San Diego CA) respectively Illumina BovineSNP50v2
Beadchip (hereinafter named 54k) the official reference chip for the Italian Simmental
breedersassociation(ANAPRI)wasconsideredasreferenceSNPpanelTheSNPchiMpv1
tool(Nicolazzietal2014)wasusedtoobtainSNPcoordinatesonBTAU42UMD31and
BTAU46genomeassembliesForeachassemblySNPsnotcontainedinthe54kunmapped
orinthesexualchromosomeswerediscardedThefinaldatasetcontained5118952886
and51290SNPsinBTAU42UMD31andBTAU46respectively
FourimputationmethodswereconsideredPedImpute(Nicolazzietal2013)FindHapv2
(VanRadenetal2011)FImpute(Sargolzaeietal2014)andBeaglev332(Browningand
Browning2007)Thefirstthreeusepedigreeandpopulation(ie linkagedisequilibrium
LD) informationwhereas the fourth uses only LD information Performance of the four
methods testedacrossgenomicassemblieswas comparedusingwrongly imputedalleles
(ldquoerrorsrdquo)andallelicsquaredcorrelation(ldquoallelicJR2rdquo)statisticsAllelicJR2wasdefined
asthesquaredcorrelationbetweenimputedandtrue(eggenotyped)allelesmultipliedby
100 as in Browning amp Browning (2009) The 878 oldest sires were used as training
population whereas the 143 youngest bulls were used as test population SNPs not in
commonwith the lowdensity panelwere set tomissing in the test population The test
population was split in four scenarios individuals with sire and maternal grandsire
genotypedinthereferencepopulation(scenarioAJ84animals)individualswithonlythe
sire genotyped (scenario B J 34 animals) individuals with only the maternal grandsire
genotyped(scenarioCJ13animals)ornoneofcloserelativesgenotyped(scenarioDJ12
animals)Thepopulationusedwashighlystructuredwithnearlyhalfofthebullspresent
in the dataset (490) sired by 115 genotyped sires In addition 35 out of these 115
genotypedsiresweresiredby22bulls(eggrandsires)alsopresentinthedataset
DifferencesacrossassembliesarenothomogeneousFromBTAU42toBTAU4646SNPs
weremapped to different chromosomes and the average difference in physical position
withinchromosomeswasnearly025MbpHowevertherewasnosubstantialdifferencein
the order of markers In fact not considering unmapped SNPs on either of the
aforementionedassembliesadifferenceinthesequentialorderwasobservedforonly134
91
single (ie spread throughout the genome) SNPs On the contrary from UMD31 to
BTAU46 222 SNPsweremapped to different chromosomes and amuch larger average
differenceinSNPsphysicalpositionwasobserved(iemorethan2Mbp)Alsothenumber
ofSNPswithdifferentsequentialorderwasmuchlarger(ie1893SNPs)Howevermore
than50oftherearrangementsinvolvedblocksfrom3tonearly100markersAsaresult
slight differences in overall imputation accuracy from BTA46 to UMD31 rather than
between the two BTAU assemblies were expected across methods On the contrary if
rearrangements are due to issues in the assembly of scaffolds theymight have a direct
(negative) impact on local imputation performances In fact a preliminary analysis on
BTA1consideringlocalerroracrossassembliesandmethodssuggestatleasttwosuch
SNPsclustersat44and138MbponUMD31(FiguresS1S2andS3)
Considering allmethods and scenarios amaximum average variation of 02 inerrors
(Table1)andof09inallelicJR2(Table2)wasobservedacrossreferenceassembliesThe
standarddeviation(SD)oftheimputationaccuracywasstableacrossreferenceassemblies
withinmethodsOnthecontraryalargervariabilitywasobservedacrossmethods
PedImpute and FindHap showed a relatively large average allelicJR2 variability across
scenarios evaluated over the same reference assembly Only PedImpute showed a large
variation in SD across scenarios in errors and allelicJR2 with high variability on
scenarios C and D showing a low performance of thismethodwhen there are no close
relatives(egsiredam)genotypedinthereferencedatasetFImputeinsteadachievedthe
highestaccuracyacrossallreferencegenomesandscenariossurprisinglyobtainingitsbest
resultsJerrorsof14(06SD)andallelicJR2of955(26SD)JinscenarioCwhereonly
onerelativelycloserelativeisgenotyped(maternalgrandJsire)FImputeappeartobeable
todetectshorterhaplotypesfromdistantrelativesmoreaccuratelythananyothermethod
tested in this study thus it is lessdependenton theavailabilityofpedigree information
thantheothertwopedigreeJbasedmethods(Sargolzaeietal2010)
The large variability across methods within assembly was expected although the
differences obtained among PedImpute FindHap and Beagle were larger in this study
comparedtoasimilarstudyinItalianHolstein(Nicolazzietal2013)Thiscoulddependon
a number of factors First the sample size of the test population is small (especially for
groups C and D where less than 15 individuals were present) Second the interaction
between population structure and imputationmethodsmay have a role PedImpute and
FindHap rely much more heavily on pedigree information than FImpute and Beagle
assumesall individualsareunrelatedHowever the lattertwomethods indirectlybenefit
92
greatly of from highly structured populations with large number of relatives ndash close or
distantndashpresent inthereferencepopulation(Sargolzaeietal2014)Anotherinteresting
factor that influences the imputation accuracy may be the persistence of LD at long
distances this is much lower in the Italian Simmental than in the Italian Holstein
population(FigureS4)Informationonrelatives(eitherpedigreeinformationorgenotyped
relativesinthereferencepopulation)isexpectedtoplayabiggerroleifLDisconservedat
shorter distances This hypothesis is supported by the fact that although there is high
variability acrossmethods better performances of the pedigreeJbasedmethods are still
observedinscenariosAandB(andCforFImpute)
Theslightdifferencesinimputationaccuracyobservedacrossreferenceassembliesmaybe
explained by the aforementioned small variation in the SNP blocks across reference
assemblies irrespectively of the rearrangements of single SNPs Considering that SNP
blocksaremostlymaintainedweexpectthatbreedswithlargerLDpersistenceatlonger
distances (ie Holstein Brown) will obtain even better imputation accuracy across
assembliescomparedtowhatreportedinthisstudyinItalianSimmentalHoweverastill
relativelyunexploredaspectof imputationperformance is thedistributionof imputation
errors across the genomeNot considering genotyping errors (which are expected to be
randomly distributed in the genome) a cluster of consecutive markers with high
percentage of imputation error across methods are most probably identifying missJ
assembledregions in thegenome(FiguresS1S2andS3)Althoughthe impactonoverall
imputation accuracy is limited as results in this study show these errorsmight have a
large impact on any analyses relying on specific genomic coordinates (eg genomeJwide
associationstudies)Morecomprehensiveresultsincludingannotationstudiesandwhole
genomesequencingdataarerequiredtoconfirmandinvestigatetheseobservations
Toconcludeonlyasmalleffectofupdatingthereferencegenomeassemblyonimputation
accuracywas observed in Italian Simmental On the other hand as already reported by
other studies the variability of imputation accuracy estimates is much larger across
imputation methods The choice of the most suitable imputation method based on the
breedgeneticbackgrounditscharacteristicsandontheavailabledataisthereforecritical
AcknowledgementsThe research leading to these results has received funding from the European Unions
Seventh Framework Programme (FP72007J2013) under grant agreement ndeg 289592 ndash
93
Gene2FarmPartofthegenotypicdatausedinthisstudywasprovidedbyprojectldquoSelMolrdquo
fundedbytheItalianMinistryofAgriculture(MiPAAF)MMwassupportedbytheDoctoral
SchoolontheAgroJFoodSystem(Agrisystem)of theUniversitagraveCattolicadelSacroCuore
(Italy) FB was supported by the Marie Curie European Reintegration Grant
ldquoNEUTRADAPTrdquo(FP7JPEOPLEJ2009JRG)
ReferencesBrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingandMissingJ
Data Inference for WholeJGenome Association Studies By Use of Localized
HaplotypeClusteringAmJHumGenet811084ndash1097
MaPBroslashndumRFZhangQLundMSSuG2013Comparisonofdifferentmethods
forimputinggenomeJwidemarkergenotypesinSwedishandFinnishRedCattleJ
DairySci964666ndash4677doi103168jds2012J6316
Marchini J Howie B 2010 Genotype imputation for genomeJwide association studies
NatRevGenet11499ndash511doi101038nrg2796
NicolazziELBiffaniSJansenG2013ShortcommunicationImputinggenotypesusing
PedImputefastalgorithmcombiningpedigreeandpopulationinformationJournal
ofDairyScience962649ndash2653doi103168jds2012J6062
NicolazziELPiccioliniMStrozziFSchnabelRDLawleyCPiraniABrewFStella
A 2014 SNPchiMp a database to disentangle the SNPchip jungle in bovine
livestockBMCGenomics15123doi1011861471J2164J15J123
PeiYLiJZhangLPapasianCJDengH2008Analysesandcomparisonofaccuracyof
differentgenotypeimputationmethodsPLoSONE3551
VanRadenPMOrsquoConnellJRWiggansGRWeigelKA2011Genomicevaluationswith
manymoregenotypesGenetSelEvol43
94
Tableamp1ampIm
putationampaccuracyampacrossampassem
bliesampforampPedImputeampFindH
apampFImputeampandampBeagleamppercentageampofampwronglyampim
putedamp
allelesamp
PedImputeamp
FindHapamp
FImputeamp
Beagleamp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
Scenarioamp
Namp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
A1amp
84
20
07
21
07
21
07
32
09
32
09
32
09
15
09
15
09
15
09
24
11
25
11
25
11
B2amp
34
23
08
23
09
22
08
36
11
36
12
36
11
15
07
15
09
14
06
23
08
23
08
23
08
C3amp
13
98
41
98
41
98
41
51
10
52
10
51
10
14
06
14
06
14
06
23
06
23
06
24
06
D4 amp
12
88
49
89
49
88
49
54
11
56
12
54
11
16
11
17
12
16
11
26
15
26
15
26
14
1 Sireandm
aternalgrandsiregenotyped
2 Siregenotypedmate
rnalgrandsireno
tgenotyped
3 Sireno
tgenotypedmate
rnalgrandsiregenotyped
4 Sireandm
aternalgrandsireno
tgenotyped
Tableamp2ampIm
putationampaccuracyampacrossampassem
bliesampforampPedImputeampFindH
apampFImputeampandampBeagleampallelicJR2amp
PedImputeamp
FindHapamp
FImputeamp
Beagleamp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
Scenarioamp
Namp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
A1amp
84
941
20
940
20
941
20
907
24
908
25
907
24
952
26
952
27
952
27
925
33
923
33
925
33
B2amp
34
935
24
934
25
935
24
896
30
896
34
895
32
954
20
954
21
954
20
929
24
928
25
928
24
C3amp
13
759
89
761
88
758
87
848
33
845
30
848
32
955
18
955
19
955
19
931
19
928
19
927
19
D4 amp
12
782
113
777
113
781
112
841
32
832
35
839
33
949
34
947
38
949
35
921
44
919
45
921
44
1 Sireandm
aternalgrandsiregenotyped
2 Siregenotypedmate
rnalgrandsireno
tgenotyped
3 Sireno
tgenotypedmate
rnalgrandsiregenotyped
4 Sireandm
aternalgrandsireno
tgenotyped
95
Supplementaryfiles
YoucanfindChapterͷsupplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter05)orscantheQRcode
SupplementaryFigureS1ErrorpercentageacrossmethodsforBTAU42onBTA1
SupplementaryFigureS2ErrorpercentageacrossmethodsforUMD31onBTA1
SupplementaryFigureS3ErrorpercentageacrossmethodsforBTAU46onBTA1
SupplementaryFigureS4Linkagedisequilibrium(LD)decayoverdistanceinItalian
SimmentalandHolstein
ThedottedlinewithblackdotsindicatesLDdecayoverdistanceinItalianHolstein
whereasthedashedlinewithwhitediamondsindicatesthesamevaluesforItalian
SimmentalItalianHolsteingenotypeswereasubsetofthedatausedinNicolazzietal
(2013)
96
CHAPTER6
QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis
MMilanesi1etal
1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly
____________________Manuscript
Abstract
Outbreed species carry a genetic load of deleterious recessive variants Those
that are lethal survive only in the heterozygous state in the population while
otherssimplyreducethefitnessofindividualshomozygousforthevariantThe
long=termdestinyofthesemutationsistodecreaseinfrequencyduetonegative
selectionandeventuallydisappearbecauseofgeneticdriftHowever theymay
bemaintainedinlivestockpopulationsandevenincreaseinfrequencywhenin
linkage with favourable alleles for traits under selection or occurring in the
genomeofsireshavinghighgeneticvalueforproductivetraitsWesearchedfor
these variants combining exome sequence data from 20 Italian Holstein bulls
sampledfromoppositetailsofthemaleandfemalefertilityEBVdistributionthe
HighDensity(HD)SNPgenotypingof1009progenytestedItalianHolsteinbulls
and information existing in public databases Exome sequencing detected
202956 variants Among these 5167 were classified as ldquodeleteriousrdquo when
annotatedwithSIFTandAnnovarThesewerefilteredbypickingthosemapping
in regions out of equilibrium and lack of homozygotes in the progeny tested
Holstein bulls or having an asymmetric distribution in bulls plus=variant and
minus=variant for fertility EBV The resulting 193 variants in 163 genes were
furtherassessedbycomparativegenomicshaplotypeanalysisandfunctionFor
eachassessmentapositiveornegativescorewasgiventoevidenceinfavouror
against thevariant tobedeleteriousA totalof35variantshadapositive total
score These genes are involved in development cell motility and immune
responseSomeof themareknowntobeexpressed in the testiclesandappear
importantforspermfertilityinmodelspeciesothersareresponsibleforgenetic
defectsinhumanandmousecarryingmutationssimilartotheonesdetectedin
cattleThesegenevariantsarecandidatetobeamongthosethatconstitutethe
ldquogeneticloadrdquoofdeleteriousrecessivegenesintheHolsteinpopulation
100
Introduction
The presence of deleterious recessive variants is well tolerated by outbred
speciesWhenthesevariantsappeartheymayspreadandsurvivealongtimein
the population in the heterozygous state (Simmons and Crow 1977) It is
estimatedthatoutbredindividualshumanincludedcarryonaverageagenetic
loadofmorethan100loss=of=functionvariantsintheirgenome(MacArthuretal
2012)
Theeffectofdeleterious recessivevariantsappearswhendemographicevents
eggeneticbottlenecksreducethegeneticvariabilityofthepopulationorwhen
relatedindividualsareintentionallymatedforbreedingpurposesInthesecases
recessive deleterious reduce the fitness of individuals carrying them in the
homozygous state and in extreme case induce very severe genetic defects
premature death and abortion (McClure et al 2014 Sahana et al 2013
VanRadenetal2011)Theythereforecontributetoexplainthephenomenonof
inbreeding depression and its negative effect on fertility (Charlesworth and
Willis2009)
The search for deleterious recessive variants is particularly interesting in
Holstein cattle Thisbreed is experiencing anegative genetic trend for fertility
traits (Jamrozik et al 2005) it is worth noting that this is occurring while
inbreedingisunderstrictmonitoringandistheoreticallyincreasingataverylow
pace(about005peryear=Mrodeetal2009)Inadditionmatingisplannedto
minimise relationship between animals and the additive genetic variance for
mosttraitsndashincludingmilkproduction=isapparentlynotdecreasinginspiteof
the selection pressure applied by modern schemes (Biffani et al 2002)
indicatingthatoveralldiversityisonlyslightlydecreasing
SelectiontoincreasefertilityischallengingSelectioncriteriagenerallyadopted
to improve this trait are far from the biological mechanisms that govern
reproduction Inaddition thesemechanismsarecomplexandverysensitive to
environmental(management)conditions(Laneetal2013Wathesetal2014)
Asa result fertility traitshave loworvery lowheritability (Pryceetal1997)
andthereforelowresponsetoselectionandslowgeneticprogress
101
Forthesereasons thequest formolecularvariants influencing fertility traits is
on=goingsincetheavailabilityofmoleculartoolsforDNAanalysisAnumberof
investigations tested the effect of functional candidate genes running genome
wide association analyses to find QTLs for fertility traits using hundreds
microsatellite markers in structured families (Ashwell et al 2004) or tens of
thousand SNP panels in whole populations (Aston and Carrell 2009) The
availability of medium density SNP panels allowed the search for deleterious
recessivesevenwithouttheuseofphenotypictraitsbyscanningthegenomein
searchofregionslackingoneofthetwohomozygousclasses(Sahanaetal2013
VanRaden et al 2011) This approach has highlighted a number of genomic
regions likely carrying deleterious recessives This information has immediate
application inmolecular assisted breeding and a relevant scientific interest in
thesearchofcausativevariantsandassessmentoftheirfunctions
Thedevelopmentofnewsequencingtechnologieshasgreatlydecreasedthecost
of sequencing so that nowadays the sequencing of few target individuals is
becoming the gold standard to seek for causal mutations for rare monogenic
defectsinhumansandotherspecies(ChunandFay2009)Oftenonlytheexome
the protein coding part of the genome is sequenced searching for those
mutationsthatseverelyalterproteinstructureandorfunction(Bamshadetal
2011 Li et al 2013) A typical mammalian exome consists in a few tens of
millionnucleotides(around2ofthereferencegenome)comparedtothefew
billionofthewholegenomeThereforewiththesamenumberofreadsexome
sequencinghastheadvantageofamoreaccurategenotypingdueto thehigher
sequencing depth an easier data management lower computation load and
easier interpretation (Bieseckeretal2011Majewskietal2011) Indeed the
numbervariants found inexomesareone=twoordersofmagnitude lowerthan
thatdetectedinwholegenomesandmutationsaredetectedinannotatedgenes
orimmediatelynearbyOntheotherhandtheexomeapproachcanonlyidentify
variantson annotatedgenesVariants inunknownexonsor that occur innon=
coding regulatory features will be missed (Bamshad et al 2011) In humans
these are turning out to be of outmost importance for gene regulation and
downstreamphenotypic effects (ENCODEConsortium2012) but they are still
verypoorlyannotatedinlivestock
102
Herewesequencedtheexomesof20Holsteinbullsprogenytestedselectedfrom
the tails of male and female fertility EBVs (Estimated Breeding Value) Since
livestockfertilityisacomplextraitandcausativemutationscannotbesoughtby
the approach used in monogenic diseases we integrated the sequence
information with high density SNP chip data obtained on a much larger
populationofItalianHolsteinbullstodetectgenescarryingallelescandidateto
be deleterious recessives and confirm their likely deleterious effect by
comparativegenomics
Materialsandmethods
1SamplingandDNAextraction
Semen(straws)andbloodsampleswereprovidedbyItaliancertificatedartificial
insemination centres (UE Directive 88407CEE) thus ethics committee
approvalforthisstudyisnotrequired
A total of 20 ItalianHolstein progeny tested bullswere sampled for complete
exome sequencing Among them 19were chosen from the tails of female and
male fertilityEBV(EstimatedBreedingValue)distribution (Table1) tohave5
animals fromeach tail One animalwas to be in theminus=variant tail of both
traitsBullswereselectedasunrelatedaspossibleSamplingwaspossiblethanks
tothecollaborationwithANAFI(ItalianHolsteinBreederAssociation)
ANAFIdevelopedageneticevaluationfor fertilitycombiningdifferenttraitsas
explained by Biffani and colleagues (Biffani et al 2005) The traits are ldquodays
from calving to first inseminationrdquo ldquocalving intervalrdquo ldquofirst=service non return
rateto56daysrdquoldquoangularityrdquoandldquomatureequivalentmilkyieldat305daysrdquo
MalefertilityisexpressedasanEstimatedRelativeConceptionRate(ERCR)and
isameasureofconceptionrateofaservicesirerelativetoservicesiresofherd
mates(Soggiuetal2013)
103
Table1FemaleandmalefertilityEBVsofthe20animalssequenced
Animal EBVFemaleFertility EBVMaleFertility GroupfriA01 97 =111 NAfriA02 98 =166 MalLfriA03 115 0 FemHfriA04 114 0 FemHfriA05 114 0 FemHfriA06 86 0 FemLfriA07 98 195 MalHfriA08 105 213 MalHfriA09 95 =154 MalLfriA10 95 149 MalHfriA11 102 264 MalHfriA12 91 =475 FemLMalLfriA13 105 =179 MalLfriA14 102 207 MalHfriA15 99 =405 MalLfriA16 85 0 FemLfriA17 87 0 FemLfriA18 121 0 FemHfriA19 117 0 FemHfriA20 88 0 FemL
A total of 1009 Italian Holstein bulls were chosen from the entire population
balancing theirpresence in the tailsof theEBV forkgprotein (PROT) somatic
cell count (SCC) fertility (FERT) and functional longevity (LONG) for high
densitygenotyping(TableS1)
DNAwasisolatedfromwholebloodandsemenstrawssamplesinLGSlaboratory
(CremonaItalywwwlgscrit)usingtheGenomixkitaccordingtoLGSinternal
protocolDNAsforexomecaptureandsequencingwereextractedindependently
tosatisfy therequirement indicatedby IGA technologies (Udine Italy)at least
12mgofDNAataconcentration100ngulwithR260280higherthan18and
R260230near19
2Moleculardataproductionandqualitychecking
21Exomecaptureandsequencing
104
Exomes were captured using the ldquoSureSelected All exon Bovinerdquo (Agilent
Technologies) kit in early version Probes coordinates are based on UMD31
bovinereferencesequence(Ziminetal2009)anddesignedagainstNCBIexons
as well as predicted exons UTRs and miRNAs Number and distribution of
probesperchromosomearedetailedinTableS2
rdquoOvation Ultralow Library Systemsrdquo protocol was followed for library
preparation After quantification with Qubit (Life technologies) and quality
controlwith2100Bioanalyzer(AgilentTechnologies)DNAwassonicatedwith
Bioruptor (Diagenode) to obtain fragments between 150 to 200 bp in length
blunt end=repaired adenylated ligated with paired=end adaptors and purified
followingtheprotocolprovidedbythemanufacturer
LibrarieswerehybridizedwithldquoSureSelectedAllexonBovinerdquoprobesusingthe
Encore Target Capture Module (Nugen) SureSelect Target Enrichment and
Herculase II FusionEnzymewithdNTPCombo (AgilentTechnology) kit Then
capturedDNAwasamplifiedandtaggedwithlibraryspecificindexesQualityof
capturedsampleswastestedwithQubitand2100Bioanalyzer
Paired=end 2X100bp sequencing was performed on Illumina HiSeq2500
(IlluminaSanDiegoCA)atanexpectedcoverageof50X
22Sequencealignment
Adapters and reads shorter than 36bp were trimmed using Trimmomatic
(version032)(Bolgeretal2014)ForeachFastqfilethealignmentandpairing
of the reads to theUMD31 reference sequencewas performedwithBurrows=
WheelerAligner(BWA)(version077=r441)(LiandDurbin2009)applyingalso
trimmingoption (=q5) SAM files resultedwere converted intoBAM file using
SAMtools (version0118) (Lietal2009)Foreach individualBAM fileswere
merged in a single one with Picard (version 1111)
(httpsgithubcombroadinstitutepicard)Duplicatereadswereidentifiedand
markedwithPicardFinallyindividualBAMfileswereindexedwithSAMtools
105
Sequencedepthwasevaluatedsimultaneouslyonall20sequencedanimalsand
onlyonregionstargetedbyprobesusingDepthOfCoverageofGATK(McKennaet
al2010)
23ExomeVariantcallingandqualitycheck
Singlenucleotidepolymorphisms(SNPs)insertionsanddeletions(InDels)were
calledsimultaneouslyonall20sequencedanimalsandonlyonregionstargeted
byprobesusingUnifiedGenotyperofGATK(DePristoetal2011)
All variantswere filteredusingVariantFiltration ofGATK to removeoneswith
lowmappingandorsequencingqualityThechoiceoffiltersandthresholdswas
made in accordance to ldquoGATK best practicerdquo for exome sequencing and
distributionoffiltervalues
bull ldquoSNPclusterrdquoexcludedvariantifmorethan3variantswerefoundin10
bp
bull ldquoBaseQualityRankSumTestrdquoexcludedifitrsquoslowerthen=5ormorethan
5
bull ldquoMappingQualityRankSumTestrdquoexcludedif itrsquoslowerthen=7ormore
than6
bull ldquoRead Position Rank Sum Testrdquo excluded if itrsquos lower then =4 or more
than4
bull ldquoFisherStrandrdquoexcludedifitrsquosmorethan30
bull ldquoDepthrdquoexcluded if itrsquos lower then60X(ieaminimumvalueof3Xper
animals)ormorethan3500X
bull ldquoQualityrdquoexcludedifitrsquoslowerthen40
bull ldquoMappingqualityrdquoexcludedifitrsquoslowerthen30
Filters were applied to all variants but only bi=allelic SNPs were used in the
following analyses Genotypes with ldquoGenotype Depthrdquo lower than 5 and
ldquoGenotype Qualityrdquo lower than 20 were discarded The remaining ones were
converted intoldquoABrdquocode(A=referencealleleandB=alternativeallele)Then
data were transformed in PLINK (Purcell et al 2007) format using a home=
made python script To prepare the working dataset the final quality control
106
removedvariantswithmore than25missingdata (that iswithmore than5
animals out of 20 without genotypes) monomorphic (minor allele frequency
MAFlt1)ornotmappinginautosomesBasicpopulationgeneticsparameters
werecalculatedtoevaluatethepresenceofcrypticstructureinthedatathatmay
influencetheoutcomeofthedownstreamanalyses
In total 24390 SNPs identified in the exomes were also included in the HD
SNPchip panel HD genotype from nineteen out to twenty of the sequenced
animals sequenced was used to compare technologies Genotypes at common
locationswerecomparedacrosstechnologies toevaluatesequencequalityand
theeffectofsequencedepthongenotypecalling
24BovineHDGenotypingBeadChipdataandqualitycheck
AllanimalsweregenotypedusingtheBovineHDGenotypingBeadChip(Illumina
San Diego CA) Genotypes were quality controlled and filtered with standard
threshold Markers with more than 5 missing data and MAF lt 1 were
excluded Animals with more than 5missing data were also excluded Only
autosomal SNPs were retained Thereafter genotypes were phased and the
remainingmissingdata imputedusingBeaglev332(BrowningandBrowning
2007)
Ofthe20animalssequenced12hadalsoBovineHDGenotypingBeadChipand7
BovineSNP50 Genotyping BeadChip version 2 genotypes available For
concordance and haplotype analysis we imputed the 50K data to 800K as
previouslydescribedabove
3Strategyfordetectingcandidatedeleteriousvariants
Wesearchedforcandidatedeleteriousvariantsfollowinganintegratedapproach
that joined the analysis of i) exomes of a few Italian Holstein bulls having
plusvariant and minusvariant EBVs for a target trait ii) HD genotyping of a
population of 1009 progeny tested Italian Holstein bulls iii) the comparative
genomic analysis of 100 vertebrate species The strategy followed for the
107
analysisisshowninFigure1Inthefirststep(Search)welookedforcandidate
deleterious variants in the exomes In step2 (Filter)we filtered candidatesby
assessingwhichof thesevariantseitherhadanon=randomdistributionamong
animals having extreme EBVs or co=mapped with genomic regions carrying
outliersignals inthelargepopulationofprogenytestedbulls Instep3(Asses)
candidateswere further evaluated by i) assessing their conservation across a
panel of 100 vertebrate species ii) searching for associated haplotypes and
evaluating their frequencies in the population of progeny tested bulls iii)
evaluatingtheirassociationwithgeneticdefectsinhumanandmodelspeciesiv)
examining their expression profile and function when known Methods are
belowdescribedindetail
Figure1Strategyfollowedtoidentifydeleteriousvariants
108
31Step1Searchforcandidatedeleteriousvariants
Quest for candidate deleterious variants in exome sequence data by
variantannotation
WeannotatedallvariantsusingANNOVAR(Wangetal2010)andVEP(Variant
Effect Predictor) (McClure et al 2014) software They use the Ensembl gene
annotation database (UMD31 reference sequence release 74) to investigate
locationandpotentialeffectoncodingsequencesofallvariantsVEPintegrates
the SIFT algorithm (Kumar et al 2009) to predict the effect of amino acid
substitutiononproteinfunctionForeachsequencechangesequencehomology
and physic=chemical similarity between the reference amino acid and the
alternative is considered providing a score that evaluates the mutation
tolerabilityVariantswithscorelowerthan005wereclassifiedasldquodeleteriousrdquo
inaccordingtosoftwarerecommendations
SIFTalsocomparedallvariants found in theexomeswith thosepresent in the
dbSNPdatabase(version140)toidentifythosethatwerenovel
32Step2Filteringofcandidatedeleteriousvariants
2a Analysis of deleterious variant distribution among animals having
extremesEBVvaluesformaleandfemalefertilitytraits
Association between variants and the classification in the tails of the EBV
distributionwas estimated byFisherexact testwhich compares frequencies of
alleles (a vs A) in the two tails not assumingH=W equilibriumgenotypic test
that compares the frequency of genotypes (aa vs Aa vs AA) and the recessive
geneactiontestwhichtestsforthedistributionofonehomozygousclassversus
theothers(aavsaAAA)AlltestsareimplementedinPLINKfunctionsThe5
significance threshold was set empirically by permutation (N=100000) either
point=wise that is not corrected for multiple testing (EMP1 = Empirical
Probability1)andthereafterfamily=wisecorrectedformultipletesting(EMP2=
EmpiricalProbability2)
109
GenesexceedingEMP1were considered suggestive and thoseexceedingEMP2
significant candidates to be deleterious recessives and their characteristics
conservationacrossspeciesandfunctionfurtherinvestigated
2b Quest for regions candidate to carry deleterious variants in the
populationanalysedwiththeHDBeadChip
Basic population genetics parameters and Multi Dimensional Scaling were
calculated with GenABEL R package (Aulchenko et al 2007) to evaluate the
presenceofcrypticstructure inthedatathatmayinfluencetheoutcomeofthe
downstreamanalyses
ExpectedandobservedallelicandgenotypicfrequenciesandFisherexacttestof
Hardy=Weinberg proportions (Janis E Wigginton 2005) were calculated with
PLINKon theHDSNPdatasetValueswere averaged in slidingwindowsof10
markersThisvalue(10markers)waschosenaftertestingdifferentwindowsize
as represent a good compromise between noise reduction and resolution (not
shown)
Candidate regions containing deleterious mutations were identified following
twodifferentapproaches
In one approach hereafter refereed to as ldquoOut of Hardy=Weinbergrdquo approach
(OHW)markerssignificantlydeviatingfromofHardy=WeinbergequilibriumatP
le84710=8 (nominalvaluePle005Bonferronicorrected formultiple testing)
werefirstidentifiedWindowswithatleastthreesignificantmarkersbelowthis
thresholdandhavingahomozygotedeficitwereconsideredcandidatestocarry
deleterious mutations In this OHW approach we applied very stringent
constraints to reduce false positive discoveries due to the Wahlund effect
inducedbypopulationsubstructure(seeresultsandFigureS5)
In thesecondapproachhereafterreferred toas ldquoLackOfHomozygotesrdquo (LOH)
approach the differences between observed and expected frequencies of
homozygotesfortheminorallelewerecomputedandaveragedacrossmarkers
included in each sliding window Values were standardized and only regions
carrying three overlapping windows below =3Z score value were considered
110
candidates to carry deleterious mutations Thereafter significant windows
carryingatleastonemarkerincommonwerejoinedtodefineregionboundaries
ExomesequencingknowledgeandHDBeadChipdatawerecrossed inorder to
identifygenescarryingseverevariantslocatedwithinsignificantregionsortheir
proximity (50Kb upstream or downstream) These genes were considered
candidates to be deleterious recessives and their characteristics conservation
acrossspeciesandfunctionwasfurtherinvestigated
33Step3Assessmentofcandidatedeleteriousvariants
Genes carrying variations classified as deleterious by SIFT or ANNOVAR and
eitherassuggestivebyfertilityEBVtailsanalysisorassignificantbyOHWeLOH
analyses of the Holstein bull population were submitted to comparative
genomics population genetics and functional analyses In order to prioritize
genesvariants discoveredwe scored each evaluation using a subjective scale
ranging from =3 to 3 to measuring the evidence in favour (scores 1 to 3) or
against(scores=1to=3)thedeleteriouseffectoftheSNPsThescore0inallcases
indicated the absence of evidence or that a specific assessment failed The
completescorescalethereforerangedfrom=9to9
3aComparativegenomics
For comparative genomics purposes the position of eachmutationwas ldquolifted
overrdquo from UMD31 bovine coordinates to hg19 human coordinates using the
specifictoolatUCSC(Kentetal2002=httpgenomeucscedu)Orthologous
proteins were aligned using the UCSC genome browser and amino acid
conservation assessed in a set of 100 vertebrate species that included human
primates (11 species) euarchontoglires (egmouse14 species) laurasiatheria
(egsheep25species)afrotheira(egelephant6species)othermammals(eg
platypus5species)birds(egchicken14species)sarcopterygii(egturtle8
species) fish (eg zebrafish 16 species) Results of amino acid conservation
analysiswererecordedonlyifhomologousproteinfromatleast50speciescould
beretrievedandalignedThereafteraminoacidconservationwasscored3ifthe
111
aminoacidwasconservedinallvertebrates2ifconservedinallmammalsand1
ifconservedinthelargemajority(atleast90)ofmammalsAscoreof=1was
given when the amino acid was not conserved among species and a score =3
when both variants observed in cattle were carried by other species Finally
wheneithertheliftoverprocessortheproteinalignmentfailedthecomparison
wasconsiderednoninformativeandgivenascore0
3bSearchofhaplotypesassociatedtodeleteriousvariantsandanalysisof
HDpopulationdata
To acquire further evidence we searched for the haplotypes that carry the
deleterious variations and estimated their frequency in the
homozygousheterozygousstatuswithinthebullpopulationgenotypedwiththe
HDBeadChip
Weusedamultistepapproachtofindhaplotypescarryingdeleteriousvariantsi)
from the phased HD BeadChip data we extracted the haplotypes of the 20
sequencedanimalsintheregionofthevariant(100markersoneachside)ii)in
these 20 animals we searched the shorter haplotype able to distinguish
homozygous andor heterozygous animals for the deleterious variant from all
other haplotypes iii) we assessed the frequency of homozygotes and
heterozygotes for thehaplotype in thewholepopulationofbulls characterised
withtheHDBeadChip
Scoresinthiscasewere=3inthecaseofuniquehaplotypesshowinganumberof
homozygotesnonsignificantlydifferentfromthenumberexpectedfromHardy=
Weinbergproportions3inthecaseofuniquehaplotypesshowinganumberof
homozygotes significantly lower than expected 1 in the case of unique
haplotypesneverhomozygousandrare0whenevernouniquehaplotypecould
befound
3cFunctionalanalyses
With PantherDB (Mi et al 2010) resources we classified our gene list into
functionalcategoriesaccordingtotheGeneOntologyclassificationAllsuggestive
112
andcandidateEnsemblGeneIDsubmittedtothebulkanalysiswererecognizedand annotated Beyond the GO term characterization in order to understandpotentialdetrimentaleffectsofmutationsandalsopinpointpotentialcandidatesfor fertility traits we retrieved information for each gene according to thefollowing parameters i) the involvement in known Mendelian or complexdiseases fromOMIMdatabase ii) theavailabilityofknock=outmice iii) and theeventual expression in reproductive tissues from the Gene Expression Atlas(httpwwwebiacukgxa)(Table4)
Alsoforthesedataweusedthesamescaleattributingarangefrom=3pointstogenes having no relation with fertility viability and genetic defects and norelevant phenotypic effects in knock=out mouse models to 3 to gene closelyrelated to fertility and viability associated to genetic defects in human andaffectingfertilityorsurvivalwhenknocked=outInparticularascoreranging=1to+1wasattributedtogenesassociatedtohumansyndromes(=1=noassociationrecordedinOMIMdatabase0=noinformationinOMIM1=associationinOMIM)A score with same range to information collected from knock=out mousephenotypes(=1=knock=outmouseshowsnoimpairedphenotype0=noknockoutmouse has been produced 1= knock=out mouse shows impaired phenotype)Thesamescorewasattributedtomolecularfunctions(=1=functionunrelatedtofertilityandviability0=functionindirectlyrelatedasimmunityandmetabolism1=functiondirectlyrelatedasfertilityanddevelopment)
Results
Exomecaptureandsequencing
A totalof225414probeswereused for capturing thecattleexome (5355Mbabout 2 of UMD31 reference genomes) Mitochondrial (MT) and Ychromosome were not included in probe design The number of probes perchromosomespannedfrom2500onBTA27toalmost12000inBTA3(TableS2)
113
The average exome sequencing depth was 4058 slightly lower than the
expected50XAllanimalshadatleast24Xaveragegenomessequencedbutina
same animal the sequencing depth varied across chromosomes and regions
withinchromosomesThelowestchromosomeaveragevalueobservedwas10X
inBTA25oftheshallowersequencedanimalAveragecoverageperanimaland
chromosomeandoverallmeansareshowninFiguresS1
In total1613461402 raw100bppair=ends sequence readswereproducedA
total of 1425514112 100bp reads were mapped and properly paired to the
reference genome Paired reads per animal ranged between 41973510 and
138840392
Insynthesissequencingdepthwasvariableasexpectedandalmostallanimals
hadsomeregionsofscarceornosequencedepthbutoverallallexomeregions
were sufficiently covered to permit a comprehensive analysis of molecular
variantsoftheanimalsinvestigated
Step1Searchforcandidatedeleteriousvariants
Variantdetectionandannotation
Thesequencingdepthofallvariantscalledonthealignmentofallreadsacross
animalsfollowedaChi=squaredistributionAveragedepthwas79694spanning
from2Xto5000Xwithpeakfrequencyclassat150=250X(FigureS2)
A multistep approach was followed to clean the exome sequence dataset
Cleaning process steps and the number onmarkers retained in each step are
describedinTable2
114
Table2Typeandnumberofmarkersdetectedandretainedafterdatasetcleaning
TypeofvariationNumberofvariations(N)
Rawdata
Aftervariantcleaning
Aftergenotypecleaning(onlyautosomal)
Complex 1213 963 0Deletion 6938 5031 0Insertion 5510 3656 0MultiallelicComplex 5710 3995 0MultiallelicSNP 803 512 0SNP 242988 203079 164277Total 263162 217236 164277whencomparedtothereferencegenome
Atotalof263162variantswereidentifiedAmongthese217236(8255)wereretainedafterapplyingthevariantfilters(seematerialsandmethodssection)
According to GATK classification the vast majority (9348) of variationsidentified were biallelic SNPs followed by insertions and deletions (InDels40) more complex variations and multiallelic SNPs (252) Indels andcomplexvariationsalthoughlessfrequentthanSNPsaffectedarelevantportionoftheexome(49589bp)Thenumberofvariantsdetectedperchromosomewassignificantly correlated to the number of probes used (r=08761P=228x10=10)andofbasescaptured(r=08424P=53x10=19)perchromosomehoweverBTA23BTA27 BTAX had an outlier behaviour the former two containing highervariation(450and445variantsperKb)andthelatterlower(084variantsperKb)thantheaverage(322variantsperKb)
Bialleic SNPswere further filtered for genotype quality (sequence depth gt 5Xandsequencequalitygt20)andBTAXmarkersThefinalautosomalSNPdatasetcounted164277SNPsAmongthese17829biallelicSNPs(109)describedforthefirsttimehavebeensubmittedtodbSNP
Variantsrsquo annotation is shown in Table 3a Variantswere detected in differentgenedomains(exonsintrons3rsquountranslatedregions)aswellasupstreamanddownstreamgenesMorethan31ofthevariationsweredetectedinexonsandamong them more than 41 have the potential to have an effect on genefunctionbyaffectingsplicingoralterproteinstructure(Table3b)
115
Table 3 A) number of autosomal SNPs annotated in each gene region B) number and
classificationofautosomalSNPsalteringgenefunction
A)
Generegion NofSNPs(autosomes)Downstream 1170Exonic 51293Intergenic 49901Intronic 58116ncRNA_exonic 160ncRNA_intronic 3ncRNA_splicing 2Splicing 184Upstream 1445Upstreamdownstream 54UTR3 1345UTR5 604Total 164277
B)
EffectofexonicSNPs NofSNPs(autosomes)Nonsynonymoussinglenucleotidevariation 20839Stopgainsinglenucleotidevariation 213Stoplosssinglenucleotidevariation 10Synonymoussinglenucleotidevariation 30226Unknown 5Total 51293
whencomparedtothereferencegenome
Amongthe51293exonicSNPs4166in2978geneswereclassifiedldquodeleteriousrdquo
by SIFT Among these 3210 in 2356 genes were observed at least twice
ANNOVARidentified223stopgain(213)orloss(10)variantsin217genes
Theaverageconcordanceat24390SNPs in commonbetweenexomesand the
HD Beadchip was 9716 (Table S3) Discordance was due to either alleles
detected by sequencing andnot detected by genotyping (exome=heterozygote
Beadchip=homozygote) and vice=versa (exome=homozygote
Beadchip=heterozygote) Discordances were randomly distributed among
markers and inversely correlated to sequencing depth (Figure S3) indicating
that most errors are likely occurring during sequencing even if occasional
116
Beadchip failure indetectingsomeallelecannotbeexcludedTwoanimalshad
anoutlierbehaviouronehaving846andthesecond719concordanceBoth
animalswereretainedforvariantdiscoverywhiletheywereexcludedfromthe
analysesofEBVtailsWithoutthesetwooutlieranimalsthemeanconcordance
increasesto9938
Multidimensional scaling confirmed the absence of a substructure in animals
sequenced(FigureS4)
Step2Filteringofcandidatedeleteriousvariants
AnalysisofEBVtails
Fisherexact test isgenerallyused to test for theassociationbetweenavariant
and a Mendelian disease Samples are divided in cases and controls and the
association between alleles or genotypes and the disease is tested under
different inheritance models Here we used the same method to test for
association between allelesgenotypes and animals from opposite tails of the
EBVdistributionfori)malefertility(N=4high+4low)ii)femalefertility(N=5
high+4low)andiii)maleandfemalefertility(N=9high+8low)
Genotypes were tested under all inheritance models implemented in PLINK
Giventhelownumberofsamplesnovariantwassignificantaftercorrectionfor
multiple comparisons (EMP2) and only suggestive variants could be found
(significant at 5 at EMP1)We considered suggestive of carrying deleterious
allelesalistof101genescarrying112variants(TableS4)
Detectionofregionscarryingdeleteriousmutationfrom800KSNPdata
After applying quality control parameters 26653 and 146991 markers were
removed from the original dataset for excessive missing data and MAF
respectivelyInaddition17individualsremovedforlowgenotypingrateAswith
exomes only autosomal SNPswere retained for the analyses thus obtaining a
workingdatasetof590680SNPsand992animals
117
MDS(FigureS5)analysis indicates that thedatasethassomesubstructure that
mightaffectdownstreamanalysesWeaccountedfortheriskoffindingmarkers
outofHardy=WeinbergproportionsduetotheWahlundeffectbychoosingvery
stringentsignificancethresholdintheOHWanalyses
To detect regions candidate to carry deleterious recessivemutations we used
two approaches the ldquoOutside Hardy Weinbergrdquo (OHW) and the ldquoLack Of
Homozygotesrdquo (LOH) approaches (see materials and methods) The two
approaches complement each other in the detection of regions carrying less
homozygotes than expected OHW identifies regions in which a few markers
have extreme homozygote deficit while LOH detects regions in which many
markershaveamoderatedeficit
In the SNP dataset 1884 markers were significantly out of Hardy=Weinberg
equilibrium(Ple84710=8)Atotalof485windowswereidentifiedwithatleast
3 markers out of 10 exceeding this threshold Among these 84 contained
markerssignificantbecauseofhomozygotedeficitThese84windowsidentified
11OHWcandidateregionsin9chromosomes(TableS5)
TheLOHapproachidentified2335windowsbelow=3Zscoreofthestandardised
differencebetweenobservedandexpected frequencyofhomozygotes forMAF
Theseidentified326LOHcandidateregionsspreadonallautosomes(TableS6)
Onlyregionsformedbyatleast3singlewindowswereretainedreducingLOH
candidateregionsto240
Deleterious variants (SIFT deleterious and stop codon gainedlost) were
intersectedwithsignificantOHWandLOHregionSevenofthesemappedwithin
OHW sliding windows 36 within LOH sliding windows Among these 5 were
located in regions significant in both OHW and LOH Other 51 deleterious
variants mapped in the immediate vicinity (+=50Kb) of significant windows
These 83 variants affect 66 unique genes candidate to carry deleterious
recessivealleles(TableS7)
Five regionswere common toOHWandLOHoneonBTA4 (LOH=74959610=
75151417 OHW=74970653=75151417) two on BTA7 (LOH=9626435=
10099512 OHW=9628735=10099512 LOH=10575691=11171132
118
OHW=10486424=11087441) one on BTA15 (LOH=80299924=80928311
OHW=80299924=80921274) and one on BTA18 (LOH=28122373=
28185525OHW=28106022=28164693)
Step3Assessingcandidatedeleteriousvariants
After the two filtering step previously described a total of 191 SNPs were
retained out of the 3433 classified as deleterious by SIFT and Annovar and
analysed further These are carried by 163 genes Eighteen genes carried 2
candidateSNPsthreegenes3SNPsandonegene5SNPsInaddition4SNPsin
four geneswere identified as potential candidates in both the analysis of EBV
tailsandofbullpopulation
A totalof12SNPs introducedstopcodonswhile the remaining inducedamino
acid substitutionsor splicing variants In the filtereddataset theproportionof
stop codons on total variation (63) was only slightly higher than the
proportion found in the entire dataset (51 225 stop codon out of 4391
deleteriousSNPs)
Comparativegenomics
Proteinmultiplealignmentwassuccessfulin146outof191comparisonsIn26
casesthemostfrequentaminoacidincattlewasveryhighlyconservedeitherin
allvertebrates(N=9)orinallmammals(N=17)Inother23casestheaminoacid
was conserved in at least 90 of mammalian species In the last class were
generally included amino acids that changed in mammalian species
phylogenetically verydistant from cattle (eg platypus and related species) In
97 instances the amino acid could change quite freely and occasionally both
variantsobservedincattlewerecarriedbyotherspecies
HaplotypeanalysisofHDpopulationdata
The search for haplotypes associated to the191 variants for further testing in
the population identified 101 unique and invariant haplotypes common to all
119
carriersofatargetvariantanddifferentfromallthosefoundinnoncarriersIn
additional36casesanoncompletelyinvarianthaplotypecouldbeassociatedto
themutationConverselyin54instancesareliablehaplotypecouldnotbefound
(table S8) In total 38 haplotypes had no homozygous individuals in the
population Thiswas expected in the case of 33 rather rare haplotypes (fewer
than 100 heterozygotes observed in the population) Conversely 5 haplotypes
were rather frequent (from 170 to almost 300 heterozygotes observed in the
population)andtheexpectationwastofind78to222homozygousindividuals
in thepopulation insteadof thenone foundAnothervarianthadaremarkable
high frequency (35) and a clear outlier behaviour Only 11 homozygous
individualswereobservedoutofthe121expected
Functionalanalysisofcandidategenes
To assess the potential deleterious effects of the variants identified we also
retrieved functional information for each gene according to the following
parametersi)associationtoknownMendelianorcomplexdiseaseslistedinthe
OMIM database ii) availability and effect of gene knock=out in mice iii)
expression profileswith particular attention in reproductive tissues from the
GeneExpressionAtlas(Table4)
Twenty genes out of 163 have a recognized phenotype in humans These are
mostlyseveresyndromesUnfortunatelyonlyafewgeneshaveanull=miceline
producedwithphenotypicdataavailable
According to the expression data available from Ensembl these genes are
generally (with someexceptions)expressed inavarietyof tissueswitha clear
tendencytobehighlyexpressedintestis
120
Tableamp4Informationon163genesanalyzedconcerningeventualphenotypesinmanorknockoutsinmouseHeaderampcodesAbrainBcolonCheartDkidney
EliverFlungGskeletalmuscleHspleenItestisColoursrepresentthegenelevelofexpressionindifferenttissuefromlow(whitegrey)tohigh(blue)NCin
FunctioncolumnmeansldquoNotClassifiedrdquo
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000000079
CCSAP
Cells
9
1
03
1
07
2
06
2
2
NC
ENSBTAG00000000149
USP40
Cells
6
7
2
7
5
7
2
8
9
fertility
ENSBTAG00000000414
FUT5
Nomatch
37
1
cellular
process
ENSBTAG00000000918
ERMARD
Periventricular
nodularheterotopia
6Xlinked
periventricular
heterotopia6q
terminaldeletion
syndrome
Periventricular
nodularheterotopia
Nomatch
5
4
4
9
6
7
4
9
17
cellular
process
ENSBTAG00000000998
SLC39A6
Cells
6
9
2
7
4
10
1
10
9
cellular
process
ENSBTAG00000001133
VWA8
Mice
2
3
6
7
10
3
3
4
4
blood
metabolism
ENSBTAG00000001140
IAH1
httpswwwmousephenotypeor
gdatagenesMGI1914982sect
ionassociations
BOTHSEXadiposetissue
behaviourneurological
growthsizebodyFEMALE
homeostasismetabolismMALE
limbsdigitstailskeleton
9
18
30
60
23
19
12
17
30
cellular
process
ENSBTAG00000001262
IRGQ
Cells
6
1
5
3
06
2
5
04
2
immunity
ENSBTAG00000001286
ELMO3
httpswwwmousephenotypeor
gdatagenesMGI2679007sect
ionassociations
Decreasedstartlereflex
02
27
14
7
7
02
1
fertility
ENSBTAG00000001291
OR8H3
Nomatch
fertility
ENSBTAG00000001500
FIGNL1
Cells
08
2
01
05
07
08
04
2
3
cellular
process
ENSBTAG00000001505
GRK4
Cells
10
7
09
4
6
3
2
6
218
signalling
ENSBTAG00000002220
SPTLC1
Hereditarysensory
andautonomic
neuropathytype1
No
17
27
3
28
12
28
5
19
6
metabolism
ENSBTAG00000002288
NT5DC4
Nomatch
02
cellular
process
ENSBTAG00000002289
NLRP14
No
04
cellular
process
ENSBTAG00000002660
CCDC11
Situsinversustotalis
Nomatch
3
1
06
2
07
3
03
2
54
fertility
121
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000002809
CAPN10
AutosomalRecessive
MentalRetardation
DIABETESMELLITUS
NONINSULIN
DEPEND
ENT1
Cells
2
13
41
33
29
immunity
ENSBTAG00000002966
DNAJC13
Youngadultonset
Parkinsonism
No
5
52
94
62
84
fertility
ENSBTAG00000003231
LIM2
Pulverulentcataract
Cells
01
02
01
02
cellular
process
ENSBTAG00000003508
CFAP69
httpswwwmousephenotypeor
gdatagenesMGI2443778sect
ionassociations
Abnormalcirculatinginsulinlevel
MaleInfertilityDecreasedmean
plateletvolum
eIncreasedleanbody
mass
203
07
1
01
7NC
ENSBTAG00000003523
UGT2B15
Nomatch
34
metabolism
ENSBTAG00000004081
FAT3
No
2
03
01
01
05
signalling
ENSBTAG00000004555
LRP2
DONN
AIBARROW
SYND
ROME
Intellectualdisability
Cells
56
3
01
disease
ENSBTAG00000004772
THEM
4
Cells
8
16
295
27
52
63
metabolism
ENSBTAG00000004860
SLC27A6
Cells
6
02
10
602
207
01
immunity
ENSBTAG00000005021
SEMA5A
Monosom
y5p
Cells
9
04
89
13
08
08
6immunity
ENSBTAG00000005170
GPR114
Mice
101
02
01
09
6
01
fertility
ENSBTAG00000005248
OFCC1
No
01
01
03
01
01
01
03
NC
ENSBTAG00000005340
SERGEF
Cells
7
10
35
15
58
23
cellular
process
ENSBTAG00000005357
VPS11
Mice
14
13
619
717
815
15
cellular
process
ENSBTAG00000005466
C8H8orf58
Nomatch
05
01
08
02
103
02
04
NC
ENSBTAG00000005514
TRIO
Cells
17
66
10
210
88
7immunity
ENSBTAG00000005633
RGNEF
httpswwwmousephenotypeor
gdatagenesMGI1346016sect
ionassociations
Vertebralfusion
36
214
05
801
08
1cellular
process
ENSBTAG00000006021
CEP250
Mice
3
22
43
41
88
metabolism
ENSBTAG00000006027
USP34
Mice
11
63
74
88
912
fertility
ENSBTAG00000006532
CDK5RAP2
Autosomalrecessive
primary
microcephaly
httpswwwmousephenotypeor
gdatagenesMGI2384875sect
ionassociations
skeletonimmunesystem
growthsizebodyhem
atopoietic
system
5
34
92
74
784
developm
ent
ENSBTAG00000007001
SLC26A1
No
03
01
02
32
903
01
07
metabolism
ENSBTAG00000007340
LOC101906349
Nomatch
NC
ENSBTAG00000007568
MEPE
Cells
cellular
process
ENSBTAG00000007910
EXOC7
Cells
13
24
23
22
12
22
15
22
28
cellular
process
122
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000008650
CAMK1D
Cells
24
2
01
06
1
2
03
7
3
signalling
ENSBTAG00000008797
Uncharacterized
Nomatch
09
04
07
04
NC
ENSBTAG00000008871
DDX4
Cells
02
01
01
132
cellular
process
ENSBTAG00000008996
SFI1
Mice
2
1
06
3
04
3
2
5
NC
ENSBTAG00000009014
UPK1B
Mice
2
2
2
5
4
4
2
7
3
fertility
ENSBTAG00000009030
NOX5
Nomatch
04
01
8
01
immunity
ENSBTAG00000009033
TEX37
httpswwwmousephenotypeor
gdatagenesMGI1921471sect
ionassociations
Abnormalstartlereflex
03
83
NC
ENSBTAG00000009064
CSF2RB
Congenitalpulmonary
alveolarproteinosis
Mice
01
2
04
06
2
23
03
17
05
signalling
ENSBTAG00000009144
BPIFA2A
Nomatch
6
06
05
metabolism
ENSBTAG00000009337
PNMAL1
Cells
8
05
01
3
02
03
03
04
3
Immunity
ENSBTAG00000009444
SLC15A5
Cells
01
cellular
process
ENSBTAG00000009568
KANK2
Cells
1
17
30
9
3
31
9
10
3
cellular
process
ENSBTAG00000009569
DOCK6
AdamsOliver
syndrome
Cells
2
5
10
12
5
15
5
6
21
metabolism
ENSBTAG00000009836
CHGA
Mice
140
555
01
2
02
01
07
1
metabolism
ENSBTAG00000010522
Uncharacterized
Nomatch
01
02
02
03
2
metabolism
ENSBTAG00000010601
DOLK
Familialisolated
dilated
cardiomyopathy
No
4
8
6
22
9
6
5
5
12
metabolism
ENSBTAG00000010847
FOXN4
Cells
01
01
metabolism
ENSBTAG00000010954
ART3
Cells
02
1
55
01
08
2
10
06
71
metabolism
ENSBTAG00000011011
SSH2
No
3
09
5
4
2
2
6
7
10
cellular
process
ENSBTAG00000011387
NAT9
Cells
6
7
3
8
6
8
4
6
6
metabolism
ENSBTAG00000011420
CA9
Cells
04
3
02
2
03
1
01
2
56
metabolism
ENSBTAG00000011433
RGP1
httpswwwmousephenotypeor
gdatagenesMGI1915956sect
ionassociations
Completepreweaninglethality
7
19
11
18
8
12
9
8
25
cellular
process
ENSBTAG00000011622
C18orf21
Mice
9
10
6
6
2
10
5
10
34
NC
ENSBTAG00000011683
NUMB
Cells
21
25
8
35
17
47
6
8
19
immunity
ENSBTAG00000011684
RAB11FIP1
Cells
6
01
8
09
11
02
1
4
NC
ENSBTAG00000012053
ACCSL
No
03
01
cellular
process
ENSBTAG00000012314
LDLR
Homozygousfamilial
hypercholesterolemia
Cells
8
33
11
21
17
24
2
15
2
cellular
process
123
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000012326
LOC101906218
Nomatch
11
01
01
09
01
NC
ENSBTAG00000012458
PSAPL1
Cells
01
NC
ENSBTAG00000012565
PCNX
No
3
32
32
33
48
developm
ent
ENSBTAG00000012833
ATG2B
No
4
21
21
21
29
NC
ENSBTAG00000012921
KIF24
Mice
05
07
03
08
02
202
214
metabolism
ENSBTAG00000013568
UEVLD
Cells
5
41
53
42
214
metabolism
ENSBTAG00000013685
KRT31
Mice
cellular
process
ENSBTAG00000013753
DDRGK1
Mice
22
32
18
56
55
25
14
22
37
NC
ENSBTAG00000013789
OR6K6
No
NC
ENSBTAG00000013838
ALLC
Mice
02
01
78
metabolism
ENSBTAG00000013916
HERC2
Mentalretardation
autosomalrecessive
38SKINHAIREYE
PIGM
ENTATION
VARIATIONIN1
Developm
entaldelay
withautismspectrum
disorderandgait
instability
No
8
74
83
74
65
metabolism
ENSBTAG00000014001
MAP1A
Cells
136
23
22
202
43
352
cellular
process
ENSBTAG00000014058
LDB2
No
28
33
12
825
410
3cellular
process
ENSBTAG00000014284
ALPK2
No
01
10
01
304
02
cellular
process
ENSBTAG00000014750
EPB41L4A
httpswwwmousephenotypeor
gdatagenesMGI103007secti
onassociations
Decreasedcirculatingglucoselevel
Abnormalconeelectrophysiology
Immunesystem
s1
22
20
07
12
13
43
NC
ENSBTAG00000014752
AKAP11
httpswwwmousephenotypeor
gdatagenesMGI2684060sect
ionassociations
Nopheno
24
72
63
306
68
cellular
process
ENSBTAG00000014768
ZNF786
Cells
3
23
42
48
33
cellular
process
ENSBTAG00000014794
SCYL3
Cells
6
10
59
613
516
26
cellular
process
ENSBTAG00000015027
Clusterin
httpswwwmousephenotypeor
gdatagenesMGI88423sectio
nassociations
Aggressionvertebralfusion
05
03
03
02
05
3NC
124
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000015210
LOXHD1
DEAFNESS
AUTOSOMAL
RECESSIVE77
Autosomalrecessive
nonsyndromic
sensorineuDominant
lateonsetFuchs
cornealdystrophy
deafnessautosomal
recessivetype77
No
3
03
NC
ENSBTAG00000015294
COX10
Leighsyndrome
Mitochondrial
complexIVdeficiency
(MTC4D)
Cells
4
6
24
5
2
3
14
3
16
cellular
process
ENSBTAG00000015335
BAI3
No
38
03
8
01
01
2
signalling
ENSBTAG00000015490
HS1BP3
Cells
1
10
5
9
4
8
2
1
37
cellular
process
ENSBTAG00000015602
C7H5orf45
Nomatch
10
cellular
process
ENSBTAG00000015839
MAP4
No
27
85
104
35
11
100
82
46
61
cellular
process
ENSBTAG00000015912
DMKN
No
01
04
03
NC
ENSBTAG00000015980
FASN
AutosomalRecessive
MentalRetardation
Cells
19
29
7
12
6
39
6
7
22
metabolism
ENSBTAG00000015985
GPR20
httpswwwmousephenotypeor
gdatagenesMGI2441803sect
ionassociations
Anatomicaldefectsvarioustissues
03
02
08
01
09
signalling
ENSBTAG00000016495
LINS
AutosomalRecessive
MentalRetardation
Cells
6
3
1
5
3
3
1
5
12
NC
ENSBTAG00000016691
LACC1
Mice
2
9
1
9
9
8
08
7
27
NC
ENSBTAG00000017096
ERICH1
Cells
6
4
2
4
2
5
2
5
13
metabolism
ENSBTAG00000017165
MATN2
No
2
7
2
20
08
3
2
2
7
cellular
process
ENSBTAG00000017418
RRP1B
Cells
6
4
4
5
3
7
3
12
159
cellular
process
ENSBTAG00000017436
ALS2CR11
Cells
01
37
disease
ENSBTAG00000017505
PAXIP1
Cells
2
3
2
4
2
4
2
3
12
signalling
ENSBTAG00000017576
GPR144
Nomatch
02
signalling
ENSBTAG00000018528
USP45
No
1
1
04
1
05
06
02
2
04
cellular
process
ENSBTAG00000018810
THBS2
INTERVERTEBRAL
DISCDISEASE
Cells
1
2
04
09
1
09
04
1
2
fertility
ENSBTAG00000019072
PSD4
Cells
7
8
1
1
7
02
6
4
cellular
process
125
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000019265
AGK
Sengerssyndrom
e
Congenitalcataract
hypertrophic
cardiomyopathy
mitochondrial
myopathy
Cells
7
36
43
52
65
metabolism
ENSBTAG00000019752
BPIFA2B
Nomatch
01
1metabolism
ENSBTAG00000019799
FCAM
R
Cells
07
01
02
01
2
7immunity
ENSBTAG00000019961
ATP4B
Cells
12
03
1
05
cellular
process
ENSBTAG00000020414
DHX37
No
3
33
73
73
413
cellular
process
ENSBTAG00000021073
KIAA1549
PILOCYTIC
ASTROCYTOM
ASOMATIC
httpswwwmousephenotypeor
gdatagenesMGI2669829sect
ionassociations
behaviourneurological
homeostasism
etabolism
hematopoieticsystem
3
04
23
01
21
04
3NC
ENSBTAG00000021233
OR5B17
No
fertility
ENSBTAG00000021375
OR10Q1
No
fertility
ENSBTAG00000021643
EFCAB12
Cells
01
1
9cellular
process
ENSBTAG00000021645
MBD4
Cells
8
43
82
51
10
6cellular
process
ENSBTAG00000022509
DNAH
9
No
02
5
05
5metabolism
ENSBTAG00000022858
LOC783561
Nomatch
fertility
ENSBTAG00000026247
PKHD1L1
Cells
02
01
cellular
process
ENSBTAG00000026323
LYSB
Nomatch
68
metabolism
ENSBTAG00000026350
SPATC1
Cells
05
06
01
01
01
39
NC
ENSBTAG00000027390
RTFDC1
Mice
28
19
21
20
922
13
20
49
NC
ENSBTAG00000031115
Uncharacterized
Nomatch
01
cellular
process
ENSBTAG00000031718
OGFR
Mice
6
17
612
517
39
1fertility
ENSBTAG00000032264
Uncharacterized
Nomatch
52
4
22
NC
ENSBTAG00000032481
DAPL1
No
1
6
07
01
13
10
cellular
process
126
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000032527
ERCC6
CEREBROOCULOFACI
OSKELETAL
SYND
ROME1
COCKAYNE
SYND
ROMEBDe
SanctisCacchione
syndromeMACULAR
DEGENERATION
AGERELATED5UV
SENSITIVE
SYND
ROME1COFS
syndromeCockayne
syndrometype1
Cockaynesyndrome
type2Cockayne
syndrometype3UV
sensitivesyndrome
Cockaynesyndrome
typeB(CSB)De
SanctisCacchione
syndrome(DSC)UV
sensitivesyndrome
(UVS)cerebrooculo
facioskeletal
syndrometype1
(COFS1)
Mice
3
205
22
207
42
cellular
process
ENSBTAG00000032821
SCEL
No
01
02
17
04
NC
ENSBTAG00000033954
PRPS1L1
No
35
metabolism
ENSBTAG00000034991
SLX4IP
Cells
6
21
32
207
33
NC
ENSBTAG00000035226
TOR1AIP1
Cells
9
12
513
816
716
83
NC
ENSBTAG00000035309
OR4F15
No
fertility
ENSBTAG00000035708
LOC527795
Nomatch
01
01
23
metabolism
ENSBTAG00000035985
LOC101909743
OR8K1
No
fertility
ENSBTAG00000036019
RIPPLY3
httpswwwmousephenotypeor
gdatagenesMGI2181192sect
ionassociations
hemopoieticsystem
01
04
01
9
01
malformation
ENSBTAG00000037399
Uncharacterized
Nomatch
2
92
4
24
05
5signalling
ENSBTAG00000038606
DBDR
No
02
01
signalling
ENSBTAG00000038831
PHYHD1
No
2
18
21
23
30
18
16
15
1metabolism
ENSBTAG00000039110
OR8U1
No
fertility
ENSBTAG00000039117
FAM179B
Cells
14
206
31
308
33
cellular
process
ENSBTAG00000039520
SIRPB1
Cells
4
08
32
516
05
12
01
signalling
127
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000040568
Uncharacterized
Nomatch
4
307
54
33
25
cellular
process
ENSBTAG00000045558
LOC100299084
Nomatch
1
22
110
10
09
52
fertility
ENSBTAG00000045588
LOC100298356
Nomatch
metabolism
ENSBTAG00000045709
LOC514057
Nomatch
fertility
ENSBTAG00000046175
Uncharacterized
Nomatch
01
01
cellular
process
ENSBTAG00000046228
Olfactory
receptor
Nomatch
fertility
ENSBTAG00000046239
C10orf12
No
03
104
07
108
06
209
cellular
process
ENSBTAG00000046285
OR8I2
No
fertility
ENSBTAG00000046360
Uncharacterized
Nomatch
NC
ENSBTAG00000046423
LOC618007
Nomatch
NC
ENSBTAG00000046527
LOC100139830
Nomatch
fertility
ENSBTAG00000046590
FAM181A
Cells
09
44
NC
ENSBTAG00000046593
Uncharacterized
Nomatch
20
metabolism
ENSBTAG00000046727
Olfactory
receptor
Nomatch
04
fertility
ENSBTAG00000047150
RTEL
Cells
4
23
64
55
616
cellular
process
ENSBTAG00000047166
SH2D7
Mice
03
101
01
1
02
02
2immunity
ENSBTAG00000047174
Uncharacterized
Nomatch
03
7105
02
01
271
05
03
NC
ENSBTAG00000047317
CL43
Nomatch
01
120
01
immunity
ENSBTAG00000047587
Uncharacterized
Nomatch
22
09
09
35
03
4
06
2cellular
process
ENSBTAG00000047661
FABP12
httpswwwmousephenotypeor
gdatagenesMGI1922747sect
ionassociations
Growthsizebody
01
02
69
metabolism
ENSBTAG00000048021
PNMAL2
No
04
03
2
03
03
02
NC
ENSBTAG00000048153
Uncharacterized
Nomatch
2
05
01
02
01
cellular
process
128
Discussion(
Deleterious recessive variants arenaturally present in outbred animal species
Theydonotcreateproblemsaslongastheyremainatlowfrequencyinrandom
mating populations Under these conditions these variants remain in the
heterozygousstateanddonotproducephenotypiceffectsHoweverassoonas
their frequency increases andmatingoccursbetween related individuals their
deleteriouseffectsbecomeapparentandhighlyaffectthewelfareandviabilityof
animalscarryingtheminthehomozygousstate
In industrialdairy cattlebreedscrossbreeding isgenerallyavoidedandbreeds
areinpracticegeneticisolatesinwhichartificialinsemination(AI)iswidelyused
for rapidly spreading thegeneticprogress in thepopulationAsa consequence
elitesireshavemanythousandandsometimeshundredsofthousanddaughters
Under these conditions mating between relatives becomes practically
unavoidable in the longterm In fact inspiteof theclosecontrolof inbreeding
andofthematingsystemgeneticdefectsregularlyappeartothesurfaceSome
examples are the Complex Vertebral Malformation (CVM) and the Bovine
Leucocyte Adhesion Deficiency (BLAD) in Holstein (Schuumltz et al 2008) and
WeaverdiseaseinBrown(McClureetal2013)Inthesecasestheeffectsofthe
recessivedeleteriousallelesareclearandmoleculardiagnostictoolshavebeen
developedtoavoidtheirfurtherspreadingandtoinitiatetheeradicationprocess
However recessive deleterious alleles may produce their effects in a more
deceitful way for example by affecting fertility inducing early lethality or
influence disease susceptibility All these effects are difficult to trackwith the
current methods of phenotype measurements but have an impact on the
economyofthelivestocksectorandonanimalwelfare
InthispaperwesoughtfordeleteriousrecessiveallelesinItalianHolsteincattle
This breed is recently experiencing a decrease in the genetic trend of fertility
traits (Pryce et al 2014) therefore we started our search from the exome
sequences of 20 sires 19 of which having extreme EBV values for male and
femalefertility
130
The choiceof sequencingexomeswasa compromisebetween completenessof
informationaccuracyofvariantandgenotypedetectionandcostExomesclearly
give incomplete information since they miss all variation in regulatory sites
outside gene boundaries (ENCODE Consortium 2012) These may be very
relevantforthecontroloftraitvariationbuttheyarestillverypoorlyannotated
in livestock On the other hand sequencing exomes yields millions instead of
billionsbasepairsadeepersequencingoftargetregionsandaconsequentmore
reliablevariantandgenotypecalling(FigureS1andS2)Theanalysisofexome
data is also less demanding in terms of computer time (thousands instead of
millions variants are to be analysed) and easier in terms of biological
interpretation Exome sequencing revealed to be very effective in the
identificationofanumberofmutationsaffectinghumanhealth(Bamshadetal
2011 Yang et al 2013) All thesewere defects caused bymutations in single
genes producing clear phenotypic effects havingMendelian inheritance These
conditionsdonotholdinourinvestigationthereforewehadtodevisestrategies
to integrate exome data with other information collected from the Holstein
populationandotherspecies
Theoverallprocedureforidentifythecandidatedeleteriousrecessivewasbased
on the rationale that deleterious recessive alleles in coding sequences are
expectedtobei)causedbyseverealterationoftheproteinsencodedbygenes
ii)mostlyifnotexclusivelycarriedattheheterozygousstateinthepopulation
iii)unevenlydistributedinanimalsofhighandlowfertilityiv)likelyconserved
indifferentspeciesbecauseofselectionconstraintv)likelyassociatedtogenetic
defectsinanyspecieswhenseverelydamaged
A stepwise procedure was set up for reducing candidates to a number
manageable for functional analysis (Figure1) In the first stepmutationswere
discoveredinthesecondfilteredandinthirdassessed
Thisprocedureidentifiesonlysomeofthedeleteriousrecessivesexistinginthe
populationanalysedSincewesequencedonlyexomesandnotwholegenomes
ouranalysesislimitedtocodingsequencesandatthemomenttoSNPvariation
Wethereforemissregulatoryvariationsoutsidegenesand theeffectsof indels
andmorecomplexvariations
131
ThenumberofanimalssequencedislimitedDeleteriousrecessivesareexpected
toberareinthepopulationandwehavesampledonly20animalsfromasubset
ofsiresauthorizedtoAI inItalyAlthoughwehavetakenamongthemthose in
theminus tail of fertility trait thatmighthave ahigher genetic load itrsquos likely
thatwehavemissedothervariantsexistinginthepopulationOntheotherhand
variantscarriedbyAIauthorizedbullsaretheonesthatwillhavealargerimpact
onthepopulationifwidelyspreadbyelitebullsandthereforetheyarethemost
interestingfromabreedingpointofview
Most deleterious variants are expected to be very rare and therefore not
detected by the approach here adopted or detected and afterwards discarded
duringthefilteringprocesses
Sequencing is prone to errors Some real variants may have been discarded
duringQCofsequencedataduetopoorsequencequalityorpoorcoverageofthe
region
Someofthevariationsclassifiedasldquonondeleteriousrdquobythesoftwareusedmay
indeedhave an effect In fact synonymousmutation arenon`neutralwhenever
thedifferenttRNAsforasameaminoacidhavedifferentrelativeabundanceand
translationrates(Goymer2007)
1(Searching(candidate(deleterious(variants(in(exomes(
ExomesproducedafairamountofdataAlltogether5355Mbweresequenced
These represent 9998 of the exome sequences targeted by probes Among
theseonly35hadasequencingdepthacrossanimalslt60XSequencingdepth
is a key factor for variant detection and even more so when the genotype of
sequencedanimalsistobeinferredInvestigatingtheconcordancebetweenthe
genotypes detected at more than 20K SNP by both Beadchip and exome
sequencingwe observed a clear inverse relationship between sequence depth
andnumberofdiscordances(TableS3andFigureS3)Theaveragesequencing
depth expected (50X) and realised in practice (4058) in this investigation
yieldedratheraccurategenotypessincetheaverageconcordanceobservedwas
over97and increased toover99whentwooutlieranimalswereremoved
132
from the dataset The reason for the outlier behaviour of these samples waslikely a suboptimal quality of DNA submitted to sequencing Overall almost215000 variants passed quality controls and among them 192440 autosomalSNPs have been investigated in detail As found in other investigations mostvariants were rare (Peloso et al 2014) For example the MAF distributionobservedin46904SNPsclassifiedneutralhasameanof0197andamedianof0167values(FigureS6)
ThefirststepinourresearchwastoidentifySNPscausingseverealterationsinthe proteins coded by genes These were relevant changes in conformationalterationsofsplicingsitesortruncationsofproteinsToaccomplishthistaskwereliedontwosoftwareSIFTandANNOVARthatidentified4902suchvariationsin3423genes Interestingly the subsetofdeleteriousvariantshas significantlylowermean (0156)andmedian (010) compared toneutral variations (FigureS6) This subset is therefore at least enriched in variants under purifyingselectionhavingadeleteriouseffectatthepopulationlevel
Thenumberofdeleteriousallelesdetectedishighanditisunlikelytheyallhavea relevant phenotypic effect even if most variants are rare In fact variantsdeleterious for a protein are not automatically deleterious for an organism asthe protein change induced by the variant although relevant may not affectproteinfunctionvariationmayoccurinonememberofagenefamilysothatitsfunction may be compensated by other family members the protein functionmay be compensated by other proteins or the affected metabolic pathwaycompensated by alternative pathways The challenge was therefore to filter asubsetofvariantslikelyhavingaphenotypiceffecttobeproposedforvalidationatalargerscaleToaccomplishthistaskwesoughtforadditionalinformationinexomesequencesandinalargepopulationofHolsteinsiresgenotypedwiththebovineHDBeadchip
( (
133
2(Filtering(deleterious(variants(
The second step consisted in filtering the several thousand deleterious SNPsidentified in exomes to search for SNPs and genes that worth deeperinvestigationWeusedtwostrategiestoaccomplishthistask
21$ Distribution$ of$ deleterious$ variants$ in$ fertility$ EBV$ plus9variant$ and$
minus9variant$animals$
Fertility traits are controlled by a high number of genes and are highlygenetically heterogeneous Even using a selected subset of variants identifiedfrom exome sequencing classic association analyses would need sample sizesmuch larger that the one available in this investigation to have a reasonablestatistical power (Stitziel et al 2011) The first approachwe appliedwas theanalysisof SNPallelic andgenotypicdistributionbetween theminusandplus`variant tails of male and female fertility EBVs Given the very low number ofanimals sequencedwedecided to use a simplemodel to exclude from furtheranalyses all markers showing no evidence of association with the EBV tailsratherthantofindsignificantassociations
We divided samples in cases (minus`variant animals for fertility EBV) andcontrols (plus`variant animals) and tested all possible models of inheritanceimplemented in thePlinkpackage for tackling simple genetic defectsWe thenused an empirical point`wise significant threshold based on permutations tohighlight SNPs having alleles or genotypes showing an uneven distributionbetweencasesandcontrolsandconsideredtheseasworthfurtherinvestigationAtotalof112SNPsin101genespassedthisfilteringprocess
$
22$Detection$of$regions$carrying$deleterious$mutation$from$800K$SNP$data
The analysis of exomes alone was therefore insufficient to tag recessivedeleteriousgenesinItalianHolsteinandwesoughtforadditionalinformationin1009ItalianHolsteinbullsgenotypedathighdensitythatarepartofthegenomicselection program run by ANAFI The rationale of the approach is that
134
deleterious recessives are expected to be absent or rare in the homozygous
status in the population Hence they may be tagged searching for genome
regions having markers or haplotypes lacking or having significantly less
homozygotesthanexpectedunderaneutralmodel
We used two complementary approaches to identify these candidate regions
One(OHW)searchedforverystrongsignalsproducedbysinglemarkersoutof
Hardy`Weinberg equilibrium at Ple847x10`8 A minimum of three significant
markersinaslidingwindowsof10wassettoreachsignificancetocapturemore
reliablesignalsavoidexcessivesinglemarkernoiseandaccount forapossible
Wahlund effect due to presence of the genetic structure in the population
difficulttoavoidinthesetofprogenytestedbullsanalysedThesecondapproach
(LOH) captured more moderate signals but from multiple markers in larger
regions Both approaches were designed to detect signals in regions in which
deleteriousvariantsarestillinsegregationandarenotexcessivelyrare
Asexpectedthetwoapproachesidentifiedregionsthatonlypartiallyoverlapped
(TablesS5S6eS7)Inparticular5regionsoutofthe11identifiedbyOHWwere
in common to the 240 identified by LOH Two of these regions overlap with
previous investigations that used the BovineSNP50 Beadchip Sahana et al
(2013) found a region candidate to carrypotential lethal haplotypes inNordic
Holstein on BTA7 in the region 6708007`10907840 This interval overlaps
withbothBTA7regionsdetectedinthisstudyVanRadenetal(2011)detected
inUSHolsteinalargeregiononBTA15spanning76Mb`82M(onBTA30)that
includedLRP4 gene responsible for themulefootdefectOuranalyseswith the
HDBeadchipindicatesontheonesidethatthelargeregiondetectedbySahana
etal(2013)mayconsistintwonearbyregionsOntheothersidethesignalwe
detectedonBTA15doesnot includetheLRP4 locusthat iscompletely fixed in
theItalianbullspopulationanalysed
In total83mutations in66genes fell in eitherorbothOHWandLOHregions
These were joined to the SNPs output of the EBV tail analyses for further
assessment
(
135
3(Assessing(filtered(deleterious(variants(
Different characteristics of the filtered deleterious variants were assessed to
identify among them strong candidates to submit to a large`scale validation
studyTheassessmentphasepermittedtogainadditionalinformationinfavour
or against SNP and gene candidature to be deleterious We quantified this
evidenceusingasubjectivescore
31$Comparative$genomics$analysis$
In the case of comparative genomics the score quantified the level of
conservationof thealternativeaminoacidresidues inducedbySNPvariants in
ItalianHolsteincattleComparisonwasrunagainstaset100vertebratespecies
that included 62 mammals The level of conservation is a good proxy for the
expected severity of a mutation In the case of the 46 SNPs that induced the
change of an amino acid highly conserved in the species investigatedwemay
hypothesize a strong selection pressure in favour of the maintenance or the
invariant amino acid at that position Therefore the amino acid change or the
stopcodonweobserved incattlemayaffectnotonlytheproteinstructureand
function but also the phenotype of homozygous animals Interestingly in all
thesecasestheSNPallelecodingforthenonconservedvariantswastheminor
allele
Converselyinallcasesofhighvariabilityofthetargetaminoacidacrossspecies
(asmanyassevendifferentaminoacidshavesometimesbeenobserved)poorly
supports the presence of phenotypic effects due to its change Sometimes the
informationcollectedisagainstthepresenceofadeleteriousmutationasinthe
case of the presence in different species of both alleles identified by exome
sequencingThis suggests thatneither of these is greatly affecting the viability
andfitnessofanimalscarryingthem
Wealsoobservedanumberoffailures(N=45)intheliftoverofSNPcoordinates
fromcattletohumanorinproteinalignmentThishappenedforexampleinthe
caseofuncharacterized cattleproteins thathavenoorthologous in thehuman
genome and of someolfactory receptors that belong to a highly variable gene
136
family Even if the failure of alignment is an indication of poor proteinconservationacrossspeciestheconservationofthetargetaminoacidcouldnotbeassessesandinthesecaseswepreferredtoconsidertheinformationmissingandtoscoreitaccordingly
32$Haplotype$analysis$
Asecondassessmentofthefilteredcandidateswasontheirbehaviourinthesirepopulation The ideal test would be the direct genotyping of the SNPs in thepopulationand theevaluationof theirallelicandgenotypicproportions In theabsenceofthisinformationwefirstinferredtheHDhaplotypeassociatedtoSNPallelesandassessedthebehaviourofthehaplotypeinthepopulationassumingthisasasurrogateoftheSNPbehaviourSincedeleteriousallelesareexpectedtoberareandthis isalsowhatweobserved fromcomparativegenomicanalysesweinvestigatethefrequencyonlyofthehaplotypecarryingtheminorSNPalleleagainsttheexpectationto findthecorrespondinghaplotypeneverhomozygousor a number of times much lower than expected from its frequency in thepopulation The strategy yielded some interesting confirmation but was onlypartiallysuccessful
Firstofalltheidentificationofauniquehaplotypecarryingtherarevariantwasnot always possible or straightforward In some cases carriers of a same rarevariantdidnotshareexactlythesamehaplotypeWeacceptedsomevariationifthe haplotype remained clearly distinguishable from those of animals notcarryingtherarevariantbutatthesametimereducedthestringencyofthetestInquiteanumberofcasesasamehaplotypecarriedbothcandidatedeleteriousSNP variants This may occur for example in the case of recent mutations IntheselastcasesthetestcouldnotbecarriedoutFinallymostrarevariantswereassociatedtorarehaplotypesthatarenotlikelytohavehomozygousindividualsinthepopulationindependentlyoftheSNPeffect
Thisassessmentwas thereforeonly significant fora fewhaplotypes thathadalarge number of heterozygous individuals and no or only a very fewhomozygotesandforthosethathadalargenumberofhomozygotes
137
33$Functional$analysis$
The final assessment was the analysis of the function of genes carrying the
candidate deleterious SNP This has some limitation since knowledge of gene
function is only partial and mainly gained from investigations on species
different from cattle In any case we considered the association to genetic
defectsinhumanandtheinductionofseverephenotypesinmouseknockoutas
a positive indication of the likely phenotypic effect of the severe protein
alterationweobservedincattleInadditiongenefunctionandexpressionprofile
was evaluated in search for evidence of gene involvement in fertility
developmentorprocessesfundamentalforcellandorganismviability
Amonggenesidentified22areassociatedtohumansyndromes(Table4)Eleven
syndromesareneurologicaldiseasesaffectingbraindevelopmentand inducing
mental retardation (eg Donnay`Barrow Syndrome and Autosomal recessive
primary microcephaly) the others influence development (eg
Cerebrooculofacioskeletal Syndrome 1 and Adams`Oliver syndrome) or organ
function(egCongenitalpulmonaryalveolarproteinosis)
Most of the same defects translated into Holstein would induce physical and
behavioural anomalies likely causingyoungbulls tobe excluded fromprogeny
testing Itshould in factbeconsideredthattheanimalsweanalysedareonthe
onesidethosethatwillmostinfluencethegeneticmakeoffuturegenerationbut
are not a random sample of Holstein animals These bulls have been progeny
testedandauthorisedtobeusedinartificialinseminationinItalyCandidatesto
progenytestingderivefromthematingofthebest1siresandbest2damsof
the Italian Holstein population At the age of 4 months they are taken to the
Italian Holstein breeder association (ANAFI) test station where they are
subjected to performance test sanitary controls and behavioural tests
Thereafterthequalityoftheirsemenischeckedbeforeauthorizationisgranted
toenterinprogenytestInthissampleofanimalsitisthereforeexpectedtofind
never at the homozygous state mutations causing lethality physical
malformation abnormal behaviour and semen infertility Also those having a
138
majoreffectonsemenqualityandhighsusceptibilitytodiseasesareexpectedto
berarelycarriedatthehomozygousstatus
Association with human syndromes is a strong evidence of the potential
phenotypiceffectofmutationsalteringthegenestructureHoweveritshouldbe
considered that variants found in humans are different fromvariants found in
cattle and that the severity of the phenotype induced by a deleterious gene is
often variable even within species Therefore we evaluated this information
togetherwiththeotherindirectevidencesofvariantstobedeleterious
WealsoenquiredthewebsiteoftheInternationalMousePhenotypeConsortium
(httpswwwmousephenotypeorg) to search for the effects of null alleles in
mice The few genes forwhichnull`mice line is available are involved in basic
functionsfortheorganismthatspanfromvision(IAH1andEPB41L4A)toaging
(RGP1)andtissuesdevelopment(FABP12RIPPLY3RGNEF)orfertility(CFAP96)
Aswithgenesassociatedtohumansyndromesmostoftheabnormalphenotypes
observedinknockmousewouldhamperyoungbullstoenterprogenytesting
Gene Ontology (GO) was evaluated on the entire gene set No enrichment on
particularcategoriesGOtermswasobservedThemostrepresentedcategories
in Biological Processes were as expected ldquometabolic processrdquo and ldquocellular
processrdquo however lower level terms ldquoreproductionrdquo and ldquodevelopmental
processrdquo are represented with meaningful hits The analysis of Molecular
Function indicates thatmany of the hits are enzymes (ldquocatalytic activityrdquo) and
haveregulatoryroles(ldquobindingrdquoandldquoenzymeregulatoractivityrdquo)Thisresultcan
be interpreted indifferentwaysOn theone sidedeleterious allelesmay exert
their deleterious effect in a number of different cellular and developmental
processes the correct process is one while there is many options to make it
wrongAlsothenumberofgenesinvestigatedisrelativelylowandformanyof
them there is little or no information available (eg 11 genes code for a
completelyuncharacterizedprotein)Inadditionsomeofthemmayturnoutnot
tobereallydeleteriousandaddnoisetotheGOanalysis
As a final step all the previous information (GO expression patterns known
function)havebeenused to estimate a scoremeasuring the importanceof the
gene function forcattle survivalandreproductionThescoringof functionwas
139
particularlydifficultGOcategoriesweregeneralandallfunctionstaggedhavearelevant importance for animal survival andproductionTherefore in this casewedidnrsquotusethewholescorerangeplanned(from`1to1)andgavenonegativescoresApositivescoreof1wasattributedonly togenesplayingan importantroleinfertilityanddevelopment
$
34$Strong$candidate$deleterious$variants$
Thescoringprocessyielded21variantswithatotalscore=035variantswithpositive and 135 with negative values (Figure 2 and Table 5) Although theprocesswasnoterrorfreeasitreliedonincompleteinformationandhadsomedegreeofsubjectivityparticularlyinthechoiceofrelativeweightsitidentifiedaset of variants that may be considered strong candidates to have deleteriouseffects in the Italian Holstein breed Variants were evaluated individually andthose in a same genes had sometimes different scores For example the fivevariantscarriedbyALPK2hadscoresrangingfrom`7to`2
140
Figure2Graphicalrepresentationoftotalscoreforeachvariants
minus505
Variation
Score
1_645923001_645985561_1381698111_1464490851_150972144
2_7478962_270362092_270455392_375933832_555609862_904645543_75418123_110403993_110404003_188945763_1137877703_1203356984_53767714_265405974_265406204_749514994_1033779064_1057246734_1130233264_1178417155_445557055_445557425_757447955_940308955_940309556_207762566_382803486_863518016_927092906_1075769256_1078851146_1090916426_1166055226_1186534167_13339757_56553577_97149147_167941797_168358027_168621947_197404097_197409057_263293538_602606098_603324848_658345648_704101558_760297748_770028908_872798628_1117618168_1128416959_83888819_83889249_237928389_510133929_1047396559_10515702910_171251510_1584300310_2802235210_2802311010_8290312110_8517371111_4630576711_4675390311_4740957711_5979141011_7832201011_8794387511_9548610911_9945389411_9946100112_1190618112_1247843312_1248084512_1413689712_5304908112_9076120913_381566413_1178793413_1178793813_5247168413_5393493013_5393494713_5393985313_5454042513_5504774013_5998820513_6304575013_6316961013_6539670814_197749414_364069514_3556894514_4688575014_5718402214_6863536315_18267915_18331715_18340415_248692
15_3018920915_3504878815_4630194115_7503314015_7503764515_8024448915_8025571015_8028875915_8038974515_8048559315_8055593115_8094968915_8275455615_8292533216_456201716_3831963616_6257435517_5309539517_5311265517_6609153117_7237753218_2562355018_3495651318_4637204118_5215480618_5408127718_5413095518_5781976819_2144410319_3107215219_3111397219_3279084319_4225178519_5139519419_5619117519_5726625120_763980120_2341043120_5888423620_6417173921_623911921_2034587821_2034628921_2683773321_3100359921_5527419521_5582646321_5817338421_5912071921_6048248421_6048251021_6284314222_5241595922_5242173422_5242204622_5242610422_5242666822_5695165722_5697484722_5779767423_4588505224_2132460424_2145336424_4676490924_5032719524_5815817824_5815862024_5815878424_5820400324_5820447925_3743318026_1811648726_2571599626_2576457027_503107827_3283255728_28422828_169040328_3571880728_4408176529_198588129_753052329_753066329_26397931
141
Table5SingleanalysesscoreandtotalforeachcandidatevariantsChrchrom
osom
ePosphysicalposition
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
164592300
ENSBTAG00000009014
UPK1B
02
I1
01
2
164598556
ENSBTAG00000009014
UPK1B
0I1
I1
00
I2
1138169811
ENSBTAG00000002966
DNAJC13
03
10
15
1146449085
ENSBTAG00000017418
RRP1B
I3
I1
I1
00
I5
1150972144
ENSBTAG00000036019
RIPPLY3
I3
I1
I1
I1
1I5
2747896
ENSBTAG00000013916
HERC2
30
10
04
227036209
ENSBTAG00000004555
LRP2
I3
I1
10
0I3
227045539
ENSBTAG00000004555
LRP2
I3
31
00
1
237593383
ENSBTAG00000032481
DAPL1
00
I1
00
I1
255560986
ENSBTAG00000032264
Uncharacterized
3I3
I1
00
I1
290464554
ENSBTAG00000017436
ALS2CR11
0I1
I1
00
I2
37541812
ENSBTAG00000046175
Uncharacterized
00
00
00
311040399
ENSBTAG00000013789
OR6K6
I3
I3
I1
00
I7
311040400
ENSBTAG00000013789
OR6K6
I3
I3
I1
00
I7
318894576
ENSBTAG00000004772
THEM
40
I3
I1
00
I4
3113787770
ENSBTAG00000000149
USP40
11
I1
01
2
3120335698
ENSBTAG00000002809
CAPN10
I3
21
00
0
45376771
ENSBTAG00000001500
FIGNL1
01
I1
00
0
426540597
ENSBTAG00000033954
PRPS1L1
1I3
I1
00
I3
426540620
ENSBTAG00000033954
PRPS1L1
1I1
I1
00
I1
474951499
ENSBTAG00000003508
CFAP69
I3
I3
I1
10
I6
4103377906
ENSBTAG00000021073
KIAA1549
0I1
11
01
4105724673
ENSBTAG00000019265
AGK
01
10
02
142
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
4113023326
ENSBTAG00000014768
ZNF786
I3
I3
I1
00
I7
4117841715
ENSBTAG00000017505
PAXIP1
0I1
I1
00
I2
544555705
ENSBTAG00000026323
LYSB
I3
3I1
00
I1
544555742
ENSBTAG00000026323
LYSB
I3
0I1
00
I4
575744795
ENSBTAG00000009064
CSF2RB
I3
I3
10
0I5
594030895
ENSBTAG00000009444
SLC15A5
I3
I1
I1
00
I5
594030955
ENSBTAG00000009444
SLC15A5
I3
I3
I1
00
I7
620776256
ENSBTAG00000008797
Uncharacterized
I3
I3
00
0I6
638280348
ENSBTAG00000007568
MEPE
0I3
I1
00
I4
686351801
ENSBTAG00000003523
UGT2B15
1I1
I1
00
I1
692709290
ENSBTAG00000010954
ART3
I3
2I1
00
I2
6107576925
ENSBTAG00000038606
DBDR
0I3
I1
00
I4
6107885114
ENSBTAG00000001505
GRK4
I3
1I1
00
I3
6109091642
ENSBTAG00000007001
SLC26A1
1I3
I1
00
I3
6116605522
ENSBTAG00000014058
LDB2
01
I1
00
0
6118653416
ENSBTAG00000012458
PSAPL1
I3
I1
I1
00
I5
71333975
ENSBTAG00000015602
C7H5orf45
I3
I3
I1
00
I7
75655357
ENSBTAG00000045588
LOC100298356
1I3
I1
00
I3
79714914
ENSBTAG00000046423
LOC618007
00
I1
00
I1
716794179
ENSBTAG00000012314
LDLR
02
10
03
716835802
ENSBTAG00000009568
KANK2
01
I1
00
0
716862194
ENSBTAG00000009569
DOCK6
01
10
02
719740409
ENSBTAG00000000414
FUT5
I3
I3
I1
00
I7
719740905
ENSBTAG00000000414
FUT5
I3
I3
I1
00
I7
143
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
726329353
ENSBTAG00000004860
SLC27A6
0I1
I1
00
I2
860260609
ENSBTAG00000011420
CA9
I3
I1
I1
00
I5
860332484
ENSBTAG00000011433
RGP1
I3
0I1
10
I3
865834564
ENSBTAG00000035708
LOC527795
00
I1
00
I1
870410155
ENSBTAG00000005466
C8H8orf58
I3
I3
I1
00
I7
876029774
ENSBTAG00000015027
Clusterin
00
I1
10
0
877002890
ENSBTAG00000012921
KIF24
12
I1
00
2
887279862
ENSBTAG00000002220
SPTLC1
00
10
01
8111761816
ENSBTAG00000006532
CDK5RAP2
I3
I3
11
1I3
8112841695
ENSBTAG00000013838
ALLC
01
I1
00
0
98388881
ENSBTAG00000015335
BAI3
00
I1
00
I1
98388924
ENSBTAG00000015335
BAI3
00
I1
00
I1
923792838
ENSBTAG00000046593
Uncharacterized
03
00
03
951013392
ENSBTAG00000018528
USP45
02
I1
00
1
9104739655
ENSBTAG00000018810
THBS2
03
10
15
9105157029
ENSBTAG00000000918
ERMARD
I3
21
00
0
10
1712515
ENSBTAG00000014750
EPB41L4A
0I1
I1
I1
0I3
10
15843003
ENSBTAG00000009030
NOX5
I3
I3
I1
00
I7
10
28022352
ENSBTAG00000035309
OR4F15
I3
1I1
00
I3
10
28023110
ENSBTAG00000035309
OR4F15
I3
I1
I1
00
I5
10
82903121
ENSBTAG00000012565
PCNX
I3
2I1
01
I1
10
85173711
ENSBTAG00000011683
NUM
BI3
0I1
00
I4
11
46305767
ENSBTAG00000002288
NT5DC4
0I3
I1
00
I4
11
46753903
ENSBTAG00000019072
PSD4
0I1
I1
00
I2
144
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
11
47409577
ENSBTAG00000009033
TEX37
I3
I2
I1
10
I5
11
59791410
ENSBTAG00000006027
USP34
02
I1
01
211
78322010
ENSBTAG00000015490
HS1BP3
00
I1
00
I1
11
87943875
ENSBTAG00000001140
IAH1
I3
3I1
10
011
95486109
ENSBTAG00000017576
GPR144
I3
I3
I1
00
I7
11
99453894
ENSBTAG00000038831
PHYHD1
11
I1
00
111
99461001
ENSBTAG00000010601
DOLK
12
10
04
12
11906181
ENSBTAG00000001133
VWA8
I3
1I1
00
I3
12
12478433
ENSBTAG00000014752
AKAP11
1I3
I1
I1
0I4
12
12480845
ENSBTAG00000014752
AKAP11
1I1
I1
I1
0I2
12
14136897
ENSBTAG00000016691
LACC1
I3
I3
I1
00
I7
12
53049081
ENSBTAG00000032821
SCEL
I3
I3
I1
00
I7
12
90761209
ENSBTAG00000019961
ATP4B
I3
I3
I1
00
I7
13
3815664
ENSBTAG00000034991
SLX4IP
00
I1
00
I1
13
11787934
ENSBTAG00000008650
CAMK
1D
I3
0I1
00
I4
13
11787938
ENSBTAG00000008650
CAMK
1D
I3
0I1
00
I4
13
52471684
ENSBTAG00000013753
DDRGK1
02
I1
00
113
53934930
ENSBTAG00000039520
SIRPB1
0I3
I1
00
I4
13
53934947
ENSBTAG00000039520
SIRPB1
00
I1
00
I1
13
53939853
ENSBTAG00000039520
SIRPB1
0I3
I1
00
I4
13
54540425
ENSBTAG00000047150
RTEL
I3
0I1
00
I4
13
55047740
ENSBTAG00000031718
OGFR
01
I1
00
013
59988205
ENSBTAG00000027390
RTFDC1
01
I1
00
013
63045750
ENSBTAG00000009144
BPIFA2A
0I3
I1
00
I4
145
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
13
63169610
ENSBTAG00000019752
BPIFA2B
0I1
I1
00
I2
13
65396708
ENSBTAG00000006021
CEP250
0I3
I1
00
I4
14
1977494
ENSBTAG00000026350
SPATC1
I3
I3
I1
00
I7
14
3640695
ENSBTAG00000015985
GPR20
00
I1
00
I1
14
35568945
ENSBTAG00000037399
Uncharacterized
30
I1
00
2
14
46885750
ENSBTAG00000047661
FABP12
0I1
I1
00
I2
14
57184022
ENSBTAG00000026247
PKHD1L1
I3
3I1
00
I1
14
68635363
ENSBTAG00000017165
MATN2
01
I1
00
0
15
182679
ENSBTAG00000046228
Olfactoryreceptor
00
I1
0o
I1
15
183317
ENSBTAG00000046228
Olfactoryreceptor
00
I1
00
I1
15
183404
ENSBTAG00000046228
Olfactoryreceptor
00
I1
00
I1
15
248692
ENSBTAG00000045558
LOC100299084
00
I1
0o
I1
15
30189209
ENSBTAG00000005357
VPS11
02
I1
00
1
15
35048788
ENSBTAG00000005340
SERGEF
0I3
I1
00
I4
15
46301941
ENSBTAG00000002289
NLRP14
I3
I3
I1
00
I7
15
75033140
ENSBTAG00000012053
ACCSL
1I3
I1
00
I3
15
75037645
ENSBTAG00000012053
ACCSL
I3
I3
I1
00
I7
15
80244489
ENSBTAG00000046527
LOC100139830
I3
0I1
00
I4
15
80255710
ENSBTAG00000046285
OR8I2
10
I1
00
0
15
80288759
ENSBTAG00000001291
OR8H3
1I1
I1
00
I1
15
80389745
ENSBTAG00000045709
LOC514057
13
I1
00
3
15
80485593
ENSBTAG00000035985
LOC101909743OR8K1
1I1
I1
00
I1
15
80555931
ENSBTAG00000022858
LOC783561
I3
0I1
00
I4
15
80949689
ENSBTAG00000039110
OR8U1
1I3
I1
00
I3
146
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
15
82754556
ENSBTAG00000021375
OR10Q1
01
I1
00
015
82925332
ENSBTAG00000021233
OR5B17
I3
0I1
00
I4
16
4562017
ENSBTAG00000019799
FCAM
RI3
I3
I1
00
I7
16
38319636
ENSBTAG00000014794
SCYL3
I3
0I1
00
I4
16
62574355
ENSBTAG00000035226
TOR1AIP1
00
I1
00
I1
17
53095395
ENSBTAG00000020414
DHX37
1I1
I1
00
I1
17
53112655
ENSBTAG00000020414
DHX37
0I3
I1
00
I4
17
66091531
ENSBTAG00000010847
FOXN4
0I1
I1
00
I2
17
72377532
ENSBTAG00000008996
SFI1
0I3
I1
00
I4
18
25623550
ENSBTAG00000005170
GPR114
0I3
I1
01
I3
18
34956513
ENSBTAG00000001286
ELMO
33
3I1
11
718
46372041
ENSBTAG00000015912
DMKN
0
1I1
00
018
52154806
ENSBTAG00000001262
IRGQ
I3
1I1
00
I3
18
54081277
ENSBTAG00000009337
PNMA
L1
0I3
I1
00
I4
18
54130955
ENSBTAG00000048021
PNMA
L2
I3
1I1
00
I3
18
57819768
ENSBTAG00000003231
LIM2
1
I3
10
0I1
19
21444103
ENSBTAG00000011011
SSH2
I3
I3
I1
00
I7
19
31072152
ENSBTAG00000022509
DNAH
9I3
0I1
00
I4
19
31113972
ENSBTAG00000022509
DNAH
90
0I1
00
I1
19
32790843
ENSBTAG00000015294
COX10
12
10
04
19
42251785
ENSBTAG00000013685
KRT31
I3
0I1
00
I4
19
51395194
ENSBTAG00000015980
FASN
1I1
10
01
19
56191175
ENSBTAG00000007910
EXOC7
3I3
I1
00
I1
19
57266251
ENSBTAG00000011387
NAT9
12
I1
00
2
147
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
20
7639801
ENSBTAG00000005633
RGNEF
1I1
I1
10
0
20
23410431
ENSBTAG00000008871
DDX4
1I3
I1
00
I3
20
58884236
ENSBTAG00000005514
TRIO
I3
0I1
00
I4
20
64171739
ENSBTAG00000005021
SEMA5A
01
10
02
21
6239119
ENSBTAG00000016495
LINS
02
10
03
21
20345878
ENSBTAG00000012326
LOC101906218
00
I1
00
I1
21
20346289
ENSBTAG00000012326
LOC101906218
I3
0I1
00
I4
21
26837733
ENSBTAG00000047587
Uncharacterized
10
00
01
21
31003599
ENSBTAG00000047166
SH2D7
0I3
I1
00
I4
21
55274195
ENSBTAG00000039117
FAM179B
0I3
I1
00
I4
21
55826463
ENSBTAG00000014001
MAP1A
0I3
I1
00
I4
21
58173384
ENSBTAG00000009836
CHGA
I3
I1
I1
00
I5
21
59120719
ENSBTAG00000046590
FAM181A
01
I1
00
0
21
60482484
ENSBTAG00000017096
ERICH1
0I3
I1
00
I4
21
60482510
ENSBTAG00000017096
ERICH1
01
I1
00
0
21
62843142
ENSBTAG00000012833
ATG2B
0I3
I1
00
I4
22
52415959
ENSBTAG00000015839
MAP4
00
I1
00
I1
22
52421734
ENSBTAG00000015839
MAP4
1I1
I1
00
I1
22
52422046
ENSBTAG00000015839
MAP4
I3
I3
I1
00
I7
22
52426104
ENSBTAG00000047174
Uncharacterized
I3
00
00
I3
22
52426668
ENSBTAG00000047174
Uncharacterized
10
00
01
22
56951657
ENSBTAG00000021645
MBD4
1I1
I1
00
I1
22
56974847
ENSBTAG00000021643
EFCAB12
0I1
I1
00
I2
22
57797674
ENSBTAG00000031115
Uncharacterized
01
00
01
148
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
23
45885052
ENSBTAG00000005248
OFCC1
00
I1
00
I1
24
21324604
ENSBTAG00000000998
SLC39A6
02
I1
00
1
24
21453364
ENSBTAG00000011622
C18orf21
I3
I3
I1
00
I7
24
46764909
ENSBTAG00000015210
LOXHD1
0I3
10
0I2
24
50327195
ENSBTAG00000002660
CCDC11
I3
11
01
0
24
58158178
ENSBTAG00000014284
ALPK2
I3
I1
I1
00
I5
24
58158620
ENSBTAG00000014284
ALPK2
0I1
I1
00
I2
24
58158784
ENSBTAG00000014284
ALPK2
I3
I3
I1
00
I7
24
58204003
ENSBTAG00000014284
ALPK2
I3
I3
I1
00
I7
24
58204479
ENSBTAG00000014284
ALPK2
I3
2I1
00
I2
25
37433180
ENSBTAG00000040568
Uncharacterized
00
I1
00
I1
26
18116487
ENSBTAG00000046239
C10orf12
0I1
I1
00
I2
26
25715996
ENSBTAG00000010522
Uncharacterized
10
00
01
26
25764570
ENSBTAG00000046727
Olfactoryreceptor
00
I1
00
I1
27
5031078
ENSBTAG00000007340
LOC101906349
0I1
I1
00
I2
27
32832557
ENSBTAG00000011684
RAB11FIP1
10
I1
00
0
28
284228
ENSBTAG00000000079
CCSAP
0I1
I1
00
I2
28
1690403
ENSBTAG00000048153
Uncharacterized
10
00
01
28
35718807
ENSBTAG00000047317
CL43
1I1
I1
00
I1
28
44081765
ENSBTAG00000032527
ERCC6
I3
01
00
I2
29
1985881
ENSBTAG00000004081
FAT3
30
I1
00
2
29
7530523
ENSBTAG00000046360
Uncharacterized
01
00
01
29
7530663
ENSBTAG00000046360
Uncharacterized
1I1
00
00
29
26397931
ENSBTAG00000013568
UEVLD
I3
I3
I1
00
I7
149
Interestingly13ofthe22genesassociatedtohumansyndromescarryvariants
appearing in the short list of the 35 positives and only 6 in that of 135 with
negative values This in spite of the low weight (from 1 to 1) given to this
parameterEightofthe35positivevariantsareinproteinsstilluncharacterized
Sixvariantshadascoreof4orhigherThehighestscore(totalscore=7)wasfor
ELMO3 (Engulfment and cell motility gene 3) This gene is involved in cell
motility and migration ELMO3 is also a homolog of C( elegans Ced12 that is
required for many developmental processes included gonadal morphogenesis
Mutant for this gene in C( elegans suffer from abnormal development usually
with lethal consequences or sublethal developmental defects including small
embryosdelayeddevelopmentandreducedfertility(Gumiennyetal2001)In
human it is expressed in testis prostate and ovary
(httpwwwgenecardsorgcgibincarddispplgene=ELMO3)
ThegeneDNAJC13(DnaJ(Hsp40)homologsubfamilyCmember13)hadatotal
score of 5 This gene is involved in the onset of Parkinson disease DNAJC13
regulates the dynamics of clathrin coats on early endosomes Cellular analysis
shows that the mutation Arg855Ser identified by VilarintildeoGuumlell et al (2014)
confers a gainoffunction that impairs endosomal transportApossible roleof
DNAJC13geneinTourettesyndromechronicticdisorderisunderinvestigation
(Sundarametal2011)
A second genehad a scoreof 5THBS2 (Thrombospondin2) It belongs to the
thrombospondin family and is a powerful inhibitor of tumour growth and
angiogenesis It is highly expressed in testis in fetal Leydig cell Hirose et al
(2008) found a significant association between an intronic SNP in the THBS2
gene and lumbar disc herniation Thrombospondin2 is also a matricellular
proteinfoundinhumanserumDeletionofTSP2causesagedependentdilated
cardiomyopathy (Hanatani et al 2014) Mutant mice for this gene display a
variety of mutant phenotype Kyriakides et al 1998 suggest that THBS2
modulates the cell surface properties of mesenchymal cells affecting cell
functionssuchasadhesionandmigration
150
Three genes had a score of 4 all three are associated to human syndromes
Mutation in HERC2 (HECT and RLD domain containing E3 ubiquitin protein
ligase2)isinvolvedinautosomalrecessivementalretardation38(Puffenberger
etal2012)Jietal(2000)identifiedinmutantmouseneuromuscularsecretory
vesicledefectsandspermacrosomedefectsMoreovergeneticvariationsinthis
gene are associated with skinhaireye pigmentation variability Mutations in
DOLK gene (dolichol kinase) are associated with dolichol kinase deficiency
(Lefeberetal2011)COX10(cytochromecoxidaseassemblyhomolog10)gene
is associated to mitochondrial complex IV deficiency causing Leigh syndrome
(Antonickaetal2003)
Conclusions)
Exomesequencingthereforeprovidedvaluableinformationoncodingsequence
variants and on their potential effect on protein function As a group variants
classifiedasdeleteriouswererarerthanvariantsclassifiedasneutralSincethis
isanexpectedeffectofpurifyingselectionthissetofvariantwaslikelyenriched
forvariantshavingaphenotypiceffect
Theinclusionintheanalysesofplusandminusvariantanimalsforfertilitytraits
likelypermittedtosequencethemostimportantvariantsinfluencingthesetraits
butthelownumberofanimalssequencedcombinedwiththecomplexityofthe
traitpermittedtofindonlyasuggestiveassociationbetweenSNPsandtraitsItrsquos
likely that amuch larger number of individuals is to be sequenced if genome
wideassociationisthestrategyoneswanttopursue
We explored an alternative strategy to confirm suggestivedeleterious variants
based on indirect evidence of the variant effect by assessing i) evolutionary
constraintsassociatedtotheaminoacidresiduecodedbythevariantii)variant
behaviour in a large population In the absence of direct genotyping of the
candidatevariantsweusedhaplotypeinformationassurrogateiii)functionand
effectsinotherspecies
151
We attempted to quantify these criteria through a score subjective and only
indicativebutpermitted tohighlighted35genes so farneglected in cattlebut
deservingfurtherattentionandpotentiallyhavingdeleteriouseffectsinHolstein
andothercattlebreedsThesemaybepartoftherecessivedeleterioussetthat
constitutethegeneticloadofItalianHolstein
References)AntonickaHLearySCGuercinGHAgarJNHorvathRKennawayNG
HardingCOJakschMShoubridgeEA2003MutationsinCOX10
resultinadefectinmitochondrialhemeAbiosynthesisandaccountfor
multipleearlyonsetclinicalphenotypesassociatedwithisolatedCOX
deficiencyHumMolGenet122693ndash2702doi101093hmgddg284
AshwellMSHeyenDWSonstegardTSVanTassellCPDaYVanRaden
PMRonMWellerJILewinHA2004DetectionofQuantitativeTrait
LociAffectingMilkProductionHealthandReproductiveTraitsin
HolsteinCattleJournalofDairyScience87468ndash475
doi103168jdsS00220302(04)731860
AstonKICarrellDT2009GenomeWideStudyofSingleNucleotide
PolymorphismsAssociatedWithAzoospermiaandSevere
OligozoospermiaJournalofAndrology30711ndash725
doi102164jandrol109007971
AulchenkoYSRipkeSIsaacsADuijnCMvan2007GenABELanRlibrary
forgenomewideassociationanalysisBioinformatics231294ndash1296
doi101093bioinformaticsbtm108
BamshadMJNgSBBighamAWTaborHKEmondMJNickersonDA
ShendureJ2011ExomesequencingasatoolforMendeliandisease
genediscoveryNatRevGenet12745ndash755doi101038nrg3031
BieseckerLGShiannaKVMullikinJC2011Exomesequencingtheexpert
viewGenomeBiology12128doi101186gb2011129128
152
BiffaniSMarusiMBiscariniFCanavesiF2005Developingagenetic
evaluationforfertilityusingangularityandmilkyieldascorrelatedtraits
InterbullBulletin063
BiffaniSSamoreacuteABCanavesiF2002Inbreedingdepressionforproduction
reproductionandfunctionaltraitsinItalianHolsteincattleInstitut
NationaldelaRechercheAgronomique(INRA)pp0ndash4
BolgerAMLohseMUsadelB2014TrimmomaticAflexibletrimmerfor
IlluminaSequenceDataBioinformaticsbtu170
doi101093bioinformaticsbtu170
BrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingand
MissingDataInferenceforWholeGenomeAssociationStudiesByUseof
LocalizedHaplotypeClusteringAmJHumGenet811084ndash1097
CharlesworthDWillisJH2009ThegeneticsofinbreedingdepressionNat
RevGenet10783ndash796doi101038nrg2664
ChunSFayJC2009Identificationofdeleteriousmutationswithinthree
humangenomesGenomeRes191553ndash1561
doi101101gr092619109
ConsortiumTEP2012AnintegratedencyclopediaofDNAelementsinthe
humangenomeNature48957ndash74doi101038nature11247
DePristoMABanksEPoplinRGarimellaKVMaguireJRHartlC
PhilippakisAAdelAngelGRivasMAHannaMMcKennaA
FennellTJKernytskyAMSivachenkoAYCibulskisKGabrielSB
AltshulerDDalyMJ2011Aframeworkforvariationdiscoveryand
genotypingusingnextgenerationDNAsequencingdataNatGenet43
491ndash498doi101038ng806
GoymerP2007SynonymousmutationsbreaktheirsilenceNatRevGenet8
92ndash92doi101038nrg2056
GumiennyTLBrugneraEToselloTrampontACKinchenJMHaneyLB
NishiwakiKWalkSFNemergutMEMacaraIGFrancisRSchedl
TQinYVanAelstLHengartnerMORavichandranKS2001CED
12ELMOaNovelMemberoftheCrkIIDock180RacPathwayIs
RequiredforPhagocytosisandCellMigrationCell10727ndash41
doi101016S00928674(01)005207
153
HanataniSIzumiyaYTakashioSKimuraYArakiSRokutandaTTsujita
KYamamotoETanakaTYamamuroMKojimaSTayamaS
KaikitaKHokimotoSOgawaH2014Circulatingthrombospondin2
reflectsdiseaseseverityandpredictsoutcomeofheartfailurewith
reducedejectionfractionCircJ78903ndash910
HiroseYChibaKKarasugiTNakajimaMKawaguchiYMikamiY
FuruichiTMioFMiyakeAMiyamotoTOzakiKTakahashiA
MizutaHKuboTKimuraTTanakaTToyamaYIkegawaS2008
AfunctionalpolymorphisminTHBS2thataffectsalternativesplicingand
MMPbindingisassociatedwithlumbardischerniationAmJHum
Genet821122ndash1129doi101016jajhg200803013
httpsgenomeucsceduindexhtml[WWWDocument]nd
JamrozikJFatehiJKistemakerGJSchaefferLR2005EstimatesofGenetic
ParametersforCanadianHolsteinFemaleReproductionTraitsJournalof
DairyScience882199ndash2208doi103168jdsS00220302(05)728952
JanisEWiggintonDJC2005AnoteonexacttestsofHardyWeinberg
equilibriumAmericanjournalofhumangenetics76887ndash93
doi101086429864
JiYRebertNAJoslinJMHigginsMJSchultzRANichollsRD2000
StructureoftheHighlyConservedHERC2GeneandofMultiplePartially
DuplicatedParalogsinHumanGenomeRes10319ndash329
KentWJSugnetCWFureyTSRoskinKMPringleTHZahlerAM
HausslerandD2002TheHumanGenomeBrowseratUCSCGenome
Res12996ndash1006doi101101gr229102
KumarPHenikoffSNgPC2009Predictingtheeffectsofcodingnon
synonymousvariantsonproteinfunctionusingtheSIFTalgorithmNat
Protoc41073ndash1081doi101038nprot200986
KyriakidesTRZhuYHSmithLTBainSDYangZLinMTDanielson
KGIozzoRVLaMarcaMMcKinneyCEGinnsEIBornsteinP
1998Micethatlackthrombospondin2displayconnectivetissue
abnormalitiesthatareassociatedwithdisorderedcollagenfibrillogenesis
anincreasedvasculardensityandableedingdiathesisJCellBiol140
419ndash430
154
LaneEACroweMABeltmanMEMoreSJ2013Theinfluenceofcowand
managementfactorsonreproductiveperformanceofIrishseasonal
calvingdairycowsAnimReprodSci14134ndash41
doi101016janireprosci201306019
LefeberDJdeBrouwerAPMMoravaERiemersmaMSchuurs
HoeijmakersJHMAbsmannerBVerrijpKvandenAkkerWMR
HuijbenKSteenbergenGvanReeuwijkJJozwiakAZuckerN
LorberALammensMKnopfCvanBokhovenHGruumlnewaldSLehle
LKapustaLMandelHWeversRA2011AutosomalRecessive
DilatedCardiomyopathyduetoDOLKMutationsResultsfromAbnormal
DystroglycanOMannosylationPLoSGenet7e1002427
doi101371journalpgen1002427
LiHDurbinR2009FastandaccurateshortreadalignmentwithBurrows
WheelertransformBioinformatics251754ndash1760
doi101093bioinformaticsbtp324
LiHHandsakerBWysokerAFennellTRuanJHomerNMarthG
AbecasisGDurbinR1000GenomeProjectDataProcessingSubgroup
2009TheSequenceAlignmentMapformatandSAMtoolsBioinformatics
252078ndash2079doi101093bioinformaticsbtp352
LiMXKwanJSHBaoSYYangWHoSLSongYQShamPC2013
PredictingMendelianDiseaseCausingNonSynonymousSingle
NucleotideVariantsinExomeSequencingStudiesPLoSGenet9
e1003143doi101371journalpgen1003143
MacArthurDGBalasubramanianSFrankishAHuangNMorrisJWalter
KJostinsLHabeggerLPickrellJKMontgomerySBAlbersCA
ZhangZConradDFLunterGZhengHAyubQDePristoMA
BanksEHuMHandsakerRERosenfeldJFromerMJinMMuXJ
KhuranaEYeKKayMSaundersGISunerMMHuntTBarnes
IHAAmidCCarvalhoSilvaDRBignellAHSnowCYngvadottirB
BumpsteadSCooperDNXueYRomeroIGWangJLiYGibbs
RAMcCarrollSADermitzakisETPritchardJKBarrettJCHarrow
JHurlesMEGersteinMBTylerSmithC2012Asystematicsurvey
155
oflossoffunctionvariantsinhumanproteincodinggenesScience335
823ndash828doi101126science1215040
MajewskiJSchwartzentruberJLalondeEMontpetitAJabadoN2011
WhatcanexomesequencingdoforyouJMedGenet48580ndash589
doi101136jmedgenet2011100223
McClureMCBickhartDNullDVanRadenPXuLWiggansGLiuG
SchroederSGlasscockJArmstrongJColeJBVanTassellCP
SonstegardTS2014BovineExomeSequenceAnalysisandTargeted
SNPGenotypingofRecessiveFertilityDefectsBH1HH2andHH3Reveal
aPutativeCausativeMutationinSMC2forHH3PLoSONE9e92769
doi101371journalpone0092769
McClureMKimEBickhartDNullDCooperTColeJWiggansGAjmone
MarsanPColliLSantusELiuGESchroederSMatukumalliLVan
TassellCSonstegardT2013FineMappingforWeaverSyndromein
BrownSwissCattleandtheIdentificationof41ConcordantMutations
acrossNRCAMPNPLA8andCTTNBP2PLoSONE8e59251
doi101371journalpone0059251
McKennaAHannaMBanksESivachenkoACibulskisKKernytskyA
GarimellaKAltshulerDGabrielSDalyMDePristoMA2010The
GenomeAnalysisToolkitAMapReduceframeworkforanalyzingnext
generationDNAsequencingdataGenomeRes201297ndash1303
doi101101gr107524110
MiHDongQMuruganujanAGaudetPLewisSThomasPD2010
PANTHERversion7improvedphylogenetictreesorthologsand
collaborationwiththeGeneOntologyConsortiumNuclAcidsRes38
D204ndashD210doi101093nargkp1019
MrodeRKearneyJFBiffaniSCoffeyMCanavesiF2009Short
communicationGeneticrelationshipsbetweentheHolsteincow
populationsofthreeEuropeandairycountriesJournalofDairyScience
925760ndash5764doi103168jds20081931
PelosoGMAuerPLBisJCVoormanAMorrisonACStitzielNOBrody
JAKhetarpalSACrosbyJRFornageMIsaacsAJakobsdottirJ
FeitosaMFDaviesGHuffmanJEManichaikulADavisBLohman
156
KJoonAYSmithAVGroveMLZanoniPRedonVDemissieS
LawsonKPetersUCarlsonCJacksonRDRyckmanKKMackey
RHRobinsonJGSiscovickDSSchreinerPJMychaleckyjJC
PankowJSHofmanAUitterlindenAGHarrisTBTaylorKD
StaffordJMReynoldsLMMarioniREDehghanAFrancoOHPatel
APLuYHindyGGottesmanOBottingerEPMelanderOOrho
MelanderMLoosRJFDugaSMerliniPAFarrallMGoelA
AsseltaRGirelliDMartinelliNShahSHKrausWELiMRader
DJReillyMPMcPhersonRWatkinsHArdissinoDZhangQWang
JTsaiMYTaylorHACorreaAGriswoldMELangeLAStarrJM
RudanIEiriksdottirGLaunerLJOrdovasJMLevyDChenYDI
ReinerAPHaywardCPolasekODearyIJBoreckiIBLiuY
GudnasonVWilsonJGvanDuijnCMKooperbergCRichSSPsaty
BMRotterJIOrsquoDonnellCJRiceKBoerwinkleEKathiresanS
CupplesLA2014AssociationofLowFrequencyandRareCoding
SequenceVariantswithBloodLipidsandCoronaryHeartDiseasein
56000WhitesandBlacksTheAmericanJournalofHumanGenetics94
223ndash232doi101016jajhg201401009
PryceJEVeerkampRFThompsonRHillWGSimmG1997Genetic
aspectsofcommonhealthdisordersandmeasuresoffertilityinHolstein
FriesiandairycattleAnimalScience65353ndash360
doi101017S1357729800008559
PryceJEWoolastonRBerryDPWallEWintersMButlerRShafferM
2014WorldTrendsinDairyCowFertilityin10ThWorldCongressof
GeneticsAppliedtoLivestockProduction
PuffenbergerEGJinksRNWangHXinBFiorentiniCShermanEA
DegrazioDShawCSougnezCCibulskisKGabrielSKelleyRI
MortonDHStraussKA2012Ahomozygousmissensemutationin
HERC2associatedwithglobaldevelopmentaldelayandautismspectrum
disorderHumMutat331639ndash1646doi101002humu22237
PurcellSNealeBToddBrownKThomasLFerreiraMARBenderD
MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKA
157
ToolSetforWholeGenomeAssociationandPopulationBasedLinkage
AnalysesAmJHumGenet81559ndash575
SahanaGNielsenUSAamandGPLundMSGuldbrandtsenB2013Novel
HarmfulRecessiveHaplotypesIdentifiedforFertilityTraitsinNordic
HolsteinCattlePLoSONE8e82909doi101371journalpone0082909
SchuumltzEScharfensteinMBrenigB2008Implicationofcomplexvertebral
malformationandbovineleukocyteadhesiondeficiencyDNAbased
testingondiseasefrequencyintheHolsteinpopulationJDairySci91
4854ndash4859doi103168jds20081154
SimmonsMJCrowJF1977MutationsAffectingFitnessinDrosophila
PopulationsAnnualReviewofGenetics1149ndash78
doi101146annurevge11120177000405
SoggiuAPirasCHusseinHACanioMDGaviraghiAGalliAUrbaniA
BonizziLRoncadaP2013UnravellingthebullfertilityproteomeMol
BioSyst91188ndash1195doi101039C3MB25494A
StitzielNOKiezunASunyaevS2011Computationalandstatistical
approachestoanalyzingvariantsidentifiedbyexomesequencing
GenomeBiology12227doi101186gb2011129227
SundaramSKHuqAMSunZYuWBennettLWilsonBJBehenME
ChuganiHT2011ExomesequencingofapedigreewithTourette
syndromeorchronicticdisorderAnnNeurol69901ndash904
doi101002ana22398
VanRadenPMOlsonKMNullDJHutchisonJL2011Harmfulrecessive
effectsonfertilitydetectedbyabsenceofhomozygoushaplotypesJournal
ofDairyScience946153ndash6161doi103168jds20114624
VilarintildeoGuumlellCRajputAMilnerwoodAJShahBSzuTuCTrinhJYuI
EncarnacionMMunsieLNTapiaLGustavssonEKChouP
TatarnikovIEvansDMPishottaFTVoltaMBeccanoKellyD
ThompsonCLinMKShermanHEHanHJGuentherBL
WassermanWWBernardVRossCJAppelCresswellSStoesslAJ
RobinsonCADicksonDWRossOAWszolekZKAaslyJOWuR
MHentatiFGibsonRAMcPhersonPSGirardMRajputMRajput
158
AHFarrerMJ2014DNAJC13mutationsinParkinsondiseaseHumMolGenet231794ndash1801doi101093hmgddt570
WangKLiMHakonarsonH2010ANNOVARfunctionalannotationofgeneticvariantsfromhighthroughputsequencingdataNuclAcidsRes38e164ndashe164doi101093nargkq603
WathesDCPollottGEJohnsonKFRichardsonHCookeJS2014HeiferfertilityandcarryoverconsequencesforlifetimeproductionindairyandbeefcattleAnimal8Suppl191ndash104doi101017S1751731114000755
YangYMuznyDMReidJGBainbridgeMNWillisAWardPABraxtonABeutenJXiaFNiuZHardisonMPersonRBekheirniaMRLeducMSKirbyAPhamPScullJWangMDingYPlonSELupskiJRBeaudetALGibbsRAEngCM2013ClinicalWhole
ExomeSequencingfortheDiagnosisofMendelianDisordersNewEnglandJournalofMedicine3691502ndash1511doi101056NEJMoa1306555
ZiminAVDelcherALFloreaLKelleyDRSchatzMCPuiuDHanrahanFPerteaGTassellCPVSonstegardTSMarccedilaisGRobertsMSubramanianPYorkeJASalzbergSL2009Awholegenome
assemblyofthedomesticcowBostaurusGenomeBiology10R42doi101186gb2009104r42
159
Supplementary)files)
YoucanfindChapter6supplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter06)orscantheQRcode
)
Supplementary)Figure)S1)Coverage)analyses)
A)Averagecoverageperanimals
B)Boxplotofchromosomescoverageperanimal
C)BoxplotofanimalcoverageperchromosomeBluelinerepresenttheexpected
50Xcoverage
Supplementary)Figure)S2)Distribution)of)variant)sequencing)depth)
)
Supplementary) Figure) S3) Correlation) between) sequencing) depth) and)
number) of) discordant) genotypes) In) red) values) for) two) outlier) animals)
having)concordance)lt846))
)
Supplementary)Figure)S4)Multidimensional)scaling)of)Italian)Holstein)bulls)
sequenced))
)
Supplementary)Figure)S5)Multidimensional)scaling)of)Italian)Holstein)bulls)
analysed)with)the)800K)SNPchip))
160
Supplementary) Figure) S6) MAF) distribution) on) neutral) (blue)) and)
deleterious)(red))variants))
)
Supplementary) Table) S1) EBV) distribution) on) Italian) Holstein) population)
and)SNPchip)dataset)
Sheet1EBVthresholdvaluetodefineplusandminustailforPROT(kgprotein)
SCC(somaticcellcount)FERT(fertility)andLONG(functional longevity) All
ItalianHolsteinFAbullswereincludeinthisanalysis
Sheet2Numberofdatasetanimalsineachtailforeachtrait
Supplementary)Table)S2)Exome)probes)statistics)number)and)bp)covered)
per)chromosomes)
Supplementary) Table) S3) Concordance) between) animal) genotypes) in)
sequencing)and)SNPchip)data)
Error1 HD data homozygotes ndash Exome data heterozygotes Error2 HD data
heterozygotesndashExomedatahomozygotes
Supplementary) Table) S4) List) of) suggestive) genes) deriving) from) the)
analysis)of)exomes)of)animals) in)the)tails)of)male)and)female)fertility)EBV)
distributions))
Header(code
bull Bfreq_totFrequencyofBalleleinentiredataset
bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset
bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset
bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset
bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset
bull CallRate_totSNPcallratendashentiredataset
bull CallRate_MHSNPcallratendashhighmalefertilitydataset
bull CallRate_MLSNPcallratendashlowmalefertilitydataset
bull CallRate_FHSNPcallratendashhighfemalefertilitydataset
bull CallRate_FLSNPcallratendashhighmalefertilitydataset
bull HWeqHardyWeinbergequilibriumpvalue
bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele
bull Geno_HetNumberofanimalswithheterozygotegenotype
161
bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele
bull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)dataset
bull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)dataset
bull LOW_CRSNPcallratefromlowfertility(maleandfemale)dataset
bull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallele
bull LOW_Het Number of animals from low fertility (male and female) dataset with
heterozygotegenotype
bull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallele
bull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and
female)dataset
bull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)dataset
bull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)dataset
bull HIGH_CRSNPcallratefromhighfertility(maleandfemale)dataset
bull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallele
bull HIGH_Het Number of animals from high fertility (male and female) dataset with
heterozygotegenotype
bull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallele
bull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and
female)dataset
bull Male_TREND inheritance models implemented in PLINK trend test on male fertility
dataset
bull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale
fertilitydataset
bull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male
fertilitydataset
bull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility
dataset
bull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on
femalefertilitydataset
bull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale
fertilitydataset
bull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male
andfemalefertilitydataset
bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test
onmaleandfemalefertilitydataset
162
bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston
maleandfemalefertilitydataset
bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset
bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset
bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset
Supplementary)Table) S5) Regions) candidate) to) carry) deleterious) variants)
identified)by)OHW)approach))
Supplementary)Table) S6) Regions) candidate) to) carry) deleterious) variants)
identify)by)LOH)approach))
Ingreyregionsremovedinfurtheranalysis
Supplementary)Table)S7)List)of)candidate)deleterious)variants)mapping)in)
OHW)and)LOH)regions))Header(code
bull Bfreq_totFrequencyofBalleleinentiredataset
bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset
bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset
bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset
bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset
bull CallRate_totSNPcallratendashentiredataset
bull CallRate_MHSNPcallratendashhighmalefertilitydataset
bull CallRate_MLSNPcallratendashlowmalefertilitydataset
bull CallRate_FHSNPcallratendashhighfemalefertilitydataset
bull CallRate_FLSNPcallratendashhighmalefertilitydataset
bull HWeqHardyWeinbergequilibriumpvalue
bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele
bull Geno_HetNumberofanimalswithheterozygotegenotype
bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele
bull HW_insideDeleteriousmutationsinsideldquoOutofHardyWeinbergrdquoregion
bull HW_outsideDeleteriousmutationsoutsideldquoOutofHardyWeinbergrdquoregion
bull DH_insideDeleteriousmutationsinsideldquoLackofHomozygosityrdquoregion
bull DH_outsideDeleteriousmutationsoutsideldquoLackofHomozygosityrdquoregion
163
bull GeneClass Classifications of gene 1 deleterious mutations inside OHW and LOHregion2deleteriousmutationsinsideOHWorLOHregion
bull 2deleteriousmutationsoutsideOHWandorLOHregionbull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)datasetbull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)datasetbull LOW_CRSNPcallratefromlowfertility(maleandfemale)datasetbull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallelebull LOW_Het Number of animals from low fertility (male and female) dataset with
heterozygotegenotypebull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallelebull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and
female)datasetbull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)datasetbull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)datasetbull HIGH_CRSNPcallratefromhighfertility(maleandfemale)datasetbull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallelebull HIGH_Het Number of animals from high fertility (male and female) dataset with
heterozygotegenotypebull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallelebull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and
female)datasetbull Male_TREND inheritance models implemented in PLINK trend test on male fertility
datasetbull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale
fertilitydatasetbull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male
fertilitydatasetbull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility
datasetbull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on
femalefertilitydatasetbull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale
fertilitydatasetbull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male
andfemalefertilitydataset
164
bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test
onmaleandfemalefertilitydataset
bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston
maleandfemalefertilitydataset
bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset
bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset
bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset
Supplementary)Table)S8)Results)of)haplotype)analysis)))Header(code
bull Chromchromosome
bull PosphysicalpositiononUMD31genome
bull Exome_homonumberofhomozygotefordeleteriousmutationsequencedanimals
bull Exome_hetnumberofheterozygotesequencedanimalsfordeleteriousmutation
bull Outcome NotFound It was impossible to associated a haplotype to the mutation
NotOpt Non completely invariant haplotype could be associated to themutationOK
Putativehaplotype associate tomutationwas found In few case alternativehaplotype
wasreported
bull LenlengthofhaplotypeinnumberofSNPs
bull Pop_numnumbersofselectedhaplotypedetectedinHolsteinpopulation
bull Pop_homo numbers of homozygote animals in Holstein population for selected
haplotype
bull Pop_het numbers of heterozygotes animals in Holstein population for selected
haplotype
bull ExpExpectednumberofhomozygoteanimal
bull Alternative if only one heterozygote sequenced animals for deleteriousmutationwas
foundalltwopossiblehaplotypeswheretested
Supplementary)Table)S9)Descriptive)results)for)the)Biological)Process)(BP)))
NumberofgenesbelongingtoBiologicalProcessGOterms)
Supplementary) Table) S10) Descriptive) results) for) the)Molecular) Function)
(MF))
NumberofgenesbelongingtoMolecularFunctionGOterms
165
CHAPTER7
Conclusion
Since the domestication event occurred in the Fertile Crescent around 10000yearsagothebovinespecieswasselectedbyhumankindtofulfilhisneedsThecreationofbreedsoccurredaround200yearsagointensifiedthedifferentiationbetween populations and promoted the development of intensive selectionschemesTraditionalAandnowadaysgenomicAselectionsareproducingpositivegenetictrendsinmanyproductivetraitsdespitethestillpoorknowledgeaboutphenotype biology A deeper understanding of genome organization and genebiology would increase the accuracy of genomic evaluation by incorporatingpriorknowledgeThisthesisexploresthebovinegenomebyhighAthroughputtechnologiesAsuchasNextGenerationSequencingandHighADensityGenotypingAwithestablishedand innovative procedures to investigate the biology of complex traits and toprovidenewdataandmoleculartoolstobovinebreedingThese goals have been achieved by complementary approaches and differentmethodsInChapter2selectionsignatureswereinvestigatedindependentlyintofive Italian breeds using Illumina BovineSNP50 BeadChip Then amultiAbreedapproach was applied clustering breeds into dairy (Italian Holstein ItalianBrownand Italian Simmental) andbeef (Marchigiana Piedmontese and ItalianSimmental) groups Regions under recent positive selection shared by breedswithasameproductiveaptitudewereinvestigatedinfurtherdetailbyretrievinggenes in the region and analysing their function by ontology and pathwayanalysis In the dairy group selection signals appear nearby genes related tomammarygland(metabolismandresistancetomastitis)andfeedingadaptationIn thebeefgroupgenes inselectedregionsare involved inanimalgrowthandmeatquality(textureandjuiciness)Hencecandidategenesidentifiedbelongtonetworks and pathways important in directing a breed towards either beef ordairyproductionIn Chapter 3 a GWAS approach was used in dairy cattle to correlate SNP tocomplex phenotypes having an economic value Three independent ldquoclassicalrdquosingleAmarkerregressionswereruninthreeItaliandairycattle(ItalianHolsteinItalianBrownandItalianSimmental)UsingthetraditionalsingleSNPapproach
168
only few SNPs were above the nominal genomeAwide significance thresholdwhile applying a geneAbased method identified significant candidate genesassociatedwithmilkphysiology andmammary glanddevelopment in all threepopulationsInterestinglygenesfoundindifferentbreedshavesimilarfunctionbut are not the same suggesting that breed history demography selectioncriteriaandstochasticeventsasgeneticdrifthaveanimpactonthedefinitionofthemainmoleculartargetofselectionschemesCurrent intensive selection strategies rapidly spread the genetic gain in thepopulationbutholdtheassociatedriskofdisseminatingdefectivegenescausinggenetic defects (eg mulefoot complex vertebral malformation and others) ordecreasing fertility In Chapter 6 different technologies and approaches werecombined to obtain filter and prioritize genes candidate to be deleteriousExome sequence datawere exploited to identify deleterious variants affectingItalianHolsteincattleinasetof20animalsextremeforfertilitytraitsMorethan5000 candidate deleterious SNP variants were identified To bypass the smallnumberlimitationandthelackofpowerofanystatisticanalysison20animalsastepwise approachwas used to first filter and then prioritize these candidatedeleterious variants Polymorphisms were retained if having asymmetricdistribution in bulls from opposite tails of fertility EBV distributions andormapping in genomic regions outlier for lack of homozygosity in a 1009 bullpopulation genotyped with the Illumina BovineHD BeadChip A total of 191variants in163genespassed theprocessesThesewereassessedby searchingadditional evidences in favour or against their deleterious effect throughdifferentapproachesi)checkingthevariantconservationacross100vertebratespecies by comparative genomics ii) finding haplotypes associated withcandidatemutationsandevaluatingtheirfrequencyinHolsteinsiii)integratinghumanandmouse functionalannotationsA scorewasproposed toweight theresult of these assessments and rank the variants 35 of them had positivescores These occurred in genes involved in basic biological mechanisms asfertilitydevelopmentand immunityandresultedenriched ingenesassociatedtogeneticdefectsreportedinhumanThesemutationsarestrongcandidatestobe recessive deleterious variants worth to be further investigated in a larger
169
populationtobetterunderstandtheirbiologyimpactandpossibleapplicationinbreedingTo increase the reliability of results based on genomic data and to reduce theimpact of subjective choices in data analysis new toolsapproaches weredeveloped(iethemultiAbreedapproachinSelectionSignaturendashChapter2thepostAGWAS geneAbased approach ndash Chapters 3 and 4) and the existent onescriticallyevaluated(imputationmethodsndashChapter5)In Chapter 4 the postAGWAS geneAbased method used in Chapter 3 wasimplemented in the MUGBAS (MUlti species GeneABased Association Suite)software In contrast to existing software MUGBAS is species and annotationfree AmultipleAtesting correctionwas included in the package in addition tocomputing parallelization for faster processing and plot representation forbetterunderstandingThesoftwareusessingleAmarkerGWASdataandagivenannotation to estimate geneAregionAwise association pAvalues As previouslymentioned in Chapter 3 the ldquoclassicalrdquo GWAS approach (single SNP) foundsignals only in ItalianHolsteinwhileMUGBAS geneAbased approach identifiedsignificantsignalsinallbreedsinvestigatedIn Chapter 5 the impact of different bovine reference sequences on genotypeimputation was investigated In this work Illumina BovineSNP50 BeadChipgenotypesof ItalianSimmentalbullswerereAmappedonthethreemostrecentreferencegenomeassembliesWecomparedfourdifferent imputationmethodsfromlow(IlluminaBovineLDv10Beadchip)tohighdensityThetestpopulationwasdividedintofoursubpopulationsonthebasisofrelationshipandgenotypedataavailabilityUpdatingSNPcoordinatesonthe three testedcattlereferencegenome assemblies determined only a slight variation on imputation resultswithinmethodsOntheotherhandlargedifferencesintermsofaccuracywereobservedamongthefourimputationmethodstestedGenetic structure of industrial breeds was evaluated assessing LinkageDisequilibrium (LD) Since LD decay is correlated to intensity of selection weexpected different behaviours in the breeds analysed Indeed as evident inChapter 2 ndash Figure 7 breeds subjected to lower selection intensity (ie
170
MarchigianaPiedmonteseandItalianSimmental)displayalowerpersistenceof
LDatgreaterdistancescomparedtothosesubjectedtohigherselectionpressure
(ieItalianHolsteinandItalianBrown)Chapter3investigatesLDintheDGAT1
region of dairy cattle breeds In Italian Simmental LD is lower compared to
Italian Holstein and Italian Brown In spite of this significant association
betweentheDGAT1regionandmilktraitswasfoundbothinItalianSimmental
andinItalianHolsteinInItalianBrownnoassociationwasfoundbecauseDGAT1
isfixed
Data and results presented in this thesis represent a clear example of the
progress of molecular technologies occurred in last few years spanning from
mediumdensitygenotypingtonextgenerationsequencing
Thebiologylandscapesofsomecomplextraits(iemilkproductionandquality
fertility)were improved finding candidate genes andpathwaysnotpreviously
associated to the phenotype in the bovine species Moreover insights on
deleterious mutations were inferred through a deep and integrated analysis
usingNGSdata
This new knowledge could be used to improve cattle breeding For example
deleteriousrecessivevariantscouldbeincludedinbullevaluationsbybreedersrsquo
associations to reduce the genetic load of deleterious genes in parallel to
improvingproductivity
While technological progress has been impressivewe are still only scratching
thesurfaceofanimalbiologyNewfrontiersarebeingopenedbywholegenome
sequencing of many animals (eg in the 1000 bull genome project) and by
stepping beyond sequencing (a number of epigenomic projects in animals are
following the footprints of the human ENCODE project) The trend is towards
unravelling complexity and data production ability is to be flanked by data
storingandprocessingandaboveallbynewmethodsandtoolsfordataanalysis
171
ACKNOWLEDGMENTS
ldquoNon egrave la destinazione ma il viaggio che contardquo Questi tre anni sono stati un gran bel viaggio di crescita personale e ricchi diesperienze Non egrave stato un viaggio solitarioma condiviso con vecchie e nuoveconoscenzechemisembradoverosomenzionareInnanzitutto vorrei ringraziare il prof Paolo AjmoneMarsan chemi ha dato lapossibilitagrave di fare questa esperienza Grazie per gli insegnamenti e per leopportunitagravechemihaoffertoSperodinonavertraditolesueaspettativeAgli attuali colleghi Lorenzo Stefano Licia Elisa Riccardo ed Elia e a quellipassati Nicola Fatima Raffaele Ezequiel e Barbara va il mio piugrave sincero edaffettuoso ringraziamento per aver condiviso anche solo in parte questomiopercorso Grazie per avermi sopportato (e non egrave poco) guidato (non solo nelcampolavorativo)efattovivereinunmodospecialelrsquoambientedilavoroUnsincerograzievaatuttiimieicolleghidelXXVIIcicloAgrisystemRicorderogravesempretuttiimomentichehopotutocondividereconvoi UngrazieancheachihafattopartedellamiaesperienzasvedeseGraziedicuoreaCarlFlavioChristianEricaatuttiiragazziitalianiatuttiicolleghildquosvedesirdquoea tutte le altre persone che ho conosciuto Mi avete fatto vivere appienolrsquoesperienzaesteraefattotornareacasaconqualcheconsapevolezzainpiugrave UnsentitoringraziamentoatuttiglialtricolleghiconcuihocollaboratoinquestianniRingrazio tutti gli amici e conoscenti che mi sono stati vicino per avermisupportatoesopportatoognunoamodoproprio Ultimomanonperimportanza ilringraziamentochevaatuttalamiafamigliaper avermi aiutato spronato sostenuto sempre e in particolare in questaesperienzaSemplicementegrazie PS Adirlatuttaancheladestinazionenonegravepoicosigravemale
173
- INDEX
- CHAPTER 1 - Introduction
- CHAPTER 2 - Relative extended haplotype homozygosity (rEHH) signals across breeds reveal dairy and beef specific signatures of selection
- CHAPTER 3 - Searching new signals for productive traits in three Italian cattle breeds
- CHAPTER 4 - MUGBAS a species free gene based ased association suite
- CHAPTER 5 - Imputation accuracy is robust to cattle reference genome updates
- CHAPTER 6 - Quest for deleterious recessive variants in Italian Holstein bulls combining exome sequencing and high density SNP analysis
- CHAPTER 7 - Conclusion
- ACKNOWLEDGMENTS
-
Alla$mia$famiglia$
A$mio$nonno$Mario$
INDEXamp
Chapteramp1ndashIntroduction 3
Chapteramp2Relativeextendedhaplotypehomozygosity(rEHH)signalsacrossbreedsrevealdairyandbeefspecificsignaturesofselection
11
Chapteramp3SearchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisinthreeItaliancattlebreeds
49
Chapteramp4ampMUGBASaspeciesfreegenebasedassociationsuite
81
Chapteramp5Imputationaccuracyisrobusttocattlereferencegenomeupdates
89
Chapteramp6QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis
99
Chapteramp7ndashConclusion 167
amp
Acknowledgmentsampamp 173
CHAPTER1
Introduction
Background
The bovine species counts almost 15 billion of heads classified in some 800
breeds worldwide (httpfaostat3faoorg) The species is adapted to a wide
varietyofdifferentenvironmentandhusbandrysystemsandisoftenrearedfor
multiple purposes (Taberlet et al 2008) draft power milk meat leather
productionandotheruses(Elsiketal2009)
European cattle (Bos$ taurus) derive from a single domestication event of the
auroch (Bos$primigenius)occurred in theFertileCrescent around10000years
ago(Taberletetal2011)FollowingdomesticationcattlefollowedtheNeolithic
expansionofagricultureandcolonisedEurope(seeAjmoneMMarsanetal2010
for a review) Many thousand years of natural and artificial selection have
shaped cattle phenotype and genotype to meet environmental conditions and
human needs and together with demographic events and genetic drift have
started thedifferentiation of the bovine species in diverse and locally adapted
populations This differentiation process accelerated in the last two centuries
whenthebreedconceptwasinventedsothatnowthisspeciescountshundreds
of different breeds with high diversity in coat colour body size production
aptitudeadaptationtoclimate tolerancetodiseaseetc(Taberletetal2008
Elsiketal2009TheBovineHapMapConsortium2009)
Thisdiversityisnowbeingprogressivelylostwiththeextinctionofmanylocal
populationsmainlyforeconomicreasonsSelectioneffortshavethereforebeen
focussedonafewoutperformingindustrialbreedsthatarepresentlyselectedfor
specialiseddairyorbeefproductionandsometimesdoublepurpose
In these breeds breeding schemes are well developed and have been
traditionally based on the genetic evaluation of sires and dams using data
collectedfromperformanceandprogenytestingandunbiasedlinearprediction
modelsThesearefoundedontheFisherinfinitesimalmodellingofquantitative
trait variation and need no genomic information to estimate breeding values
(EBVs) The genetic progress recorded using this approach has been relevant
particularly for traits having medium to high heritability In Italian Holstein
genetic trends for themain production traits (milk protein and fat yield) are
positiveHoweveraccurateestimateofbreedingvalueshasahighcostinterms
4
of investment and time so that 5 to 6 years are needed to have a dairy bullprogenytested(wwwanafiit)Theadventofgenomicselection(Meuwissenetal2001)markedamilestoneindairycattlebreedingTheideawasdevelopedbeforethemoleculartoolswereinplaceforimplementitinpracticeandhasbecomearealityonlyrecentlyInthisapproach progeny tested bulls are genotyped at many thousand loci and thevalue of their EBVs used to estimate the value of each allele at each locusgenotyped These values are then used to predict the genetic value of younganimals on the basis of their genotype As a result animals from the newgeneration receive an EBV with the same accuracy of estimates obtained byprogenytestingbutmuchearlierThislowerthegenerationintervalandgreatlyincreasestherateofgeneticprogress indairypopulationsThe initial ideawasso good that nowadays genomic selection is widely applied in industrialcountriestothemajordairycattlebreedsWiththedecreaseofgenotypingcostit is likely that its use will be soon expanded to animals selected for otherpurposes (eg beef cattle) and other species (eg pigs) However genomicselection is an extension of traditional selection It is extremely useful inbreedingforcomplextraitsbutgivesnobiologicalinformationonthegenesthatcontrolthemIfthetraditionalselectionviewsasiregenomeasabigblackboxhaving a value but no clue why genomic selection views the same genomedisassembled inmany small boxes but still black and stillwithno clueon thereasonoftheirvalueIt is reasonable to think that a deeper understanding of the biologicalmechanisms controlling traits may further increase the accuracy of genomicevaluationsIndeedthenewtechnologiesoffernewopportunitiesinthissensebyloweringcostsandincreasingprecisionofdataproductionThis thesis is exploring the use of these technologies for retrieving biologicalinformationfromgenomicdataItusesmethodsnowconsideredldquotraditionalrdquoingenomics (GWAS and analysis of selection sweeps) and less traditional ones(gene centered GWAS and exome analysis) All data used here have beenproducedwithin twonationalprojectson farmanimalgenomics fundedby theItalianMinistryofAgriculture
5
M SELMOL ldquoRicerca e innovazione nelle attivitagrave di miglioramento genetico
animalemediante tecniche di geneticamolecolare per la competitivitagrave del
sistemazootecniconazionaleIrdquo(ResearchandInnovationinanimalbreeding
sheepgoatbuffalohorseanddonkeyI)
M INNOVAGENldquoRicercaeinnovazionenelleattivitagravedimiglioramentogenetico
animalemediante tecniche di geneticamolecolare per la competitivitagrave del
sistema zootecnico nazionale IIrdquo (Research and Innovation in animal
breedingsheepgoatbuffalohorseanddonkeyII)
In these projects the research group I work with M Istituto di Zootecnica at
Universitagrave Cattolica del Sacro Cuore di Piacenza M coordinated all the research
activitiesconcerningtheapplicationofgenomicstechniquesindairycattle
Contentsofthethesis
The thesis is structured into a short introduction (this first chapter) 5 core
chapters(Chapter2to6)andashortConclusion(Chapter7)Withtheexception
of Chapter 6 still in preparation the other core chapters containmanuscripts
that at time of writing are under review for their publication in international
peerMreviewedjournalsormanuscript
InChapter2selectionsignatureindairyandbeefcattleareinvestigatedwiththe
Illumina BovineSNP50 Genotyping BeadChip Five Italian cattle breeds (Italian
Holstein Italian Brown Italian Simmental Marchigiana and Piedmontese) are
analyzedwiththerEHHmethod(Sabetietal2002)tofindsignaturesofrecent
selectionCandidategenesinthevicinityofsignaturessharedbyeitherdairyor
beefbreeds are identified and theirpossible role indairy andbeefproduction
discussed
In Chapter 3 a GWAS analysis is run to investigate production traits in three
Italian dairy cattle breeds (Italian Holstein Italian Brown Italian Simmental)
Data are analysed with a well established method (Amin et al 2007)
6
implemented in theGenABELRpackage (Aulchenkoet al 2007) andwith the
newgeneMbasedmethodAssociationisinvestigatedbetween50KSNPsandmilk
production(milkyield)andqualitytraits(percentageofmilkproteinandfat)
InChapter4thegeneMbasedmethodusedinthepreviouschapter3inspiredby
theVEGASapproach(Liuetal2010)isdescribedThesoftwarewasdeveloped
andreMcodedinourlaboratoryinordertopermittheanalysisofanyspeciesin
additiontohumanMultiMtestingcorrectionandplotrepresentationcompletethe
toolsmadeavailable
InChapter5wetesttheimpactoftheuseofdifferentreferencesequencesinthe
imputation step Italian Simmental data are used to test variation in genomeM
wide inputation accuracy using different cattle reference sequences (BTAU42
UMD31 and BTAU46) To increase the analyses reliability four different
inputationsoftwaresaretested
InChapter6Holsteindataareanalysedcombiningexomesequencesand800K
SNPs data to seek deleterious recessive variantsWhile in natural populations
the destiny of these variants is to disappear in livestock breeds they can be
maintained and disseminated if carried by a highly productive sire The
intersection of deleterious mutations identified by sequencing and lack of
homozygousregionsidentifiedbytheanalysisofthepopulationwiththeHDSNP
chipidentifiedgenescandidatetocarryrecessivedeleteriousvariants
Objective
Themainobjectiveofthethesiswastoinvestigatethebovinegenomewiththe
aid new highMthroughput technology and to explore new strategies for linking
biology (genes and their function) to traits This linkage once validated may
increase the accuracy of genomic prediction and find application in selection
schemesegineradicatingthemostdeleteriousvariantsandorintheplanning
ofmatings
7
ReferencesAjmoneMMarsanPGarciaJFLenstraJA2010OntheoriginofcattleHow
aurochsbecamecattleandcolonizedtheworldEvolAnthropol19148ndash157doi101002evan20267
AminNvanDuijnCMAulchenkoYS2007AGenomicBackgroundBasedMethodforAssociationAnalysisinRelatedIndividualsPLoSONE2e1274doi101371journalpone0001274
AulchenkoYSKoningDMJdeHaleyC2007GenomewideRapidAssociationUsingMixedModelandRegressionAFastandSimpleMethodForGenomewidePedigreeMBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614
ElsikCGTellamRLWorleyKC2009TheGenomeSequenceofTaurineCattleAWindowtoRuminantBiologyandEvolutionScience324522ndash528doi101126science1169588
LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKMHaywardNKMontgomeryGWVisscherPMMartinNGMacgregorS2010AVersatileGeneMBasedTestforGenomeMwideAssociationStudiesTheAmericanJournalofHumanGenetics87139ndash145doi101016jajhg201006009
MeuwissenTHHayesBJGoddardME2001PredictionoftotalgeneticvalueusinggenomeMwidedensemarkermapsGenetics1571819ndash1829
SabetiPCReichDEHigginsJMLevineHZPRichterDJSchaffnerSFGabrielSBPlatkoJVPattersonNJMcDonaldGJAckermanHCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash837doi101038nature01140
TaberletPCoissacEPansuJPompanonF2011ConservationgeneticsofcattlesheepandgoatsCRBiol334247ndash254doi101016jcrvi201012007
8
TaberletPValentiniARezaeiHRNaderiSPompanonFNegriniR
AjmoneMMarsanP2008Arecattlesheepandgoatsendangered
speciesMolEcol17275ndash284doi101111j1365M294X200703475x
TheBovineHapMapConsortium2009GenomeMWideSurveyofSNPVariation
UncoverstheGeneticStructureofCattleBreedsScience324528ndash532
doi101126science1167936
9
CHAPTER2
Relative(extendedamphaplotype(homozygosity)(rEHH)ampsignalsampacrossampbreedsamprevealampdairyampand$beef$specificampsignaturesampofampselection
LBomba1ELNicolazzi2MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly2FondazioneParcoTecnologicoPadanoViaEinsteinLocCascinaCodazza26900LodiItalyTheseauthorscontributedequallytothiswork
____________________SubmittedtoGeneticSelectionEvolution
Abstract
Background
Nowadays many methods have been implemented to scan for signatures of
selection at genomescale level within and across breeds Extended haplotype
homozygosity is a proficient approach to detect genome regions under recent
selective pressure within breeds The objective of this study is to identify
common regions under positive recent selection shared by most represented
dairyandbeefItaliancattlebreeds
Results
A total of 3220 animals from Italian Holstein (2179) Italian Brown (775)
Simmental (493) Marchigiana (485) and Piedmontese (379) breeds were
genotyped with the Illumina BovineSNP50 BeadChip v1 in the frame of the
ItalianlivestockgenomicprojectsldquoSelMolrdquoandldquoProzoordquoAfterstandardcleaning
proceduresgenotypeswerephasedandcorehaplotypesidentifiedThedecayof
Linkage Disequilibrium (LD) for each core haplotype was assessedmeasuring
the extended haplotype homozygosity (EHH) To correct for the lack of good
estimatesof local recombinationrates therelativeEHH(rEHH)wascalculated
for each corehaplotypeGenomic regionsunder recentpositive selectionwere
identifiedasthosehighlyfrequentcorehaplotypeswithsignificantrEHHvalues
Significant independentcorehaplotypeswerealignedacrossbreeds to identify
signalsofrecentselectionsharedbydairyandsharedbybeefbreedsOverall82
and 87 common regions under selectionwere detected among dairy and beef
cattlebreedsrespectivelyBioinformaticsanalysisidentified244and232genes
mapping in these common genomic regions Pathway and network analysis of
significant genes revealed molecular functions biologically related to signal of
selectionsdetectedinmilkormeatproductiontypes
Conclusions
Resultssuggest that themultibreedapproach isamethodto identifyselection
signatures in cattle breeds under selection for the same production goal
12
allowing to better pinpoint the genomic regions of interest in dairy and beef
production
Background
The advent of genomic technologies and the consequent availability of single
nucleotidepolymorphism(SNP)datahaveenabledthestudyofselectionevents
on the cattle genome (Hayes et al 2008 Qanbari et al 2010) Such selection
signals arise from environmental or anthropogenic pressures and help to
understand the causes that led to breed formation These studies are usually
addressedinaldquotopdownrdquoapproach(ShahzadandLoor2012)fromgenotypeto
phenotypeinwhichgenomicdataisstatisticallyevaluatedtodetectdirectional
selectionThephenotypeneededtorunselectionsignaturesisintendedinavery
generalsenseabreedaproductionaptitudeorevenageographicareaoforigin
Thisapproachholdsthepotentialto investigatetraitsasadaptationtoextreme
climatesanddifferentfeedingandhusbandrysystemsresiliencetodiseasesetc
thatareveryexpensivedifficultandsometimesimpossibletostudywithclassic
GWAS (GenomeWide Association Study) approaches Therefore it is expected
thatgenomic informationretrievedinthesestudiesonlypartiallyoverlapsthat
identified by GWAS on specific traits Searching for these ldquosignatures of
selectionrdquo is of interest for understanding molecular mechanisms underlying
important biological processes and providing new insights to functional
biologicalinformation(egdiseaseandorimportanteconomictraits(Nielsenet
al 2007)) To date many methods have been implemented to scan these
selectionsignaturesatthegenomiclevelMethodsdifferwithrespecttoseveral
properties (Lenstra et al 2012) there are methods that perform analyses
withinbreedoracrossbreedsthatcomparealleleorhaplotypefrequenciesand
size that detect recent or ancient selection events either still segregating or
fixatedintopopulations(Lenstraetal2012OrsquoBrienetal2014Utsunomiyaet
al2013)Differentmethodshavedifferentsensitivityandrobustnesseg they
maybeinfluencedatadifferentextentbymarkerascertainmentbiasanduneven
distributionofrecombinationhotspotsalongthegenome
13
Extended haplotype homozygosity (EHH) is a method that has been used in
many animal species including cattle (Pan et al 2013 Qanbari et al 2010
Zhangetal2006)TheEHHmethodwasfirstdevelopedbySabetietal(2002)
for human population analysis The main objective of this methodology is to
identifylongrangehaplotypesinheritedwithoutrecombinationeventsMorein
detailunderaneutralevolutionmodelchangesinallelefrequenciesaredriven
onlybygeneticdriftInthisscenarioanewvariantwouldrequirealongtimeto
reach high frequency in the population and the Linkage Disequilibrium (LD)
surroundingitwoulddecayduetorecombinationevents(Kimura1984)Inthe
caseofpositiveselectionaquickriseinfrequencyofabeneficialmutationina
relatively short time will preserve the original haplotype structure as the
number of recombination events would be limited Therefore a signature of
positiveselectionisdefinedasaregionwithanalleleinlongrangeLDlocatedin
anuncommonlyhighfrequencyhaplotype
One of themost interesting features of thismethod is that it is able to detect
genomic regions that are under recent selection These recent selection
signatures can be therefore used to identify the genomic regions involved in
breedformationInadditionEHHmethoddoesnotrequireanydefinitionofan
ancestralallele(asneededintheintegratedhaplotypescoreiHS(Voightetal
2006) and is suited to be applied to SNP data because it is less sensitive to
ascertainmentbiasthanothermethods(Nielsenetal2007)HowevertheEHH
methodgeneratesa largenumberof falsepositiveandnegative resultsdue to
heterogeneous recombination rates along the genome (Qanbari et al 2010)
Moreover another drawback not specific of EHH but shared by all selection
signaturemethod is the lack of robust inferences able to discern true signals
fromthosegeneratedbychance(Kemperetal2014)
To partially account for these limitations Sabeti et al (2002) developed the
relative extended haplotype homozygosity (rEHH) including an empirical
methodologytoobtainsignificantsignalsTherEHHofaputativecorehaplotype
compares its original EHH valuewith that of other haplotypes at that specific
locus using all other haplotypes as control for local variation in the
recombination rate Therefore it identifies genomic regions carrying variants
14
under selection that are still segregating in the population analysed Although
thesemethodswere conceived for human populations theywere successfully
appliedto livestockspeciessuchaspig(Maetal2012)andcattle(Qanbariet
al2010)
After domestication about 10000 years ago in the fertile crescent taurine
cattle have colonized Europe and Africa and have been selected for different
human needs (Taberlet et al 2011) Particularly in the last centuries the
anthropic pressure led to the formation of hundreds of specialized breeds
adapted to different environmental conditions and linked to local tradition
constitutingagenepooldeservingattentionforconservation(AjmoneMarsanet
al2010)Someofthesebreedshavebeenselectedspecificallyfordairybeefor
bothproductiontypesfollowingstrongartificialselectionforthesetraits(Hayes
etal2009)Thepresentstudyaimsat identifyingsignalsof recentdirectional
selectionusingrEHHmethodindairyandbeefproductiontypesusingdatafrom
five Italian dairy beef and dual purpose cattle breeds We focused on the
significant core haplotypes scanned by rEHH that are shared among breeds of
thesameproductiontypeThenweusedabioinformaticsapproachto identify
positionalcandidategeneslyingwithinthegenomicregionsunderselectionand
investigatetheirbiologicalrole
Methods
SampleAnimalsandGenotyping
A total of 4311 bulls of five Italian dairy beef anddual purpose breedswere
genotyped with the Illumina BovineSNP50 BeadChip v1 (Illumina San Diego
CA)joiningthegenotypingeffortsoftwoItalianprojects(namelyldquoSelMolrdquoand
ldquoProzoordquo)Thedatasetincluded101replicatesand773siresonspairsusedfor
downstreamqualitycheckofdataproducedIndetailgenotypesof2954dairy
(2179 ItalianHolsteinand775 ItalianBrown)864beef (485Marchigianaand
379 Piedmontese) and 493 dual purpose (Italian Simmental) bulls were
availableDataqualitycontrol(QC)wasperformedintwostagesafirststageon
15
animals independently for each breed applying the same methods and
thresholdsandasecondstageappliedonmarkersacrossall theindividualsof
thedatasetThefirststageexcludedindividualswithunexpectedlyhigh(gt02)
Mendelian errors for fatherson couples andwith low call rates (lt95)The
secondstageexcludedi)SNPswithle25missingvaluesinthewholedataset
orcompletelymissing inonebreed ii)SNPswithminorallele frequencyle5
and iii) SNPs in sexual chromosomes and with unassigned chromosome or
physicalposition
EstimationofrEHH
HaplotypeswereobtainedusingfastPHASEsoftwareconsideringdefaultoptions
(ScheetandStephens2006)andwererunbreedwiseandchromosomewisein
eachbreedPedigreeinformationforallbullswereprovidedbytheirrespective
breederassociations andwereused to filteroutdirect relatives (in fatherson
pairsthesonwasmaintainedinthedatasetandthefatherdiscarded)andover
representedfamilies(amaximumof5randomlychosenindividualsperhalfsib
familywasallowed)Thefinaldatasetcontainingtheseldquolessrelatedrdquoanimalswill
behenceforthcalled ldquononredundantrdquodatasetThisnonredundantdatasetwas
used also to calculate within breed pairwise wholegenome Linkage
Disequilibrium(LD)Ther2statisticforallpairsofmarkerswasobtainedusing
PLINK software v107 (Purcell et al 2007)Thedecayof LDwas assessedby
averagingr2valuesupto1Mbdistance
To test if the population structure influenced the detection of rEHH we
replicated all analyses on the whole Italian Holstein dataset without any
population correction (ie ldquoredundantrdquo dataset) focusing on genes or gene
clustersthatarewellknowntobeunderrecentselectionincattle(ieldquocandidate
regionsrdquo) In particularwe focused on the casein polled gene cluster and coat
colourgenes(MC1RandKIT(Kemperetal2014Qanbarietal2010))
The EHH and rEHH calculations were performed using Sweep v11 software
(Sabetietal2002)Somedefaultprogramsettingshadtobemodifiedtoadapt
theseanalyses to thebovinegenomeas thesoftwarewasoriginallydeveloped
forhumangeneticanalysesSpecificallylocalrecombinationratesbetweenSNPs
16
wereapproximatedto1cMperMbEHHandrEHHcalculationswereperformed
breedwise and chromosomewise using an automatic core selection with
defaultoptions(ieconsideringlongestnonoverlappingcoresandlimitingcores
toatleast3andnomorethan20SNPs)asinQanbarietal(2010)AlthoughEHH
and rEHH values were obtained for all core haplotypes only those with
frequency ge 25 were retained for further analyses The (empirical) rEHH
significancethresholdwasobtainedbydividingtherEHHvaluesinto20binsof
5 frequency range each logtransforming withinbin values to achieve
normality Core haplotypes with pvalues lt 005 were considered significant
DetectionofcommonselectionsignatureswasobtainedcomparingrEHHvalues
acrossgenomicregionsacrossbreeds
Breedgroupingaccordingtoproductiontype
Toexamine thesignals common toeachof the twoproduction types (iedairy
andbeef) corehaplotypes thatsharedoneormoreSNP inat least twobreeds
from the same production type were selected The dual purpose Italian
Simmentalwas included inbothdairy (ItalianHolsteinand ItalianBrown)and
beef groups (Piedmontese and Marchigiana) since potentially it possesses
haplotypes selected for both production types All downstream analyses were
performedseparatelyforthedairyandthebeefproductiontypes
Detectionandannotationofcandidategenes
Genome annotation was performed using Biomart
(httpwwwbiomartorgindexhtml) a comprehensive source of gene
annotationformanyspeciesprovidedbyEnsembl(wwwensemblorg)Thelist
ofsharedhaplotypesregions(inbp)fordairyandbeefwasusedasinputfilein
theBiomartwebinterfaceThesetofgenesretrievedbyBiomartwasthenused
as input for theCanonicalpathwayandregulatorynetworkanalyses Ingenuity
PathwayAnalysis toolversion80(IPA IngenuityregSystems IncRedwoodCity
CA httpwwwingenuitycom) and a substantial examination of published
literaturewasusedtoexaminethefunctionalrelationshipsamongtheresulting
genes The IPA operates with a proprietary knowledge database providing
complementarypathwayanalysisforseveralspeciesincludingcattleIntheIPA
17
analyses the significance of each biological function identified was estimated
using Fisherrsquos exact test A Benjamini and Hochberg correction for multiple
testingwas calculated for function and canonical pathways For eachNetwork
IPAcomputedascore(pscore=log10(pvalue))fittingthesetstudiedgenesand
a list of biological functionspresent in the IPAknowledgedatabase The score
considered thenumberofgenes in thenetworkand thesizeof thenetwork to
estimatehowrelevantthisnetworkwasaccordingtothelistofgenesprovided
Thenthescorewasusedtorankthenetworks
Results
DatasetQC
Therepeatabilityobtainedfromthe101replicatespresentinthewholedataset
washigherthan998AfterthetwoQCstages105individualsand9730SNPs
were discarded Moreover after phasing 1292 individuals were removed to
reducethe largenumberofsibfamiliespresent intheredundantdatasetAfter
quality control procedures the final dataset had 44271 SNPs and 1132 514
393 410 and 364 individuals from Italian Holstein Italian Brown Italian
SimmentalMarchigianaandPiedmontesebullsrespectively(Table1)
18
Table1JNumberofanimalsgenotypedbeforeandafterQCTotalnumberofanimalgenotypedandrelativenumberofanimaldiscardedafterQCanalysis
Detectionofselectionsignatures
Sweepv11 softwaredetectedrEHHsignalswitha frequencyhigher than25overlapping the test candidate regions in both datasets corrected or not forpopulation structure In the redundant dataset we found only one significantrEHHsignal(caseincluster)whereasinthenonredundantdatasetweidentified5significantrEHHsignalsoverlappingallthetestcandidateregionsconsidered(caseinclusterpolledclusterMC1R)(Table2)
BreedTotal
Genotyped
ED15
misAN
ED1
REPL
ED1
MENDCleaned
HOL 2179 40 31 5 2093
BRW 775 6 16 4 749
SIM 493 6 6 2 479
MAR 485 37 38 410
PIE 379 5 10 364
19
Table2JComparisonofcandidateregionrEHHinnonJredundantandredundantdatasetrEHH overlapping candidate regions in both corrected (nonredundant) and not corrected
(redundant)datasetsforpopulation
Redundant NonRedundant
Candidate
geneBTA1
Closest
SNP(bp)CH2range CH2Frequency
rEHHJ
log(pval)CH2Frequency
rEHHJ
log(pval)
POLLED
Cluster1 1981154 18974181981154 H1079 093037 H1078 167174
MC1R 18 13657912 1331772014007505 H1069 120096 H1069 077167
KIT_BOVIN 6 72821175 7250492172821175 H1054 030030 H1054 033048
Casein
Cluster6 88427760 8835009888452835 H1047 094139 H1046 148133
Casein
Cluster6 88427760 8835009888452835 H2032 002003 H2033 002003
In total5526567847724388and4049corehaplotypeswitha frequency
higher than 25 were detected on Italian Holstein Italian Brown Italian
Simmental Marchigiana and Piedmontese breeds respectively A total of 838
866 740 692 and 613 core haplotypes were found significant outliers (see
Materials and Methods) in the aforementioned breeds Table 3 shows the
distribution of the total and significant core haplotypes per chromosome and
breed
20
Tableamp3amp(ampCoreamphaplotypesampdistributionampDistributionoftotalandsignificantcorehaplotypesperchromosomeandbreed
ampamp ItalianampampHolsteinamp
ItalianampampBrownamp
ItalianampSimmentalamp Marchigianaamp Piedmonteseamp
BTA1amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
1amp 389 66 389 68 335 49 310 51 288 46
2amp 312 47 314 52 267 44 257 42 238 33
3amp 292 48 313 62 264 46 236 34 227 34
4amp 268 46 279 50 245 41 237 39 229 39
5amp 223 32 231 29 200 30 183 27 171 22
6amp 291 51 307 40 265 45 252 38 233 42
7amp 249 37 253 46 214 39 195 33 204 27
8amp 268 39 270 38 235 35 223 32 209 31
9amp 227 37 228 41 210 37 165 22 187 30
10amp 219 36 234 34 192 25 196 26 168 25
11amp 248 39 246 34 209 30 205 33 187 25
12amp 162 24 176 18 148 21 148 25 135 22
13amp 205 32 200 19 155 18 178 35 135 19
14amp 175 17 189 25 168 29 143 22 129 19
15amp 180 31 185 29 148 23 139 13 130 19
16amp 176 28 192 29 160 25 149 19 127 15
17amp 170 27 183 29 168 31 135 22 123 16
18amp 133 18 153 19 119 14 108 10 82 12
19amp 137 22 130 23 118 16 99 18 92 15
20amp 184 35 172 33 145 23 124 23 124 17
21amp 145 23 147 30 113 18 110 21 94 13
22amp 140 24 145 19 115 21 112 18 104 13
23amp 106 13 100 16 67 10 64 9 57 8
24amp 129 11 140 20 124 17 97 23 101 16
25amp 91 12 98 11 68 10 77 12 70 6
26amp 125 7 114 14 93 12 96 15 90 16
27amp 91 12 91 15 75 11 84 12 60 10
28amp 88 9 96 10 70 10 66 9 55 11
29amp 103 15 103 13 82 10 72 9 67 12
TOTamp 5526 838 5678 866 4772 740 4388 692 4049 613
1BostaurusAutosomes2Detectedcorehaplotypeswithafrequencyinthebreedhigherthan253Significantcorehaplotypesatplt005
Comparisonamptoamppreviouslyampreportedampdataamp
AnumberofstudieshavesearchedselectionsweepsinHolstein(Qanbarietal
2010) Brown (Qanbari et al 2011) and Simmental (Fan et al 2014) breeds
Since different methods are expected to identify signatures having different
characteristicsthecomparisonwithpreviousstudiesislimitedtothosestudies
using thesamemethodand thesamebreed(s)analysed inourstudyOnlyone
study analysed (German) HolsteinRFriesian cattle with rEHH (Qanbari et al
2010)Qanbarietal(2010)reportedcandidategenesandgeneclustersunder
recent positive selection that we compared with our rEHH in dairy breeds
dataset(Table4)
Table4JCorehaplotypesincandidategenesfollowingQanbarietal2011
1BosTaurusChromosome2Corehaplotype
Thenumberofcorehaplotypesfoundinourstudyinthecandidateregionswas
lowerpreviouslyreportedWhenconsideringHolsteinsthetwomostsignificant
candidateregionsinbothstudiesmatch(caseinclusterandthesomatostatinSST
gene)althoughitisimpossibletodetermineifthehaplotypeunderselectionis
the same as this information was not provided in Qanbari et al (2010)
However other genes considered significant in Qanbari et al (with pvalues
ranging from 004 to 010) were not significant in our studyWhen using the
same loose significant threshold (pvalues le 010) ldquosignificantldquo signals were
foundalsoinItalianBrownfortheCaseinCluster(log10pvalue=109)andin
ItalianSimmentalfortheSSTgene(log10pvalue=126)
HOLSTEIN BROWN SIMMENTAL
Candidate
geneBTA1
Closest
SNP
(bp)
CH2rangeCH2
Frequency
rEHH
Hlog(p)CH2range
CH2
Frequency
rEHH
Hlog(p)CH2range
CH2
Frequency
rEHH
Hlog(p)
DGAT1 14 444963236653443936
H1057 011443936763332
H1042 0130007
Casein
Cluster6 88391612
8835009888452835
H1046 1481338832601288452835
H2068 0801098835009888452835
H1044 037022
8835009888452835
H2033 018030
8835009888452835
H3034 022045
GH 19 49652377 GHR 20 33908597
SST 1 813769568128358581376961
H1036 200189 8131845181376961
H1042 126053
8128358581376961
H2029 00630084 8131845181376961
H2042 006027
IGFH1 5 71169823
ABCG2 6 37374911 3731702038256889
H1044 031027
3731702038256889
H1040 029031
Leptin 4 95715500
LPR 3 855692038549710885594551
H1047 0910728549710885794693
H1068 0800638549710885794693
H1063 002005
8549710885594551
H2041 027025
PITH1 1 35756434
22
Signaturessharedbydairyandbybeefbreeds
Significantcorehaplotypeswerealignedacrossbreedsto identifythoseshared
among dairy or beef production types Since breeds can be considered
independentsetsofobservationssharedsignaturesaremorelikelytorepresent
real effects rather than false positives A total of 123 and 142 significant core
haplotypesweresharedbetweenatleasttwodairyorbeefbreedsrespectively
(File S1) Only 82 and 87 of those significant core haplotypes contained genes
considered positional candidates under positive selection In total 22 and
17 of the genomewas under selection for dairy and beef production types
respectivelyTherEHHaveragelengthswere216932and190994bpfordairy
andbeefrespectively
Genesetannotationandnetworkpathwayanalysis
Atotalof244and232annotatedgenesfellwithintheselectedregionsfordairy
andbeefproductiontypesrespectively(Table5)
23
Table5JStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreedsStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreeds
DairyCattle BeefCattle
BTA SelSig
n
Genes1 Sum
SelSignsize
(bp)
Mean
SelSign
size(bp)
SelSi
gn
Genes1 Sum
SelSignsize
(bp)
Mean
SelSign
size(bp)
1 7 11 1521608 138328 7 13 2263213 174093
2 5 9 2216685 246298 8 11 2658549 2416863 2 5 1619336 323867 7 21 3755847 1788504 7 21 5956755 283655 6 11 1761288 1601175 4 17 4506119 265066 5 11 1251127 1137396 2 4 1467738 366934 5 12 5149692 4291417 6 30 4842969 161432 3 28 11889191 4246148 0 0 0 0 4 12 3512215 2926859 4 7 1554863 222123 2 6 1106768 18446110 4 17 8839762 519986 2 3 573138 19104611 6 8 1841802 230225 1 1 100439 10043912 2 8 4244133 530517 1 1 536461 53646113 3 8 1933026 241628 2 8 985376 12317214 0 0 0 0 3 9 692397 7693315 5 9 1415405 157267 2 2 189094 9454716 3 6 1302506 217084 4 6 905719 15095317 3 4 471046 117762 2 9 1551010 17233418 0 0 0 0 3 23 2961718 12877019 3 23 5744143 249745 2 2 116938 5846920 1 2 148886 74443 2 4 627971 15699321 3 7 1930206 275744 1 5 1150760 23015222 3 5 620674 124135 3 10 2411751 24117523 0 0 0 0 1 1 68421 6842124 2 18 9166004 509222 2 4 803770 20094225 2 11 2016602 183327 1 3 329199 10973326 2 4 466134 116534 4 7 776080 11086927 1 1 631792 631792 2 3 1004717 33490628 0 0 0 0 1 1 93464 9346429 2 9 935209 103912 1 5 798300 159660TOT 82 244 65393403 216932 87 232 50024613 190994
1GenesidentifiedGenesintheregionssharedbyallthreedairybreeds(8genes)orallthreebeefbreeds(11genes)foramoredetailedanalysiswereselected(Fig1)
24
Figure1JGenomiclocationoftheselectionsignaturessharedamongstudiedbreedsa)GenesinensembletracksaredisplayedasaredboxcorehaplotypesandSNPsarecolouredinorange(MarchigianaMAR)inviolet(PiedmontesePIE)andpink(SimmentalSIM)b)Genesinensemble tracks are displayed as a red box core haplotypes and SNPs are coloured in blue(HolsteinHOL)ingreen(ItalianBrownBRW)andpink(SimmentalSIM)
All the genes identified were submitted to downstream pathwaynetwork
analyses(Table5Figure1FileS1)Interestinglygenesintheregionssharedby
allthreedairyorbeefbreedsweremanifestedinthemostlikelynetworksforthe
two production aptitudes The IPA analysis detected a total of 12 and 15
networksindairyandbeefbreedsrespectively
Indairybreeds network scores ranged from46 to2 (File S2) Eightnetworks
obtainedscoreshigherthan20andincludedmorethan14moleculeseachThey
wereassociatedtoi)geneexpressionRNAdamageandrepairproteinsynthesis
andposttranslationalmodificationii)cellassemblyorganizationfunctionand
maintenance iii) cell cycle proliferation and cancer iv) carbohydrate
metabolism v) gastrointestinal and neurological disease developmental and
endocrine system disorders A total of 23moleculeswere associatedwith the
highest ranked network (score = 46) In this the immunoglobulins mitogen
25
activatedproteinkinasesandNFkBcomplexesformacentralhublinkinggenes
involved in celltocell signaling and interaction hematological system
developmentandfunctionandimmunecelltraffickingfunctions(Figure2)
Figure2JThemostlikelyIPAnetworkindairycattlebreedsThefunctionsassociatedtothenetworkarecelltocellsignallingandinteractionhematologicalsystem development and function immune cell trafficking Nodes in red correspond to genesidentifiedincorehaplotypesoverlappinginallthethreebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Breast cancer antiNestrogen resistance gene 3 (BARC3) andpituitary glutaminyl
cyclase gene (QPCT) genes were common to all three dairy breeds and are
directlyconnectedwithmammaryglandmetabolisms(Ezuraetal2004GSun
et al 2012) The solute carrier family 2 member 5 (SLC2A5) gene facilitates
glucosefructose transport (Zhao et al 1998) and the zetaNchain (TCR)
26
associated protein kinase 70kDa (ZAP70) gene plays a critical role in Tcell
signaling (Bonnefont et al 2011) Calpain is another important complex that
togetherwithcalpainN3(CAPN3)mediatesepithelialcelldeathduringmammary
gland involution (Wilde et al 1997) Furthermore RAS guanyl nucleotide
releasingprotein(RASGRP1)activatestheErkMAPkinasecascaderegulatesT
cellsandBcellsdevelopmenthomeostasisanddifferentiationandisinvolvedin
the regulation of breast cancer cell (Madani et al 2004 Yasuda et al 2007
Wickramasingheetal2009)Genesinvolvedintheothernetworksarelistedin
FileS2
The scoreof thebeef cattlebreedsnetwork rangedbetween46and2 and the
numberofmoleculespresentineachnetworkrangedfrom23to1(FileS3)Asin
dairy breeds eight networks obtained a score higher than 20 and 13 ormore
molecules each These networkswere associated to i) carbohydrate lipid and
nucleic acidsmetabolism energyproduction and smallmoleculebiochemistry
ii) cardiovascular system development and function iii) cellular and tissue
developmentcellassemblyandorganizationfunctionandmaintenanceandcell
and organmorphology iv) cell cycle celltocell signaling and interaction v)
DNA replication recombination and repair and RNA posttranscriptional
modification vi) dermatological diseases and conditions infectious disease
organismal injury and abnormalities In the network with the highest score
(score=46)chondroitinsulfateproteoglycan4(CSPG4)andsnurportinN1(SNUPN)
genes were shared among all beef cattle breeds investigated CSPG4 gene is
relatedtomeattendernesswhileSNUPN isanimprintedgeneimportantinthe
embryodevelopmentandinvolvedinhumanmuscleatrophy(Narayananetal
2002) (Figure 3) All the genes included in this network were involved in
carbohydratemetabolismsmallmoleculesbiochemistryandcellcycleTheother
networkswere related to important signaling andmetabolic process essential
foranimalgrowth(FileS3)
27
Figure3JThemostlikelyIPAnetworkinbeefcattlebreedsThe functions associated to the network are carbohydrate metabolism small molecule
biochemistry and cell cycle Nodes in red correspond to genes identified in core haplotypes
overlapping in all the three breeds of each production type whereas those in green depict
overlappingcorehaplotypesinatleasttwoofthosebreeds
A total of 6 and 9 statistically significant Canonical Pathways (FDR le 005
log10(FDR) ge 13) were identified using IPA for dairy and beef breeds
respectively(Figure4)
28
Figure4JBarplotofstatisticallysignificantCanonicalPathwaysPvalues were corrected for multiple testing using BenjaminiHochberg method and arepresented in thegraphas log(pvalue)Thebar represents thepercentageof genes in a givenpathway that meet cutoff criteria within the total number of molecules that belong to thefunctiona)BarplotofstatisticallysignificantCanonicalPathways indairycattlebreedsb)BarplotofstatisticallysignificantCanonicalPathwaysinbeefcattlebreeds
Themost significant Canonical Pathway in dairywas for purinemetabolism (
log10(FDR) = 26) supporting the highly synthetic processes in the mammary
epithelium(Figure5)
29
Figure5JGenesdetectedunderrecentpositiveselectionindairycattleinvolvedinpurinemetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Inbeefproductiontheephrinreceptorsignal(log10(FDR)=27)wasthemost
significant Canonical Pathway (Figure 6) Among other functions ephrin is
knowntopromotemuscleprogenitorcellmigrationbeforemitoticactivation(Li
andJohnson2013)AlltheotherCanonicalPathwaysarereportedinFileS4
30
Figure6JGenesdetectedunderrecentpositiveselectioninbeefcattleinvolvedinEphrinmetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Discussion
In this work more than 4000 genotyped bulls of five dairy beef and dualpurposeItalianbreedswerescreenedsearchingforselectionsignaturesindairyand beef production types A strict data quality controlwas applied to reducepossiblesourcesofbiassuchastechnicalgenotypingproblemsandpopulationstructurethatcouldhavelargeimpactonrEHHresultsInfactsincetheimpactofpopulationstructureonrEHHvaluesisstillunexploredwereplicatedpartofouranalyseswithandwithoutcloserelativesinanattempttostudytheeffectofpopulationstructureonthisselectionsignaturemethodInprinciplepopulationstratificationshouldleadtoanoverrepresentationofcertainhaplotypessimplyduetothepresenceoftightpedigreelinks(egsirespassinghalfoftheirgeneticmaterialtotheirsons)ratherthanactualselectionprocessesForthisreasonfor
31
the full analyses all siresons couples were removed after haplotype phasing(retaining only the younger animal) and halfsib familieswere restricted to amaximum of five randomly chosen individuals in an attempt to reduce overrepresentation This assessment was restricted to four regions known to beunderselectioninItalianHolsteinasthecaseinclusterthepolledgeneclusterMC1R and KIT Italian Holstein was selected for two reasons i) it is a highlystructuredbreedaccordingtoourdata(abouthalfofindividualsremovedwereclose relatives) ii) allowed us to compare our results with a previous studyAlthoughtheanalysesconductedonbothredundantandnonredundantdatasetswereable to identify rEHHsignals in these regions thenonredundantdatasetproduced five significant rEHH signals rather than just one in the redundantdataset(Table2)Thisresulthighlightstheconfoundingeffectofthepresenceofcloserelativesinthedatasetandconsequentlytheimprovedabilitytodetectedsignificant selection signature while correcting for population structureInterestingly our results only partially overlap those found by Quanbari et al(2010) These authors detected significant signals at the casein cluster andsomatostatinSSTaswedid but also at ahighernumberof candidate regionsThese inconsistencies may be the effect of different sires included in theanalyses of the different dataset size and to the close relative trimmingprocedure we adopted to decrease the effect of population structure andconsequent bias (Table 4) However poor correlation across investigations isoftenobservedalsointhedeeplyinvestigatedhumanspeciesThisisduetotheuse of different within and between breed statistics that identify selectionsignatureshavingdifferent characteristics (ancientrecent segregatingfixatedunder directionalbalancing selection) to the likely high rate of falsepositivesnegativesresultsandalsotothedifferentconsiderationofpopulationstructureandbackgroundselection(Enardetal2014)The number of total and significant core haplotypes identified by the SweepsoftwarewasthehighestinItalianBrownandthelowestinPiedmontesebreedSinceEHHisheavilybasedonpopulationLDtheaverageLDleveloverdistancewascalculatedineachbreed(Figure7)
32
Figure7JMultiJbreedaveragelinkagedisequilibriumagainstphysicaldistance(inkb)Marchigiana Piedmontese and Italian Simmental breeds present a lower persistence of LD allargerdistancethanItalianHolsteinandItalianBrownbreedsThisanalysishighlightedageneralpositivecorrelationbetweenthe levelofLDover distance and the number of total and significant core haplotypes foundHoweverconsideringthatrEHHisarelativemeasure it ismorelikelythatthehighernumberofsignificantcorehaplotypesidentifiedindairybreedswasduemostlytoahigherselectivepressureindairycomparedtobeefbreedsSignificance tests used in detecting selection signatures should measure theprobability of a statistic to be anoutlier compared to its expecteddistributionunder a neutral model However no reliable neutral model has so far beendevelopedincattlebecauseofthecomplexityofthedemographichistoryofthisspecies (Kemper et al 2014) As a consequence empiric rather than modelbasedsignificancetestaregenerallyusedinthedetectionofselectionsignaturesAccordinglyweconsideredasoutliersvaluesas thosesimply falling in the5plusvarianttailoftherEHHdistributionWekeptthewithinbreedsignificancethresholdloosenoncorrectingitformultipletestingbutconsideredonlysignals
33
eitherexceedingthisthresholdbyatleasttwoordersofmagnitudeorsharedby
twoormoreindependentbreeds
Parallel comparison of results from independent analyses of different breeds
allowed the achievement of a double objective i) to identify putative regions
under(recent)selectioninbreedsbelongingtotwoproductiontypeswhichwas
the main objective of this study and ii) reducing false positives since multi
breed analyses served as internal control Since the rEHH method does not
considerphenotypicinformationasignificantsignalmightarisebecausei)the
core haplotype is actually under a selective pressure ii) the result is a false
positive ie causedby chance population structureor anyotherdriving force
However even considering an unrealistic scenario with no false positives a
proportionofthetotalamountofsignalswouldactuallybeselectionsignatures
due to a selectionpressurenot ondairy or beef traits This becausedairy and
beefbreedsinvestigatedareonlyafewandshareanumberoftraitsnotdirectly
related to dairybeef production Even considering this limitation to our
knowledge this is the first dairy and beef multibreed study applying this
strategytoreducefalsepositivesatthecostofpossiblelossofinformationdue
tohighfalsenegativeratesSignificantsignalsharedbydairyandbybeefbreeds
wereused fordownstreamnetworkanalysesonpositional candidategenes to
search forabiological justification towhatwasobtainedstudying thegenomic
information Only the top network for dairy and the top network for beef
productionarediscussedindetail
Dairybreeds
ThemostsignificantnetworkdetectedindairybreedsisassociatedwithCellTo
CellSignalingandInteractionHematologicalSystemDevelopmentandFunction
andImmuneCellTraffickingfunctions(Figure2)ImmunoglobulinandNFkBact
aspivotalgeneswithinthisnetworkInterestinglyBARC3andQPCTgeneswere
sharedamongallthethreedairybreedsBothgeneshavenotyetbeenstudiedin
cattlealthoughinhumanstudiesthesegenesarewellknowntobelinkedtothe
mammaryglandmetobolismandthecalciumregulationTheformerisinvolved
in integrinmediated cell adhesions and signaling which is required for
mammary gland development and functions (G Sun et al 2012) The latter is
34
associatedwithlowradialbodymineraldensity(BMD)inadultwomen(Ezuraet
al2004)AnotherimportantplayeronthisnetworkisSLC2A5genealsoknown
asGLUT5 SLC2A5 gene acts as fructose transporter in the intestine and has a
significantroleinenergybalanceofdairycows(Burantetal1992)Findingthis
gene in adairy cattle specificnetwork seems surprising since there shouldbe
little need for transporting glucose andor fructose in the ruminant intestinal
tractassimplecarbohydratesaredegradedintovolatilefattyacids(VFA)inthe
rumen (Iqbal et al 2009) However it is often the case that large amount of
starchbypasstherumenincowsfedwithdietsrichincerealgrains(Zhaoetal
1998)Thisbypassedstarchneedstobedigestedinthesmallintestinetoavoid
highlevelsofglucoseconcentrationinthelargeintestine
WedetectedcalpaincomplexandcalpainN3(CAPN3)genesunderpositiverecent
selection in the dairy group as reported also by Utsunomiya et al (2013)
AltoughCalpainisknowntobeinvolvedinpostmortemmeattendernizationitis
also related to dairy metabolism as the muscle breakdown promoted by the
calpain gene provides an energy source for milk production especially at the
beginningoflactation(Kuhlaetal2011)InadditionasreportedbyWildeetal
(1997) calpainNcalpastatin system is related to programmed cell death of
alveolar secretory epithelial cells during lactation Zap70 gene is a cytoplasmic
proteintyrosinekinaseItisrelatedtotheimmunesystemasacomponentofthe
Tcell receptor playing a central role in Tcell responses (Wang et al 2010)
Bonnefontet al (2011) found thatZap70 genewasup regulated (FoldChange
(FC)=82FDR=277E12)onmilksomaticcellsofsheepinfectedbySaureus
andSepidermidisshowinganassociationwithmastitisresistance
Purinemetabolismwasthemostsignificantcanonicalpathwayindairybreeds
In a study conducted in human mammary epthelium Maningat et al (2008)
conductedageneexpressionanalysisonbreastmilkfatglobulesidentifyingthe
PurinemetabolismasthemostsignificantpathwaySynthesisandbreakdownof
purine is essential in the metabolism of the tissues of many organisms
particularlyofthemammaryglandduringlactation
Another interesting canonical pathway was endothelin signaling (Figure 5)
endothelin functions as vasocontrictor and is secreted by endothelial cells
35
(Nussdorferetal1999)Acostaetal(1999)reportedthatincattleendothelinsareinvolvedinthefollicularproductionofprostaglandinsandtheregulationofsteroidogenesis in the mature follicle In a recent study Puglisi et al (2013)confirmed the implication of endothelin in particular EDNRA and endothelinconvertin enzyme 1 in reproductive disorder in bovine They classified theEDNRAaspotentialbiomarkerforfertilityincow
Beefbreeds
Inbeefcattlebreedsthefocaladhesionkinase(FAK)togetherwithintegrinandERK12genesplayacentralroleashubsinthemostsignificantnetwork(Figure3)Theyareinvolvedincellularadhesionandcellmotilityprocessesparticularlyduringskeletalmyogenesis(Nguyenetal2014)Moreovertheoverexpressionof FAK can determine a reduced apoptosis and an increase risk of metastaticcancers(MehlenandPuisieux2006)Thesehubsarenotdirectlyinvolvedinourstudybut sharemany interactors thatare located in region found tobeunderselction such as CSPG4 RB1CC1 MGAT3 CIRPB CSPG4 gene belongs tochondroitin sulfate proteoglycans (CSPGs) family CSPGs are proteoglycansconsistingofaproteincoreandachondroitinsulfatesidechainTheyareknownto be structural components of a variety of tissues includingmuscle andplaykey roles inneuraldevelopmentandglial scar formationTheyare involved incellular processes such as cell adhesion cell growth receptor binding cellmigrationandinteractionswithotherextracellularmatrixconstituents
Manystudiesreporttheroleofproteoglycansinmeattextureofseveralmusclesfromthebovinespecies(Nishimuraetal1996)Dubostetal(2013)highlightadirect involvement of proteoglycans with respect to cooked meat juicinessRB1CC1geneplaysacrucialroleinmusculardifferentiationanditsactivationisa essential for myogenic differentiation (Watanabe et al 2005 p 1)Monoacylglycerol acyltransferase (MGAT3) gene cause the synthesis ofdiacylglycerol (DAG)using2monoacylglyceroland fattyacyl coenzymeAThisenzymaticreactionis fundamental fortheabsorptionofdietaryfat inthesmallintestineSunetal(2012p3)reportedinastudyonfiveChinesecattlebreedsthatMGAT3gene isassociatedwithgrowthtraitsCIRPBgenemaybepartofacompensatorymechanism inmuscles undergoing atrophy It preservesmuscle
36
tissuemassduringcoldshockresponsesaginganddisuse(DupontVersteegden
et al 2008) SNUPN is an imprinted gene and is expressed monoallelically
dependingonitsparentaloriginSNUPNplayimportantrolesinembryosurvival
andpostnatal growth regulation (Smithet al 2012Wanget al 2012)Ephrin
receptorsignalingwasthetopcanonicalpathwayproducebyIPAandhasalsoan
interestingbiologicalinterpretationformeatproduction(Figure6)Indeedthis
pathaway is important for the muscle tissue growth and regeneration
participating in correct positioning and formation of the neural muscular
junction(LiandJohnson2013)
Conclusions
In this study we analysed the selection signatures at genomewide level in 5
Italian cattle breeds Then we used a multibreed approach to identify the
genomicregionssharedamongcattlebreedsselectedfordairyorbeefpurpose
Thisapproachincreasedthepowertopinpointregionsofthegenomethatplay
important roles in economically relevant traits in both dairy and beef cattle
Moreover network and pathways analyses were used to display a detailed
functional characterization of genes mapping in the regions detected under
recentpositiveselection
Particularly in dairy cattle genes under directional selection are related to
feeding adaptation (increasing level of starch in the diet) mammary gland
metabolismandresistencetomastitisThebiologicalfeaturesunderselectionin
beef cattle breeds are involved in animal growth meat texture and juiceness
These results have a logical biological explanation However the frequent
approach of interpreting selection signatures on the basis of the function of
geneslocatednearbythepeaksignalofthestatisticusedispresentlychallenged
Novelinformationinhumanssuggeststhatmostselectedvariationisnotlocated
within genes and coding regions but in regulatory sites identified by the
ENCODEproject(Enardetal2014)Thesemaycontroltheexpressionofentire
genomicregionsorofgeneslocatedatarelavantdistancefromtheselectedsite
makingbiologicalinterpretationmorecomplex
37
InanycasefurtherstudiesusingdenserSNPchipsorwholegenomesequencing
producinginformationnotsubjectedtoascertainmentbias(Qanbarietal2014)
may increase the resolution of our analysis and togetherwith new knowledge
arisingonthecontrolofgeneexpressionvalidateorcorrectourresults
Acknowledgements
ANAFI (Italian Association of Holstein Friesian Breeders) ANAPRI (Italian
Association Simmental Breeders) ANABORAPI (National Association of
PiedmonteseBreeders)ANABIC(AssociationofItalianBeefCattleBreeders)and
ANARB (Italian Brown Cattle Breeders Association) are acknowledged for
providing the biological samples and the necessary information for this work
The Authors are also grateful to SelMol project (MiPAAF Ministero delle
PoliticheAgricoleAlimentarieFortestali)andbytheProZooproject(Regione
LombardiandashDGAgricoltura Fondazione Cariplo Fondazione Banca Popolare di
Lodi)forfunding
Thefundershadnoroleinstudydesigndatacollectionandanalysisdecisionto
publishorpreparationofthemanuscript
References
Acosta TJ Berisha B Ozawa T Sato K Schams D Miyamoto A 1999
Evidence for a Local EndothelinAngiotensinAtrial Natriuretic Peptide
SysteminBovineMature Follicles In Vitro Effects on SteroidHormones
and Prostaglandin Secretion Biol Reprod 61 1419ndash1425
doi101095biolreprod6161419
AjmoneMarsan P Garcia JF Lenstra JA 2010 On the origin of cattle how
aurochs became cattle and colonized theworld Evol Anthropol Issues
NewsRev19148ndash157
38
BonnefontCToufeerMCaubetCFoulonETascaCAurelMRBergonier
D Boullier S RobertGranieacute C Foucras G 2011 Transcriptomic
analysisofmilksomaticcells inmastitisresistantandsusceptiblesheep
upon challenge with Staphylococcus epidermidis and Staphylococcus
aureusBMCGenomics12208
BurantCFTakedaJBrotLarocheEBellGIDavidsonNO1992Fructose
transporter inhumanspermatozoaandsmall intestine isGLUT5 JBiol
Chem26714523ndash14526
DubostAMicolDPicardBLethiasCAnduezaDBauchartDListratA
2013Structuralandbiochemicalcharacteristicsofbovineintramuscular
connective tissue and beef quality Meat Sci 95 555ndash561
doi101016jmeatsci201305040
DupontVersteegden EE Nagarajan R Beggs ML Bearden ED Simpson
PMPetersonCA2008IdentificationofcoldshockproteinRBM3asa
possible regulator of skeletal muscle size through expression profiling
AJP Regul Integr Comp Physiol 295 R1263ndashR1273
doi101152ajpregu904552008
Enard D Messer PW Petrov DA 2014 Genomewide signals of positive
selection in human evolution Genome Res gr164822113
doi101101gr164822113
EzuraYKajitaMIshidaRYoshidaSYoshidaHSuzukiTHosoiTInoue
SShirakiMOrimoHEmiM2004Associationofmultiplenucleotide
variationsinthepituitaryglutaminylcyclasegene(QPCT)withlowradial
BMDinadultwomenJBoneMinRes191296ndash301
Fan H Wu Y Qi X Zhang J Li J Gao X Zhang L Li J Gao H 2014
Genomewide detection of selective signatures in Simmental cattle J
ApplGenet1ndash9doi101007s1335301402006
HayesBJChamberlainAJMaceachernSSavinKMcPartlanHMacLeodI
SethuramanLGoddardME2009Agenomemapofdivergentartificial
selectionbetweenBostaurusdairycattleandBostaurusbeefcattleAnim
Genet40176ndash184doi101111j13652052200801815x
Hayes BJ Lien S Nilsen H Olsen HG Berg P Maceachern S Potter S
Meuwissen THE 2008 The origin of selection signatures on bovine
39
chromosome 6 Anim Genet 39 105ndash111 doi101111j1365
2052200701683x
IqbalSZebeliQMazzolariABertoniGDunnSMYangWZAmetajBN
2009 Feeding barley grain steeped in lactic acid modulates rumen
fermentation patterns and increases milk fat content in dairy cows J
DairySci926023ndash6032doi103168jds20092380
KemperKESaxtonSJBolormaaSHayesBJGoddardME2014Selection
for complex traits leaves littleornoclassic signaturesof selectionBMC
Genomics15246doi1011861471216415246
KimuraM1984Theneutraltheoryofmolecularevolution
KuhlaBNuumlrnbergGAlbrechtDGoumlrsSHammonHMMetgesCC2011InvolvementofSkeletalMuscleProteinGlycogenandFatMetabolismin
the Adaptation on Early Lactation of Dairy Cows J Proteome Res 10
4252ndash4262doi101021pr200425h
Lenstra JAGroeneveld LF EdingHKantanen JWilliams JL Taberlet P
NicolazziELSolknerJSimianerHCianiEGarciaJFBrufordMW
AjmoneMarsan P Weigend S 2012 Molecular tools and analytical
approaches for the characterization of farm animal genetic diversity
AnimGenet43483ndash502
Li J Johnson SE 2013 EphrinA5 promotes bovine muscle progenitor cell
migration before mitotic activation J Anim Sci 91 1086ndash1093
doi102527jas20125728
Ma YL Zhang Q Ding XD 2012 [Detecting selection signatures on X
chromosome in pig through high density SNPs] Yi Chuan Hered
ZhongguoYiChuanXueHuiBianJi341251ndash1260
Madani S Hichami A CharkaouiMalki M Khan NA 2004 Diacylglycerols
Containing Omega 3 and Omega 6 Fatty Acids Bind to RasGRP and
Modulate MAP Kinase Activation J Biol Chem 279 1176ndash1183
doi101074jbcM306252200
Maningat PD Sen P Rijnkels M Sunehag AL Hadsell DL Bray M
Haymond MW 2008 Gene expression in the human mammary
epitheliumduring lactation themilk fat globule transcriptome Physiol
Genomics3712ndash22doi101152physiolgenomics903412008
40
Mehlen P Puisieux A 2006Metastasis a question of life or deathNat Rev
Cancer6449ndash458doi101038nrc1886
NarayananUOspinaJKFreyMRHebertMDMateraAG2002SMNthe
Spinal Muscular Atrophy Protein Forms a PreImport Snrnp Complex
withSnurportin1andImportin HumMolGenet111785ndash1795
Nguyen N Yi JS Park H Lee JS Ko YG 2014 Mitsugumin 53 (MG53)
LigaseUbiquitinatesFocalAdhesionKinaseduringSkeletalMyogenesisJ
BiolChem2893209ndash3216doi101074jbcM113525154
NielsenRHellmannIHubiszMBustamanteCClarkAG2007Recentand
ongoing selection in the human genome Nat Rev Genet 8 857ndash868
doi101038nrg2187
NishimuraTHattoriATakahashiK1996Relationshipbetweendegradation
of proteoglycans andweakening of the intramuscular connective tissue
duringpostmortemageingofbeefMeatSci42251ndash260
Nussdorfer GG Rossi GPMalendowicz LKMazzocchi G 1999Autocrine
ParacrineEndothelinSysteminthePhysiologyandPathologyofSteroid
SecretingTissuesPharmacolRev51403ndash438
OrsquoBrienAMPUtsunomiyaYTMeacuteszaacuterosGBickhartDM LiuGE Tassell
CPV Sonstegard TS Silva MVD Garcia JF Soumllkner J 2014
Assessing signatures of selection through variation in linkage
disequilibrium between taurine and indicine cattle Genet Sel Evol 46
19doi101186129796864619
Pan D Zhang S Jiang J Jiang L Zhang Q Liu J 2013 GenomeWide
DetectionofSelectiveSignatureinChineseHolsteinPLoSONE8e60440
doi101371journalpone0060440
PuglisiRCambuliCCapoferriRGianninoLLukajADuchiRLazzariG
Galli C Feligini M Galli A Bongioni G 2013 Differential gene
expressionincumulusoocytecomplexescollectedbyovumpickupfrom
repeat breeder and normally fertile Holstein Friesian heifers Anim
ReprodSci14126ndash33doi101016janireprosci201307003
Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender D
MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKa
41
tool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795
Qanbari S Gianola D Hayes B Schenkel F Miller S Moore S Thaller GSimianer H 2011 Application of site and haplotypefrequency basedapproachesfordetectingselectionsignaturesincattleBMCGenomics12318doi1011861471216412318
Qanbari S Pausch H Jansen S Somel M Strom TM Fries R Nielsen RSimianer H 2014 Classic Selective Sweeps Revealed by MassiveSequencinginCattlePLoSGenet10doi101371journalpgen1004148
Qanbari S Pimentel ECG Tetens J Thaller G Lichtner P Sharifi ARSimianerH2010AgenomewidescanforsignaturesofrecentselectioninHolsteincattleAnimGenetdoi101111j13652052200902016x
Sabeti PC Reich DE Higgins JM Levine HZ Richter DJ Schaffner SFGabriel SB Platko JV Patterson NJ McDonald GJ Ackerman HCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash7
ScheetPStephensM2006Afastandflexiblestatisticalmodelforlargescalepopulation genotype data applications to inferring missing genotypesandhaplotypicphaseAmJHumGenet78629ndash44
Shahzad K Loor JJ 2012 Application of topdown and bottomup systemsapproaches in ruminantphysiologyandmetabolismCurrGenomics13379
SmithLSuzuki JGoffAFilionFTherrien JMurphyBKohanGhadrHLefebvreRBrisvilleABuczinskiSFecteauGPerecinFMeirellesF2012DevelopmentalandEpigeneticAnomaliesinClonedCattleReprodDomestAnim47107ndash114doi101111j14390531201202063x
Sun G Cheng SYS Chen M Lim CJ Pallen CJ 2012 Protein TyrosinePhosphatase Phosphotyrosyl789 Binds BCAR3 To Position Cas forActivationatIntegrinMediatedFocalAdhesionsMolCellBiol323776ndash3789doi101128MCB0021412
Sun J Zhang C Lan X Lei C ChenH 2012 Exploring polymorphisms andassociationsofthebovineMOGAT3genewithgrowthtraitsGenomeNatl
42
Res Counc Can Geacutenome Cons Natl Rech Can 55 56ndash62doi101139g11077
TaberletPCoissacEPansu J PompanonF 2011Conservationgeneticsofcattle sheep and goats C R Biol On the trail of domesticationsmigrationsandinvasionsinagricultureDominiqueJobGeorgesPelletierJeanClaudePernollet334247ndash254doi101016jcrvi201012007
Utsunomiya YT Perez OrsquoBrien AM Sonstegard TS Van Tassell CP doCarmo AS Meszaros G Solkner J Garcia JF 2013 Detecting lociunder recent positive selection in dairy and beef cattle by combiningdifferentgenomewidescanmethodsPLoSOne8e64280
Voight BF Kudaravalli S Wen X Pritchard JK 2006 A Map of RecentPositive Selection in the Human Genome PLoS Biol 4 e72doi101371journalpbio0040072
WangHKadlecekTAAuYeungBBGoodfellowHESHsuLYFreedmanTSWeissA2010ZAP70AnEssentialKinaseinTcellSignalingColdSpringHarbPerspectBiol2doi101101cshperspecta002279
WangMZhangXKangLJiangCJiangY2012MolecularcharacterizationofporcineNECD SNRPNandUBE3Agenesand imprinting status in theskeletal muscle of neonate pigs Mol Biol Rep 39 9415ndash9422doi101007s1103301218066
WatanabeRChanoTInoueHIsonoTKoiwaiOOkabeH2005Rb1cc1iscritical for myoblast differentiation through Rb1 regulation VirchowsArchIntJPathol447643ndash648doi101007s0042800411831
WickramasingheNSManavalanTTDoughertySMRiggsKALiYKlingeCM 2009 Estradiol downregulates miR21 expression and increasesmiR21targetgeneexpressioninMCF7breastcancercellsNucleicAcidsRes372584ndash2595doi101093nargkp117
WildeCJAddeyCVLiPFernigDG1997Programmedcelldeathinbovinemammary tissue during lactation and involution Exp Physiol 82 943ndash953
YasudaSStevensRLTeradaTTakedaMHashimotoTFukaeJHoritaTKataoka H Atsumi T Koike T 2007 Defective Expression of Ras
43
Guanyl NucleotideReleasing Protein 1 in a Subset of Patients with
SystemicLupusErythematosusJImmunol1794890ndash4900
ZhangCBaileyDKAwadTLiuGXingGCaoMValmeekamVRetiefJ
Matsuzaki H Taub M Seielstad M Kennedy GC 2006 A whole
genome longrange haplotype (WGLRH) test for detecting imprints of
positiveselectioninhumanpopulationsBioinformatics222122ndash8
ZhaoFQOkineEKCheesemanCIShiraziBeecheySPKennellyJJ1998
Glucose transporter gene expression in lactating bovine gastrointestinal
tractJAnimSci762921ndash2929
44
Supplementaryfiles
YoucanfindChapterʹsupplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter02)orscantheQRcode
SupplementaryfileS1ndashSignificantcorehaplotypesandgenessharedin
dairyandbeef
Significantcorehaplotypes(pvaluele005haplotypefrequencyge025)shared
indairyandbeefcattlebreedsandrelativegenesintersectingwiththemThefile
isdividedinto4tablesThefirst2sheets(sheet1and2)shownthesignificant
corehaplotypesfordairyandbeefThelast2sheets(sheet3and4)showngenes
intersectingwiththesignificantcorehaplotypesfordairyandbeef
SupplementaryfileS2ndashNetworksrankingindairybreeds
Networksrankingindairybreedswithrelativemoleculessymbolslistscore
valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop
functionforeachnetwork
SupplementaryfileS3ndashNetworksrankinginbeefbreeds
Networksrankinginbeefbreedswithrelativemoleculessymbolslistscore
valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop
functionforeachnetwork
45
SupplementaryfileS4ndashCanonicalpathwaysrankingindairyorbeefcattle
breeds
Canonicalpathwaysrankingindairy(sheet1)orbeef(sheet2)cattlebreedswithrelativemoleculessymbolslistratioandndashlog10ofthepvaluesforeachcanonicalpathway
46
CHAPTER3Searchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisin
threeItaliancattlebreedsSCapomaccio1MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItalyTheseauthorscontributedequallytothiswork
____________________SubmittedtoAnimalGenetics
Abstract
Genome wide association studies (GWAS) have been widely applied to
disentanglethegeneticbasisofcomplextraits IncattlebreedsclassicalGWAS
approach with medium density markers panels are far to be conclusive
especiallyforcomplextraitsThisisduetotheintrinsiclimitationsofGWASand
to the choices that are made to step from the associationrsquos signals to the
functionalunits
Here we applied a geneHbased association strategy to prioritize genotypeH
phenotype associations found with classical approaches in three Italian dairy
cattlebreedswithdifferentsamplesizes(ItalianHolsteinn=2058ItalianBrown
n=745ItalianSimmentaln=477)formilkproductionandqualitytraits
WhileclassicalsinglemarkerregressionfailedtorevealgenomeHwidesignificant
genotypeHphenotype association except for Italian Holstein the gene based
approach found specific genes in each breed that are associated with milk
physiologyandmammaryglanddevelopment
Since no established method is yet implemented to step from variation to
functional units our strategy may contribute to reveal new genes that play
significant roles in complex traits like thosehere investigated condensing low
signalsofassociationinageneHcentricfashionapproach
Background
Modern cattle breeds have nomore than 200 years of history (Taberlet et al
2008) a time spanwhere selection reproductive isolationand the consequent
geneticdriftshapedallelefrequenciesandlinkagedisequilibriumthroughoutthe
genome This is particularly true for industrial dairy breeds where the joint
application of reproductive technologies and modern genetic evaluation
methodshave imposedahigh selectionpressureona limitednumberof traits
(Taberletetal2008)involvedinmilkquantityandquality
Indeed the milk production industry has been historically ready to acquire
technicalimprovementsfromdifferentfieldsstatisticsbiologyandengineering
50
ForexampleBLUPbasedstrategiesandhighHdensitygenotyping throughSNPHchipsarenowroutinelyincludedinbullevaluationshavingsignificantimpactonselectionefficiencyandpopulationstructureThese techniques have dramatically improved milk yield and compositionwithout knowing the biological basis of the traits under selection To datedespite many efforts and many QTL mapped only a few genes have beeneffectivelyassociatedwithmilkyieldandcomposition(Grisartetal2002Blottetal2003CohenHZinderetal2005)In thiscontextGenomeWideAssociationStudies(GWAS) is themostcommonmethodfordissectingthebiologythatunderliescomplextraits(McCarthyetal2008)TheunderlyingideaofGWASissimplefindmarkerHtraitassociationsexploitingthe linkage disequilibrium (LD) that exists between the causative mutation ndashwhichweignorendashandoneormoreanalyzedmarkerswhichwewellknowInthiswayonecanpinpointgenomicregionscarryingcausalvariantsforanytrait
Although a number of GWAS have been conducted in cattle productive andmorphological traits (some examples (Jiang et al 2010 Cole et al 2011Meredith et al 2012)) and found several genomic regions associated to thesetraits none focused their attention directly on genes involved inmilk biologyMoreoveroneof themajordrawbacksofGWASisrelatedtothescarceresultsportability across populations due to a variety of confounding factorspopulationstructuredifferentialLD levelsbreedspecific selection targetsandeven SNPs ascertainment bias (Clark et al 2005) For instance significantassociationsonfatpercentagewereidentifiedinmanyregionsoftheBostaurusgenome in different breeds Except for genes of major effect there is littleagreement on which regions and H more importantly H genes are actuallyinvolvedinthecontrolofthetraitinvestigatedThisissomehownotsurprisingone favorable allele may segregate in one breed and be fixated in a differentbreed the same allele segregates in both breeds but allelesmay differ or thesegregates in both breeds but the genetic background masks the effect ofsegregation
51
AnotherlimitationinGWASisthestringentsignificancethresholdoftenapplied
tocorrectformultipletestingAsaresulta largeproportionofgeneswithlow
effectaredisregardedwithconsequentoverestimationof thehigheffectgenes
variance(Xu2003)Thisbiasisparticularlyworryingwheninvestigatingtraits
controlledbya largenumberofgeneswithsmalleffectand inhighLDasmilk
yield(GIANOLAETAL2013)
To overcome this kind of limitation different strategies have been invented
usingprior knowledge Interesting ldquocandidatepathwayrdquo approacheshavebeen
conducted(Ravenetal2013)testingforassociationbetweenaphenotypeand
genes inpathwaysthatare likelytobe involvedinthecontrolofthetraitThis
approach reduces thenumberofmarkers tobe tested increases the statistical
poweroftheexperimentrestrictingtheanalysistoexistingknowledge
Other methods based on statistical manipulations or particular experimental
designs have been developed to prioritize or validate genotypeHphenotype
associations For example geneHbased association strategies while restricting
GWA study only to genes and neighboring genomic regionsmay identify new
candidategenesdeservingpostHGWASanalysis(Akulaetal2011Cantoretal
2010 Liu et al 2010) Moreover it is true that classical GWAS downstream
bioinformatics analyses concentrate their efforts in findingmeaningful results
usinggenesasfinaltarget
The aim of ourworkwas to find new candidate genes and novel information
associated to complex traits as milk production (yiled) and quality (fat and
protein percentage) using a geneHcentric genome wide association in three
differentdairybreedsItalianHolsteinItalianBrownandItalianSimmental
Methods
BiologicalSamples
Bulls from the three most represented dairy breeds in Italy (Italian Holstein
ItalianBrownandItalianSimmental)wereselectedfromArtificialInsemination
(AI) bulls progeny tested in Italy until 2010 and forwhich biological samples
52
wereavailableItalianHolsteinbullswerechosenaccordingtofollowingcriteria
i) having the national Selection Index namely production functionality type
(PFT)withreliabilitygt75andii)havingthelowestpossiblerelationshipwith
theothersiresinthesetThiswasachievedbymaximizingthevariabilitywithin
the group On the contrary considering the low number of available Italian
BrownandItalianSimmentalbullstheywereallincludedintheanalysesAfter
thisprocedureatotalof2139ItalianHolstein775ItalianBrownand486Italian
Simmental samples (including quality control replicates)were genotypedwith
theBovineSNP50BeadChipv1(IlluminaSanDiegoCA)
Traitsconsideredintheanalysis
Italian Holstein (ANAFI) Italian Brown (ANARB) and Italian Simmental
(ANAPRI)breedersassociationsprovidedderegressedproofs(DRP)ordaughter
yielddeviations (DYD)hereafter referredasphenotype forall traitsanalyzed
Traitsanalyzedweremilkyieldfatpercentageandproteinpercentage
Genotypingandqualitycontrol
Quality controls (QC) were run independently for each breed using the same
pipelineandthresholds IndetailQCexcluded i)thereplicatesamplewiththe
highernumberofmissinggenotypesii)sampleswithunexpectedlyhigh(ge100)
mendelianerrors(forfatherHsonpairs)iii)sampleswithmorethan5missing
genotypes iv) SNPs with ge 25 missing data v) SNPs with minor allele
frequency(MAF)le5vi)SNPswithmendelianerrorsge25andvii)SNPsout
ofHardyHWeinbergequilibriumtestedbyFisherexacttest(ple0001Bonferroni
corrected)
Analysisofpopulationstructure
MultiDimensionScaling(MDS)wasperformedusingPLINK107(Purcelletal
2007)andresultsplottedwithRscriptsThisanalysisshowninsupplementary
material (FigureS1) exploredpopulationsubstructureandverified thegenetic
homogeneityofthedatasetpriortofurtheranalyses
53
PairwiseLDwascalculatedusingPLINK107(Purcelletal2007)LDdecayand
LD structure on specific regions were calculated with in house R scripts and
snpplotterRpackage(LunaandNicodemus2007)
GWAS
GenomeHwide association analysis was made with the GenABEL package in R
(Team 2014) using GRAMMARHCG (Genome wide Association using Mixed
Model and Regression H Genomic Control) approach (Amin et al 2007
Aulchenkoetal2007)
Basingonpriorknowledge(Minozzietal2013) in ItalianHolsteintheDGAT1
regionwas also treated as fixed effect inGenABEL analysis of the three traits
ResultsusingthismodelarereferredtoasldquoHolsteinDGATrdquo
Manhattan andquantileHquantileplots for each traitwereproduced to allowa
thoroughevaluationoftheGWASresultsAllcalculatedpHvaluesfromtheGWAS
analysiswere retained for the downstream prioritizationwith the geneHbased
association
Workingongenomecoordinates
Coordinatesof themarkerswere liftedover fromBTA40(Elsiketal2009)to
UMD31assembly(Ziminetal2009)usingSnpChimpv1(Nicolazzietal2014)
andgenecoordinateswereretrievedusingBioMartEnsembl
QTLcoordinatesweredownloadedfromtheAnimalQTLdb(Huetal2013) in
bedformat
Operations on genomic intervalswereperformedusing thebedtools suite ver
2162(QuinlanandHall2010)
GeneBasedAssociationanalysis
Toreducetheantagonismbetweensignificancethresholdandfalsepositiverate
of a classical GWA study (Gianola et al 2013) and to facilitate selection
signature comparison across the three breeds investigated we used a geneH
54
centricstatisticthatcollapsesinasinglevalueallthepHvaluesfromSNPsrelated
toagenecorrectingforLDinformation
Our approachderivesdirectly from the softwareVEGAS (VersatileGeneHBased
TestforGenomeHwideAssociationStudies)(Liuetal2010)adaptedtopursue
meaningfulresultsinanyspeciesbeyondhumans
Briefly the geneHwise test statistic is the sum of the n upper tail chiHsquared
valuesofaSNPsubset withonedegreeof freedom(wheren is thenumberof
SNPmappinginthegivengene)WithperfectlinkageequilibriumthegeneHwise
pHvaluewouldbetheoneHsidepHvalueofachiHsquaredistributionwithndegrees
of freedom Otherwise more likely the distribution of the pHvalues has to be
evaluated(withsimulations)andweighted(withLDvalues)(Liuetal2010)
The VEGAS speciesHfree procedure was implemented in a UNIX environment
usingBashandRlanguagesusingEnsemblannotationasinputfilesPLINKdata
typeandpHvaluesforeachSNPfromthepreviouslycitedGRAMMARprocedure
WemappedSNPsonBostaurusgenes(UMD31)andtheirboundaries(100000
bpupHanddownHstream)toaccountforregulatoryregionsOnlygeneswithat
least 2 SNPs mapped were retained Significant entries from the geneHbased
associationweremappedinthecattleQTLdatabasewithbedtools
ResultsandDiscussion
In this study a gene based association strategy was applied to identify new
genomicregionsboostingGWASanalysisresultsformilkproductionandquality
fromthreeItaliandairybreeds
Complextraitslikethosehereevaluatedareaffectedbyafewmajorgeneswith
large effects andmany otherswithmoderate to low effects the latter are not
easily identifiedby genomewide scans inmodern cattle breedsduemainly to
samplesizelimitationwhilesignalfromtheformersarelostingenomescansH
withsomeexceptionsHduerapidfixations
InthisscenariowithmediumdensitySNPpanelsliketheoneusedinthisstudy
associationbasedonsingleSNPregressionorotherldquoclassicalrdquomethodscouldbe
inconclusiveintermsofresults
55
AftertheQCprocess7452058and477samplesand3575938535and38574SNPs were retained for Italian Brown Italian Holstein and Italian Simmentalbulls respectively (Table S1) HolsteinDGAT analysis retained 38534markersoneSNPlessthantheotherHolsteinpanelDescriptive statistics on genotypes for each breed (histogram and barplot forheterozygosis and call rate per individual Figure S7 and multi dimensionalscalingFigureS1)revealedabsenceofconfoundingeffectsWiththesedatasetsatthenominalgenomeHwidepHvaluesignificancethresholdof 005 (ple125 x 10H6 after Bonferroni correction (Burton et al 2007Dudbridge and Gusnanto 2008)) significant associations were found only inItalianHolsteinbetweenmilkyieldandfatpercentageand26SNPsmappingintheproximalregionofBTA14nearbyDGAT1(Table1)
56
Table1PHvaluesoftheSNPsovercomingthegenomewidesignificancethresholdintheHolsteinbreedformilkyieldandfatpercentage(notaccountingforDGAT1effect)SNPsaresortedbypositionsintheUMD31assembly
SNPID BTA Position(Bp) POvalue(Fat) POValue(Milk)
Hapmap30383OBTCO005848 14 1489496 112EH12 361EH12
ARSOBFGLONGSO57820 14 1651311 247EH46 161EH16
ARSOBFGLONGSO34135 14 1675278 857EH17
ARSOBFGLONGSO94706 14 1696470 665EH16
ARSOBFGLONGSO4939 14 1801116 534EH49 106EH20
Hapmap52798Oss46526455 14 1923292 915EH21
UAOIFASAO6878 14 2002873 382EH24
ARSOBFGLONGSO107379 14 2054457 118EH33 664EH15
ARSOBFGLONGSO18365 14 2117455 250EH16
Hapmap30922OBTCO002021 14 2138926 439EH13
Hapmap25384OBTCO001997 14 2217163 390EH10 697EH09
Hapmap24715OBTCO001973 14 2239085 108EH09 369EH09
BTAO35941OnoOrs 14 2276443 402EH13
Hapmap30374OBTCO002159 14 2468020 528EH12
Hapmap30086OBTCO002066 14 2524432 798EH11
ARSOBFGLONGSO74378 14 3640788 161EH08
ARSOBFGLOBACO1511 14 3765019 375EH09
UAOIFASAO9288 14 3956956 811EH09
ARSOBFGLONGSO100480 14 4364952 133EH09
UAOIFASAO5306 14 4468478 458EH08
ThisislikelyduetotheeffectoftheK232ADGAT1mutationthatissegregatingin
thisbreed(Fontanesietal2014)NootherSNPwassignificantlyassociatedto
anyothertraitinthe3breedsinvestigated
SomeSNPswereunderasuggestivesignificancethresholdofple5x10H5Details
on these markers are in supplementary material (Table S3) but they are not
discussedherePlotsdescribingatglancetheassociationresultsrevealingalow
signaltonoiseratioareavailableintheonlinesupplementarymaterial(Figure
S2S3S4andS5respectively for ItalianBrown ItalianHolsteinHolsteinDGAT
andItalianSimmental)
57
Asperthegenecentricanalysiswemappedatotalof192371989919894and
19844 genes on 24616 available on Ensembl (release version 73) in Italian
BrownItalianHolsteinHolsteinDGATandItalianSimmentalrespectivelyGenes
with a geneHwisepHvalue at least 10H5were considered significantly associated
withthestudiedtraitThesignificanceappearedtobeadequatetothehighrate
ofredundancyof theannotationfeaturesandto figureoutthetopassociations
with a reasonable level of confidence Results on these associations per breed
andpertraitaresummarizedinTable2Table3andinsupplementarymaterials
(TableS2andTableS4)ThemajorityofsignificantgenesfoundwiththegeneH
centricanalysismapped inpreviouslycharacterizedQTL thatareassociated to
milkproductionorqualityThesearesummarizedintheTableS5
Table2Numberofsignificantlyassociatedgenesforeachbreedineachtraitinthegenebasedassociationtest
Fat Milkyield Protein
Brown 3 1
Holstein 92 80 1
HolsteinDGAT 1 4 1
Simmental 1 13
58
Tableamp3FeaturessignificantlyassociatedperbreedtraitHolsteinanalysisnotaccountingforDGAT1isshow
ninTableS3
Breedamp
Traitamp
EnsemblampIDamp
GeneWiseampPValueamp
SNPsInGeneamp
BestSNPamp
BestSNP_PValueamp
BTAamp
Startamp(bp)amp
Endamp(bp)amp
GeneampNam
eamp
BROWNamp
FAT
Nosignificantassociationfound
MILK
ENSBTAG00000019237
500EG05
4Hapmap53770Gss46526325
204EG04
23
47333403
47348212
TXNDC5
ENSBTAG00000043918
500EG05
4Hapmap53770Gss46526325
204EG04
23
47337650
47337787
5S_rRNA
ENSBTAG00000046839
500EG05
4Hapmap53770Gss46526325
204EG04
23
47328131
47352621
UncharacterizedProtein
PROTEIN
ENSBTAG00000001260
700EG05
3ARSGBFGLGNGSG116222
441EG03
18
52120387
52124418
PINLYP
amp
HOLSTEINDGATamp
FAT
ENSBTAG00000000369
100EG05
3Hapmap53294Grs29016908
333EG05
594673755
94749093
EPS8
MILK
ENSBTAG00000026896
130EG05
5ARSGBFGLGNGSG104949
385EG04
23
51549721
51553563
FOXF2
ENSBTAG00000005260
400EG05
5Hapmap33288GBTCG033751
120EG03
638120578
38127577
SPP1
ENSBTAG00000006579
500EG05
4ARSGBFGLGBACG2207
632EG05
15
54418559
54459500
P4HA3
ENSBTAG00000007013
700EG05
6UAGIFASAG7665
396EG05
19
5221507
5283308
TOM1L1
PROTEIN
ENSBTAG00000006638
600EG05
3ARSGBFGLGNGSG104774
176EG04
18
56523439
56530499
BCL2L12
amp
SIMMENTHALamp
FAT
ENSBTAG00000037649
900EG05
6Hapmap30001GBTAG142716
529EG05
4120427915
120494453
VIPR2
MILK
ENSBTAG00000017538
100EG05
6ARSGBFGLGBACG13093
587EG06
12
85630428
85630912
Pseudogene
ENSBTAG00000000856
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1766767
1769754
FBXL6
ENSBTAG00000000857
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1763994
1766621
GPR172B
ENSBTAG00000026356
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1795351
1804562
DGAT1
ENSBTAG00000035158
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1770660
1772329
TMEM
249
ENSBTAG00000046208
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1782901
1788087
SCRT1
ENSBTAG00000009811
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1825982
1844587
UncharacterizedProtein
ENSBTAG00000009816
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1838135
1840081
UncharacterizedProtein
ENSBTAG00000014458
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1844664
1894424
MROH1
ENSBTAG00000020751
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1806081
1825793
HSF1
ENSBTAG00000044406
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1883849
1883971
SCARNA15
ENSBTAG00000046889
400EG05
5ARSGBFGLGBACG13093
587EG06
12
85610279
85610762
UncharacterizedProtein
ENSBTAG00000020117
800EG05
3BTAG78494GnoGrs
188EG04
721189879
21196263
ZBTB7A
PROTEIN
Nosignificantassociationfound
amp
59
1 ItalianHolstein
The$ large$ number$ of$ significant$ signals$ was$ observed$ in$ the$ Italian$ Holstein$
especially$ for$ milk$ yield$ and$ fat$ percentage$ As$ shown$ in$ Table$ 2$ the$ strong$
effect$ of$ DGAT1$ in$ Italian$ Holsteins$ observed$ with$ individual$ SNPs$ was$
confirmed$by$the$geneBcentric$approach$In$this$region$almost$the$totality$of$the$
geneBwise$pBvalues$under$the$chosen$threshold$appeared$to$be$influenced$by$this$
major$ gene$ and$ by$ the$ high$ level$ of$ linkage$ disequilibrium$ that$ characterizes$
cosmopolitan$bovine$breeds$(see$Figure$S6)$Within$this$pool$of$genes$it$is$worth$
to$mention$PLEC$ (plectin)$ a$ cell$ surface$ receptor$ gene$ in$ the$RNASE5$ pathway$
(Angiogenin$ encoded$ by$ angiogenin$ ribonuclease$ RNase$ A$ family$ 5)$ Proteins$
encoded$ by$ these$ genes$ that$ are$ present$ in$ milk$ are$ involved$ in$ protein$
synthesis$ and$ act$ as$ growth$ factors$ in$ vitro$Moreover$RNASE5$ seems$ to$ have$
other$ functions$ such$ as$ regulation$ of$ stress$ response$ and$ apoptosis$ and$ even$
angiogenesis$ through$ which$ it$ may$ also$ contribute$ to$ the$ regulate$ milk$
production$(Raven$et$al$2013)$
The$ exclusion$ of$DGAT1$ effect$ (HolsteinDGAT$ results)$ highlighted$ that$ all$ the$
significant$ effects$ on$ fat$ percentage$ found$ for$ genes$ located$ within$
approximately$25$Mb$up$or$downstream$from$the$excluded$SNP$(ARS5BFGL5NGS5
4939)$were$ redundant$ signals$ of$DGAT1$ (Table3$ and$Table$ S2$ and$ S3)$ This$ is$
also$observed$for$milk$yield$due$to$the$genetic$correlations$between$fat$and$milk$$
Also$ PLEC$ was$ not$ significant$ in$ our$ analyses$ when$ taking$ into$ account$ the$
DGAT1$effect$contradicting$previous$findings$in$Holstein$that$report$association$
of$PLEC$gene$with$production$traits$either$accounting$or$not$for$the$effect$of$the$
diglyceride$ acyltransferase$ (Raven$ et$ al$ 2013)$ A$ possible$ explanation$ for$ the$
different$results$between$these$studies$is$a$difference$between$genetic$structures$
of$the$populations$analyzed$To$reduce$false$positive$signals$we$only$evaluated$
the$genes$based$on$HolsteinDGAT$results$for$the$Italian$Holstein$$
11 Milkyield
We$found$genes$associated$with$this$trait$in$Holstein$in$particular$SPP1$P4HA3=
FOXF2$and$TOM1L1$genes$
Among$ these$ the$ osteopontin$ SPP1$ (secreted$ phosphoprotein$ 1)$ is$ directly$
related$to$milk$production$(de$Mello$et$al$2012)$ It$ is$ located$ in$BTA6$nearby$
60
$
ABCG2$ and$ there$ is$ evidence$ that$ polymorphisms$ in$ this$ region$ are$ strongly$
correlated$with$high$production$ levels$of$milk$ in$various$breeds$(CohenBZinder$
et$ al$ 2005)$Osteopontin$SPP1$ is$ a$ secreted$glycophosphoprotein$ that$plays$ a$
crucial$ role$ in$ mammary$ gland$ development$ having$ an$ essential$ role$ also$ in$
mammary$gland$differentiation$and$branching$of$the$mammary$epithelial$ductal$
system$ This$ gene$has$ also$ been$ implicated$ in$many$ types$ of$ cancer$ including$
breast$cancer$ for$ the$potentiality$of$proliferation$and$alveologenesis$ (Hubbard$
et$ al$ 2013)$ It$ is$worth$mentioning$ that$ the$most$ significant$ SNPs$within$ this$
gene$was$Hapmap332885BTC5033751=with=a$pBvalue$of$00012$ largely$over$ the$
genomeBwide$significance$ threshold$This$signal$would$be$ ignored$ in$ the$single$
SNP$approach$
Another$gene$revealed$by$the$geneBbased$association$is$P4HA3$encoding$for$the$
prolyl$4Bhydroxylase$α$polypeptide$III$involved$in$the$correct$folding$of$nascent$
procollagen$ chains$ It$ is$ known$ that$ stromalBepithelial$ interactions$ are$ critical$
for$mammary$gland$development$and$defective$deposition$or$reorganization$of$
extracellular$ matrix$ can$ lead$ to$ incorrect$ mammary$ epithelial$ morphogenesis$
(Gillette$ et$ al$ 2013)$ highlighting$ the$ role$ of$ this$ gene$ in$ the$ healthy$ breast$
tissue=
FOXF2=gene$ (Forkhead$Box$F2)$ is$ a$member$of$ the$ forkhead$box$ transcription$
factors$ known$ to$ be$ involved$ in$ tissue$ development$ as$ part$ of$ the$ ldquoHedgehog$
signaling$pathwayrdquo$This$key$cascade$is$essential$for$the$correct$segmentation$of$
tissues$during$embryonic$development$but$has$been$recently$demonstrated$that$
is$active$in$adult$stem$cells$proliferation$in$mammary$tissue$(Liu$et$al$2006)$In$
particular$ FOXF2$ plays$ a$ key$ role$ in$ extracellular$ matrix$ synthesis$ and$
epithelialBmesenchymal$ interactions$ and$ could$ therefore$ affect$ the$ proper$
development$of$the$mammary$stroma$In$addition$low$levels$of$transcription$of$
this$gene$are$correlated$ to$early$onset$metastasis$ in$breast$cancer$ (Kong$et$al$
2013)$ underlining$ a$ specific$ role$ in$ the$ mammary$ gland$
Lastly$ TOM1L1$ (Target$ Of$ Myb1$ ChickenBLike$ 1)$ is$ another$ gene$ potentially$
involved$ with$ the$ milk$ physiology$ as$ is$ known$ its$ implication$ with$ the$ early$
endosome$ sorting$ coupled$ with$ EGFR= (epidermal$ growth$ factor$ receptor)$
another$ important$player$ in$ lactation$(Fowler$et$al$1995)$TOM1L1$belongs$to$
61
$
the$ VHS$ domain$ containing$ proteins$ that$ are$ involved$ in$ the$ postBGolgi$
trafficking$and$signaling$(Wang$et$al$2010)$
12 Fatpercentage
In$ Italian$ Holstein$ breed$ we$ found$ a$ significant$ association$ with$ or$ without$
DGAT1$ effect$ correction$ to$ the$EPS8$ gene$ located$on$BTA5$at$around$95$Mbp$
This$gene$encodes$for$a$substrate$of$the$already$cited$EGFR$and$is$involved$in$the$
fatty$acid$metabolism$(Fazioli$et$al$1993)$A$recent$study$on$Holstein$Friesian$
(Wang$et$al$2012)$found$strong$association$with$a$variant$located$in$the$second$
intron$of$the$epidermal$growth$factor$receptor$pathway$substrate$enforcing$the$
hypothesis$of$a$true$involvement$of$this$gene$on$fat$percentage$
13 Proteinpercentage
The$BCL2L12$gene$(BclB2Blike$protein$12)$associated$with$protein$percentage$in$
Italian$Holstein$breed$represents$another$interesting$result$The$product$of$this$
gene= is$ an$ important$ oncoprotein$ in$ two$ fundamental$ cell$ cycle$ pathways$
namely$ p53$ and$ cytoplasmic$ caspase$ signaling$ BCL2L12= resides$ in$ both$
cytoplasm$and$nucleus$inhibiting$caspases$3$and$7$and$forming$a$complex$with$
p53$thereby$ inhibiting$p53Bdirected$transcriptomic$changes$upon$DNA$damage$
(Stegh$ and$DePinho$ 2011)$ The$members$ of$ BCL2$ family$ are$ considered$ antiB
apoptotic$genes$and$it$is$obvious$that$dysfunctional$programmed$cell$death$plays$
an$ important$ role$ in$ the$ pathogenesis$ and$ in$ the$ progression$ of$ cancer$Many$
members$of$this$family$were$found$to$be$modulated$in$various$malignancies$and$
BCL2L12= in$ particular$ was$ expressed$ in$ mammary$ gland$ and$ proposed$ as$
prognostic$ cancer$ biomarker$ (Thomadaki$ et$ al$ 2007)$ In$ dairy$ cows$ the$
members$of$BCL2$family$are$involved$in$the$key$events$ in$the$mammary$tissue$
development$ and$ remodeling$ during$ lactation$ and$ involution$ (Stefanon$ et$ al$
2002$p$200)$Members$of$BCL2$family$regulate$the$turnover$of$cell$population$
in$the$mammary$gland$during$lactation$and$its$overexpression$was$found$to$be$
linearly$related$to$milk$production$in$Holstein$cow$(Colitti$et$al$2004)$
$
62
$
2 ItalianSimmental
In$ the$ Italian$ Simmental$ breed$ we$ found$ one$ association$ on$ BTA4$ for$ fat$
percentage$(VIPR2)$and$two$associations$ for$milk$yield$one$ in$BTA7$(ZBTB7A)$
and$ the$ other$ in$ BTA14$ related$ to$ the$ DGAT1$ No$ association$ was$ found$ for$
protein$percentage$
21 Fatpercentage
Among$the$genes$found$significantly$associated$with$production$traits$in$Italian$
Simmental$VIPR2=a$ gene$ encoding$ a$ receptor$ for$ a$ small$ vasoactive$ intestinal$
neuropeptide$ was$ the$ single$ signal$ in$ BTA4$ for$ fat$ percentage$ Vasoactive$
intestinal$ peptide$ (VIP)$ is$ synthesized$ in$ the$pituitary$ gland$ and$ among$other$
functions$ stimulates$ prolactin$ secretion$ and$ is$ involved$ in$ proliferation$
survival$and$differentiation$in$human$breast$cancer$cells$(Valdehita$et$al$2010)$$
An$interesting$novel$finding$is$the$effect$of$immunosuppression$(through$direct$
effect$of$IL56)$on$prolactin$cells$and$its$ability$to$regulate$prolactin$by$acting$on$
pituitary$ VIP$ through$ VIP$ receptors$ (Blanco$ et$ al$ 2013)$ These$ evidences$
suggest$ that$VIPR2$ could$ be$ an$ important$ candidate$ for$ further$ investigations$
through$post$GWAS$approaches$such$as$reBsequencing$or$fine$mapping$
22 Milkyield
Surprisingly$the$DGAT1$gene$was$significantly$associated$with$milk$yield$but$not$
with$fat$percentage$in$Italian$Simmental$breed$Indeed$the$most$significant$SNP$
for$the$milk$yield$trait$in$this$breed$was$the$ARS5BFGL5NGS54939$with$a$pBvalue$
of$ 413EB06$ The$ importance$ of$ this$ gene$ in$ the$ lactation$ is$ largely$ known$
(Grisart$et$al$2002)$$
To$ investigate$DGAT1$ signal$ in$ Simmental$ we$ analyzed$ the$ fine$ LD$ structure$
genomeBwide$ and$ in$ BTA14$ proximal$ region$ in$ all$ the$ three$ breeds$ revealing$
different$patterns$
Observed$LD$decay$on$ the$ three$breeds$genomes$revealed$similar$behavior$ for$
Italian$Holstein$and$Italian$Brown$with$r2$gt$010$$until$ around$ 250$ Kbp$ while$
Italian$Simmental$showed$lower$LD$Distances$higher$that$400$Kbp$revealed$half$
level$of$LD$in$Simmental$(Figure$S6)$
63
$
In$DGAT1$region$LD$behavior$is$very$similar$between$Italian$Holstein$and$Italian$
Brown$ (Figure$ S8)$ while$ association$ values$ are$ really$ different$ because$ ARS5
BFGL5NGS54939=is$fixed$in$Italian$Brown$and$purged$by$QC$process$Interestingly$
association$ patterns$ become$ similar$ when$ DGAT1$ effect$ is$ removed$ from$ the$
Holstein$ panel$ (HolsteinDGAT$ Figure$ S8)$ In$ Simmental$ we$ found$ a$ different$
situation$ very$ low$ LD$ level$ and$ significant$ association$ levels$ with$ ARS5BFGL5
NGS54939$with$ the$milk$ trait$This$ is$probably$due$ to$ the$ low$ frequency$of$ the$
p232K$ allele$ in$ Italian$ Simmental$ (Scotti$ et$ al$ 2010)$ and$ to$ the$ different$
selection$strategies$applied$in$these$breeds$
The$ relatively$ high$ pBvalue$ of$ this$ SNP$ is$ responsible$ for$ the$ statistical$
significance$of$many$of$the$13$genes$found$significant$in$this$breed$for$this$trait$
as$ shown$ in$ Table$ 3$ therefore$ reducing$ the$ gene$ set$ to$ three$ locations$ two$
overlapping$features$indicating$a$pseudogene$in$BTA12$and$a$zinc$finger$protein$
ZBTB7A$in$BTA7$DGAT1$role$and$function$will$be$not$discussed$here$as$already$
done$elsewhere$(Grisart$et$al$2002)$
ZBTB7A=is$part$of$zinc$finger$and$BTB$proteins$(Broad$complex$Tramtrack$and$
Bricabrac)$ transcription$ factors$ with$ emerging$ importance$ for$ their$
involvement$ in$ oncogenesis$ and$ cell$ differentiation$by$ repressing$key$ genes$of$
cell$cycle$regulation$as$retinoblastoma$(Rb)$and=p53=(ldquoJeon$et$al$B$2008$B$ProtoB
oncogene$ FBIB1$ (PokemonZBTB7A)$ Represses$ Trpdfrdquo$ nd)= ZBTB7A$ over$
expression$ is$ shown$ in$primary$ and$ recurrent$ breast$ cancer$ tissues$ leading$ to$
disease$progression$and$poor$survival$(Zu$et$al$2011)$Link$between$this$gene$
polymorphism$and$milk$production$might$be$in$the$proper$cellular$development$
of$ the$ mammary$ gland$ This$ transcription$ factor$ is$ also$ known$ as$ a$ master$
regulator$ of$ B$ and$ TBcell$ lineage$ (Lee$ and$ Maeda$ 2012)$ enforcing$ the$ link$
between$ milk$ production$ and$ immune$ protection$ as$ already$ outlined$ by$ of$
Lemay$ and$ colleagues$ with$ proteomic$ studies$ where$ the$ ldquoactivation$ of$ the$
innate$ immune$systemrdquo$resulted$as$one$of$ the$most$represented$gene$ontology$
categories$(Lemay$et$al$2009)$
$
64
$
3 ItalianBrown
In$ the$ Italian$Brown$breed$we$ found$ one$ association$ on$BTA23$ (TXNDC5)$ for$
milk$yield$and$one$in$BTA18$for$protein$percentage$(PINLYP)$The$three$features$
listed$in$Table$3$for$milk$yield$in$Italian$Brown$were$collapsed$due$to$overlap$$
31 Milkyield
Thioredoxin$ domainBcontaining$ protein$ 5$ (TXNDC5)$ belongs$ to$ thioredoxin$
family$small$ubiquitous$proteins$containing$disulphide$domain$affecting$protein$
folding$with$chaperone$activity$preventing$protein$misBfolding$within$the$lumen$
of$ the$endoplasmic$reticulum$This$multifunctional$organelle$have$a$key$role$ in$
several$ processes$ including$ lipid$ catabolism$ and$ protein$ synthesis$ (RamiacuterezB
Torres$ et$ al$ 2012)$ Because$ of$ its$ disulphide$ bonds$ thioredoxin$ facilitate$ the$
reduction$of$other$proteins$by$cysteine$thiol$disulfide$exchange$This$mechanism$
is$essential$for$detossification$
During$ lactation$ indeed$ an$ enhanced$ detoxification$ level$ is$ functional$ to$
preserve$ milk$ quality$ against$ increased$ metabolic$ activities$ Higher$ metabolic$
levels$ lead$ to$ free$ radicals$ production$ making$ necessary$ the$ maintenance$ of$
redox$ homeostasis$ that$ include$ among$ others$ the$ reduction$ and$ oxidation$ of$
peroxiredoxin$and$thioredoxin$(Lu$and$Holmgren$2014)$
32 Proteinpercentage
For$ protein$ percentage$ in$ Italian$ Brown$ there$ is$ only$ one$ associated$ gene$
PINLYP=is$encoding$a$phospholipase$A2$inhibitor$but$this$protein$product$is$not$
yet$well$studied$ $Many$types$of$phospholipase$A2$were$characterized$and$ it$ is$
known$ that$ they$ are$ mediator$ of$ inflammation$ atherosclerosis$ and$ cancer$ in$
mammals$and$that$plays$an$important$role$in$the$innate$immune$response$(Qiu$
and$Lai$2013)$It$may$be$reasonable$to$imagine$a$role$for$PINLYP$in$the$lactation$
in$light$of$previously$discussed$results$that$link$milk$to$innate$immune$response$
(Lemay$et$al$2009)$
$
Although$ limited$ by$ the$ size$ of$ the$ study$ and$ the$ number$ of$ the$ included$
markers$these$results$represent$both$a$confirmation$and$a$step$forward$towards$
the$comprehension$of$the$biological$bases$of$milk$yield$and$quality$$
65
$
New$ genes$ potentially$ involved$ with$ production$ traits$ in$ dairy$ cows$ were$
tracked$when$GWAS$signals$resulted$too$weak$to$be$considered$in$a$classical$SNP$
based$mapping$with$the$ldquoclosest$gene$to$peakrdquo$or$the$windowBbased$strategy$
It$ is$ worth$ mentioning$ that$ none$ of$ the$ analyzed$ SNPs$ (not$ considering$ the$
Italian$Holstein$with$ARS5BFGL5NGS54939= included$ in$ the$ analysis)$were$ above$
the$ genomeBwide$ significance$ threshold$ in$ single$ SNP$ analysis$ meaning$ that$
important$information$is$often$lost$in$such$studies$$
In$addition$ to$ the$newly$ identified$ loci$we$observed$a$confirmation$of$existing$
associations$ with$ genes$ related$ to$ milk$ production$ not$ only$ dissecting$ gene$
specific$ functions$ but$ from$ mapping$ these$ genes$ in$ previously$ characterized$
regions$(Table$S5)$This$ is$not$trivial$when$using$new$data$analysis$techniques$
especially$ in$ high$ noise$ to$ signal$ ratio$ datasets$ like$ those$ here$ presented$We$
believe$ that$ this$ approach$will$ benefit$ of$data$ from$denser$marker$panels$ (eg$
BovineHD)$ to$ exploit$ local$ linkage$ disequilibrium$ and$ increase$ the$ number$ of$
mapped$genes$with$more$than$two$SNPs$
Results$ obtained$ with$ the$ gene$ centric$ approach$ although$ having$ intrinsic$
limitations$are$not$ influenced$by$a$number$of$subjective$choices$often$made$in$
GWAS$ For$ instance$when$ ldquowindow$mappingrdquo$ is$ used$ to$decrease$ the$noise$ to$
signal$ ratio$ eg$ due$ to$ nearby$ markers$ having$ different$ allele$ frequencies$
windows$size$ is$ to$be$decided$This$can$be$based$on$equal$number$of$markers$
equal$number$of$base$pairs$or$equal$map$distance$in$cM$per$window$The$three$
choices$ do$ not$ coincide$ because$ markers$ are$ not$ equally$ spread$ along$ the$
chromosomes$ and$ recombination$ differs$ in$ different$ genomic$ regions$ In$
addition$ windows$ can$ be$ chosen$ nonBoverlapping$ or$ with$ different$ degree$ of$
overlap$Finally$when$stepping$ from$significant$SNPs$or$windows$ to$candidate$
genes$the$size$of$the$flanking$region$from$which$picking$up$candidate$genes$is$to$
be$decided$as$well$Sometimes$the$gene$closer$to$the$signal$peak$is$selected$ in$
other$cases$all$genes$within$certain$boundaries$are$included$in$further$analyses$
All$ these$ choices$ may$ generate$ misleading$ results$ For$ example$ not$ truly$
associated$genes$may$be$retained$for$further$analyses$disregarding$the$correct$
ones$ In$addition$ a$ large$number$of$ false$positives$can$be$ included$ to$be$ later$
interpreted$with$ the$help$of$network$and$pathway$analysis$As$ a$ consequence$
these$decisions$have$a$direct$impact$in$terms$of$production$and$interpretation$of$
66
$
results$leading$to$conclusions$heavily$based$on$the$enrichment$analysis$and$preB
existing$knowledge$$
With$ the$ gene$ based$ association$ we$ also$ noticed$ some$ advantages$ in$ the$
downstream$data$mining$First$genes$are$functional$units$and$the$golden$targets$
of$ the$ ldquoclassical$ postBGWASrdquo$ bioinformatics$ pipelines$ Hence$ the$ gene$ based$
association$leads$to$genes$with$an$embedded$statistically$robust$framework$This$
is$ a$ ldquoplusrdquo$ in$ studying$ complex$ traits$ because$ one$ can$ concentrate$ on$ specific$
signals$ rather$ than$ cherry$ picking$meaningful$ features$ from$ a$ list$ constructed$
with$the$previously$cited$approaches$$
Moreover$it$is$easier$to$track$back$gene$clusters$to$major$gene$effects$B$like$for$
example$DGAT1$in$a$gene$rich$genome$chunk$many$genes$may$result$significant$
due$to$the$same$SNP$(Table$S2$and$Table$3)$With$the$VEGAS$method$finding$the$
ldquotruerdquo$association$is$facilitated$by$following$the$ldquobest$SNPrdquo$in$the$region$
In$addition$we$found$phenotype$coherent$signals$in$such$situation$where$only$a$
few$SNPs$B$ if$none$ndash$are$above$an$acceptable$statistical$genomeBwide$threshold$
using$a$single$SNP$regression$method$
The$gene$centric$approach$ is$a$ first$and$slight$ improvement$ in$ livestock$GWAS$
analyses$since$we$still$have$a$partial$genome$annotation$ In$addition$ it$ is$now$
known$that$ the$definition$of$gene$ is$ to$be$revised$many$genes$do$not$code$ for$
proteins$ have$ alternative$ starting$ sites$ have$ antisense$ transcripts$ host$
regulatory$ RNA$ are$ subjected$ to$ alternative$ splicing$ and$may$ be$ regulated$ by$
sequences$quite$ far$ away$ from$ their$ coding$ regions$ (Consortium$2012)$Other$
limitations$are$the$need$for$larger$number$of$animals$in$these$type$of$studies$as$
in$this$study$and$the$still$relatively$high$cost$of$sequencing$that$cannot$be$used$
routinely$in$large$experiments$at$least$not$yet$$
$
$
Conclusions
This$study$underlines$the$importance$of$the$implementation$of$new$methods$to$
analyze$complex$traits$with$genomic$data$No$golden$path(s)$is$discovered$yet$to$
walk$from$variation$to$functional$units$The$geneBcentric$analysis$may$contribute$
67
$
to$ reveal$ new$ signals$ throughout$ the$ bovine$ genome$ condensing$ weak$
association$signals$in$chromosome$bits$having$functional$significance$
$
$
Acknowledgements
The$Authors$would$ like$ to$ thank$Dr$ Jimmy$ Liu$ for$ his$ advices$ in$ building$ the$
software$AIA$ANAFI$ANARB$and$ANAPRI$breedersrsquo$associations$ for$ their$help$
and$ support$during$ the$data$analysis$ SelMol$ and$ProZoo$projects$ that$provide$
samples$ The$ Authors$ are$ also$ grateful$ to$MIPAAF$ and$ Cariplo$ Foundation$ for$
funding$
$
$
$
References
Akula$ N$ Baranova$ A$ Seto$ D$ Solka$ J$ Nalls$MA$ Singleton$ A$ Ferrucci$ L$
Tanaka$T$Bandinelli$ S$Cho$YS$Kim$YJ$Lee$ JBY$Han$BBG$Bipolar$
Disorder$ Genome$ Study$ (BiGS)$ Consortium$ The$ Wellcome$ Trust$ CaseB
Control$Consortium$McMahon$FJ$2011$A$NetworkBBased$Approach$to$
Prioritize$ Results$ from$ GenomeBWide$ Association$ Studies$ PLoS$ ONE$ 6$
e24220$doi101371journalpone0024220$
Amin$N$ van$Duijn$ CM$ Aulchenko$ YS$ 2007$ A$Genomic$Background$Based$
Method$ for$ Association$ Analysis$ in$ Related$ Individuals$ PLoS$ ONE$ 2$
e1274$doi101371journalpone0001274$
Aulchenko$YS$de$Koning$DBJ$Haley$C$2007$Genomewide$Rapid$Association$
Using$ Mixed$ Model$ and$ Regression$ A$ Fast$ and$ Simple$ Method$ For$
Genomewide$PedigreeBBased$Quantitative$Trait$Loci$Association$Analysis$
Genetics$177$577ndash585$doi101534genetics107075614$
Blanco$ EJ$ CarreteroBHernaacutendez$ M$ GarciacuteaBBarrado$ J$ IglesiasBOsma$ MC$
Carretero$M$Herrero$JJ$Rubio$M$Riesco$JM$Carretero$J$2013$The$
activity$and$proliferation$of$pituitary$prolactinBpositive$cells$and$pituitary$
68
$
VIPBpositive$ cells$ are$ regulated$by$ interleukin$6$Histol$Histopathol$ 28$
1595ndash1604$
Blott$S$Kim$JBJ$Moisio$S$SchmidtBKuntzel$A$Cornet$A$Berzi$P$Cambisano$
N$Ford$C$Grisart$B$Johnson$D$Karim$L$Simon$P$Snell$R$Spelman$
R$ Wong$ J$ Vilkki$ J$ Georges$ M$ Farnir$ F$ Coppieters$ W$ 2003$
Molecular$ dissection$ of$ a$ quantitative$ trait$ locus$ a$ phenylalanineBtoB
tyrosine$substitution$in$the$transmembrane$domain$of$the$bovine$growth$
hormone$ receptor$ is$ associated$ with$ a$ major$ effect$ on$ milk$ yield$ and$
composition$Genetics$163$253ndash266$
Burton$PR$Clayton$DG$Cardon$LR$Craddock$N$Deloukas$P$Duncanson$A$
Kwiatkowski$ DP$ McCarthy$ MI$ Ouwehand$ WH$ Samani$ NJ$ Todd$
JA$ Donnelly$ P$ Barrett$ JC$ Burton$ PR$ Davison$ D$ Donnelly$ P$
Easton$ D$ Evans$ D$ Leung$ HBT$ Marchini$ JL$ Morris$ AP$ Spencer$
CCA$Tobin$MD$Cardon$LR$Clayton$DG$Attwood$AP$Boorman$JP$
Cant$B$Everson$U$Hussey$JM$Jolley$JD$Knight$AS$Koch$K$Meech$
E$ Nutland$ S$ Prowse$ CV$ Stevens$ HE$ Taylor$ NC$ Walters$ GR$
Walker$NM$Watkins$NA$Winzer$T$Todd$JA$Ouwehand$WH$Jones$
RW$ McArdle$ WL$ Ring$ SM$ Strachan$ DP$ Pembrey$ M$ Breen$ G$
Clair$ DS$ Caesar$ S$ GordonBSmith$ K$ Jones$ L$ Fraser$ C$ Green$ EK$
Grozeva$D$Hamshere$ML$Holmans$PA$Jones$IR$Kirov$G$Moskvina$
V$ Nikolov$ I$ OrsquoDonovan$ MC$ Owen$ MJ$ Craddock$ N$ Collier$ DA$
Elkin$A$Farmer$A$Williamson$R$McGuffin$P$Young$AH$Ferrier$IN$
Ball$SG$Balmforth$AJ$Barrett$JH$Bishop$DT$Iles$MM$Maqbool$A$
Yuldasheva$N$Hall$AS$Braund$PS$Burton$PR$Dixon$RJ$Mangino$
M$ Stevens$ S$ Tobin$ MD$ Thompson$ JR$ Samani$ NJ$ Bredin$ F$
Tremelling$ M$ Parkes$ M$ Drummond$ H$ Lees$ CW$ Nimmo$ ER$
Satsangi$ J$Fisher$SA$Forbes$A$Lewis$CM$Onnie$CM$Prescott$NJ$
Sanderson$J$Mathew$CG$Barbour$J$Mohiuddin$MK$Todhunter$CE$
Mansfield$JC$Ahmad$T$Cummings$FR$Jewell$DP$Webster$J$Brown$
MJ$ Clayton$DG$ Lathrop$GM$ Connell$ J$Dominiczak$A$ Samani$NJ$
Marcano$CAB$Burke$B$Dobson$R$Gungadoo$J$Lee$KL$Munroe$PB$
Newhouse$SJ$Onipinla$A$Wallace$C$Xue$M$Caulfield$M$Farrall$M$
Barton$A$ (braggs)$TB$ in$RG$and$G$Bruce$ IN$Donovan$H$Eyre$S$
69
$
Gilbert$ PD$ Hider$ SL$ Hinks$ AM$ John$ SL$ Potter$ C$ Silman$ AJ$Symmons$ DPM$ Thomson$ W$ Worthington$ J$ Clayton$ DG$ Dunger$DB$ Nutland$ S$ Stevens$ HE$ Walker$ NM$ Widmer$ B$ Todd$ JA$Frayling$TM$Freathy$RM$Lango$H$Perry$JRB$Shields$BM$Weedon$MN$ Hattersley$ AT$ Hitman$ GA$ Walker$ M$ Elliott$ KS$ Groves$ CJ$Lindgren$ CM$ Rayner$ NW$ Timpson$ NJ$ Zeggini$ E$ McCarthy$ MI$Newport$M$Sirugo$G$Lyons$E$Vannberg$F$Hill$AVS$Bradbury$LA$Farrar$ C$ Pointon$ JJ$ Wordsworth$ P$ Brown$ MA$ Franklyn$ JA$Heward$JM$Simmonds$MJ$Gough$SCL$Seal$S$(uk)$BCSC$Stratton$MR$Rahman$N$Ban$M$Goris$A$Sawcer$SJ$Compston$A$Conway$D$Jallow$ M$ Newport$ M$ Sirugo$ G$ Rockett$ KA$ Kwiatkowski$ DP$Bumpstead$ SJ$ Chaney$A$Downes$K$ Ghori$MJR$ Gwilliam$R$Hunt$SE$Inouye$M$Keniry$A$King$E$McGinnis$R$Potter$S$Ravindrarajah$R$ Whittaker$ P$ Widden$ C$ Withers$ D$ Deloukas$ P$ Leung$ HBT$Nutland$S$Stevens$HE$Walker$NM$Todd$JA$Easton$D$Clayton$DG$Burton$PR$Tobin$MD$Barrett$JC$Evans$D$Morris$AP$Cardon$LR$Cardin$NJ$Davison$D$Ferreira$T$PereiraBGale$ J$Hallgrimsdoacutettir$ IB$Howie$ BN$Marchini$ JL$ Spencer$ CCA$ Su$ Z$ Teo$ YY$ Vukcevic$ D$Donnelly$P$Bentley$D$Brown$MA$Cardon$LR$Caulfield$M$Clayton$DG$ Compston$ A$ Craddock$ N$ Deloukas$ P$ Donnelly$ P$ Farrall$ M$Gough$ SCL$ Hall$ AS$ Hattersley$ AT$ Hill$ AVS$ Kwiatkowski$ DP$Mathew$ CG$McCarthy$MI$ Ouwehand$WH$ Parkes$M$ Pembrey$M$Rahman$N$Samani$NJ$Stratton$MR$Todd$ JA$Worthington$ J$2007$GenomeBwide$ association$ study$ of$ 14000$ cases$ of$ seven$ common$diseases$ and$ 3000$ shared$ controls$ Nature$ 447$ 661ndash678$doi101038nature05911$
Cantor$ RM$ Lange$ K$ Sinsheimer$ JS$ 2010$ Prioritizing$ GWAS$ Results$ A$Review$ of$ Statistical$ Methods$ and$ Recommendations$ for$ Their$Application$Am$J$Hum$Genet$86$6ndash22$doi101016jajhg200911017$
Clark$ AG$ Hubisz$ MJ$ Bustamante$ CD$ Williamson$ SH$ Nielsen$ R$ 2005$Ascertainment$ bias$ in$ studies$ of$ human$ genomeBwide$ polymorphism$Genome$Res$15$1496ndash1502$doi101101gr4107905$
70
$
CohenBZinder$M$ Seroussi$ E$ Larkin$ DM$ Loor$ JJ$Wind$ AE$ der$ Lee$ JBH$Drackley$ JK$Band$MR$Hernandez$AG$Shani$M$Lewin$HA$Weller$JI$ Ron$ M$ 2005$ Identification$ of$ a$ missense$ mutation$ in$ the$ bovine$ABCG2$gene$with$a$major$effect$on$ the$QTL$on$ chromosome$6$affecting$milk$yield$and$composition$in$Holstein$cattle$Genome$Res$15$936ndash944$doi101101gr3806705$
Cole$JB$Wiggans$GR$Ma$L$Sonstegard$TS$Lawlor$TJ$Crooker$BA$Tassell$CPV$ Yang$ J$ Wang$ S$ Matukumalli$ LK$ Da$ Y$ 2011$ GenomeBwide$association$ analysis$ of$ thirty$ one$ production$ health$ reproduction$ and$body$ conformation$ traits$ in$ contemporary$ US$ Holstein$ cows$ BMC$Genomics$12$408$doi1011861471B2164B12B408$
Colitti$M$Wilde$CJ$Stefanon$B$2004$Functional$expression$of$bclB2$protein$family$and$AIF$in$bovine$mammary$tissue$in$early$lactation$J$Dairy$Res$71$20ndash27$
Consortium$ TEP$ 2012$ An$ integrated$ encyclopedia$ of$ DNA$ elements$ in$ the$human$genome$Nature$489$57ndash74$doi101038nature11247$
De$Mello$F$Cobuci$JA$Martins$MF$Silva$MVGB$Neto$JB$2012$Association$of$the$polymorphism$g8514CgtT$in$the$osteopontin$gene$(SPP1)$with$milk$yield$ in$ the$ dairy$ cattle$ breed$ Girolando$ Anim$ Genet$ 43$ 647ndash648$doi101111j1365B2052201102312x$
Dudbridge$ F$ Gusnanto$ A$ 2008$ Estimation$ of$ significance$ thresholds$ for$genomewide$ association$ scans$ Genet$ Epidemiol$ 32$ 227ndash234$doi101002gepi20297$
Elsik$ CG$ Tellam$ RL$ Worley$ KC$ 2009$ The$ Genome$ Sequence$ of$ Taurine$Cattle$ A$window$ to$ ruminant$ biology$ and$ evolution$ Science$ 324$ 522ndash528$doi101126science1169588$
Fazioli$ F$ Minichiello$ L$ Matoska$ V$ Castagnino$ P$ Miki$ T$ Wong$ WT$ Di$Fiore$ PP$ 1993$ Eps8$ a$ substrate$ for$ the$ epidermal$ growth$ factor$receptor$kinase$enhances$EGFBdependent$mitogenic$signals$EMBO$J$12$3799ndash3808$
Fontanesi$ L$ Calograve$ DG$ Galimberti$ G$ Negrini$ R$ Marino$ R$ Nardone$ A$AjmoneBMarsan$ P$ Russo$ V$ 2014$ A$ candidate$ gene$ association$ study$
71
$
for$ nine$ economically$ important$ traits$ in$ Italian$ Holstein$ cattle$ Anim$
Genet$nandashna$doi101111age12164$
Fowler$KJ$Walker$F$Alexander$W$Hibbs$ML$Nice$EC$Bohmer$RM$Mann$
GB$ Thumwood$ C$ Maglitto$ R$ Danks$ JA$ 1995$ A$ mutation$ in$ the$
epidermal$growth$factor$receptor$in$wavedB2$mice$has$a$profound$effect$
on$ receptor$ biochemistry$ that$ results$ in$ impaired$ lactation$ Proc$ Natl$
Acad$Sci$92$1465ndash1469$doi101073pnas9251465$
Gianola$ D$ Hospital$ F$ Verrier$ E$ 2013$ Contribution$ of$ an$ additive$ locus$ to$
genetic$variance$when$inheritance$is$multiBfactorial$with$implications$on$
interpretation$ of$ GWAS$ Theor$ Appl$ Genet$ 126$ 1457ndash1472$
doi101007s00122B013B2064B2$
Gillette$ M$ Bray$ K$ Blumenthaler$ A$ VargoBGogola$ T$ 2013$ P190B$ RhoGAP$
Overexpression$ in$ the$ Developing$Mammary$ Epithelium$ Induces$ TGFβB
dependent$ Fibroblast$ Activation$ PLoS$ ONE$ 8$ e65105$
doi101371journalpone0065105$
Grisart$B$Coppieters$W$Farnir$F$Karim$L$Ford$C$Berzi$P$Cambisano$N$
Mni$ M$ Reid$ S$ Simon$ P$ Spelman$ R$ Georges$ M$ Snell$ R$ 2002$
Positional$Candidate$Cloning$of$a$QTL$ in$Dairy$Cattle$ Identification$of$a$
Missense$Mutation$ in$the$Bovine$DGAT1$Gene$with$Major$Effect$on$Milk$
Yield$ and$ Composition$ Genome$ Res$ 12$ 222ndash231$
doi101101gr224202$
Hubbard$NE$Chen$QJ$Sickafoose$LK$Wood$MB$Gregg$ JP$Abrahamsson$
NM$ Engelberg$ JA$ Walls$ JE$ Borowsky$ AD$ 2013$ Transgenic$
Mammary$Epithelial$Osteopontin$(Spp1)$Expression$Induces$Proliferation$
and$ Alveologenesis$ Genes$ Cancer$ 4$ 201ndash212$
doi1011771947601913496813$
Hu$ ZBL$ Park$ CA$ Wu$ XBL$ Reecy$ JM$ 2013$ Animal$ QTLdb$ an$ improved$
database$tool$for$livestock$animal$QTLassociation$data$dissemination$in$
the$ postBgenome$ era$ Nucleic$ Acids$ Res$ 41$ D871ndashD879$
doi101093nargks1150$
Jeon$et$al$ B$2008$ B$ProtoBoncogene$FBIB1$ (PokemonZBTB7A)$Represses$Trpdf$
nd$
72
$
Jiang$ L$ Liu$ J$ Sun$ D$ Ma$ P$ Ding$ X$ Yu$ Y$ Zhang$ Q$ 2010$ Genome$Wide$
Association$ Studies$ for$ Milk$ Production$ Traits$ in$ Chinese$ Holstein$
Population$PLoS$ONE$5$e13661$doi101371journalpone0013661$
Kong$ PBZ$ Yang$ F$ Li$ L$ Li$ XBQ$ Feng$ YBM$ 2013$Decreased$FOXF2$mRNA$
Expression$ Indicates$ EarlyBOnset$ Metastasis$ and$ Poor$ Prognosis$ for$
Breast$ Cancer$ Patients$ with$ Histological$ Grade$ II$ Tumor$ PLoS$ ONE$ 8$
e61591$doi101371journalpone0061591$
Lee$SBU$Maeda$T$2012$POKZBTB$proteins$an$emerging$ family$of$proteins$
that$ regulate$ lymphoid$ development$ and$ function$ Immunol$ Rev$ 247$
107ndash119$
Lemay$ DG$ Lynn$ DJ$ Martin$ WF$ Neville$ MC$ Casey$ TM$ Rincon$ G$
Kriventseva$ EV$ Barris$ WC$ Hinrichs$ AS$ Molenaar$ AJ$ 2009$ The$
bovine$lactation$genome$ insights$ into$the$evolution$of$mammalian$milk$
Genome$Biol$10$R43$
Liu$JZ$Mcrae$AF$Nyholt$DR$Medland$SE$Wray$NR$Brown$KM$2010$A$
versatile$ geneBbased$ test$ for$ genomeBwide$ association$ studies$ Am$ J$
Hum$Genet$87$139$
Liu$S$Dontu$G$Mantle$ID$Patel$S$Ahn$N$Jackson$KW$Suri$P$Wicha$MS$
2006$Hedgehog$Signaling$and$BmiB1$Regulate$SelfBrenewal$of$Normal$and$
Malignant$ Human$ Mammary$ Stem$ Cells$ Cancer$ Res$ 66$ 6063ndash6071$
doi1011580008B5472CANB06B0054$
Lu$ J$ Holmgren$ A$ 2014$ The$ thioredoxin$ superfamily$ in$ oxidative$ protein$
folding$ Antioxid$ Redox$ Signal$ 140202091634000$
doi101089ars20145849$
Luna$ A$ Nicodemus$ KK$ 2007$ snpplotter$ an$ RBbased$ SNPhaplotype$
association$and$linkage$disequilibrium$plotting$package$Bioinforma$Oxf$
Engl$23$774ndash776$doi101093bioinformaticsbtl657$
McCarthy$ MI$ Abecasis$ GR$ Cardon$ LR$ Goldstein$ DB$ Little$ J$ Ioannidis$
JPA$ Hirschhorn$ JN$ 2008$ GenomeBwide$ association$ studies$ for$
complex$traits$consensus$uncertainty$and$challenges$Nat$Rev$Genet$9$
356ndash369$doi101038nrg2344$
Meredith$ BK$ Kearney$ FJ$ Finlay$ EK$ Bradley$ DG$ Fahey$ AG$ Berry$ DP$
Lynn$ DJ$ 2012$ GenomeBwide$ associations$ for$ milk$ production$ and$
73
$
somatic$cell$score$in$HolsteinBFriesian$cattle$in$Ireland$BMC$Genet$13$21$doi1011861471B2156B13B21$
Minozzi$ G$Nicolazzi$ EL$ Stella$ A$ Biffani$ S$Negrini$ R$ Lazzari$ B$ AjmoneBMarsan$ P$Williams$ JL$ 2013$ Genome$Wide$ Analysis$ of$ Fertility$ and$Production$ Traits$ in$ Italian$ Holstein$ Cattle$ PLoS$ ONE$ 8$ e80219$doi101371journalpone0080219$
Nicolazzi$EL$Picciolini$M$Strozzi$F$Schnabel$RD$Lawley$C$Pirani$A$Brew$F$ Stella$ A$ 2014$ SNPchiMp$ a$ database$ to$ disentangle$ the$ SNPchip$jungle$ in$ bovine$ livestock$ BMC$ Genomics$ 15$ 123$ doi1011861471B2164B15B123$
Purcell$ S$ Neale$ B$ ToddBBrown$ K$ Thomas$ L$ Ferreira$ MAR$ Bender$ D$Maller$J$Sklar$P$de$Bakker$PIW$Daly$MJ$Sham$PC$2007$PLINK$a$tool$ set$ for$ wholeBgenome$ association$ and$ populationBbased$ linkage$analyses$Am$J$Hum$Genet$81$559ndash575$doi101086519795$
Qiu$ S$ Lai$ L$ 2013$ Antibacterial$ properties$ of$ recombinant$ human$ nonBpancreatic$secretory$phospholipase$A2$Biochem$Biophys$Res$Commun$441$453ndash456$doi101016jbbrc201310092$
Quinlan$AR$Hall$IM$2010$BEDTools$a$flexible$suite$of$utilities$for$comparing$genomic$ features$ Bioinformatics$ 26$ 841ndash842$doi101093bioinformaticsbtq033$
RamiacuterezBTorres$ A$ BarceloacuteBBatllori$ S$ MartiacutenezBBeamonte$ R$ Navarro$ MA$Surra$ JC$ Arnal$ C$ Guilleacuten$N$ Aciacuten$ S$ Osada$ J$ 2012$ Proteomics$ and$gene$ expression$ analyses$ of$ squaleneBsupplemented$ mice$ identify$microsomal$thioredoxin$domainBcontaining$protein$5$changes$associated$with$ hepatic$ steatosis$ J$ Proteomics$ 77$ 27ndash39$doi101016jjprot201207001$
Raven$ LBA$ Cocks$ BG$ Pryce$ JE$ Cottrell$ JJ$ Hayes$ BJ$ 2013$ Genes$ of$ the$RNASE5$pathway$ contain$ SNP$ associated$with$milk$ production$ traits$ in$dairy$cattle$Genet$Sel$Evol$45$25$doi1011861297B9686B45B25$
Scotti$E$Fontanesi$L$Schiavini$F$La$Mattina$V$Bagnato$A$Russo$V$2010$DGAT1$ pK232A$ polymorphism$ in$ dairy$ and$ dual$ purpose$ Italian$ cattle$breeds$Ital$J$Anim$Sci$9$doi104081ijas2010e16$
74
$
Stefanon$ B$ Colitti$ M$ Gabai$ G$ Knight$ CH$ Wilde$ CJ$ 2002$ Mammary$
apoptosis$and$lactation$persistency$in$dairy$animals$J$Dairy$Res$69$37ndash
52$
Stegh$ AH$ DePinho$ RA$ 2011$ Beyond$ effector$ caspase$ inhibition$ Bcl2L12$
neutralizes$ p53$ signaling$ in$ glioblastoma$ Cell$ Cycle$ 10$ 33ndash38$
doi104161cc10114365$
Taberlet$ P$ Valentini$ A$ Rezaei$ HR$ Naderi$ S$ Pompanon$ F$ Negrini$ R$
AjmoneBMarsan$ P$ 2008$ Are$ cattle$ sheep$ and$ goats$ endangered$
species$Mol$Ecol$17$275ndash284$doi101111j1365B294X200703475x$
Team$RC$ 2014$ R$ A$ Language$ and$Environment$ for$ Statistical$ Computing$ R$
Foundation$for$Statistical$Computing$Vienna$Austria$
Thomadaki$H$ Talieri$M$ Scorilas$ A$ 2007$ Prognostic$ value$ of$ the$ apoptosis$
related$genes$BCL2$and$BCL2L12$in$breast$cancer$Cancer$Lett$247$48ndash
55$doi101016jcanlet200603016$
Valdehita$ A$ Bajo$ AM$ FernaacutendezBMartiacutenez$ AB$ Arenas$ MI$ Vacas$ E$
Valenzuela$ P$ RuiacutezBVillaespesa$ A$ Prieto$ JC$ Carmena$ MJ$ 2010$
Nuclear$ localization$ of$ vasoactive$ intestinal$ peptide$ (VIP)$ receptors$ in$
human$ breast$ cancer$ Peptides$ 31$ 2035ndash2045$
doi101016jpeptides201007024$
Wang$T$Liu$NS$Seet$LBF$Hong$W$2010$The$Emerging$Role$of$VHS$DomainB
Containing$Tom1$Tom1L1$and$Tom1L2$in$Membrane$Trafficking$Traffic$
11$1119ndash1128$doi101111j1600B0854201001098x$
Wang$ X$Wurmser$ C$ Pausch$ H$ Jung$ S$ Reinhardt$ F$ Tetens$ J$ Thaller$ G$
Fries$R$2012$Identification$and$Dissection$of$Four$Major$QTL$Affecting$
Milk$Fat$Content$in$the$German$HolsteinBFriesian$Population$PLoS$ONE$7$
e40711$doi101371journalpone0040711$
Xu$S$2003$Theoretical$Basis$of$the$Beavis$Effect$Genetics$165$2259ndash2268$
Zimin$AV$Delcher$AL$Florea$L$Kelley$DR$Schatz$MC$Puiu$D$Hanrahan$
F$ Pertea$ G$ Tassell$ CPV$ Sonstegard$ TS$ Marccedilais$ G$ Roberts$ M$
Subramanian$ P$ Yorke$ JA$ Salzberg$ SL$ 2009$ A$ wholeBgenome$
assembly$ of$ the$ domestic$ cow$ Bos$ taurus$ Genome$ Biol$ 10$ R42$
doi101186gbB2009B10B4Br42$
75
$
Zu$X$Ma$J$Liu$H$Liu$F$Tan$C$Yu$L$Wang$J$Xie$Z$Cao$D$Jiang$Y$2011$ProBoncogene$ Pokemon$ promotes$ breast$ cancer$ progression$ by$upregulating$ survivin$ expression$ Breast$ Cancer$ Res$ 13$ R26$doi101186bcr2843$ $
76
$
Supplementaryfiles$
You$can$find$Chapter$͵$supplementary$files$at$
- DocTa$(Doctoral$Thesis$Archive)$(httptesionlineunicattit)$
- Google$Drive$(httptinyurlcomPhDMMChapter03)$or$scan$the$QR$code$
$
$
SupplementaryFigureS1Multidimensionalscaling
Spatial$distribution$of$genetic$structure$in$the$analyzed$breeds$$
$
Supplementary$FigureS2GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Brown$
$
Supplementary$FigureS3GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Holstein$
$
Supplementary$FigureS4GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$HolsteinDGAT$
$
Supplementary$FigureS5GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Simmental$
77
$
$Supplementary$FigureS6LDdecayGenomeBwide$LD$decay$for$the$three$breeds$(Holstein$Simmental$Brown)$$Supplementary$FigureS7DescriptiveplotsDescriptive$statistics$for$the$Italian$Brown$Italian$Holstein$and$Italian$Simmental$$Supplementary$FigureS8DGAT1LDFocus$on$the$LD$of$the$DGAT1$region$in$all$the$datasets$(Brown$Holstein$HolsteinDGAT$Simmental)$$Supplementary$TableS1SNPfactNumber$of$subjects$and$SNPs$after$QC$$Supplementary$TableS2GeneBasedAssociationresultsFull$results$of$the$significant$genes$associated$with$the$traits$in$all$the$datasets$$ Sheet$1$Brown$ndash$fat$percentage$$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS3SingleSNPAssociationresultsFull$results$of$the$SNP$that$overcome$the$suggestive$threshold$$$ Sheet$1$Brown$ndash$fat$percentage$
78
$
$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS4GenesandtraitsGeneWise$pBvalues$of$the$discussed$genes$in$all$the$analysis$In$grey$are$evidenced$the$pBvalues$that$overcome$the$significance$threshold$Genes$are$alphabetically$listed$$Supplementary$TableS5GenesinQTLsGenes$from$the$geneBbased$association$mapping$into$known$QTLs$$$$$
79
CHAPTER(4(
MUGBAS(a(species(free(gene(based(association(suite(
S(Capomaccio1(M(Milanesi1(L(Bomba1(et(al(1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122ItalyTheseauthorscontributedequallytothiswork
____________________ApplicationnotesubmittedtoBioinformatics
Abstract(Theassociationstudiesbetweengenomewidemarkersandphenotypes(GWAS)
arenowwidelyusedinmodelandnonmodelspeciesHowevertoolsthatmerge
singlemarkerresultsandtheirprobabilityofbeingassociatedtofunctionalunits
(typicallygenes)areseldomdevelopedforspeciesotherthanhuman
Herewe presentMUGBAS a suite that calculates a generegionbased pvalue
fromagivenannotationsinglemarkerGWAresultandgenotypedatawiththe
objective to ease postGWAS analysis The software is species and annotation
independentfasthighlyparallelizedandreadyforhighdensitymarkerstudies
Availabilityandimplementationhttpsbitbucketorgcapemastermugbas
Introduction(With the availability of highdensity SNP panels Genome Wide Association
Studies (GWAS) have become the gold standard for dissecting the biology of
complextraits(McCarthyetal2008)Thisistrueforhumansandotherspecies
TheunderlyingideaofGWASissimplefindmarkertraitassociationsexploiting
the linkage disequilibrium (LD) that exists between the causative mutation ndash
whichweignorendashandoneormoremarkersofknownchromosomalposition
Whileanumberofwellestablishedapproachescanbeeasilyusedinastandard
GWAS (Aulchenko et al 2007 Kang et al 2010 Purcell et al 2007) to find
marker(s) associated to a particular phenotype several issues have to be
accountedforduringtheupanddownstreamanalyses
Firstly genome wide studies have to take into account multiple testing If
stringent statistical thresholds are applied a proportion ofmoderate or small
effectsactuallyassociatedwiththetraitwillbedisregardedThereforereducing
the number of tests while maintaining all the information provided by high
densitypanelswouldbeaclearadvantageInadditiononceasignificantsignalis
detected different strategiesmaybeused to tag the candidate causative gene
Sometimesthegeneclosertothepeaksignalisselectedinothercasesallgenes
withinaflankingregionofanarbitrarysizeofthesignificantSNP(s)areincluded
82
infurtheranalysesThesechoicesmayendupinbiasresultseitherretaininga
largenumberofnottrulyassociatedgenesordisregardingthecorrectones
Somemethodshaverecentlybeenproposedtoovercometheselimitationsusing
aprioriknowledgeastheldquocandidatepathwayrdquoanalysis(Ravenetal2013)and
the ldquogenebasedrdquo association strategies (Akula et al 2011 Cantor et al 2010
Liuetal2010)TheseapproachesrestrictGWAstudiesonlytoknowngenesor
genesubsetsresizingthemultipletestingproblemandincreasingthepowerof
theanalyses
Usually bioinformatics pipelines are developed for the analysis of human and
modelorganismsandarenotadaptedtononmodelorganismThisisthecaseof
the VEGAS software (Liu et al 2010 Yang et al 2014) developed only for
humansandonlyforgenebasedanalyses
Here we introduce MUGBAS [MUlti species GeneBased Association Suite] a
softwarebasedonVEGASthatusesGWASdataandagivenannotation(typically
genesbutanyregionmaybesuitable)toestimategenebasedandregionbased
associationpvaluesThesuite is speciesandannotation free ready toanalyze
highdensitydatafromanyexperimentaldesigns
Methods(MUGBASisasuiteofscriptsbuiltinBashRPythonandPerlThesuiterequires
several prerequisites that can be easily fulfilled in a LinuxUnixMac
environmentwith a few command lines Further information on how tomeet
theserequirementsaredetailedintheonlinerepository
MUGBAS suite can be invoked launching the Bash wrapper that checks
dependenciesandstoresuserchoices
Required input files are MAPPED files in PLINK format with genotype data
from(atleast200individualsarerecommendedforaproperLDcalculation)an
annotation file inBED format (chromosome startingposition endingposition
nameandusercustomfields)ofthedesiredgenomefeatureandtheGWASresult
file with mandatory headers ldquoSNPNAMErdquo and ldquoPVALUErdquo Multiple association
resultscanbetestedatthesametime ifprovidedinseparatefiles inthesame
directoryAtrialdatasetisdownloadablealongwiththesoftwarereleasePlease
83
notethatforsimplicityweprovideageneannotationbutanyBEDfilewithany
desiredfeaturecanbeused
The program starts the analysis assigning SNPs to any feature of the given
annotationusingbedtools(QuinlanandHall2010)Thisstepiscustomizableby
user defined boundaries (up and downstream) in order to catch regulatory
regionsthatcouldbeinhighLDwiththecurrentgeneregion
Inordertospeeduptheanalysistheprogramsubsamples200individualsfrom
theinputdatasetWethenimplementedtwolevelsofparallelizationoneforthe
LDandonefortheldquogenewiserdquoorldquoregionwiserdquopvaluecalculationThelatteris
particularlyusefulanalyzingmorethanoneGWAresultsfromthesamedataset
BrieflyMUGBASfirstsplitstheLDcalculationinncores(asdefinedbytheuser)
using the foreach doParallel packages (Analytics andWeston 2014a 2014b)
and then distributes traits across cores using GNU Parallel and foreach
doParallel
SinceLDcalculationisaslowprocessVEGASbenefitsofHapMapphase2datato
speed up the processwhile preserving the option for a user defined (human)
population using PLINK In order to expand these analysis to any species LD
calculationinMUGBASisimplementedwithther2fast()functionfromGenABEL
package(Aulchenkoetal2007)whilethegenewisepvalueiscalculatedasin
theVEGASapproachBrieflythegenewiseteststatisticisthesumofthenupper
tailchisquaredvaluesofaSNPsubsetwithonedegreeoffreedom(wheren is
thenumberofSNPmappinginthegivengene)Withperfectlinkageequilibrium
the genewise pvalue would be the onetailed pvalue of a chisquare
distributionwithndegreesof freedomOtherwisemorelikelythedistribution
of the pvalues has to be evaluated (with simulations) andweighted (with LD
values)(Liuetal2010)
Once every generegion has its ldquogenewiserdquo pvalue a FDR qvalue (False
Discovery Rate) statistics is calculated with the base R function padjust()
Results are stored and enrichedwith ldquomanhattanrdquo and ldquoundergroundrdquo plots of
theldquogenewiserdquopvalueandtheqvaluerespectively
84
Results(MUGBAS output is generegionoriented with useful information about the
location of the analyzed feature the ldquoBest SNPrdquo within that feature with its
genomiccoordinatesandtheoriginalpvaluethegeneorregionwisestatistics
(pvalue number of simulations and qvalue) the custom annotation of that
particularentryandthepositionoftheSNPwithrespecttothefeatureandthe
decidedboundariesAll this informationareuseful to track theeffectofhighly
significant markers that could map into multiple region of interest and
effectively pinpoint the most probable location to focus on with data mining
processes
A typical analysis performed with a highdensity panel (gt600K markers)
distributed in 10 midperformance cores (Intel Xeon X5675 307 GHz) on
threetraitssimultaneouslytakesaboutfourhours
Discussion(Generegionbased methods are now widely used and accepted in genomic
research However non modelspecies suffer for the lack of availability of
specifictoolstoenhancediscoveryfromgenomewidedataMUGBASprovidesa
robust and accepted statistics to researchers dealing with species other than
humantoeasilyjumpfromassociatedmarkerstothedesiredfunctionalunits
Acknowledgements(((WewouldliketothankDrJimmyLiuforhisadvicesinbuildingthesuite
Reference(Akula N Baranova A Seto D Solka J NallsMA Singleton A Ferrucci L
TanakaTBandinelli SChoYSKimYJLee JYHanBGBipolar
Disorder Genome Study (BiGS) Consortium The Wellcome Trust Case
85
ControlConsortiumMcMahonFJ2011ANetworkBasedApproachtoPrioritize Results from GenomeWide Association Studies PLoS ONE 6e24220doi101371journalpone0024220
Analytics R Weston S 2014a doParallel Foreach parallel adaptor for theparallelpackage
AnalyticsRWestonS2014bforeachForeachloopingconstructforRAulchenkoYSdeKoningDJHaleyC2007GenomewideRapidAssociation
Using Mixed Model and Regression A Fast and Simple Method ForGenomewidePedigreeBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614
Cantor RM Lange K Sinsheimer JS 2010 Prioritizing GWAS Results AReview of Statistical Methods and Recommendations for TheirApplicationAmJHumGenet866ndash22doi101016jajhg200911017
KangHMSulJHServiceSKZaitlenNAKongSFreimerNBSabattiCEskin E 2010 Variance component model to account for samplestructure in genomewide association studies Nat Genet 42 348ndash354doi101038ng548
LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKM2010Aversatile genebased test for genomewide association studies Am JHumGenet87139
McCarthy MI Abecasis GR Cardon LR Goldstein DB Little J IoannidisJPA Hirschhorn JN 2008 Genomewide association studies forcomplextraitsconsensusuncertaintyandchallengesNatRevGenet9356ndash369doi101038nrg2344
Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender DMallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKatool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795
QuinlanARHallIM2010BEDToolsaflexiblesuiteofutilitiesforcomparinggenomic features Bioinformatics 26 841ndash842doi101093bioinformaticsbtq033
86
Raven LA Cocks BG Pryce JE Cottrell JJ Hayes BJ 2013 Genes of the
RNASE5pathway contain SNP associatedwithmilk production traits in
dairycattleGenetSelEvol4525doi101186129796864525
87
CHAPTER5
Imputationaccuracyisrobusttocattlereferencegenomeupdates
MMilanesi1etal1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122Italy
____________________ShortCommunicationacceptedinAnimalGenetics
SummaryGenotype imputation is routinely applied in a large number of cattle breeds Imputation
hasbecomeaneedduetothelargenumberofSNParrayswithvariabledensity(currently
from2900to777962SNPs)Althoughmanyauthorshavestudiedtheeffectofdifferent
statisticalmethodsonimputationaccuracytheimpactofa(likely)changeinthereference
genomeassemblyonimputationfromlowertohigherdensityhasnotbeendeterminedso
far In this work 1021 Italian Simmental SNP genotypes were reJmapped on the three
mostrecentreferencegenomeassembliesFour imputationmethodswereusedtoassess
theimpactofanupdateinthereferencegenomeAsexpectedthefourmethodsbehaved
differentlywith largedifferences in termsof accuracyUpdatingSNPcoordinateson the
three tested cattle reference genome assemblies determined only a slight variation on
imputationresultswithinmethod
KeywordsImputationreferencegenomeassemblySNPbovine
MaintextFromthepublicationof the firstbovinesinglenucleotidepolymorphism(SNP)array the
wholegenomicscenarioofthisspeciesevolvedrapidlyNowadaysseveralSNPpanelshave
been designed with a range of densities mapping on different reference genome
assemblies and produced by different manufacturers When running analyses with SNP
arrays of different densities or from different commercial companies (eg Illumina and
Affymetrix) an imputation step is often required The impact of different statistical
methods on imputation accuracy has already been assessed (Pei et al 2008) (Marchini
andHowie2010)(Maetal2013)Surprisinglyuptonowtherearenotavailablestudies
investigating the impact of an update on the reference genome assembly on imputation
accuracy This assessment would be of fundamental importance considering that reJ
mappingSNPsdeterminesarearrangementofthehaplotypestructureofthegenotypes(in
somecasesSNPsarereJmappedtoadifferentchromosome)andthatthebovinereference
assembly is likely toberepeatedlyupdatedover timeThisworkwasaimedatassessing
the effect of different reference genome assemblies in cattle As test casewe compared
four different imputation methods from low (Illumina BovineLDv10 Beadchip) to
90
mediumhigh (Illumina BovineSNP50v2 Beadchip) density in the Italian Simmental
populationForeachmethodweassessedthe impacton imputationaccuracyofmapping
SNPs on the BTAU42 BTAU46 (wwwhgscbcmeduotherJmammalsbovineJgenomeJ
project)orUMD31(Ziminetal2009)cattlereferencegenomeassemblies
Atotalof900and121SimmentalbullsweregenotypedwiththeIlluminaBovineSNP50v1
and v2 Beadchips (Illumina Inc San Diego CA) respectively Illumina BovineSNP50v2
Beadchip (hereinafter named 54k) the official reference chip for the Italian Simmental
breedersassociation(ANAPRI)wasconsideredasreferenceSNPpanelTheSNPchiMpv1
tool(Nicolazzietal2014)wasusedtoobtainSNPcoordinatesonBTAU42UMD31and
BTAU46genomeassembliesForeachassemblySNPsnotcontainedinthe54kunmapped
orinthesexualchromosomeswerediscardedThefinaldatasetcontained5118952886
and51290SNPsinBTAU42UMD31andBTAU46respectively
FourimputationmethodswereconsideredPedImpute(Nicolazzietal2013)FindHapv2
(VanRadenetal2011)FImpute(Sargolzaeietal2014)andBeaglev332(Browningand
Browning2007)Thefirstthreeusepedigreeandpopulation(ie linkagedisequilibrium
LD) informationwhereas the fourth uses only LD information Performance of the four
methods testedacrossgenomicassemblieswas comparedusingwrongly imputedalleles
(ldquoerrorsrdquo)andallelicsquaredcorrelation(ldquoallelicJR2rdquo)statisticsAllelicJR2wasdefined
asthesquaredcorrelationbetweenimputedandtrue(eggenotyped)allelesmultipliedby
100 as in Browning amp Browning (2009) The 878 oldest sires were used as training
population whereas the 143 youngest bulls were used as test population SNPs not in
commonwith the lowdensity panelwere set tomissing in the test population The test
population was split in four scenarios individuals with sire and maternal grandsire
genotypedinthereferencepopulation(scenarioAJ84animals)individualswithonlythe
sire genotyped (scenario B J 34 animals) individuals with only the maternal grandsire
genotyped(scenarioCJ13animals)ornoneofcloserelativesgenotyped(scenarioDJ12
animals)Thepopulationusedwashighlystructuredwithnearlyhalfofthebullspresent
in the dataset (490) sired by 115 genotyped sires In addition 35 out of these 115
genotypedsiresweresiredby22bulls(eggrandsires)alsopresentinthedataset
DifferencesacrossassembliesarenothomogeneousFromBTAU42toBTAU4646SNPs
weremapped to different chromosomes and the average difference in physical position
withinchromosomeswasnearly025MbpHowevertherewasnosubstantialdifferencein
the order of markers In fact not considering unmapped SNPs on either of the
aforementionedassembliesadifferenceinthesequentialorderwasobservedforonly134
91
single (ie spread throughout the genome) SNPs On the contrary from UMD31 to
BTAU46 222 SNPsweremapped to different chromosomes and amuch larger average
differenceinSNPsphysicalpositionwasobserved(iemorethan2Mbp)Alsothenumber
ofSNPswithdifferentsequentialorderwasmuchlarger(ie1893SNPs)Howevermore
than50oftherearrangementsinvolvedblocksfrom3tonearly100markersAsaresult
slight differences in overall imputation accuracy from BTA46 to UMD31 rather than
between the two BTAU assemblies were expected across methods On the contrary if
rearrangements are due to issues in the assembly of scaffolds theymight have a direct
(negative) impact on local imputation performances In fact a preliminary analysis on
BTA1consideringlocalerroracrossassembliesandmethodssuggestatleasttwosuch
SNPsclustersat44and138MbponUMD31(FiguresS1S2andS3)
Considering allmethods and scenarios amaximum average variation of 02 inerrors
(Table1)andof09inallelicJR2(Table2)wasobservedacrossreferenceassembliesThe
standarddeviation(SD)oftheimputationaccuracywasstableacrossreferenceassemblies
withinmethodsOnthecontraryalargervariabilitywasobservedacrossmethods
PedImpute and FindHap showed a relatively large average allelicJR2 variability across
scenarios evaluated over the same reference assembly Only PedImpute showed a large
variation in SD across scenarios in errors and allelicJR2 with high variability on
scenarios C and D showing a low performance of thismethodwhen there are no close
relatives(egsiredam)genotypedinthereferencedatasetFImputeinsteadachievedthe
highestaccuracyacrossallreferencegenomesandscenariossurprisinglyobtainingitsbest
resultsJerrorsof14(06SD)andallelicJR2of955(26SD)JinscenarioCwhereonly
onerelativelycloserelativeisgenotyped(maternalgrandJsire)FImputeappeartobeable
todetectshorterhaplotypesfromdistantrelativesmoreaccuratelythananyothermethod
tested in this study thus it is lessdependenton theavailabilityofpedigree information
thantheothertwopedigreeJbasedmethods(Sargolzaeietal2010)
The large variability across methods within assembly was expected although the
differences obtained among PedImpute FindHap and Beagle were larger in this study
comparedtoasimilarstudyinItalianHolstein(Nicolazzietal2013)Thiscoulddependon
a number of factors First the sample size of the test population is small (especially for
groups C and D where less than 15 individuals were present) Second the interaction
between population structure and imputationmethodsmay have a role PedImpute and
FindHap rely much more heavily on pedigree information than FImpute and Beagle
assumesall individualsareunrelatedHowever the lattertwomethods indirectlybenefit
92
greatly of from highly structured populations with large number of relatives ndash close or
distantndashpresent inthereferencepopulation(Sargolzaeietal2014)Anotherinteresting
factor that influences the imputation accuracy may be the persistence of LD at long
distances this is much lower in the Italian Simmental than in the Italian Holstein
population(FigureS4)Informationonrelatives(eitherpedigreeinformationorgenotyped
relativesinthereferencepopulation)isexpectedtoplayabiggerroleifLDisconservedat
shorter distances This hypothesis is supported by the fact that although there is high
variability acrossmethods better performances of the pedigreeJbasedmethods are still
observedinscenariosAandB(andCforFImpute)
Theslightdifferencesinimputationaccuracyobservedacrossreferenceassembliesmaybe
explained by the aforementioned small variation in the SNP blocks across reference
assemblies irrespectively of the rearrangements of single SNPs Considering that SNP
blocksaremostlymaintainedweexpectthatbreedswithlargerLDpersistenceatlonger
distances (ie Holstein Brown) will obtain even better imputation accuracy across
assembliescomparedtowhatreportedinthisstudyinItalianSimmentalHoweverastill
relativelyunexploredaspectof imputationperformance is thedistributionof imputation
errors across the genomeNot considering genotyping errors (which are expected to be
randomly distributed in the genome) a cluster of consecutive markers with high
percentage of imputation error across methods are most probably identifying missJ
assembledregions in thegenome(FiguresS1S2andS3)Althoughthe impactonoverall
imputation accuracy is limited as results in this study show these errorsmight have a
large impact on any analyses relying on specific genomic coordinates (eg genomeJwide
associationstudies)Morecomprehensiveresultsincludingannotationstudiesandwhole
genomesequencingdataarerequiredtoconfirmandinvestigatetheseobservations
Toconcludeonlyasmalleffectofupdatingthereferencegenomeassemblyonimputation
accuracywas observed in Italian Simmental On the other hand as already reported by
other studies the variability of imputation accuracy estimates is much larger across
imputation methods The choice of the most suitable imputation method based on the
breedgeneticbackgrounditscharacteristicsandontheavailabledataisthereforecritical
AcknowledgementsThe research leading to these results has received funding from the European Unions
Seventh Framework Programme (FP72007J2013) under grant agreement ndeg 289592 ndash
93
Gene2FarmPartofthegenotypicdatausedinthisstudywasprovidedbyprojectldquoSelMolrdquo
fundedbytheItalianMinistryofAgriculture(MiPAAF)MMwassupportedbytheDoctoral
SchoolontheAgroJFoodSystem(Agrisystem)of theUniversitagraveCattolicadelSacroCuore
(Italy) FB was supported by the Marie Curie European Reintegration Grant
ldquoNEUTRADAPTrdquo(FP7JPEOPLEJ2009JRG)
ReferencesBrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingandMissingJ
Data Inference for WholeJGenome Association Studies By Use of Localized
HaplotypeClusteringAmJHumGenet811084ndash1097
MaPBroslashndumRFZhangQLundMSSuG2013Comparisonofdifferentmethods
forimputinggenomeJwidemarkergenotypesinSwedishandFinnishRedCattleJ
DairySci964666ndash4677doi103168jds2012J6316
Marchini J Howie B 2010 Genotype imputation for genomeJwide association studies
NatRevGenet11499ndash511doi101038nrg2796
NicolazziELBiffaniSJansenG2013ShortcommunicationImputinggenotypesusing
PedImputefastalgorithmcombiningpedigreeandpopulationinformationJournal
ofDairyScience962649ndash2653doi103168jds2012J6062
NicolazziELPiccioliniMStrozziFSchnabelRDLawleyCPiraniABrewFStella
A 2014 SNPchiMp a database to disentangle the SNPchip jungle in bovine
livestockBMCGenomics15123doi1011861471J2164J15J123
PeiYLiJZhangLPapasianCJDengH2008Analysesandcomparisonofaccuracyof
differentgenotypeimputationmethodsPLoSONE3551
VanRadenPMOrsquoConnellJRWiggansGRWeigelKA2011Genomicevaluationswith
manymoregenotypesGenetSelEvol43
94
Tableamp1ampIm
putationampaccuracyampacrossampassem
bliesampforampPedImputeampFindH
apampFImputeampandampBeagleamppercentageampofampwronglyampim
putedamp
allelesamp
PedImputeamp
FindHapamp
FImputeamp
Beagleamp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
Scenarioamp
Namp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
A1amp
84
20
07
21
07
21
07
32
09
32
09
32
09
15
09
15
09
15
09
24
11
25
11
25
11
B2amp
34
23
08
23
09
22
08
36
11
36
12
36
11
15
07
15
09
14
06
23
08
23
08
23
08
C3amp
13
98
41
98
41
98
41
51
10
52
10
51
10
14
06
14
06
14
06
23
06
23
06
24
06
D4 amp
12
88
49
89
49
88
49
54
11
56
12
54
11
16
11
17
12
16
11
26
15
26
15
26
14
1 Sireandm
aternalgrandsiregenotyped
2 Siregenotypedmate
rnalgrandsireno
tgenotyped
3 Sireno
tgenotypedmate
rnalgrandsiregenotyped
4 Sireandm
aternalgrandsireno
tgenotyped
Tableamp2ampIm
putationampaccuracyampacrossampassem
bliesampforampPedImputeampFindH
apampFImputeampandampBeagleampallelicJR2amp
PedImputeamp
FindHapamp
FImputeamp
Beagleamp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
Scenarioamp
Namp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
A1amp
84
941
20
940
20
941
20
907
24
908
25
907
24
952
26
952
27
952
27
925
33
923
33
925
33
B2amp
34
935
24
934
25
935
24
896
30
896
34
895
32
954
20
954
21
954
20
929
24
928
25
928
24
C3amp
13
759
89
761
88
758
87
848
33
845
30
848
32
955
18
955
19
955
19
931
19
928
19
927
19
D4 amp
12
782
113
777
113
781
112
841
32
832
35
839
33
949
34
947
38
949
35
921
44
919
45
921
44
1 Sireandm
aternalgrandsiregenotyped
2 Siregenotypedmate
rnalgrandsireno
tgenotyped
3 Sireno
tgenotypedmate
rnalgrandsiregenotyped
4 Sireandm
aternalgrandsireno
tgenotyped
95
Supplementaryfiles
YoucanfindChapterͷsupplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter05)orscantheQRcode
SupplementaryFigureS1ErrorpercentageacrossmethodsforBTAU42onBTA1
SupplementaryFigureS2ErrorpercentageacrossmethodsforUMD31onBTA1
SupplementaryFigureS3ErrorpercentageacrossmethodsforBTAU46onBTA1
SupplementaryFigureS4Linkagedisequilibrium(LD)decayoverdistanceinItalian
SimmentalandHolstein
ThedottedlinewithblackdotsindicatesLDdecayoverdistanceinItalianHolstein
whereasthedashedlinewithwhitediamondsindicatesthesamevaluesforItalian
SimmentalItalianHolsteingenotypeswereasubsetofthedatausedinNicolazzietal
(2013)
96
CHAPTER6
QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis
MMilanesi1etal
1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly
____________________Manuscript
Abstract
Outbreed species carry a genetic load of deleterious recessive variants Those
that are lethal survive only in the heterozygous state in the population while
otherssimplyreducethefitnessofindividualshomozygousforthevariantThe
long=termdestinyofthesemutationsistodecreaseinfrequencyduetonegative
selectionandeventuallydisappearbecauseofgeneticdriftHowever theymay
bemaintainedinlivestockpopulationsandevenincreaseinfrequencywhenin
linkage with favourable alleles for traits under selection or occurring in the
genomeofsireshavinghighgeneticvalueforproductivetraitsWesearchedfor
these variants combining exome sequence data from 20 Italian Holstein bulls
sampledfromoppositetailsofthemaleandfemalefertilityEBVdistributionthe
HighDensity(HD)SNPgenotypingof1009progenytestedItalianHolsteinbulls
and information existing in public databases Exome sequencing detected
202956 variants Among these 5167 were classified as ldquodeleteriousrdquo when
annotatedwithSIFTandAnnovarThesewerefilteredbypickingthosemapping
in regions out of equilibrium and lack of homozygotes in the progeny tested
Holstein bulls or having an asymmetric distribution in bulls plus=variant and
minus=variant for fertility EBV The resulting 193 variants in 163 genes were
furtherassessedbycomparativegenomicshaplotypeanalysisandfunctionFor
eachassessmentapositiveornegativescorewasgiventoevidenceinfavouror
against thevariant tobedeleteriousA totalof35variantshadapositive total
score These genes are involved in development cell motility and immune
responseSomeof themareknowntobeexpressed in the testiclesandappear
importantforspermfertilityinmodelspeciesothersareresponsibleforgenetic
defectsinhumanandmousecarryingmutationssimilartotheonesdetectedin
cattleThesegenevariantsarecandidatetobeamongthosethatconstitutethe
ldquogeneticloadrdquoofdeleteriousrecessivegenesintheHolsteinpopulation
100
Introduction
The presence of deleterious recessive variants is well tolerated by outbred
speciesWhenthesevariantsappeartheymayspreadandsurvivealongtimein
the population in the heterozygous state (Simmons and Crow 1977) It is
estimatedthatoutbredindividualshumanincludedcarryonaverageagenetic
loadofmorethan100loss=of=functionvariantsintheirgenome(MacArthuretal
2012)
Theeffectofdeleterious recessivevariantsappearswhendemographicevents
eggeneticbottlenecksreducethegeneticvariabilityofthepopulationorwhen
relatedindividualsareintentionallymatedforbreedingpurposesInthesecases
recessive deleterious reduce the fitness of individuals carrying them in the
homozygous state and in extreme case induce very severe genetic defects
premature death and abortion (McClure et al 2014 Sahana et al 2013
VanRadenetal2011)Theythereforecontributetoexplainthephenomenonof
inbreeding depression and its negative effect on fertility (Charlesworth and
Willis2009)
The search for deleterious recessive variants is particularly interesting in
Holstein cattle Thisbreed is experiencing anegative genetic trend for fertility
traits (Jamrozik et al 2005) it is worth noting that this is occurring while
inbreedingisunderstrictmonitoringandistheoreticallyincreasingataverylow
pace(about005peryear=Mrodeetal2009)Inadditionmatingisplannedto
minimise relationship between animals and the additive genetic variance for
mosttraitsndashincludingmilkproduction=isapparentlynotdecreasinginspiteof
the selection pressure applied by modern schemes (Biffani et al 2002)
indicatingthatoveralldiversityisonlyslightlydecreasing
SelectiontoincreasefertilityischallengingSelectioncriteriagenerallyadopted
to improve this trait are far from the biological mechanisms that govern
reproduction Inaddition thesemechanismsarecomplexandverysensitive to
environmental(management)conditions(Laneetal2013Wathesetal2014)
Asa result fertility traitshave loworvery lowheritability (Pryceetal1997)
andthereforelowresponsetoselectionandslowgeneticprogress
101
Forthesereasons thequest formolecularvariants influencing fertility traits is
on=goingsincetheavailabilityofmoleculartoolsforDNAanalysisAnumberof
investigations tested the effect of functional candidate genes running genome
wide association analyses to find QTLs for fertility traits using hundreds
microsatellite markers in structured families (Ashwell et al 2004) or tens of
thousand SNP panels in whole populations (Aston and Carrell 2009) The
availability of medium density SNP panels allowed the search for deleterious
recessivesevenwithouttheuseofphenotypictraitsbyscanningthegenomein
searchofregionslackingoneofthetwohomozygousclasses(Sahanaetal2013
VanRaden et al 2011) This approach has highlighted a number of genomic
regions likely carrying deleterious recessives This information has immediate
application inmolecular assisted breeding and a relevant scientific interest in
thesearchofcausativevariantsandassessmentoftheirfunctions
Thedevelopmentofnewsequencingtechnologieshasgreatlydecreasedthecost
of sequencing so that nowadays the sequencing of few target individuals is
becoming the gold standard to seek for causal mutations for rare monogenic
defectsinhumansandotherspecies(ChunandFay2009)Oftenonlytheexome
the protein coding part of the genome is sequenced searching for those
mutationsthatseverelyalterproteinstructureandorfunction(Bamshadetal
2011 Li et al 2013) A typical mammalian exome consists in a few tens of
millionnucleotides(around2ofthereferencegenome)comparedtothefew
billionofthewholegenomeThereforewiththesamenumberofreadsexome
sequencinghastheadvantageofamoreaccurategenotypingdueto thehigher
sequencing depth an easier data management lower computation load and
easier interpretation (Bieseckeretal2011Majewskietal2011) Indeed the
numbervariants found inexomesareone=twoordersofmagnitude lowerthan
thatdetectedinwholegenomesandmutationsaredetectedinannotatedgenes
orimmediatelynearbyOntheotherhandtheexomeapproachcanonlyidentify
variantson annotatedgenesVariants inunknownexonsor that occur innon=
coding regulatory features will be missed (Bamshad et al 2011) In humans
these are turning out to be of outmost importance for gene regulation and
downstreamphenotypic effects (ENCODEConsortium2012) but they are still
verypoorlyannotatedinlivestock
102
Herewesequencedtheexomesof20Holsteinbullsprogenytestedselectedfrom
the tails of male and female fertility EBVs (Estimated Breeding Value) Since
livestockfertilityisacomplextraitandcausativemutationscannotbesoughtby
the approach used in monogenic diseases we integrated the sequence
information with high density SNP chip data obtained on a much larger
populationofItalianHolsteinbullstodetectgenescarryingallelescandidateto
be deleterious recessives and confirm their likely deleterious effect by
comparativegenomics
Materialsandmethods
1SamplingandDNAextraction
Semen(straws)andbloodsampleswereprovidedbyItaliancertificatedartificial
insemination centres (UE Directive 88407CEE) thus ethics committee
approvalforthisstudyisnotrequired
A total of 20 ItalianHolstein progeny tested bullswere sampled for complete
exome sequencing Among them 19were chosen from the tails of female and
male fertilityEBV(EstimatedBreedingValue)distribution (Table1) tohave5
animals fromeach tail One animalwas to be in theminus=variant tail of both
traitsBullswereselectedasunrelatedaspossibleSamplingwaspossiblethanks
tothecollaborationwithANAFI(ItalianHolsteinBreederAssociation)
ANAFIdevelopedageneticevaluationfor fertilitycombiningdifferenttraitsas
explained by Biffani and colleagues (Biffani et al 2005) The traits are ldquodays
from calving to first inseminationrdquo ldquocalving intervalrdquo ldquofirst=service non return
rateto56daysrdquoldquoangularityrdquoandldquomatureequivalentmilkyieldat305daysrdquo
MalefertilityisexpressedasanEstimatedRelativeConceptionRate(ERCR)and
isameasureofconceptionrateofaservicesirerelativetoservicesiresofherd
mates(Soggiuetal2013)
103
Table1FemaleandmalefertilityEBVsofthe20animalssequenced
Animal EBVFemaleFertility EBVMaleFertility GroupfriA01 97 =111 NAfriA02 98 =166 MalLfriA03 115 0 FemHfriA04 114 0 FemHfriA05 114 0 FemHfriA06 86 0 FemLfriA07 98 195 MalHfriA08 105 213 MalHfriA09 95 =154 MalLfriA10 95 149 MalHfriA11 102 264 MalHfriA12 91 =475 FemLMalLfriA13 105 =179 MalLfriA14 102 207 MalHfriA15 99 =405 MalLfriA16 85 0 FemLfriA17 87 0 FemLfriA18 121 0 FemHfriA19 117 0 FemHfriA20 88 0 FemL
A total of 1009 Italian Holstein bulls were chosen from the entire population
balancing theirpresence in the tailsof theEBV forkgprotein (PROT) somatic
cell count (SCC) fertility (FERT) and functional longevity (LONG) for high
densitygenotyping(TableS1)
DNAwasisolatedfromwholebloodandsemenstrawssamplesinLGSlaboratory
(CremonaItalywwwlgscrit)usingtheGenomixkitaccordingtoLGSinternal
protocolDNAsforexomecaptureandsequencingwereextractedindependently
tosatisfy therequirement indicatedby IGA technologies (Udine Italy)at least
12mgofDNAataconcentration100ngulwithR260280higherthan18and
R260230near19
2Moleculardataproductionandqualitychecking
21Exomecaptureandsequencing
104
Exomes were captured using the ldquoSureSelected All exon Bovinerdquo (Agilent
Technologies) kit in early version Probes coordinates are based on UMD31
bovinereferencesequence(Ziminetal2009)anddesignedagainstNCBIexons
as well as predicted exons UTRs and miRNAs Number and distribution of
probesperchromosomearedetailedinTableS2
rdquoOvation Ultralow Library Systemsrdquo protocol was followed for library
preparation After quantification with Qubit (Life technologies) and quality
controlwith2100Bioanalyzer(AgilentTechnologies)DNAwassonicatedwith
Bioruptor (Diagenode) to obtain fragments between 150 to 200 bp in length
blunt end=repaired adenylated ligated with paired=end adaptors and purified
followingtheprotocolprovidedbythemanufacturer
LibrarieswerehybridizedwithldquoSureSelectedAllexonBovinerdquoprobesusingthe
Encore Target Capture Module (Nugen) SureSelect Target Enrichment and
Herculase II FusionEnzymewithdNTPCombo (AgilentTechnology) kit Then
capturedDNAwasamplifiedandtaggedwithlibraryspecificindexesQualityof
capturedsampleswastestedwithQubitand2100Bioanalyzer
Paired=end 2X100bp sequencing was performed on Illumina HiSeq2500
(IlluminaSanDiegoCA)atanexpectedcoverageof50X
22Sequencealignment
Adapters and reads shorter than 36bp were trimmed using Trimmomatic
(version032)(Bolgeretal2014)ForeachFastqfilethealignmentandpairing
of the reads to theUMD31 reference sequencewas performedwithBurrows=
WheelerAligner(BWA)(version077=r441)(LiandDurbin2009)applyingalso
trimmingoption (=q5) SAM files resultedwere converted intoBAM file using
SAMtools (version0118) (Lietal2009)Foreach individualBAM fileswere
merged in a single one with Picard (version 1111)
(httpsgithubcombroadinstitutepicard)Duplicatereadswereidentifiedand
markedwithPicardFinallyindividualBAMfileswereindexedwithSAMtools
105
Sequencedepthwasevaluatedsimultaneouslyonall20sequencedanimalsand
onlyonregionstargetedbyprobesusingDepthOfCoverageofGATK(McKennaet
al2010)
23ExomeVariantcallingandqualitycheck
Singlenucleotidepolymorphisms(SNPs)insertionsanddeletions(InDels)were
calledsimultaneouslyonall20sequencedanimalsandonlyonregionstargeted
byprobesusingUnifiedGenotyperofGATK(DePristoetal2011)
All variantswere filteredusingVariantFiltration ofGATK to removeoneswith
lowmappingandorsequencingqualityThechoiceoffiltersandthresholdswas
made in accordance to ldquoGATK best practicerdquo for exome sequencing and
distributionoffiltervalues
bull ldquoSNPclusterrdquoexcludedvariantifmorethan3variantswerefoundin10
bp
bull ldquoBaseQualityRankSumTestrdquoexcludedifitrsquoslowerthen=5ormorethan
5
bull ldquoMappingQualityRankSumTestrdquoexcludedif itrsquoslowerthen=7ormore
than6
bull ldquoRead Position Rank Sum Testrdquo excluded if itrsquos lower then =4 or more
than4
bull ldquoFisherStrandrdquoexcludedifitrsquosmorethan30
bull ldquoDepthrdquoexcluded if itrsquos lower then60X(ieaminimumvalueof3Xper
animals)ormorethan3500X
bull ldquoQualityrdquoexcludedifitrsquoslowerthen40
bull ldquoMappingqualityrdquoexcludedifitrsquoslowerthen30
Filters were applied to all variants but only bi=allelic SNPs were used in the
following analyses Genotypes with ldquoGenotype Depthrdquo lower than 5 and
ldquoGenotype Qualityrdquo lower than 20 were discarded The remaining ones were
converted intoldquoABrdquocode(A=referencealleleandB=alternativeallele)Then
data were transformed in PLINK (Purcell et al 2007) format using a home=
made python script To prepare the working dataset the final quality control
106
removedvariantswithmore than25missingdata (that iswithmore than5
animals out of 20 without genotypes) monomorphic (minor allele frequency
MAFlt1)ornotmappinginautosomesBasicpopulationgeneticsparameters
werecalculatedtoevaluatethepresenceofcrypticstructureinthedatathatmay
influencetheoutcomeofthedownstreamanalyses
In total 24390 SNPs identified in the exomes were also included in the HD
SNPchip panel HD genotype from nineteen out to twenty of the sequenced
animals sequenced was used to compare technologies Genotypes at common
locationswerecomparedacrosstechnologies toevaluatesequencequalityand
theeffectofsequencedepthongenotypecalling
24BovineHDGenotypingBeadChipdataandqualitycheck
AllanimalsweregenotypedusingtheBovineHDGenotypingBeadChip(Illumina
San Diego CA) Genotypes were quality controlled and filtered with standard
threshold Markers with more than 5 missing data and MAF lt 1 were
excluded Animals with more than 5missing data were also excluded Only
autosomal SNPs were retained Thereafter genotypes were phased and the
remainingmissingdata imputedusingBeaglev332(BrowningandBrowning
2007)
Ofthe20animalssequenced12hadalsoBovineHDGenotypingBeadChipand7
BovineSNP50 Genotyping BeadChip version 2 genotypes available For
concordance and haplotype analysis we imputed the 50K data to 800K as
previouslydescribedabove
3Strategyfordetectingcandidatedeleteriousvariants
Wesearchedforcandidatedeleteriousvariantsfollowinganintegratedapproach
that joined the analysis of i) exomes of a few Italian Holstein bulls having
plusvariant and minusvariant EBVs for a target trait ii) HD genotyping of a
population of 1009 progeny tested Italian Holstein bulls iii) the comparative
genomic analysis of 100 vertebrate species The strategy followed for the
107
analysisisshowninFigure1Inthefirststep(Search)welookedforcandidate
deleterious variants in the exomes In step2 (Filter)we filtered candidatesby
assessingwhichof thesevariantseitherhadanon=randomdistributionamong
animals having extreme EBVs or co=mapped with genomic regions carrying
outliersignals inthelargepopulationofprogenytestedbulls Instep3(Asses)
candidateswere further evaluated by i) assessing their conservation across a
panel of 100 vertebrate species ii) searching for associated haplotypes and
evaluating their frequencies in the population of progeny tested bulls iii)
evaluatingtheirassociationwithgeneticdefectsinhumanandmodelspeciesiv)
examining their expression profile and function when known Methods are
belowdescribedindetail
Figure1Strategyfollowedtoidentifydeleteriousvariants
108
31Step1Searchforcandidatedeleteriousvariants
Quest for candidate deleterious variants in exome sequence data by
variantannotation
WeannotatedallvariantsusingANNOVAR(Wangetal2010)andVEP(Variant
Effect Predictor) (McClure et al 2014) software They use the Ensembl gene
annotation database (UMD31 reference sequence release 74) to investigate
locationandpotentialeffectoncodingsequencesofallvariantsVEPintegrates
the SIFT algorithm (Kumar et al 2009) to predict the effect of amino acid
substitutiononproteinfunctionForeachsequencechangesequencehomology
and physic=chemical similarity between the reference amino acid and the
alternative is considered providing a score that evaluates the mutation
tolerabilityVariantswithscorelowerthan005wereclassifiedasldquodeleteriousrdquo
inaccordingtosoftwarerecommendations
SIFTalsocomparedallvariants found in theexomeswith thosepresent in the
dbSNPdatabase(version140)toidentifythosethatwerenovel
32Step2Filteringofcandidatedeleteriousvariants
2a Analysis of deleterious variant distribution among animals having
extremesEBVvaluesformaleandfemalefertilitytraits
Association between variants and the classification in the tails of the EBV
distributionwas estimated byFisherexact testwhich compares frequencies of
alleles (a vs A) in the two tails not assumingH=W equilibriumgenotypic test
that compares the frequency of genotypes (aa vs Aa vs AA) and the recessive
geneactiontestwhichtestsforthedistributionofonehomozygousclassversus
theothers(aavsaAAA)AlltestsareimplementedinPLINKfunctionsThe5
significance threshold was set empirically by permutation (N=100000) either
point=wise that is not corrected for multiple testing (EMP1 = Empirical
Probability1)andthereafterfamily=wisecorrectedformultipletesting(EMP2=
EmpiricalProbability2)
109
GenesexceedingEMP1were considered suggestive and thoseexceedingEMP2
significant candidates to be deleterious recessives and their characteristics
conservationacrossspeciesandfunctionfurtherinvestigated
2b Quest for regions candidate to carry deleterious variants in the
populationanalysedwiththeHDBeadChip
Basic population genetics parameters and Multi Dimensional Scaling were
calculated with GenABEL R package (Aulchenko et al 2007) to evaluate the
presenceofcrypticstructure inthedatathatmayinfluencetheoutcomeofthe
downstreamanalyses
ExpectedandobservedallelicandgenotypicfrequenciesandFisherexacttestof
Hardy=Weinberg proportions (Janis E Wigginton 2005) were calculated with
PLINKon theHDSNPdatasetValueswere averaged in slidingwindowsof10
markersThisvalue(10markers)waschosenaftertestingdifferentwindowsize
as represent a good compromise between noise reduction and resolution (not
shown)
Candidate regions containing deleterious mutations were identified following
twodifferentapproaches
In one approach hereafter refereed to as ldquoOut of Hardy=Weinbergrdquo approach
(OHW)markerssignificantlydeviatingfromofHardy=WeinbergequilibriumatP
le84710=8 (nominalvaluePle005Bonferronicorrected formultiple testing)
werefirstidentifiedWindowswithatleastthreesignificantmarkersbelowthis
thresholdandhavingahomozygotedeficitwereconsideredcandidatestocarry
deleterious mutations In this OHW approach we applied very stringent
constraints to reduce false positive discoveries due to the Wahlund effect
inducedbypopulationsubstructure(seeresultsandFigureS5)
In thesecondapproachhereafterreferred toas ldquoLackOfHomozygotesrdquo (LOH)
approach the differences between observed and expected frequencies of
homozygotesfortheminorallelewerecomputedandaveragedacrossmarkers
included in each sliding window Values were standardized and only regions
carrying three overlapping windows below =3Z score value were considered
110
candidates to carry deleterious mutations Thereafter significant windows
carryingatleastonemarkerincommonwerejoinedtodefineregionboundaries
ExomesequencingknowledgeandHDBeadChipdatawerecrossed inorder to
identifygenescarryingseverevariantslocatedwithinsignificantregionsortheir
proximity (50Kb upstream or downstream) These genes were considered
candidates to be deleterious recessives and their characteristics conservation
acrossspeciesandfunctionwasfurtherinvestigated
33Step3Assessmentofcandidatedeleteriousvariants
Genes carrying variations classified as deleterious by SIFT or ANNOVAR and
eitherassuggestivebyfertilityEBVtailsanalysisorassignificantbyOHWeLOH
analyses of the Holstein bull population were submitted to comparative
genomics population genetics and functional analyses In order to prioritize
genesvariants discoveredwe scored each evaluation using a subjective scale
ranging from =3 to 3 to measuring the evidence in favour (scores 1 to 3) or
against(scores=1to=3)thedeleteriouseffectoftheSNPsThescore0inallcases
indicated the absence of evidence or that a specific assessment failed The
completescorescalethereforerangedfrom=9to9
3aComparativegenomics
For comparative genomics purposes the position of eachmutationwas ldquolifted
overrdquo from UMD31 bovine coordinates to hg19 human coordinates using the
specifictoolatUCSC(Kentetal2002=httpgenomeucscedu)Orthologous
proteins were aligned using the UCSC genome browser and amino acid
conservation assessed in a set of 100 vertebrate species that included human
primates (11 species) euarchontoglires (egmouse14 species) laurasiatheria
(egsheep25species)afrotheira(egelephant6species)othermammals(eg
platypus5species)birds(egchicken14species)sarcopterygii(egturtle8
species) fish (eg zebrafish 16 species) Results of amino acid conservation
analysiswererecordedonlyifhomologousproteinfromatleast50speciescould
beretrievedandalignedThereafteraminoacidconservationwasscored3ifthe
111
aminoacidwasconservedinallvertebrates2ifconservedinallmammalsand1
ifconservedinthelargemajority(atleast90)ofmammalsAscoreof=1was
given when the amino acid was not conserved among species and a score =3
when both variants observed in cattle were carried by other species Finally
wheneithertheliftoverprocessortheproteinalignmentfailedthecomparison
wasconsiderednoninformativeandgivenascore0
3bSearchofhaplotypesassociatedtodeleteriousvariantsandanalysisof
HDpopulationdata
To acquire further evidence we searched for the haplotypes that carry the
deleterious variations and estimated their frequency in the
homozygousheterozygousstatuswithinthebullpopulationgenotypedwiththe
HDBeadChip
Weusedamultistepapproachtofindhaplotypescarryingdeleteriousvariantsi)
from the phased HD BeadChip data we extracted the haplotypes of the 20
sequencedanimalsintheregionofthevariant(100markersoneachside)ii)in
these 20 animals we searched the shorter haplotype able to distinguish
homozygous andor heterozygous animals for the deleterious variant from all
other haplotypes iii) we assessed the frequency of homozygotes and
heterozygotes for thehaplotype in thewholepopulationofbulls characterised
withtheHDBeadChip
Scoresinthiscasewere=3inthecaseofuniquehaplotypesshowinganumberof
homozygotesnonsignificantlydifferentfromthenumberexpectedfromHardy=
Weinbergproportions3inthecaseofuniquehaplotypesshowinganumberof
homozygotes significantly lower than expected 1 in the case of unique
haplotypesneverhomozygousandrare0whenevernouniquehaplotypecould
befound
3cFunctionalanalyses
With PantherDB (Mi et al 2010) resources we classified our gene list into
functionalcategoriesaccordingtotheGeneOntologyclassificationAllsuggestive
112
andcandidateEnsemblGeneIDsubmittedtothebulkanalysiswererecognizedand annotated Beyond the GO term characterization in order to understandpotentialdetrimentaleffectsofmutationsandalsopinpointpotentialcandidatesfor fertility traits we retrieved information for each gene according to thefollowing parameters i) the involvement in known Mendelian or complexdiseases fromOMIMdatabase ii) theavailabilityofknock=outmice iii) and theeventual expression in reproductive tissues from the Gene Expression Atlas(httpwwwebiacukgxa)(Table4)
Alsoforthesedataweusedthesamescaleattributingarangefrom=3pointstogenes having no relation with fertility viability and genetic defects and norelevant phenotypic effects in knock=out mouse models to 3 to gene closelyrelated to fertility and viability associated to genetic defects in human andaffectingfertilityorsurvivalwhenknocked=outInparticularascoreranging=1to+1wasattributedtogenesassociatedtohumansyndromes(=1=noassociationrecordedinOMIMdatabase0=noinformationinOMIM1=associationinOMIM)A score with same range to information collected from knock=out mousephenotypes(=1=knock=outmouseshowsnoimpairedphenotype0=noknockoutmouse has been produced 1= knock=out mouse shows impaired phenotype)Thesamescorewasattributedtomolecularfunctions(=1=functionunrelatedtofertilityandviability0=functionindirectlyrelatedasimmunityandmetabolism1=functiondirectlyrelatedasfertilityanddevelopment)
Results
Exomecaptureandsequencing
A totalof225414probeswereused for capturing thecattleexome (5355Mbabout 2 of UMD31 reference genomes) Mitochondrial (MT) and Ychromosome were not included in probe design The number of probes perchromosomespannedfrom2500onBTA27toalmost12000inBTA3(TableS2)
113
The average exome sequencing depth was 4058 slightly lower than the
expected50XAllanimalshadatleast24Xaveragegenomessequencedbutina
same animal the sequencing depth varied across chromosomes and regions
withinchromosomesThelowestchromosomeaveragevalueobservedwas10X
inBTA25oftheshallowersequencedanimalAveragecoverageperanimaland
chromosomeandoverallmeansareshowninFiguresS1
In total1613461402 raw100bppair=ends sequence readswereproducedA
total of 1425514112 100bp reads were mapped and properly paired to the
reference genome Paired reads per animal ranged between 41973510 and
138840392
Insynthesissequencingdepthwasvariableasexpectedandalmostallanimals
hadsomeregionsofscarceornosequencedepthbutoverallallexomeregions
were sufficiently covered to permit a comprehensive analysis of molecular
variantsoftheanimalsinvestigated
Step1Searchforcandidatedeleteriousvariants
Variantdetectionandannotation
Thesequencingdepthofallvariantscalledonthealignmentofallreadsacross
animalsfollowedaChi=squaredistributionAveragedepthwas79694spanning
from2Xto5000Xwithpeakfrequencyclassat150=250X(FigureS2)
A multistep approach was followed to clean the exome sequence dataset
Cleaning process steps and the number onmarkers retained in each step are
describedinTable2
114
Table2Typeandnumberofmarkersdetectedandretainedafterdatasetcleaning
TypeofvariationNumberofvariations(N)
Rawdata
Aftervariantcleaning
Aftergenotypecleaning(onlyautosomal)
Complex 1213 963 0Deletion 6938 5031 0Insertion 5510 3656 0MultiallelicComplex 5710 3995 0MultiallelicSNP 803 512 0SNP 242988 203079 164277Total 263162 217236 164277whencomparedtothereferencegenome
Atotalof263162variantswereidentifiedAmongthese217236(8255)wereretainedafterapplyingthevariantfilters(seematerialsandmethodssection)
According to GATK classification the vast majority (9348) of variationsidentified were biallelic SNPs followed by insertions and deletions (InDels40) more complex variations and multiallelic SNPs (252) Indels andcomplexvariationsalthoughlessfrequentthanSNPsaffectedarelevantportionoftheexome(49589bp)Thenumberofvariantsdetectedperchromosomewassignificantly correlated to the number of probes used (r=08761P=228x10=10)andofbasescaptured(r=08424P=53x10=19)perchromosomehoweverBTA23BTA27 BTAX had an outlier behaviour the former two containing highervariation(450and445variantsperKb)andthelatterlower(084variantsperKb)thantheaverage(322variantsperKb)
Bialleic SNPswere further filtered for genotype quality (sequence depth gt 5Xandsequencequalitygt20)andBTAXmarkersThefinalautosomalSNPdatasetcounted164277SNPsAmongthese17829biallelicSNPs(109)describedforthefirsttimehavebeensubmittedtodbSNP
Variantsrsquo annotation is shown in Table 3a Variantswere detected in differentgenedomains(exonsintrons3rsquountranslatedregions)aswellasupstreamanddownstreamgenesMorethan31ofthevariationsweredetectedinexonsandamong them more than 41 have the potential to have an effect on genefunctionbyaffectingsplicingoralterproteinstructure(Table3b)
115
Table 3 A) number of autosomal SNPs annotated in each gene region B) number and
classificationofautosomalSNPsalteringgenefunction
A)
Generegion NofSNPs(autosomes)Downstream 1170Exonic 51293Intergenic 49901Intronic 58116ncRNA_exonic 160ncRNA_intronic 3ncRNA_splicing 2Splicing 184Upstream 1445Upstreamdownstream 54UTR3 1345UTR5 604Total 164277
B)
EffectofexonicSNPs NofSNPs(autosomes)Nonsynonymoussinglenucleotidevariation 20839Stopgainsinglenucleotidevariation 213Stoplosssinglenucleotidevariation 10Synonymoussinglenucleotidevariation 30226Unknown 5Total 51293
whencomparedtothereferencegenome
Amongthe51293exonicSNPs4166in2978geneswereclassifiedldquodeleteriousrdquo
by SIFT Among these 3210 in 2356 genes were observed at least twice
ANNOVARidentified223stopgain(213)orloss(10)variantsin217genes
Theaverageconcordanceat24390SNPs in commonbetweenexomesand the
HD Beadchip was 9716 (Table S3) Discordance was due to either alleles
detected by sequencing andnot detected by genotyping (exome=heterozygote
Beadchip=homozygote) and vice=versa (exome=homozygote
Beadchip=heterozygote) Discordances were randomly distributed among
markers and inversely correlated to sequencing depth (Figure S3) indicating
that most errors are likely occurring during sequencing even if occasional
116
Beadchip failure indetectingsomeallelecannotbeexcludedTwoanimalshad
anoutlierbehaviouronehaving846andthesecond719concordanceBoth
animalswereretainedforvariantdiscoverywhiletheywereexcludedfromthe
analysesofEBVtailsWithoutthesetwooutlieranimalsthemeanconcordance
increasesto9938
Multidimensional scaling confirmed the absence of a substructure in animals
sequenced(FigureS4)
Step2Filteringofcandidatedeleteriousvariants
AnalysisofEBVtails
Fisherexact test isgenerallyused to test for theassociationbetweenavariant
and a Mendelian disease Samples are divided in cases and controls and the
association between alleles or genotypes and the disease is tested under
different inheritance models Here we used the same method to test for
association between allelesgenotypes and animals from opposite tails of the
EBVdistributionfori)malefertility(N=4high+4low)ii)femalefertility(N=5
high+4low)andiii)maleandfemalefertility(N=9high+8low)
Genotypes were tested under all inheritance models implemented in PLINK
Giventhelownumberofsamplesnovariantwassignificantaftercorrectionfor
multiple comparisons (EMP2) and only suggestive variants could be found
(significant at 5 at EMP1)We considered suggestive of carrying deleterious
allelesalistof101genescarrying112variants(TableS4)
Detectionofregionscarryingdeleteriousmutationfrom800KSNPdata
After applying quality control parameters 26653 and 146991 markers were
removed from the original dataset for excessive missing data and MAF
respectivelyInaddition17individualsremovedforlowgenotypingrateAswith
exomes only autosomal SNPswere retained for the analyses thus obtaining a
workingdatasetof590680SNPsand992animals
117
MDS(FigureS5)analysis indicates that thedatasethassomesubstructure that
mightaffectdownstreamanalysesWeaccountedfortheriskoffindingmarkers
outofHardy=WeinbergproportionsduetotheWahlundeffectbychoosingvery
stringentsignificancethresholdintheOHWanalyses
To detect regions candidate to carry deleterious recessivemutations we used
two approaches the ldquoOutside Hardy Weinbergrdquo (OHW) and the ldquoLack Of
Homozygotesrdquo (LOH) approaches (see materials and methods) The two
approaches complement each other in the detection of regions carrying less
homozygotes than expected OHW identifies regions in which a few markers
have extreme homozygote deficit while LOH detects regions in which many
markershaveamoderatedeficit
In the SNP dataset 1884 markers were significantly out of Hardy=Weinberg
equilibrium(Ple84710=8)Atotalof485windowswereidentifiedwithatleast
3 markers out of 10 exceeding this threshold Among these 84 contained
markerssignificantbecauseofhomozygotedeficitThese84windowsidentified
11OHWcandidateregionsin9chromosomes(TableS5)
TheLOHapproachidentified2335windowsbelow=3Zscoreofthestandardised
differencebetweenobservedandexpected frequencyofhomozygotes forMAF
Theseidentified326LOHcandidateregionsspreadonallautosomes(TableS6)
Onlyregionsformedbyatleast3singlewindowswereretainedreducingLOH
candidateregionsto240
Deleterious variants (SIFT deleterious and stop codon gainedlost) were
intersectedwithsignificantOHWandLOHregionSevenofthesemappedwithin
OHW sliding windows 36 within LOH sliding windows Among these 5 were
located in regions significant in both OHW and LOH Other 51 deleterious
variants mapped in the immediate vicinity (+=50Kb) of significant windows
These 83 variants affect 66 unique genes candidate to carry deleterious
recessivealleles(TableS7)
Five regionswere common toOHWandLOHoneonBTA4 (LOH=74959610=
75151417 OHW=74970653=75151417) two on BTA7 (LOH=9626435=
10099512 OHW=9628735=10099512 LOH=10575691=11171132
118
OHW=10486424=11087441) one on BTA15 (LOH=80299924=80928311
OHW=80299924=80921274) and one on BTA18 (LOH=28122373=
28185525OHW=28106022=28164693)
Step3Assessingcandidatedeleteriousvariants
After the two filtering step previously described a total of 191 SNPs were
retained out of the 3433 classified as deleterious by SIFT and Annovar and
analysed further These are carried by 163 genes Eighteen genes carried 2
candidateSNPsthreegenes3SNPsandonegene5SNPsInaddition4SNPsin
four geneswere identified as potential candidates in both the analysis of EBV
tailsandofbullpopulation
A totalof12SNPs introducedstopcodonswhile the remaining inducedamino
acid substitutionsor splicing variants In the filtereddataset theproportionof
stop codons on total variation (63) was only slightly higher than the
proportion found in the entire dataset (51 225 stop codon out of 4391
deleteriousSNPs)
Comparativegenomics
Proteinmultiplealignmentwassuccessfulin146outof191comparisonsIn26
casesthemostfrequentaminoacidincattlewasveryhighlyconservedeitherin
allvertebrates(N=9)orinallmammals(N=17)Inother23casestheaminoacid
was conserved in at least 90 of mammalian species In the last class were
generally included amino acids that changed in mammalian species
phylogenetically verydistant from cattle (eg platypus and related species) In
97 instances the amino acid could change quite freely and occasionally both
variantsobservedincattlewerecarriedbyotherspecies
HaplotypeanalysisofHDpopulationdata
The search for haplotypes associated to the191 variants for further testing in
the population identified 101 unique and invariant haplotypes common to all
119
carriersofatargetvariantanddifferentfromallthosefoundinnoncarriersIn
additional36casesanoncompletelyinvarianthaplotypecouldbeassociatedto
themutationConverselyin54instancesareliablehaplotypecouldnotbefound
(table S8) In total 38 haplotypes had no homozygous individuals in the
population Thiswas expected in the case of 33 rather rare haplotypes (fewer
than 100 heterozygotes observed in the population) Conversely 5 haplotypes
were rather frequent (from 170 to almost 300 heterozygotes observed in the
population)andtheexpectationwastofind78to222homozygousindividuals
in thepopulation insteadof thenone foundAnothervarianthadaremarkable
high frequency (35) and a clear outlier behaviour Only 11 homozygous
individualswereobservedoutofthe121expected
Functionalanalysisofcandidategenes
To assess the potential deleterious effects of the variants identified we also
retrieved functional information for each gene according to the following
parametersi)associationtoknownMendelianorcomplexdiseaseslistedinthe
OMIM database ii) availability and effect of gene knock=out in mice iii)
expression profileswith particular attention in reproductive tissues from the
GeneExpressionAtlas(Table4)
Twenty genes out of 163 have a recognized phenotype in humans These are
mostlyseveresyndromesUnfortunatelyonlyafewgeneshaveanull=miceline
producedwithphenotypicdataavailable
According to the expression data available from Ensembl these genes are
generally (with someexceptions)expressed inavarietyof tissueswitha clear
tendencytobehighlyexpressedintestis
120
Tableamp4Informationon163genesanalyzedconcerningeventualphenotypesinmanorknockoutsinmouseHeaderampcodesAbrainBcolonCheartDkidney
EliverFlungGskeletalmuscleHspleenItestisColoursrepresentthegenelevelofexpressionindifferenttissuefromlow(whitegrey)tohigh(blue)NCin
FunctioncolumnmeansldquoNotClassifiedrdquo
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000000079
CCSAP
Cells
9
1
03
1
07
2
06
2
2
NC
ENSBTAG00000000149
USP40
Cells
6
7
2
7
5
7
2
8
9
fertility
ENSBTAG00000000414
FUT5
Nomatch
37
1
cellular
process
ENSBTAG00000000918
ERMARD
Periventricular
nodularheterotopia
6Xlinked
periventricular
heterotopia6q
terminaldeletion
syndrome
Periventricular
nodularheterotopia
Nomatch
5
4
4
9
6
7
4
9
17
cellular
process
ENSBTAG00000000998
SLC39A6
Cells
6
9
2
7
4
10
1
10
9
cellular
process
ENSBTAG00000001133
VWA8
Mice
2
3
6
7
10
3
3
4
4
blood
metabolism
ENSBTAG00000001140
IAH1
httpswwwmousephenotypeor
gdatagenesMGI1914982sect
ionassociations
BOTHSEXadiposetissue
behaviourneurological
growthsizebodyFEMALE
homeostasismetabolismMALE
limbsdigitstailskeleton
9
18
30
60
23
19
12
17
30
cellular
process
ENSBTAG00000001262
IRGQ
Cells
6
1
5
3
06
2
5
04
2
immunity
ENSBTAG00000001286
ELMO3
httpswwwmousephenotypeor
gdatagenesMGI2679007sect
ionassociations
Decreasedstartlereflex
02
27
14
7
7
02
1
fertility
ENSBTAG00000001291
OR8H3
Nomatch
fertility
ENSBTAG00000001500
FIGNL1
Cells
08
2
01
05
07
08
04
2
3
cellular
process
ENSBTAG00000001505
GRK4
Cells
10
7
09
4
6
3
2
6
218
signalling
ENSBTAG00000002220
SPTLC1
Hereditarysensory
andautonomic
neuropathytype1
No
17
27
3
28
12
28
5
19
6
metabolism
ENSBTAG00000002288
NT5DC4
Nomatch
02
cellular
process
ENSBTAG00000002289
NLRP14
No
04
cellular
process
ENSBTAG00000002660
CCDC11
Situsinversustotalis
Nomatch
3
1
06
2
07
3
03
2
54
fertility
121
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000002809
CAPN10
AutosomalRecessive
MentalRetardation
DIABETESMELLITUS
NONINSULIN
DEPEND
ENT1
Cells
2
13
41
33
29
immunity
ENSBTAG00000002966
DNAJC13
Youngadultonset
Parkinsonism
No
5
52
94
62
84
fertility
ENSBTAG00000003231
LIM2
Pulverulentcataract
Cells
01
02
01
02
cellular
process
ENSBTAG00000003508
CFAP69
httpswwwmousephenotypeor
gdatagenesMGI2443778sect
ionassociations
Abnormalcirculatinginsulinlevel
MaleInfertilityDecreasedmean
plateletvolum
eIncreasedleanbody
mass
203
07
1
01
7NC
ENSBTAG00000003523
UGT2B15
Nomatch
34
metabolism
ENSBTAG00000004081
FAT3
No
2
03
01
01
05
signalling
ENSBTAG00000004555
LRP2
DONN
AIBARROW
SYND
ROME
Intellectualdisability
Cells
56
3
01
disease
ENSBTAG00000004772
THEM
4
Cells
8
16
295
27
52
63
metabolism
ENSBTAG00000004860
SLC27A6
Cells
6
02
10
602
207
01
immunity
ENSBTAG00000005021
SEMA5A
Monosom
y5p
Cells
9
04
89
13
08
08
6immunity
ENSBTAG00000005170
GPR114
Mice
101
02
01
09
6
01
fertility
ENSBTAG00000005248
OFCC1
No
01
01
03
01
01
01
03
NC
ENSBTAG00000005340
SERGEF
Cells
7
10
35
15
58
23
cellular
process
ENSBTAG00000005357
VPS11
Mice
14
13
619
717
815
15
cellular
process
ENSBTAG00000005466
C8H8orf58
Nomatch
05
01
08
02
103
02
04
NC
ENSBTAG00000005514
TRIO
Cells
17
66
10
210
88
7immunity
ENSBTAG00000005633
RGNEF
httpswwwmousephenotypeor
gdatagenesMGI1346016sect
ionassociations
Vertebralfusion
36
214
05
801
08
1cellular
process
ENSBTAG00000006021
CEP250
Mice
3
22
43
41
88
metabolism
ENSBTAG00000006027
USP34
Mice
11
63
74
88
912
fertility
ENSBTAG00000006532
CDK5RAP2
Autosomalrecessive
primary
microcephaly
httpswwwmousephenotypeor
gdatagenesMGI2384875sect
ionassociations
skeletonimmunesystem
growthsizebodyhem
atopoietic
system
5
34
92
74
784
developm
ent
ENSBTAG00000007001
SLC26A1
No
03
01
02
32
903
01
07
metabolism
ENSBTAG00000007340
LOC101906349
Nomatch
NC
ENSBTAG00000007568
MEPE
Cells
cellular
process
ENSBTAG00000007910
EXOC7
Cells
13
24
23
22
12
22
15
22
28
cellular
process
122
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000008650
CAMK1D
Cells
24
2
01
06
1
2
03
7
3
signalling
ENSBTAG00000008797
Uncharacterized
Nomatch
09
04
07
04
NC
ENSBTAG00000008871
DDX4
Cells
02
01
01
132
cellular
process
ENSBTAG00000008996
SFI1
Mice
2
1
06
3
04
3
2
5
NC
ENSBTAG00000009014
UPK1B
Mice
2
2
2
5
4
4
2
7
3
fertility
ENSBTAG00000009030
NOX5
Nomatch
04
01
8
01
immunity
ENSBTAG00000009033
TEX37
httpswwwmousephenotypeor
gdatagenesMGI1921471sect
ionassociations
Abnormalstartlereflex
03
83
NC
ENSBTAG00000009064
CSF2RB
Congenitalpulmonary
alveolarproteinosis
Mice
01
2
04
06
2
23
03
17
05
signalling
ENSBTAG00000009144
BPIFA2A
Nomatch
6
06
05
metabolism
ENSBTAG00000009337
PNMAL1
Cells
8
05
01
3
02
03
03
04
3
Immunity
ENSBTAG00000009444
SLC15A5
Cells
01
cellular
process
ENSBTAG00000009568
KANK2
Cells
1
17
30
9
3
31
9
10
3
cellular
process
ENSBTAG00000009569
DOCK6
AdamsOliver
syndrome
Cells
2
5
10
12
5
15
5
6
21
metabolism
ENSBTAG00000009836
CHGA
Mice
140
555
01
2
02
01
07
1
metabolism
ENSBTAG00000010522
Uncharacterized
Nomatch
01
02
02
03
2
metabolism
ENSBTAG00000010601
DOLK
Familialisolated
dilated
cardiomyopathy
No
4
8
6
22
9
6
5
5
12
metabolism
ENSBTAG00000010847
FOXN4
Cells
01
01
metabolism
ENSBTAG00000010954
ART3
Cells
02
1
55
01
08
2
10
06
71
metabolism
ENSBTAG00000011011
SSH2
No
3
09
5
4
2
2
6
7
10
cellular
process
ENSBTAG00000011387
NAT9
Cells
6
7
3
8
6
8
4
6
6
metabolism
ENSBTAG00000011420
CA9
Cells
04
3
02
2
03
1
01
2
56
metabolism
ENSBTAG00000011433
RGP1
httpswwwmousephenotypeor
gdatagenesMGI1915956sect
ionassociations
Completepreweaninglethality
7
19
11
18
8
12
9
8
25
cellular
process
ENSBTAG00000011622
C18orf21
Mice
9
10
6
6
2
10
5
10
34
NC
ENSBTAG00000011683
NUMB
Cells
21
25
8
35
17
47
6
8
19
immunity
ENSBTAG00000011684
RAB11FIP1
Cells
6
01
8
09
11
02
1
4
NC
ENSBTAG00000012053
ACCSL
No
03
01
cellular
process
ENSBTAG00000012314
LDLR
Homozygousfamilial
hypercholesterolemia
Cells
8
33
11
21
17
24
2
15
2
cellular
process
123
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000012326
LOC101906218
Nomatch
11
01
01
09
01
NC
ENSBTAG00000012458
PSAPL1
Cells
01
NC
ENSBTAG00000012565
PCNX
No
3
32
32
33
48
developm
ent
ENSBTAG00000012833
ATG2B
No
4
21
21
21
29
NC
ENSBTAG00000012921
KIF24
Mice
05
07
03
08
02
202
214
metabolism
ENSBTAG00000013568
UEVLD
Cells
5
41
53
42
214
metabolism
ENSBTAG00000013685
KRT31
Mice
cellular
process
ENSBTAG00000013753
DDRGK1
Mice
22
32
18
56
55
25
14
22
37
NC
ENSBTAG00000013789
OR6K6
No
NC
ENSBTAG00000013838
ALLC
Mice
02
01
78
metabolism
ENSBTAG00000013916
HERC2
Mentalretardation
autosomalrecessive
38SKINHAIREYE
PIGM
ENTATION
VARIATIONIN1
Developm
entaldelay
withautismspectrum
disorderandgait
instability
No
8
74
83
74
65
metabolism
ENSBTAG00000014001
MAP1A
Cells
136
23
22
202
43
352
cellular
process
ENSBTAG00000014058
LDB2
No
28
33
12
825
410
3cellular
process
ENSBTAG00000014284
ALPK2
No
01
10
01
304
02
cellular
process
ENSBTAG00000014750
EPB41L4A
httpswwwmousephenotypeor
gdatagenesMGI103007secti
onassociations
Decreasedcirculatingglucoselevel
Abnormalconeelectrophysiology
Immunesystem
s1
22
20
07
12
13
43
NC
ENSBTAG00000014752
AKAP11
httpswwwmousephenotypeor
gdatagenesMGI2684060sect
ionassociations
Nopheno
24
72
63
306
68
cellular
process
ENSBTAG00000014768
ZNF786
Cells
3
23
42
48
33
cellular
process
ENSBTAG00000014794
SCYL3
Cells
6
10
59
613
516
26
cellular
process
ENSBTAG00000015027
Clusterin
httpswwwmousephenotypeor
gdatagenesMGI88423sectio
nassociations
Aggressionvertebralfusion
05
03
03
02
05
3NC
124
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000015210
LOXHD1
DEAFNESS
AUTOSOMAL
RECESSIVE77
Autosomalrecessive
nonsyndromic
sensorineuDominant
lateonsetFuchs
cornealdystrophy
deafnessautosomal
recessivetype77
No
3
03
NC
ENSBTAG00000015294
COX10
Leighsyndrome
Mitochondrial
complexIVdeficiency
(MTC4D)
Cells
4
6
24
5
2
3
14
3
16
cellular
process
ENSBTAG00000015335
BAI3
No
38
03
8
01
01
2
signalling
ENSBTAG00000015490
HS1BP3
Cells
1
10
5
9
4
8
2
1
37
cellular
process
ENSBTAG00000015602
C7H5orf45
Nomatch
10
cellular
process
ENSBTAG00000015839
MAP4
No
27
85
104
35
11
100
82
46
61
cellular
process
ENSBTAG00000015912
DMKN
No
01
04
03
NC
ENSBTAG00000015980
FASN
AutosomalRecessive
MentalRetardation
Cells
19
29
7
12
6
39
6
7
22
metabolism
ENSBTAG00000015985
GPR20
httpswwwmousephenotypeor
gdatagenesMGI2441803sect
ionassociations
Anatomicaldefectsvarioustissues
03
02
08
01
09
signalling
ENSBTAG00000016495
LINS
AutosomalRecessive
MentalRetardation
Cells
6
3
1
5
3
3
1
5
12
NC
ENSBTAG00000016691
LACC1
Mice
2
9
1
9
9
8
08
7
27
NC
ENSBTAG00000017096
ERICH1
Cells
6
4
2
4
2
5
2
5
13
metabolism
ENSBTAG00000017165
MATN2
No
2
7
2
20
08
3
2
2
7
cellular
process
ENSBTAG00000017418
RRP1B
Cells
6
4
4
5
3
7
3
12
159
cellular
process
ENSBTAG00000017436
ALS2CR11
Cells
01
37
disease
ENSBTAG00000017505
PAXIP1
Cells
2
3
2
4
2
4
2
3
12
signalling
ENSBTAG00000017576
GPR144
Nomatch
02
signalling
ENSBTAG00000018528
USP45
No
1
1
04
1
05
06
02
2
04
cellular
process
ENSBTAG00000018810
THBS2
INTERVERTEBRAL
DISCDISEASE
Cells
1
2
04
09
1
09
04
1
2
fertility
ENSBTAG00000019072
PSD4
Cells
7
8
1
1
7
02
6
4
cellular
process
125
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000019265
AGK
Sengerssyndrom
e
Congenitalcataract
hypertrophic
cardiomyopathy
mitochondrial
myopathy
Cells
7
36
43
52
65
metabolism
ENSBTAG00000019752
BPIFA2B
Nomatch
01
1metabolism
ENSBTAG00000019799
FCAM
R
Cells
07
01
02
01
2
7immunity
ENSBTAG00000019961
ATP4B
Cells
12
03
1
05
cellular
process
ENSBTAG00000020414
DHX37
No
3
33
73
73
413
cellular
process
ENSBTAG00000021073
KIAA1549
PILOCYTIC
ASTROCYTOM
ASOMATIC
httpswwwmousephenotypeor
gdatagenesMGI2669829sect
ionassociations
behaviourneurological
homeostasism
etabolism
hematopoieticsystem
3
04
23
01
21
04
3NC
ENSBTAG00000021233
OR5B17
No
fertility
ENSBTAG00000021375
OR10Q1
No
fertility
ENSBTAG00000021643
EFCAB12
Cells
01
1
9cellular
process
ENSBTAG00000021645
MBD4
Cells
8
43
82
51
10
6cellular
process
ENSBTAG00000022509
DNAH
9
No
02
5
05
5metabolism
ENSBTAG00000022858
LOC783561
Nomatch
fertility
ENSBTAG00000026247
PKHD1L1
Cells
02
01
cellular
process
ENSBTAG00000026323
LYSB
Nomatch
68
metabolism
ENSBTAG00000026350
SPATC1
Cells
05
06
01
01
01
39
NC
ENSBTAG00000027390
RTFDC1
Mice
28
19
21
20
922
13
20
49
NC
ENSBTAG00000031115
Uncharacterized
Nomatch
01
cellular
process
ENSBTAG00000031718
OGFR
Mice
6
17
612
517
39
1fertility
ENSBTAG00000032264
Uncharacterized
Nomatch
52
4
22
NC
ENSBTAG00000032481
DAPL1
No
1
6
07
01
13
10
cellular
process
126
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000032527
ERCC6
CEREBROOCULOFACI
OSKELETAL
SYND
ROME1
COCKAYNE
SYND
ROMEBDe
SanctisCacchione
syndromeMACULAR
DEGENERATION
AGERELATED5UV
SENSITIVE
SYND
ROME1COFS
syndromeCockayne
syndrometype1
Cockaynesyndrome
type2Cockayne
syndrometype3UV
sensitivesyndrome
Cockaynesyndrome
typeB(CSB)De
SanctisCacchione
syndrome(DSC)UV
sensitivesyndrome
(UVS)cerebrooculo
facioskeletal
syndrometype1
(COFS1)
Mice
3
205
22
207
42
cellular
process
ENSBTAG00000032821
SCEL
No
01
02
17
04
NC
ENSBTAG00000033954
PRPS1L1
No
35
metabolism
ENSBTAG00000034991
SLX4IP
Cells
6
21
32
207
33
NC
ENSBTAG00000035226
TOR1AIP1
Cells
9
12
513
816
716
83
NC
ENSBTAG00000035309
OR4F15
No
fertility
ENSBTAG00000035708
LOC527795
Nomatch
01
01
23
metabolism
ENSBTAG00000035985
LOC101909743
OR8K1
No
fertility
ENSBTAG00000036019
RIPPLY3
httpswwwmousephenotypeor
gdatagenesMGI2181192sect
ionassociations
hemopoieticsystem
01
04
01
9
01
malformation
ENSBTAG00000037399
Uncharacterized
Nomatch
2
92
4
24
05
5signalling
ENSBTAG00000038606
DBDR
No
02
01
signalling
ENSBTAG00000038831
PHYHD1
No
2
18
21
23
30
18
16
15
1metabolism
ENSBTAG00000039110
OR8U1
No
fertility
ENSBTAG00000039117
FAM179B
Cells
14
206
31
308
33
cellular
process
ENSBTAG00000039520
SIRPB1
Cells
4
08
32
516
05
12
01
signalling
127
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000040568
Uncharacterized
Nomatch
4
307
54
33
25
cellular
process
ENSBTAG00000045558
LOC100299084
Nomatch
1
22
110
10
09
52
fertility
ENSBTAG00000045588
LOC100298356
Nomatch
metabolism
ENSBTAG00000045709
LOC514057
Nomatch
fertility
ENSBTAG00000046175
Uncharacterized
Nomatch
01
01
cellular
process
ENSBTAG00000046228
Olfactory
receptor
Nomatch
fertility
ENSBTAG00000046239
C10orf12
No
03
104
07
108
06
209
cellular
process
ENSBTAG00000046285
OR8I2
No
fertility
ENSBTAG00000046360
Uncharacterized
Nomatch
NC
ENSBTAG00000046423
LOC618007
Nomatch
NC
ENSBTAG00000046527
LOC100139830
Nomatch
fertility
ENSBTAG00000046590
FAM181A
Cells
09
44
NC
ENSBTAG00000046593
Uncharacterized
Nomatch
20
metabolism
ENSBTAG00000046727
Olfactory
receptor
Nomatch
04
fertility
ENSBTAG00000047150
RTEL
Cells
4
23
64
55
616
cellular
process
ENSBTAG00000047166
SH2D7
Mice
03
101
01
1
02
02
2immunity
ENSBTAG00000047174
Uncharacterized
Nomatch
03
7105
02
01
271
05
03
NC
ENSBTAG00000047317
CL43
Nomatch
01
120
01
immunity
ENSBTAG00000047587
Uncharacterized
Nomatch
22
09
09
35
03
4
06
2cellular
process
ENSBTAG00000047661
FABP12
httpswwwmousephenotypeor
gdatagenesMGI1922747sect
ionassociations
Growthsizebody
01
02
69
metabolism
ENSBTAG00000048021
PNMAL2
No
04
03
2
03
03
02
NC
ENSBTAG00000048153
Uncharacterized
Nomatch
2
05
01
02
01
cellular
process
128
Discussion(
Deleterious recessive variants arenaturally present in outbred animal species
Theydonotcreateproblemsaslongastheyremainatlowfrequencyinrandom
mating populations Under these conditions these variants remain in the
heterozygousstateanddonotproducephenotypiceffectsHoweverassoonas
their frequency increases andmatingoccursbetween related individuals their
deleteriouseffectsbecomeapparentandhighlyaffectthewelfareandviabilityof
animalscarryingtheminthehomozygousstate
In industrialdairy cattlebreedscrossbreeding isgenerallyavoidedandbreeds
areinpracticegeneticisolatesinwhichartificialinsemination(AI)iswidelyused
for rapidly spreading thegeneticprogress in thepopulationAsa consequence
elitesireshavemanythousandandsometimeshundredsofthousanddaughters
Under these conditions mating between relatives becomes practically
unavoidable in the longterm In fact inspiteof theclosecontrolof inbreeding
andofthematingsystemgeneticdefectsregularlyappeartothesurfaceSome
examples are the Complex Vertebral Malformation (CVM) and the Bovine
Leucocyte Adhesion Deficiency (BLAD) in Holstein (Schuumltz et al 2008) and
WeaverdiseaseinBrown(McClureetal2013)Inthesecasestheeffectsofthe
recessivedeleteriousallelesareclearandmoleculardiagnostictoolshavebeen
developedtoavoidtheirfurtherspreadingandtoinitiatetheeradicationprocess
However recessive deleterious alleles may produce their effects in a more
deceitful way for example by affecting fertility inducing early lethality or
influence disease susceptibility All these effects are difficult to trackwith the
current methods of phenotype measurements but have an impact on the
economyofthelivestocksectorandonanimalwelfare
InthispaperwesoughtfordeleteriousrecessiveallelesinItalianHolsteincattle
This breed is recently experiencing a decrease in the genetic trend of fertility
traits (Pryce et al 2014) therefore we started our search from the exome
sequences of 20 sires 19 of which having extreme EBV values for male and
femalefertility
130
The choiceof sequencingexomeswasa compromisebetween completenessof
informationaccuracyofvariantandgenotypedetectionandcostExomesclearly
give incomplete information since they miss all variation in regulatory sites
outside gene boundaries (ENCODE Consortium 2012) These may be very
relevantforthecontroloftraitvariationbuttheyarestillverypoorlyannotated
in livestock On the other hand sequencing exomes yields millions instead of
billionsbasepairsadeepersequencingoftargetregionsandaconsequentmore
reliablevariantandgenotypecalling(FigureS1andS2)Theanalysisofexome
data is also less demanding in terms of computer time (thousands instead of
millions variants are to be analysed) and easier in terms of biological
interpretation Exome sequencing revealed to be very effective in the
identificationofanumberofmutationsaffectinghumanhealth(Bamshadetal
2011 Yang et al 2013) All thesewere defects caused bymutations in single
genes producing clear phenotypic effects havingMendelian inheritance These
conditionsdonotholdinourinvestigationthereforewehadtodevisestrategies
to integrate exome data with other information collected from the Holstein
populationandotherspecies
Theoverallprocedureforidentifythecandidatedeleteriousrecessivewasbased
on the rationale that deleterious recessive alleles in coding sequences are
expectedtobei)causedbyseverealterationoftheproteinsencodedbygenes
ii)mostlyifnotexclusivelycarriedattheheterozygousstateinthepopulation
iii)unevenlydistributedinanimalsofhighandlowfertilityiv)likelyconserved
indifferentspeciesbecauseofselectionconstraintv)likelyassociatedtogenetic
defectsinanyspecieswhenseverelydamaged
A stepwise procedure was set up for reducing candidates to a number
manageable for functional analysis (Figure1) In the first stepmutationswere
discoveredinthesecondfilteredandinthirdassessed
Thisprocedureidentifiesonlysomeofthedeleteriousrecessivesexistinginthe
populationanalysedSincewesequencedonlyexomesandnotwholegenomes
ouranalysesislimitedtocodingsequencesandatthemomenttoSNPvariation
Wethereforemissregulatoryvariationsoutsidegenesand theeffectsof indels
andmorecomplexvariations
131
ThenumberofanimalssequencedislimitedDeleteriousrecessivesareexpected
toberareinthepopulationandwehavesampledonly20animalsfromasubset
ofsiresauthorizedtoAI inItalyAlthoughwehavetakenamongthemthose in
theminus tail of fertility trait thatmighthave ahigher genetic load itrsquos likely
thatwehavemissedothervariantsexistinginthepopulationOntheotherhand
variantscarriedbyAIauthorizedbullsaretheonesthatwillhavealargerimpact
onthepopulationifwidelyspreadbyelitebullsandthereforetheyarethemost
interestingfromabreedingpointofview
Most deleterious variants are expected to be very rare and therefore not
detected by the approach here adopted or detected and afterwards discarded
duringthefilteringprocesses
Sequencing is prone to errors Some real variants may have been discarded
duringQCofsequencedataduetopoorsequencequalityorpoorcoverageofthe
region
Someofthevariationsclassifiedasldquonondeleteriousrdquobythesoftwareusedmay
indeedhave an effect In fact synonymousmutation arenon`neutralwhenever
thedifferenttRNAsforasameaminoacidhavedifferentrelativeabundanceand
translationrates(Goymer2007)
1(Searching(candidate(deleterious(variants(in(exomes(
ExomesproducedafairamountofdataAlltogether5355Mbweresequenced
These represent 9998 of the exome sequences targeted by probes Among
theseonly35hadasequencingdepthacrossanimalslt60XSequencingdepth
is a key factor for variant detection and even more so when the genotype of
sequencedanimalsistobeinferredInvestigatingtheconcordancebetweenthe
genotypes detected at more than 20K SNP by both Beadchip and exome
sequencingwe observed a clear inverse relationship between sequence depth
andnumberofdiscordances(TableS3andFigureS3)Theaveragesequencing
depth expected (50X) and realised in practice (4058) in this investigation
yieldedratheraccurategenotypessincetheaverageconcordanceobservedwas
over97and increased toover99whentwooutlieranimalswereremoved
132
from the dataset The reason for the outlier behaviour of these samples waslikely a suboptimal quality of DNA submitted to sequencing Overall almost215000 variants passed quality controls and among them 192440 autosomalSNPs have been investigated in detail As found in other investigations mostvariants were rare (Peloso et al 2014) For example the MAF distributionobservedin46904SNPsclassifiedneutralhasameanof0197andamedianof0167values(FigureS6)
ThefirststepinourresearchwastoidentifySNPscausingseverealterationsinthe proteins coded by genes These were relevant changes in conformationalterationsofsplicingsitesortruncationsofproteinsToaccomplishthistaskwereliedontwosoftwareSIFTandANNOVARthatidentified4902suchvariationsin3423genes Interestingly the subsetofdeleteriousvariantshas significantlylowermean (0156)andmedian (010) compared toneutral variations (FigureS6) This subset is therefore at least enriched in variants under purifyingselectionhavingadeleteriouseffectatthepopulationlevel
Thenumberofdeleteriousallelesdetectedishighanditisunlikelytheyallhavea relevant phenotypic effect even if most variants are rare In fact variantsdeleterious for a protein are not automatically deleterious for an organism asthe protein change induced by the variant although relevant may not affectproteinfunctionvariationmayoccurinonememberofagenefamilysothatitsfunction may be compensated by other family members the protein functionmay be compensated by other proteins or the affected metabolic pathwaycompensated by alternative pathways The challenge was therefore to filter asubsetofvariantslikelyhavingaphenotypiceffecttobeproposedforvalidationatalargerscaleToaccomplishthistaskwesoughtforadditionalinformationinexomesequencesandinalargepopulationofHolsteinsiresgenotypedwiththebovineHDBeadchip
( (
133
2(Filtering(deleterious(variants(
The second step consisted in filtering the several thousand deleterious SNPsidentified in exomes to search for SNPs and genes that worth deeperinvestigationWeusedtwostrategiestoaccomplishthistask
21$ Distribution$ of$ deleterious$ variants$ in$ fertility$ EBV$ plus9variant$ and$
minus9variant$animals$
Fertility traits are controlled by a high number of genes and are highlygenetically heterogeneous Even using a selected subset of variants identifiedfrom exome sequencing classic association analyses would need sample sizesmuch larger that the one available in this investigation to have a reasonablestatistical power (Stitziel et al 2011) The first approachwe appliedwas theanalysisof SNPallelic andgenotypicdistributionbetween theminusandplus`variant tails of male and female fertility EBVs Given the very low number ofanimals sequencedwedecided to use a simplemodel to exclude from furtheranalyses all markers showing no evidence of association with the EBV tailsratherthantofindsignificantassociations
We divided samples in cases (minus`variant animals for fertility EBV) andcontrols (plus`variant animals) and tested all possible models of inheritanceimplemented in thePlinkpackage for tackling simple genetic defectsWe thenused an empirical point`wise significant threshold based on permutations tohighlight SNPs having alleles or genotypes showing an uneven distributionbetweencasesandcontrolsandconsideredtheseasworthfurtherinvestigationAtotalof112SNPsin101genespassedthisfilteringprocess
$
22$Detection$of$regions$carrying$deleterious$mutation$from$800K$SNP$data
The analysis of exomes alone was therefore insufficient to tag recessivedeleteriousgenesinItalianHolsteinandwesoughtforadditionalinformationin1009ItalianHolsteinbullsgenotypedathighdensitythatarepartofthegenomicselection program run by ANAFI The rationale of the approach is that
134
deleterious recessives are expected to be absent or rare in the homozygous
status in the population Hence they may be tagged searching for genome
regions having markers or haplotypes lacking or having significantly less
homozygotesthanexpectedunderaneutralmodel
We used two complementary approaches to identify these candidate regions
One(OHW)searchedforverystrongsignalsproducedbysinglemarkersoutof
Hardy`Weinberg equilibrium at Ple847x10`8 A minimum of three significant
markersinaslidingwindowsof10wassettoreachsignificancetocapturemore
reliablesignalsavoidexcessivesinglemarkernoiseandaccount forapossible
Wahlund effect due to presence of the genetic structure in the population
difficulttoavoidinthesetofprogenytestedbullsanalysedThesecondapproach
(LOH) captured more moderate signals but from multiple markers in larger
regions Both approaches were designed to detect signals in regions in which
deleteriousvariantsarestillinsegregationandarenotexcessivelyrare
Asexpectedthetwoapproachesidentifiedregionsthatonlypartiallyoverlapped
(TablesS5S6eS7)Inparticular5regionsoutofthe11identifiedbyOHWwere
in common to the 240 identified by LOH Two of these regions overlap with
previous investigations that used the BovineSNP50 Beadchip Sahana et al
(2013) found a region candidate to carrypotential lethal haplotypes inNordic
Holstein on BTA7 in the region 6708007`10907840 This interval overlaps
withbothBTA7regionsdetectedinthisstudyVanRadenetal(2011)detected
inUSHolsteinalargeregiononBTA15spanning76Mb`82M(onBTA30)that
includedLRP4 gene responsible for themulefootdefectOuranalyseswith the
HDBeadchipindicatesontheonesidethatthelargeregiondetectedbySahana
etal(2013)mayconsistintwonearbyregionsOntheothersidethesignalwe
detectedonBTA15doesnot includetheLRP4 locusthat iscompletely fixed in
theItalianbullspopulationanalysed
In total83mutations in66genes fell in eitherorbothOHWandLOHregions
These were joined to the SNPs output of the EBV tail analyses for further
assessment
(
135
3(Assessing(filtered(deleterious(variants(
Different characteristics of the filtered deleterious variants were assessed to
identify among them strong candidates to submit to a large`scale validation
studyTheassessmentphasepermittedtogainadditionalinformationinfavour
or against SNP and gene candidature to be deleterious We quantified this
evidenceusingasubjectivescore
31$Comparative$genomics$analysis$
In the case of comparative genomics the score quantified the level of
conservationof thealternativeaminoacidresidues inducedbySNPvariants in
ItalianHolsteincattleComparisonwasrunagainstaset100vertebratespecies
that included 62 mammals The level of conservation is a good proxy for the
expected severity of a mutation In the case of the 46 SNPs that induced the
change of an amino acid highly conserved in the species investigatedwemay
hypothesize a strong selection pressure in favour of the maintenance or the
invariant amino acid at that position Therefore the amino acid change or the
stopcodonweobserved incattlemayaffectnotonlytheproteinstructureand
function but also the phenotype of homozygous animals Interestingly in all
thesecasestheSNPallelecodingforthenonconservedvariantswastheminor
allele
Converselyinallcasesofhighvariabilityofthetargetaminoacidacrossspecies
(asmanyassevendifferentaminoacidshavesometimesbeenobserved)poorly
supports the presence of phenotypic effects due to its change Sometimes the
informationcollectedisagainstthepresenceofadeleteriousmutationasinthe
case of the presence in different species of both alleles identified by exome
sequencingThis suggests thatneither of these is greatly affecting the viability
andfitnessofanimalscarryingthem
Wealsoobservedanumberoffailures(N=45)intheliftoverofSNPcoordinates
fromcattletohumanorinproteinalignmentThishappenedforexampleinthe
caseofuncharacterized cattleproteins thathavenoorthologous in thehuman
genome and of someolfactory receptors that belong to a highly variable gene
136
family Even if the failure of alignment is an indication of poor proteinconservationacrossspeciestheconservationofthetargetaminoacidcouldnotbeassessesandinthesecaseswepreferredtoconsidertheinformationmissingandtoscoreitaccordingly
32$Haplotype$analysis$
Asecondassessmentofthefilteredcandidateswasontheirbehaviourinthesirepopulation The ideal test would be the direct genotyping of the SNPs in thepopulationand theevaluationof theirallelicandgenotypicproportions In theabsenceofthisinformationwefirstinferredtheHDhaplotypeassociatedtoSNPallelesandassessedthebehaviourofthehaplotypeinthepopulationassumingthisasasurrogateoftheSNPbehaviourSincedeleteriousallelesareexpectedtoberareandthis isalsowhatweobserved fromcomparativegenomicanalysesweinvestigatethefrequencyonlyofthehaplotypecarryingtheminorSNPalleleagainsttheexpectationto findthecorrespondinghaplotypeneverhomozygousor a number of times much lower than expected from its frequency in thepopulation The strategy yielded some interesting confirmation but was onlypartiallysuccessful
Firstofalltheidentificationofauniquehaplotypecarryingtherarevariantwasnot always possible or straightforward In some cases carriers of a same rarevariantdidnotshareexactlythesamehaplotypeWeacceptedsomevariationifthe haplotype remained clearly distinguishable from those of animals notcarryingtherarevariantbutatthesametimereducedthestringencyofthetestInquiteanumberofcasesasamehaplotypecarriedbothcandidatedeleteriousSNP variants This may occur for example in the case of recent mutations IntheselastcasesthetestcouldnotbecarriedoutFinallymostrarevariantswereassociatedtorarehaplotypesthatarenotlikelytohavehomozygousindividualsinthepopulationindependentlyoftheSNPeffect
Thisassessmentwas thereforeonly significant fora fewhaplotypes thathadalarge number of heterozygous individuals and no or only a very fewhomozygotesandforthosethathadalargenumberofhomozygotes
137
33$Functional$analysis$
The final assessment was the analysis of the function of genes carrying the
candidate deleterious SNP This has some limitation since knowledge of gene
function is only partial and mainly gained from investigations on species
different from cattle In any case we considered the association to genetic
defectsinhumanandtheinductionofseverephenotypesinmouseknockoutas
a positive indication of the likely phenotypic effect of the severe protein
alterationweobservedincattleInadditiongenefunctionandexpressionprofile
was evaluated in search for evidence of gene involvement in fertility
developmentorprocessesfundamentalforcellandorganismviability
Amonggenesidentified22areassociatedtohumansyndromes(Table4)Eleven
syndromesareneurologicaldiseasesaffectingbraindevelopmentand inducing
mental retardation (eg Donnay`Barrow Syndrome and Autosomal recessive
primary microcephaly) the others influence development (eg
Cerebrooculofacioskeletal Syndrome 1 and Adams`Oliver syndrome) or organ
function(egCongenitalpulmonaryalveolarproteinosis)
Most of the same defects translated into Holstein would induce physical and
behavioural anomalies likely causingyoungbulls tobe excluded fromprogeny
testing Itshould in factbeconsideredthattheanimalsweanalysedareonthe
onesidethosethatwillmostinfluencethegeneticmakeoffuturegenerationbut
are not a random sample of Holstein animals These bulls have been progeny
testedandauthorisedtobeusedinartificialinseminationinItalyCandidatesto
progenytestingderivefromthematingofthebest1siresandbest2damsof
the Italian Holstein population At the age of 4 months they are taken to the
Italian Holstein breeder association (ANAFI) test station where they are
subjected to performance test sanitary controls and behavioural tests
Thereafterthequalityoftheirsemenischeckedbeforeauthorizationisgranted
toenterinprogenytestInthissampleofanimalsitisthereforeexpectedtofind
never at the homozygous state mutations causing lethality physical
malformation abnormal behaviour and semen infertility Also those having a
138
majoreffectonsemenqualityandhighsusceptibilitytodiseasesareexpectedto
berarelycarriedatthehomozygousstatus
Association with human syndromes is a strong evidence of the potential
phenotypiceffectofmutationsalteringthegenestructureHoweveritshouldbe
considered that variants found in humans are different fromvariants found in
cattle and that the severity of the phenotype induced by a deleterious gene is
often variable even within species Therefore we evaluated this information
togetherwiththeotherindirectevidencesofvariantstobedeleterious
WealsoenquiredthewebsiteoftheInternationalMousePhenotypeConsortium
(httpswwwmousephenotypeorg) to search for the effects of null alleles in
mice The few genes forwhichnull`mice line is available are involved in basic
functionsfortheorganismthatspanfromvision(IAH1andEPB41L4A)toaging
(RGP1)andtissuesdevelopment(FABP12RIPPLY3RGNEF)orfertility(CFAP96)
Aswithgenesassociatedtohumansyndromesmostoftheabnormalphenotypes
observedinknockmousewouldhamperyoungbullstoenterprogenytesting
Gene Ontology (GO) was evaluated on the entire gene set No enrichment on
particularcategoriesGOtermswasobservedThemostrepresentedcategories
in Biological Processes were as expected ldquometabolic processrdquo and ldquocellular
processrdquo however lower level terms ldquoreproductionrdquo and ldquodevelopmental
processrdquo are represented with meaningful hits The analysis of Molecular
Function indicates thatmany of the hits are enzymes (ldquocatalytic activityrdquo) and
haveregulatoryroles(ldquobindingrdquoandldquoenzymeregulatoractivityrdquo)Thisresultcan
be interpreted indifferentwaysOn theone sidedeleterious allelesmay exert
their deleterious effect in a number of different cellular and developmental
processes the correct process is one while there is many options to make it
wrongAlsothenumberofgenesinvestigatedisrelativelylowandformanyof
them there is little or no information available (eg 11 genes code for a
completelyuncharacterizedprotein)Inadditionsomeofthemmayturnoutnot
tobereallydeleteriousandaddnoisetotheGOanalysis
As a final step all the previous information (GO expression patterns known
function)havebeenused to estimate a scoremeasuring the importanceof the
gene function forcattle survivalandreproductionThescoringof functionwas
139
particularlydifficultGOcategoriesweregeneralandallfunctionstaggedhavearelevant importance for animal survival andproductionTherefore in this casewedidnrsquotusethewholescorerangeplanned(from`1to1)andgavenonegativescoresApositivescoreof1wasattributedonly togenesplayingan importantroleinfertilityanddevelopment
$
34$Strong$candidate$deleterious$variants$
Thescoringprocessyielded21variantswithatotalscore=035variantswithpositive and 135 with negative values (Figure 2 and Table 5) Although theprocesswasnoterrorfreeasitreliedonincompleteinformationandhadsomedegreeofsubjectivityparticularlyinthechoiceofrelativeweightsitidentifiedaset of variants that may be considered strong candidates to have deleteriouseffects in the Italian Holstein breed Variants were evaluated individually andthose in a same genes had sometimes different scores For example the fivevariantscarriedbyALPK2hadscoresrangingfrom`7to`2
140
Figure2Graphicalrepresentationoftotalscoreforeachvariants
minus505
Variation
Score
1_645923001_645985561_1381698111_1464490851_150972144
2_7478962_270362092_270455392_375933832_555609862_904645543_75418123_110403993_110404003_188945763_1137877703_1203356984_53767714_265405974_265406204_749514994_1033779064_1057246734_1130233264_1178417155_445557055_445557425_757447955_940308955_940309556_207762566_382803486_863518016_927092906_1075769256_1078851146_1090916426_1166055226_1186534167_13339757_56553577_97149147_167941797_168358027_168621947_197404097_197409057_263293538_602606098_603324848_658345648_704101558_760297748_770028908_872798628_1117618168_1128416959_83888819_83889249_237928389_510133929_1047396559_10515702910_171251510_1584300310_2802235210_2802311010_8290312110_8517371111_4630576711_4675390311_4740957711_5979141011_7832201011_8794387511_9548610911_9945389411_9946100112_1190618112_1247843312_1248084512_1413689712_5304908112_9076120913_381566413_1178793413_1178793813_5247168413_5393493013_5393494713_5393985313_5454042513_5504774013_5998820513_6304575013_6316961013_6539670814_197749414_364069514_3556894514_4688575014_5718402214_6863536315_18267915_18331715_18340415_248692
15_3018920915_3504878815_4630194115_7503314015_7503764515_8024448915_8025571015_8028875915_8038974515_8048559315_8055593115_8094968915_8275455615_8292533216_456201716_3831963616_6257435517_5309539517_5311265517_6609153117_7237753218_2562355018_3495651318_4637204118_5215480618_5408127718_5413095518_5781976819_2144410319_3107215219_3111397219_3279084319_4225178519_5139519419_5619117519_5726625120_763980120_2341043120_5888423620_6417173921_623911921_2034587821_2034628921_2683773321_3100359921_5527419521_5582646321_5817338421_5912071921_6048248421_6048251021_6284314222_5241595922_5242173422_5242204622_5242610422_5242666822_5695165722_5697484722_5779767423_4588505224_2132460424_2145336424_4676490924_5032719524_5815817824_5815862024_5815878424_5820400324_5820447925_3743318026_1811648726_2571599626_2576457027_503107827_3283255728_28422828_169040328_3571880728_4408176529_198588129_753052329_753066329_26397931
141
Table5SingleanalysesscoreandtotalforeachcandidatevariantsChrchrom
osom
ePosphysicalposition
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
164592300
ENSBTAG00000009014
UPK1B
02
I1
01
2
164598556
ENSBTAG00000009014
UPK1B
0I1
I1
00
I2
1138169811
ENSBTAG00000002966
DNAJC13
03
10
15
1146449085
ENSBTAG00000017418
RRP1B
I3
I1
I1
00
I5
1150972144
ENSBTAG00000036019
RIPPLY3
I3
I1
I1
I1
1I5
2747896
ENSBTAG00000013916
HERC2
30
10
04
227036209
ENSBTAG00000004555
LRP2
I3
I1
10
0I3
227045539
ENSBTAG00000004555
LRP2
I3
31
00
1
237593383
ENSBTAG00000032481
DAPL1
00
I1
00
I1
255560986
ENSBTAG00000032264
Uncharacterized
3I3
I1
00
I1
290464554
ENSBTAG00000017436
ALS2CR11
0I1
I1
00
I2
37541812
ENSBTAG00000046175
Uncharacterized
00
00
00
311040399
ENSBTAG00000013789
OR6K6
I3
I3
I1
00
I7
311040400
ENSBTAG00000013789
OR6K6
I3
I3
I1
00
I7
318894576
ENSBTAG00000004772
THEM
40
I3
I1
00
I4
3113787770
ENSBTAG00000000149
USP40
11
I1
01
2
3120335698
ENSBTAG00000002809
CAPN10
I3
21
00
0
45376771
ENSBTAG00000001500
FIGNL1
01
I1
00
0
426540597
ENSBTAG00000033954
PRPS1L1
1I3
I1
00
I3
426540620
ENSBTAG00000033954
PRPS1L1
1I1
I1
00
I1
474951499
ENSBTAG00000003508
CFAP69
I3
I3
I1
10
I6
4103377906
ENSBTAG00000021073
KIAA1549
0I1
11
01
4105724673
ENSBTAG00000019265
AGK
01
10
02
142
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
4113023326
ENSBTAG00000014768
ZNF786
I3
I3
I1
00
I7
4117841715
ENSBTAG00000017505
PAXIP1
0I1
I1
00
I2
544555705
ENSBTAG00000026323
LYSB
I3
3I1
00
I1
544555742
ENSBTAG00000026323
LYSB
I3
0I1
00
I4
575744795
ENSBTAG00000009064
CSF2RB
I3
I3
10
0I5
594030895
ENSBTAG00000009444
SLC15A5
I3
I1
I1
00
I5
594030955
ENSBTAG00000009444
SLC15A5
I3
I3
I1
00
I7
620776256
ENSBTAG00000008797
Uncharacterized
I3
I3
00
0I6
638280348
ENSBTAG00000007568
MEPE
0I3
I1
00
I4
686351801
ENSBTAG00000003523
UGT2B15
1I1
I1
00
I1
692709290
ENSBTAG00000010954
ART3
I3
2I1
00
I2
6107576925
ENSBTAG00000038606
DBDR
0I3
I1
00
I4
6107885114
ENSBTAG00000001505
GRK4
I3
1I1
00
I3
6109091642
ENSBTAG00000007001
SLC26A1
1I3
I1
00
I3
6116605522
ENSBTAG00000014058
LDB2
01
I1
00
0
6118653416
ENSBTAG00000012458
PSAPL1
I3
I1
I1
00
I5
71333975
ENSBTAG00000015602
C7H5orf45
I3
I3
I1
00
I7
75655357
ENSBTAG00000045588
LOC100298356
1I3
I1
00
I3
79714914
ENSBTAG00000046423
LOC618007
00
I1
00
I1
716794179
ENSBTAG00000012314
LDLR
02
10
03
716835802
ENSBTAG00000009568
KANK2
01
I1
00
0
716862194
ENSBTAG00000009569
DOCK6
01
10
02
719740409
ENSBTAG00000000414
FUT5
I3
I3
I1
00
I7
719740905
ENSBTAG00000000414
FUT5
I3
I3
I1
00
I7
143
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
726329353
ENSBTAG00000004860
SLC27A6
0I1
I1
00
I2
860260609
ENSBTAG00000011420
CA9
I3
I1
I1
00
I5
860332484
ENSBTAG00000011433
RGP1
I3
0I1
10
I3
865834564
ENSBTAG00000035708
LOC527795
00
I1
00
I1
870410155
ENSBTAG00000005466
C8H8orf58
I3
I3
I1
00
I7
876029774
ENSBTAG00000015027
Clusterin
00
I1
10
0
877002890
ENSBTAG00000012921
KIF24
12
I1
00
2
887279862
ENSBTAG00000002220
SPTLC1
00
10
01
8111761816
ENSBTAG00000006532
CDK5RAP2
I3
I3
11
1I3
8112841695
ENSBTAG00000013838
ALLC
01
I1
00
0
98388881
ENSBTAG00000015335
BAI3
00
I1
00
I1
98388924
ENSBTAG00000015335
BAI3
00
I1
00
I1
923792838
ENSBTAG00000046593
Uncharacterized
03
00
03
951013392
ENSBTAG00000018528
USP45
02
I1
00
1
9104739655
ENSBTAG00000018810
THBS2
03
10
15
9105157029
ENSBTAG00000000918
ERMARD
I3
21
00
0
10
1712515
ENSBTAG00000014750
EPB41L4A
0I1
I1
I1
0I3
10
15843003
ENSBTAG00000009030
NOX5
I3
I3
I1
00
I7
10
28022352
ENSBTAG00000035309
OR4F15
I3
1I1
00
I3
10
28023110
ENSBTAG00000035309
OR4F15
I3
I1
I1
00
I5
10
82903121
ENSBTAG00000012565
PCNX
I3
2I1
01
I1
10
85173711
ENSBTAG00000011683
NUM
BI3
0I1
00
I4
11
46305767
ENSBTAG00000002288
NT5DC4
0I3
I1
00
I4
11
46753903
ENSBTAG00000019072
PSD4
0I1
I1
00
I2
144
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
11
47409577
ENSBTAG00000009033
TEX37
I3
I2
I1
10
I5
11
59791410
ENSBTAG00000006027
USP34
02
I1
01
211
78322010
ENSBTAG00000015490
HS1BP3
00
I1
00
I1
11
87943875
ENSBTAG00000001140
IAH1
I3
3I1
10
011
95486109
ENSBTAG00000017576
GPR144
I3
I3
I1
00
I7
11
99453894
ENSBTAG00000038831
PHYHD1
11
I1
00
111
99461001
ENSBTAG00000010601
DOLK
12
10
04
12
11906181
ENSBTAG00000001133
VWA8
I3
1I1
00
I3
12
12478433
ENSBTAG00000014752
AKAP11
1I3
I1
I1
0I4
12
12480845
ENSBTAG00000014752
AKAP11
1I1
I1
I1
0I2
12
14136897
ENSBTAG00000016691
LACC1
I3
I3
I1
00
I7
12
53049081
ENSBTAG00000032821
SCEL
I3
I3
I1
00
I7
12
90761209
ENSBTAG00000019961
ATP4B
I3
I3
I1
00
I7
13
3815664
ENSBTAG00000034991
SLX4IP
00
I1
00
I1
13
11787934
ENSBTAG00000008650
CAMK
1D
I3
0I1
00
I4
13
11787938
ENSBTAG00000008650
CAMK
1D
I3
0I1
00
I4
13
52471684
ENSBTAG00000013753
DDRGK1
02
I1
00
113
53934930
ENSBTAG00000039520
SIRPB1
0I3
I1
00
I4
13
53934947
ENSBTAG00000039520
SIRPB1
00
I1
00
I1
13
53939853
ENSBTAG00000039520
SIRPB1
0I3
I1
00
I4
13
54540425
ENSBTAG00000047150
RTEL
I3
0I1
00
I4
13
55047740
ENSBTAG00000031718
OGFR
01
I1
00
013
59988205
ENSBTAG00000027390
RTFDC1
01
I1
00
013
63045750
ENSBTAG00000009144
BPIFA2A
0I3
I1
00
I4
145
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
13
63169610
ENSBTAG00000019752
BPIFA2B
0I1
I1
00
I2
13
65396708
ENSBTAG00000006021
CEP250
0I3
I1
00
I4
14
1977494
ENSBTAG00000026350
SPATC1
I3
I3
I1
00
I7
14
3640695
ENSBTAG00000015985
GPR20
00
I1
00
I1
14
35568945
ENSBTAG00000037399
Uncharacterized
30
I1
00
2
14
46885750
ENSBTAG00000047661
FABP12
0I1
I1
00
I2
14
57184022
ENSBTAG00000026247
PKHD1L1
I3
3I1
00
I1
14
68635363
ENSBTAG00000017165
MATN2
01
I1
00
0
15
182679
ENSBTAG00000046228
Olfactoryreceptor
00
I1
0o
I1
15
183317
ENSBTAG00000046228
Olfactoryreceptor
00
I1
00
I1
15
183404
ENSBTAG00000046228
Olfactoryreceptor
00
I1
00
I1
15
248692
ENSBTAG00000045558
LOC100299084
00
I1
0o
I1
15
30189209
ENSBTAG00000005357
VPS11
02
I1
00
1
15
35048788
ENSBTAG00000005340
SERGEF
0I3
I1
00
I4
15
46301941
ENSBTAG00000002289
NLRP14
I3
I3
I1
00
I7
15
75033140
ENSBTAG00000012053
ACCSL
1I3
I1
00
I3
15
75037645
ENSBTAG00000012053
ACCSL
I3
I3
I1
00
I7
15
80244489
ENSBTAG00000046527
LOC100139830
I3
0I1
00
I4
15
80255710
ENSBTAG00000046285
OR8I2
10
I1
00
0
15
80288759
ENSBTAG00000001291
OR8H3
1I1
I1
00
I1
15
80389745
ENSBTAG00000045709
LOC514057
13
I1
00
3
15
80485593
ENSBTAG00000035985
LOC101909743OR8K1
1I1
I1
00
I1
15
80555931
ENSBTAG00000022858
LOC783561
I3
0I1
00
I4
15
80949689
ENSBTAG00000039110
OR8U1
1I3
I1
00
I3
146
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
15
82754556
ENSBTAG00000021375
OR10Q1
01
I1
00
015
82925332
ENSBTAG00000021233
OR5B17
I3
0I1
00
I4
16
4562017
ENSBTAG00000019799
FCAM
RI3
I3
I1
00
I7
16
38319636
ENSBTAG00000014794
SCYL3
I3
0I1
00
I4
16
62574355
ENSBTAG00000035226
TOR1AIP1
00
I1
00
I1
17
53095395
ENSBTAG00000020414
DHX37
1I1
I1
00
I1
17
53112655
ENSBTAG00000020414
DHX37
0I3
I1
00
I4
17
66091531
ENSBTAG00000010847
FOXN4
0I1
I1
00
I2
17
72377532
ENSBTAG00000008996
SFI1
0I3
I1
00
I4
18
25623550
ENSBTAG00000005170
GPR114
0I3
I1
01
I3
18
34956513
ENSBTAG00000001286
ELMO
33
3I1
11
718
46372041
ENSBTAG00000015912
DMKN
0
1I1
00
018
52154806
ENSBTAG00000001262
IRGQ
I3
1I1
00
I3
18
54081277
ENSBTAG00000009337
PNMA
L1
0I3
I1
00
I4
18
54130955
ENSBTAG00000048021
PNMA
L2
I3
1I1
00
I3
18
57819768
ENSBTAG00000003231
LIM2
1
I3
10
0I1
19
21444103
ENSBTAG00000011011
SSH2
I3
I3
I1
00
I7
19
31072152
ENSBTAG00000022509
DNAH
9I3
0I1
00
I4
19
31113972
ENSBTAG00000022509
DNAH
90
0I1
00
I1
19
32790843
ENSBTAG00000015294
COX10
12
10
04
19
42251785
ENSBTAG00000013685
KRT31
I3
0I1
00
I4
19
51395194
ENSBTAG00000015980
FASN
1I1
10
01
19
56191175
ENSBTAG00000007910
EXOC7
3I3
I1
00
I1
19
57266251
ENSBTAG00000011387
NAT9
12
I1
00
2
147
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
20
7639801
ENSBTAG00000005633
RGNEF
1I1
I1
10
0
20
23410431
ENSBTAG00000008871
DDX4
1I3
I1
00
I3
20
58884236
ENSBTAG00000005514
TRIO
I3
0I1
00
I4
20
64171739
ENSBTAG00000005021
SEMA5A
01
10
02
21
6239119
ENSBTAG00000016495
LINS
02
10
03
21
20345878
ENSBTAG00000012326
LOC101906218
00
I1
00
I1
21
20346289
ENSBTAG00000012326
LOC101906218
I3
0I1
00
I4
21
26837733
ENSBTAG00000047587
Uncharacterized
10
00
01
21
31003599
ENSBTAG00000047166
SH2D7
0I3
I1
00
I4
21
55274195
ENSBTAG00000039117
FAM179B
0I3
I1
00
I4
21
55826463
ENSBTAG00000014001
MAP1A
0I3
I1
00
I4
21
58173384
ENSBTAG00000009836
CHGA
I3
I1
I1
00
I5
21
59120719
ENSBTAG00000046590
FAM181A
01
I1
00
0
21
60482484
ENSBTAG00000017096
ERICH1
0I3
I1
00
I4
21
60482510
ENSBTAG00000017096
ERICH1
01
I1
00
0
21
62843142
ENSBTAG00000012833
ATG2B
0I3
I1
00
I4
22
52415959
ENSBTAG00000015839
MAP4
00
I1
00
I1
22
52421734
ENSBTAG00000015839
MAP4
1I1
I1
00
I1
22
52422046
ENSBTAG00000015839
MAP4
I3
I3
I1
00
I7
22
52426104
ENSBTAG00000047174
Uncharacterized
I3
00
00
I3
22
52426668
ENSBTAG00000047174
Uncharacterized
10
00
01
22
56951657
ENSBTAG00000021645
MBD4
1I1
I1
00
I1
22
56974847
ENSBTAG00000021643
EFCAB12
0I1
I1
00
I2
22
57797674
ENSBTAG00000031115
Uncharacterized
01
00
01
148
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
23
45885052
ENSBTAG00000005248
OFCC1
00
I1
00
I1
24
21324604
ENSBTAG00000000998
SLC39A6
02
I1
00
1
24
21453364
ENSBTAG00000011622
C18orf21
I3
I3
I1
00
I7
24
46764909
ENSBTAG00000015210
LOXHD1
0I3
10
0I2
24
50327195
ENSBTAG00000002660
CCDC11
I3
11
01
0
24
58158178
ENSBTAG00000014284
ALPK2
I3
I1
I1
00
I5
24
58158620
ENSBTAG00000014284
ALPK2
0I1
I1
00
I2
24
58158784
ENSBTAG00000014284
ALPK2
I3
I3
I1
00
I7
24
58204003
ENSBTAG00000014284
ALPK2
I3
I3
I1
00
I7
24
58204479
ENSBTAG00000014284
ALPK2
I3
2I1
00
I2
25
37433180
ENSBTAG00000040568
Uncharacterized
00
I1
00
I1
26
18116487
ENSBTAG00000046239
C10orf12
0I1
I1
00
I2
26
25715996
ENSBTAG00000010522
Uncharacterized
10
00
01
26
25764570
ENSBTAG00000046727
Olfactoryreceptor
00
I1
00
I1
27
5031078
ENSBTAG00000007340
LOC101906349
0I1
I1
00
I2
27
32832557
ENSBTAG00000011684
RAB11FIP1
10
I1
00
0
28
284228
ENSBTAG00000000079
CCSAP
0I1
I1
00
I2
28
1690403
ENSBTAG00000048153
Uncharacterized
10
00
01
28
35718807
ENSBTAG00000047317
CL43
1I1
I1
00
I1
28
44081765
ENSBTAG00000032527
ERCC6
I3
01
00
I2
29
1985881
ENSBTAG00000004081
FAT3
30
I1
00
2
29
7530523
ENSBTAG00000046360
Uncharacterized
01
00
01
29
7530663
ENSBTAG00000046360
Uncharacterized
1I1
00
00
29
26397931
ENSBTAG00000013568
UEVLD
I3
I3
I1
00
I7
149
Interestingly13ofthe22genesassociatedtohumansyndromescarryvariants
appearing in the short list of the 35 positives and only 6 in that of 135 with
negative values This in spite of the low weight (from 1 to 1) given to this
parameterEightofthe35positivevariantsareinproteinsstilluncharacterized
Sixvariantshadascoreof4orhigherThehighestscore(totalscore=7)wasfor
ELMO3 (Engulfment and cell motility gene 3) This gene is involved in cell
motility and migration ELMO3 is also a homolog of C( elegans Ced12 that is
required for many developmental processes included gonadal morphogenesis
Mutant for this gene in C( elegans suffer from abnormal development usually
with lethal consequences or sublethal developmental defects including small
embryosdelayeddevelopmentandreducedfertility(Gumiennyetal2001)In
human it is expressed in testis prostate and ovary
(httpwwwgenecardsorgcgibincarddispplgene=ELMO3)
ThegeneDNAJC13(DnaJ(Hsp40)homologsubfamilyCmember13)hadatotal
score of 5 This gene is involved in the onset of Parkinson disease DNAJC13
regulates the dynamics of clathrin coats on early endosomes Cellular analysis
shows that the mutation Arg855Ser identified by VilarintildeoGuumlell et al (2014)
confers a gainoffunction that impairs endosomal transportApossible roleof
DNAJC13geneinTourettesyndromechronicticdisorderisunderinvestigation
(Sundarametal2011)
A second genehad a scoreof 5THBS2 (Thrombospondin2) It belongs to the
thrombospondin family and is a powerful inhibitor of tumour growth and
angiogenesis It is highly expressed in testis in fetal Leydig cell Hirose et al
(2008) found a significant association between an intronic SNP in the THBS2
gene and lumbar disc herniation Thrombospondin2 is also a matricellular
proteinfoundinhumanserumDeletionofTSP2causesagedependentdilated
cardiomyopathy (Hanatani et al 2014) Mutant mice for this gene display a
variety of mutant phenotype Kyriakides et al 1998 suggest that THBS2
modulates the cell surface properties of mesenchymal cells affecting cell
functionssuchasadhesionandmigration
150
Three genes had a score of 4 all three are associated to human syndromes
Mutation in HERC2 (HECT and RLD domain containing E3 ubiquitin protein
ligase2)isinvolvedinautosomalrecessivementalretardation38(Puffenberger
etal2012)Jietal(2000)identifiedinmutantmouseneuromuscularsecretory
vesicledefectsandspermacrosomedefectsMoreovergeneticvariationsinthis
gene are associated with skinhaireye pigmentation variability Mutations in
DOLK gene (dolichol kinase) are associated with dolichol kinase deficiency
(Lefeberetal2011)COX10(cytochromecoxidaseassemblyhomolog10)gene
is associated to mitochondrial complex IV deficiency causing Leigh syndrome
(Antonickaetal2003)
Conclusions)
Exomesequencingthereforeprovidedvaluableinformationoncodingsequence
variants and on their potential effect on protein function As a group variants
classifiedasdeleteriouswererarerthanvariantsclassifiedasneutralSincethis
isanexpectedeffectofpurifyingselectionthissetofvariantwaslikelyenriched
forvariantshavingaphenotypiceffect
Theinclusionintheanalysesofplusandminusvariantanimalsforfertilitytraits
likelypermittedtosequencethemostimportantvariantsinfluencingthesetraits
butthelownumberofanimalssequencedcombinedwiththecomplexityofthe
traitpermittedtofindonlyasuggestiveassociationbetweenSNPsandtraitsItrsquos
likely that amuch larger number of individuals is to be sequenced if genome
wideassociationisthestrategyoneswanttopursue
We explored an alternative strategy to confirm suggestivedeleterious variants
based on indirect evidence of the variant effect by assessing i) evolutionary
constraintsassociatedtotheaminoacidresiduecodedbythevariantii)variant
behaviour in a large population In the absence of direct genotyping of the
candidatevariantsweusedhaplotypeinformationassurrogateiii)functionand
effectsinotherspecies
151
We attempted to quantify these criteria through a score subjective and only
indicativebutpermitted tohighlighted35genes so farneglected in cattlebut
deservingfurtherattentionandpotentiallyhavingdeleteriouseffectsinHolstein
andothercattlebreedsThesemaybepartoftherecessivedeleterioussetthat
constitutethegeneticloadofItalianHolstein
References)AntonickaHLearySCGuercinGHAgarJNHorvathRKennawayNG
HardingCOJakschMShoubridgeEA2003MutationsinCOX10
resultinadefectinmitochondrialhemeAbiosynthesisandaccountfor
multipleearlyonsetclinicalphenotypesassociatedwithisolatedCOX
deficiencyHumMolGenet122693ndash2702doi101093hmgddg284
AshwellMSHeyenDWSonstegardTSVanTassellCPDaYVanRaden
PMRonMWellerJILewinHA2004DetectionofQuantitativeTrait
LociAffectingMilkProductionHealthandReproductiveTraitsin
HolsteinCattleJournalofDairyScience87468ndash475
doi103168jdsS00220302(04)731860
AstonKICarrellDT2009GenomeWideStudyofSingleNucleotide
PolymorphismsAssociatedWithAzoospermiaandSevere
OligozoospermiaJournalofAndrology30711ndash725
doi102164jandrol109007971
AulchenkoYSRipkeSIsaacsADuijnCMvan2007GenABELanRlibrary
forgenomewideassociationanalysisBioinformatics231294ndash1296
doi101093bioinformaticsbtm108
BamshadMJNgSBBighamAWTaborHKEmondMJNickersonDA
ShendureJ2011ExomesequencingasatoolforMendeliandisease
genediscoveryNatRevGenet12745ndash755doi101038nrg3031
BieseckerLGShiannaKVMullikinJC2011Exomesequencingtheexpert
viewGenomeBiology12128doi101186gb2011129128
152
BiffaniSMarusiMBiscariniFCanavesiF2005Developingagenetic
evaluationforfertilityusingangularityandmilkyieldascorrelatedtraits
InterbullBulletin063
BiffaniSSamoreacuteABCanavesiF2002Inbreedingdepressionforproduction
reproductionandfunctionaltraitsinItalianHolsteincattleInstitut
NationaldelaRechercheAgronomique(INRA)pp0ndash4
BolgerAMLohseMUsadelB2014TrimmomaticAflexibletrimmerfor
IlluminaSequenceDataBioinformaticsbtu170
doi101093bioinformaticsbtu170
BrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingand
MissingDataInferenceforWholeGenomeAssociationStudiesByUseof
LocalizedHaplotypeClusteringAmJHumGenet811084ndash1097
CharlesworthDWillisJH2009ThegeneticsofinbreedingdepressionNat
RevGenet10783ndash796doi101038nrg2664
ChunSFayJC2009Identificationofdeleteriousmutationswithinthree
humangenomesGenomeRes191553ndash1561
doi101101gr092619109
ConsortiumTEP2012AnintegratedencyclopediaofDNAelementsinthe
humangenomeNature48957ndash74doi101038nature11247
DePristoMABanksEPoplinRGarimellaKVMaguireJRHartlC
PhilippakisAAdelAngelGRivasMAHannaMMcKennaA
FennellTJKernytskyAMSivachenkoAYCibulskisKGabrielSB
AltshulerDDalyMJ2011Aframeworkforvariationdiscoveryand
genotypingusingnextgenerationDNAsequencingdataNatGenet43
491ndash498doi101038ng806
GoymerP2007SynonymousmutationsbreaktheirsilenceNatRevGenet8
92ndash92doi101038nrg2056
GumiennyTLBrugneraEToselloTrampontACKinchenJMHaneyLB
NishiwakiKWalkSFNemergutMEMacaraIGFrancisRSchedl
TQinYVanAelstLHengartnerMORavichandranKS2001CED
12ELMOaNovelMemberoftheCrkIIDock180RacPathwayIs
RequiredforPhagocytosisandCellMigrationCell10727ndash41
doi101016S00928674(01)005207
153
HanataniSIzumiyaYTakashioSKimuraYArakiSRokutandaTTsujita
KYamamotoETanakaTYamamuroMKojimaSTayamaS
KaikitaKHokimotoSOgawaH2014Circulatingthrombospondin2
reflectsdiseaseseverityandpredictsoutcomeofheartfailurewith
reducedejectionfractionCircJ78903ndash910
HiroseYChibaKKarasugiTNakajimaMKawaguchiYMikamiY
FuruichiTMioFMiyakeAMiyamotoTOzakiKTakahashiA
MizutaHKuboTKimuraTTanakaTToyamaYIkegawaS2008
AfunctionalpolymorphisminTHBS2thataffectsalternativesplicingand
MMPbindingisassociatedwithlumbardischerniationAmJHum
Genet821122ndash1129doi101016jajhg200803013
httpsgenomeucsceduindexhtml[WWWDocument]nd
JamrozikJFatehiJKistemakerGJSchaefferLR2005EstimatesofGenetic
ParametersforCanadianHolsteinFemaleReproductionTraitsJournalof
DairyScience882199ndash2208doi103168jdsS00220302(05)728952
JanisEWiggintonDJC2005AnoteonexacttestsofHardyWeinberg
equilibriumAmericanjournalofhumangenetics76887ndash93
doi101086429864
JiYRebertNAJoslinJMHigginsMJSchultzRANichollsRD2000
StructureoftheHighlyConservedHERC2GeneandofMultiplePartially
DuplicatedParalogsinHumanGenomeRes10319ndash329
KentWJSugnetCWFureyTSRoskinKMPringleTHZahlerAM
HausslerandD2002TheHumanGenomeBrowseratUCSCGenome
Res12996ndash1006doi101101gr229102
KumarPHenikoffSNgPC2009Predictingtheeffectsofcodingnon
synonymousvariantsonproteinfunctionusingtheSIFTalgorithmNat
Protoc41073ndash1081doi101038nprot200986
KyriakidesTRZhuYHSmithLTBainSDYangZLinMTDanielson
KGIozzoRVLaMarcaMMcKinneyCEGinnsEIBornsteinP
1998Micethatlackthrombospondin2displayconnectivetissue
abnormalitiesthatareassociatedwithdisorderedcollagenfibrillogenesis
anincreasedvasculardensityandableedingdiathesisJCellBiol140
419ndash430
154
LaneEACroweMABeltmanMEMoreSJ2013Theinfluenceofcowand
managementfactorsonreproductiveperformanceofIrishseasonal
calvingdairycowsAnimReprodSci14134ndash41
doi101016janireprosci201306019
LefeberDJdeBrouwerAPMMoravaERiemersmaMSchuurs
HoeijmakersJHMAbsmannerBVerrijpKvandenAkkerWMR
HuijbenKSteenbergenGvanReeuwijkJJozwiakAZuckerN
LorberALammensMKnopfCvanBokhovenHGruumlnewaldSLehle
LKapustaLMandelHWeversRA2011AutosomalRecessive
DilatedCardiomyopathyduetoDOLKMutationsResultsfromAbnormal
DystroglycanOMannosylationPLoSGenet7e1002427
doi101371journalpgen1002427
LiHDurbinR2009FastandaccurateshortreadalignmentwithBurrows
WheelertransformBioinformatics251754ndash1760
doi101093bioinformaticsbtp324
LiHHandsakerBWysokerAFennellTRuanJHomerNMarthG
AbecasisGDurbinR1000GenomeProjectDataProcessingSubgroup
2009TheSequenceAlignmentMapformatandSAMtoolsBioinformatics
252078ndash2079doi101093bioinformaticsbtp352
LiMXKwanJSHBaoSYYangWHoSLSongYQShamPC2013
PredictingMendelianDiseaseCausingNonSynonymousSingle
NucleotideVariantsinExomeSequencingStudiesPLoSGenet9
e1003143doi101371journalpgen1003143
MacArthurDGBalasubramanianSFrankishAHuangNMorrisJWalter
KJostinsLHabeggerLPickrellJKMontgomerySBAlbersCA
ZhangZConradDFLunterGZhengHAyubQDePristoMA
BanksEHuMHandsakerRERosenfeldJFromerMJinMMuXJ
KhuranaEYeKKayMSaundersGISunerMMHuntTBarnes
IHAAmidCCarvalhoSilvaDRBignellAHSnowCYngvadottirB
BumpsteadSCooperDNXueYRomeroIGWangJLiYGibbs
RAMcCarrollSADermitzakisETPritchardJKBarrettJCHarrow
JHurlesMEGersteinMBTylerSmithC2012Asystematicsurvey
155
oflossoffunctionvariantsinhumanproteincodinggenesScience335
823ndash828doi101126science1215040
MajewskiJSchwartzentruberJLalondeEMontpetitAJabadoN2011
WhatcanexomesequencingdoforyouJMedGenet48580ndash589
doi101136jmedgenet2011100223
McClureMCBickhartDNullDVanRadenPXuLWiggansGLiuG
SchroederSGlasscockJArmstrongJColeJBVanTassellCP
SonstegardTS2014BovineExomeSequenceAnalysisandTargeted
SNPGenotypingofRecessiveFertilityDefectsBH1HH2andHH3Reveal
aPutativeCausativeMutationinSMC2forHH3PLoSONE9e92769
doi101371journalpone0092769
McClureMKimEBickhartDNullDCooperTColeJWiggansGAjmone
MarsanPColliLSantusELiuGESchroederSMatukumalliLVan
TassellCSonstegardT2013FineMappingforWeaverSyndromein
BrownSwissCattleandtheIdentificationof41ConcordantMutations
acrossNRCAMPNPLA8andCTTNBP2PLoSONE8e59251
doi101371journalpone0059251
McKennaAHannaMBanksESivachenkoACibulskisKKernytskyA
GarimellaKAltshulerDGabrielSDalyMDePristoMA2010The
GenomeAnalysisToolkitAMapReduceframeworkforanalyzingnext
generationDNAsequencingdataGenomeRes201297ndash1303
doi101101gr107524110
MiHDongQMuruganujanAGaudetPLewisSThomasPD2010
PANTHERversion7improvedphylogenetictreesorthologsand
collaborationwiththeGeneOntologyConsortiumNuclAcidsRes38
D204ndashD210doi101093nargkp1019
MrodeRKearneyJFBiffaniSCoffeyMCanavesiF2009Short
communicationGeneticrelationshipsbetweentheHolsteincow
populationsofthreeEuropeandairycountriesJournalofDairyScience
925760ndash5764doi103168jds20081931
PelosoGMAuerPLBisJCVoormanAMorrisonACStitzielNOBrody
JAKhetarpalSACrosbyJRFornageMIsaacsAJakobsdottirJ
FeitosaMFDaviesGHuffmanJEManichaikulADavisBLohman
156
KJoonAYSmithAVGroveMLZanoniPRedonVDemissieS
LawsonKPetersUCarlsonCJacksonRDRyckmanKKMackey
RHRobinsonJGSiscovickDSSchreinerPJMychaleckyjJC
PankowJSHofmanAUitterlindenAGHarrisTBTaylorKD
StaffordJMReynoldsLMMarioniREDehghanAFrancoOHPatel
APLuYHindyGGottesmanOBottingerEPMelanderOOrho
MelanderMLoosRJFDugaSMerliniPAFarrallMGoelA
AsseltaRGirelliDMartinelliNShahSHKrausWELiMRader
DJReillyMPMcPhersonRWatkinsHArdissinoDZhangQWang
JTsaiMYTaylorHACorreaAGriswoldMELangeLAStarrJM
RudanIEiriksdottirGLaunerLJOrdovasJMLevyDChenYDI
ReinerAPHaywardCPolasekODearyIJBoreckiIBLiuY
GudnasonVWilsonJGvanDuijnCMKooperbergCRichSSPsaty
BMRotterJIOrsquoDonnellCJRiceKBoerwinkleEKathiresanS
CupplesLA2014AssociationofLowFrequencyandRareCoding
SequenceVariantswithBloodLipidsandCoronaryHeartDiseasein
56000WhitesandBlacksTheAmericanJournalofHumanGenetics94
223ndash232doi101016jajhg201401009
PryceJEVeerkampRFThompsonRHillWGSimmG1997Genetic
aspectsofcommonhealthdisordersandmeasuresoffertilityinHolstein
FriesiandairycattleAnimalScience65353ndash360
doi101017S1357729800008559
PryceJEWoolastonRBerryDPWallEWintersMButlerRShafferM
2014WorldTrendsinDairyCowFertilityin10ThWorldCongressof
GeneticsAppliedtoLivestockProduction
PuffenbergerEGJinksRNWangHXinBFiorentiniCShermanEA
DegrazioDShawCSougnezCCibulskisKGabrielSKelleyRI
MortonDHStraussKA2012Ahomozygousmissensemutationin
HERC2associatedwithglobaldevelopmentaldelayandautismspectrum
disorderHumMutat331639ndash1646doi101002humu22237
PurcellSNealeBToddBrownKThomasLFerreiraMARBenderD
MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKA
157
ToolSetforWholeGenomeAssociationandPopulationBasedLinkage
AnalysesAmJHumGenet81559ndash575
SahanaGNielsenUSAamandGPLundMSGuldbrandtsenB2013Novel
HarmfulRecessiveHaplotypesIdentifiedforFertilityTraitsinNordic
HolsteinCattlePLoSONE8e82909doi101371journalpone0082909
SchuumltzEScharfensteinMBrenigB2008Implicationofcomplexvertebral
malformationandbovineleukocyteadhesiondeficiencyDNAbased
testingondiseasefrequencyintheHolsteinpopulationJDairySci91
4854ndash4859doi103168jds20081154
SimmonsMJCrowJF1977MutationsAffectingFitnessinDrosophila
PopulationsAnnualReviewofGenetics1149ndash78
doi101146annurevge11120177000405
SoggiuAPirasCHusseinHACanioMDGaviraghiAGalliAUrbaniA
BonizziLRoncadaP2013UnravellingthebullfertilityproteomeMol
BioSyst91188ndash1195doi101039C3MB25494A
StitzielNOKiezunASunyaevS2011Computationalandstatistical
approachestoanalyzingvariantsidentifiedbyexomesequencing
GenomeBiology12227doi101186gb2011129227
SundaramSKHuqAMSunZYuWBennettLWilsonBJBehenME
ChuganiHT2011ExomesequencingofapedigreewithTourette
syndromeorchronicticdisorderAnnNeurol69901ndash904
doi101002ana22398
VanRadenPMOlsonKMNullDJHutchisonJL2011Harmfulrecessive
effectsonfertilitydetectedbyabsenceofhomozygoushaplotypesJournal
ofDairyScience946153ndash6161doi103168jds20114624
VilarintildeoGuumlellCRajputAMilnerwoodAJShahBSzuTuCTrinhJYuI
EncarnacionMMunsieLNTapiaLGustavssonEKChouP
TatarnikovIEvansDMPishottaFTVoltaMBeccanoKellyD
ThompsonCLinMKShermanHEHanHJGuentherBL
WassermanWWBernardVRossCJAppelCresswellSStoesslAJ
RobinsonCADicksonDWRossOAWszolekZKAaslyJOWuR
MHentatiFGibsonRAMcPhersonPSGirardMRajputMRajput
158
AHFarrerMJ2014DNAJC13mutationsinParkinsondiseaseHumMolGenet231794ndash1801doi101093hmgddt570
WangKLiMHakonarsonH2010ANNOVARfunctionalannotationofgeneticvariantsfromhighthroughputsequencingdataNuclAcidsRes38e164ndashe164doi101093nargkq603
WathesDCPollottGEJohnsonKFRichardsonHCookeJS2014HeiferfertilityandcarryoverconsequencesforlifetimeproductionindairyandbeefcattleAnimal8Suppl191ndash104doi101017S1751731114000755
YangYMuznyDMReidJGBainbridgeMNWillisAWardPABraxtonABeutenJXiaFNiuZHardisonMPersonRBekheirniaMRLeducMSKirbyAPhamPScullJWangMDingYPlonSELupskiJRBeaudetALGibbsRAEngCM2013ClinicalWhole
ExomeSequencingfortheDiagnosisofMendelianDisordersNewEnglandJournalofMedicine3691502ndash1511doi101056NEJMoa1306555
ZiminAVDelcherALFloreaLKelleyDRSchatzMCPuiuDHanrahanFPerteaGTassellCPVSonstegardTSMarccedilaisGRobertsMSubramanianPYorkeJASalzbergSL2009Awholegenome
assemblyofthedomesticcowBostaurusGenomeBiology10R42doi101186gb2009104r42
159
Supplementary)files)
YoucanfindChapter6supplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter06)orscantheQRcode
)
Supplementary)Figure)S1)Coverage)analyses)
A)Averagecoverageperanimals
B)Boxplotofchromosomescoverageperanimal
C)BoxplotofanimalcoverageperchromosomeBluelinerepresenttheexpected
50Xcoverage
Supplementary)Figure)S2)Distribution)of)variant)sequencing)depth)
)
Supplementary) Figure) S3) Correlation) between) sequencing) depth) and)
number) of) discordant) genotypes) In) red) values) for) two) outlier) animals)
having)concordance)lt846))
)
Supplementary)Figure)S4)Multidimensional)scaling)of)Italian)Holstein)bulls)
sequenced))
)
Supplementary)Figure)S5)Multidimensional)scaling)of)Italian)Holstein)bulls)
analysed)with)the)800K)SNPchip))
160
Supplementary) Figure) S6) MAF) distribution) on) neutral) (blue)) and)
deleterious)(red))variants))
)
Supplementary) Table) S1) EBV) distribution) on) Italian) Holstein) population)
and)SNPchip)dataset)
Sheet1EBVthresholdvaluetodefineplusandminustailforPROT(kgprotein)
SCC(somaticcellcount)FERT(fertility)andLONG(functional longevity) All
ItalianHolsteinFAbullswereincludeinthisanalysis
Sheet2Numberofdatasetanimalsineachtailforeachtrait
Supplementary)Table)S2)Exome)probes)statistics)number)and)bp)covered)
per)chromosomes)
Supplementary) Table) S3) Concordance) between) animal) genotypes) in)
sequencing)and)SNPchip)data)
Error1 HD data homozygotes ndash Exome data heterozygotes Error2 HD data
heterozygotesndashExomedatahomozygotes
Supplementary) Table) S4) List) of) suggestive) genes) deriving) from) the)
analysis)of)exomes)of)animals) in)the)tails)of)male)and)female)fertility)EBV)
distributions))
Header(code
bull Bfreq_totFrequencyofBalleleinentiredataset
bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset
bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset
bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset
bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset
bull CallRate_totSNPcallratendashentiredataset
bull CallRate_MHSNPcallratendashhighmalefertilitydataset
bull CallRate_MLSNPcallratendashlowmalefertilitydataset
bull CallRate_FHSNPcallratendashhighfemalefertilitydataset
bull CallRate_FLSNPcallratendashhighmalefertilitydataset
bull HWeqHardyWeinbergequilibriumpvalue
bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele
bull Geno_HetNumberofanimalswithheterozygotegenotype
161
bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele
bull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)dataset
bull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)dataset
bull LOW_CRSNPcallratefromlowfertility(maleandfemale)dataset
bull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallele
bull LOW_Het Number of animals from low fertility (male and female) dataset with
heterozygotegenotype
bull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallele
bull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and
female)dataset
bull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)dataset
bull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)dataset
bull HIGH_CRSNPcallratefromhighfertility(maleandfemale)dataset
bull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallele
bull HIGH_Het Number of animals from high fertility (male and female) dataset with
heterozygotegenotype
bull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallele
bull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and
female)dataset
bull Male_TREND inheritance models implemented in PLINK trend test on male fertility
dataset
bull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale
fertilitydataset
bull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male
fertilitydataset
bull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility
dataset
bull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on
femalefertilitydataset
bull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale
fertilitydataset
bull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male
andfemalefertilitydataset
bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test
onmaleandfemalefertilitydataset
162
bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston
maleandfemalefertilitydataset
bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset
bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset
bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset
Supplementary)Table) S5) Regions) candidate) to) carry) deleterious) variants)
identified)by)OHW)approach))
Supplementary)Table) S6) Regions) candidate) to) carry) deleterious) variants)
identify)by)LOH)approach))
Ingreyregionsremovedinfurtheranalysis
Supplementary)Table)S7)List)of)candidate)deleterious)variants)mapping)in)
OHW)and)LOH)regions))Header(code
bull Bfreq_totFrequencyofBalleleinentiredataset
bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset
bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset
bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset
bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset
bull CallRate_totSNPcallratendashentiredataset
bull CallRate_MHSNPcallratendashhighmalefertilitydataset
bull CallRate_MLSNPcallratendashlowmalefertilitydataset
bull CallRate_FHSNPcallratendashhighfemalefertilitydataset
bull CallRate_FLSNPcallratendashhighmalefertilitydataset
bull HWeqHardyWeinbergequilibriumpvalue
bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele
bull Geno_HetNumberofanimalswithheterozygotegenotype
bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele
bull HW_insideDeleteriousmutationsinsideldquoOutofHardyWeinbergrdquoregion
bull HW_outsideDeleteriousmutationsoutsideldquoOutofHardyWeinbergrdquoregion
bull DH_insideDeleteriousmutationsinsideldquoLackofHomozygosityrdquoregion
bull DH_outsideDeleteriousmutationsoutsideldquoLackofHomozygosityrdquoregion
163
bull GeneClass Classifications of gene 1 deleterious mutations inside OHW and LOHregion2deleteriousmutationsinsideOHWorLOHregion
bull 2deleteriousmutationsoutsideOHWandorLOHregionbull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)datasetbull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)datasetbull LOW_CRSNPcallratefromlowfertility(maleandfemale)datasetbull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallelebull LOW_Het Number of animals from low fertility (male and female) dataset with
heterozygotegenotypebull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallelebull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and
female)datasetbull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)datasetbull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)datasetbull HIGH_CRSNPcallratefromhighfertility(maleandfemale)datasetbull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallelebull HIGH_Het Number of animals from high fertility (male and female) dataset with
heterozygotegenotypebull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallelebull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and
female)datasetbull Male_TREND inheritance models implemented in PLINK trend test on male fertility
datasetbull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale
fertilitydatasetbull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male
fertilitydatasetbull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility
datasetbull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on
femalefertilitydatasetbull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale
fertilitydatasetbull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male
andfemalefertilitydataset
164
bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test
onmaleandfemalefertilitydataset
bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston
maleandfemalefertilitydataset
bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset
bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset
bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset
Supplementary)Table)S8)Results)of)haplotype)analysis)))Header(code
bull Chromchromosome
bull PosphysicalpositiononUMD31genome
bull Exome_homonumberofhomozygotefordeleteriousmutationsequencedanimals
bull Exome_hetnumberofheterozygotesequencedanimalsfordeleteriousmutation
bull Outcome NotFound It was impossible to associated a haplotype to the mutation
NotOpt Non completely invariant haplotype could be associated to themutationOK
Putativehaplotype associate tomutationwas found In few case alternativehaplotype
wasreported
bull LenlengthofhaplotypeinnumberofSNPs
bull Pop_numnumbersofselectedhaplotypedetectedinHolsteinpopulation
bull Pop_homo numbers of homozygote animals in Holstein population for selected
haplotype
bull Pop_het numbers of heterozygotes animals in Holstein population for selected
haplotype
bull ExpExpectednumberofhomozygoteanimal
bull Alternative if only one heterozygote sequenced animals for deleteriousmutationwas
foundalltwopossiblehaplotypeswheretested
Supplementary)Table)S9)Descriptive)results)for)the)Biological)Process)(BP)))
NumberofgenesbelongingtoBiologicalProcessGOterms)
Supplementary) Table) S10) Descriptive) results) for) the)Molecular) Function)
(MF))
NumberofgenesbelongingtoMolecularFunctionGOterms
165
CHAPTER7
Conclusion
Since the domestication event occurred in the Fertile Crescent around 10000yearsagothebovinespecieswasselectedbyhumankindtofulfilhisneedsThecreationofbreedsoccurredaround200yearsagointensifiedthedifferentiationbetween populations and promoted the development of intensive selectionschemesTraditionalAandnowadaysgenomicAselectionsareproducingpositivegenetictrendsinmanyproductivetraitsdespitethestillpoorknowledgeaboutphenotype biology A deeper understanding of genome organization and genebiology would increase the accuracy of genomic evaluation by incorporatingpriorknowledgeThisthesisexploresthebovinegenomebyhighAthroughputtechnologiesAsuchasNextGenerationSequencingandHighADensityGenotypingAwithestablishedand innovative procedures to investigate the biology of complex traits and toprovidenewdataandmoleculartoolstobovinebreedingThese goals have been achieved by complementary approaches and differentmethodsInChapter2selectionsignatureswereinvestigatedindependentlyintofive Italian breeds using Illumina BovineSNP50 BeadChip Then amultiAbreedapproach was applied clustering breeds into dairy (Italian Holstein ItalianBrownand Italian Simmental) andbeef (Marchigiana Piedmontese and ItalianSimmental) groups Regions under recent positive selection shared by breedswithasameproductiveaptitudewereinvestigatedinfurtherdetailbyretrievinggenes in the region and analysing their function by ontology and pathwayanalysis In the dairy group selection signals appear nearby genes related tomammarygland(metabolismandresistancetomastitis)andfeedingadaptationIn thebeefgroupgenes inselectedregionsare involved inanimalgrowthandmeatquality(textureandjuiciness)Hencecandidategenesidentifiedbelongtonetworks and pathways important in directing a breed towards either beef ordairyproductionIn Chapter 3 a GWAS approach was used in dairy cattle to correlate SNP tocomplex phenotypes having an economic value Three independent ldquoclassicalrdquosingleAmarkerregressionswereruninthreeItaliandairycattle(ItalianHolsteinItalianBrownandItalianSimmental)UsingthetraditionalsingleSNPapproach
168
only few SNPs were above the nominal genomeAwide significance thresholdwhile applying a geneAbased method identified significant candidate genesassociatedwithmilkphysiology andmammary glanddevelopment in all threepopulationsInterestinglygenesfoundindifferentbreedshavesimilarfunctionbut are not the same suggesting that breed history demography selectioncriteriaandstochasticeventsasgeneticdrifthaveanimpactonthedefinitionofthemainmoleculartargetofselectionschemesCurrent intensive selection strategies rapidly spread the genetic gain in thepopulationbutholdtheassociatedriskofdisseminatingdefectivegenescausinggenetic defects (eg mulefoot complex vertebral malformation and others) ordecreasing fertility In Chapter 6 different technologies and approaches werecombined to obtain filter and prioritize genes candidate to be deleteriousExome sequence datawere exploited to identify deleterious variants affectingItalianHolsteincattleinasetof20animalsextremeforfertilitytraitsMorethan5000 candidate deleterious SNP variants were identified To bypass the smallnumberlimitationandthelackofpowerofanystatisticanalysison20animalsastepwise approachwas used to first filter and then prioritize these candidatedeleterious variants Polymorphisms were retained if having asymmetricdistribution in bulls from opposite tails of fertility EBV distributions andormapping in genomic regions outlier for lack of homozygosity in a 1009 bullpopulation genotyped with the Illumina BovineHD BeadChip A total of 191variants in163genespassed theprocessesThesewereassessedby searchingadditional evidences in favour or against their deleterious effect throughdifferentapproachesi)checkingthevariantconservationacross100vertebratespecies by comparative genomics ii) finding haplotypes associated withcandidatemutationsandevaluatingtheirfrequencyinHolsteinsiii)integratinghumanandmouse functionalannotationsA scorewasproposed toweight theresult of these assessments and rank the variants 35 of them had positivescores These occurred in genes involved in basic biological mechanisms asfertilitydevelopmentand immunityandresultedenriched ingenesassociatedtogeneticdefectsreportedinhumanThesemutationsarestrongcandidatestobe recessive deleterious variants worth to be further investigated in a larger
169
populationtobetterunderstandtheirbiologyimpactandpossibleapplicationinbreedingTo increase the reliability of results based on genomic data and to reduce theimpact of subjective choices in data analysis new toolsapproaches weredeveloped(iethemultiAbreedapproachinSelectionSignaturendashChapter2thepostAGWAS geneAbased approach ndash Chapters 3 and 4) and the existent onescriticallyevaluated(imputationmethodsndashChapter5)In Chapter 4 the postAGWAS geneAbased method used in Chapter 3 wasimplemented in the MUGBAS (MUlti species GeneABased Association Suite)software In contrast to existing software MUGBAS is species and annotationfree AmultipleAtesting correctionwas included in the package in addition tocomputing parallelization for faster processing and plot representation forbetterunderstandingThesoftwareusessingleAmarkerGWASdataandagivenannotation to estimate geneAregionAwise association pAvalues As previouslymentioned in Chapter 3 the ldquoclassicalrdquo GWAS approach (single SNP) foundsignals only in ItalianHolsteinwhileMUGBAS geneAbased approach identifiedsignificantsignalsinallbreedsinvestigatedIn Chapter 5 the impact of different bovine reference sequences on genotypeimputation was investigated In this work Illumina BovineSNP50 BeadChipgenotypesof ItalianSimmentalbullswerereAmappedonthethreemostrecentreferencegenomeassembliesWecomparedfourdifferent imputationmethodsfromlow(IlluminaBovineLDv10Beadchip)tohighdensityThetestpopulationwasdividedintofoursubpopulationsonthebasisofrelationshipandgenotypedataavailabilityUpdatingSNPcoordinatesonthe three testedcattlereferencegenome assemblies determined only a slight variation on imputation resultswithinmethodsOntheotherhandlargedifferencesintermsofaccuracywereobservedamongthefourimputationmethodstestedGenetic structure of industrial breeds was evaluated assessing LinkageDisequilibrium (LD) Since LD decay is correlated to intensity of selection weexpected different behaviours in the breeds analysed Indeed as evident inChapter 2 ndash Figure 7 breeds subjected to lower selection intensity (ie
170
MarchigianaPiedmonteseandItalianSimmental)displayalowerpersistenceof
LDatgreaterdistancescomparedtothosesubjectedtohigherselectionpressure
(ieItalianHolsteinandItalianBrown)Chapter3investigatesLDintheDGAT1
region of dairy cattle breeds In Italian Simmental LD is lower compared to
Italian Holstein and Italian Brown In spite of this significant association
betweentheDGAT1regionandmilktraitswasfoundbothinItalianSimmental
andinItalianHolsteinInItalianBrownnoassociationwasfoundbecauseDGAT1
isfixed
Data and results presented in this thesis represent a clear example of the
progress of molecular technologies occurred in last few years spanning from
mediumdensitygenotypingtonextgenerationsequencing
Thebiologylandscapesofsomecomplextraits(iemilkproductionandquality
fertility)were improved finding candidate genes andpathwaysnotpreviously
associated to the phenotype in the bovine species Moreover insights on
deleterious mutations were inferred through a deep and integrated analysis
usingNGSdata
This new knowledge could be used to improve cattle breeding For example
deleteriousrecessivevariantscouldbeincludedinbullevaluationsbybreedersrsquo
associations to reduce the genetic load of deleterious genes in parallel to
improvingproductivity
While technological progress has been impressivewe are still only scratching
thesurfaceofanimalbiologyNewfrontiersarebeingopenedbywholegenome
sequencing of many animals (eg in the 1000 bull genome project) and by
stepping beyond sequencing (a number of epigenomic projects in animals are
following the footprints of the human ENCODE project) The trend is towards
unravelling complexity and data production ability is to be flanked by data
storingandprocessingandaboveallbynewmethodsandtoolsfordataanalysis
171
ACKNOWLEDGMENTS
ldquoNon egrave la destinazione ma il viaggio che contardquo Questi tre anni sono stati un gran bel viaggio di crescita personale e ricchi diesperienze Non egrave stato un viaggio solitarioma condiviso con vecchie e nuoveconoscenzechemisembradoverosomenzionareInnanzitutto vorrei ringraziare il prof Paolo AjmoneMarsan chemi ha dato lapossibilitagrave di fare questa esperienza Grazie per gli insegnamenti e per leopportunitagravechemihaoffertoSperodinonavertraditolesueaspettativeAgli attuali colleghi Lorenzo Stefano Licia Elisa Riccardo ed Elia e a quellipassati Nicola Fatima Raffaele Ezequiel e Barbara va il mio piugrave sincero edaffettuoso ringraziamento per aver condiviso anche solo in parte questomiopercorso Grazie per avermi sopportato (e non egrave poco) guidato (non solo nelcampolavorativo)efattovivereinunmodospecialelrsquoambientedilavoroUnsincerograzievaatuttiimieicolleghidelXXVIIcicloAgrisystemRicorderogravesempretuttiimomentichehopotutocondividereconvoi UngrazieancheachihafattopartedellamiaesperienzasvedeseGraziedicuoreaCarlFlavioChristianEricaatuttiiragazziitalianiatuttiicolleghildquosvedesirdquoea tutte le altre persone che ho conosciuto Mi avete fatto vivere appienolrsquoesperienzaesteraefattotornareacasaconqualcheconsapevolezzainpiugrave UnsentitoringraziamentoatuttiglialtricolleghiconcuihocollaboratoinquestianniRingrazio tutti gli amici e conoscenti che mi sono stati vicino per avermisupportatoesopportatoognunoamodoproprio Ultimomanonperimportanza ilringraziamentochevaatuttalamiafamigliaper avermi aiutato spronato sostenuto sempre e in particolare in questaesperienzaSemplicementegrazie PS Adirlatuttaancheladestinazionenonegravepoicosigravemale
173
- INDEX
- CHAPTER 1 - Introduction
- CHAPTER 2 - Relative extended haplotype homozygosity (rEHH) signals across breeds reveal dairy and beef specific signatures of selection
- CHAPTER 3 - Searching new signals for productive traits in three Italian cattle breeds
- CHAPTER 4 - MUGBAS a species free gene based ased association suite
- CHAPTER 5 - Imputation accuracy is robust to cattle reference genome updates
- CHAPTER 6 - Quest for deleterious recessive variants in Italian Holstein bulls combining exome sequencing and high density SNP analysis
- CHAPTER 7 - Conclusion
- ACKNOWLEDGMENTS
-
INDEXamp
Chapteramp1ndashIntroduction 3
Chapteramp2Relativeextendedhaplotypehomozygosity(rEHH)signalsacrossbreedsrevealdairyandbeefspecificsignaturesofselection
11
Chapteramp3SearchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisinthreeItaliancattlebreeds
49
Chapteramp4ampMUGBASaspeciesfreegenebasedassociationsuite
81
Chapteramp5Imputationaccuracyisrobusttocattlereferencegenomeupdates
89
Chapteramp6QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis
99
Chapteramp7ndashConclusion 167
amp
Acknowledgmentsampamp 173
CHAPTER1
Introduction
Background
The bovine species counts almost 15 billion of heads classified in some 800
breeds worldwide (httpfaostat3faoorg) The species is adapted to a wide
varietyofdifferentenvironmentandhusbandrysystemsandisoftenrearedfor
multiple purposes (Taberlet et al 2008) draft power milk meat leather
productionandotheruses(Elsiketal2009)
European cattle (Bos$ taurus) derive from a single domestication event of the
auroch (Bos$primigenius)occurred in theFertileCrescent around10000years
ago(Taberletetal2011)FollowingdomesticationcattlefollowedtheNeolithic
expansionofagricultureandcolonisedEurope(seeAjmoneMMarsanetal2010
for a review) Many thousand years of natural and artificial selection have
shaped cattle phenotype and genotype to meet environmental conditions and
human needs and together with demographic events and genetic drift have
started thedifferentiation of the bovine species in diverse and locally adapted
populations This differentiation process accelerated in the last two centuries
whenthebreedconceptwasinventedsothatnowthisspeciescountshundreds
of different breeds with high diversity in coat colour body size production
aptitudeadaptationtoclimate tolerancetodiseaseetc(Taberletetal2008
Elsiketal2009TheBovineHapMapConsortium2009)
Thisdiversityisnowbeingprogressivelylostwiththeextinctionofmanylocal
populationsmainlyforeconomicreasonsSelectioneffortshavethereforebeen
focussedonafewoutperformingindustrialbreedsthatarepresentlyselectedfor
specialiseddairyorbeefproductionandsometimesdoublepurpose
In these breeds breeding schemes are well developed and have been
traditionally based on the genetic evaluation of sires and dams using data
collectedfromperformanceandprogenytestingandunbiasedlinearprediction
modelsThesearefoundedontheFisherinfinitesimalmodellingofquantitative
trait variation and need no genomic information to estimate breeding values
(EBVs) The genetic progress recorded using this approach has been relevant
particularly for traits having medium to high heritability In Italian Holstein
genetic trends for themain production traits (milk protein and fat yield) are
positiveHoweveraccurateestimateofbreedingvalueshasahighcostinterms
4
of investment and time so that 5 to 6 years are needed to have a dairy bullprogenytested(wwwanafiit)Theadventofgenomicselection(Meuwissenetal2001)markedamilestoneindairycattlebreedingTheideawasdevelopedbeforethemoleculartoolswereinplaceforimplementitinpracticeandhasbecomearealityonlyrecentlyInthisapproach progeny tested bulls are genotyped at many thousand loci and thevalue of their EBVs used to estimate the value of each allele at each locusgenotyped These values are then used to predict the genetic value of younganimals on the basis of their genotype As a result animals from the newgeneration receive an EBV with the same accuracy of estimates obtained byprogenytestingbutmuchearlierThislowerthegenerationintervalandgreatlyincreasestherateofgeneticprogress indairypopulationsThe initial ideawasso good that nowadays genomic selection is widely applied in industrialcountriestothemajordairycattlebreedsWiththedecreaseofgenotypingcostit is likely that its use will be soon expanded to animals selected for otherpurposes (eg beef cattle) and other species (eg pigs) However genomicselection is an extension of traditional selection It is extremely useful inbreedingforcomplextraitsbutgivesnobiologicalinformationonthegenesthatcontrolthemIfthetraditionalselectionviewsasiregenomeasabigblackboxhaving a value but no clue why genomic selection views the same genomedisassembled inmany small boxes but still black and stillwithno clueon thereasonoftheirvalueIt is reasonable to think that a deeper understanding of the biologicalmechanisms controlling traits may further increase the accuracy of genomicevaluationsIndeedthenewtechnologiesoffernewopportunitiesinthissensebyloweringcostsandincreasingprecisionofdataproductionThis thesis is exploring the use of these technologies for retrieving biologicalinformationfromgenomicdataItusesmethodsnowconsideredldquotraditionalrdquoingenomics (GWAS and analysis of selection sweeps) and less traditional ones(gene centered GWAS and exome analysis) All data used here have beenproducedwithin twonationalprojectson farmanimalgenomics fundedby theItalianMinistryofAgriculture
5
M SELMOL ldquoRicerca e innovazione nelle attivitagrave di miglioramento genetico
animalemediante tecniche di geneticamolecolare per la competitivitagrave del
sistemazootecniconazionaleIrdquo(ResearchandInnovationinanimalbreeding
sheepgoatbuffalohorseanddonkeyI)
M INNOVAGENldquoRicercaeinnovazionenelleattivitagravedimiglioramentogenetico
animalemediante tecniche di geneticamolecolare per la competitivitagrave del
sistema zootecnico nazionale IIrdquo (Research and Innovation in animal
breedingsheepgoatbuffalohorseanddonkeyII)
In these projects the research group I work with M Istituto di Zootecnica at
Universitagrave Cattolica del Sacro Cuore di Piacenza M coordinated all the research
activitiesconcerningtheapplicationofgenomicstechniquesindairycattle
Contentsofthethesis
The thesis is structured into a short introduction (this first chapter) 5 core
chapters(Chapter2to6)andashortConclusion(Chapter7)Withtheexception
of Chapter 6 still in preparation the other core chapters containmanuscripts
that at time of writing are under review for their publication in international
peerMreviewedjournalsormanuscript
InChapter2selectionsignatureindairyandbeefcattleareinvestigatedwiththe
Illumina BovineSNP50 Genotyping BeadChip Five Italian cattle breeds (Italian
Holstein Italian Brown Italian Simmental Marchigiana and Piedmontese) are
analyzedwiththerEHHmethod(Sabetietal2002)tofindsignaturesofrecent
selectionCandidategenesinthevicinityofsignaturessharedbyeitherdairyor
beefbreeds are identified and theirpossible role indairy andbeefproduction
discussed
In Chapter 3 a GWAS analysis is run to investigate production traits in three
Italian dairy cattle breeds (Italian Holstein Italian Brown Italian Simmental)
Data are analysed with a well established method (Amin et al 2007)
6
implemented in theGenABELRpackage (Aulchenkoet al 2007) andwith the
newgeneMbasedmethodAssociationisinvestigatedbetween50KSNPsandmilk
production(milkyield)andqualitytraits(percentageofmilkproteinandfat)
InChapter4thegeneMbasedmethodusedinthepreviouschapter3inspiredby
theVEGASapproach(Liuetal2010)isdescribedThesoftwarewasdeveloped
andreMcodedinourlaboratoryinordertopermittheanalysisofanyspeciesin
additiontohumanMultiMtestingcorrectionandplotrepresentationcompletethe
toolsmadeavailable
InChapter5wetesttheimpactoftheuseofdifferentreferencesequencesinthe
imputation step Italian Simmental data are used to test variation in genomeM
wide inputation accuracy using different cattle reference sequences (BTAU42
UMD31 and BTAU46) To increase the analyses reliability four different
inputationsoftwaresaretested
InChapter6Holsteindataareanalysedcombiningexomesequencesand800K
SNPs data to seek deleterious recessive variantsWhile in natural populations
the destiny of these variants is to disappear in livestock breeds they can be
maintained and disseminated if carried by a highly productive sire The
intersection of deleterious mutations identified by sequencing and lack of
homozygousregionsidentifiedbytheanalysisofthepopulationwiththeHDSNP
chipidentifiedgenescandidatetocarryrecessivedeleteriousvariants
Objective
Themainobjectiveofthethesiswastoinvestigatethebovinegenomewiththe
aid new highMthroughput technology and to explore new strategies for linking
biology (genes and their function) to traits This linkage once validated may
increase the accuracy of genomic prediction and find application in selection
schemesegineradicatingthemostdeleteriousvariantsandorintheplanning
ofmatings
7
ReferencesAjmoneMMarsanPGarciaJFLenstraJA2010OntheoriginofcattleHow
aurochsbecamecattleandcolonizedtheworldEvolAnthropol19148ndash157doi101002evan20267
AminNvanDuijnCMAulchenkoYS2007AGenomicBackgroundBasedMethodforAssociationAnalysisinRelatedIndividualsPLoSONE2e1274doi101371journalpone0001274
AulchenkoYSKoningDMJdeHaleyC2007GenomewideRapidAssociationUsingMixedModelandRegressionAFastandSimpleMethodForGenomewidePedigreeMBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614
ElsikCGTellamRLWorleyKC2009TheGenomeSequenceofTaurineCattleAWindowtoRuminantBiologyandEvolutionScience324522ndash528doi101126science1169588
LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKMHaywardNKMontgomeryGWVisscherPMMartinNGMacgregorS2010AVersatileGeneMBasedTestforGenomeMwideAssociationStudiesTheAmericanJournalofHumanGenetics87139ndash145doi101016jajhg201006009
MeuwissenTHHayesBJGoddardME2001PredictionoftotalgeneticvalueusinggenomeMwidedensemarkermapsGenetics1571819ndash1829
SabetiPCReichDEHigginsJMLevineHZPRichterDJSchaffnerSFGabrielSBPlatkoJVPattersonNJMcDonaldGJAckermanHCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash837doi101038nature01140
TaberletPCoissacEPansuJPompanonF2011ConservationgeneticsofcattlesheepandgoatsCRBiol334247ndash254doi101016jcrvi201012007
8
TaberletPValentiniARezaeiHRNaderiSPompanonFNegriniR
AjmoneMMarsanP2008Arecattlesheepandgoatsendangered
speciesMolEcol17275ndash284doi101111j1365M294X200703475x
TheBovineHapMapConsortium2009GenomeMWideSurveyofSNPVariation
UncoverstheGeneticStructureofCattleBreedsScience324528ndash532
doi101126science1167936
9
CHAPTER2
Relative(extendedamphaplotype(homozygosity)(rEHH)ampsignalsampacrossampbreedsamprevealampdairyampand$beef$specificampsignaturesampofampselection
LBomba1ELNicolazzi2MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly2FondazioneParcoTecnologicoPadanoViaEinsteinLocCascinaCodazza26900LodiItalyTheseauthorscontributedequallytothiswork
____________________SubmittedtoGeneticSelectionEvolution
Abstract
Background
Nowadays many methods have been implemented to scan for signatures of
selection at genomescale level within and across breeds Extended haplotype
homozygosity is a proficient approach to detect genome regions under recent
selective pressure within breeds The objective of this study is to identify
common regions under positive recent selection shared by most represented
dairyandbeefItaliancattlebreeds
Results
A total of 3220 animals from Italian Holstein (2179) Italian Brown (775)
Simmental (493) Marchigiana (485) and Piedmontese (379) breeds were
genotyped with the Illumina BovineSNP50 BeadChip v1 in the frame of the
ItalianlivestockgenomicprojectsldquoSelMolrdquoandldquoProzoordquoAfterstandardcleaning
proceduresgenotypeswerephasedandcorehaplotypesidentifiedThedecayof
Linkage Disequilibrium (LD) for each core haplotype was assessedmeasuring
the extended haplotype homozygosity (EHH) To correct for the lack of good
estimatesof local recombinationrates therelativeEHH(rEHH)wascalculated
for each corehaplotypeGenomic regionsunder recentpositive selectionwere
identifiedasthosehighlyfrequentcorehaplotypeswithsignificantrEHHvalues
Significant independentcorehaplotypeswerealignedacrossbreeds to identify
signalsofrecentselectionsharedbydairyandsharedbybeefbreedsOverall82
and 87 common regions under selectionwere detected among dairy and beef
cattlebreedsrespectivelyBioinformaticsanalysisidentified244and232genes
mapping in these common genomic regions Pathway and network analysis of
significant genes revealed molecular functions biologically related to signal of
selectionsdetectedinmilkormeatproductiontypes
Conclusions
Resultssuggest that themultibreedapproach isamethodto identifyselection
signatures in cattle breeds under selection for the same production goal
12
allowing to better pinpoint the genomic regions of interest in dairy and beef
production
Background
The advent of genomic technologies and the consequent availability of single
nucleotidepolymorphism(SNP)datahaveenabledthestudyofselectionevents
on the cattle genome (Hayes et al 2008 Qanbari et al 2010) Such selection
signals arise from environmental or anthropogenic pressures and help to
understand the causes that led to breed formation These studies are usually
addressedinaldquotopdownrdquoapproach(ShahzadandLoor2012)fromgenotypeto
phenotypeinwhichgenomicdataisstatisticallyevaluatedtodetectdirectional
selectionThephenotypeneededtorunselectionsignaturesisintendedinavery
generalsenseabreedaproductionaptitudeorevenageographicareaoforigin
Thisapproachholdsthepotentialto investigatetraitsasadaptationtoextreme
climatesanddifferentfeedingandhusbandrysystemsresiliencetodiseasesetc
thatareveryexpensivedifficultandsometimesimpossibletostudywithclassic
GWAS (GenomeWide Association Study) approaches Therefore it is expected
thatgenomic informationretrievedinthesestudiesonlypartiallyoverlapsthat
identified by GWAS on specific traits Searching for these ldquosignatures of
selectionrdquo is of interest for understanding molecular mechanisms underlying
important biological processes and providing new insights to functional
biologicalinformation(egdiseaseandorimportanteconomictraits(Nielsenet
al 2007)) To date many methods have been implemented to scan these
selectionsignaturesatthegenomiclevelMethodsdifferwithrespecttoseveral
properties (Lenstra et al 2012) there are methods that perform analyses
withinbreedoracrossbreedsthatcomparealleleorhaplotypefrequenciesand
size that detect recent or ancient selection events either still segregating or
fixatedintopopulations(Lenstraetal2012OrsquoBrienetal2014Utsunomiyaet
al2013)Differentmethodshavedifferentsensitivityandrobustnesseg they
maybeinfluencedatadifferentextentbymarkerascertainmentbiasanduneven
distributionofrecombinationhotspotsalongthegenome
13
Extended haplotype homozygosity (EHH) is a method that has been used in
many animal species including cattle (Pan et al 2013 Qanbari et al 2010
Zhangetal2006)TheEHHmethodwasfirstdevelopedbySabetietal(2002)
for human population analysis The main objective of this methodology is to
identifylongrangehaplotypesinheritedwithoutrecombinationeventsMorein
detailunderaneutralevolutionmodelchangesinallelefrequenciesaredriven
onlybygeneticdriftInthisscenarioanewvariantwouldrequirealongtimeto
reach high frequency in the population and the Linkage Disequilibrium (LD)
surroundingitwoulddecayduetorecombinationevents(Kimura1984)Inthe
caseofpositiveselectionaquickriseinfrequencyofabeneficialmutationina
relatively short time will preserve the original haplotype structure as the
number of recombination events would be limited Therefore a signature of
positiveselectionisdefinedasaregionwithanalleleinlongrangeLDlocatedin
anuncommonlyhighfrequencyhaplotype
One of themost interesting features of thismethod is that it is able to detect
genomic regions that are under recent selection These recent selection
signatures can be therefore used to identify the genomic regions involved in
breedformationInadditionEHHmethoddoesnotrequireanydefinitionofan
ancestralallele(asneededintheintegratedhaplotypescoreiHS(Voightetal
2006) and is suited to be applied to SNP data because it is less sensitive to
ascertainmentbiasthanothermethods(Nielsenetal2007)HowevertheEHH
methodgeneratesa largenumberof falsepositiveandnegative resultsdue to
heterogeneous recombination rates along the genome (Qanbari et al 2010)
Moreover another drawback not specific of EHH but shared by all selection
signaturemethod is the lack of robust inferences able to discern true signals
fromthosegeneratedbychance(Kemperetal2014)
To partially account for these limitations Sabeti et al (2002) developed the
relative extended haplotype homozygosity (rEHH) including an empirical
methodologytoobtainsignificantsignalsTherEHHofaputativecorehaplotype
compares its original EHH valuewith that of other haplotypes at that specific
locus using all other haplotypes as control for local variation in the
recombination rate Therefore it identifies genomic regions carrying variants
14
under selection that are still segregating in the population analysed Although
thesemethodswere conceived for human populations theywere successfully
appliedto livestockspeciessuchaspig(Maetal2012)andcattle(Qanbariet
al2010)
After domestication about 10000 years ago in the fertile crescent taurine
cattle have colonized Europe and Africa and have been selected for different
human needs (Taberlet et al 2011) Particularly in the last centuries the
anthropic pressure led to the formation of hundreds of specialized breeds
adapted to different environmental conditions and linked to local tradition
constitutingagenepooldeservingattentionforconservation(AjmoneMarsanet
al2010)Someofthesebreedshavebeenselectedspecificallyfordairybeefor
bothproductiontypesfollowingstrongartificialselectionforthesetraits(Hayes
etal2009)Thepresentstudyaimsat identifyingsignalsof recentdirectional
selectionusingrEHHmethodindairyandbeefproductiontypesusingdatafrom
five Italian dairy beef and dual purpose cattle breeds We focused on the
significant core haplotypes scanned by rEHH that are shared among breeds of
thesameproductiontypeThenweusedabioinformaticsapproachto identify
positionalcandidategeneslyingwithinthegenomicregionsunderselectionand
investigatetheirbiologicalrole
Methods
SampleAnimalsandGenotyping
A total of 4311 bulls of five Italian dairy beef anddual purpose breedswere
genotyped with the Illumina BovineSNP50 BeadChip v1 (Illumina San Diego
CA)joiningthegenotypingeffortsoftwoItalianprojects(namelyldquoSelMolrdquoand
ldquoProzoordquo)Thedatasetincluded101replicatesand773siresonspairsusedfor
downstreamqualitycheckofdataproducedIndetailgenotypesof2954dairy
(2179 ItalianHolsteinand775 ItalianBrown)864beef (485Marchigianaand
379 Piedmontese) and 493 dual purpose (Italian Simmental) bulls were
availableDataqualitycontrol(QC)wasperformedintwostagesafirststageon
15
animals independently for each breed applying the same methods and
thresholdsandasecondstageappliedonmarkersacrossall theindividualsof
thedatasetThefirststageexcludedindividualswithunexpectedlyhigh(gt02)
Mendelian errors for fatherson couples andwith low call rates (lt95)The
secondstageexcludedi)SNPswithle25missingvaluesinthewholedataset
orcompletelymissing inonebreed ii)SNPswithminorallele frequencyle5
and iii) SNPs in sexual chromosomes and with unassigned chromosome or
physicalposition
EstimationofrEHH
HaplotypeswereobtainedusingfastPHASEsoftwareconsideringdefaultoptions
(ScheetandStephens2006)andwererunbreedwiseandchromosomewisein
eachbreedPedigreeinformationforallbullswereprovidedbytheirrespective
breederassociations andwereused to filteroutdirect relatives (in fatherson
pairsthesonwasmaintainedinthedatasetandthefatherdiscarded)andover
representedfamilies(amaximumof5randomlychosenindividualsperhalfsib
familywasallowed)Thefinaldatasetcontainingtheseldquolessrelatedrdquoanimalswill
behenceforthcalled ldquononredundantrdquodatasetThisnonredundantdatasetwas
used also to calculate within breed pairwise wholegenome Linkage
Disequilibrium(LD)Ther2statisticforallpairsofmarkerswasobtainedusing
PLINK software v107 (Purcell et al 2007)Thedecayof LDwas assessedby
averagingr2valuesupto1Mbdistance
To test if the population structure influenced the detection of rEHH we
replicated all analyses on the whole Italian Holstein dataset without any
population correction (ie ldquoredundantrdquo dataset) focusing on genes or gene
clustersthatarewellknowntobeunderrecentselectionincattle(ieldquocandidate
regionsrdquo) In particularwe focused on the casein polled gene cluster and coat
colourgenes(MC1RandKIT(Kemperetal2014Qanbarietal2010))
The EHH and rEHH calculations were performed using Sweep v11 software
(Sabetietal2002)Somedefaultprogramsettingshadtobemodifiedtoadapt
theseanalyses to thebovinegenomeas thesoftwarewasoriginallydeveloped
forhumangeneticanalysesSpecificallylocalrecombinationratesbetweenSNPs
16
wereapproximatedto1cMperMbEHHandrEHHcalculationswereperformed
breedwise and chromosomewise using an automatic core selection with
defaultoptions(ieconsideringlongestnonoverlappingcoresandlimitingcores
toatleast3andnomorethan20SNPs)asinQanbarietal(2010)AlthoughEHH
and rEHH values were obtained for all core haplotypes only those with
frequency ge 25 were retained for further analyses The (empirical) rEHH
significancethresholdwasobtainedbydividingtherEHHvaluesinto20binsof
5 frequency range each logtransforming withinbin values to achieve
normality Core haplotypes with pvalues lt 005 were considered significant
DetectionofcommonselectionsignatureswasobtainedcomparingrEHHvalues
acrossgenomicregionsacrossbreeds
Breedgroupingaccordingtoproductiontype
Toexamine thesignals common toeachof the twoproduction types (iedairy
andbeef) corehaplotypes thatsharedoneormoreSNP inat least twobreeds
from the same production type were selected The dual purpose Italian
Simmentalwas included inbothdairy (ItalianHolsteinand ItalianBrown)and
beef groups (Piedmontese and Marchigiana) since potentially it possesses
haplotypes selected for both production types All downstream analyses were
performedseparatelyforthedairyandthebeefproductiontypes
Detectionandannotationofcandidategenes
Genome annotation was performed using Biomart
(httpwwwbiomartorgindexhtml) a comprehensive source of gene
annotationformanyspeciesprovidedbyEnsembl(wwwensemblorg)Thelist
ofsharedhaplotypesregions(inbp)fordairyandbeefwasusedasinputfilein
theBiomartwebinterfaceThesetofgenesretrievedbyBiomartwasthenused
as input for theCanonicalpathwayandregulatorynetworkanalyses Ingenuity
PathwayAnalysis toolversion80(IPA IngenuityregSystems IncRedwoodCity
CA httpwwwingenuitycom) and a substantial examination of published
literaturewasusedtoexaminethefunctionalrelationshipsamongtheresulting
genes The IPA operates with a proprietary knowledge database providing
complementarypathwayanalysisforseveralspeciesincludingcattleIntheIPA
17
analyses the significance of each biological function identified was estimated
using Fisherrsquos exact test A Benjamini and Hochberg correction for multiple
testingwas calculated for function and canonical pathways For eachNetwork
IPAcomputedascore(pscore=log10(pvalue))fittingthesetstudiedgenesand
a list of biological functionspresent in the IPAknowledgedatabase The score
considered thenumberofgenes in thenetworkand thesizeof thenetwork to
estimatehowrelevantthisnetworkwasaccordingtothelistofgenesprovided
Thenthescorewasusedtorankthenetworks
Results
DatasetQC
Therepeatabilityobtainedfromthe101replicatespresentinthewholedataset
washigherthan998AfterthetwoQCstages105individualsand9730SNPs
were discarded Moreover after phasing 1292 individuals were removed to
reducethe largenumberofsibfamiliespresent intheredundantdatasetAfter
quality control procedures the final dataset had 44271 SNPs and 1132 514
393 410 and 364 individuals from Italian Holstein Italian Brown Italian
SimmentalMarchigianaandPiedmontesebullsrespectively(Table1)
18
Table1JNumberofanimalsgenotypedbeforeandafterQCTotalnumberofanimalgenotypedandrelativenumberofanimaldiscardedafterQCanalysis
Detectionofselectionsignatures
Sweepv11 softwaredetectedrEHHsignalswitha frequencyhigher than25overlapping the test candidate regions in both datasets corrected or not forpopulation structure In the redundant dataset we found only one significantrEHHsignal(caseincluster)whereasinthenonredundantdatasetweidentified5significantrEHHsignalsoverlappingallthetestcandidateregionsconsidered(caseinclusterpolledclusterMC1R)(Table2)
BreedTotal
Genotyped
ED15
misAN
ED1
REPL
ED1
MENDCleaned
HOL 2179 40 31 5 2093
BRW 775 6 16 4 749
SIM 493 6 6 2 479
MAR 485 37 38 410
PIE 379 5 10 364
19
Table2JComparisonofcandidateregionrEHHinnonJredundantandredundantdatasetrEHH overlapping candidate regions in both corrected (nonredundant) and not corrected
(redundant)datasetsforpopulation
Redundant NonRedundant
Candidate
geneBTA1
Closest
SNP(bp)CH2range CH2Frequency
rEHHJ
log(pval)CH2Frequency
rEHHJ
log(pval)
POLLED
Cluster1 1981154 18974181981154 H1079 093037 H1078 167174
MC1R 18 13657912 1331772014007505 H1069 120096 H1069 077167
KIT_BOVIN 6 72821175 7250492172821175 H1054 030030 H1054 033048
Casein
Cluster6 88427760 8835009888452835 H1047 094139 H1046 148133
Casein
Cluster6 88427760 8835009888452835 H2032 002003 H2033 002003
In total5526567847724388and4049corehaplotypeswitha frequency
higher than 25 were detected on Italian Holstein Italian Brown Italian
Simmental Marchigiana and Piedmontese breeds respectively A total of 838
866 740 692 and 613 core haplotypes were found significant outliers (see
Materials and Methods) in the aforementioned breeds Table 3 shows the
distribution of the total and significant core haplotypes per chromosome and
breed
20
Tableamp3amp(ampCoreamphaplotypesampdistributionampDistributionoftotalandsignificantcorehaplotypesperchromosomeandbreed
ampamp ItalianampampHolsteinamp
ItalianampampBrownamp
ItalianampSimmentalamp Marchigianaamp Piedmonteseamp
BTA1amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
1amp 389 66 389 68 335 49 310 51 288 46
2amp 312 47 314 52 267 44 257 42 238 33
3amp 292 48 313 62 264 46 236 34 227 34
4amp 268 46 279 50 245 41 237 39 229 39
5amp 223 32 231 29 200 30 183 27 171 22
6amp 291 51 307 40 265 45 252 38 233 42
7amp 249 37 253 46 214 39 195 33 204 27
8amp 268 39 270 38 235 35 223 32 209 31
9amp 227 37 228 41 210 37 165 22 187 30
10amp 219 36 234 34 192 25 196 26 168 25
11amp 248 39 246 34 209 30 205 33 187 25
12amp 162 24 176 18 148 21 148 25 135 22
13amp 205 32 200 19 155 18 178 35 135 19
14amp 175 17 189 25 168 29 143 22 129 19
15amp 180 31 185 29 148 23 139 13 130 19
16amp 176 28 192 29 160 25 149 19 127 15
17amp 170 27 183 29 168 31 135 22 123 16
18amp 133 18 153 19 119 14 108 10 82 12
19amp 137 22 130 23 118 16 99 18 92 15
20amp 184 35 172 33 145 23 124 23 124 17
21amp 145 23 147 30 113 18 110 21 94 13
22amp 140 24 145 19 115 21 112 18 104 13
23amp 106 13 100 16 67 10 64 9 57 8
24amp 129 11 140 20 124 17 97 23 101 16
25amp 91 12 98 11 68 10 77 12 70 6
26amp 125 7 114 14 93 12 96 15 90 16
27amp 91 12 91 15 75 11 84 12 60 10
28amp 88 9 96 10 70 10 66 9 55 11
29amp 103 15 103 13 82 10 72 9 67 12
TOTamp 5526 838 5678 866 4772 740 4388 692 4049 613
1BostaurusAutosomes2Detectedcorehaplotypeswithafrequencyinthebreedhigherthan253Significantcorehaplotypesatplt005
Comparisonamptoamppreviouslyampreportedampdataamp
AnumberofstudieshavesearchedselectionsweepsinHolstein(Qanbarietal
2010) Brown (Qanbari et al 2011) and Simmental (Fan et al 2014) breeds
Since different methods are expected to identify signatures having different
characteristicsthecomparisonwithpreviousstudiesislimitedtothosestudies
using thesamemethodand thesamebreed(s)analysed inourstudyOnlyone
study analysed (German) HolsteinRFriesian cattle with rEHH (Qanbari et al
2010)Qanbarietal(2010)reportedcandidategenesandgeneclustersunder
recent positive selection that we compared with our rEHH in dairy breeds
dataset(Table4)
Table4JCorehaplotypesincandidategenesfollowingQanbarietal2011
1BosTaurusChromosome2Corehaplotype
Thenumberofcorehaplotypesfoundinourstudyinthecandidateregionswas
lowerpreviouslyreportedWhenconsideringHolsteinsthetwomostsignificant
candidateregionsinbothstudiesmatch(caseinclusterandthesomatostatinSST
gene)althoughitisimpossibletodetermineifthehaplotypeunderselectionis
the same as this information was not provided in Qanbari et al (2010)
However other genes considered significant in Qanbari et al (with pvalues
ranging from 004 to 010) were not significant in our studyWhen using the
same loose significant threshold (pvalues le 010) ldquosignificantldquo signals were
foundalsoinItalianBrownfortheCaseinCluster(log10pvalue=109)andin
ItalianSimmentalfortheSSTgene(log10pvalue=126)
HOLSTEIN BROWN SIMMENTAL
Candidate
geneBTA1
Closest
SNP
(bp)
CH2rangeCH2
Frequency
rEHH
Hlog(p)CH2range
CH2
Frequency
rEHH
Hlog(p)CH2range
CH2
Frequency
rEHH
Hlog(p)
DGAT1 14 444963236653443936
H1057 011443936763332
H1042 0130007
Casein
Cluster6 88391612
8835009888452835
H1046 1481338832601288452835
H2068 0801098835009888452835
H1044 037022
8835009888452835
H2033 018030
8835009888452835
H3034 022045
GH 19 49652377 GHR 20 33908597
SST 1 813769568128358581376961
H1036 200189 8131845181376961
H1042 126053
8128358581376961
H2029 00630084 8131845181376961
H2042 006027
IGFH1 5 71169823
ABCG2 6 37374911 3731702038256889
H1044 031027
3731702038256889
H1040 029031
Leptin 4 95715500
LPR 3 855692038549710885594551
H1047 0910728549710885794693
H1068 0800638549710885794693
H1063 002005
8549710885594551
H2041 027025
PITH1 1 35756434
22
Signaturessharedbydairyandbybeefbreeds
Significantcorehaplotypeswerealignedacrossbreedsto identifythoseshared
among dairy or beef production types Since breeds can be considered
independentsetsofobservationssharedsignaturesaremorelikelytorepresent
real effects rather than false positives A total of 123 and 142 significant core
haplotypesweresharedbetweenatleasttwodairyorbeefbreedsrespectively
(File S1) Only 82 and 87 of those significant core haplotypes contained genes
considered positional candidates under positive selection In total 22 and
17 of the genomewas under selection for dairy and beef production types
respectivelyTherEHHaveragelengthswere216932and190994bpfordairy
andbeefrespectively
Genesetannotationandnetworkpathwayanalysis
Atotalof244and232annotatedgenesfellwithintheselectedregionsfordairy
andbeefproductiontypesrespectively(Table5)
23
Table5JStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreedsStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreeds
DairyCattle BeefCattle
BTA SelSig
n
Genes1 Sum
SelSignsize
(bp)
Mean
SelSign
size(bp)
SelSi
gn
Genes1 Sum
SelSignsize
(bp)
Mean
SelSign
size(bp)
1 7 11 1521608 138328 7 13 2263213 174093
2 5 9 2216685 246298 8 11 2658549 2416863 2 5 1619336 323867 7 21 3755847 1788504 7 21 5956755 283655 6 11 1761288 1601175 4 17 4506119 265066 5 11 1251127 1137396 2 4 1467738 366934 5 12 5149692 4291417 6 30 4842969 161432 3 28 11889191 4246148 0 0 0 0 4 12 3512215 2926859 4 7 1554863 222123 2 6 1106768 18446110 4 17 8839762 519986 2 3 573138 19104611 6 8 1841802 230225 1 1 100439 10043912 2 8 4244133 530517 1 1 536461 53646113 3 8 1933026 241628 2 8 985376 12317214 0 0 0 0 3 9 692397 7693315 5 9 1415405 157267 2 2 189094 9454716 3 6 1302506 217084 4 6 905719 15095317 3 4 471046 117762 2 9 1551010 17233418 0 0 0 0 3 23 2961718 12877019 3 23 5744143 249745 2 2 116938 5846920 1 2 148886 74443 2 4 627971 15699321 3 7 1930206 275744 1 5 1150760 23015222 3 5 620674 124135 3 10 2411751 24117523 0 0 0 0 1 1 68421 6842124 2 18 9166004 509222 2 4 803770 20094225 2 11 2016602 183327 1 3 329199 10973326 2 4 466134 116534 4 7 776080 11086927 1 1 631792 631792 2 3 1004717 33490628 0 0 0 0 1 1 93464 9346429 2 9 935209 103912 1 5 798300 159660TOT 82 244 65393403 216932 87 232 50024613 190994
1GenesidentifiedGenesintheregionssharedbyallthreedairybreeds(8genes)orallthreebeefbreeds(11genes)foramoredetailedanalysiswereselected(Fig1)
24
Figure1JGenomiclocationoftheselectionsignaturessharedamongstudiedbreedsa)GenesinensembletracksaredisplayedasaredboxcorehaplotypesandSNPsarecolouredinorange(MarchigianaMAR)inviolet(PiedmontesePIE)andpink(SimmentalSIM)b)Genesinensemble tracks are displayed as a red box core haplotypes and SNPs are coloured in blue(HolsteinHOL)ingreen(ItalianBrownBRW)andpink(SimmentalSIM)
All the genes identified were submitted to downstream pathwaynetwork
analyses(Table5Figure1FileS1)Interestinglygenesintheregionssharedby
allthreedairyorbeefbreedsweremanifestedinthemostlikelynetworksforthe
two production aptitudes The IPA analysis detected a total of 12 and 15
networksindairyandbeefbreedsrespectively
Indairybreeds network scores ranged from46 to2 (File S2) Eightnetworks
obtainedscoreshigherthan20andincludedmorethan14moleculeseachThey
wereassociatedtoi)geneexpressionRNAdamageandrepairproteinsynthesis
andposttranslationalmodificationii)cellassemblyorganizationfunctionand
maintenance iii) cell cycle proliferation and cancer iv) carbohydrate
metabolism v) gastrointestinal and neurological disease developmental and
endocrine system disorders A total of 23moleculeswere associatedwith the
highest ranked network (score = 46) In this the immunoglobulins mitogen
25
activatedproteinkinasesandNFkBcomplexesformacentralhublinkinggenes
involved in celltocell signaling and interaction hematological system
developmentandfunctionandimmunecelltraffickingfunctions(Figure2)
Figure2JThemostlikelyIPAnetworkindairycattlebreedsThefunctionsassociatedtothenetworkarecelltocellsignallingandinteractionhematologicalsystem development and function immune cell trafficking Nodes in red correspond to genesidentifiedincorehaplotypesoverlappinginallthethreebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Breast cancer antiNestrogen resistance gene 3 (BARC3) andpituitary glutaminyl
cyclase gene (QPCT) genes were common to all three dairy breeds and are
directlyconnectedwithmammaryglandmetabolisms(Ezuraetal2004GSun
et al 2012) The solute carrier family 2 member 5 (SLC2A5) gene facilitates
glucosefructose transport (Zhao et al 1998) and the zetaNchain (TCR)
26
associated protein kinase 70kDa (ZAP70) gene plays a critical role in Tcell
signaling (Bonnefont et al 2011) Calpain is another important complex that
togetherwithcalpainN3(CAPN3)mediatesepithelialcelldeathduringmammary
gland involution (Wilde et al 1997) Furthermore RAS guanyl nucleotide
releasingprotein(RASGRP1)activatestheErkMAPkinasecascaderegulatesT
cellsandBcellsdevelopmenthomeostasisanddifferentiationandisinvolvedin
the regulation of breast cancer cell (Madani et al 2004 Yasuda et al 2007
Wickramasingheetal2009)Genesinvolvedintheothernetworksarelistedin
FileS2
The scoreof thebeef cattlebreedsnetwork rangedbetween46and2 and the
numberofmoleculespresentineachnetworkrangedfrom23to1(FileS3)Asin
dairy breeds eight networks obtained a score higher than 20 and 13 ormore
molecules each These networkswere associated to i) carbohydrate lipid and
nucleic acidsmetabolism energyproduction and smallmoleculebiochemistry
ii) cardiovascular system development and function iii) cellular and tissue
developmentcellassemblyandorganizationfunctionandmaintenanceandcell
and organmorphology iv) cell cycle celltocell signaling and interaction v)
DNA replication recombination and repair and RNA posttranscriptional
modification vi) dermatological diseases and conditions infectious disease
organismal injury and abnormalities In the network with the highest score
(score=46)chondroitinsulfateproteoglycan4(CSPG4)andsnurportinN1(SNUPN)
genes were shared among all beef cattle breeds investigated CSPG4 gene is
relatedtomeattendernesswhileSNUPN isanimprintedgeneimportantinthe
embryodevelopmentandinvolvedinhumanmuscleatrophy(Narayananetal
2002) (Figure 3) All the genes included in this network were involved in
carbohydratemetabolismsmallmoleculesbiochemistryandcellcycleTheother
networkswere related to important signaling andmetabolic process essential
foranimalgrowth(FileS3)
27
Figure3JThemostlikelyIPAnetworkinbeefcattlebreedsThe functions associated to the network are carbohydrate metabolism small molecule
biochemistry and cell cycle Nodes in red correspond to genes identified in core haplotypes
overlapping in all the three breeds of each production type whereas those in green depict
overlappingcorehaplotypesinatleasttwoofthosebreeds
A total of 6 and 9 statistically significant Canonical Pathways (FDR le 005
log10(FDR) ge 13) were identified using IPA for dairy and beef breeds
respectively(Figure4)
28
Figure4JBarplotofstatisticallysignificantCanonicalPathwaysPvalues were corrected for multiple testing using BenjaminiHochberg method and arepresented in thegraphas log(pvalue)Thebar represents thepercentageof genes in a givenpathway that meet cutoff criteria within the total number of molecules that belong to thefunctiona)BarplotofstatisticallysignificantCanonicalPathways indairycattlebreedsb)BarplotofstatisticallysignificantCanonicalPathwaysinbeefcattlebreeds
Themost significant Canonical Pathway in dairywas for purinemetabolism (
log10(FDR) = 26) supporting the highly synthetic processes in the mammary
epithelium(Figure5)
29
Figure5JGenesdetectedunderrecentpositiveselectionindairycattleinvolvedinpurinemetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Inbeefproductiontheephrinreceptorsignal(log10(FDR)=27)wasthemost
significant Canonical Pathway (Figure 6) Among other functions ephrin is
knowntopromotemuscleprogenitorcellmigrationbeforemitoticactivation(Li
andJohnson2013)AlltheotherCanonicalPathwaysarereportedinFileS4
30
Figure6JGenesdetectedunderrecentpositiveselectioninbeefcattleinvolvedinEphrinmetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Discussion
In this work more than 4000 genotyped bulls of five dairy beef and dualpurposeItalianbreedswerescreenedsearchingforselectionsignaturesindairyand beef production types A strict data quality controlwas applied to reducepossiblesourcesofbiassuchastechnicalgenotypingproblemsandpopulationstructurethatcouldhavelargeimpactonrEHHresultsInfactsincetheimpactofpopulationstructureonrEHHvaluesisstillunexploredwereplicatedpartofouranalyseswithandwithoutcloserelativesinanattempttostudytheeffectofpopulationstructureonthisselectionsignaturemethodInprinciplepopulationstratificationshouldleadtoanoverrepresentationofcertainhaplotypessimplyduetothepresenceoftightpedigreelinks(egsirespassinghalfoftheirgeneticmaterialtotheirsons)ratherthanactualselectionprocessesForthisreasonfor
31
the full analyses all siresons couples were removed after haplotype phasing(retaining only the younger animal) and halfsib familieswere restricted to amaximum of five randomly chosen individuals in an attempt to reduce overrepresentation This assessment was restricted to four regions known to beunderselectioninItalianHolsteinasthecaseinclusterthepolledgeneclusterMC1R and KIT Italian Holstein was selected for two reasons i) it is a highlystructuredbreedaccordingtoourdata(abouthalfofindividualsremovedwereclose relatives) ii) allowed us to compare our results with a previous studyAlthoughtheanalysesconductedonbothredundantandnonredundantdatasetswereable to identify rEHHsignals in these regions thenonredundantdatasetproduced five significant rEHH signals rather than just one in the redundantdataset(Table2)Thisresulthighlightstheconfoundingeffectofthepresenceofcloserelativesinthedatasetandconsequentlytheimprovedabilitytodetectedsignificant selection signature while correcting for population structureInterestingly our results only partially overlap those found by Quanbari et al(2010) These authors detected significant signals at the casein cluster andsomatostatinSSTaswedid but also at ahighernumberof candidate regionsThese inconsistencies may be the effect of different sires included in theanalyses of the different dataset size and to the close relative trimmingprocedure we adopted to decrease the effect of population structure andconsequent bias (Table 4) However poor correlation across investigations isoftenobservedalsointhedeeplyinvestigatedhumanspeciesThisisduetotheuse of different within and between breed statistics that identify selectionsignatureshavingdifferent characteristics (ancientrecent segregatingfixatedunder directionalbalancing selection) to the likely high rate of falsepositivesnegativesresultsandalsotothedifferentconsiderationofpopulationstructureandbackgroundselection(Enardetal2014)The number of total and significant core haplotypes identified by the SweepsoftwarewasthehighestinItalianBrownandthelowestinPiedmontesebreedSinceEHHisheavilybasedonpopulationLDtheaverageLDleveloverdistancewascalculatedineachbreed(Figure7)
32
Figure7JMultiJbreedaveragelinkagedisequilibriumagainstphysicaldistance(inkb)Marchigiana Piedmontese and Italian Simmental breeds present a lower persistence of LD allargerdistancethanItalianHolsteinandItalianBrownbreedsThisanalysishighlightedageneralpositivecorrelationbetweenthe levelofLDover distance and the number of total and significant core haplotypes foundHoweverconsideringthatrEHHisarelativemeasure it ismorelikelythatthehighernumberofsignificantcorehaplotypesidentifiedindairybreedswasduemostlytoahigherselectivepressureindairycomparedtobeefbreedsSignificance tests used in detecting selection signatures should measure theprobability of a statistic to be anoutlier compared to its expecteddistributionunder a neutral model However no reliable neutral model has so far beendevelopedincattlebecauseofthecomplexityofthedemographichistoryofthisspecies (Kemper et al 2014) As a consequence empiric rather than modelbasedsignificancetestaregenerallyusedinthedetectionofselectionsignaturesAccordinglyweconsideredasoutliersvaluesas thosesimply falling in the5plusvarianttailoftherEHHdistributionWekeptthewithinbreedsignificancethresholdloosenoncorrectingitformultipletestingbutconsideredonlysignals
33
eitherexceedingthisthresholdbyatleasttwoordersofmagnitudeorsharedby
twoormoreindependentbreeds
Parallel comparison of results from independent analyses of different breeds
allowed the achievement of a double objective i) to identify putative regions
under(recent)selectioninbreedsbelongingtotwoproductiontypeswhichwas
the main objective of this study and ii) reducing false positives since multi
breed analyses served as internal control Since the rEHH method does not
considerphenotypicinformationasignificantsignalmightarisebecausei)the
core haplotype is actually under a selective pressure ii) the result is a false
positive ie causedby chance population structureor anyotherdriving force
However even considering an unrealistic scenario with no false positives a
proportionofthetotalamountofsignalswouldactuallybeselectionsignatures
due to a selectionpressurenot ondairy or beef traits This becausedairy and
beefbreedsinvestigatedareonlyafewandshareanumberoftraitsnotdirectly
related to dairybeef production Even considering this limitation to our
knowledge this is the first dairy and beef multibreed study applying this
strategytoreducefalsepositivesatthecostofpossiblelossofinformationdue
tohighfalsenegativeratesSignificantsignalsharedbydairyandbybeefbreeds
wereused fordownstreamnetworkanalysesonpositional candidategenes to
search forabiological justification towhatwasobtainedstudying thegenomic
information Only the top network for dairy and the top network for beef
productionarediscussedindetail
Dairybreeds
ThemostsignificantnetworkdetectedindairybreedsisassociatedwithCellTo
CellSignalingandInteractionHematologicalSystemDevelopmentandFunction
andImmuneCellTraffickingfunctions(Figure2)ImmunoglobulinandNFkBact
aspivotalgeneswithinthisnetworkInterestinglyBARC3andQPCTgeneswere
sharedamongallthethreedairybreedsBothgeneshavenotyetbeenstudiedin
cattlealthoughinhumanstudiesthesegenesarewellknowntobelinkedtothe
mammaryglandmetobolismandthecalciumregulationTheformerisinvolved
in integrinmediated cell adhesions and signaling which is required for
mammary gland development and functions (G Sun et al 2012) The latter is
34
associatedwithlowradialbodymineraldensity(BMD)inadultwomen(Ezuraet
al2004)AnotherimportantplayeronthisnetworkisSLC2A5genealsoknown
asGLUT5 SLC2A5 gene acts as fructose transporter in the intestine and has a
significantroleinenergybalanceofdairycows(Burantetal1992)Findingthis
gene in adairy cattle specificnetwork seems surprising since there shouldbe
little need for transporting glucose andor fructose in the ruminant intestinal
tractassimplecarbohydratesaredegradedintovolatilefattyacids(VFA)inthe
rumen (Iqbal et al 2009) However it is often the case that large amount of
starchbypasstherumenincowsfedwithdietsrichincerealgrains(Zhaoetal
1998)Thisbypassedstarchneedstobedigestedinthesmallintestinetoavoid
highlevelsofglucoseconcentrationinthelargeintestine
WedetectedcalpaincomplexandcalpainN3(CAPN3)genesunderpositiverecent
selection in the dairy group as reported also by Utsunomiya et al (2013)
AltoughCalpainisknowntobeinvolvedinpostmortemmeattendernizationitis
also related to dairy metabolism as the muscle breakdown promoted by the
calpain gene provides an energy source for milk production especially at the
beginningoflactation(Kuhlaetal2011)InadditionasreportedbyWildeetal
(1997) calpainNcalpastatin system is related to programmed cell death of
alveolar secretory epithelial cells during lactation Zap70 gene is a cytoplasmic
proteintyrosinekinaseItisrelatedtotheimmunesystemasacomponentofthe
Tcell receptor playing a central role in Tcell responses (Wang et al 2010)
Bonnefontet al (2011) found thatZap70 genewasup regulated (FoldChange
(FC)=82FDR=277E12)onmilksomaticcellsofsheepinfectedbySaureus
andSepidermidisshowinganassociationwithmastitisresistance
Purinemetabolismwasthemostsignificantcanonicalpathwayindairybreeds
In a study conducted in human mammary epthelium Maningat et al (2008)
conductedageneexpressionanalysisonbreastmilkfatglobulesidentifyingthe
PurinemetabolismasthemostsignificantpathwaySynthesisandbreakdownof
purine is essential in the metabolism of the tissues of many organisms
particularlyofthemammaryglandduringlactation
Another interesting canonical pathway was endothelin signaling (Figure 5)
endothelin functions as vasocontrictor and is secreted by endothelial cells
35
(Nussdorferetal1999)Acostaetal(1999)reportedthatincattleendothelinsareinvolvedinthefollicularproductionofprostaglandinsandtheregulationofsteroidogenesis in the mature follicle In a recent study Puglisi et al (2013)confirmed the implication of endothelin in particular EDNRA and endothelinconvertin enzyme 1 in reproductive disorder in bovine They classified theEDNRAaspotentialbiomarkerforfertilityincow
Beefbreeds
Inbeefcattlebreedsthefocaladhesionkinase(FAK)togetherwithintegrinandERK12genesplayacentralroleashubsinthemostsignificantnetwork(Figure3)Theyareinvolvedincellularadhesionandcellmotilityprocessesparticularlyduringskeletalmyogenesis(Nguyenetal2014)Moreovertheoverexpressionof FAK can determine a reduced apoptosis and an increase risk of metastaticcancers(MehlenandPuisieux2006)Thesehubsarenotdirectlyinvolvedinourstudybut sharemany interactors thatare located in region found tobeunderselction such as CSPG4 RB1CC1 MGAT3 CIRPB CSPG4 gene belongs tochondroitin sulfate proteoglycans (CSPGs) family CSPGs are proteoglycansconsistingofaproteincoreandachondroitinsulfatesidechainTheyareknownto be structural components of a variety of tissues includingmuscle andplaykey roles inneuraldevelopmentandglial scar formationTheyare involved incellular processes such as cell adhesion cell growth receptor binding cellmigrationandinteractionswithotherextracellularmatrixconstituents
Manystudiesreporttheroleofproteoglycansinmeattextureofseveralmusclesfromthebovinespecies(Nishimuraetal1996)Dubostetal(2013)highlightadirect involvement of proteoglycans with respect to cooked meat juicinessRB1CC1geneplaysacrucialroleinmusculardifferentiationanditsactivationisa essential for myogenic differentiation (Watanabe et al 2005 p 1)Monoacylglycerol acyltransferase (MGAT3) gene cause the synthesis ofdiacylglycerol (DAG)using2monoacylglyceroland fattyacyl coenzymeAThisenzymaticreactionis fundamental fortheabsorptionofdietaryfat inthesmallintestineSunetal(2012p3)reportedinastudyonfiveChinesecattlebreedsthatMGAT3gene isassociatedwithgrowthtraitsCIRPBgenemaybepartofacompensatorymechanism inmuscles undergoing atrophy It preservesmuscle
36
tissuemassduringcoldshockresponsesaginganddisuse(DupontVersteegden
et al 2008) SNUPN is an imprinted gene and is expressed monoallelically
dependingonitsparentaloriginSNUPNplayimportantrolesinembryosurvival
andpostnatal growth regulation (Smithet al 2012Wanget al 2012)Ephrin
receptorsignalingwasthetopcanonicalpathwayproducebyIPAandhasalsoan
interestingbiologicalinterpretationformeatproduction(Figure6)Indeedthis
pathaway is important for the muscle tissue growth and regeneration
participating in correct positioning and formation of the neural muscular
junction(LiandJohnson2013)
Conclusions
In this study we analysed the selection signatures at genomewide level in 5
Italian cattle breeds Then we used a multibreed approach to identify the
genomicregionssharedamongcattlebreedsselectedfordairyorbeefpurpose
Thisapproachincreasedthepowertopinpointregionsofthegenomethatplay
important roles in economically relevant traits in both dairy and beef cattle
Moreover network and pathways analyses were used to display a detailed
functional characterization of genes mapping in the regions detected under
recentpositiveselection
Particularly in dairy cattle genes under directional selection are related to
feeding adaptation (increasing level of starch in the diet) mammary gland
metabolismandresistencetomastitisThebiologicalfeaturesunderselectionin
beef cattle breeds are involved in animal growth meat texture and juiceness
These results have a logical biological explanation However the frequent
approach of interpreting selection signatures on the basis of the function of
geneslocatednearbythepeaksignalofthestatisticusedispresentlychallenged
Novelinformationinhumanssuggeststhatmostselectedvariationisnotlocated
within genes and coding regions but in regulatory sites identified by the
ENCODEproject(Enardetal2014)Thesemaycontroltheexpressionofentire
genomicregionsorofgeneslocatedatarelavantdistancefromtheselectedsite
makingbiologicalinterpretationmorecomplex
37
InanycasefurtherstudiesusingdenserSNPchipsorwholegenomesequencing
producinginformationnotsubjectedtoascertainmentbias(Qanbarietal2014)
may increase the resolution of our analysis and togetherwith new knowledge
arisingonthecontrolofgeneexpressionvalidateorcorrectourresults
Acknowledgements
ANAFI (Italian Association of Holstein Friesian Breeders) ANAPRI (Italian
Association Simmental Breeders) ANABORAPI (National Association of
PiedmonteseBreeders)ANABIC(AssociationofItalianBeefCattleBreeders)and
ANARB (Italian Brown Cattle Breeders Association) are acknowledged for
providing the biological samples and the necessary information for this work
The Authors are also grateful to SelMol project (MiPAAF Ministero delle
PoliticheAgricoleAlimentarieFortestali)andbytheProZooproject(Regione
LombardiandashDGAgricoltura Fondazione Cariplo Fondazione Banca Popolare di
Lodi)forfunding
Thefundershadnoroleinstudydesigndatacollectionandanalysisdecisionto
publishorpreparationofthemanuscript
References
Acosta TJ Berisha B Ozawa T Sato K Schams D Miyamoto A 1999
Evidence for a Local EndothelinAngiotensinAtrial Natriuretic Peptide
SysteminBovineMature Follicles In Vitro Effects on SteroidHormones
and Prostaglandin Secretion Biol Reprod 61 1419ndash1425
doi101095biolreprod6161419
AjmoneMarsan P Garcia JF Lenstra JA 2010 On the origin of cattle how
aurochs became cattle and colonized theworld Evol Anthropol Issues
NewsRev19148ndash157
38
BonnefontCToufeerMCaubetCFoulonETascaCAurelMRBergonier
D Boullier S RobertGranieacute C Foucras G 2011 Transcriptomic
analysisofmilksomaticcells inmastitisresistantandsusceptiblesheep
upon challenge with Staphylococcus epidermidis and Staphylococcus
aureusBMCGenomics12208
BurantCFTakedaJBrotLarocheEBellGIDavidsonNO1992Fructose
transporter inhumanspermatozoaandsmall intestine isGLUT5 JBiol
Chem26714523ndash14526
DubostAMicolDPicardBLethiasCAnduezaDBauchartDListratA
2013Structuralandbiochemicalcharacteristicsofbovineintramuscular
connective tissue and beef quality Meat Sci 95 555ndash561
doi101016jmeatsci201305040
DupontVersteegden EE Nagarajan R Beggs ML Bearden ED Simpson
PMPetersonCA2008IdentificationofcoldshockproteinRBM3asa
possible regulator of skeletal muscle size through expression profiling
AJP Regul Integr Comp Physiol 295 R1263ndashR1273
doi101152ajpregu904552008
Enard D Messer PW Petrov DA 2014 Genomewide signals of positive
selection in human evolution Genome Res gr164822113
doi101101gr164822113
EzuraYKajitaMIshidaRYoshidaSYoshidaHSuzukiTHosoiTInoue
SShirakiMOrimoHEmiM2004Associationofmultiplenucleotide
variationsinthepituitaryglutaminylcyclasegene(QPCT)withlowradial
BMDinadultwomenJBoneMinRes191296ndash301
Fan H Wu Y Qi X Zhang J Li J Gao X Zhang L Li J Gao H 2014
Genomewide detection of selective signatures in Simmental cattle J
ApplGenet1ndash9doi101007s1335301402006
HayesBJChamberlainAJMaceachernSSavinKMcPartlanHMacLeodI
SethuramanLGoddardME2009Agenomemapofdivergentartificial
selectionbetweenBostaurusdairycattleandBostaurusbeefcattleAnim
Genet40176ndash184doi101111j13652052200801815x
Hayes BJ Lien S Nilsen H Olsen HG Berg P Maceachern S Potter S
Meuwissen THE 2008 The origin of selection signatures on bovine
39
chromosome 6 Anim Genet 39 105ndash111 doi101111j1365
2052200701683x
IqbalSZebeliQMazzolariABertoniGDunnSMYangWZAmetajBN
2009 Feeding barley grain steeped in lactic acid modulates rumen
fermentation patterns and increases milk fat content in dairy cows J
DairySci926023ndash6032doi103168jds20092380
KemperKESaxtonSJBolormaaSHayesBJGoddardME2014Selection
for complex traits leaves littleornoclassic signaturesof selectionBMC
Genomics15246doi1011861471216415246
KimuraM1984Theneutraltheoryofmolecularevolution
KuhlaBNuumlrnbergGAlbrechtDGoumlrsSHammonHMMetgesCC2011InvolvementofSkeletalMuscleProteinGlycogenandFatMetabolismin
the Adaptation on Early Lactation of Dairy Cows J Proteome Res 10
4252ndash4262doi101021pr200425h
Lenstra JAGroeneveld LF EdingHKantanen JWilliams JL Taberlet P
NicolazziELSolknerJSimianerHCianiEGarciaJFBrufordMW
AjmoneMarsan P Weigend S 2012 Molecular tools and analytical
approaches for the characterization of farm animal genetic diversity
AnimGenet43483ndash502
Li J Johnson SE 2013 EphrinA5 promotes bovine muscle progenitor cell
migration before mitotic activation J Anim Sci 91 1086ndash1093
doi102527jas20125728
Ma YL Zhang Q Ding XD 2012 [Detecting selection signatures on X
chromosome in pig through high density SNPs] Yi Chuan Hered
ZhongguoYiChuanXueHuiBianJi341251ndash1260
Madani S Hichami A CharkaouiMalki M Khan NA 2004 Diacylglycerols
Containing Omega 3 and Omega 6 Fatty Acids Bind to RasGRP and
Modulate MAP Kinase Activation J Biol Chem 279 1176ndash1183
doi101074jbcM306252200
Maningat PD Sen P Rijnkels M Sunehag AL Hadsell DL Bray M
Haymond MW 2008 Gene expression in the human mammary
epitheliumduring lactation themilk fat globule transcriptome Physiol
Genomics3712ndash22doi101152physiolgenomics903412008
40
Mehlen P Puisieux A 2006Metastasis a question of life or deathNat Rev
Cancer6449ndash458doi101038nrc1886
NarayananUOspinaJKFreyMRHebertMDMateraAG2002SMNthe
Spinal Muscular Atrophy Protein Forms a PreImport Snrnp Complex
withSnurportin1andImportin HumMolGenet111785ndash1795
Nguyen N Yi JS Park H Lee JS Ko YG 2014 Mitsugumin 53 (MG53)
LigaseUbiquitinatesFocalAdhesionKinaseduringSkeletalMyogenesisJ
BiolChem2893209ndash3216doi101074jbcM113525154
NielsenRHellmannIHubiszMBustamanteCClarkAG2007Recentand
ongoing selection in the human genome Nat Rev Genet 8 857ndash868
doi101038nrg2187
NishimuraTHattoriATakahashiK1996Relationshipbetweendegradation
of proteoglycans andweakening of the intramuscular connective tissue
duringpostmortemageingofbeefMeatSci42251ndash260
Nussdorfer GG Rossi GPMalendowicz LKMazzocchi G 1999Autocrine
ParacrineEndothelinSysteminthePhysiologyandPathologyofSteroid
SecretingTissuesPharmacolRev51403ndash438
OrsquoBrienAMPUtsunomiyaYTMeacuteszaacuterosGBickhartDM LiuGE Tassell
CPV Sonstegard TS Silva MVD Garcia JF Soumllkner J 2014
Assessing signatures of selection through variation in linkage
disequilibrium between taurine and indicine cattle Genet Sel Evol 46
19doi101186129796864619
Pan D Zhang S Jiang J Jiang L Zhang Q Liu J 2013 GenomeWide
DetectionofSelectiveSignatureinChineseHolsteinPLoSONE8e60440
doi101371journalpone0060440
PuglisiRCambuliCCapoferriRGianninoLLukajADuchiRLazzariG
Galli C Feligini M Galli A Bongioni G 2013 Differential gene
expressionincumulusoocytecomplexescollectedbyovumpickupfrom
repeat breeder and normally fertile Holstein Friesian heifers Anim
ReprodSci14126ndash33doi101016janireprosci201307003
Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender D
MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKa
41
tool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795
Qanbari S Gianola D Hayes B Schenkel F Miller S Moore S Thaller GSimianer H 2011 Application of site and haplotypefrequency basedapproachesfordetectingselectionsignaturesincattleBMCGenomics12318doi1011861471216412318
Qanbari S Pausch H Jansen S Somel M Strom TM Fries R Nielsen RSimianer H 2014 Classic Selective Sweeps Revealed by MassiveSequencinginCattlePLoSGenet10doi101371journalpgen1004148
Qanbari S Pimentel ECG Tetens J Thaller G Lichtner P Sharifi ARSimianerH2010AgenomewidescanforsignaturesofrecentselectioninHolsteincattleAnimGenetdoi101111j13652052200902016x
Sabeti PC Reich DE Higgins JM Levine HZ Richter DJ Schaffner SFGabriel SB Platko JV Patterson NJ McDonald GJ Ackerman HCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash7
ScheetPStephensM2006Afastandflexiblestatisticalmodelforlargescalepopulation genotype data applications to inferring missing genotypesandhaplotypicphaseAmJHumGenet78629ndash44
Shahzad K Loor JJ 2012 Application of topdown and bottomup systemsapproaches in ruminantphysiologyandmetabolismCurrGenomics13379
SmithLSuzuki JGoffAFilionFTherrien JMurphyBKohanGhadrHLefebvreRBrisvilleABuczinskiSFecteauGPerecinFMeirellesF2012DevelopmentalandEpigeneticAnomaliesinClonedCattleReprodDomestAnim47107ndash114doi101111j14390531201202063x
Sun G Cheng SYS Chen M Lim CJ Pallen CJ 2012 Protein TyrosinePhosphatase Phosphotyrosyl789 Binds BCAR3 To Position Cas forActivationatIntegrinMediatedFocalAdhesionsMolCellBiol323776ndash3789doi101128MCB0021412
Sun J Zhang C Lan X Lei C ChenH 2012 Exploring polymorphisms andassociationsofthebovineMOGAT3genewithgrowthtraitsGenomeNatl
42
Res Counc Can Geacutenome Cons Natl Rech Can 55 56ndash62doi101139g11077
TaberletPCoissacEPansu J PompanonF 2011Conservationgeneticsofcattle sheep and goats C R Biol On the trail of domesticationsmigrationsandinvasionsinagricultureDominiqueJobGeorgesPelletierJeanClaudePernollet334247ndash254doi101016jcrvi201012007
Utsunomiya YT Perez OrsquoBrien AM Sonstegard TS Van Tassell CP doCarmo AS Meszaros G Solkner J Garcia JF 2013 Detecting lociunder recent positive selection in dairy and beef cattle by combiningdifferentgenomewidescanmethodsPLoSOne8e64280
Voight BF Kudaravalli S Wen X Pritchard JK 2006 A Map of RecentPositive Selection in the Human Genome PLoS Biol 4 e72doi101371journalpbio0040072
WangHKadlecekTAAuYeungBBGoodfellowHESHsuLYFreedmanTSWeissA2010ZAP70AnEssentialKinaseinTcellSignalingColdSpringHarbPerspectBiol2doi101101cshperspecta002279
WangMZhangXKangLJiangCJiangY2012MolecularcharacterizationofporcineNECD SNRPNandUBE3Agenesand imprinting status in theskeletal muscle of neonate pigs Mol Biol Rep 39 9415ndash9422doi101007s1103301218066
WatanabeRChanoTInoueHIsonoTKoiwaiOOkabeH2005Rb1cc1iscritical for myoblast differentiation through Rb1 regulation VirchowsArchIntJPathol447643ndash648doi101007s0042800411831
WickramasingheNSManavalanTTDoughertySMRiggsKALiYKlingeCM 2009 Estradiol downregulates miR21 expression and increasesmiR21targetgeneexpressioninMCF7breastcancercellsNucleicAcidsRes372584ndash2595doi101093nargkp117
WildeCJAddeyCVLiPFernigDG1997Programmedcelldeathinbovinemammary tissue during lactation and involution Exp Physiol 82 943ndash953
YasudaSStevensRLTeradaTTakedaMHashimotoTFukaeJHoritaTKataoka H Atsumi T Koike T 2007 Defective Expression of Ras
43
Guanyl NucleotideReleasing Protein 1 in a Subset of Patients with
SystemicLupusErythematosusJImmunol1794890ndash4900
ZhangCBaileyDKAwadTLiuGXingGCaoMValmeekamVRetiefJ
Matsuzaki H Taub M Seielstad M Kennedy GC 2006 A whole
genome longrange haplotype (WGLRH) test for detecting imprints of
positiveselectioninhumanpopulationsBioinformatics222122ndash8
ZhaoFQOkineEKCheesemanCIShiraziBeecheySPKennellyJJ1998
Glucose transporter gene expression in lactating bovine gastrointestinal
tractJAnimSci762921ndash2929
44
Supplementaryfiles
YoucanfindChapterʹsupplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter02)orscantheQRcode
SupplementaryfileS1ndashSignificantcorehaplotypesandgenessharedin
dairyandbeef
Significantcorehaplotypes(pvaluele005haplotypefrequencyge025)shared
indairyandbeefcattlebreedsandrelativegenesintersectingwiththemThefile
isdividedinto4tablesThefirst2sheets(sheet1and2)shownthesignificant
corehaplotypesfordairyandbeefThelast2sheets(sheet3and4)showngenes
intersectingwiththesignificantcorehaplotypesfordairyandbeef
SupplementaryfileS2ndashNetworksrankingindairybreeds
Networksrankingindairybreedswithrelativemoleculessymbolslistscore
valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop
functionforeachnetwork
SupplementaryfileS3ndashNetworksrankinginbeefbreeds
Networksrankinginbeefbreedswithrelativemoleculessymbolslistscore
valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop
functionforeachnetwork
45
SupplementaryfileS4ndashCanonicalpathwaysrankingindairyorbeefcattle
breeds
Canonicalpathwaysrankingindairy(sheet1)orbeef(sheet2)cattlebreedswithrelativemoleculessymbolslistratioandndashlog10ofthepvaluesforeachcanonicalpathway
46
CHAPTER3Searchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisin
threeItaliancattlebreedsSCapomaccio1MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItalyTheseauthorscontributedequallytothiswork
____________________SubmittedtoAnimalGenetics
Abstract
Genome wide association studies (GWAS) have been widely applied to
disentanglethegeneticbasisofcomplextraits IncattlebreedsclassicalGWAS
approach with medium density markers panels are far to be conclusive
especiallyforcomplextraitsThisisduetotheintrinsiclimitationsofGWASand
to the choices that are made to step from the associationrsquos signals to the
functionalunits
Here we applied a geneHbased association strategy to prioritize genotypeH
phenotype associations found with classical approaches in three Italian dairy
cattlebreedswithdifferentsamplesizes(ItalianHolsteinn=2058ItalianBrown
n=745ItalianSimmentaln=477)formilkproductionandqualitytraits
WhileclassicalsinglemarkerregressionfailedtorevealgenomeHwidesignificant
genotypeHphenotype association except for Italian Holstein the gene based
approach found specific genes in each breed that are associated with milk
physiologyandmammaryglanddevelopment
Since no established method is yet implemented to step from variation to
functional units our strategy may contribute to reveal new genes that play
significant roles in complex traits like thosehere investigated condensing low
signalsofassociationinageneHcentricfashionapproach
Background
Modern cattle breeds have nomore than 200 years of history (Taberlet et al
2008) a time spanwhere selection reproductive isolationand the consequent
geneticdriftshapedallelefrequenciesandlinkagedisequilibriumthroughoutthe
genome This is particularly true for industrial dairy breeds where the joint
application of reproductive technologies and modern genetic evaluation
methodshave imposedahigh selectionpressureona limitednumberof traits
(Taberletetal2008)involvedinmilkquantityandquality
Indeed the milk production industry has been historically ready to acquire
technicalimprovementsfromdifferentfieldsstatisticsbiologyandengineering
50
ForexampleBLUPbasedstrategiesandhighHdensitygenotyping throughSNPHchipsarenowroutinelyincludedinbullevaluationshavingsignificantimpactonselectionefficiencyandpopulationstructureThese techniques have dramatically improved milk yield and compositionwithout knowing the biological basis of the traits under selection To datedespite many efforts and many QTL mapped only a few genes have beeneffectivelyassociatedwithmilkyieldandcomposition(Grisartetal2002Blottetal2003CohenHZinderetal2005)In thiscontextGenomeWideAssociationStudies(GWAS) is themostcommonmethodfordissectingthebiologythatunderliescomplextraits(McCarthyetal2008)TheunderlyingideaofGWASissimplefindmarkerHtraitassociationsexploitingthe linkage disequilibrium (LD) that exists between the causative mutation ndashwhichweignorendashandoneormoreanalyzedmarkerswhichwewellknowInthiswayonecanpinpointgenomicregionscarryingcausalvariantsforanytrait
Although a number of GWAS have been conducted in cattle productive andmorphological traits (some examples (Jiang et al 2010 Cole et al 2011Meredith et al 2012)) and found several genomic regions associated to thesetraits none focused their attention directly on genes involved inmilk biologyMoreoveroneof themajordrawbacksofGWASisrelatedtothescarceresultsportability across populations due to a variety of confounding factorspopulationstructuredifferentialLD levelsbreedspecific selection targetsandeven SNPs ascertainment bias (Clark et al 2005) For instance significantassociationsonfatpercentagewereidentifiedinmanyregionsoftheBostaurusgenome in different breeds Except for genes of major effect there is littleagreement on which regions and H more importantly H genes are actuallyinvolvedinthecontrolofthetraitinvestigatedThisissomehownotsurprisingone favorable allele may segregate in one breed and be fixated in a differentbreed the same allele segregates in both breeds but allelesmay differ or thesegregates in both breeds but the genetic background masks the effect ofsegregation
51
AnotherlimitationinGWASisthestringentsignificancethresholdoftenapplied
tocorrectformultipletestingAsaresulta largeproportionofgeneswithlow
effectaredisregardedwithconsequentoverestimationof thehigheffectgenes
variance(Xu2003)Thisbiasisparticularlyworryingwheninvestigatingtraits
controlledbya largenumberofgeneswithsmalleffectand inhighLDasmilk
yield(GIANOLAETAL2013)
To overcome this kind of limitation different strategies have been invented
usingprior knowledge Interesting ldquocandidatepathwayrdquo approacheshavebeen
conducted(Ravenetal2013)testingforassociationbetweenaphenotypeand
genes inpathwaysthatare likelytobe involvedinthecontrolofthetraitThis
approach reduces thenumberofmarkers tobe tested increases the statistical
poweroftheexperimentrestrictingtheanalysistoexistingknowledge
Other methods based on statistical manipulations or particular experimental
designs have been developed to prioritize or validate genotypeHphenotype
associations For example geneHbased association strategies while restricting
GWA study only to genes and neighboring genomic regionsmay identify new
candidategenesdeservingpostHGWASanalysis(Akulaetal2011Cantoretal
2010 Liu et al 2010) Moreover it is true that classical GWAS downstream
bioinformatics analyses concentrate their efforts in findingmeaningful results
usinggenesasfinaltarget
The aim of ourworkwas to find new candidate genes and novel information
associated to complex traits as milk production (yiled) and quality (fat and
protein percentage) using a geneHcentric genome wide association in three
differentdairybreedsItalianHolsteinItalianBrownandItalianSimmental
Methods
BiologicalSamples
Bulls from the three most represented dairy breeds in Italy (Italian Holstein
ItalianBrownandItalianSimmental)wereselectedfromArtificialInsemination
(AI) bulls progeny tested in Italy until 2010 and forwhich biological samples
52
wereavailableItalianHolsteinbullswerechosenaccordingtofollowingcriteria
i) having the national Selection Index namely production functionality type
(PFT)withreliabilitygt75andii)havingthelowestpossiblerelationshipwith
theothersiresinthesetThiswasachievedbymaximizingthevariabilitywithin
the group On the contrary considering the low number of available Italian
BrownandItalianSimmentalbullstheywereallincludedintheanalysesAfter
thisprocedureatotalof2139ItalianHolstein775ItalianBrownand486Italian
Simmental samples (including quality control replicates)were genotypedwith
theBovineSNP50BeadChipv1(IlluminaSanDiegoCA)
Traitsconsideredintheanalysis
Italian Holstein (ANAFI) Italian Brown (ANARB) and Italian Simmental
(ANAPRI)breedersassociationsprovidedderegressedproofs(DRP)ordaughter
yielddeviations (DYD)hereafter referredasphenotype forall traitsanalyzed
Traitsanalyzedweremilkyieldfatpercentageandproteinpercentage
Genotypingandqualitycontrol
Quality controls (QC) were run independently for each breed using the same
pipelineandthresholds IndetailQCexcluded i)thereplicatesamplewiththe
highernumberofmissinggenotypesii)sampleswithunexpectedlyhigh(ge100)
mendelianerrors(forfatherHsonpairs)iii)sampleswithmorethan5missing
genotypes iv) SNPs with ge 25 missing data v) SNPs with minor allele
frequency(MAF)le5vi)SNPswithmendelianerrorsge25andvii)SNPsout
ofHardyHWeinbergequilibriumtestedbyFisherexacttest(ple0001Bonferroni
corrected)
Analysisofpopulationstructure
MultiDimensionScaling(MDS)wasperformedusingPLINK107(Purcelletal
2007)andresultsplottedwithRscriptsThisanalysisshowninsupplementary
material (FigureS1) exploredpopulationsubstructureandverified thegenetic
homogeneityofthedatasetpriortofurtheranalyses
53
PairwiseLDwascalculatedusingPLINK107(Purcelletal2007)LDdecayand
LD structure on specific regions were calculated with in house R scripts and
snpplotterRpackage(LunaandNicodemus2007)
GWAS
GenomeHwide association analysis was made with the GenABEL package in R
(Team 2014) using GRAMMARHCG (Genome wide Association using Mixed
Model and Regression H Genomic Control) approach (Amin et al 2007
Aulchenkoetal2007)
Basingonpriorknowledge(Minozzietal2013) in ItalianHolsteintheDGAT1
regionwas also treated as fixed effect inGenABEL analysis of the three traits
ResultsusingthismodelarereferredtoasldquoHolsteinDGATrdquo
Manhattan andquantileHquantileplots for each traitwereproduced to allowa
thoroughevaluationoftheGWASresultsAllcalculatedpHvaluesfromtheGWAS
analysiswere retained for the downstream prioritizationwith the geneHbased
association
Workingongenomecoordinates
Coordinatesof themarkerswere liftedover fromBTA40(Elsiketal2009)to
UMD31assembly(Ziminetal2009)usingSnpChimpv1(Nicolazzietal2014)
andgenecoordinateswereretrievedusingBioMartEnsembl
QTLcoordinatesweredownloadedfromtheAnimalQTLdb(Huetal2013) in
bedformat
Operations on genomic intervalswereperformedusing thebedtools suite ver
2162(QuinlanandHall2010)
GeneBasedAssociationanalysis
Toreducetheantagonismbetweensignificancethresholdandfalsepositiverate
of a classical GWA study (Gianola et al 2013) and to facilitate selection
signature comparison across the three breeds investigated we used a geneH
54
centricstatisticthatcollapsesinasinglevalueallthepHvaluesfromSNPsrelated
toagenecorrectingforLDinformation
Our approachderivesdirectly from the softwareVEGAS (VersatileGeneHBased
TestforGenomeHwideAssociationStudies)(Liuetal2010)adaptedtopursue
meaningfulresultsinanyspeciesbeyondhumans
Briefly the geneHwise test statistic is the sum of the n upper tail chiHsquared
valuesofaSNPsubset withonedegreeof freedom(wheren is thenumberof
SNPmappinginthegivengene)WithperfectlinkageequilibriumthegeneHwise
pHvaluewouldbetheoneHsidepHvalueofachiHsquaredistributionwithndegrees
of freedom Otherwise more likely the distribution of the pHvalues has to be
evaluated(withsimulations)andweighted(withLDvalues)(Liuetal2010)
The VEGAS speciesHfree procedure was implemented in a UNIX environment
usingBashandRlanguagesusingEnsemblannotationasinputfilesPLINKdata
typeandpHvaluesforeachSNPfromthepreviouslycitedGRAMMARprocedure
WemappedSNPsonBostaurusgenes(UMD31)andtheirboundaries(100000
bpupHanddownHstream)toaccountforregulatoryregionsOnlygeneswithat
least 2 SNPs mapped were retained Significant entries from the geneHbased
associationweremappedinthecattleQTLdatabasewithbedtools
ResultsandDiscussion
In this study a gene based association strategy was applied to identify new
genomicregionsboostingGWASanalysisresultsformilkproductionandquality
fromthreeItaliandairybreeds
Complextraitslikethosehereevaluatedareaffectedbyafewmajorgeneswith
large effects andmany otherswithmoderate to low effects the latter are not
easily identifiedby genomewide scans inmodern cattle breedsduemainly to
samplesizelimitationwhilesignalfromtheformersarelostingenomescansH
withsomeexceptionsHduerapidfixations
InthisscenariowithmediumdensitySNPpanelsliketheoneusedinthisstudy
associationbasedonsingleSNPregressionorotherldquoclassicalrdquomethodscouldbe
inconclusiveintermsofresults
55
AftertheQCprocess7452058and477samplesand3575938535and38574SNPs were retained for Italian Brown Italian Holstein and Italian Simmentalbulls respectively (Table S1) HolsteinDGAT analysis retained 38534markersoneSNPlessthantheotherHolsteinpanelDescriptive statistics on genotypes for each breed (histogram and barplot forheterozygosis and call rate per individual Figure S7 and multi dimensionalscalingFigureS1)revealedabsenceofconfoundingeffectsWiththesedatasetsatthenominalgenomeHwidepHvaluesignificancethresholdof 005 (ple125 x 10H6 after Bonferroni correction (Burton et al 2007Dudbridge and Gusnanto 2008)) significant associations were found only inItalianHolsteinbetweenmilkyieldandfatpercentageand26SNPsmappingintheproximalregionofBTA14nearbyDGAT1(Table1)
56
Table1PHvaluesoftheSNPsovercomingthegenomewidesignificancethresholdintheHolsteinbreedformilkyieldandfatpercentage(notaccountingforDGAT1effect)SNPsaresortedbypositionsintheUMD31assembly
SNPID BTA Position(Bp) POvalue(Fat) POValue(Milk)
Hapmap30383OBTCO005848 14 1489496 112EH12 361EH12
ARSOBFGLONGSO57820 14 1651311 247EH46 161EH16
ARSOBFGLONGSO34135 14 1675278 857EH17
ARSOBFGLONGSO94706 14 1696470 665EH16
ARSOBFGLONGSO4939 14 1801116 534EH49 106EH20
Hapmap52798Oss46526455 14 1923292 915EH21
UAOIFASAO6878 14 2002873 382EH24
ARSOBFGLONGSO107379 14 2054457 118EH33 664EH15
ARSOBFGLONGSO18365 14 2117455 250EH16
Hapmap30922OBTCO002021 14 2138926 439EH13
Hapmap25384OBTCO001997 14 2217163 390EH10 697EH09
Hapmap24715OBTCO001973 14 2239085 108EH09 369EH09
BTAO35941OnoOrs 14 2276443 402EH13
Hapmap30374OBTCO002159 14 2468020 528EH12
Hapmap30086OBTCO002066 14 2524432 798EH11
ARSOBFGLONGSO74378 14 3640788 161EH08
ARSOBFGLOBACO1511 14 3765019 375EH09
UAOIFASAO9288 14 3956956 811EH09
ARSOBFGLONGSO100480 14 4364952 133EH09
UAOIFASAO5306 14 4468478 458EH08
ThisislikelyduetotheeffectoftheK232ADGAT1mutationthatissegregatingin
thisbreed(Fontanesietal2014)NootherSNPwassignificantlyassociatedto
anyothertraitinthe3breedsinvestigated
SomeSNPswereunderasuggestivesignificancethresholdofple5x10H5Details
on these markers are in supplementary material (Table S3) but they are not
discussedherePlotsdescribingatglancetheassociationresultsrevealingalow
signaltonoiseratioareavailableintheonlinesupplementarymaterial(Figure
S2S3S4andS5respectively for ItalianBrown ItalianHolsteinHolsteinDGAT
andItalianSimmental)
57
Asperthegenecentricanalysiswemappedatotalof192371989919894and
19844 genes on 24616 available on Ensembl (release version 73) in Italian
BrownItalianHolsteinHolsteinDGATandItalianSimmentalrespectivelyGenes
with a geneHwisepHvalue at least 10H5were considered significantly associated
withthestudiedtraitThesignificanceappearedtobeadequatetothehighrate
ofredundancyof theannotationfeaturesandto figureoutthetopassociations
with a reasonable level of confidence Results on these associations per breed
andpertraitaresummarizedinTable2Table3andinsupplementarymaterials
(TableS2andTableS4)ThemajorityofsignificantgenesfoundwiththegeneH
centricanalysismapped inpreviouslycharacterizedQTL thatareassociated to
milkproductionorqualityThesearesummarizedintheTableS5
Table2Numberofsignificantlyassociatedgenesforeachbreedineachtraitinthegenebasedassociationtest
Fat Milkyield Protein
Brown 3 1
Holstein 92 80 1
HolsteinDGAT 1 4 1
Simmental 1 13
58
Tableamp3FeaturessignificantlyassociatedperbreedtraitHolsteinanalysisnotaccountingforDGAT1isshow
ninTableS3
Breedamp
Traitamp
EnsemblampIDamp
GeneWiseampPValueamp
SNPsInGeneamp
BestSNPamp
BestSNP_PValueamp
BTAamp
Startamp(bp)amp
Endamp(bp)amp
GeneampNam
eamp
BROWNamp
FAT
Nosignificantassociationfound
MILK
ENSBTAG00000019237
500EG05
4Hapmap53770Gss46526325
204EG04
23
47333403
47348212
TXNDC5
ENSBTAG00000043918
500EG05
4Hapmap53770Gss46526325
204EG04
23
47337650
47337787
5S_rRNA
ENSBTAG00000046839
500EG05
4Hapmap53770Gss46526325
204EG04
23
47328131
47352621
UncharacterizedProtein
PROTEIN
ENSBTAG00000001260
700EG05
3ARSGBFGLGNGSG116222
441EG03
18
52120387
52124418
PINLYP
amp
HOLSTEINDGATamp
FAT
ENSBTAG00000000369
100EG05
3Hapmap53294Grs29016908
333EG05
594673755
94749093
EPS8
MILK
ENSBTAG00000026896
130EG05
5ARSGBFGLGNGSG104949
385EG04
23
51549721
51553563
FOXF2
ENSBTAG00000005260
400EG05
5Hapmap33288GBTCG033751
120EG03
638120578
38127577
SPP1
ENSBTAG00000006579
500EG05
4ARSGBFGLGBACG2207
632EG05
15
54418559
54459500
P4HA3
ENSBTAG00000007013
700EG05
6UAGIFASAG7665
396EG05
19
5221507
5283308
TOM1L1
PROTEIN
ENSBTAG00000006638
600EG05
3ARSGBFGLGNGSG104774
176EG04
18
56523439
56530499
BCL2L12
amp
SIMMENTHALamp
FAT
ENSBTAG00000037649
900EG05
6Hapmap30001GBTAG142716
529EG05
4120427915
120494453
VIPR2
MILK
ENSBTAG00000017538
100EG05
6ARSGBFGLGBACG13093
587EG06
12
85630428
85630912
Pseudogene
ENSBTAG00000000856
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1766767
1769754
FBXL6
ENSBTAG00000000857
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1763994
1766621
GPR172B
ENSBTAG00000026356
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1795351
1804562
DGAT1
ENSBTAG00000035158
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1770660
1772329
TMEM
249
ENSBTAG00000046208
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1782901
1788087
SCRT1
ENSBTAG00000009811
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1825982
1844587
UncharacterizedProtein
ENSBTAG00000009816
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1838135
1840081
UncharacterizedProtein
ENSBTAG00000014458
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1844664
1894424
MROH1
ENSBTAG00000020751
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1806081
1825793
HSF1
ENSBTAG00000044406
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1883849
1883971
SCARNA15
ENSBTAG00000046889
400EG05
5ARSGBFGLGBACG13093
587EG06
12
85610279
85610762
UncharacterizedProtein
ENSBTAG00000020117
800EG05
3BTAG78494GnoGrs
188EG04
721189879
21196263
ZBTB7A
PROTEIN
Nosignificantassociationfound
amp
59
1 ItalianHolstein
The$ large$ number$ of$ significant$ signals$ was$ observed$ in$ the$ Italian$ Holstein$
especially$ for$ milk$ yield$ and$ fat$ percentage$ As$ shown$ in$ Table$ 2$ the$ strong$
effect$ of$ DGAT1$ in$ Italian$ Holsteins$ observed$ with$ individual$ SNPs$ was$
confirmed$by$the$geneBcentric$approach$In$this$region$almost$the$totality$of$the$
geneBwise$pBvalues$under$the$chosen$threshold$appeared$to$be$influenced$by$this$
major$ gene$ and$ by$ the$ high$ level$ of$ linkage$ disequilibrium$ that$ characterizes$
cosmopolitan$bovine$breeds$(see$Figure$S6)$Within$this$pool$of$genes$it$is$worth$
to$mention$PLEC$ (plectin)$ a$ cell$ surface$ receptor$ gene$ in$ the$RNASE5$ pathway$
(Angiogenin$ encoded$ by$ angiogenin$ ribonuclease$ RNase$ A$ family$ 5)$ Proteins$
encoded$ by$ these$ genes$ that$ are$ present$ in$ milk$ are$ involved$ in$ protein$
synthesis$ and$ act$ as$ growth$ factors$ in$ vitro$Moreover$RNASE5$ seems$ to$ have$
other$ functions$ such$ as$ regulation$ of$ stress$ response$ and$ apoptosis$ and$ even$
angiogenesis$ through$ which$ it$ may$ also$ contribute$ to$ the$ regulate$ milk$
production$(Raven$et$al$2013)$
The$ exclusion$ of$DGAT1$ effect$ (HolsteinDGAT$ results)$ highlighted$ that$ all$ the$
significant$ effects$ on$ fat$ percentage$ found$ for$ genes$ located$ within$
approximately$25$Mb$up$or$downstream$from$the$excluded$SNP$(ARS5BFGL5NGS5
4939)$were$ redundant$ signals$ of$DGAT1$ (Table3$ and$Table$ S2$ and$ S3)$ This$ is$
also$observed$for$milk$yield$due$to$the$genetic$correlations$between$fat$and$milk$$
Also$ PLEC$ was$ not$ significant$ in$ our$ analyses$ when$ taking$ into$ account$ the$
DGAT1$effect$contradicting$previous$findings$in$Holstein$that$report$association$
of$PLEC$gene$with$production$traits$either$accounting$or$not$for$the$effect$of$the$
diglyceride$ acyltransferase$ (Raven$ et$ al$ 2013)$ A$ possible$ explanation$ for$ the$
different$results$between$these$studies$is$a$difference$between$genetic$structures$
of$the$populations$analyzed$To$reduce$false$positive$signals$we$only$evaluated$
the$genes$based$on$HolsteinDGAT$results$for$the$Italian$Holstein$$
11 Milkyield
We$found$genes$associated$with$this$trait$in$Holstein$in$particular$SPP1$P4HA3=
FOXF2$and$TOM1L1$genes$
Among$ these$ the$ osteopontin$ SPP1$ (secreted$ phosphoprotein$ 1)$ is$ directly$
related$to$milk$production$(de$Mello$et$al$2012)$ It$ is$ located$ in$BTA6$nearby$
60
$
ABCG2$ and$ there$ is$ evidence$ that$ polymorphisms$ in$ this$ region$ are$ strongly$
correlated$with$high$production$ levels$of$milk$ in$various$breeds$(CohenBZinder$
et$ al$ 2005)$Osteopontin$SPP1$ is$ a$ secreted$glycophosphoprotein$ that$plays$ a$
crucial$ role$ in$ mammary$ gland$ development$ having$ an$ essential$ role$ also$ in$
mammary$gland$differentiation$and$branching$of$the$mammary$epithelial$ductal$
system$ This$ gene$has$ also$ been$ implicated$ in$many$ types$ of$ cancer$ including$
breast$cancer$ for$ the$potentiality$of$proliferation$and$alveologenesis$ (Hubbard$
et$ al$ 2013)$ It$ is$worth$mentioning$ that$ the$most$ significant$ SNPs$within$ this$
gene$was$Hapmap332885BTC5033751=with=a$pBvalue$of$00012$ largely$over$ the$
genomeBwide$significance$ threshold$This$signal$would$be$ ignored$ in$ the$single$
SNP$approach$
Another$gene$revealed$by$the$geneBbased$association$is$P4HA3$encoding$for$the$
prolyl$4Bhydroxylase$α$polypeptide$III$involved$in$the$correct$folding$of$nascent$
procollagen$ chains$ It$ is$ known$ that$ stromalBepithelial$ interactions$ are$ critical$
for$mammary$gland$development$and$defective$deposition$or$reorganization$of$
extracellular$ matrix$ can$ lead$ to$ incorrect$ mammary$ epithelial$ morphogenesis$
(Gillette$ et$ al$ 2013)$ highlighting$ the$ role$ of$ this$ gene$ in$ the$ healthy$ breast$
tissue=
FOXF2=gene$ (Forkhead$Box$F2)$ is$ a$member$of$ the$ forkhead$box$ transcription$
factors$ known$ to$ be$ involved$ in$ tissue$ development$ as$ part$ of$ the$ ldquoHedgehog$
signaling$pathwayrdquo$This$key$cascade$is$essential$for$the$correct$segmentation$of$
tissues$during$embryonic$development$but$has$been$recently$demonstrated$that$
is$active$in$adult$stem$cells$proliferation$in$mammary$tissue$(Liu$et$al$2006)$In$
particular$ FOXF2$ plays$ a$ key$ role$ in$ extracellular$ matrix$ synthesis$ and$
epithelialBmesenchymal$ interactions$ and$ could$ therefore$ affect$ the$ proper$
development$of$the$mammary$stroma$In$addition$low$levels$of$transcription$of$
this$gene$are$correlated$ to$early$onset$metastasis$ in$breast$cancer$ (Kong$et$al$
2013)$ underlining$ a$ specific$ role$ in$ the$ mammary$ gland$
Lastly$ TOM1L1$ (Target$ Of$ Myb1$ ChickenBLike$ 1)$ is$ another$ gene$ potentially$
involved$ with$ the$ milk$ physiology$ as$ is$ known$ its$ implication$ with$ the$ early$
endosome$ sorting$ coupled$ with$ EGFR= (epidermal$ growth$ factor$ receptor)$
another$ important$player$ in$ lactation$(Fowler$et$al$1995)$TOM1L1$belongs$to$
61
$
the$ VHS$ domain$ containing$ proteins$ that$ are$ involved$ in$ the$ postBGolgi$
trafficking$and$signaling$(Wang$et$al$2010)$
12 Fatpercentage
In$ Italian$ Holstein$ breed$ we$ found$ a$ significant$ association$ with$ or$ without$
DGAT1$ effect$ correction$ to$ the$EPS8$ gene$ located$on$BTA5$at$around$95$Mbp$
This$gene$encodes$for$a$substrate$of$the$already$cited$EGFR$and$is$involved$in$the$
fatty$acid$metabolism$(Fazioli$et$al$1993)$A$recent$study$on$Holstein$Friesian$
(Wang$et$al$2012)$found$strong$association$with$a$variant$located$in$the$second$
intron$of$the$epidermal$growth$factor$receptor$pathway$substrate$enforcing$the$
hypothesis$of$a$true$involvement$of$this$gene$on$fat$percentage$
13 Proteinpercentage
The$BCL2L12$gene$(BclB2Blike$protein$12)$associated$with$protein$percentage$in$
Italian$Holstein$breed$represents$another$interesting$result$The$product$of$this$
gene= is$ an$ important$ oncoprotein$ in$ two$ fundamental$ cell$ cycle$ pathways$
namely$ p53$ and$ cytoplasmic$ caspase$ signaling$ BCL2L12= resides$ in$ both$
cytoplasm$and$nucleus$inhibiting$caspases$3$and$7$and$forming$a$complex$with$
p53$thereby$ inhibiting$p53Bdirected$transcriptomic$changes$upon$DNA$damage$
(Stegh$ and$DePinho$ 2011)$ The$members$ of$ BCL2$ family$ are$ considered$ antiB
apoptotic$genes$and$it$is$obvious$that$dysfunctional$programmed$cell$death$plays$
an$ important$ role$ in$ the$ pathogenesis$ and$ in$ the$ progression$ of$ cancer$Many$
members$of$this$family$were$found$to$be$modulated$in$various$malignancies$and$
BCL2L12= in$ particular$ was$ expressed$ in$ mammary$ gland$ and$ proposed$ as$
prognostic$ cancer$ biomarker$ (Thomadaki$ et$ al$ 2007)$ In$ dairy$ cows$ the$
members$of$BCL2$family$are$involved$in$the$key$events$ in$the$mammary$tissue$
development$ and$ remodeling$ during$ lactation$ and$ involution$ (Stefanon$ et$ al$
2002$p$200)$Members$of$BCL2$family$regulate$the$turnover$of$cell$population$
in$the$mammary$gland$during$lactation$and$its$overexpression$was$found$to$be$
linearly$related$to$milk$production$in$Holstein$cow$(Colitti$et$al$2004)$
$
62
$
2 ItalianSimmental
In$ the$ Italian$ Simmental$ breed$ we$ found$ one$ association$ on$ BTA4$ for$ fat$
percentage$(VIPR2)$and$two$associations$ for$milk$yield$one$ in$BTA7$(ZBTB7A)$
and$ the$ other$ in$ BTA14$ related$ to$ the$ DGAT1$ No$ association$ was$ found$ for$
protein$percentage$
21 Fatpercentage
Among$the$genes$found$significantly$associated$with$production$traits$in$Italian$
Simmental$VIPR2=a$ gene$ encoding$ a$ receptor$ for$ a$ small$ vasoactive$ intestinal$
neuropeptide$ was$ the$ single$ signal$ in$ BTA4$ for$ fat$ percentage$ Vasoactive$
intestinal$ peptide$ (VIP)$ is$ synthesized$ in$ the$pituitary$ gland$ and$ among$other$
functions$ stimulates$ prolactin$ secretion$ and$ is$ involved$ in$ proliferation$
survival$and$differentiation$in$human$breast$cancer$cells$(Valdehita$et$al$2010)$$
An$interesting$novel$finding$is$the$effect$of$immunosuppression$(through$direct$
effect$of$IL56)$on$prolactin$cells$and$its$ability$to$regulate$prolactin$by$acting$on$
pituitary$ VIP$ through$ VIP$ receptors$ (Blanco$ et$ al$ 2013)$ These$ evidences$
suggest$ that$VIPR2$ could$ be$ an$ important$ candidate$ for$ further$ investigations$
through$post$GWAS$approaches$such$as$reBsequencing$or$fine$mapping$
22 Milkyield
Surprisingly$the$DGAT1$gene$was$significantly$associated$with$milk$yield$but$not$
with$fat$percentage$in$Italian$Simmental$breed$Indeed$the$most$significant$SNP$
for$the$milk$yield$trait$in$this$breed$was$the$ARS5BFGL5NGS54939$with$a$pBvalue$
of$ 413EB06$ The$ importance$ of$ this$ gene$ in$ the$ lactation$ is$ largely$ known$
(Grisart$et$al$2002)$$
To$ investigate$DGAT1$ signal$ in$ Simmental$ we$ analyzed$ the$ fine$ LD$ structure$
genomeBwide$ and$ in$ BTA14$ proximal$ region$ in$ all$ the$ three$ breeds$ revealing$
different$patterns$
Observed$LD$decay$on$ the$ three$breeds$genomes$revealed$similar$behavior$ for$
Italian$Holstein$and$Italian$Brown$with$r2$gt$010$$until$ around$ 250$ Kbp$ while$
Italian$Simmental$showed$lower$LD$Distances$higher$that$400$Kbp$revealed$half$
level$of$LD$in$Simmental$(Figure$S6)$
63
$
In$DGAT1$region$LD$behavior$is$very$similar$between$Italian$Holstein$and$Italian$
Brown$ (Figure$ S8)$ while$ association$ values$ are$ really$ different$ because$ ARS5
BFGL5NGS54939=is$fixed$in$Italian$Brown$and$purged$by$QC$process$Interestingly$
association$ patterns$ become$ similar$ when$ DGAT1$ effect$ is$ removed$ from$ the$
Holstein$ panel$ (HolsteinDGAT$ Figure$ S8)$ In$ Simmental$ we$ found$ a$ different$
situation$ very$ low$ LD$ level$ and$ significant$ association$ levels$ with$ ARS5BFGL5
NGS54939$with$ the$milk$ trait$This$ is$probably$due$ to$ the$ low$ frequency$of$ the$
p232K$ allele$ in$ Italian$ Simmental$ (Scotti$ et$ al$ 2010)$ and$ to$ the$ different$
selection$strategies$applied$in$these$breeds$
The$ relatively$ high$ pBvalue$ of$ this$ SNP$ is$ responsible$ for$ the$ statistical$
significance$of$many$of$the$13$genes$found$significant$in$this$breed$for$this$trait$
as$ shown$ in$ Table$ 3$ therefore$ reducing$ the$ gene$ set$ to$ three$ locations$ two$
overlapping$features$indicating$a$pseudogene$in$BTA12$and$a$zinc$finger$protein$
ZBTB7A$in$BTA7$DGAT1$role$and$function$will$be$not$discussed$here$as$already$
done$elsewhere$(Grisart$et$al$2002)$
ZBTB7A=is$part$of$zinc$finger$and$BTB$proteins$(Broad$complex$Tramtrack$and$
Bricabrac)$ transcription$ factors$ with$ emerging$ importance$ for$ their$
involvement$ in$ oncogenesis$ and$ cell$ differentiation$by$ repressing$key$ genes$of$
cell$cycle$regulation$as$retinoblastoma$(Rb)$and=p53=(ldquoJeon$et$al$B$2008$B$ProtoB
oncogene$ FBIB1$ (PokemonZBTB7A)$ Represses$ Trpdfrdquo$ nd)= ZBTB7A$ over$
expression$ is$ shown$ in$primary$ and$ recurrent$ breast$ cancer$ tissues$ leading$ to$
disease$progression$and$poor$survival$(Zu$et$al$2011)$Link$between$this$gene$
polymorphism$and$milk$production$might$be$in$the$proper$cellular$development$
of$ the$ mammary$ gland$ This$ transcription$ factor$ is$ also$ known$ as$ a$ master$
regulator$ of$ B$ and$ TBcell$ lineage$ (Lee$ and$ Maeda$ 2012)$ enforcing$ the$ link$
between$ milk$ production$ and$ immune$ protection$ as$ already$ outlined$ by$ of$
Lemay$ and$ colleagues$ with$ proteomic$ studies$ where$ the$ ldquoactivation$ of$ the$
innate$ immune$systemrdquo$resulted$as$one$of$ the$most$represented$gene$ontology$
categories$(Lemay$et$al$2009)$
$
64
$
3 ItalianBrown
In$ the$ Italian$Brown$breed$we$ found$ one$ association$ on$BTA23$ (TXNDC5)$ for$
milk$yield$and$one$in$BTA18$for$protein$percentage$(PINLYP)$The$three$features$
listed$in$Table$3$for$milk$yield$in$Italian$Brown$were$collapsed$due$to$overlap$$
31 Milkyield
Thioredoxin$ domainBcontaining$ protein$ 5$ (TXNDC5)$ belongs$ to$ thioredoxin$
family$small$ubiquitous$proteins$containing$disulphide$domain$affecting$protein$
folding$with$chaperone$activity$preventing$protein$misBfolding$within$the$lumen$
of$ the$endoplasmic$reticulum$This$multifunctional$organelle$have$a$key$role$ in$
several$ processes$ including$ lipid$ catabolism$ and$ protein$ synthesis$ (RamiacuterezB
Torres$ et$ al$ 2012)$ Because$ of$ its$ disulphide$ bonds$ thioredoxin$ facilitate$ the$
reduction$of$other$proteins$by$cysteine$thiol$disulfide$exchange$This$mechanism$
is$essential$for$detossification$
During$ lactation$ indeed$ an$ enhanced$ detoxification$ level$ is$ functional$ to$
preserve$ milk$ quality$ against$ increased$ metabolic$ activities$ Higher$ metabolic$
levels$ lead$ to$ free$ radicals$ production$ making$ necessary$ the$ maintenance$ of$
redox$ homeostasis$ that$ include$ among$ others$ the$ reduction$ and$ oxidation$ of$
peroxiredoxin$and$thioredoxin$(Lu$and$Holmgren$2014)$
32 Proteinpercentage
For$ protein$ percentage$ in$ Italian$ Brown$ there$ is$ only$ one$ associated$ gene$
PINLYP=is$encoding$a$phospholipase$A2$inhibitor$but$this$protein$product$is$not$
yet$well$studied$ $Many$types$of$phospholipase$A2$were$characterized$and$ it$ is$
known$ that$ they$ are$ mediator$ of$ inflammation$ atherosclerosis$ and$ cancer$ in$
mammals$and$that$plays$an$important$role$in$the$innate$immune$response$(Qiu$
and$Lai$2013)$It$may$be$reasonable$to$imagine$a$role$for$PINLYP$in$the$lactation$
in$light$of$previously$discussed$results$that$link$milk$to$innate$immune$response$
(Lemay$et$al$2009)$
$
Although$ limited$ by$ the$ size$ of$ the$ study$ and$ the$ number$ of$ the$ included$
markers$these$results$represent$both$a$confirmation$and$a$step$forward$towards$
the$comprehension$of$the$biological$bases$of$milk$yield$and$quality$$
65
$
New$ genes$ potentially$ involved$ with$ production$ traits$ in$ dairy$ cows$ were$
tracked$when$GWAS$signals$resulted$too$weak$to$be$considered$in$a$classical$SNP$
based$mapping$with$the$ldquoclosest$gene$to$peakrdquo$or$the$windowBbased$strategy$
It$ is$ worth$ mentioning$ that$ none$ of$ the$ analyzed$ SNPs$ (not$ considering$ the$
Italian$Holstein$with$ARS5BFGL5NGS54939= included$ in$ the$ analysis)$were$ above$
the$ genomeBwide$ significance$ threshold$ in$ single$ SNP$ analysis$ meaning$ that$
important$information$is$often$lost$in$such$studies$$
In$addition$ to$ the$newly$ identified$ loci$we$observed$a$confirmation$of$existing$
associations$ with$ genes$ related$ to$ milk$ production$ not$ only$ dissecting$ gene$
specific$ functions$ but$ from$ mapping$ these$ genes$ in$ previously$ characterized$
regions$(Table$S5)$This$ is$not$trivial$when$using$new$data$analysis$techniques$
especially$ in$ high$ noise$ to$ signal$ ratio$ datasets$ like$ those$ here$ presented$We$
believe$ that$ this$ approach$will$ benefit$ of$data$ from$denser$marker$panels$ (eg$
BovineHD)$ to$ exploit$ local$ linkage$ disequilibrium$ and$ increase$ the$ number$ of$
mapped$genes$with$more$than$two$SNPs$
Results$ obtained$ with$ the$ gene$ centric$ approach$ although$ having$ intrinsic$
limitations$are$not$ influenced$by$a$number$of$subjective$choices$often$made$in$
GWAS$ For$ instance$when$ ldquowindow$mappingrdquo$ is$ used$ to$decrease$ the$noise$ to$
signal$ ratio$ eg$ due$ to$ nearby$ markers$ having$ different$ allele$ frequencies$
windows$size$ is$ to$be$decided$This$can$be$based$on$equal$number$of$markers$
equal$number$of$base$pairs$or$equal$map$distance$in$cM$per$window$The$three$
choices$ do$ not$ coincide$ because$ markers$ are$ not$ equally$ spread$ along$ the$
chromosomes$ and$ recombination$ differs$ in$ different$ genomic$ regions$ In$
addition$ windows$ can$ be$ chosen$ nonBoverlapping$ or$ with$ different$ degree$ of$
overlap$Finally$when$stepping$ from$significant$SNPs$or$windows$ to$candidate$
genes$the$size$of$the$flanking$region$from$which$picking$up$candidate$genes$is$to$
be$decided$as$well$Sometimes$the$gene$closer$to$the$signal$peak$is$selected$ in$
other$cases$all$genes$within$certain$boundaries$are$included$in$further$analyses$
All$ these$ choices$ may$ generate$ misleading$ results$ For$ example$ not$ truly$
associated$genes$may$be$retained$for$further$analyses$disregarding$the$correct$
ones$ In$addition$ a$ large$number$of$ false$positives$can$be$ included$ to$be$ later$
interpreted$with$ the$help$of$network$and$pathway$analysis$As$ a$ consequence$
these$decisions$have$a$direct$impact$in$terms$of$production$and$interpretation$of$
66
$
results$leading$to$conclusions$heavily$based$on$the$enrichment$analysis$and$preB
existing$knowledge$$
With$ the$ gene$ based$ association$ we$ also$ noticed$ some$ advantages$ in$ the$
downstream$data$mining$First$genes$are$functional$units$and$the$golden$targets$
of$ the$ ldquoclassical$ postBGWASrdquo$ bioinformatics$ pipelines$ Hence$ the$ gene$ based$
association$leads$to$genes$with$an$embedded$statistically$robust$framework$This$
is$ a$ ldquoplusrdquo$ in$ studying$ complex$ traits$ because$ one$ can$ concentrate$ on$ specific$
signals$ rather$ than$ cherry$ picking$meaningful$ features$ from$ a$ list$ constructed$
with$the$previously$cited$approaches$$
Moreover$it$is$easier$to$track$back$gene$clusters$to$major$gene$effects$B$like$for$
example$DGAT1$in$a$gene$rich$genome$chunk$many$genes$may$result$significant$
due$to$the$same$SNP$(Table$S2$and$Table$3)$With$the$VEGAS$method$finding$the$
ldquotruerdquo$association$is$facilitated$by$following$the$ldquobest$SNPrdquo$in$the$region$
In$addition$we$found$phenotype$coherent$signals$in$such$situation$where$only$a$
few$SNPs$B$ if$none$ndash$are$above$an$acceptable$statistical$genomeBwide$threshold$
using$a$single$SNP$regression$method$
The$gene$centric$approach$ is$a$ first$and$slight$ improvement$ in$ livestock$GWAS$
analyses$since$we$still$have$a$partial$genome$annotation$ In$addition$ it$ is$now$
known$that$ the$definition$of$gene$ is$ to$be$revised$many$genes$do$not$code$ for$
proteins$ have$ alternative$ starting$ sites$ have$ antisense$ transcripts$ host$
regulatory$ RNA$ are$ subjected$ to$ alternative$ splicing$ and$may$ be$ regulated$ by$
sequences$quite$ far$ away$ from$ their$ coding$ regions$ (Consortium$2012)$Other$
limitations$are$the$need$for$larger$number$of$animals$in$these$type$of$studies$as$
in$this$study$and$the$still$relatively$high$cost$of$sequencing$that$cannot$be$used$
routinely$in$large$experiments$at$least$not$yet$$
$
$
Conclusions
This$study$underlines$the$importance$of$the$implementation$of$new$methods$to$
analyze$complex$traits$with$genomic$data$No$golden$path(s)$is$discovered$yet$to$
walk$from$variation$to$functional$units$The$geneBcentric$analysis$may$contribute$
67
$
to$ reveal$ new$ signals$ throughout$ the$ bovine$ genome$ condensing$ weak$
association$signals$in$chromosome$bits$having$functional$significance$
$
$
Acknowledgements
The$Authors$would$ like$ to$ thank$Dr$ Jimmy$ Liu$ for$ his$ advices$ in$ building$ the$
software$AIA$ANAFI$ANARB$and$ANAPRI$breedersrsquo$associations$ for$ their$help$
and$ support$during$ the$data$analysis$ SelMol$ and$ProZoo$projects$ that$provide$
samples$ The$ Authors$ are$ also$ grateful$ to$MIPAAF$ and$ Cariplo$ Foundation$ for$
funding$
$
$
$
References
Akula$ N$ Baranova$ A$ Seto$ D$ Solka$ J$ Nalls$MA$ Singleton$ A$ Ferrucci$ L$
Tanaka$T$Bandinelli$ S$Cho$YS$Kim$YJ$Lee$ JBY$Han$BBG$Bipolar$
Disorder$ Genome$ Study$ (BiGS)$ Consortium$ The$ Wellcome$ Trust$ CaseB
Control$Consortium$McMahon$FJ$2011$A$NetworkBBased$Approach$to$
Prioritize$ Results$ from$ GenomeBWide$ Association$ Studies$ PLoS$ ONE$ 6$
e24220$doi101371journalpone0024220$
Amin$N$ van$Duijn$ CM$ Aulchenko$ YS$ 2007$ A$Genomic$Background$Based$
Method$ for$ Association$ Analysis$ in$ Related$ Individuals$ PLoS$ ONE$ 2$
e1274$doi101371journalpone0001274$
Aulchenko$YS$de$Koning$DBJ$Haley$C$2007$Genomewide$Rapid$Association$
Using$ Mixed$ Model$ and$ Regression$ A$ Fast$ and$ Simple$ Method$ For$
Genomewide$PedigreeBBased$Quantitative$Trait$Loci$Association$Analysis$
Genetics$177$577ndash585$doi101534genetics107075614$
Blanco$ EJ$ CarreteroBHernaacutendez$ M$ GarciacuteaBBarrado$ J$ IglesiasBOsma$ MC$
Carretero$M$Herrero$JJ$Rubio$M$Riesco$JM$Carretero$J$2013$The$
activity$and$proliferation$of$pituitary$prolactinBpositive$cells$and$pituitary$
68
$
VIPBpositive$ cells$ are$ regulated$by$ interleukin$6$Histol$Histopathol$ 28$
1595ndash1604$
Blott$S$Kim$JBJ$Moisio$S$SchmidtBKuntzel$A$Cornet$A$Berzi$P$Cambisano$
N$Ford$C$Grisart$B$Johnson$D$Karim$L$Simon$P$Snell$R$Spelman$
R$ Wong$ J$ Vilkki$ J$ Georges$ M$ Farnir$ F$ Coppieters$ W$ 2003$
Molecular$ dissection$ of$ a$ quantitative$ trait$ locus$ a$ phenylalanineBtoB
tyrosine$substitution$in$the$transmembrane$domain$of$the$bovine$growth$
hormone$ receptor$ is$ associated$ with$ a$ major$ effect$ on$ milk$ yield$ and$
composition$Genetics$163$253ndash266$
Burton$PR$Clayton$DG$Cardon$LR$Craddock$N$Deloukas$P$Duncanson$A$
Kwiatkowski$ DP$ McCarthy$ MI$ Ouwehand$ WH$ Samani$ NJ$ Todd$
JA$ Donnelly$ P$ Barrett$ JC$ Burton$ PR$ Davison$ D$ Donnelly$ P$
Easton$ D$ Evans$ D$ Leung$ HBT$ Marchini$ JL$ Morris$ AP$ Spencer$
CCA$Tobin$MD$Cardon$LR$Clayton$DG$Attwood$AP$Boorman$JP$
Cant$B$Everson$U$Hussey$JM$Jolley$JD$Knight$AS$Koch$K$Meech$
E$ Nutland$ S$ Prowse$ CV$ Stevens$ HE$ Taylor$ NC$ Walters$ GR$
Walker$NM$Watkins$NA$Winzer$T$Todd$JA$Ouwehand$WH$Jones$
RW$ McArdle$ WL$ Ring$ SM$ Strachan$ DP$ Pembrey$ M$ Breen$ G$
Clair$ DS$ Caesar$ S$ GordonBSmith$ K$ Jones$ L$ Fraser$ C$ Green$ EK$
Grozeva$D$Hamshere$ML$Holmans$PA$Jones$IR$Kirov$G$Moskvina$
V$ Nikolov$ I$ OrsquoDonovan$ MC$ Owen$ MJ$ Craddock$ N$ Collier$ DA$
Elkin$A$Farmer$A$Williamson$R$McGuffin$P$Young$AH$Ferrier$IN$
Ball$SG$Balmforth$AJ$Barrett$JH$Bishop$DT$Iles$MM$Maqbool$A$
Yuldasheva$N$Hall$AS$Braund$PS$Burton$PR$Dixon$RJ$Mangino$
M$ Stevens$ S$ Tobin$ MD$ Thompson$ JR$ Samani$ NJ$ Bredin$ F$
Tremelling$ M$ Parkes$ M$ Drummond$ H$ Lees$ CW$ Nimmo$ ER$
Satsangi$ J$Fisher$SA$Forbes$A$Lewis$CM$Onnie$CM$Prescott$NJ$
Sanderson$J$Mathew$CG$Barbour$J$Mohiuddin$MK$Todhunter$CE$
Mansfield$JC$Ahmad$T$Cummings$FR$Jewell$DP$Webster$J$Brown$
MJ$ Clayton$DG$ Lathrop$GM$ Connell$ J$Dominiczak$A$ Samani$NJ$
Marcano$CAB$Burke$B$Dobson$R$Gungadoo$J$Lee$KL$Munroe$PB$
Newhouse$SJ$Onipinla$A$Wallace$C$Xue$M$Caulfield$M$Farrall$M$
Barton$A$ (braggs)$TB$ in$RG$and$G$Bruce$ IN$Donovan$H$Eyre$S$
69
$
Gilbert$ PD$ Hider$ SL$ Hinks$ AM$ John$ SL$ Potter$ C$ Silman$ AJ$Symmons$ DPM$ Thomson$ W$ Worthington$ J$ Clayton$ DG$ Dunger$DB$ Nutland$ S$ Stevens$ HE$ Walker$ NM$ Widmer$ B$ Todd$ JA$Frayling$TM$Freathy$RM$Lango$H$Perry$JRB$Shields$BM$Weedon$MN$ Hattersley$ AT$ Hitman$ GA$ Walker$ M$ Elliott$ KS$ Groves$ CJ$Lindgren$ CM$ Rayner$ NW$ Timpson$ NJ$ Zeggini$ E$ McCarthy$ MI$Newport$M$Sirugo$G$Lyons$E$Vannberg$F$Hill$AVS$Bradbury$LA$Farrar$ C$ Pointon$ JJ$ Wordsworth$ P$ Brown$ MA$ Franklyn$ JA$Heward$JM$Simmonds$MJ$Gough$SCL$Seal$S$(uk)$BCSC$Stratton$MR$Rahman$N$Ban$M$Goris$A$Sawcer$SJ$Compston$A$Conway$D$Jallow$ M$ Newport$ M$ Sirugo$ G$ Rockett$ KA$ Kwiatkowski$ DP$Bumpstead$ SJ$ Chaney$A$Downes$K$ Ghori$MJR$ Gwilliam$R$Hunt$SE$Inouye$M$Keniry$A$King$E$McGinnis$R$Potter$S$Ravindrarajah$R$ Whittaker$ P$ Widden$ C$ Withers$ D$ Deloukas$ P$ Leung$ HBT$Nutland$S$Stevens$HE$Walker$NM$Todd$JA$Easton$D$Clayton$DG$Burton$PR$Tobin$MD$Barrett$JC$Evans$D$Morris$AP$Cardon$LR$Cardin$NJ$Davison$D$Ferreira$T$PereiraBGale$ J$Hallgrimsdoacutettir$ IB$Howie$ BN$Marchini$ JL$ Spencer$ CCA$ Su$ Z$ Teo$ YY$ Vukcevic$ D$Donnelly$P$Bentley$D$Brown$MA$Cardon$LR$Caulfield$M$Clayton$DG$ Compston$ A$ Craddock$ N$ Deloukas$ P$ Donnelly$ P$ Farrall$ M$Gough$ SCL$ Hall$ AS$ Hattersley$ AT$ Hill$ AVS$ Kwiatkowski$ DP$Mathew$ CG$McCarthy$MI$ Ouwehand$WH$ Parkes$M$ Pembrey$M$Rahman$N$Samani$NJ$Stratton$MR$Todd$ JA$Worthington$ J$2007$GenomeBwide$ association$ study$ of$ 14000$ cases$ of$ seven$ common$diseases$ and$ 3000$ shared$ controls$ Nature$ 447$ 661ndash678$doi101038nature05911$
Cantor$ RM$ Lange$ K$ Sinsheimer$ JS$ 2010$ Prioritizing$ GWAS$ Results$ A$Review$ of$ Statistical$ Methods$ and$ Recommendations$ for$ Their$Application$Am$J$Hum$Genet$86$6ndash22$doi101016jajhg200911017$
Clark$ AG$ Hubisz$ MJ$ Bustamante$ CD$ Williamson$ SH$ Nielsen$ R$ 2005$Ascertainment$ bias$ in$ studies$ of$ human$ genomeBwide$ polymorphism$Genome$Res$15$1496ndash1502$doi101101gr4107905$
70
$
CohenBZinder$M$ Seroussi$ E$ Larkin$ DM$ Loor$ JJ$Wind$ AE$ der$ Lee$ JBH$Drackley$ JK$Band$MR$Hernandez$AG$Shani$M$Lewin$HA$Weller$JI$ Ron$ M$ 2005$ Identification$ of$ a$ missense$ mutation$ in$ the$ bovine$ABCG2$gene$with$a$major$effect$on$ the$QTL$on$ chromosome$6$affecting$milk$yield$and$composition$in$Holstein$cattle$Genome$Res$15$936ndash944$doi101101gr3806705$
Cole$JB$Wiggans$GR$Ma$L$Sonstegard$TS$Lawlor$TJ$Crooker$BA$Tassell$CPV$ Yang$ J$ Wang$ S$ Matukumalli$ LK$ Da$ Y$ 2011$ GenomeBwide$association$ analysis$ of$ thirty$ one$ production$ health$ reproduction$ and$body$ conformation$ traits$ in$ contemporary$ US$ Holstein$ cows$ BMC$Genomics$12$408$doi1011861471B2164B12B408$
Colitti$M$Wilde$CJ$Stefanon$B$2004$Functional$expression$of$bclB2$protein$family$and$AIF$in$bovine$mammary$tissue$in$early$lactation$J$Dairy$Res$71$20ndash27$
Consortium$ TEP$ 2012$ An$ integrated$ encyclopedia$ of$ DNA$ elements$ in$ the$human$genome$Nature$489$57ndash74$doi101038nature11247$
De$Mello$F$Cobuci$JA$Martins$MF$Silva$MVGB$Neto$JB$2012$Association$of$the$polymorphism$g8514CgtT$in$the$osteopontin$gene$(SPP1)$with$milk$yield$ in$ the$ dairy$ cattle$ breed$ Girolando$ Anim$ Genet$ 43$ 647ndash648$doi101111j1365B2052201102312x$
Dudbridge$ F$ Gusnanto$ A$ 2008$ Estimation$ of$ significance$ thresholds$ for$genomewide$ association$ scans$ Genet$ Epidemiol$ 32$ 227ndash234$doi101002gepi20297$
Elsik$ CG$ Tellam$ RL$ Worley$ KC$ 2009$ The$ Genome$ Sequence$ of$ Taurine$Cattle$ A$window$ to$ ruminant$ biology$ and$ evolution$ Science$ 324$ 522ndash528$doi101126science1169588$
Fazioli$ F$ Minichiello$ L$ Matoska$ V$ Castagnino$ P$ Miki$ T$ Wong$ WT$ Di$Fiore$ PP$ 1993$ Eps8$ a$ substrate$ for$ the$ epidermal$ growth$ factor$receptor$kinase$enhances$EGFBdependent$mitogenic$signals$EMBO$J$12$3799ndash3808$
Fontanesi$ L$ Calograve$ DG$ Galimberti$ G$ Negrini$ R$ Marino$ R$ Nardone$ A$AjmoneBMarsan$ P$ Russo$ V$ 2014$ A$ candidate$ gene$ association$ study$
71
$
for$ nine$ economically$ important$ traits$ in$ Italian$ Holstein$ cattle$ Anim$
Genet$nandashna$doi101111age12164$
Fowler$KJ$Walker$F$Alexander$W$Hibbs$ML$Nice$EC$Bohmer$RM$Mann$
GB$ Thumwood$ C$ Maglitto$ R$ Danks$ JA$ 1995$ A$ mutation$ in$ the$
epidermal$growth$factor$receptor$in$wavedB2$mice$has$a$profound$effect$
on$ receptor$ biochemistry$ that$ results$ in$ impaired$ lactation$ Proc$ Natl$
Acad$Sci$92$1465ndash1469$doi101073pnas9251465$
Gianola$ D$ Hospital$ F$ Verrier$ E$ 2013$ Contribution$ of$ an$ additive$ locus$ to$
genetic$variance$when$inheritance$is$multiBfactorial$with$implications$on$
interpretation$ of$ GWAS$ Theor$ Appl$ Genet$ 126$ 1457ndash1472$
doi101007s00122B013B2064B2$
Gillette$ M$ Bray$ K$ Blumenthaler$ A$ VargoBGogola$ T$ 2013$ P190B$ RhoGAP$
Overexpression$ in$ the$ Developing$Mammary$ Epithelium$ Induces$ TGFβB
dependent$ Fibroblast$ Activation$ PLoS$ ONE$ 8$ e65105$
doi101371journalpone0065105$
Grisart$B$Coppieters$W$Farnir$F$Karim$L$Ford$C$Berzi$P$Cambisano$N$
Mni$ M$ Reid$ S$ Simon$ P$ Spelman$ R$ Georges$ M$ Snell$ R$ 2002$
Positional$Candidate$Cloning$of$a$QTL$ in$Dairy$Cattle$ Identification$of$a$
Missense$Mutation$ in$the$Bovine$DGAT1$Gene$with$Major$Effect$on$Milk$
Yield$ and$ Composition$ Genome$ Res$ 12$ 222ndash231$
doi101101gr224202$
Hubbard$NE$Chen$QJ$Sickafoose$LK$Wood$MB$Gregg$ JP$Abrahamsson$
NM$ Engelberg$ JA$ Walls$ JE$ Borowsky$ AD$ 2013$ Transgenic$
Mammary$Epithelial$Osteopontin$(Spp1)$Expression$Induces$Proliferation$
and$ Alveologenesis$ Genes$ Cancer$ 4$ 201ndash212$
doi1011771947601913496813$
Hu$ ZBL$ Park$ CA$ Wu$ XBL$ Reecy$ JM$ 2013$ Animal$ QTLdb$ an$ improved$
database$tool$for$livestock$animal$QTLassociation$data$dissemination$in$
the$ postBgenome$ era$ Nucleic$ Acids$ Res$ 41$ D871ndashD879$
doi101093nargks1150$
Jeon$et$al$ B$2008$ B$ProtoBoncogene$FBIB1$ (PokemonZBTB7A)$Represses$Trpdf$
nd$
72
$
Jiang$ L$ Liu$ J$ Sun$ D$ Ma$ P$ Ding$ X$ Yu$ Y$ Zhang$ Q$ 2010$ Genome$Wide$
Association$ Studies$ for$ Milk$ Production$ Traits$ in$ Chinese$ Holstein$
Population$PLoS$ONE$5$e13661$doi101371journalpone0013661$
Kong$ PBZ$ Yang$ F$ Li$ L$ Li$ XBQ$ Feng$ YBM$ 2013$Decreased$FOXF2$mRNA$
Expression$ Indicates$ EarlyBOnset$ Metastasis$ and$ Poor$ Prognosis$ for$
Breast$ Cancer$ Patients$ with$ Histological$ Grade$ II$ Tumor$ PLoS$ ONE$ 8$
e61591$doi101371journalpone0061591$
Lee$SBU$Maeda$T$2012$POKZBTB$proteins$an$emerging$ family$of$proteins$
that$ regulate$ lymphoid$ development$ and$ function$ Immunol$ Rev$ 247$
107ndash119$
Lemay$ DG$ Lynn$ DJ$ Martin$ WF$ Neville$ MC$ Casey$ TM$ Rincon$ G$
Kriventseva$ EV$ Barris$ WC$ Hinrichs$ AS$ Molenaar$ AJ$ 2009$ The$
bovine$lactation$genome$ insights$ into$the$evolution$of$mammalian$milk$
Genome$Biol$10$R43$
Liu$JZ$Mcrae$AF$Nyholt$DR$Medland$SE$Wray$NR$Brown$KM$2010$A$
versatile$ geneBbased$ test$ for$ genomeBwide$ association$ studies$ Am$ J$
Hum$Genet$87$139$
Liu$S$Dontu$G$Mantle$ID$Patel$S$Ahn$N$Jackson$KW$Suri$P$Wicha$MS$
2006$Hedgehog$Signaling$and$BmiB1$Regulate$SelfBrenewal$of$Normal$and$
Malignant$ Human$ Mammary$ Stem$ Cells$ Cancer$ Res$ 66$ 6063ndash6071$
doi1011580008B5472CANB06B0054$
Lu$ J$ Holmgren$ A$ 2014$ The$ thioredoxin$ superfamily$ in$ oxidative$ protein$
folding$ Antioxid$ Redox$ Signal$ 140202091634000$
doi101089ars20145849$
Luna$ A$ Nicodemus$ KK$ 2007$ snpplotter$ an$ RBbased$ SNPhaplotype$
association$and$linkage$disequilibrium$plotting$package$Bioinforma$Oxf$
Engl$23$774ndash776$doi101093bioinformaticsbtl657$
McCarthy$ MI$ Abecasis$ GR$ Cardon$ LR$ Goldstein$ DB$ Little$ J$ Ioannidis$
JPA$ Hirschhorn$ JN$ 2008$ GenomeBwide$ association$ studies$ for$
complex$traits$consensus$uncertainty$and$challenges$Nat$Rev$Genet$9$
356ndash369$doi101038nrg2344$
Meredith$ BK$ Kearney$ FJ$ Finlay$ EK$ Bradley$ DG$ Fahey$ AG$ Berry$ DP$
Lynn$ DJ$ 2012$ GenomeBwide$ associations$ for$ milk$ production$ and$
73
$
somatic$cell$score$in$HolsteinBFriesian$cattle$in$Ireland$BMC$Genet$13$21$doi1011861471B2156B13B21$
Minozzi$ G$Nicolazzi$ EL$ Stella$ A$ Biffani$ S$Negrini$ R$ Lazzari$ B$ AjmoneBMarsan$ P$Williams$ JL$ 2013$ Genome$Wide$ Analysis$ of$ Fertility$ and$Production$ Traits$ in$ Italian$ Holstein$ Cattle$ PLoS$ ONE$ 8$ e80219$doi101371journalpone0080219$
Nicolazzi$EL$Picciolini$M$Strozzi$F$Schnabel$RD$Lawley$C$Pirani$A$Brew$F$ Stella$ A$ 2014$ SNPchiMp$ a$ database$ to$ disentangle$ the$ SNPchip$jungle$ in$ bovine$ livestock$ BMC$ Genomics$ 15$ 123$ doi1011861471B2164B15B123$
Purcell$ S$ Neale$ B$ ToddBBrown$ K$ Thomas$ L$ Ferreira$ MAR$ Bender$ D$Maller$J$Sklar$P$de$Bakker$PIW$Daly$MJ$Sham$PC$2007$PLINK$a$tool$ set$ for$ wholeBgenome$ association$ and$ populationBbased$ linkage$analyses$Am$J$Hum$Genet$81$559ndash575$doi101086519795$
Qiu$ S$ Lai$ L$ 2013$ Antibacterial$ properties$ of$ recombinant$ human$ nonBpancreatic$secretory$phospholipase$A2$Biochem$Biophys$Res$Commun$441$453ndash456$doi101016jbbrc201310092$
Quinlan$AR$Hall$IM$2010$BEDTools$a$flexible$suite$of$utilities$for$comparing$genomic$ features$ Bioinformatics$ 26$ 841ndash842$doi101093bioinformaticsbtq033$
RamiacuterezBTorres$ A$ BarceloacuteBBatllori$ S$ MartiacutenezBBeamonte$ R$ Navarro$ MA$Surra$ JC$ Arnal$ C$ Guilleacuten$N$ Aciacuten$ S$ Osada$ J$ 2012$ Proteomics$ and$gene$ expression$ analyses$ of$ squaleneBsupplemented$ mice$ identify$microsomal$thioredoxin$domainBcontaining$protein$5$changes$associated$with$ hepatic$ steatosis$ J$ Proteomics$ 77$ 27ndash39$doi101016jjprot201207001$
Raven$ LBA$ Cocks$ BG$ Pryce$ JE$ Cottrell$ JJ$ Hayes$ BJ$ 2013$ Genes$ of$ the$RNASE5$pathway$ contain$ SNP$ associated$with$milk$ production$ traits$ in$dairy$cattle$Genet$Sel$Evol$45$25$doi1011861297B9686B45B25$
Scotti$E$Fontanesi$L$Schiavini$F$La$Mattina$V$Bagnato$A$Russo$V$2010$DGAT1$ pK232A$ polymorphism$ in$ dairy$ and$ dual$ purpose$ Italian$ cattle$breeds$Ital$J$Anim$Sci$9$doi104081ijas2010e16$
74
$
Stefanon$ B$ Colitti$ M$ Gabai$ G$ Knight$ CH$ Wilde$ CJ$ 2002$ Mammary$
apoptosis$and$lactation$persistency$in$dairy$animals$J$Dairy$Res$69$37ndash
52$
Stegh$ AH$ DePinho$ RA$ 2011$ Beyond$ effector$ caspase$ inhibition$ Bcl2L12$
neutralizes$ p53$ signaling$ in$ glioblastoma$ Cell$ Cycle$ 10$ 33ndash38$
doi104161cc10114365$
Taberlet$ P$ Valentini$ A$ Rezaei$ HR$ Naderi$ S$ Pompanon$ F$ Negrini$ R$
AjmoneBMarsan$ P$ 2008$ Are$ cattle$ sheep$ and$ goats$ endangered$
species$Mol$Ecol$17$275ndash284$doi101111j1365B294X200703475x$
Team$RC$ 2014$ R$ A$ Language$ and$Environment$ for$ Statistical$ Computing$ R$
Foundation$for$Statistical$Computing$Vienna$Austria$
Thomadaki$H$ Talieri$M$ Scorilas$ A$ 2007$ Prognostic$ value$ of$ the$ apoptosis$
related$genes$BCL2$and$BCL2L12$in$breast$cancer$Cancer$Lett$247$48ndash
55$doi101016jcanlet200603016$
Valdehita$ A$ Bajo$ AM$ FernaacutendezBMartiacutenez$ AB$ Arenas$ MI$ Vacas$ E$
Valenzuela$ P$ RuiacutezBVillaespesa$ A$ Prieto$ JC$ Carmena$ MJ$ 2010$
Nuclear$ localization$ of$ vasoactive$ intestinal$ peptide$ (VIP)$ receptors$ in$
human$ breast$ cancer$ Peptides$ 31$ 2035ndash2045$
doi101016jpeptides201007024$
Wang$T$Liu$NS$Seet$LBF$Hong$W$2010$The$Emerging$Role$of$VHS$DomainB
Containing$Tom1$Tom1L1$and$Tom1L2$in$Membrane$Trafficking$Traffic$
11$1119ndash1128$doi101111j1600B0854201001098x$
Wang$ X$Wurmser$ C$ Pausch$ H$ Jung$ S$ Reinhardt$ F$ Tetens$ J$ Thaller$ G$
Fries$R$2012$Identification$and$Dissection$of$Four$Major$QTL$Affecting$
Milk$Fat$Content$in$the$German$HolsteinBFriesian$Population$PLoS$ONE$7$
e40711$doi101371journalpone0040711$
Xu$S$2003$Theoretical$Basis$of$the$Beavis$Effect$Genetics$165$2259ndash2268$
Zimin$AV$Delcher$AL$Florea$L$Kelley$DR$Schatz$MC$Puiu$D$Hanrahan$
F$ Pertea$ G$ Tassell$ CPV$ Sonstegard$ TS$ Marccedilais$ G$ Roberts$ M$
Subramanian$ P$ Yorke$ JA$ Salzberg$ SL$ 2009$ A$ wholeBgenome$
assembly$ of$ the$ domestic$ cow$ Bos$ taurus$ Genome$ Biol$ 10$ R42$
doi101186gbB2009B10B4Br42$
75
$
Zu$X$Ma$J$Liu$H$Liu$F$Tan$C$Yu$L$Wang$J$Xie$Z$Cao$D$Jiang$Y$2011$ProBoncogene$ Pokemon$ promotes$ breast$ cancer$ progression$ by$upregulating$ survivin$ expression$ Breast$ Cancer$ Res$ 13$ R26$doi101186bcr2843$ $
76
$
Supplementaryfiles$
You$can$find$Chapter$͵$supplementary$files$at$
- DocTa$(Doctoral$Thesis$Archive)$(httptesionlineunicattit)$
- Google$Drive$(httptinyurlcomPhDMMChapter03)$or$scan$the$QR$code$
$
$
SupplementaryFigureS1Multidimensionalscaling
Spatial$distribution$of$genetic$structure$in$the$analyzed$breeds$$
$
Supplementary$FigureS2GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Brown$
$
Supplementary$FigureS3GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Holstein$
$
Supplementary$FigureS4GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$HolsteinDGAT$
$
Supplementary$FigureS5GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Simmental$
77
$
$Supplementary$FigureS6LDdecayGenomeBwide$LD$decay$for$the$three$breeds$(Holstein$Simmental$Brown)$$Supplementary$FigureS7DescriptiveplotsDescriptive$statistics$for$the$Italian$Brown$Italian$Holstein$and$Italian$Simmental$$Supplementary$FigureS8DGAT1LDFocus$on$the$LD$of$the$DGAT1$region$in$all$the$datasets$(Brown$Holstein$HolsteinDGAT$Simmental)$$Supplementary$TableS1SNPfactNumber$of$subjects$and$SNPs$after$QC$$Supplementary$TableS2GeneBasedAssociationresultsFull$results$of$the$significant$genes$associated$with$the$traits$in$all$the$datasets$$ Sheet$1$Brown$ndash$fat$percentage$$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS3SingleSNPAssociationresultsFull$results$of$the$SNP$that$overcome$the$suggestive$threshold$$$ Sheet$1$Brown$ndash$fat$percentage$
78
$
$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS4GenesandtraitsGeneWise$pBvalues$of$the$discussed$genes$in$all$the$analysis$In$grey$are$evidenced$the$pBvalues$that$overcome$the$significance$threshold$Genes$are$alphabetically$listed$$Supplementary$TableS5GenesinQTLsGenes$from$the$geneBbased$association$mapping$into$known$QTLs$$$$$
79
CHAPTER(4(
MUGBAS(a(species(free(gene(based(association(suite(
S(Capomaccio1(M(Milanesi1(L(Bomba1(et(al(1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122ItalyTheseauthorscontributedequallytothiswork
____________________ApplicationnotesubmittedtoBioinformatics
Abstract(Theassociationstudiesbetweengenomewidemarkersandphenotypes(GWAS)
arenowwidelyusedinmodelandnonmodelspeciesHowevertoolsthatmerge
singlemarkerresultsandtheirprobabilityofbeingassociatedtofunctionalunits
(typicallygenes)areseldomdevelopedforspeciesotherthanhuman
Herewe presentMUGBAS a suite that calculates a generegionbased pvalue
fromagivenannotationsinglemarkerGWAresultandgenotypedatawiththe
objective to ease postGWAS analysis The software is species and annotation
independentfasthighlyparallelizedandreadyforhighdensitymarkerstudies
Availabilityandimplementationhttpsbitbucketorgcapemastermugbas
Introduction(With the availability of highdensity SNP panels Genome Wide Association
Studies (GWAS) have become the gold standard for dissecting the biology of
complextraits(McCarthyetal2008)Thisistrueforhumansandotherspecies
TheunderlyingideaofGWASissimplefindmarkertraitassociationsexploiting
the linkage disequilibrium (LD) that exists between the causative mutation ndash
whichweignorendashandoneormoremarkersofknownchromosomalposition
Whileanumberofwellestablishedapproachescanbeeasilyusedinastandard
GWAS (Aulchenko et al 2007 Kang et al 2010 Purcell et al 2007) to find
marker(s) associated to a particular phenotype several issues have to be
accountedforduringtheupanddownstreamanalyses
Firstly genome wide studies have to take into account multiple testing If
stringent statistical thresholds are applied a proportion ofmoderate or small
effectsactuallyassociatedwiththetraitwillbedisregardedThereforereducing
the number of tests while maintaining all the information provided by high
densitypanelswouldbeaclearadvantageInadditiononceasignificantsignalis
detected different strategiesmaybeused to tag the candidate causative gene
Sometimesthegeneclosertothepeaksignalisselectedinothercasesallgenes
withinaflankingregionofanarbitrarysizeofthesignificantSNP(s)areincluded
82
infurtheranalysesThesechoicesmayendupinbiasresultseitherretaininga
largenumberofnottrulyassociatedgenesordisregardingthecorrectones
Somemethodshaverecentlybeenproposedtoovercometheselimitationsusing
aprioriknowledgeastheldquocandidatepathwayrdquoanalysis(Ravenetal2013)and
the ldquogenebasedrdquo association strategies (Akula et al 2011 Cantor et al 2010
Liuetal2010)TheseapproachesrestrictGWAstudiesonlytoknowngenesor
genesubsetsresizingthemultipletestingproblemandincreasingthepowerof
theanalyses
Usually bioinformatics pipelines are developed for the analysis of human and
modelorganismsandarenotadaptedtononmodelorganismThisisthecaseof
the VEGAS software (Liu et al 2010 Yang et al 2014) developed only for
humansandonlyforgenebasedanalyses
Here we introduce MUGBAS [MUlti species GeneBased Association Suite] a
softwarebasedonVEGASthatusesGWASdataandagivenannotation(typically
genesbutanyregionmaybesuitable)toestimategenebasedandregionbased
associationpvaluesThesuite is speciesandannotation free ready toanalyze
highdensitydatafromanyexperimentaldesigns
Methods(MUGBASisasuiteofscriptsbuiltinBashRPythonandPerlThesuiterequires
several prerequisites that can be easily fulfilled in a LinuxUnixMac
environmentwith a few command lines Further information on how tomeet
theserequirementsaredetailedintheonlinerepository
MUGBAS suite can be invoked launching the Bash wrapper that checks
dependenciesandstoresuserchoices
Required input files are MAPPED files in PLINK format with genotype data
from(atleast200individualsarerecommendedforaproperLDcalculation)an
annotation file inBED format (chromosome startingposition endingposition
nameandusercustomfields)ofthedesiredgenomefeatureandtheGWASresult
file with mandatory headers ldquoSNPNAMErdquo and ldquoPVALUErdquo Multiple association
resultscanbetestedatthesametime ifprovidedinseparatefiles inthesame
directoryAtrialdatasetisdownloadablealongwiththesoftwarereleasePlease
83
notethatforsimplicityweprovideageneannotationbutanyBEDfilewithany
desiredfeaturecanbeused
The program starts the analysis assigning SNPs to any feature of the given
annotationusingbedtools(QuinlanandHall2010)Thisstepiscustomizableby
user defined boundaries (up and downstream) in order to catch regulatory
regionsthatcouldbeinhighLDwiththecurrentgeneregion
Inordertospeeduptheanalysistheprogramsubsamples200individualsfrom
theinputdatasetWethenimplementedtwolevelsofparallelizationoneforthe
LDandonefortheldquogenewiserdquoorldquoregionwiserdquopvaluecalculationThelatteris
particularlyusefulanalyzingmorethanoneGWAresultsfromthesamedataset
BrieflyMUGBASfirstsplitstheLDcalculationinncores(asdefinedbytheuser)
using the foreach doParallel packages (Analytics andWeston 2014a 2014b)
and then distributes traits across cores using GNU Parallel and foreach
doParallel
SinceLDcalculationisaslowprocessVEGASbenefitsofHapMapphase2datato
speed up the processwhile preserving the option for a user defined (human)
population using PLINK In order to expand these analysis to any species LD
calculationinMUGBASisimplementedwithther2fast()functionfromGenABEL
package(Aulchenkoetal2007)whilethegenewisepvalueiscalculatedasin
theVEGASapproachBrieflythegenewiseteststatisticisthesumofthenupper
tailchisquaredvaluesofaSNPsubsetwithonedegreeoffreedom(wheren is
thenumberofSNPmappinginthegivengene)Withperfectlinkageequilibrium
the genewise pvalue would be the onetailed pvalue of a chisquare
distributionwithndegreesof freedomOtherwisemorelikelythedistribution
of the pvalues has to be evaluated (with simulations) andweighted (with LD
values)(Liuetal2010)
Once every generegion has its ldquogenewiserdquo pvalue a FDR qvalue (False
Discovery Rate) statistics is calculated with the base R function padjust()
Results are stored and enrichedwith ldquomanhattanrdquo and ldquoundergroundrdquo plots of
theldquogenewiserdquopvalueandtheqvaluerespectively
84
Results(MUGBAS output is generegionoriented with useful information about the
location of the analyzed feature the ldquoBest SNPrdquo within that feature with its
genomiccoordinatesandtheoriginalpvaluethegeneorregionwisestatistics
(pvalue number of simulations and qvalue) the custom annotation of that
particularentryandthepositionoftheSNPwithrespecttothefeatureandthe
decidedboundariesAll this informationareuseful to track theeffectofhighly
significant markers that could map into multiple region of interest and
effectively pinpoint the most probable location to focus on with data mining
processes
A typical analysis performed with a highdensity panel (gt600K markers)
distributed in 10 midperformance cores (Intel Xeon X5675 307 GHz) on
threetraitssimultaneouslytakesaboutfourhours
Discussion(Generegionbased methods are now widely used and accepted in genomic
research However non modelspecies suffer for the lack of availability of
specifictoolstoenhancediscoveryfromgenomewidedataMUGBASprovidesa
robust and accepted statistics to researchers dealing with species other than
humantoeasilyjumpfromassociatedmarkerstothedesiredfunctionalunits
Acknowledgements(((WewouldliketothankDrJimmyLiuforhisadvicesinbuildingthesuite
Reference(Akula N Baranova A Seto D Solka J NallsMA Singleton A Ferrucci L
TanakaTBandinelli SChoYSKimYJLee JYHanBGBipolar
Disorder Genome Study (BiGS) Consortium The Wellcome Trust Case
85
ControlConsortiumMcMahonFJ2011ANetworkBasedApproachtoPrioritize Results from GenomeWide Association Studies PLoS ONE 6e24220doi101371journalpone0024220
Analytics R Weston S 2014a doParallel Foreach parallel adaptor for theparallelpackage
AnalyticsRWestonS2014bforeachForeachloopingconstructforRAulchenkoYSdeKoningDJHaleyC2007GenomewideRapidAssociation
Using Mixed Model and Regression A Fast and Simple Method ForGenomewidePedigreeBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614
Cantor RM Lange K Sinsheimer JS 2010 Prioritizing GWAS Results AReview of Statistical Methods and Recommendations for TheirApplicationAmJHumGenet866ndash22doi101016jajhg200911017
KangHMSulJHServiceSKZaitlenNAKongSFreimerNBSabattiCEskin E 2010 Variance component model to account for samplestructure in genomewide association studies Nat Genet 42 348ndash354doi101038ng548
LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKM2010Aversatile genebased test for genomewide association studies Am JHumGenet87139
McCarthy MI Abecasis GR Cardon LR Goldstein DB Little J IoannidisJPA Hirschhorn JN 2008 Genomewide association studies forcomplextraitsconsensusuncertaintyandchallengesNatRevGenet9356ndash369doi101038nrg2344
Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender DMallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKatool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795
QuinlanARHallIM2010BEDToolsaflexiblesuiteofutilitiesforcomparinggenomic features Bioinformatics 26 841ndash842doi101093bioinformaticsbtq033
86
Raven LA Cocks BG Pryce JE Cottrell JJ Hayes BJ 2013 Genes of the
RNASE5pathway contain SNP associatedwithmilk production traits in
dairycattleGenetSelEvol4525doi101186129796864525
87
CHAPTER5
Imputationaccuracyisrobusttocattlereferencegenomeupdates
MMilanesi1etal1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122Italy
____________________ShortCommunicationacceptedinAnimalGenetics
SummaryGenotype imputation is routinely applied in a large number of cattle breeds Imputation
hasbecomeaneedduetothelargenumberofSNParrayswithvariabledensity(currently
from2900to777962SNPs)Althoughmanyauthorshavestudiedtheeffectofdifferent
statisticalmethodsonimputationaccuracytheimpactofa(likely)changeinthereference
genomeassemblyonimputationfromlowertohigherdensityhasnotbeendeterminedso
far In this work 1021 Italian Simmental SNP genotypes were reJmapped on the three
mostrecentreferencegenomeassembliesFour imputationmethodswereusedtoassess
theimpactofanupdateinthereferencegenomeAsexpectedthefourmethodsbehaved
differentlywith largedifferences in termsof accuracyUpdatingSNPcoordinateson the
three tested cattle reference genome assemblies determined only a slight variation on
imputationresultswithinmethod
KeywordsImputationreferencegenomeassemblySNPbovine
MaintextFromthepublicationof the firstbovinesinglenucleotidepolymorphism(SNP)array the
wholegenomicscenarioofthisspeciesevolvedrapidlyNowadaysseveralSNPpanelshave
been designed with a range of densities mapping on different reference genome
assemblies and produced by different manufacturers When running analyses with SNP
arrays of different densities or from different commercial companies (eg Illumina and
Affymetrix) an imputation step is often required The impact of different statistical
methods on imputation accuracy has already been assessed (Pei et al 2008) (Marchini
andHowie2010)(Maetal2013)Surprisinglyuptonowtherearenotavailablestudies
investigating the impact of an update on the reference genome assembly on imputation
accuracy This assessment would be of fundamental importance considering that reJ
mappingSNPsdeterminesarearrangementofthehaplotypestructureofthegenotypes(in
somecasesSNPsarereJmappedtoadifferentchromosome)andthatthebovinereference
assembly is likely toberepeatedlyupdatedover timeThisworkwasaimedatassessing
the effect of different reference genome assemblies in cattle As test casewe compared
four different imputation methods from low (Illumina BovineLDv10 Beadchip) to
90
mediumhigh (Illumina BovineSNP50v2 Beadchip) density in the Italian Simmental
populationForeachmethodweassessedthe impacton imputationaccuracyofmapping
SNPs on the BTAU42 BTAU46 (wwwhgscbcmeduotherJmammalsbovineJgenomeJ
project)orUMD31(Ziminetal2009)cattlereferencegenomeassemblies
Atotalof900and121SimmentalbullsweregenotypedwiththeIlluminaBovineSNP50v1
and v2 Beadchips (Illumina Inc San Diego CA) respectively Illumina BovineSNP50v2
Beadchip (hereinafter named 54k) the official reference chip for the Italian Simmental
breedersassociation(ANAPRI)wasconsideredasreferenceSNPpanelTheSNPchiMpv1
tool(Nicolazzietal2014)wasusedtoobtainSNPcoordinatesonBTAU42UMD31and
BTAU46genomeassembliesForeachassemblySNPsnotcontainedinthe54kunmapped
orinthesexualchromosomeswerediscardedThefinaldatasetcontained5118952886
and51290SNPsinBTAU42UMD31andBTAU46respectively
FourimputationmethodswereconsideredPedImpute(Nicolazzietal2013)FindHapv2
(VanRadenetal2011)FImpute(Sargolzaeietal2014)andBeaglev332(Browningand
Browning2007)Thefirstthreeusepedigreeandpopulation(ie linkagedisequilibrium
LD) informationwhereas the fourth uses only LD information Performance of the four
methods testedacrossgenomicassemblieswas comparedusingwrongly imputedalleles
(ldquoerrorsrdquo)andallelicsquaredcorrelation(ldquoallelicJR2rdquo)statisticsAllelicJR2wasdefined
asthesquaredcorrelationbetweenimputedandtrue(eggenotyped)allelesmultipliedby
100 as in Browning amp Browning (2009) The 878 oldest sires were used as training
population whereas the 143 youngest bulls were used as test population SNPs not in
commonwith the lowdensity panelwere set tomissing in the test population The test
population was split in four scenarios individuals with sire and maternal grandsire
genotypedinthereferencepopulation(scenarioAJ84animals)individualswithonlythe
sire genotyped (scenario B J 34 animals) individuals with only the maternal grandsire
genotyped(scenarioCJ13animals)ornoneofcloserelativesgenotyped(scenarioDJ12
animals)Thepopulationusedwashighlystructuredwithnearlyhalfofthebullspresent
in the dataset (490) sired by 115 genotyped sires In addition 35 out of these 115
genotypedsiresweresiredby22bulls(eggrandsires)alsopresentinthedataset
DifferencesacrossassembliesarenothomogeneousFromBTAU42toBTAU4646SNPs
weremapped to different chromosomes and the average difference in physical position
withinchromosomeswasnearly025MbpHowevertherewasnosubstantialdifferencein
the order of markers In fact not considering unmapped SNPs on either of the
aforementionedassembliesadifferenceinthesequentialorderwasobservedforonly134
91
single (ie spread throughout the genome) SNPs On the contrary from UMD31 to
BTAU46 222 SNPsweremapped to different chromosomes and amuch larger average
differenceinSNPsphysicalpositionwasobserved(iemorethan2Mbp)Alsothenumber
ofSNPswithdifferentsequentialorderwasmuchlarger(ie1893SNPs)Howevermore
than50oftherearrangementsinvolvedblocksfrom3tonearly100markersAsaresult
slight differences in overall imputation accuracy from BTA46 to UMD31 rather than
between the two BTAU assemblies were expected across methods On the contrary if
rearrangements are due to issues in the assembly of scaffolds theymight have a direct
(negative) impact on local imputation performances In fact a preliminary analysis on
BTA1consideringlocalerroracrossassembliesandmethodssuggestatleasttwosuch
SNPsclustersat44and138MbponUMD31(FiguresS1S2andS3)
Considering allmethods and scenarios amaximum average variation of 02 inerrors
(Table1)andof09inallelicJR2(Table2)wasobservedacrossreferenceassembliesThe
standarddeviation(SD)oftheimputationaccuracywasstableacrossreferenceassemblies
withinmethodsOnthecontraryalargervariabilitywasobservedacrossmethods
PedImpute and FindHap showed a relatively large average allelicJR2 variability across
scenarios evaluated over the same reference assembly Only PedImpute showed a large
variation in SD across scenarios in errors and allelicJR2 with high variability on
scenarios C and D showing a low performance of thismethodwhen there are no close
relatives(egsiredam)genotypedinthereferencedatasetFImputeinsteadachievedthe
highestaccuracyacrossallreferencegenomesandscenariossurprisinglyobtainingitsbest
resultsJerrorsof14(06SD)andallelicJR2of955(26SD)JinscenarioCwhereonly
onerelativelycloserelativeisgenotyped(maternalgrandJsire)FImputeappeartobeable
todetectshorterhaplotypesfromdistantrelativesmoreaccuratelythananyothermethod
tested in this study thus it is lessdependenton theavailabilityofpedigree information
thantheothertwopedigreeJbasedmethods(Sargolzaeietal2010)
The large variability across methods within assembly was expected although the
differences obtained among PedImpute FindHap and Beagle were larger in this study
comparedtoasimilarstudyinItalianHolstein(Nicolazzietal2013)Thiscoulddependon
a number of factors First the sample size of the test population is small (especially for
groups C and D where less than 15 individuals were present) Second the interaction
between population structure and imputationmethodsmay have a role PedImpute and
FindHap rely much more heavily on pedigree information than FImpute and Beagle
assumesall individualsareunrelatedHowever the lattertwomethods indirectlybenefit
92
greatly of from highly structured populations with large number of relatives ndash close or
distantndashpresent inthereferencepopulation(Sargolzaeietal2014)Anotherinteresting
factor that influences the imputation accuracy may be the persistence of LD at long
distances this is much lower in the Italian Simmental than in the Italian Holstein
population(FigureS4)Informationonrelatives(eitherpedigreeinformationorgenotyped
relativesinthereferencepopulation)isexpectedtoplayabiggerroleifLDisconservedat
shorter distances This hypothesis is supported by the fact that although there is high
variability acrossmethods better performances of the pedigreeJbasedmethods are still
observedinscenariosAandB(andCforFImpute)
Theslightdifferencesinimputationaccuracyobservedacrossreferenceassembliesmaybe
explained by the aforementioned small variation in the SNP blocks across reference
assemblies irrespectively of the rearrangements of single SNPs Considering that SNP
blocksaremostlymaintainedweexpectthatbreedswithlargerLDpersistenceatlonger
distances (ie Holstein Brown) will obtain even better imputation accuracy across
assembliescomparedtowhatreportedinthisstudyinItalianSimmentalHoweverastill
relativelyunexploredaspectof imputationperformance is thedistributionof imputation
errors across the genomeNot considering genotyping errors (which are expected to be
randomly distributed in the genome) a cluster of consecutive markers with high
percentage of imputation error across methods are most probably identifying missJ
assembledregions in thegenome(FiguresS1S2andS3)Althoughthe impactonoverall
imputation accuracy is limited as results in this study show these errorsmight have a
large impact on any analyses relying on specific genomic coordinates (eg genomeJwide
associationstudies)Morecomprehensiveresultsincludingannotationstudiesandwhole
genomesequencingdataarerequiredtoconfirmandinvestigatetheseobservations
Toconcludeonlyasmalleffectofupdatingthereferencegenomeassemblyonimputation
accuracywas observed in Italian Simmental On the other hand as already reported by
other studies the variability of imputation accuracy estimates is much larger across
imputation methods The choice of the most suitable imputation method based on the
breedgeneticbackgrounditscharacteristicsandontheavailabledataisthereforecritical
AcknowledgementsThe research leading to these results has received funding from the European Unions
Seventh Framework Programme (FP72007J2013) under grant agreement ndeg 289592 ndash
93
Gene2FarmPartofthegenotypicdatausedinthisstudywasprovidedbyprojectldquoSelMolrdquo
fundedbytheItalianMinistryofAgriculture(MiPAAF)MMwassupportedbytheDoctoral
SchoolontheAgroJFoodSystem(Agrisystem)of theUniversitagraveCattolicadelSacroCuore
(Italy) FB was supported by the Marie Curie European Reintegration Grant
ldquoNEUTRADAPTrdquo(FP7JPEOPLEJ2009JRG)
ReferencesBrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingandMissingJ
Data Inference for WholeJGenome Association Studies By Use of Localized
HaplotypeClusteringAmJHumGenet811084ndash1097
MaPBroslashndumRFZhangQLundMSSuG2013Comparisonofdifferentmethods
forimputinggenomeJwidemarkergenotypesinSwedishandFinnishRedCattleJ
DairySci964666ndash4677doi103168jds2012J6316
Marchini J Howie B 2010 Genotype imputation for genomeJwide association studies
NatRevGenet11499ndash511doi101038nrg2796
NicolazziELBiffaniSJansenG2013ShortcommunicationImputinggenotypesusing
PedImputefastalgorithmcombiningpedigreeandpopulationinformationJournal
ofDairyScience962649ndash2653doi103168jds2012J6062
NicolazziELPiccioliniMStrozziFSchnabelRDLawleyCPiraniABrewFStella
A 2014 SNPchiMp a database to disentangle the SNPchip jungle in bovine
livestockBMCGenomics15123doi1011861471J2164J15J123
PeiYLiJZhangLPapasianCJDengH2008Analysesandcomparisonofaccuracyof
differentgenotypeimputationmethodsPLoSONE3551
VanRadenPMOrsquoConnellJRWiggansGRWeigelKA2011Genomicevaluationswith
manymoregenotypesGenetSelEvol43
94
Tableamp1ampIm
putationampaccuracyampacrossampassem
bliesampforampPedImputeampFindH
apampFImputeampandampBeagleamppercentageampofampwronglyampim
putedamp
allelesamp
PedImputeamp
FindHapamp
FImputeamp
Beagleamp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
Scenarioamp
Namp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
A1amp
84
20
07
21
07
21
07
32
09
32
09
32
09
15
09
15
09
15
09
24
11
25
11
25
11
B2amp
34
23
08
23
09
22
08
36
11
36
12
36
11
15
07
15
09
14
06
23
08
23
08
23
08
C3amp
13
98
41
98
41
98
41
51
10
52
10
51
10
14
06
14
06
14
06
23
06
23
06
24
06
D4 amp
12
88
49
89
49
88
49
54
11
56
12
54
11
16
11
17
12
16
11
26
15
26
15
26
14
1 Sireandm
aternalgrandsiregenotyped
2 Siregenotypedmate
rnalgrandsireno
tgenotyped
3 Sireno
tgenotypedmate
rnalgrandsiregenotyped
4 Sireandm
aternalgrandsireno
tgenotyped
Tableamp2ampIm
putationampaccuracyampacrossampassem
bliesampforampPedImputeampFindH
apampFImputeampandampBeagleampallelicJR2amp
PedImputeamp
FindHapamp
FImputeamp
Beagleamp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
Scenarioamp
Namp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
A1amp
84
941
20
940
20
941
20
907
24
908
25
907
24
952
26
952
27
952
27
925
33
923
33
925
33
B2amp
34
935
24
934
25
935
24
896
30
896
34
895
32
954
20
954
21
954
20
929
24
928
25
928
24
C3amp
13
759
89
761
88
758
87
848
33
845
30
848
32
955
18
955
19
955
19
931
19
928
19
927
19
D4 amp
12
782
113
777
113
781
112
841
32
832
35
839
33
949
34
947
38
949
35
921
44
919
45
921
44
1 Sireandm
aternalgrandsiregenotyped
2 Siregenotypedmate
rnalgrandsireno
tgenotyped
3 Sireno
tgenotypedmate
rnalgrandsiregenotyped
4 Sireandm
aternalgrandsireno
tgenotyped
95
Supplementaryfiles
YoucanfindChapterͷsupplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter05)orscantheQRcode
SupplementaryFigureS1ErrorpercentageacrossmethodsforBTAU42onBTA1
SupplementaryFigureS2ErrorpercentageacrossmethodsforUMD31onBTA1
SupplementaryFigureS3ErrorpercentageacrossmethodsforBTAU46onBTA1
SupplementaryFigureS4Linkagedisequilibrium(LD)decayoverdistanceinItalian
SimmentalandHolstein
ThedottedlinewithblackdotsindicatesLDdecayoverdistanceinItalianHolstein
whereasthedashedlinewithwhitediamondsindicatesthesamevaluesforItalian
SimmentalItalianHolsteingenotypeswereasubsetofthedatausedinNicolazzietal
(2013)
96
CHAPTER6
QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis
MMilanesi1etal
1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly
____________________Manuscript
Abstract
Outbreed species carry a genetic load of deleterious recessive variants Those
that are lethal survive only in the heterozygous state in the population while
otherssimplyreducethefitnessofindividualshomozygousforthevariantThe
long=termdestinyofthesemutationsistodecreaseinfrequencyduetonegative
selectionandeventuallydisappearbecauseofgeneticdriftHowever theymay
bemaintainedinlivestockpopulationsandevenincreaseinfrequencywhenin
linkage with favourable alleles for traits under selection or occurring in the
genomeofsireshavinghighgeneticvalueforproductivetraitsWesearchedfor
these variants combining exome sequence data from 20 Italian Holstein bulls
sampledfromoppositetailsofthemaleandfemalefertilityEBVdistributionthe
HighDensity(HD)SNPgenotypingof1009progenytestedItalianHolsteinbulls
and information existing in public databases Exome sequencing detected
202956 variants Among these 5167 were classified as ldquodeleteriousrdquo when
annotatedwithSIFTandAnnovarThesewerefilteredbypickingthosemapping
in regions out of equilibrium and lack of homozygotes in the progeny tested
Holstein bulls or having an asymmetric distribution in bulls plus=variant and
minus=variant for fertility EBV The resulting 193 variants in 163 genes were
furtherassessedbycomparativegenomicshaplotypeanalysisandfunctionFor
eachassessmentapositiveornegativescorewasgiventoevidenceinfavouror
against thevariant tobedeleteriousA totalof35variantshadapositive total
score These genes are involved in development cell motility and immune
responseSomeof themareknowntobeexpressed in the testiclesandappear
importantforspermfertilityinmodelspeciesothersareresponsibleforgenetic
defectsinhumanandmousecarryingmutationssimilartotheonesdetectedin
cattleThesegenevariantsarecandidatetobeamongthosethatconstitutethe
ldquogeneticloadrdquoofdeleteriousrecessivegenesintheHolsteinpopulation
100
Introduction
The presence of deleterious recessive variants is well tolerated by outbred
speciesWhenthesevariantsappeartheymayspreadandsurvivealongtimein
the population in the heterozygous state (Simmons and Crow 1977) It is
estimatedthatoutbredindividualshumanincludedcarryonaverageagenetic
loadofmorethan100loss=of=functionvariantsintheirgenome(MacArthuretal
2012)
Theeffectofdeleterious recessivevariantsappearswhendemographicevents
eggeneticbottlenecksreducethegeneticvariabilityofthepopulationorwhen
relatedindividualsareintentionallymatedforbreedingpurposesInthesecases
recessive deleterious reduce the fitness of individuals carrying them in the
homozygous state and in extreme case induce very severe genetic defects
premature death and abortion (McClure et al 2014 Sahana et al 2013
VanRadenetal2011)Theythereforecontributetoexplainthephenomenonof
inbreeding depression and its negative effect on fertility (Charlesworth and
Willis2009)
The search for deleterious recessive variants is particularly interesting in
Holstein cattle Thisbreed is experiencing anegative genetic trend for fertility
traits (Jamrozik et al 2005) it is worth noting that this is occurring while
inbreedingisunderstrictmonitoringandistheoreticallyincreasingataverylow
pace(about005peryear=Mrodeetal2009)Inadditionmatingisplannedto
minimise relationship between animals and the additive genetic variance for
mosttraitsndashincludingmilkproduction=isapparentlynotdecreasinginspiteof
the selection pressure applied by modern schemes (Biffani et al 2002)
indicatingthatoveralldiversityisonlyslightlydecreasing
SelectiontoincreasefertilityischallengingSelectioncriteriagenerallyadopted
to improve this trait are far from the biological mechanisms that govern
reproduction Inaddition thesemechanismsarecomplexandverysensitive to
environmental(management)conditions(Laneetal2013Wathesetal2014)
Asa result fertility traitshave loworvery lowheritability (Pryceetal1997)
andthereforelowresponsetoselectionandslowgeneticprogress
101
Forthesereasons thequest formolecularvariants influencing fertility traits is
on=goingsincetheavailabilityofmoleculartoolsforDNAanalysisAnumberof
investigations tested the effect of functional candidate genes running genome
wide association analyses to find QTLs for fertility traits using hundreds
microsatellite markers in structured families (Ashwell et al 2004) or tens of
thousand SNP panels in whole populations (Aston and Carrell 2009) The
availability of medium density SNP panels allowed the search for deleterious
recessivesevenwithouttheuseofphenotypictraitsbyscanningthegenomein
searchofregionslackingoneofthetwohomozygousclasses(Sahanaetal2013
VanRaden et al 2011) This approach has highlighted a number of genomic
regions likely carrying deleterious recessives This information has immediate
application inmolecular assisted breeding and a relevant scientific interest in
thesearchofcausativevariantsandassessmentoftheirfunctions
Thedevelopmentofnewsequencingtechnologieshasgreatlydecreasedthecost
of sequencing so that nowadays the sequencing of few target individuals is
becoming the gold standard to seek for causal mutations for rare monogenic
defectsinhumansandotherspecies(ChunandFay2009)Oftenonlytheexome
the protein coding part of the genome is sequenced searching for those
mutationsthatseverelyalterproteinstructureandorfunction(Bamshadetal
2011 Li et al 2013) A typical mammalian exome consists in a few tens of
millionnucleotides(around2ofthereferencegenome)comparedtothefew
billionofthewholegenomeThereforewiththesamenumberofreadsexome
sequencinghastheadvantageofamoreaccurategenotypingdueto thehigher
sequencing depth an easier data management lower computation load and
easier interpretation (Bieseckeretal2011Majewskietal2011) Indeed the
numbervariants found inexomesareone=twoordersofmagnitude lowerthan
thatdetectedinwholegenomesandmutationsaredetectedinannotatedgenes
orimmediatelynearbyOntheotherhandtheexomeapproachcanonlyidentify
variantson annotatedgenesVariants inunknownexonsor that occur innon=
coding regulatory features will be missed (Bamshad et al 2011) In humans
these are turning out to be of outmost importance for gene regulation and
downstreamphenotypic effects (ENCODEConsortium2012) but they are still
verypoorlyannotatedinlivestock
102
Herewesequencedtheexomesof20Holsteinbullsprogenytestedselectedfrom
the tails of male and female fertility EBVs (Estimated Breeding Value) Since
livestockfertilityisacomplextraitandcausativemutationscannotbesoughtby
the approach used in monogenic diseases we integrated the sequence
information with high density SNP chip data obtained on a much larger
populationofItalianHolsteinbullstodetectgenescarryingallelescandidateto
be deleterious recessives and confirm their likely deleterious effect by
comparativegenomics
Materialsandmethods
1SamplingandDNAextraction
Semen(straws)andbloodsampleswereprovidedbyItaliancertificatedartificial
insemination centres (UE Directive 88407CEE) thus ethics committee
approvalforthisstudyisnotrequired
A total of 20 ItalianHolstein progeny tested bullswere sampled for complete
exome sequencing Among them 19were chosen from the tails of female and
male fertilityEBV(EstimatedBreedingValue)distribution (Table1) tohave5
animals fromeach tail One animalwas to be in theminus=variant tail of both
traitsBullswereselectedasunrelatedaspossibleSamplingwaspossiblethanks
tothecollaborationwithANAFI(ItalianHolsteinBreederAssociation)
ANAFIdevelopedageneticevaluationfor fertilitycombiningdifferenttraitsas
explained by Biffani and colleagues (Biffani et al 2005) The traits are ldquodays
from calving to first inseminationrdquo ldquocalving intervalrdquo ldquofirst=service non return
rateto56daysrdquoldquoangularityrdquoandldquomatureequivalentmilkyieldat305daysrdquo
MalefertilityisexpressedasanEstimatedRelativeConceptionRate(ERCR)and
isameasureofconceptionrateofaservicesirerelativetoservicesiresofherd
mates(Soggiuetal2013)
103
Table1FemaleandmalefertilityEBVsofthe20animalssequenced
Animal EBVFemaleFertility EBVMaleFertility GroupfriA01 97 =111 NAfriA02 98 =166 MalLfriA03 115 0 FemHfriA04 114 0 FemHfriA05 114 0 FemHfriA06 86 0 FemLfriA07 98 195 MalHfriA08 105 213 MalHfriA09 95 =154 MalLfriA10 95 149 MalHfriA11 102 264 MalHfriA12 91 =475 FemLMalLfriA13 105 =179 MalLfriA14 102 207 MalHfriA15 99 =405 MalLfriA16 85 0 FemLfriA17 87 0 FemLfriA18 121 0 FemHfriA19 117 0 FemHfriA20 88 0 FemL
A total of 1009 Italian Holstein bulls were chosen from the entire population
balancing theirpresence in the tailsof theEBV forkgprotein (PROT) somatic
cell count (SCC) fertility (FERT) and functional longevity (LONG) for high
densitygenotyping(TableS1)
DNAwasisolatedfromwholebloodandsemenstrawssamplesinLGSlaboratory
(CremonaItalywwwlgscrit)usingtheGenomixkitaccordingtoLGSinternal
protocolDNAsforexomecaptureandsequencingwereextractedindependently
tosatisfy therequirement indicatedby IGA technologies (Udine Italy)at least
12mgofDNAataconcentration100ngulwithR260280higherthan18and
R260230near19
2Moleculardataproductionandqualitychecking
21Exomecaptureandsequencing
104
Exomes were captured using the ldquoSureSelected All exon Bovinerdquo (Agilent
Technologies) kit in early version Probes coordinates are based on UMD31
bovinereferencesequence(Ziminetal2009)anddesignedagainstNCBIexons
as well as predicted exons UTRs and miRNAs Number and distribution of
probesperchromosomearedetailedinTableS2
rdquoOvation Ultralow Library Systemsrdquo protocol was followed for library
preparation After quantification with Qubit (Life technologies) and quality
controlwith2100Bioanalyzer(AgilentTechnologies)DNAwassonicatedwith
Bioruptor (Diagenode) to obtain fragments between 150 to 200 bp in length
blunt end=repaired adenylated ligated with paired=end adaptors and purified
followingtheprotocolprovidedbythemanufacturer
LibrarieswerehybridizedwithldquoSureSelectedAllexonBovinerdquoprobesusingthe
Encore Target Capture Module (Nugen) SureSelect Target Enrichment and
Herculase II FusionEnzymewithdNTPCombo (AgilentTechnology) kit Then
capturedDNAwasamplifiedandtaggedwithlibraryspecificindexesQualityof
capturedsampleswastestedwithQubitand2100Bioanalyzer
Paired=end 2X100bp sequencing was performed on Illumina HiSeq2500
(IlluminaSanDiegoCA)atanexpectedcoverageof50X
22Sequencealignment
Adapters and reads shorter than 36bp were trimmed using Trimmomatic
(version032)(Bolgeretal2014)ForeachFastqfilethealignmentandpairing
of the reads to theUMD31 reference sequencewas performedwithBurrows=
WheelerAligner(BWA)(version077=r441)(LiandDurbin2009)applyingalso
trimmingoption (=q5) SAM files resultedwere converted intoBAM file using
SAMtools (version0118) (Lietal2009)Foreach individualBAM fileswere
merged in a single one with Picard (version 1111)
(httpsgithubcombroadinstitutepicard)Duplicatereadswereidentifiedand
markedwithPicardFinallyindividualBAMfileswereindexedwithSAMtools
105
Sequencedepthwasevaluatedsimultaneouslyonall20sequencedanimalsand
onlyonregionstargetedbyprobesusingDepthOfCoverageofGATK(McKennaet
al2010)
23ExomeVariantcallingandqualitycheck
Singlenucleotidepolymorphisms(SNPs)insertionsanddeletions(InDels)were
calledsimultaneouslyonall20sequencedanimalsandonlyonregionstargeted
byprobesusingUnifiedGenotyperofGATK(DePristoetal2011)
All variantswere filteredusingVariantFiltration ofGATK to removeoneswith
lowmappingandorsequencingqualityThechoiceoffiltersandthresholdswas
made in accordance to ldquoGATK best practicerdquo for exome sequencing and
distributionoffiltervalues
bull ldquoSNPclusterrdquoexcludedvariantifmorethan3variantswerefoundin10
bp
bull ldquoBaseQualityRankSumTestrdquoexcludedifitrsquoslowerthen=5ormorethan
5
bull ldquoMappingQualityRankSumTestrdquoexcludedif itrsquoslowerthen=7ormore
than6
bull ldquoRead Position Rank Sum Testrdquo excluded if itrsquos lower then =4 or more
than4
bull ldquoFisherStrandrdquoexcludedifitrsquosmorethan30
bull ldquoDepthrdquoexcluded if itrsquos lower then60X(ieaminimumvalueof3Xper
animals)ormorethan3500X
bull ldquoQualityrdquoexcludedifitrsquoslowerthen40
bull ldquoMappingqualityrdquoexcludedifitrsquoslowerthen30
Filters were applied to all variants but only bi=allelic SNPs were used in the
following analyses Genotypes with ldquoGenotype Depthrdquo lower than 5 and
ldquoGenotype Qualityrdquo lower than 20 were discarded The remaining ones were
converted intoldquoABrdquocode(A=referencealleleandB=alternativeallele)Then
data were transformed in PLINK (Purcell et al 2007) format using a home=
made python script To prepare the working dataset the final quality control
106
removedvariantswithmore than25missingdata (that iswithmore than5
animals out of 20 without genotypes) monomorphic (minor allele frequency
MAFlt1)ornotmappinginautosomesBasicpopulationgeneticsparameters
werecalculatedtoevaluatethepresenceofcrypticstructureinthedatathatmay
influencetheoutcomeofthedownstreamanalyses
In total 24390 SNPs identified in the exomes were also included in the HD
SNPchip panel HD genotype from nineteen out to twenty of the sequenced
animals sequenced was used to compare technologies Genotypes at common
locationswerecomparedacrosstechnologies toevaluatesequencequalityand
theeffectofsequencedepthongenotypecalling
24BovineHDGenotypingBeadChipdataandqualitycheck
AllanimalsweregenotypedusingtheBovineHDGenotypingBeadChip(Illumina
San Diego CA) Genotypes were quality controlled and filtered with standard
threshold Markers with more than 5 missing data and MAF lt 1 were
excluded Animals with more than 5missing data were also excluded Only
autosomal SNPs were retained Thereafter genotypes were phased and the
remainingmissingdata imputedusingBeaglev332(BrowningandBrowning
2007)
Ofthe20animalssequenced12hadalsoBovineHDGenotypingBeadChipand7
BovineSNP50 Genotyping BeadChip version 2 genotypes available For
concordance and haplotype analysis we imputed the 50K data to 800K as
previouslydescribedabove
3Strategyfordetectingcandidatedeleteriousvariants
Wesearchedforcandidatedeleteriousvariantsfollowinganintegratedapproach
that joined the analysis of i) exomes of a few Italian Holstein bulls having
plusvariant and minusvariant EBVs for a target trait ii) HD genotyping of a
population of 1009 progeny tested Italian Holstein bulls iii) the comparative
genomic analysis of 100 vertebrate species The strategy followed for the
107
analysisisshowninFigure1Inthefirststep(Search)welookedforcandidate
deleterious variants in the exomes In step2 (Filter)we filtered candidatesby
assessingwhichof thesevariantseitherhadanon=randomdistributionamong
animals having extreme EBVs or co=mapped with genomic regions carrying
outliersignals inthelargepopulationofprogenytestedbulls Instep3(Asses)
candidateswere further evaluated by i) assessing their conservation across a
panel of 100 vertebrate species ii) searching for associated haplotypes and
evaluating their frequencies in the population of progeny tested bulls iii)
evaluatingtheirassociationwithgeneticdefectsinhumanandmodelspeciesiv)
examining their expression profile and function when known Methods are
belowdescribedindetail
Figure1Strategyfollowedtoidentifydeleteriousvariants
108
31Step1Searchforcandidatedeleteriousvariants
Quest for candidate deleterious variants in exome sequence data by
variantannotation
WeannotatedallvariantsusingANNOVAR(Wangetal2010)andVEP(Variant
Effect Predictor) (McClure et al 2014) software They use the Ensembl gene
annotation database (UMD31 reference sequence release 74) to investigate
locationandpotentialeffectoncodingsequencesofallvariantsVEPintegrates
the SIFT algorithm (Kumar et al 2009) to predict the effect of amino acid
substitutiononproteinfunctionForeachsequencechangesequencehomology
and physic=chemical similarity between the reference amino acid and the
alternative is considered providing a score that evaluates the mutation
tolerabilityVariantswithscorelowerthan005wereclassifiedasldquodeleteriousrdquo
inaccordingtosoftwarerecommendations
SIFTalsocomparedallvariants found in theexomeswith thosepresent in the
dbSNPdatabase(version140)toidentifythosethatwerenovel
32Step2Filteringofcandidatedeleteriousvariants
2a Analysis of deleterious variant distribution among animals having
extremesEBVvaluesformaleandfemalefertilitytraits
Association between variants and the classification in the tails of the EBV
distributionwas estimated byFisherexact testwhich compares frequencies of
alleles (a vs A) in the two tails not assumingH=W equilibriumgenotypic test
that compares the frequency of genotypes (aa vs Aa vs AA) and the recessive
geneactiontestwhichtestsforthedistributionofonehomozygousclassversus
theothers(aavsaAAA)AlltestsareimplementedinPLINKfunctionsThe5
significance threshold was set empirically by permutation (N=100000) either
point=wise that is not corrected for multiple testing (EMP1 = Empirical
Probability1)andthereafterfamily=wisecorrectedformultipletesting(EMP2=
EmpiricalProbability2)
109
GenesexceedingEMP1were considered suggestive and thoseexceedingEMP2
significant candidates to be deleterious recessives and their characteristics
conservationacrossspeciesandfunctionfurtherinvestigated
2b Quest for regions candidate to carry deleterious variants in the
populationanalysedwiththeHDBeadChip
Basic population genetics parameters and Multi Dimensional Scaling were
calculated with GenABEL R package (Aulchenko et al 2007) to evaluate the
presenceofcrypticstructure inthedatathatmayinfluencetheoutcomeofthe
downstreamanalyses
ExpectedandobservedallelicandgenotypicfrequenciesandFisherexacttestof
Hardy=Weinberg proportions (Janis E Wigginton 2005) were calculated with
PLINKon theHDSNPdatasetValueswere averaged in slidingwindowsof10
markersThisvalue(10markers)waschosenaftertestingdifferentwindowsize
as represent a good compromise between noise reduction and resolution (not
shown)
Candidate regions containing deleterious mutations were identified following
twodifferentapproaches
In one approach hereafter refereed to as ldquoOut of Hardy=Weinbergrdquo approach
(OHW)markerssignificantlydeviatingfromofHardy=WeinbergequilibriumatP
le84710=8 (nominalvaluePle005Bonferronicorrected formultiple testing)
werefirstidentifiedWindowswithatleastthreesignificantmarkersbelowthis
thresholdandhavingahomozygotedeficitwereconsideredcandidatestocarry
deleterious mutations In this OHW approach we applied very stringent
constraints to reduce false positive discoveries due to the Wahlund effect
inducedbypopulationsubstructure(seeresultsandFigureS5)
In thesecondapproachhereafterreferred toas ldquoLackOfHomozygotesrdquo (LOH)
approach the differences between observed and expected frequencies of
homozygotesfortheminorallelewerecomputedandaveragedacrossmarkers
included in each sliding window Values were standardized and only regions
carrying three overlapping windows below =3Z score value were considered
110
candidates to carry deleterious mutations Thereafter significant windows
carryingatleastonemarkerincommonwerejoinedtodefineregionboundaries
ExomesequencingknowledgeandHDBeadChipdatawerecrossed inorder to
identifygenescarryingseverevariantslocatedwithinsignificantregionsortheir
proximity (50Kb upstream or downstream) These genes were considered
candidates to be deleterious recessives and their characteristics conservation
acrossspeciesandfunctionwasfurtherinvestigated
33Step3Assessmentofcandidatedeleteriousvariants
Genes carrying variations classified as deleterious by SIFT or ANNOVAR and
eitherassuggestivebyfertilityEBVtailsanalysisorassignificantbyOHWeLOH
analyses of the Holstein bull population were submitted to comparative
genomics population genetics and functional analyses In order to prioritize
genesvariants discoveredwe scored each evaluation using a subjective scale
ranging from =3 to 3 to measuring the evidence in favour (scores 1 to 3) or
against(scores=1to=3)thedeleteriouseffectoftheSNPsThescore0inallcases
indicated the absence of evidence or that a specific assessment failed The
completescorescalethereforerangedfrom=9to9
3aComparativegenomics
For comparative genomics purposes the position of eachmutationwas ldquolifted
overrdquo from UMD31 bovine coordinates to hg19 human coordinates using the
specifictoolatUCSC(Kentetal2002=httpgenomeucscedu)Orthologous
proteins were aligned using the UCSC genome browser and amino acid
conservation assessed in a set of 100 vertebrate species that included human
primates (11 species) euarchontoglires (egmouse14 species) laurasiatheria
(egsheep25species)afrotheira(egelephant6species)othermammals(eg
platypus5species)birds(egchicken14species)sarcopterygii(egturtle8
species) fish (eg zebrafish 16 species) Results of amino acid conservation
analysiswererecordedonlyifhomologousproteinfromatleast50speciescould
beretrievedandalignedThereafteraminoacidconservationwasscored3ifthe
111
aminoacidwasconservedinallvertebrates2ifconservedinallmammalsand1
ifconservedinthelargemajority(atleast90)ofmammalsAscoreof=1was
given when the amino acid was not conserved among species and a score =3
when both variants observed in cattle were carried by other species Finally
wheneithertheliftoverprocessortheproteinalignmentfailedthecomparison
wasconsiderednoninformativeandgivenascore0
3bSearchofhaplotypesassociatedtodeleteriousvariantsandanalysisof
HDpopulationdata
To acquire further evidence we searched for the haplotypes that carry the
deleterious variations and estimated their frequency in the
homozygousheterozygousstatuswithinthebullpopulationgenotypedwiththe
HDBeadChip
Weusedamultistepapproachtofindhaplotypescarryingdeleteriousvariantsi)
from the phased HD BeadChip data we extracted the haplotypes of the 20
sequencedanimalsintheregionofthevariant(100markersoneachside)ii)in
these 20 animals we searched the shorter haplotype able to distinguish
homozygous andor heterozygous animals for the deleterious variant from all
other haplotypes iii) we assessed the frequency of homozygotes and
heterozygotes for thehaplotype in thewholepopulationofbulls characterised
withtheHDBeadChip
Scoresinthiscasewere=3inthecaseofuniquehaplotypesshowinganumberof
homozygotesnonsignificantlydifferentfromthenumberexpectedfromHardy=
Weinbergproportions3inthecaseofuniquehaplotypesshowinganumberof
homozygotes significantly lower than expected 1 in the case of unique
haplotypesneverhomozygousandrare0whenevernouniquehaplotypecould
befound
3cFunctionalanalyses
With PantherDB (Mi et al 2010) resources we classified our gene list into
functionalcategoriesaccordingtotheGeneOntologyclassificationAllsuggestive
112
andcandidateEnsemblGeneIDsubmittedtothebulkanalysiswererecognizedand annotated Beyond the GO term characterization in order to understandpotentialdetrimentaleffectsofmutationsandalsopinpointpotentialcandidatesfor fertility traits we retrieved information for each gene according to thefollowing parameters i) the involvement in known Mendelian or complexdiseases fromOMIMdatabase ii) theavailabilityofknock=outmice iii) and theeventual expression in reproductive tissues from the Gene Expression Atlas(httpwwwebiacukgxa)(Table4)
Alsoforthesedataweusedthesamescaleattributingarangefrom=3pointstogenes having no relation with fertility viability and genetic defects and norelevant phenotypic effects in knock=out mouse models to 3 to gene closelyrelated to fertility and viability associated to genetic defects in human andaffectingfertilityorsurvivalwhenknocked=outInparticularascoreranging=1to+1wasattributedtogenesassociatedtohumansyndromes(=1=noassociationrecordedinOMIMdatabase0=noinformationinOMIM1=associationinOMIM)A score with same range to information collected from knock=out mousephenotypes(=1=knock=outmouseshowsnoimpairedphenotype0=noknockoutmouse has been produced 1= knock=out mouse shows impaired phenotype)Thesamescorewasattributedtomolecularfunctions(=1=functionunrelatedtofertilityandviability0=functionindirectlyrelatedasimmunityandmetabolism1=functiondirectlyrelatedasfertilityanddevelopment)
Results
Exomecaptureandsequencing
A totalof225414probeswereused for capturing thecattleexome (5355Mbabout 2 of UMD31 reference genomes) Mitochondrial (MT) and Ychromosome were not included in probe design The number of probes perchromosomespannedfrom2500onBTA27toalmost12000inBTA3(TableS2)
113
The average exome sequencing depth was 4058 slightly lower than the
expected50XAllanimalshadatleast24Xaveragegenomessequencedbutina
same animal the sequencing depth varied across chromosomes and regions
withinchromosomesThelowestchromosomeaveragevalueobservedwas10X
inBTA25oftheshallowersequencedanimalAveragecoverageperanimaland
chromosomeandoverallmeansareshowninFiguresS1
In total1613461402 raw100bppair=ends sequence readswereproducedA
total of 1425514112 100bp reads were mapped and properly paired to the
reference genome Paired reads per animal ranged between 41973510 and
138840392
Insynthesissequencingdepthwasvariableasexpectedandalmostallanimals
hadsomeregionsofscarceornosequencedepthbutoverallallexomeregions
were sufficiently covered to permit a comprehensive analysis of molecular
variantsoftheanimalsinvestigated
Step1Searchforcandidatedeleteriousvariants
Variantdetectionandannotation
Thesequencingdepthofallvariantscalledonthealignmentofallreadsacross
animalsfollowedaChi=squaredistributionAveragedepthwas79694spanning
from2Xto5000Xwithpeakfrequencyclassat150=250X(FigureS2)
A multistep approach was followed to clean the exome sequence dataset
Cleaning process steps and the number onmarkers retained in each step are
describedinTable2
114
Table2Typeandnumberofmarkersdetectedandretainedafterdatasetcleaning
TypeofvariationNumberofvariations(N)
Rawdata
Aftervariantcleaning
Aftergenotypecleaning(onlyautosomal)
Complex 1213 963 0Deletion 6938 5031 0Insertion 5510 3656 0MultiallelicComplex 5710 3995 0MultiallelicSNP 803 512 0SNP 242988 203079 164277Total 263162 217236 164277whencomparedtothereferencegenome
Atotalof263162variantswereidentifiedAmongthese217236(8255)wereretainedafterapplyingthevariantfilters(seematerialsandmethodssection)
According to GATK classification the vast majority (9348) of variationsidentified were biallelic SNPs followed by insertions and deletions (InDels40) more complex variations and multiallelic SNPs (252) Indels andcomplexvariationsalthoughlessfrequentthanSNPsaffectedarelevantportionoftheexome(49589bp)Thenumberofvariantsdetectedperchromosomewassignificantly correlated to the number of probes used (r=08761P=228x10=10)andofbasescaptured(r=08424P=53x10=19)perchromosomehoweverBTA23BTA27 BTAX had an outlier behaviour the former two containing highervariation(450and445variantsperKb)andthelatterlower(084variantsperKb)thantheaverage(322variantsperKb)
Bialleic SNPswere further filtered for genotype quality (sequence depth gt 5Xandsequencequalitygt20)andBTAXmarkersThefinalautosomalSNPdatasetcounted164277SNPsAmongthese17829biallelicSNPs(109)describedforthefirsttimehavebeensubmittedtodbSNP
Variantsrsquo annotation is shown in Table 3a Variantswere detected in differentgenedomains(exonsintrons3rsquountranslatedregions)aswellasupstreamanddownstreamgenesMorethan31ofthevariationsweredetectedinexonsandamong them more than 41 have the potential to have an effect on genefunctionbyaffectingsplicingoralterproteinstructure(Table3b)
115
Table 3 A) number of autosomal SNPs annotated in each gene region B) number and
classificationofautosomalSNPsalteringgenefunction
A)
Generegion NofSNPs(autosomes)Downstream 1170Exonic 51293Intergenic 49901Intronic 58116ncRNA_exonic 160ncRNA_intronic 3ncRNA_splicing 2Splicing 184Upstream 1445Upstreamdownstream 54UTR3 1345UTR5 604Total 164277
B)
EffectofexonicSNPs NofSNPs(autosomes)Nonsynonymoussinglenucleotidevariation 20839Stopgainsinglenucleotidevariation 213Stoplosssinglenucleotidevariation 10Synonymoussinglenucleotidevariation 30226Unknown 5Total 51293
whencomparedtothereferencegenome
Amongthe51293exonicSNPs4166in2978geneswereclassifiedldquodeleteriousrdquo
by SIFT Among these 3210 in 2356 genes were observed at least twice
ANNOVARidentified223stopgain(213)orloss(10)variantsin217genes
Theaverageconcordanceat24390SNPs in commonbetweenexomesand the
HD Beadchip was 9716 (Table S3) Discordance was due to either alleles
detected by sequencing andnot detected by genotyping (exome=heterozygote
Beadchip=homozygote) and vice=versa (exome=homozygote
Beadchip=heterozygote) Discordances were randomly distributed among
markers and inversely correlated to sequencing depth (Figure S3) indicating
that most errors are likely occurring during sequencing even if occasional
116
Beadchip failure indetectingsomeallelecannotbeexcludedTwoanimalshad
anoutlierbehaviouronehaving846andthesecond719concordanceBoth
animalswereretainedforvariantdiscoverywhiletheywereexcludedfromthe
analysesofEBVtailsWithoutthesetwooutlieranimalsthemeanconcordance
increasesto9938
Multidimensional scaling confirmed the absence of a substructure in animals
sequenced(FigureS4)
Step2Filteringofcandidatedeleteriousvariants
AnalysisofEBVtails
Fisherexact test isgenerallyused to test for theassociationbetweenavariant
and a Mendelian disease Samples are divided in cases and controls and the
association between alleles or genotypes and the disease is tested under
different inheritance models Here we used the same method to test for
association between allelesgenotypes and animals from opposite tails of the
EBVdistributionfori)malefertility(N=4high+4low)ii)femalefertility(N=5
high+4low)andiii)maleandfemalefertility(N=9high+8low)
Genotypes were tested under all inheritance models implemented in PLINK
Giventhelownumberofsamplesnovariantwassignificantaftercorrectionfor
multiple comparisons (EMP2) and only suggestive variants could be found
(significant at 5 at EMP1)We considered suggestive of carrying deleterious
allelesalistof101genescarrying112variants(TableS4)
Detectionofregionscarryingdeleteriousmutationfrom800KSNPdata
After applying quality control parameters 26653 and 146991 markers were
removed from the original dataset for excessive missing data and MAF
respectivelyInaddition17individualsremovedforlowgenotypingrateAswith
exomes only autosomal SNPswere retained for the analyses thus obtaining a
workingdatasetof590680SNPsand992animals
117
MDS(FigureS5)analysis indicates that thedatasethassomesubstructure that
mightaffectdownstreamanalysesWeaccountedfortheriskoffindingmarkers
outofHardy=WeinbergproportionsduetotheWahlundeffectbychoosingvery
stringentsignificancethresholdintheOHWanalyses
To detect regions candidate to carry deleterious recessivemutations we used
two approaches the ldquoOutside Hardy Weinbergrdquo (OHW) and the ldquoLack Of
Homozygotesrdquo (LOH) approaches (see materials and methods) The two
approaches complement each other in the detection of regions carrying less
homozygotes than expected OHW identifies regions in which a few markers
have extreme homozygote deficit while LOH detects regions in which many
markershaveamoderatedeficit
In the SNP dataset 1884 markers were significantly out of Hardy=Weinberg
equilibrium(Ple84710=8)Atotalof485windowswereidentifiedwithatleast
3 markers out of 10 exceeding this threshold Among these 84 contained
markerssignificantbecauseofhomozygotedeficitThese84windowsidentified
11OHWcandidateregionsin9chromosomes(TableS5)
TheLOHapproachidentified2335windowsbelow=3Zscoreofthestandardised
differencebetweenobservedandexpected frequencyofhomozygotes forMAF
Theseidentified326LOHcandidateregionsspreadonallautosomes(TableS6)
Onlyregionsformedbyatleast3singlewindowswereretainedreducingLOH
candidateregionsto240
Deleterious variants (SIFT deleterious and stop codon gainedlost) were
intersectedwithsignificantOHWandLOHregionSevenofthesemappedwithin
OHW sliding windows 36 within LOH sliding windows Among these 5 were
located in regions significant in both OHW and LOH Other 51 deleterious
variants mapped in the immediate vicinity (+=50Kb) of significant windows
These 83 variants affect 66 unique genes candidate to carry deleterious
recessivealleles(TableS7)
Five regionswere common toOHWandLOHoneonBTA4 (LOH=74959610=
75151417 OHW=74970653=75151417) two on BTA7 (LOH=9626435=
10099512 OHW=9628735=10099512 LOH=10575691=11171132
118
OHW=10486424=11087441) one on BTA15 (LOH=80299924=80928311
OHW=80299924=80921274) and one on BTA18 (LOH=28122373=
28185525OHW=28106022=28164693)
Step3Assessingcandidatedeleteriousvariants
After the two filtering step previously described a total of 191 SNPs were
retained out of the 3433 classified as deleterious by SIFT and Annovar and
analysed further These are carried by 163 genes Eighteen genes carried 2
candidateSNPsthreegenes3SNPsandonegene5SNPsInaddition4SNPsin
four geneswere identified as potential candidates in both the analysis of EBV
tailsandofbullpopulation
A totalof12SNPs introducedstopcodonswhile the remaining inducedamino
acid substitutionsor splicing variants In the filtereddataset theproportionof
stop codons on total variation (63) was only slightly higher than the
proportion found in the entire dataset (51 225 stop codon out of 4391
deleteriousSNPs)
Comparativegenomics
Proteinmultiplealignmentwassuccessfulin146outof191comparisonsIn26
casesthemostfrequentaminoacidincattlewasveryhighlyconservedeitherin
allvertebrates(N=9)orinallmammals(N=17)Inother23casestheaminoacid
was conserved in at least 90 of mammalian species In the last class were
generally included amino acids that changed in mammalian species
phylogenetically verydistant from cattle (eg platypus and related species) In
97 instances the amino acid could change quite freely and occasionally both
variantsobservedincattlewerecarriedbyotherspecies
HaplotypeanalysisofHDpopulationdata
The search for haplotypes associated to the191 variants for further testing in
the population identified 101 unique and invariant haplotypes common to all
119
carriersofatargetvariantanddifferentfromallthosefoundinnoncarriersIn
additional36casesanoncompletelyinvarianthaplotypecouldbeassociatedto
themutationConverselyin54instancesareliablehaplotypecouldnotbefound
(table S8) In total 38 haplotypes had no homozygous individuals in the
population Thiswas expected in the case of 33 rather rare haplotypes (fewer
than 100 heterozygotes observed in the population) Conversely 5 haplotypes
were rather frequent (from 170 to almost 300 heterozygotes observed in the
population)andtheexpectationwastofind78to222homozygousindividuals
in thepopulation insteadof thenone foundAnothervarianthadaremarkable
high frequency (35) and a clear outlier behaviour Only 11 homozygous
individualswereobservedoutofthe121expected
Functionalanalysisofcandidategenes
To assess the potential deleterious effects of the variants identified we also
retrieved functional information for each gene according to the following
parametersi)associationtoknownMendelianorcomplexdiseaseslistedinthe
OMIM database ii) availability and effect of gene knock=out in mice iii)
expression profileswith particular attention in reproductive tissues from the
GeneExpressionAtlas(Table4)
Twenty genes out of 163 have a recognized phenotype in humans These are
mostlyseveresyndromesUnfortunatelyonlyafewgeneshaveanull=miceline
producedwithphenotypicdataavailable
According to the expression data available from Ensembl these genes are
generally (with someexceptions)expressed inavarietyof tissueswitha clear
tendencytobehighlyexpressedintestis
120
Tableamp4Informationon163genesanalyzedconcerningeventualphenotypesinmanorknockoutsinmouseHeaderampcodesAbrainBcolonCheartDkidney
EliverFlungGskeletalmuscleHspleenItestisColoursrepresentthegenelevelofexpressionindifferenttissuefromlow(whitegrey)tohigh(blue)NCin
FunctioncolumnmeansldquoNotClassifiedrdquo
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000000079
CCSAP
Cells
9
1
03
1
07
2
06
2
2
NC
ENSBTAG00000000149
USP40
Cells
6
7
2
7
5
7
2
8
9
fertility
ENSBTAG00000000414
FUT5
Nomatch
37
1
cellular
process
ENSBTAG00000000918
ERMARD
Periventricular
nodularheterotopia
6Xlinked
periventricular
heterotopia6q
terminaldeletion
syndrome
Periventricular
nodularheterotopia
Nomatch
5
4
4
9
6
7
4
9
17
cellular
process
ENSBTAG00000000998
SLC39A6
Cells
6
9
2
7
4
10
1
10
9
cellular
process
ENSBTAG00000001133
VWA8
Mice
2
3
6
7
10
3
3
4
4
blood
metabolism
ENSBTAG00000001140
IAH1
httpswwwmousephenotypeor
gdatagenesMGI1914982sect
ionassociations
BOTHSEXadiposetissue
behaviourneurological
growthsizebodyFEMALE
homeostasismetabolismMALE
limbsdigitstailskeleton
9
18
30
60
23
19
12
17
30
cellular
process
ENSBTAG00000001262
IRGQ
Cells
6
1
5
3
06
2
5
04
2
immunity
ENSBTAG00000001286
ELMO3
httpswwwmousephenotypeor
gdatagenesMGI2679007sect
ionassociations
Decreasedstartlereflex
02
27
14
7
7
02
1
fertility
ENSBTAG00000001291
OR8H3
Nomatch
fertility
ENSBTAG00000001500
FIGNL1
Cells
08
2
01
05
07
08
04
2
3
cellular
process
ENSBTAG00000001505
GRK4
Cells
10
7
09
4
6
3
2
6
218
signalling
ENSBTAG00000002220
SPTLC1
Hereditarysensory
andautonomic
neuropathytype1
No
17
27
3
28
12
28
5
19
6
metabolism
ENSBTAG00000002288
NT5DC4
Nomatch
02
cellular
process
ENSBTAG00000002289
NLRP14
No
04
cellular
process
ENSBTAG00000002660
CCDC11
Situsinversustotalis
Nomatch
3
1
06
2
07
3
03
2
54
fertility
121
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000002809
CAPN10
AutosomalRecessive
MentalRetardation
DIABETESMELLITUS
NONINSULIN
DEPEND
ENT1
Cells
2
13
41
33
29
immunity
ENSBTAG00000002966
DNAJC13
Youngadultonset
Parkinsonism
No
5
52
94
62
84
fertility
ENSBTAG00000003231
LIM2
Pulverulentcataract
Cells
01
02
01
02
cellular
process
ENSBTAG00000003508
CFAP69
httpswwwmousephenotypeor
gdatagenesMGI2443778sect
ionassociations
Abnormalcirculatinginsulinlevel
MaleInfertilityDecreasedmean
plateletvolum
eIncreasedleanbody
mass
203
07
1
01
7NC
ENSBTAG00000003523
UGT2B15
Nomatch
34
metabolism
ENSBTAG00000004081
FAT3
No
2
03
01
01
05
signalling
ENSBTAG00000004555
LRP2
DONN
AIBARROW
SYND
ROME
Intellectualdisability
Cells
56
3
01
disease
ENSBTAG00000004772
THEM
4
Cells
8
16
295
27
52
63
metabolism
ENSBTAG00000004860
SLC27A6
Cells
6
02
10
602
207
01
immunity
ENSBTAG00000005021
SEMA5A
Monosom
y5p
Cells
9
04
89
13
08
08
6immunity
ENSBTAG00000005170
GPR114
Mice
101
02
01
09
6
01
fertility
ENSBTAG00000005248
OFCC1
No
01
01
03
01
01
01
03
NC
ENSBTAG00000005340
SERGEF
Cells
7
10
35
15
58
23
cellular
process
ENSBTAG00000005357
VPS11
Mice
14
13
619
717
815
15
cellular
process
ENSBTAG00000005466
C8H8orf58
Nomatch
05
01
08
02
103
02
04
NC
ENSBTAG00000005514
TRIO
Cells
17
66
10
210
88
7immunity
ENSBTAG00000005633
RGNEF
httpswwwmousephenotypeor
gdatagenesMGI1346016sect
ionassociations
Vertebralfusion
36
214
05
801
08
1cellular
process
ENSBTAG00000006021
CEP250
Mice
3
22
43
41
88
metabolism
ENSBTAG00000006027
USP34
Mice
11
63
74
88
912
fertility
ENSBTAG00000006532
CDK5RAP2
Autosomalrecessive
primary
microcephaly
httpswwwmousephenotypeor
gdatagenesMGI2384875sect
ionassociations
skeletonimmunesystem
growthsizebodyhem
atopoietic
system
5
34
92
74
784
developm
ent
ENSBTAG00000007001
SLC26A1
No
03
01
02
32
903
01
07
metabolism
ENSBTAG00000007340
LOC101906349
Nomatch
NC
ENSBTAG00000007568
MEPE
Cells
cellular
process
ENSBTAG00000007910
EXOC7
Cells
13
24
23
22
12
22
15
22
28
cellular
process
122
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000008650
CAMK1D
Cells
24
2
01
06
1
2
03
7
3
signalling
ENSBTAG00000008797
Uncharacterized
Nomatch
09
04
07
04
NC
ENSBTAG00000008871
DDX4
Cells
02
01
01
132
cellular
process
ENSBTAG00000008996
SFI1
Mice
2
1
06
3
04
3
2
5
NC
ENSBTAG00000009014
UPK1B
Mice
2
2
2
5
4
4
2
7
3
fertility
ENSBTAG00000009030
NOX5
Nomatch
04
01
8
01
immunity
ENSBTAG00000009033
TEX37
httpswwwmousephenotypeor
gdatagenesMGI1921471sect
ionassociations
Abnormalstartlereflex
03
83
NC
ENSBTAG00000009064
CSF2RB
Congenitalpulmonary
alveolarproteinosis
Mice
01
2
04
06
2
23
03
17
05
signalling
ENSBTAG00000009144
BPIFA2A
Nomatch
6
06
05
metabolism
ENSBTAG00000009337
PNMAL1
Cells
8
05
01
3
02
03
03
04
3
Immunity
ENSBTAG00000009444
SLC15A5
Cells
01
cellular
process
ENSBTAG00000009568
KANK2
Cells
1
17
30
9
3
31
9
10
3
cellular
process
ENSBTAG00000009569
DOCK6
AdamsOliver
syndrome
Cells
2
5
10
12
5
15
5
6
21
metabolism
ENSBTAG00000009836
CHGA
Mice
140
555
01
2
02
01
07
1
metabolism
ENSBTAG00000010522
Uncharacterized
Nomatch
01
02
02
03
2
metabolism
ENSBTAG00000010601
DOLK
Familialisolated
dilated
cardiomyopathy
No
4
8
6
22
9
6
5
5
12
metabolism
ENSBTAG00000010847
FOXN4
Cells
01
01
metabolism
ENSBTAG00000010954
ART3
Cells
02
1
55
01
08
2
10
06
71
metabolism
ENSBTAG00000011011
SSH2
No
3
09
5
4
2
2
6
7
10
cellular
process
ENSBTAG00000011387
NAT9
Cells
6
7
3
8
6
8
4
6
6
metabolism
ENSBTAG00000011420
CA9
Cells
04
3
02
2
03
1
01
2
56
metabolism
ENSBTAG00000011433
RGP1
httpswwwmousephenotypeor
gdatagenesMGI1915956sect
ionassociations
Completepreweaninglethality
7
19
11
18
8
12
9
8
25
cellular
process
ENSBTAG00000011622
C18orf21
Mice
9
10
6
6
2
10
5
10
34
NC
ENSBTAG00000011683
NUMB
Cells
21
25
8
35
17
47
6
8
19
immunity
ENSBTAG00000011684
RAB11FIP1
Cells
6
01
8
09
11
02
1
4
NC
ENSBTAG00000012053
ACCSL
No
03
01
cellular
process
ENSBTAG00000012314
LDLR
Homozygousfamilial
hypercholesterolemia
Cells
8
33
11
21
17
24
2
15
2
cellular
process
123
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000012326
LOC101906218
Nomatch
11
01
01
09
01
NC
ENSBTAG00000012458
PSAPL1
Cells
01
NC
ENSBTAG00000012565
PCNX
No
3
32
32
33
48
developm
ent
ENSBTAG00000012833
ATG2B
No
4
21
21
21
29
NC
ENSBTAG00000012921
KIF24
Mice
05
07
03
08
02
202
214
metabolism
ENSBTAG00000013568
UEVLD
Cells
5
41
53
42
214
metabolism
ENSBTAG00000013685
KRT31
Mice
cellular
process
ENSBTAG00000013753
DDRGK1
Mice
22
32
18
56
55
25
14
22
37
NC
ENSBTAG00000013789
OR6K6
No
NC
ENSBTAG00000013838
ALLC
Mice
02
01
78
metabolism
ENSBTAG00000013916
HERC2
Mentalretardation
autosomalrecessive
38SKINHAIREYE
PIGM
ENTATION
VARIATIONIN1
Developm
entaldelay
withautismspectrum
disorderandgait
instability
No
8
74
83
74
65
metabolism
ENSBTAG00000014001
MAP1A
Cells
136
23
22
202
43
352
cellular
process
ENSBTAG00000014058
LDB2
No
28
33
12
825
410
3cellular
process
ENSBTAG00000014284
ALPK2
No
01
10
01
304
02
cellular
process
ENSBTAG00000014750
EPB41L4A
httpswwwmousephenotypeor
gdatagenesMGI103007secti
onassociations
Decreasedcirculatingglucoselevel
Abnormalconeelectrophysiology
Immunesystem
s1
22
20
07
12
13
43
NC
ENSBTAG00000014752
AKAP11
httpswwwmousephenotypeor
gdatagenesMGI2684060sect
ionassociations
Nopheno
24
72
63
306
68
cellular
process
ENSBTAG00000014768
ZNF786
Cells
3
23
42
48
33
cellular
process
ENSBTAG00000014794
SCYL3
Cells
6
10
59
613
516
26
cellular
process
ENSBTAG00000015027
Clusterin
httpswwwmousephenotypeor
gdatagenesMGI88423sectio
nassociations
Aggressionvertebralfusion
05
03
03
02
05
3NC
124
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000015210
LOXHD1
DEAFNESS
AUTOSOMAL
RECESSIVE77
Autosomalrecessive
nonsyndromic
sensorineuDominant
lateonsetFuchs
cornealdystrophy
deafnessautosomal
recessivetype77
No
3
03
NC
ENSBTAG00000015294
COX10
Leighsyndrome
Mitochondrial
complexIVdeficiency
(MTC4D)
Cells
4
6
24
5
2
3
14
3
16
cellular
process
ENSBTAG00000015335
BAI3
No
38
03
8
01
01
2
signalling
ENSBTAG00000015490
HS1BP3
Cells
1
10
5
9
4
8
2
1
37
cellular
process
ENSBTAG00000015602
C7H5orf45
Nomatch
10
cellular
process
ENSBTAG00000015839
MAP4
No
27
85
104
35
11
100
82
46
61
cellular
process
ENSBTAG00000015912
DMKN
No
01
04
03
NC
ENSBTAG00000015980
FASN
AutosomalRecessive
MentalRetardation
Cells
19
29
7
12
6
39
6
7
22
metabolism
ENSBTAG00000015985
GPR20
httpswwwmousephenotypeor
gdatagenesMGI2441803sect
ionassociations
Anatomicaldefectsvarioustissues
03
02
08
01
09
signalling
ENSBTAG00000016495
LINS
AutosomalRecessive
MentalRetardation
Cells
6
3
1
5
3
3
1
5
12
NC
ENSBTAG00000016691
LACC1
Mice
2
9
1
9
9
8
08
7
27
NC
ENSBTAG00000017096
ERICH1
Cells
6
4
2
4
2
5
2
5
13
metabolism
ENSBTAG00000017165
MATN2
No
2
7
2
20
08
3
2
2
7
cellular
process
ENSBTAG00000017418
RRP1B
Cells
6
4
4
5
3
7
3
12
159
cellular
process
ENSBTAG00000017436
ALS2CR11
Cells
01
37
disease
ENSBTAG00000017505
PAXIP1
Cells
2
3
2
4
2
4
2
3
12
signalling
ENSBTAG00000017576
GPR144
Nomatch
02
signalling
ENSBTAG00000018528
USP45
No
1
1
04
1
05
06
02
2
04
cellular
process
ENSBTAG00000018810
THBS2
INTERVERTEBRAL
DISCDISEASE
Cells
1
2
04
09
1
09
04
1
2
fertility
ENSBTAG00000019072
PSD4
Cells
7
8
1
1
7
02
6
4
cellular
process
125
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000019265
AGK
Sengerssyndrom
e
Congenitalcataract
hypertrophic
cardiomyopathy
mitochondrial
myopathy
Cells
7
36
43
52
65
metabolism
ENSBTAG00000019752
BPIFA2B
Nomatch
01
1metabolism
ENSBTAG00000019799
FCAM
R
Cells
07
01
02
01
2
7immunity
ENSBTAG00000019961
ATP4B
Cells
12
03
1
05
cellular
process
ENSBTAG00000020414
DHX37
No
3
33
73
73
413
cellular
process
ENSBTAG00000021073
KIAA1549
PILOCYTIC
ASTROCYTOM
ASOMATIC
httpswwwmousephenotypeor
gdatagenesMGI2669829sect
ionassociations
behaviourneurological
homeostasism
etabolism
hematopoieticsystem
3
04
23
01
21
04
3NC
ENSBTAG00000021233
OR5B17
No
fertility
ENSBTAG00000021375
OR10Q1
No
fertility
ENSBTAG00000021643
EFCAB12
Cells
01
1
9cellular
process
ENSBTAG00000021645
MBD4
Cells
8
43
82
51
10
6cellular
process
ENSBTAG00000022509
DNAH
9
No
02
5
05
5metabolism
ENSBTAG00000022858
LOC783561
Nomatch
fertility
ENSBTAG00000026247
PKHD1L1
Cells
02
01
cellular
process
ENSBTAG00000026323
LYSB
Nomatch
68
metabolism
ENSBTAG00000026350
SPATC1
Cells
05
06
01
01
01
39
NC
ENSBTAG00000027390
RTFDC1
Mice
28
19
21
20
922
13
20
49
NC
ENSBTAG00000031115
Uncharacterized
Nomatch
01
cellular
process
ENSBTAG00000031718
OGFR
Mice
6
17
612
517
39
1fertility
ENSBTAG00000032264
Uncharacterized
Nomatch
52
4
22
NC
ENSBTAG00000032481
DAPL1
No
1
6
07
01
13
10
cellular
process
126
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000032527
ERCC6
CEREBROOCULOFACI
OSKELETAL
SYND
ROME1
COCKAYNE
SYND
ROMEBDe
SanctisCacchione
syndromeMACULAR
DEGENERATION
AGERELATED5UV
SENSITIVE
SYND
ROME1COFS
syndromeCockayne
syndrometype1
Cockaynesyndrome
type2Cockayne
syndrometype3UV
sensitivesyndrome
Cockaynesyndrome
typeB(CSB)De
SanctisCacchione
syndrome(DSC)UV
sensitivesyndrome
(UVS)cerebrooculo
facioskeletal
syndrometype1
(COFS1)
Mice
3
205
22
207
42
cellular
process
ENSBTAG00000032821
SCEL
No
01
02
17
04
NC
ENSBTAG00000033954
PRPS1L1
No
35
metabolism
ENSBTAG00000034991
SLX4IP
Cells
6
21
32
207
33
NC
ENSBTAG00000035226
TOR1AIP1
Cells
9
12
513
816
716
83
NC
ENSBTAG00000035309
OR4F15
No
fertility
ENSBTAG00000035708
LOC527795
Nomatch
01
01
23
metabolism
ENSBTAG00000035985
LOC101909743
OR8K1
No
fertility
ENSBTAG00000036019
RIPPLY3
httpswwwmousephenotypeor
gdatagenesMGI2181192sect
ionassociations
hemopoieticsystem
01
04
01
9
01
malformation
ENSBTAG00000037399
Uncharacterized
Nomatch
2
92
4
24
05
5signalling
ENSBTAG00000038606
DBDR
No
02
01
signalling
ENSBTAG00000038831
PHYHD1
No
2
18
21
23
30
18
16
15
1metabolism
ENSBTAG00000039110
OR8U1
No
fertility
ENSBTAG00000039117
FAM179B
Cells
14
206
31
308
33
cellular
process
ENSBTAG00000039520
SIRPB1
Cells
4
08
32
516
05
12
01
signalling
127
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000040568
Uncharacterized
Nomatch
4
307
54
33
25
cellular
process
ENSBTAG00000045558
LOC100299084
Nomatch
1
22
110
10
09
52
fertility
ENSBTAG00000045588
LOC100298356
Nomatch
metabolism
ENSBTAG00000045709
LOC514057
Nomatch
fertility
ENSBTAG00000046175
Uncharacterized
Nomatch
01
01
cellular
process
ENSBTAG00000046228
Olfactory
receptor
Nomatch
fertility
ENSBTAG00000046239
C10orf12
No
03
104
07
108
06
209
cellular
process
ENSBTAG00000046285
OR8I2
No
fertility
ENSBTAG00000046360
Uncharacterized
Nomatch
NC
ENSBTAG00000046423
LOC618007
Nomatch
NC
ENSBTAG00000046527
LOC100139830
Nomatch
fertility
ENSBTAG00000046590
FAM181A
Cells
09
44
NC
ENSBTAG00000046593
Uncharacterized
Nomatch
20
metabolism
ENSBTAG00000046727
Olfactory
receptor
Nomatch
04
fertility
ENSBTAG00000047150
RTEL
Cells
4
23
64
55
616
cellular
process
ENSBTAG00000047166
SH2D7
Mice
03
101
01
1
02
02
2immunity
ENSBTAG00000047174
Uncharacterized
Nomatch
03
7105
02
01
271
05
03
NC
ENSBTAG00000047317
CL43
Nomatch
01
120
01
immunity
ENSBTAG00000047587
Uncharacterized
Nomatch
22
09
09
35
03
4
06
2cellular
process
ENSBTAG00000047661
FABP12
httpswwwmousephenotypeor
gdatagenesMGI1922747sect
ionassociations
Growthsizebody
01
02
69
metabolism
ENSBTAG00000048021
PNMAL2
No
04
03
2
03
03
02
NC
ENSBTAG00000048153
Uncharacterized
Nomatch
2
05
01
02
01
cellular
process
128
Discussion(
Deleterious recessive variants arenaturally present in outbred animal species
Theydonotcreateproblemsaslongastheyremainatlowfrequencyinrandom
mating populations Under these conditions these variants remain in the
heterozygousstateanddonotproducephenotypiceffectsHoweverassoonas
their frequency increases andmatingoccursbetween related individuals their
deleteriouseffectsbecomeapparentandhighlyaffectthewelfareandviabilityof
animalscarryingtheminthehomozygousstate
In industrialdairy cattlebreedscrossbreeding isgenerallyavoidedandbreeds
areinpracticegeneticisolatesinwhichartificialinsemination(AI)iswidelyused
for rapidly spreading thegeneticprogress in thepopulationAsa consequence
elitesireshavemanythousandandsometimeshundredsofthousanddaughters
Under these conditions mating between relatives becomes practically
unavoidable in the longterm In fact inspiteof theclosecontrolof inbreeding
andofthematingsystemgeneticdefectsregularlyappeartothesurfaceSome
examples are the Complex Vertebral Malformation (CVM) and the Bovine
Leucocyte Adhesion Deficiency (BLAD) in Holstein (Schuumltz et al 2008) and
WeaverdiseaseinBrown(McClureetal2013)Inthesecasestheeffectsofthe
recessivedeleteriousallelesareclearandmoleculardiagnostictoolshavebeen
developedtoavoidtheirfurtherspreadingandtoinitiatetheeradicationprocess
However recessive deleterious alleles may produce their effects in a more
deceitful way for example by affecting fertility inducing early lethality or
influence disease susceptibility All these effects are difficult to trackwith the
current methods of phenotype measurements but have an impact on the
economyofthelivestocksectorandonanimalwelfare
InthispaperwesoughtfordeleteriousrecessiveallelesinItalianHolsteincattle
This breed is recently experiencing a decrease in the genetic trend of fertility
traits (Pryce et al 2014) therefore we started our search from the exome
sequences of 20 sires 19 of which having extreme EBV values for male and
femalefertility
130
The choiceof sequencingexomeswasa compromisebetween completenessof
informationaccuracyofvariantandgenotypedetectionandcostExomesclearly
give incomplete information since they miss all variation in regulatory sites
outside gene boundaries (ENCODE Consortium 2012) These may be very
relevantforthecontroloftraitvariationbuttheyarestillverypoorlyannotated
in livestock On the other hand sequencing exomes yields millions instead of
billionsbasepairsadeepersequencingoftargetregionsandaconsequentmore
reliablevariantandgenotypecalling(FigureS1andS2)Theanalysisofexome
data is also less demanding in terms of computer time (thousands instead of
millions variants are to be analysed) and easier in terms of biological
interpretation Exome sequencing revealed to be very effective in the
identificationofanumberofmutationsaffectinghumanhealth(Bamshadetal
2011 Yang et al 2013) All thesewere defects caused bymutations in single
genes producing clear phenotypic effects havingMendelian inheritance These
conditionsdonotholdinourinvestigationthereforewehadtodevisestrategies
to integrate exome data with other information collected from the Holstein
populationandotherspecies
Theoverallprocedureforidentifythecandidatedeleteriousrecessivewasbased
on the rationale that deleterious recessive alleles in coding sequences are
expectedtobei)causedbyseverealterationoftheproteinsencodedbygenes
ii)mostlyifnotexclusivelycarriedattheheterozygousstateinthepopulation
iii)unevenlydistributedinanimalsofhighandlowfertilityiv)likelyconserved
indifferentspeciesbecauseofselectionconstraintv)likelyassociatedtogenetic
defectsinanyspecieswhenseverelydamaged
A stepwise procedure was set up for reducing candidates to a number
manageable for functional analysis (Figure1) In the first stepmutationswere
discoveredinthesecondfilteredandinthirdassessed
Thisprocedureidentifiesonlysomeofthedeleteriousrecessivesexistinginthe
populationanalysedSincewesequencedonlyexomesandnotwholegenomes
ouranalysesislimitedtocodingsequencesandatthemomenttoSNPvariation
Wethereforemissregulatoryvariationsoutsidegenesand theeffectsof indels
andmorecomplexvariations
131
ThenumberofanimalssequencedislimitedDeleteriousrecessivesareexpected
toberareinthepopulationandwehavesampledonly20animalsfromasubset
ofsiresauthorizedtoAI inItalyAlthoughwehavetakenamongthemthose in
theminus tail of fertility trait thatmighthave ahigher genetic load itrsquos likely
thatwehavemissedothervariantsexistinginthepopulationOntheotherhand
variantscarriedbyAIauthorizedbullsaretheonesthatwillhavealargerimpact
onthepopulationifwidelyspreadbyelitebullsandthereforetheyarethemost
interestingfromabreedingpointofview
Most deleterious variants are expected to be very rare and therefore not
detected by the approach here adopted or detected and afterwards discarded
duringthefilteringprocesses
Sequencing is prone to errors Some real variants may have been discarded
duringQCofsequencedataduetopoorsequencequalityorpoorcoverageofthe
region
Someofthevariationsclassifiedasldquonondeleteriousrdquobythesoftwareusedmay
indeedhave an effect In fact synonymousmutation arenon`neutralwhenever
thedifferenttRNAsforasameaminoacidhavedifferentrelativeabundanceand
translationrates(Goymer2007)
1(Searching(candidate(deleterious(variants(in(exomes(
ExomesproducedafairamountofdataAlltogether5355Mbweresequenced
These represent 9998 of the exome sequences targeted by probes Among
theseonly35hadasequencingdepthacrossanimalslt60XSequencingdepth
is a key factor for variant detection and even more so when the genotype of
sequencedanimalsistobeinferredInvestigatingtheconcordancebetweenthe
genotypes detected at more than 20K SNP by both Beadchip and exome
sequencingwe observed a clear inverse relationship between sequence depth
andnumberofdiscordances(TableS3andFigureS3)Theaveragesequencing
depth expected (50X) and realised in practice (4058) in this investigation
yieldedratheraccurategenotypessincetheaverageconcordanceobservedwas
over97and increased toover99whentwooutlieranimalswereremoved
132
from the dataset The reason for the outlier behaviour of these samples waslikely a suboptimal quality of DNA submitted to sequencing Overall almost215000 variants passed quality controls and among them 192440 autosomalSNPs have been investigated in detail As found in other investigations mostvariants were rare (Peloso et al 2014) For example the MAF distributionobservedin46904SNPsclassifiedneutralhasameanof0197andamedianof0167values(FigureS6)
ThefirststepinourresearchwastoidentifySNPscausingseverealterationsinthe proteins coded by genes These were relevant changes in conformationalterationsofsplicingsitesortruncationsofproteinsToaccomplishthistaskwereliedontwosoftwareSIFTandANNOVARthatidentified4902suchvariationsin3423genes Interestingly the subsetofdeleteriousvariantshas significantlylowermean (0156)andmedian (010) compared toneutral variations (FigureS6) This subset is therefore at least enriched in variants under purifyingselectionhavingadeleteriouseffectatthepopulationlevel
Thenumberofdeleteriousallelesdetectedishighanditisunlikelytheyallhavea relevant phenotypic effect even if most variants are rare In fact variantsdeleterious for a protein are not automatically deleterious for an organism asthe protein change induced by the variant although relevant may not affectproteinfunctionvariationmayoccurinonememberofagenefamilysothatitsfunction may be compensated by other family members the protein functionmay be compensated by other proteins or the affected metabolic pathwaycompensated by alternative pathways The challenge was therefore to filter asubsetofvariantslikelyhavingaphenotypiceffecttobeproposedforvalidationatalargerscaleToaccomplishthistaskwesoughtforadditionalinformationinexomesequencesandinalargepopulationofHolsteinsiresgenotypedwiththebovineHDBeadchip
( (
133
2(Filtering(deleterious(variants(
The second step consisted in filtering the several thousand deleterious SNPsidentified in exomes to search for SNPs and genes that worth deeperinvestigationWeusedtwostrategiestoaccomplishthistask
21$ Distribution$ of$ deleterious$ variants$ in$ fertility$ EBV$ plus9variant$ and$
minus9variant$animals$
Fertility traits are controlled by a high number of genes and are highlygenetically heterogeneous Even using a selected subset of variants identifiedfrom exome sequencing classic association analyses would need sample sizesmuch larger that the one available in this investigation to have a reasonablestatistical power (Stitziel et al 2011) The first approachwe appliedwas theanalysisof SNPallelic andgenotypicdistributionbetween theminusandplus`variant tails of male and female fertility EBVs Given the very low number ofanimals sequencedwedecided to use a simplemodel to exclude from furtheranalyses all markers showing no evidence of association with the EBV tailsratherthantofindsignificantassociations
We divided samples in cases (minus`variant animals for fertility EBV) andcontrols (plus`variant animals) and tested all possible models of inheritanceimplemented in thePlinkpackage for tackling simple genetic defectsWe thenused an empirical point`wise significant threshold based on permutations tohighlight SNPs having alleles or genotypes showing an uneven distributionbetweencasesandcontrolsandconsideredtheseasworthfurtherinvestigationAtotalof112SNPsin101genespassedthisfilteringprocess
$
22$Detection$of$regions$carrying$deleterious$mutation$from$800K$SNP$data
The analysis of exomes alone was therefore insufficient to tag recessivedeleteriousgenesinItalianHolsteinandwesoughtforadditionalinformationin1009ItalianHolsteinbullsgenotypedathighdensitythatarepartofthegenomicselection program run by ANAFI The rationale of the approach is that
134
deleterious recessives are expected to be absent or rare in the homozygous
status in the population Hence they may be tagged searching for genome
regions having markers or haplotypes lacking or having significantly less
homozygotesthanexpectedunderaneutralmodel
We used two complementary approaches to identify these candidate regions
One(OHW)searchedforverystrongsignalsproducedbysinglemarkersoutof
Hardy`Weinberg equilibrium at Ple847x10`8 A minimum of three significant
markersinaslidingwindowsof10wassettoreachsignificancetocapturemore
reliablesignalsavoidexcessivesinglemarkernoiseandaccount forapossible
Wahlund effect due to presence of the genetic structure in the population
difficulttoavoidinthesetofprogenytestedbullsanalysedThesecondapproach
(LOH) captured more moderate signals but from multiple markers in larger
regions Both approaches were designed to detect signals in regions in which
deleteriousvariantsarestillinsegregationandarenotexcessivelyrare
Asexpectedthetwoapproachesidentifiedregionsthatonlypartiallyoverlapped
(TablesS5S6eS7)Inparticular5regionsoutofthe11identifiedbyOHWwere
in common to the 240 identified by LOH Two of these regions overlap with
previous investigations that used the BovineSNP50 Beadchip Sahana et al
(2013) found a region candidate to carrypotential lethal haplotypes inNordic
Holstein on BTA7 in the region 6708007`10907840 This interval overlaps
withbothBTA7regionsdetectedinthisstudyVanRadenetal(2011)detected
inUSHolsteinalargeregiononBTA15spanning76Mb`82M(onBTA30)that
includedLRP4 gene responsible for themulefootdefectOuranalyseswith the
HDBeadchipindicatesontheonesidethatthelargeregiondetectedbySahana
etal(2013)mayconsistintwonearbyregionsOntheothersidethesignalwe
detectedonBTA15doesnot includetheLRP4 locusthat iscompletely fixed in
theItalianbullspopulationanalysed
In total83mutations in66genes fell in eitherorbothOHWandLOHregions
These were joined to the SNPs output of the EBV tail analyses for further
assessment
(
135
3(Assessing(filtered(deleterious(variants(
Different characteristics of the filtered deleterious variants were assessed to
identify among them strong candidates to submit to a large`scale validation
studyTheassessmentphasepermittedtogainadditionalinformationinfavour
or against SNP and gene candidature to be deleterious We quantified this
evidenceusingasubjectivescore
31$Comparative$genomics$analysis$
In the case of comparative genomics the score quantified the level of
conservationof thealternativeaminoacidresidues inducedbySNPvariants in
ItalianHolsteincattleComparisonwasrunagainstaset100vertebratespecies
that included 62 mammals The level of conservation is a good proxy for the
expected severity of a mutation In the case of the 46 SNPs that induced the
change of an amino acid highly conserved in the species investigatedwemay
hypothesize a strong selection pressure in favour of the maintenance or the
invariant amino acid at that position Therefore the amino acid change or the
stopcodonweobserved incattlemayaffectnotonlytheproteinstructureand
function but also the phenotype of homozygous animals Interestingly in all
thesecasestheSNPallelecodingforthenonconservedvariantswastheminor
allele
Converselyinallcasesofhighvariabilityofthetargetaminoacidacrossspecies
(asmanyassevendifferentaminoacidshavesometimesbeenobserved)poorly
supports the presence of phenotypic effects due to its change Sometimes the
informationcollectedisagainstthepresenceofadeleteriousmutationasinthe
case of the presence in different species of both alleles identified by exome
sequencingThis suggests thatneither of these is greatly affecting the viability
andfitnessofanimalscarryingthem
Wealsoobservedanumberoffailures(N=45)intheliftoverofSNPcoordinates
fromcattletohumanorinproteinalignmentThishappenedforexampleinthe
caseofuncharacterized cattleproteins thathavenoorthologous in thehuman
genome and of someolfactory receptors that belong to a highly variable gene
136
family Even if the failure of alignment is an indication of poor proteinconservationacrossspeciestheconservationofthetargetaminoacidcouldnotbeassessesandinthesecaseswepreferredtoconsidertheinformationmissingandtoscoreitaccordingly
32$Haplotype$analysis$
Asecondassessmentofthefilteredcandidateswasontheirbehaviourinthesirepopulation The ideal test would be the direct genotyping of the SNPs in thepopulationand theevaluationof theirallelicandgenotypicproportions In theabsenceofthisinformationwefirstinferredtheHDhaplotypeassociatedtoSNPallelesandassessedthebehaviourofthehaplotypeinthepopulationassumingthisasasurrogateoftheSNPbehaviourSincedeleteriousallelesareexpectedtoberareandthis isalsowhatweobserved fromcomparativegenomicanalysesweinvestigatethefrequencyonlyofthehaplotypecarryingtheminorSNPalleleagainsttheexpectationto findthecorrespondinghaplotypeneverhomozygousor a number of times much lower than expected from its frequency in thepopulation The strategy yielded some interesting confirmation but was onlypartiallysuccessful
Firstofalltheidentificationofauniquehaplotypecarryingtherarevariantwasnot always possible or straightforward In some cases carriers of a same rarevariantdidnotshareexactlythesamehaplotypeWeacceptedsomevariationifthe haplotype remained clearly distinguishable from those of animals notcarryingtherarevariantbutatthesametimereducedthestringencyofthetestInquiteanumberofcasesasamehaplotypecarriedbothcandidatedeleteriousSNP variants This may occur for example in the case of recent mutations IntheselastcasesthetestcouldnotbecarriedoutFinallymostrarevariantswereassociatedtorarehaplotypesthatarenotlikelytohavehomozygousindividualsinthepopulationindependentlyoftheSNPeffect
Thisassessmentwas thereforeonly significant fora fewhaplotypes thathadalarge number of heterozygous individuals and no or only a very fewhomozygotesandforthosethathadalargenumberofhomozygotes
137
33$Functional$analysis$
The final assessment was the analysis of the function of genes carrying the
candidate deleterious SNP This has some limitation since knowledge of gene
function is only partial and mainly gained from investigations on species
different from cattle In any case we considered the association to genetic
defectsinhumanandtheinductionofseverephenotypesinmouseknockoutas
a positive indication of the likely phenotypic effect of the severe protein
alterationweobservedincattleInadditiongenefunctionandexpressionprofile
was evaluated in search for evidence of gene involvement in fertility
developmentorprocessesfundamentalforcellandorganismviability
Amonggenesidentified22areassociatedtohumansyndromes(Table4)Eleven
syndromesareneurologicaldiseasesaffectingbraindevelopmentand inducing
mental retardation (eg Donnay`Barrow Syndrome and Autosomal recessive
primary microcephaly) the others influence development (eg
Cerebrooculofacioskeletal Syndrome 1 and Adams`Oliver syndrome) or organ
function(egCongenitalpulmonaryalveolarproteinosis)
Most of the same defects translated into Holstein would induce physical and
behavioural anomalies likely causingyoungbulls tobe excluded fromprogeny
testing Itshould in factbeconsideredthattheanimalsweanalysedareonthe
onesidethosethatwillmostinfluencethegeneticmakeoffuturegenerationbut
are not a random sample of Holstein animals These bulls have been progeny
testedandauthorisedtobeusedinartificialinseminationinItalyCandidatesto
progenytestingderivefromthematingofthebest1siresandbest2damsof
the Italian Holstein population At the age of 4 months they are taken to the
Italian Holstein breeder association (ANAFI) test station where they are
subjected to performance test sanitary controls and behavioural tests
Thereafterthequalityoftheirsemenischeckedbeforeauthorizationisgranted
toenterinprogenytestInthissampleofanimalsitisthereforeexpectedtofind
never at the homozygous state mutations causing lethality physical
malformation abnormal behaviour and semen infertility Also those having a
138
majoreffectonsemenqualityandhighsusceptibilitytodiseasesareexpectedto
berarelycarriedatthehomozygousstatus
Association with human syndromes is a strong evidence of the potential
phenotypiceffectofmutationsalteringthegenestructureHoweveritshouldbe
considered that variants found in humans are different fromvariants found in
cattle and that the severity of the phenotype induced by a deleterious gene is
often variable even within species Therefore we evaluated this information
togetherwiththeotherindirectevidencesofvariantstobedeleterious
WealsoenquiredthewebsiteoftheInternationalMousePhenotypeConsortium
(httpswwwmousephenotypeorg) to search for the effects of null alleles in
mice The few genes forwhichnull`mice line is available are involved in basic
functionsfortheorganismthatspanfromvision(IAH1andEPB41L4A)toaging
(RGP1)andtissuesdevelopment(FABP12RIPPLY3RGNEF)orfertility(CFAP96)
Aswithgenesassociatedtohumansyndromesmostoftheabnormalphenotypes
observedinknockmousewouldhamperyoungbullstoenterprogenytesting
Gene Ontology (GO) was evaluated on the entire gene set No enrichment on
particularcategoriesGOtermswasobservedThemostrepresentedcategories
in Biological Processes were as expected ldquometabolic processrdquo and ldquocellular
processrdquo however lower level terms ldquoreproductionrdquo and ldquodevelopmental
processrdquo are represented with meaningful hits The analysis of Molecular
Function indicates thatmany of the hits are enzymes (ldquocatalytic activityrdquo) and
haveregulatoryroles(ldquobindingrdquoandldquoenzymeregulatoractivityrdquo)Thisresultcan
be interpreted indifferentwaysOn theone sidedeleterious allelesmay exert
their deleterious effect in a number of different cellular and developmental
processes the correct process is one while there is many options to make it
wrongAlsothenumberofgenesinvestigatedisrelativelylowandformanyof
them there is little or no information available (eg 11 genes code for a
completelyuncharacterizedprotein)Inadditionsomeofthemmayturnoutnot
tobereallydeleteriousandaddnoisetotheGOanalysis
As a final step all the previous information (GO expression patterns known
function)havebeenused to estimate a scoremeasuring the importanceof the
gene function forcattle survivalandreproductionThescoringof functionwas
139
particularlydifficultGOcategoriesweregeneralandallfunctionstaggedhavearelevant importance for animal survival andproductionTherefore in this casewedidnrsquotusethewholescorerangeplanned(from`1to1)andgavenonegativescoresApositivescoreof1wasattributedonly togenesplayingan importantroleinfertilityanddevelopment
$
34$Strong$candidate$deleterious$variants$
Thescoringprocessyielded21variantswithatotalscore=035variantswithpositive and 135 with negative values (Figure 2 and Table 5) Although theprocesswasnoterrorfreeasitreliedonincompleteinformationandhadsomedegreeofsubjectivityparticularlyinthechoiceofrelativeweightsitidentifiedaset of variants that may be considered strong candidates to have deleteriouseffects in the Italian Holstein breed Variants were evaluated individually andthose in a same genes had sometimes different scores For example the fivevariantscarriedbyALPK2hadscoresrangingfrom`7to`2
140
Figure2Graphicalrepresentationoftotalscoreforeachvariants
minus505
Variation
Score
1_645923001_645985561_1381698111_1464490851_150972144
2_7478962_270362092_270455392_375933832_555609862_904645543_75418123_110403993_110404003_188945763_1137877703_1203356984_53767714_265405974_265406204_749514994_1033779064_1057246734_1130233264_1178417155_445557055_445557425_757447955_940308955_940309556_207762566_382803486_863518016_927092906_1075769256_1078851146_1090916426_1166055226_1186534167_13339757_56553577_97149147_167941797_168358027_168621947_197404097_197409057_263293538_602606098_603324848_658345648_704101558_760297748_770028908_872798628_1117618168_1128416959_83888819_83889249_237928389_510133929_1047396559_10515702910_171251510_1584300310_2802235210_2802311010_8290312110_8517371111_4630576711_4675390311_4740957711_5979141011_7832201011_8794387511_9548610911_9945389411_9946100112_1190618112_1247843312_1248084512_1413689712_5304908112_9076120913_381566413_1178793413_1178793813_5247168413_5393493013_5393494713_5393985313_5454042513_5504774013_5998820513_6304575013_6316961013_6539670814_197749414_364069514_3556894514_4688575014_5718402214_6863536315_18267915_18331715_18340415_248692
15_3018920915_3504878815_4630194115_7503314015_7503764515_8024448915_8025571015_8028875915_8038974515_8048559315_8055593115_8094968915_8275455615_8292533216_456201716_3831963616_6257435517_5309539517_5311265517_6609153117_7237753218_2562355018_3495651318_4637204118_5215480618_5408127718_5413095518_5781976819_2144410319_3107215219_3111397219_3279084319_4225178519_5139519419_5619117519_5726625120_763980120_2341043120_5888423620_6417173921_623911921_2034587821_2034628921_2683773321_3100359921_5527419521_5582646321_5817338421_5912071921_6048248421_6048251021_6284314222_5241595922_5242173422_5242204622_5242610422_5242666822_5695165722_5697484722_5779767423_4588505224_2132460424_2145336424_4676490924_5032719524_5815817824_5815862024_5815878424_5820400324_5820447925_3743318026_1811648726_2571599626_2576457027_503107827_3283255728_28422828_169040328_3571880728_4408176529_198588129_753052329_753066329_26397931
141
Table5SingleanalysesscoreandtotalforeachcandidatevariantsChrchrom
osom
ePosphysicalposition
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
164592300
ENSBTAG00000009014
UPK1B
02
I1
01
2
164598556
ENSBTAG00000009014
UPK1B
0I1
I1
00
I2
1138169811
ENSBTAG00000002966
DNAJC13
03
10
15
1146449085
ENSBTAG00000017418
RRP1B
I3
I1
I1
00
I5
1150972144
ENSBTAG00000036019
RIPPLY3
I3
I1
I1
I1
1I5
2747896
ENSBTAG00000013916
HERC2
30
10
04
227036209
ENSBTAG00000004555
LRP2
I3
I1
10
0I3
227045539
ENSBTAG00000004555
LRP2
I3
31
00
1
237593383
ENSBTAG00000032481
DAPL1
00
I1
00
I1
255560986
ENSBTAG00000032264
Uncharacterized
3I3
I1
00
I1
290464554
ENSBTAG00000017436
ALS2CR11
0I1
I1
00
I2
37541812
ENSBTAG00000046175
Uncharacterized
00
00
00
311040399
ENSBTAG00000013789
OR6K6
I3
I3
I1
00
I7
311040400
ENSBTAG00000013789
OR6K6
I3
I3
I1
00
I7
318894576
ENSBTAG00000004772
THEM
40
I3
I1
00
I4
3113787770
ENSBTAG00000000149
USP40
11
I1
01
2
3120335698
ENSBTAG00000002809
CAPN10
I3
21
00
0
45376771
ENSBTAG00000001500
FIGNL1
01
I1
00
0
426540597
ENSBTAG00000033954
PRPS1L1
1I3
I1
00
I3
426540620
ENSBTAG00000033954
PRPS1L1
1I1
I1
00
I1
474951499
ENSBTAG00000003508
CFAP69
I3
I3
I1
10
I6
4103377906
ENSBTAG00000021073
KIAA1549
0I1
11
01
4105724673
ENSBTAG00000019265
AGK
01
10
02
142
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
4113023326
ENSBTAG00000014768
ZNF786
I3
I3
I1
00
I7
4117841715
ENSBTAG00000017505
PAXIP1
0I1
I1
00
I2
544555705
ENSBTAG00000026323
LYSB
I3
3I1
00
I1
544555742
ENSBTAG00000026323
LYSB
I3
0I1
00
I4
575744795
ENSBTAG00000009064
CSF2RB
I3
I3
10
0I5
594030895
ENSBTAG00000009444
SLC15A5
I3
I1
I1
00
I5
594030955
ENSBTAG00000009444
SLC15A5
I3
I3
I1
00
I7
620776256
ENSBTAG00000008797
Uncharacterized
I3
I3
00
0I6
638280348
ENSBTAG00000007568
MEPE
0I3
I1
00
I4
686351801
ENSBTAG00000003523
UGT2B15
1I1
I1
00
I1
692709290
ENSBTAG00000010954
ART3
I3
2I1
00
I2
6107576925
ENSBTAG00000038606
DBDR
0I3
I1
00
I4
6107885114
ENSBTAG00000001505
GRK4
I3
1I1
00
I3
6109091642
ENSBTAG00000007001
SLC26A1
1I3
I1
00
I3
6116605522
ENSBTAG00000014058
LDB2
01
I1
00
0
6118653416
ENSBTAG00000012458
PSAPL1
I3
I1
I1
00
I5
71333975
ENSBTAG00000015602
C7H5orf45
I3
I3
I1
00
I7
75655357
ENSBTAG00000045588
LOC100298356
1I3
I1
00
I3
79714914
ENSBTAG00000046423
LOC618007
00
I1
00
I1
716794179
ENSBTAG00000012314
LDLR
02
10
03
716835802
ENSBTAG00000009568
KANK2
01
I1
00
0
716862194
ENSBTAG00000009569
DOCK6
01
10
02
719740409
ENSBTAG00000000414
FUT5
I3
I3
I1
00
I7
719740905
ENSBTAG00000000414
FUT5
I3
I3
I1
00
I7
143
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
726329353
ENSBTAG00000004860
SLC27A6
0I1
I1
00
I2
860260609
ENSBTAG00000011420
CA9
I3
I1
I1
00
I5
860332484
ENSBTAG00000011433
RGP1
I3
0I1
10
I3
865834564
ENSBTAG00000035708
LOC527795
00
I1
00
I1
870410155
ENSBTAG00000005466
C8H8orf58
I3
I3
I1
00
I7
876029774
ENSBTAG00000015027
Clusterin
00
I1
10
0
877002890
ENSBTAG00000012921
KIF24
12
I1
00
2
887279862
ENSBTAG00000002220
SPTLC1
00
10
01
8111761816
ENSBTAG00000006532
CDK5RAP2
I3
I3
11
1I3
8112841695
ENSBTAG00000013838
ALLC
01
I1
00
0
98388881
ENSBTAG00000015335
BAI3
00
I1
00
I1
98388924
ENSBTAG00000015335
BAI3
00
I1
00
I1
923792838
ENSBTAG00000046593
Uncharacterized
03
00
03
951013392
ENSBTAG00000018528
USP45
02
I1
00
1
9104739655
ENSBTAG00000018810
THBS2
03
10
15
9105157029
ENSBTAG00000000918
ERMARD
I3
21
00
0
10
1712515
ENSBTAG00000014750
EPB41L4A
0I1
I1
I1
0I3
10
15843003
ENSBTAG00000009030
NOX5
I3
I3
I1
00
I7
10
28022352
ENSBTAG00000035309
OR4F15
I3
1I1
00
I3
10
28023110
ENSBTAG00000035309
OR4F15
I3
I1
I1
00
I5
10
82903121
ENSBTAG00000012565
PCNX
I3
2I1
01
I1
10
85173711
ENSBTAG00000011683
NUM
BI3
0I1
00
I4
11
46305767
ENSBTAG00000002288
NT5DC4
0I3
I1
00
I4
11
46753903
ENSBTAG00000019072
PSD4
0I1
I1
00
I2
144
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
11
47409577
ENSBTAG00000009033
TEX37
I3
I2
I1
10
I5
11
59791410
ENSBTAG00000006027
USP34
02
I1
01
211
78322010
ENSBTAG00000015490
HS1BP3
00
I1
00
I1
11
87943875
ENSBTAG00000001140
IAH1
I3
3I1
10
011
95486109
ENSBTAG00000017576
GPR144
I3
I3
I1
00
I7
11
99453894
ENSBTAG00000038831
PHYHD1
11
I1
00
111
99461001
ENSBTAG00000010601
DOLK
12
10
04
12
11906181
ENSBTAG00000001133
VWA8
I3
1I1
00
I3
12
12478433
ENSBTAG00000014752
AKAP11
1I3
I1
I1
0I4
12
12480845
ENSBTAG00000014752
AKAP11
1I1
I1
I1
0I2
12
14136897
ENSBTAG00000016691
LACC1
I3
I3
I1
00
I7
12
53049081
ENSBTAG00000032821
SCEL
I3
I3
I1
00
I7
12
90761209
ENSBTAG00000019961
ATP4B
I3
I3
I1
00
I7
13
3815664
ENSBTAG00000034991
SLX4IP
00
I1
00
I1
13
11787934
ENSBTAG00000008650
CAMK
1D
I3
0I1
00
I4
13
11787938
ENSBTAG00000008650
CAMK
1D
I3
0I1
00
I4
13
52471684
ENSBTAG00000013753
DDRGK1
02
I1
00
113
53934930
ENSBTAG00000039520
SIRPB1
0I3
I1
00
I4
13
53934947
ENSBTAG00000039520
SIRPB1
00
I1
00
I1
13
53939853
ENSBTAG00000039520
SIRPB1
0I3
I1
00
I4
13
54540425
ENSBTAG00000047150
RTEL
I3
0I1
00
I4
13
55047740
ENSBTAG00000031718
OGFR
01
I1
00
013
59988205
ENSBTAG00000027390
RTFDC1
01
I1
00
013
63045750
ENSBTAG00000009144
BPIFA2A
0I3
I1
00
I4
145
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
13
63169610
ENSBTAG00000019752
BPIFA2B
0I1
I1
00
I2
13
65396708
ENSBTAG00000006021
CEP250
0I3
I1
00
I4
14
1977494
ENSBTAG00000026350
SPATC1
I3
I3
I1
00
I7
14
3640695
ENSBTAG00000015985
GPR20
00
I1
00
I1
14
35568945
ENSBTAG00000037399
Uncharacterized
30
I1
00
2
14
46885750
ENSBTAG00000047661
FABP12
0I1
I1
00
I2
14
57184022
ENSBTAG00000026247
PKHD1L1
I3
3I1
00
I1
14
68635363
ENSBTAG00000017165
MATN2
01
I1
00
0
15
182679
ENSBTAG00000046228
Olfactoryreceptor
00
I1
0o
I1
15
183317
ENSBTAG00000046228
Olfactoryreceptor
00
I1
00
I1
15
183404
ENSBTAG00000046228
Olfactoryreceptor
00
I1
00
I1
15
248692
ENSBTAG00000045558
LOC100299084
00
I1
0o
I1
15
30189209
ENSBTAG00000005357
VPS11
02
I1
00
1
15
35048788
ENSBTAG00000005340
SERGEF
0I3
I1
00
I4
15
46301941
ENSBTAG00000002289
NLRP14
I3
I3
I1
00
I7
15
75033140
ENSBTAG00000012053
ACCSL
1I3
I1
00
I3
15
75037645
ENSBTAG00000012053
ACCSL
I3
I3
I1
00
I7
15
80244489
ENSBTAG00000046527
LOC100139830
I3
0I1
00
I4
15
80255710
ENSBTAG00000046285
OR8I2
10
I1
00
0
15
80288759
ENSBTAG00000001291
OR8H3
1I1
I1
00
I1
15
80389745
ENSBTAG00000045709
LOC514057
13
I1
00
3
15
80485593
ENSBTAG00000035985
LOC101909743OR8K1
1I1
I1
00
I1
15
80555931
ENSBTAG00000022858
LOC783561
I3
0I1
00
I4
15
80949689
ENSBTAG00000039110
OR8U1
1I3
I1
00
I3
146
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
15
82754556
ENSBTAG00000021375
OR10Q1
01
I1
00
015
82925332
ENSBTAG00000021233
OR5B17
I3
0I1
00
I4
16
4562017
ENSBTAG00000019799
FCAM
RI3
I3
I1
00
I7
16
38319636
ENSBTAG00000014794
SCYL3
I3
0I1
00
I4
16
62574355
ENSBTAG00000035226
TOR1AIP1
00
I1
00
I1
17
53095395
ENSBTAG00000020414
DHX37
1I1
I1
00
I1
17
53112655
ENSBTAG00000020414
DHX37
0I3
I1
00
I4
17
66091531
ENSBTAG00000010847
FOXN4
0I1
I1
00
I2
17
72377532
ENSBTAG00000008996
SFI1
0I3
I1
00
I4
18
25623550
ENSBTAG00000005170
GPR114
0I3
I1
01
I3
18
34956513
ENSBTAG00000001286
ELMO
33
3I1
11
718
46372041
ENSBTAG00000015912
DMKN
0
1I1
00
018
52154806
ENSBTAG00000001262
IRGQ
I3
1I1
00
I3
18
54081277
ENSBTAG00000009337
PNMA
L1
0I3
I1
00
I4
18
54130955
ENSBTAG00000048021
PNMA
L2
I3
1I1
00
I3
18
57819768
ENSBTAG00000003231
LIM2
1
I3
10
0I1
19
21444103
ENSBTAG00000011011
SSH2
I3
I3
I1
00
I7
19
31072152
ENSBTAG00000022509
DNAH
9I3
0I1
00
I4
19
31113972
ENSBTAG00000022509
DNAH
90
0I1
00
I1
19
32790843
ENSBTAG00000015294
COX10
12
10
04
19
42251785
ENSBTAG00000013685
KRT31
I3
0I1
00
I4
19
51395194
ENSBTAG00000015980
FASN
1I1
10
01
19
56191175
ENSBTAG00000007910
EXOC7
3I3
I1
00
I1
19
57266251
ENSBTAG00000011387
NAT9
12
I1
00
2
147
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
20
7639801
ENSBTAG00000005633
RGNEF
1I1
I1
10
0
20
23410431
ENSBTAG00000008871
DDX4
1I3
I1
00
I3
20
58884236
ENSBTAG00000005514
TRIO
I3
0I1
00
I4
20
64171739
ENSBTAG00000005021
SEMA5A
01
10
02
21
6239119
ENSBTAG00000016495
LINS
02
10
03
21
20345878
ENSBTAG00000012326
LOC101906218
00
I1
00
I1
21
20346289
ENSBTAG00000012326
LOC101906218
I3
0I1
00
I4
21
26837733
ENSBTAG00000047587
Uncharacterized
10
00
01
21
31003599
ENSBTAG00000047166
SH2D7
0I3
I1
00
I4
21
55274195
ENSBTAG00000039117
FAM179B
0I3
I1
00
I4
21
55826463
ENSBTAG00000014001
MAP1A
0I3
I1
00
I4
21
58173384
ENSBTAG00000009836
CHGA
I3
I1
I1
00
I5
21
59120719
ENSBTAG00000046590
FAM181A
01
I1
00
0
21
60482484
ENSBTAG00000017096
ERICH1
0I3
I1
00
I4
21
60482510
ENSBTAG00000017096
ERICH1
01
I1
00
0
21
62843142
ENSBTAG00000012833
ATG2B
0I3
I1
00
I4
22
52415959
ENSBTAG00000015839
MAP4
00
I1
00
I1
22
52421734
ENSBTAG00000015839
MAP4
1I1
I1
00
I1
22
52422046
ENSBTAG00000015839
MAP4
I3
I3
I1
00
I7
22
52426104
ENSBTAG00000047174
Uncharacterized
I3
00
00
I3
22
52426668
ENSBTAG00000047174
Uncharacterized
10
00
01
22
56951657
ENSBTAG00000021645
MBD4
1I1
I1
00
I1
22
56974847
ENSBTAG00000021643
EFCAB12
0I1
I1
00
I2
22
57797674
ENSBTAG00000031115
Uncharacterized
01
00
01
148
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
23
45885052
ENSBTAG00000005248
OFCC1
00
I1
00
I1
24
21324604
ENSBTAG00000000998
SLC39A6
02
I1
00
1
24
21453364
ENSBTAG00000011622
C18orf21
I3
I3
I1
00
I7
24
46764909
ENSBTAG00000015210
LOXHD1
0I3
10
0I2
24
50327195
ENSBTAG00000002660
CCDC11
I3
11
01
0
24
58158178
ENSBTAG00000014284
ALPK2
I3
I1
I1
00
I5
24
58158620
ENSBTAG00000014284
ALPK2
0I1
I1
00
I2
24
58158784
ENSBTAG00000014284
ALPK2
I3
I3
I1
00
I7
24
58204003
ENSBTAG00000014284
ALPK2
I3
I3
I1
00
I7
24
58204479
ENSBTAG00000014284
ALPK2
I3
2I1
00
I2
25
37433180
ENSBTAG00000040568
Uncharacterized
00
I1
00
I1
26
18116487
ENSBTAG00000046239
C10orf12
0I1
I1
00
I2
26
25715996
ENSBTAG00000010522
Uncharacterized
10
00
01
26
25764570
ENSBTAG00000046727
Olfactoryreceptor
00
I1
00
I1
27
5031078
ENSBTAG00000007340
LOC101906349
0I1
I1
00
I2
27
32832557
ENSBTAG00000011684
RAB11FIP1
10
I1
00
0
28
284228
ENSBTAG00000000079
CCSAP
0I1
I1
00
I2
28
1690403
ENSBTAG00000048153
Uncharacterized
10
00
01
28
35718807
ENSBTAG00000047317
CL43
1I1
I1
00
I1
28
44081765
ENSBTAG00000032527
ERCC6
I3
01
00
I2
29
1985881
ENSBTAG00000004081
FAT3
30
I1
00
2
29
7530523
ENSBTAG00000046360
Uncharacterized
01
00
01
29
7530663
ENSBTAG00000046360
Uncharacterized
1I1
00
00
29
26397931
ENSBTAG00000013568
UEVLD
I3
I3
I1
00
I7
149
Interestingly13ofthe22genesassociatedtohumansyndromescarryvariants
appearing in the short list of the 35 positives and only 6 in that of 135 with
negative values This in spite of the low weight (from 1 to 1) given to this
parameterEightofthe35positivevariantsareinproteinsstilluncharacterized
Sixvariantshadascoreof4orhigherThehighestscore(totalscore=7)wasfor
ELMO3 (Engulfment and cell motility gene 3) This gene is involved in cell
motility and migration ELMO3 is also a homolog of C( elegans Ced12 that is
required for many developmental processes included gonadal morphogenesis
Mutant for this gene in C( elegans suffer from abnormal development usually
with lethal consequences or sublethal developmental defects including small
embryosdelayeddevelopmentandreducedfertility(Gumiennyetal2001)In
human it is expressed in testis prostate and ovary
(httpwwwgenecardsorgcgibincarddispplgene=ELMO3)
ThegeneDNAJC13(DnaJ(Hsp40)homologsubfamilyCmember13)hadatotal
score of 5 This gene is involved in the onset of Parkinson disease DNAJC13
regulates the dynamics of clathrin coats on early endosomes Cellular analysis
shows that the mutation Arg855Ser identified by VilarintildeoGuumlell et al (2014)
confers a gainoffunction that impairs endosomal transportApossible roleof
DNAJC13geneinTourettesyndromechronicticdisorderisunderinvestigation
(Sundarametal2011)
A second genehad a scoreof 5THBS2 (Thrombospondin2) It belongs to the
thrombospondin family and is a powerful inhibitor of tumour growth and
angiogenesis It is highly expressed in testis in fetal Leydig cell Hirose et al
(2008) found a significant association between an intronic SNP in the THBS2
gene and lumbar disc herniation Thrombospondin2 is also a matricellular
proteinfoundinhumanserumDeletionofTSP2causesagedependentdilated
cardiomyopathy (Hanatani et al 2014) Mutant mice for this gene display a
variety of mutant phenotype Kyriakides et al 1998 suggest that THBS2
modulates the cell surface properties of mesenchymal cells affecting cell
functionssuchasadhesionandmigration
150
Three genes had a score of 4 all three are associated to human syndromes
Mutation in HERC2 (HECT and RLD domain containing E3 ubiquitin protein
ligase2)isinvolvedinautosomalrecessivementalretardation38(Puffenberger
etal2012)Jietal(2000)identifiedinmutantmouseneuromuscularsecretory
vesicledefectsandspermacrosomedefectsMoreovergeneticvariationsinthis
gene are associated with skinhaireye pigmentation variability Mutations in
DOLK gene (dolichol kinase) are associated with dolichol kinase deficiency
(Lefeberetal2011)COX10(cytochromecoxidaseassemblyhomolog10)gene
is associated to mitochondrial complex IV deficiency causing Leigh syndrome
(Antonickaetal2003)
Conclusions)
Exomesequencingthereforeprovidedvaluableinformationoncodingsequence
variants and on their potential effect on protein function As a group variants
classifiedasdeleteriouswererarerthanvariantsclassifiedasneutralSincethis
isanexpectedeffectofpurifyingselectionthissetofvariantwaslikelyenriched
forvariantshavingaphenotypiceffect
Theinclusionintheanalysesofplusandminusvariantanimalsforfertilitytraits
likelypermittedtosequencethemostimportantvariantsinfluencingthesetraits
butthelownumberofanimalssequencedcombinedwiththecomplexityofthe
traitpermittedtofindonlyasuggestiveassociationbetweenSNPsandtraitsItrsquos
likely that amuch larger number of individuals is to be sequenced if genome
wideassociationisthestrategyoneswanttopursue
We explored an alternative strategy to confirm suggestivedeleterious variants
based on indirect evidence of the variant effect by assessing i) evolutionary
constraintsassociatedtotheaminoacidresiduecodedbythevariantii)variant
behaviour in a large population In the absence of direct genotyping of the
candidatevariantsweusedhaplotypeinformationassurrogateiii)functionand
effectsinotherspecies
151
We attempted to quantify these criteria through a score subjective and only
indicativebutpermitted tohighlighted35genes so farneglected in cattlebut
deservingfurtherattentionandpotentiallyhavingdeleteriouseffectsinHolstein
andothercattlebreedsThesemaybepartoftherecessivedeleterioussetthat
constitutethegeneticloadofItalianHolstein
References)AntonickaHLearySCGuercinGHAgarJNHorvathRKennawayNG
HardingCOJakschMShoubridgeEA2003MutationsinCOX10
resultinadefectinmitochondrialhemeAbiosynthesisandaccountfor
multipleearlyonsetclinicalphenotypesassociatedwithisolatedCOX
deficiencyHumMolGenet122693ndash2702doi101093hmgddg284
AshwellMSHeyenDWSonstegardTSVanTassellCPDaYVanRaden
PMRonMWellerJILewinHA2004DetectionofQuantitativeTrait
LociAffectingMilkProductionHealthandReproductiveTraitsin
HolsteinCattleJournalofDairyScience87468ndash475
doi103168jdsS00220302(04)731860
AstonKICarrellDT2009GenomeWideStudyofSingleNucleotide
PolymorphismsAssociatedWithAzoospermiaandSevere
OligozoospermiaJournalofAndrology30711ndash725
doi102164jandrol109007971
AulchenkoYSRipkeSIsaacsADuijnCMvan2007GenABELanRlibrary
forgenomewideassociationanalysisBioinformatics231294ndash1296
doi101093bioinformaticsbtm108
BamshadMJNgSBBighamAWTaborHKEmondMJNickersonDA
ShendureJ2011ExomesequencingasatoolforMendeliandisease
genediscoveryNatRevGenet12745ndash755doi101038nrg3031
BieseckerLGShiannaKVMullikinJC2011Exomesequencingtheexpert
viewGenomeBiology12128doi101186gb2011129128
152
BiffaniSMarusiMBiscariniFCanavesiF2005Developingagenetic
evaluationforfertilityusingangularityandmilkyieldascorrelatedtraits
InterbullBulletin063
BiffaniSSamoreacuteABCanavesiF2002Inbreedingdepressionforproduction
reproductionandfunctionaltraitsinItalianHolsteincattleInstitut
NationaldelaRechercheAgronomique(INRA)pp0ndash4
BolgerAMLohseMUsadelB2014TrimmomaticAflexibletrimmerfor
IlluminaSequenceDataBioinformaticsbtu170
doi101093bioinformaticsbtu170
BrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingand
MissingDataInferenceforWholeGenomeAssociationStudiesByUseof
LocalizedHaplotypeClusteringAmJHumGenet811084ndash1097
CharlesworthDWillisJH2009ThegeneticsofinbreedingdepressionNat
RevGenet10783ndash796doi101038nrg2664
ChunSFayJC2009Identificationofdeleteriousmutationswithinthree
humangenomesGenomeRes191553ndash1561
doi101101gr092619109
ConsortiumTEP2012AnintegratedencyclopediaofDNAelementsinthe
humangenomeNature48957ndash74doi101038nature11247
DePristoMABanksEPoplinRGarimellaKVMaguireJRHartlC
PhilippakisAAdelAngelGRivasMAHannaMMcKennaA
FennellTJKernytskyAMSivachenkoAYCibulskisKGabrielSB
AltshulerDDalyMJ2011Aframeworkforvariationdiscoveryand
genotypingusingnextgenerationDNAsequencingdataNatGenet43
491ndash498doi101038ng806
GoymerP2007SynonymousmutationsbreaktheirsilenceNatRevGenet8
92ndash92doi101038nrg2056
GumiennyTLBrugneraEToselloTrampontACKinchenJMHaneyLB
NishiwakiKWalkSFNemergutMEMacaraIGFrancisRSchedl
TQinYVanAelstLHengartnerMORavichandranKS2001CED
12ELMOaNovelMemberoftheCrkIIDock180RacPathwayIs
RequiredforPhagocytosisandCellMigrationCell10727ndash41
doi101016S00928674(01)005207
153
HanataniSIzumiyaYTakashioSKimuraYArakiSRokutandaTTsujita
KYamamotoETanakaTYamamuroMKojimaSTayamaS
KaikitaKHokimotoSOgawaH2014Circulatingthrombospondin2
reflectsdiseaseseverityandpredictsoutcomeofheartfailurewith
reducedejectionfractionCircJ78903ndash910
HiroseYChibaKKarasugiTNakajimaMKawaguchiYMikamiY
FuruichiTMioFMiyakeAMiyamotoTOzakiKTakahashiA
MizutaHKuboTKimuraTTanakaTToyamaYIkegawaS2008
AfunctionalpolymorphisminTHBS2thataffectsalternativesplicingand
MMPbindingisassociatedwithlumbardischerniationAmJHum
Genet821122ndash1129doi101016jajhg200803013
httpsgenomeucsceduindexhtml[WWWDocument]nd
JamrozikJFatehiJKistemakerGJSchaefferLR2005EstimatesofGenetic
ParametersforCanadianHolsteinFemaleReproductionTraitsJournalof
DairyScience882199ndash2208doi103168jdsS00220302(05)728952
JanisEWiggintonDJC2005AnoteonexacttestsofHardyWeinberg
equilibriumAmericanjournalofhumangenetics76887ndash93
doi101086429864
JiYRebertNAJoslinJMHigginsMJSchultzRANichollsRD2000
StructureoftheHighlyConservedHERC2GeneandofMultiplePartially
DuplicatedParalogsinHumanGenomeRes10319ndash329
KentWJSugnetCWFureyTSRoskinKMPringleTHZahlerAM
HausslerandD2002TheHumanGenomeBrowseratUCSCGenome
Res12996ndash1006doi101101gr229102
KumarPHenikoffSNgPC2009Predictingtheeffectsofcodingnon
synonymousvariantsonproteinfunctionusingtheSIFTalgorithmNat
Protoc41073ndash1081doi101038nprot200986
KyriakidesTRZhuYHSmithLTBainSDYangZLinMTDanielson
KGIozzoRVLaMarcaMMcKinneyCEGinnsEIBornsteinP
1998Micethatlackthrombospondin2displayconnectivetissue
abnormalitiesthatareassociatedwithdisorderedcollagenfibrillogenesis
anincreasedvasculardensityandableedingdiathesisJCellBiol140
419ndash430
154
LaneEACroweMABeltmanMEMoreSJ2013Theinfluenceofcowand
managementfactorsonreproductiveperformanceofIrishseasonal
calvingdairycowsAnimReprodSci14134ndash41
doi101016janireprosci201306019
LefeberDJdeBrouwerAPMMoravaERiemersmaMSchuurs
HoeijmakersJHMAbsmannerBVerrijpKvandenAkkerWMR
HuijbenKSteenbergenGvanReeuwijkJJozwiakAZuckerN
LorberALammensMKnopfCvanBokhovenHGruumlnewaldSLehle
LKapustaLMandelHWeversRA2011AutosomalRecessive
DilatedCardiomyopathyduetoDOLKMutationsResultsfromAbnormal
DystroglycanOMannosylationPLoSGenet7e1002427
doi101371journalpgen1002427
LiHDurbinR2009FastandaccurateshortreadalignmentwithBurrows
WheelertransformBioinformatics251754ndash1760
doi101093bioinformaticsbtp324
LiHHandsakerBWysokerAFennellTRuanJHomerNMarthG
AbecasisGDurbinR1000GenomeProjectDataProcessingSubgroup
2009TheSequenceAlignmentMapformatandSAMtoolsBioinformatics
252078ndash2079doi101093bioinformaticsbtp352
LiMXKwanJSHBaoSYYangWHoSLSongYQShamPC2013
PredictingMendelianDiseaseCausingNonSynonymousSingle
NucleotideVariantsinExomeSequencingStudiesPLoSGenet9
e1003143doi101371journalpgen1003143
MacArthurDGBalasubramanianSFrankishAHuangNMorrisJWalter
KJostinsLHabeggerLPickrellJKMontgomerySBAlbersCA
ZhangZConradDFLunterGZhengHAyubQDePristoMA
BanksEHuMHandsakerRERosenfeldJFromerMJinMMuXJ
KhuranaEYeKKayMSaundersGISunerMMHuntTBarnes
IHAAmidCCarvalhoSilvaDRBignellAHSnowCYngvadottirB
BumpsteadSCooperDNXueYRomeroIGWangJLiYGibbs
RAMcCarrollSADermitzakisETPritchardJKBarrettJCHarrow
JHurlesMEGersteinMBTylerSmithC2012Asystematicsurvey
155
oflossoffunctionvariantsinhumanproteincodinggenesScience335
823ndash828doi101126science1215040
MajewskiJSchwartzentruberJLalondeEMontpetitAJabadoN2011
WhatcanexomesequencingdoforyouJMedGenet48580ndash589
doi101136jmedgenet2011100223
McClureMCBickhartDNullDVanRadenPXuLWiggansGLiuG
SchroederSGlasscockJArmstrongJColeJBVanTassellCP
SonstegardTS2014BovineExomeSequenceAnalysisandTargeted
SNPGenotypingofRecessiveFertilityDefectsBH1HH2andHH3Reveal
aPutativeCausativeMutationinSMC2forHH3PLoSONE9e92769
doi101371journalpone0092769
McClureMKimEBickhartDNullDCooperTColeJWiggansGAjmone
MarsanPColliLSantusELiuGESchroederSMatukumalliLVan
TassellCSonstegardT2013FineMappingforWeaverSyndromein
BrownSwissCattleandtheIdentificationof41ConcordantMutations
acrossNRCAMPNPLA8andCTTNBP2PLoSONE8e59251
doi101371journalpone0059251
McKennaAHannaMBanksESivachenkoACibulskisKKernytskyA
GarimellaKAltshulerDGabrielSDalyMDePristoMA2010The
GenomeAnalysisToolkitAMapReduceframeworkforanalyzingnext
generationDNAsequencingdataGenomeRes201297ndash1303
doi101101gr107524110
MiHDongQMuruganujanAGaudetPLewisSThomasPD2010
PANTHERversion7improvedphylogenetictreesorthologsand
collaborationwiththeGeneOntologyConsortiumNuclAcidsRes38
D204ndashD210doi101093nargkp1019
MrodeRKearneyJFBiffaniSCoffeyMCanavesiF2009Short
communicationGeneticrelationshipsbetweentheHolsteincow
populationsofthreeEuropeandairycountriesJournalofDairyScience
925760ndash5764doi103168jds20081931
PelosoGMAuerPLBisJCVoormanAMorrisonACStitzielNOBrody
JAKhetarpalSACrosbyJRFornageMIsaacsAJakobsdottirJ
FeitosaMFDaviesGHuffmanJEManichaikulADavisBLohman
156
KJoonAYSmithAVGroveMLZanoniPRedonVDemissieS
LawsonKPetersUCarlsonCJacksonRDRyckmanKKMackey
RHRobinsonJGSiscovickDSSchreinerPJMychaleckyjJC
PankowJSHofmanAUitterlindenAGHarrisTBTaylorKD
StaffordJMReynoldsLMMarioniREDehghanAFrancoOHPatel
APLuYHindyGGottesmanOBottingerEPMelanderOOrho
MelanderMLoosRJFDugaSMerliniPAFarrallMGoelA
AsseltaRGirelliDMartinelliNShahSHKrausWELiMRader
DJReillyMPMcPhersonRWatkinsHArdissinoDZhangQWang
JTsaiMYTaylorHACorreaAGriswoldMELangeLAStarrJM
RudanIEiriksdottirGLaunerLJOrdovasJMLevyDChenYDI
ReinerAPHaywardCPolasekODearyIJBoreckiIBLiuY
GudnasonVWilsonJGvanDuijnCMKooperbergCRichSSPsaty
BMRotterJIOrsquoDonnellCJRiceKBoerwinkleEKathiresanS
CupplesLA2014AssociationofLowFrequencyandRareCoding
SequenceVariantswithBloodLipidsandCoronaryHeartDiseasein
56000WhitesandBlacksTheAmericanJournalofHumanGenetics94
223ndash232doi101016jajhg201401009
PryceJEVeerkampRFThompsonRHillWGSimmG1997Genetic
aspectsofcommonhealthdisordersandmeasuresoffertilityinHolstein
FriesiandairycattleAnimalScience65353ndash360
doi101017S1357729800008559
PryceJEWoolastonRBerryDPWallEWintersMButlerRShafferM
2014WorldTrendsinDairyCowFertilityin10ThWorldCongressof
GeneticsAppliedtoLivestockProduction
PuffenbergerEGJinksRNWangHXinBFiorentiniCShermanEA
DegrazioDShawCSougnezCCibulskisKGabrielSKelleyRI
MortonDHStraussKA2012Ahomozygousmissensemutationin
HERC2associatedwithglobaldevelopmentaldelayandautismspectrum
disorderHumMutat331639ndash1646doi101002humu22237
PurcellSNealeBToddBrownKThomasLFerreiraMARBenderD
MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKA
157
ToolSetforWholeGenomeAssociationandPopulationBasedLinkage
AnalysesAmJHumGenet81559ndash575
SahanaGNielsenUSAamandGPLundMSGuldbrandtsenB2013Novel
HarmfulRecessiveHaplotypesIdentifiedforFertilityTraitsinNordic
HolsteinCattlePLoSONE8e82909doi101371journalpone0082909
SchuumltzEScharfensteinMBrenigB2008Implicationofcomplexvertebral
malformationandbovineleukocyteadhesiondeficiencyDNAbased
testingondiseasefrequencyintheHolsteinpopulationJDairySci91
4854ndash4859doi103168jds20081154
SimmonsMJCrowJF1977MutationsAffectingFitnessinDrosophila
PopulationsAnnualReviewofGenetics1149ndash78
doi101146annurevge11120177000405
SoggiuAPirasCHusseinHACanioMDGaviraghiAGalliAUrbaniA
BonizziLRoncadaP2013UnravellingthebullfertilityproteomeMol
BioSyst91188ndash1195doi101039C3MB25494A
StitzielNOKiezunASunyaevS2011Computationalandstatistical
approachestoanalyzingvariantsidentifiedbyexomesequencing
GenomeBiology12227doi101186gb2011129227
SundaramSKHuqAMSunZYuWBennettLWilsonBJBehenME
ChuganiHT2011ExomesequencingofapedigreewithTourette
syndromeorchronicticdisorderAnnNeurol69901ndash904
doi101002ana22398
VanRadenPMOlsonKMNullDJHutchisonJL2011Harmfulrecessive
effectsonfertilitydetectedbyabsenceofhomozygoushaplotypesJournal
ofDairyScience946153ndash6161doi103168jds20114624
VilarintildeoGuumlellCRajputAMilnerwoodAJShahBSzuTuCTrinhJYuI
EncarnacionMMunsieLNTapiaLGustavssonEKChouP
TatarnikovIEvansDMPishottaFTVoltaMBeccanoKellyD
ThompsonCLinMKShermanHEHanHJGuentherBL
WassermanWWBernardVRossCJAppelCresswellSStoesslAJ
RobinsonCADicksonDWRossOAWszolekZKAaslyJOWuR
MHentatiFGibsonRAMcPhersonPSGirardMRajputMRajput
158
AHFarrerMJ2014DNAJC13mutationsinParkinsondiseaseHumMolGenet231794ndash1801doi101093hmgddt570
WangKLiMHakonarsonH2010ANNOVARfunctionalannotationofgeneticvariantsfromhighthroughputsequencingdataNuclAcidsRes38e164ndashe164doi101093nargkq603
WathesDCPollottGEJohnsonKFRichardsonHCookeJS2014HeiferfertilityandcarryoverconsequencesforlifetimeproductionindairyandbeefcattleAnimal8Suppl191ndash104doi101017S1751731114000755
YangYMuznyDMReidJGBainbridgeMNWillisAWardPABraxtonABeutenJXiaFNiuZHardisonMPersonRBekheirniaMRLeducMSKirbyAPhamPScullJWangMDingYPlonSELupskiJRBeaudetALGibbsRAEngCM2013ClinicalWhole
ExomeSequencingfortheDiagnosisofMendelianDisordersNewEnglandJournalofMedicine3691502ndash1511doi101056NEJMoa1306555
ZiminAVDelcherALFloreaLKelleyDRSchatzMCPuiuDHanrahanFPerteaGTassellCPVSonstegardTSMarccedilaisGRobertsMSubramanianPYorkeJASalzbergSL2009Awholegenome
assemblyofthedomesticcowBostaurusGenomeBiology10R42doi101186gb2009104r42
159
Supplementary)files)
YoucanfindChapter6supplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter06)orscantheQRcode
)
Supplementary)Figure)S1)Coverage)analyses)
A)Averagecoverageperanimals
B)Boxplotofchromosomescoverageperanimal
C)BoxplotofanimalcoverageperchromosomeBluelinerepresenttheexpected
50Xcoverage
Supplementary)Figure)S2)Distribution)of)variant)sequencing)depth)
)
Supplementary) Figure) S3) Correlation) between) sequencing) depth) and)
number) of) discordant) genotypes) In) red) values) for) two) outlier) animals)
having)concordance)lt846))
)
Supplementary)Figure)S4)Multidimensional)scaling)of)Italian)Holstein)bulls)
sequenced))
)
Supplementary)Figure)S5)Multidimensional)scaling)of)Italian)Holstein)bulls)
analysed)with)the)800K)SNPchip))
160
Supplementary) Figure) S6) MAF) distribution) on) neutral) (blue)) and)
deleterious)(red))variants))
)
Supplementary) Table) S1) EBV) distribution) on) Italian) Holstein) population)
and)SNPchip)dataset)
Sheet1EBVthresholdvaluetodefineplusandminustailforPROT(kgprotein)
SCC(somaticcellcount)FERT(fertility)andLONG(functional longevity) All
ItalianHolsteinFAbullswereincludeinthisanalysis
Sheet2Numberofdatasetanimalsineachtailforeachtrait
Supplementary)Table)S2)Exome)probes)statistics)number)and)bp)covered)
per)chromosomes)
Supplementary) Table) S3) Concordance) between) animal) genotypes) in)
sequencing)and)SNPchip)data)
Error1 HD data homozygotes ndash Exome data heterozygotes Error2 HD data
heterozygotesndashExomedatahomozygotes
Supplementary) Table) S4) List) of) suggestive) genes) deriving) from) the)
analysis)of)exomes)of)animals) in)the)tails)of)male)and)female)fertility)EBV)
distributions))
Header(code
bull Bfreq_totFrequencyofBalleleinentiredataset
bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset
bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset
bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset
bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset
bull CallRate_totSNPcallratendashentiredataset
bull CallRate_MHSNPcallratendashhighmalefertilitydataset
bull CallRate_MLSNPcallratendashlowmalefertilitydataset
bull CallRate_FHSNPcallratendashhighfemalefertilitydataset
bull CallRate_FLSNPcallratendashhighmalefertilitydataset
bull HWeqHardyWeinbergequilibriumpvalue
bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele
bull Geno_HetNumberofanimalswithheterozygotegenotype
161
bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele
bull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)dataset
bull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)dataset
bull LOW_CRSNPcallratefromlowfertility(maleandfemale)dataset
bull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallele
bull LOW_Het Number of animals from low fertility (male and female) dataset with
heterozygotegenotype
bull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallele
bull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and
female)dataset
bull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)dataset
bull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)dataset
bull HIGH_CRSNPcallratefromhighfertility(maleandfemale)dataset
bull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallele
bull HIGH_Het Number of animals from high fertility (male and female) dataset with
heterozygotegenotype
bull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallele
bull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and
female)dataset
bull Male_TREND inheritance models implemented in PLINK trend test on male fertility
dataset
bull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale
fertilitydataset
bull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male
fertilitydataset
bull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility
dataset
bull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on
femalefertilitydataset
bull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale
fertilitydataset
bull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male
andfemalefertilitydataset
bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test
onmaleandfemalefertilitydataset
162
bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston
maleandfemalefertilitydataset
bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset
bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset
bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset
Supplementary)Table) S5) Regions) candidate) to) carry) deleterious) variants)
identified)by)OHW)approach))
Supplementary)Table) S6) Regions) candidate) to) carry) deleterious) variants)
identify)by)LOH)approach))
Ingreyregionsremovedinfurtheranalysis
Supplementary)Table)S7)List)of)candidate)deleterious)variants)mapping)in)
OHW)and)LOH)regions))Header(code
bull Bfreq_totFrequencyofBalleleinentiredataset
bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset
bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset
bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset
bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset
bull CallRate_totSNPcallratendashentiredataset
bull CallRate_MHSNPcallratendashhighmalefertilitydataset
bull CallRate_MLSNPcallratendashlowmalefertilitydataset
bull CallRate_FHSNPcallratendashhighfemalefertilitydataset
bull CallRate_FLSNPcallratendashhighmalefertilitydataset
bull HWeqHardyWeinbergequilibriumpvalue
bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele
bull Geno_HetNumberofanimalswithheterozygotegenotype
bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele
bull HW_insideDeleteriousmutationsinsideldquoOutofHardyWeinbergrdquoregion
bull HW_outsideDeleteriousmutationsoutsideldquoOutofHardyWeinbergrdquoregion
bull DH_insideDeleteriousmutationsinsideldquoLackofHomozygosityrdquoregion
bull DH_outsideDeleteriousmutationsoutsideldquoLackofHomozygosityrdquoregion
163
bull GeneClass Classifications of gene 1 deleterious mutations inside OHW and LOHregion2deleteriousmutationsinsideOHWorLOHregion
bull 2deleteriousmutationsoutsideOHWandorLOHregionbull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)datasetbull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)datasetbull LOW_CRSNPcallratefromlowfertility(maleandfemale)datasetbull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallelebull LOW_Het Number of animals from low fertility (male and female) dataset with
heterozygotegenotypebull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallelebull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and
female)datasetbull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)datasetbull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)datasetbull HIGH_CRSNPcallratefromhighfertility(maleandfemale)datasetbull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallelebull HIGH_Het Number of animals from high fertility (male and female) dataset with
heterozygotegenotypebull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallelebull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and
female)datasetbull Male_TREND inheritance models implemented in PLINK trend test on male fertility
datasetbull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale
fertilitydatasetbull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male
fertilitydatasetbull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility
datasetbull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on
femalefertilitydatasetbull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale
fertilitydatasetbull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male
andfemalefertilitydataset
164
bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test
onmaleandfemalefertilitydataset
bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston
maleandfemalefertilitydataset
bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset
bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset
bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset
Supplementary)Table)S8)Results)of)haplotype)analysis)))Header(code
bull Chromchromosome
bull PosphysicalpositiononUMD31genome
bull Exome_homonumberofhomozygotefordeleteriousmutationsequencedanimals
bull Exome_hetnumberofheterozygotesequencedanimalsfordeleteriousmutation
bull Outcome NotFound It was impossible to associated a haplotype to the mutation
NotOpt Non completely invariant haplotype could be associated to themutationOK
Putativehaplotype associate tomutationwas found In few case alternativehaplotype
wasreported
bull LenlengthofhaplotypeinnumberofSNPs
bull Pop_numnumbersofselectedhaplotypedetectedinHolsteinpopulation
bull Pop_homo numbers of homozygote animals in Holstein population for selected
haplotype
bull Pop_het numbers of heterozygotes animals in Holstein population for selected
haplotype
bull ExpExpectednumberofhomozygoteanimal
bull Alternative if only one heterozygote sequenced animals for deleteriousmutationwas
foundalltwopossiblehaplotypeswheretested
Supplementary)Table)S9)Descriptive)results)for)the)Biological)Process)(BP)))
NumberofgenesbelongingtoBiologicalProcessGOterms)
Supplementary) Table) S10) Descriptive) results) for) the)Molecular) Function)
(MF))
NumberofgenesbelongingtoMolecularFunctionGOterms
165
CHAPTER7
Conclusion
Since the domestication event occurred in the Fertile Crescent around 10000yearsagothebovinespecieswasselectedbyhumankindtofulfilhisneedsThecreationofbreedsoccurredaround200yearsagointensifiedthedifferentiationbetween populations and promoted the development of intensive selectionschemesTraditionalAandnowadaysgenomicAselectionsareproducingpositivegenetictrendsinmanyproductivetraitsdespitethestillpoorknowledgeaboutphenotype biology A deeper understanding of genome organization and genebiology would increase the accuracy of genomic evaluation by incorporatingpriorknowledgeThisthesisexploresthebovinegenomebyhighAthroughputtechnologiesAsuchasNextGenerationSequencingandHighADensityGenotypingAwithestablishedand innovative procedures to investigate the biology of complex traits and toprovidenewdataandmoleculartoolstobovinebreedingThese goals have been achieved by complementary approaches and differentmethodsInChapter2selectionsignatureswereinvestigatedindependentlyintofive Italian breeds using Illumina BovineSNP50 BeadChip Then amultiAbreedapproach was applied clustering breeds into dairy (Italian Holstein ItalianBrownand Italian Simmental) andbeef (Marchigiana Piedmontese and ItalianSimmental) groups Regions under recent positive selection shared by breedswithasameproductiveaptitudewereinvestigatedinfurtherdetailbyretrievinggenes in the region and analysing their function by ontology and pathwayanalysis In the dairy group selection signals appear nearby genes related tomammarygland(metabolismandresistancetomastitis)andfeedingadaptationIn thebeefgroupgenes inselectedregionsare involved inanimalgrowthandmeatquality(textureandjuiciness)Hencecandidategenesidentifiedbelongtonetworks and pathways important in directing a breed towards either beef ordairyproductionIn Chapter 3 a GWAS approach was used in dairy cattle to correlate SNP tocomplex phenotypes having an economic value Three independent ldquoclassicalrdquosingleAmarkerregressionswereruninthreeItaliandairycattle(ItalianHolsteinItalianBrownandItalianSimmental)UsingthetraditionalsingleSNPapproach
168
only few SNPs were above the nominal genomeAwide significance thresholdwhile applying a geneAbased method identified significant candidate genesassociatedwithmilkphysiology andmammary glanddevelopment in all threepopulationsInterestinglygenesfoundindifferentbreedshavesimilarfunctionbut are not the same suggesting that breed history demography selectioncriteriaandstochasticeventsasgeneticdrifthaveanimpactonthedefinitionofthemainmoleculartargetofselectionschemesCurrent intensive selection strategies rapidly spread the genetic gain in thepopulationbutholdtheassociatedriskofdisseminatingdefectivegenescausinggenetic defects (eg mulefoot complex vertebral malformation and others) ordecreasing fertility In Chapter 6 different technologies and approaches werecombined to obtain filter and prioritize genes candidate to be deleteriousExome sequence datawere exploited to identify deleterious variants affectingItalianHolsteincattleinasetof20animalsextremeforfertilitytraitsMorethan5000 candidate deleterious SNP variants were identified To bypass the smallnumberlimitationandthelackofpowerofanystatisticanalysison20animalsastepwise approachwas used to first filter and then prioritize these candidatedeleterious variants Polymorphisms were retained if having asymmetricdistribution in bulls from opposite tails of fertility EBV distributions andormapping in genomic regions outlier for lack of homozygosity in a 1009 bullpopulation genotyped with the Illumina BovineHD BeadChip A total of 191variants in163genespassed theprocessesThesewereassessedby searchingadditional evidences in favour or against their deleterious effect throughdifferentapproachesi)checkingthevariantconservationacross100vertebratespecies by comparative genomics ii) finding haplotypes associated withcandidatemutationsandevaluatingtheirfrequencyinHolsteinsiii)integratinghumanandmouse functionalannotationsA scorewasproposed toweight theresult of these assessments and rank the variants 35 of them had positivescores These occurred in genes involved in basic biological mechanisms asfertilitydevelopmentand immunityandresultedenriched ingenesassociatedtogeneticdefectsreportedinhumanThesemutationsarestrongcandidatestobe recessive deleterious variants worth to be further investigated in a larger
169
populationtobetterunderstandtheirbiologyimpactandpossibleapplicationinbreedingTo increase the reliability of results based on genomic data and to reduce theimpact of subjective choices in data analysis new toolsapproaches weredeveloped(iethemultiAbreedapproachinSelectionSignaturendashChapter2thepostAGWAS geneAbased approach ndash Chapters 3 and 4) and the existent onescriticallyevaluated(imputationmethodsndashChapter5)In Chapter 4 the postAGWAS geneAbased method used in Chapter 3 wasimplemented in the MUGBAS (MUlti species GeneABased Association Suite)software In contrast to existing software MUGBAS is species and annotationfree AmultipleAtesting correctionwas included in the package in addition tocomputing parallelization for faster processing and plot representation forbetterunderstandingThesoftwareusessingleAmarkerGWASdataandagivenannotation to estimate geneAregionAwise association pAvalues As previouslymentioned in Chapter 3 the ldquoclassicalrdquo GWAS approach (single SNP) foundsignals only in ItalianHolsteinwhileMUGBAS geneAbased approach identifiedsignificantsignalsinallbreedsinvestigatedIn Chapter 5 the impact of different bovine reference sequences on genotypeimputation was investigated In this work Illumina BovineSNP50 BeadChipgenotypesof ItalianSimmentalbullswerereAmappedonthethreemostrecentreferencegenomeassembliesWecomparedfourdifferent imputationmethodsfromlow(IlluminaBovineLDv10Beadchip)tohighdensityThetestpopulationwasdividedintofoursubpopulationsonthebasisofrelationshipandgenotypedataavailabilityUpdatingSNPcoordinatesonthe three testedcattlereferencegenome assemblies determined only a slight variation on imputation resultswithinmethodsOntheotherhandlargedifferencesintermsofaccuracywereobservedamongthefourimputationmethodstestedGenetic structure of industrial breeds was evaluated assessing LinkageDisequilibrium (LD) Since LD decay is correlated to intensity of selection weexpected different behaviours in the breeds analysed Indeed as evident inChapter 2 ndash Figure 7 breeds subjected to lower selection intensity (ie
170
MarchigianaPiedmonteseandItalianSimmental)displayalowerpersistenceof
LDatgreaterdistancescomparedtothosesubjectedtohigherselectionpressure
(ieItalianHolsteinandItalianBrown)Chapter3investigatesLDintheDGAT1
region of dairy cattle breeds In Italian Simmental LD is lower compared to
Italian Holstein and Italian Brown In spite of this significant association
betweentheDGAT1regionandmilktraitswasfoundbothinItalianSimmental
andinItalianHolsteinInItalianBrownnoassociationwasfoundbecauseDGAT1
isfixed
Data and results presented in this thesis represent a clear example of the
progress of molecular technologies occurred in last few years spanning from
mediumdensitygenotypingtonextgenerationsequencing
Thebiologylandscapesofsomecomplextraits(iemilkproductionandquality
fertility)were improved finding candidate genes andpathwaysnotpreviously
associated to the phenotype in the bovine species Moreover insights on
deleterious mutations were inferred through a deep and integrated analysis
usingNGSdata
This new knowledge could be used to improve cattle breeding For example
deleteriousrecessivevariantscouldbeincludedinbullevaluationsbybreedersrsquo
associations to reduce the genetic load of deleterious genes in parallel to
improvingproductivity
While technological progress has been impressivewe are still only scratching
thesurfaceofanimalbiologyNewfrontiersarebeingopenedbywholegenome
sequencing of many animals (eg in the 1000 bull genome project) and by
stepping beyond sequencing (a number of epigenomic projects in animals are
following the footprints of the human ENCODE project) The trend is towards
unravelling complexity and data production ability is to be flanked by data
storingandprocessingandaboveallbynewmethodsandtoolsfordataanalysis
171
ACKNOWLEDGMENTS
ldquoNon egrave la destinazione ma il viaggio che contardquo Questi tre anni sono stati un gran bel viaggio di crescita personale e ricchi diesperienze Non egrave stato un viaggio solitarioma condiviso con vecchie e nuoveconoscenzechemisembradoverosomenzionareInnanzitutto vorrei ringraziare il prof Paolo AjmoneMarsan chemi ha dato lapossibilitagrave di fare questa esperienza Grazie per gli insegnamenti e per leopportunitagravechemihaoffertoSperodinonavertraditolesueaspettativeAgli attuali colleghi Lorenzo Stefano Licia Elisa Riccardo ed Elia e a quellipassati Nicola Fatima Raffaele Ezequiel e Barbara va il mio piugrave sincero edaffettuoso ringraziamento per aver condiviso anche solo in parte questomiopercorso Grazie per avermi sopportato (e non egrave poco) guidato (non solo nelcampolavorativo)efattovivereinunmodospecialelrsquoambientedilavoroUnsincerograzievaatuttiimieicolleghidelXXVIIcicloAgrisystemRicorderogravesempretuttiimomentichehopotutocondividereconvoi UngrazieancheachihafattopartedellamiaesperienzasvedeseGraziedicuoreaCarlFlavioChristianEricaatuttiiragazziitalianiatuttiicolleghildquosvedesirdquoea tutte le altre persone che ho conosciuto Mi avete fatto vivere appienolrsquoesperienzaesteraefattotornareacasaconqualcheconsapevolezzainpiugrave UnsentitoringraziamentoatuttiglialtricolleghiconcuihocollaboratoinquestianniRingrazio tutti gli amici e conoscenti che mi sono stati vicino per avermisupportatoesopportatoognunoamodoproprio Ultimomanonperimportanza ilringraziamentochevaatuttalamiafamigliaper avermi aiutato spronato sostenuto sempre e in particolare in questaesperienzaSemplicementegrazie PS Adirlatuttaancheladestinazionenonegravepoicosigravemale
173
- INDEX
- CHAPTER 1 - Introduction
- CHAPTER 2 - Relative extended haplotype homozygosity (rEHH) signals across breeds reveal dairy and beef specific signatures of selection
- CHAPTER 3 - Searching new signals for productive traits in three Italian cattle breeds
- CHAPTER 4 - MUGBAS a species free gene based ased association suite
- CHAPTER 5 - Imputation accuracy is robust to cattle reference genome updates
- CHAPTER 6 - Quest for deleterious recessive variants in Italian Holstein bulls combining exome sequencing and high density SNP analysis
- CHAPTER 7 - Conclusion
- ACKNOWLEDGMENTS
-
CHAPTER1
Introduction
Background
The bovine species counts almost 15 billion of heads classified in some 800
breeds worldwide (httpfaostat3faoorg) The species is adapted to a wide
varietyofdifferentenvironmentandhusbandrysystemsandisoftenrearedfor
multiple purposes (Taberlet et al 2008) draft power milk meat leather
productionandotheruses(Elsiketal2009)
European cattle (Bos$ taurus) derive from a single domestication event of the
auroch (Bos$primigenius)occurred in theFertileCrescent around10000years
ago(Taberletetal2011)FollowingdomesticationcattlefollowedtheNeolithic
expansionofagricultureandcolonisedEurope(seeAjmoneMMarsanetal2010
for a review) Many thousand years of natural and artificial selection have
shaped cattle phenotype and genotype to meet environmental conditions and
human needs and together with demographic events and genetic drift have
started thedifferentiation of the bovine species in diverse and locally adapted
populations This differentiation process accelerated in the last two centuries
whenthebreedconceptwasinventedsothatnowthisspeciescountshundreds
of different breeds with high diversity in coat colour body size production
aptitudeadaptationtoclimate tolerancetodiseaseetc(Taberletetal2008
Elsiketal2009TheBovineHapMapConsortium2009)
Thisdiversityisnowbeingprogressivelylostwiththeextinctionofmanylocal
populationsmainlyforeconomicreasonsSelectioneffortshavethereforebeen
focussedonafewoutperformingindustrialbreedsthatarepresentlyselectedfor
specialiseddairyorbeefproductionandsometimesdoublepurpose
In these breeds breeding schemes are well developed and have been
traditionally based on the genetic evaluation of sires and dams using data
collectedfromperformanceandprogenytestingandunbiasedlinearprediction
modelsThesearefoundedontheFisherinfinitesimalmodellingofquantitative
trait variation and need no genomic information to estimate breeding values
(EBVs) The genetic progress recorded using this approach has been relevant
particularly for traits having medium to high heritability In Italian Holstein
genetic trends for themain production traits (milk protein and fat yield) are
positiveHoweveraccurateestimateofbreedingvalueshasahighcostinterms
4
of investment and time so that 5 to 6 years are needed to have a dairy bullprogenytested(wwwanafiit)Theadventofgenomicselection(Meuwissenetal2001)markedamilestoneindairycattlebreedingTheideawasdevelopedbeforethemoleculartoolswereinplaceforimplementitinpracticeandhasbecomearealityonlyrecentlyInthisapproach progeny tested bulls are genotyped at many thousand loci and thevalue of their EBVs used to estimate the value of each allele at each locusgenotyped These values are then used to predict the genetic value of younganimals on the basis of their genotype As a result animals from the newgeneration receive an EBV with the same accuracy of estimates obtained byprogenytestingbutmuchearlierThislowerthegenerationintervalandgreatlyincreasestherateofgeneticprogress indairypopulationsThe initial ideawasso good that nowadays genomic selection is widely applied in industrialcountriestothemajordairycattlebreedsWiththedecreaseofgenotypingcostit is likely that its use will be soon expanded to animals selected for otherpurposes (eg beef cattle) and other species (eg pigs) However genomicselection is an extension of traditional selection It is extremely useful inbreedingforcomplextraitsbutgivesnobiologicalinformationonthegenesthatcontrolthemIfthetraditionalselectionviewsasiregenomeasabigblackboxhaving a value but no clue why genomic selection views the same genomedisassembled inmany small boxes but still black and stillwithno clueon thereasonoftheirvalueIt is reasonable to think that a deeper understanding of the biologicalmechanisms controlling traits may further increase the accuracy of genomicevaluationsIndeedthenewtechnologiesoffernewopportunitiesinthissensebyloweringcostsandincreasingprecisionofdataproductionThis thesis is exploring the use of these technologies for retrieving biologicalinformationfromgenomicdataItusesmethodsnowconsideredldquotraditionalrdquoingenomics (GWAS and analysis of selection sweeps) and less traditional ones(gene centered GWAS and exome analysis) All data used here have beenproducedwithin twonationalprojectson farmanimalgenomics fundedby theItalianMinistryofAgriculture
5
M SELMOL ldquoRicerca e innovazione nelle attivitagrave di miglioramento genetico
animalemediante tecniche di geneticamolecolare per la competitivitagrave del
sistemazootecniconazionaleIrdquo(ResearchandInnovationinanimalbreeding
sheepgoatbuffalohorseanddonkeyI)
M INNOVAGENldquoRicercaeinnovazionenelleattivitagravedimiglioramentogenetico
animalemediante tecniche di geneticamolecolare per la competitivitagrave del
sistema zootecnico nazionale IIrdquo (Research and Innovation in animal
breedingsheepgoatbuffalohorseanddonkeyII)
In these projects the research group I work with M Istituto di Zootecnica at
Universitagrave Cattolica del Sacro Cuore di Piacenza M coordinated all the research
activitiesconcerningtheapplicationofgenomicstechniquesindairycattle
Contentsofthethesis
The thesis is structured into a short introduction (this first chapter) 5 core
chapters(Chapter2to6)andashortConclusion(Chapter7)Withtheexception
of Chapter 6 still in preparation the other core chapters containmanuscripts
that at time of writing are under review for their publication in international
peerMreviewedjournalsormanuscript
InChapter2selectionsignatureindairyandbeefcattleareinvestigatedwiththe
Illumina BovineSNP50 Genotyping BeadChip Five Italian cattle breeds (Italian
Holstein Italian Brown Italian Simmental Marchigiana and Piedmontese) are
analyzedwiththerEHHmethod(Sabetietal2002)tofindsignaturesofrecent
selectionCandidategenesinthevicinityofsignaturessharedbyeitherdairyor
beefbreeds are identified and theirpossible role indairy andbeefproduction
discussed
In Chapter 3 a GWAS analysis is run to investigate production traits in three
Italian dairy cattle breeds (Italian Holstein Italian Brown Italian Simmental)
Data are analysed with a well established method (Amin et al 2007)
6
implemented in theGenABELRpackage (Aulchenkoet al 2007) andwith the
newgeneMbasedmethodAssociationisinvestigatedbetween50KSNPsandmilk
production(milkyield)andqualitytraits(percentageofmilkproteinandfat)
InChapter4thegeneMbasedmethodusedinthepreviouschapter3inspiredby
theVEGASapproach(Liuetal2010)isdescribedThesoftwarewasdeveloped
andreMcodedinourlaboratoryinordertopermittheanalysisofanyspeciesin
additiontohumanMultiMtestingcorrectionandplotrepresentationcompletethe
toolsmadeavailable
InChapter5wetesttheimpactoftheuseofdifferentreferencesequencesinthe
imputation step Italian Simmental data are used to test variation in genomeM
wide inputation accuracy using different cattle reference sequences (BTAU42
UMD31 and BTAU46) To increase the analyses reliability four different
inputationsoftwaresaretested
InChapter6Holsteindataareanalysedcombiningexomesequencesand800K
SNPs data to seek deleterious recessive variantsWhile in natural populations
the destiny of these variants is to disappear in livestock breeds they can be
maintained and disseminated if carried by a highly productive sire The
intersection of deleterious mutations identified by sequencing and lack of
homozygousregionsidentifiedbytheanalysisofthepopulationwiththeHDSNP
chipidentifiedgenescandidatetocarryrecessivedeleteriousvariants
Objective
Themainobjectiveofthethesiswastoinvestigatethebovinegenomewiththe
aid new highMthroughput technology and to explore new strategies for linking
biology (genes and their function) to traits This linkage once validated may
increase the accuracy of genomic prediction and find application in selection
schemesegineradicatingthemostdeleteriousvariantsandorintheplanning
ofmatings
7
ReferencesAjmoneMMarsanPGarciaJFLenstraJA2010OntheoriginofcattleHow
aurochsbecamecattleandcolonizedtheworldEvolAnthropol19148ndash157doi101002evan20267
AminNvanDuijnCMAulchenkoYS2007AGenomicBackgroundBasedMethodforAssociationAnalysisinRelatedIndividualsPLoSONE2e1274doi101371journalpone0001274
AulchenkoYSKoningDMJdeHaleyC2007GenomewideRapidAssociationUsingMixedModelandRegressionAFastandSimpleMethodForGenomewidePedigreeMBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614
ElsikCGTellamRLWorleyKC2009TheGenomeSequenceofTaurineCattleAWindowtoRuminantBiologyandEvolutionScience324522ndash528doi101126science1169588
LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKMHaywardNKMontgomeryGWVisscherPMMartinNGMacgregorS2010AVersatileGeneMBasedTestforGenomeMwideAssociationStudiesTheAmericanJournalofHumanGenetics87139ndash145doi101016jajhg201006009
MeuwissenTHHayesBJGoddardME2001PredictionoftotalgeneticvalueusinggenomeMwidedensemarkermapsGenetics1571819ndash1829
SabetiPCReichDEHigginsJMLevineHZPRichterDJSchaffnerSFGabrielSBPlatkoJVPattersonNJMcDonaldGJAckermanHCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash837doi101038nature01140
TaberletPCoissacEPansuJPompanonF2011ConservationgeneticsofcattlesheepandgoatsCRBiol334247ndash254doi101016jcrvi201012007
8
TaberletPValentiniARezaeiHRNaderiSPompanonFNegriniR
AjmoneMMarsanP2008Arecattlesheepandgoatsendangered
speciesMolEcol17275ndash284doi101111j1365M294X200703475x
TheBovineHapMapConsortium2009GenomeMWideSurveyofSNPVariation
UncoverstheGeneticStructureofCattleBreedsScience324528ndash532
doi101126science1167936
9
CHAPTER2
Relative(extendedamphaplotype(homozygosity)(rEHH)ampsignalsampacrossampbreedsamprevealampdairyampand$beef$specificampsignaturesampofampselection
LBomba1ELNicolazzi2MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly2FondazioneParcoTecnologicoPadanoViaEinsteinLocCascinaCodazza26900LodiItalyTheseauthorscontributedequallytothiswork
____________________SubmittedtoGeneticSelectionEvolution
Abstract
Background
Nowadays many methods have been implemented to scan for signatures of
selection at genomescale level within and across breeds Extended haplotype
homozygosity is a proficient approach to detect genome regions under recent
selective pressure within breeds The objective of this study is to identify
common regions under positive recent selection shared by most represented
dairyandbeefItaliancattlebreeds
Results
A total of 3220 animals from Italian Holstein (2179) Italian Brown (775)
Simmental (493) Marchigiana (485) and Piedmontese (379) breeds were
genotyped with the Illumina BovineSNP50 BeadChip v1 in the frame of the
ItalianlivestockgenomicprojectsldquoSelMolrdquoandldquoProzoordquoAfterstandardcleaning
proceduresgenotypeswerephasedandcorehaplotypesidentifiedThedecayof
Linkage Disequilibrium (LD) for each core haplotype was assessedmeasuring
the extended haplotype homozygosity (EHH) To correct for the lack of good
estimatesof local recombinationrates therelativeEHH(rEHH)wascalculated
for each corehaplotypeGenomic regionsunder recentpositive selectionwere
identifiedasthosehighlyfrequentcorehaplotypeswithsignificantrEHHvalues
Significant independentcorehaplotypeswerealignedacrossbreeds to identify
signalsofrecentselectionsharedbydairyandsharedbybeefbreedsOverall82
and 87 common regions under selectionwere detected among dairy and beef
cattlebreedsrespectivelyBioinformaticsanalysisidentified244and232genes
mapping in these common genomic regions Pathway and network analysis of
significant genes revealed molecular functions biologically related to signal of
selectionsdetectedinmilkormeatproductiontypes
Conclusions
Resultssuggest that themultibreedapproach isamethodto identifyselection
signatures in cattle breeds under selection for the same production goal
12
allowing to better pinpoint the genomic regions of interest in dairy and beef
production
Background
The advent of genomic technologies and the consequent availability of single
nucleotidepolymorphism(SNP)datahaveenabledthestudyofselectionevents
on the cattle genome (Hayes et al 2008 Qanbari et al 2010) Such selection
signals arise from environmental or anthropogenic pressures and help to
understand the causes that led to breed formation These studies are usually
addressedinaldquotopdownrdquoapproach(ShahzadandLoor2012)fromgenotypeto
phenotypeinwhichgenomicdataisstatisticallyevaluatedtodetectdirectional
selectionThephenotypeneededtorunselectionsignaturesisintendedinavery
generalsenseabreedaproductionaptitudeorevenageographicareaoforigin
Thisapproachholdsthepotentialto investigatetraitsasadaptationtoextreme
climatesanddifferentfeedingandhusbandrysystemsresiliencetodiseasesetc
thatareveryexpensivedifficultandsometimesimpossibletostudywithclassic
GWAS (GenomeWide Association Study) approaches Therefore it is expected
thatgenomic informationretrievedinthesestudiesonlypartiallyoverlapsthat
identified by GWAS on specific traits Searching for these ldquosignatures of
selectionrdquo is of interest for understanding molecular mechanisms underlying
important biological processes and providing new insights to functional
biologicalinformation(egdiseaseandorimportanteconomictraits(Nielsenet
al 2007)) To date many methods have been implemented to scan these
selectionsignaturesatthegenomiclevelMethodsdifferwithrespecttoseveral
properties (Lenstra et al 2012) there are methods that perform analyses
withinbreedoracrossbreedsthatcomparealleleorhaplotypefrequenciesand
size that detect recent or ancient selection events either still segregating or
fixatedintopopulations(Lenstraetal2012OrsquoBrienetal2014Utsunomiyaet
al2013)Differentmethodshavedifferentsensitivityandrobustnesseg they
maybeinfluencedatadifferentextentbymarkerascertainmentbiasanduneven
distributionofrecombinationhotspotsalongthegenome
13
Extended haplotype homozygosity (EHH) is a method that has been used in
many animal species including cattle (Pan et al 2013 Qanbari et al 2010
Zhangetal2006)TheEHHmethodwasfirstdevelopedbySabetietal(2002)
for human population analysis The main objective of this methodology is to
identifylongrangehaplotypesinheritedwithoutrecombinationeventsMorein
detailunderaneutralevolutionmodelchangesinallelefrequenciesaredriven
onlybygeneticdriftInthisscenarioanewvariantwouldrequirealongtimeto
reach high frequency in the population and the Linkage Disequilibrium (LD)
surroundingitwoulddecayduetorecombinationevents(Kimura1984)Inthe
caseofpositiveselectionaquickriseinfrequencyofabeneficialmutationina
relatively short time will preserve the original haplotype structure as the
number of recombination events would be limited Therefore a signature of
positiveselectionisdefinedasaregionwithanalleleinlongrangeLDlocatedin
anuncommonlyhighfrequencyhaplotype
One of themost interesting features of thismethod is that it is able to detect
genomic regions that are under recent selection These recent selection
signatures can be therefore used to identify the genomic regions involved in
breedformationInadditionEHHmethoddoesnotrequireanydefinitionofan
ancestralallele(asneededintheintegratedhaplotypescoreiHS(Voightetal
2006) and is suited to be applied to SNP data because it is less sensitive to
ascertainmentbiasthanothermethods(Nielsenetal2007)HowevertheEHH
methodgeneratesa largenumberof falsepositiveandnegative resultsdue to
heterogeneous recombination rates along the genome (Qanbari et al 2010)
Moreover another drawback not specific of EHH but shared by all selection
signaturemethod is the lack of robust inferences able to discern true signals
fromthosegeneratedbychance(Kemperetal2014)
To partially account for these limitations Sabeti et al (2002) developed the
relative extended haplotype homozygosity (rEHH) including an empirical
methodologytoobtainsignificantsignalsTherEHHofaputativecorehaplotype
compares its original EHH valuewith that of other haplotypes at that specific
locus using all other haplotypes as control for local variation in the
recombination rate Therefore it identifies genomic regions carrying variants
14
under selection that are still segregating in the population analysed Although
thesemethodswere conceived for human populations theywere successfully
appliedto livestockspeciessuchaspig(Maetal2012)andcattle(Qanbariet
al2010)
After domestication about 10000 years ago in the fertile crescent taurine
cattle have colonized Europe and Africa and have been selected for different
human needs (Taberlet et al 2011) Particularly in the last centuries the
anthropic pressure led to the formation of hundreds of specialized breeds
adapted to different environmental conditions and linked to local tradition
constitutingagenepooldeservingattentionforconservation(AjmoneMarsanet
al2010)Someofthesebreedshavebeenselectedspecificallyfordairybeefor
bothproductiontypesfollowingstrongartificialselectionforthesetraits(Hayes
etal2009)Thepresentstudyaimsat identifyingsignalsof recentdirectional
selectionusingrEHHmethodindairyandbeefproductiontypesusingdatafrom
five Italian dairy beef and dual purpose cattle breeds We focused on the
significant core haplotypes scanned by rEHH that are shared among breeds of
thesameproductiontypeThenweusedabioinformaticsapproachto identify
positionalcandidategeneslyingwithinthegenomicregionsunderselectionand
investigatetheirbiologicalrole
Methods
SampleAnimalsandGenotyping
A total of 4311 bulls of five Italian dairy beef anddual purpose breedswere
genotyped with the Illumina BovineSNP50 BeadChip v1 (Illumina San Diego
CA)joiningthegenotypingeffortsoftwoItalianprojects(namelyldquoSelMolrdquoand
ldquoProzoordquo)Thedatasetincluded101replicatesand773siresonspairsusedfor
downstreamqualitycheckofdataproducedIndetailgenotypesof2954dairy
(2179 ItalianHolsteinand775 ItalianBrown)864beef (485Marchigianaand
379 Piedmontese) and 493 dual purpose (Italian Simmental) bulls were
availableDataqualitycontrol(QC)wasperformedintwostagesafirststageon
15
animals independently for each breed applying the same methods and
thresholdsandasecondstageappliedonmarkersacrossall theindividualsof
thedatasetThefirststageexcludedindividualswithunexpectedlyhigh(gt02)
Mendelian errors for fatherson couples andwith low call rates (lt95)The
secondstageexcludedi)SNPswithle25missingvaluesinthewholedataset
orcompletelymissing inonebreed ii)SNPswithminorallele frequencyle5
and iii) SNPs in sexual chromosomes and with unassigned chromosome or
physicalposition
EstimationofrEHH
HaplotypeswereobtainedusingfastPHASEsoftwareconsideringdefaultoptions
(ScheetandStephens2006)andwererunbreedwiseandchromosomewisein
eachbreedPedigreeinformationforallbullswereprovidedbytheirrespective
breederassociations andwereused to filteroutdirect relatives (in fatherson
pairsthesonwasmaintainedinthedatasetandthefatherdiscarded)andover
representedfamilies(amaximumof5randomlychosenindividualsperhalfsib
familywasallowed)Thefinaldatasetcontainingtheseldquolessrelatedrdquoanimalswill
behenceforthcalled ldquononredundantrdquodatasetThisnonredundantdatasetwas
used also to calculate within breed pairwise wholegenome Linkage
Disequilibrium(LD)Ther2statisticforallpairsofmarkerswasobtainedusing
PLINK software v107 (Purcell et al 2007)Thedecayof LDwas assessedby
averagingr2valuesupto1Mbdistance
To test if the population structure influenced the detection of rEHH we
replicated all analyses on the whole Italian Holstein dataset without any
population correction (ie ldquoredundantrdquo dataset) focusing on genes or gene
clustersthatarewellknowntobeunderrecentselectionincattle(ieldquocandidate
regionsrdquo) In particularwe focused on the casein polled gene cluster and coat
colourgenes(MC1RandKIT(Kemperetal2014Qanbarietal2010))
The EHH and rEHH calculations were performed using Sweep v11 software
(Sabetietal2002)Somedefaultprogramsettingshadtobemodifiedtoadapt
theseanalyses to thebovinegenomeas thesoftwarewasoriginallydeveloped
forhumangeneticanalysesSpecificallylocalrecombinationratesbetweenSNPs
16
wereapproximatedto1cMperMbEHHandrEHHcalculationswereperformed
breedwise and chromosomewise using an automatic core selection with
defaultoptions(ieconsideringlongestnonoverlappingcoresandlimitingcores
toatleast3andnomorethan20SNPs)asinQanbarietal(2010)AlthoughEHH
and rEHH values were obtained for all core haplotypes only those with
frequency ge 25 were retained for further analyses The (empirical) rEHH
significancethresholdwasobtainedbydividingtherEHHvaluesinto20binsof
5 frequency range each logtransforming withinbin values to achieve
normality Core haplotypes with pvalues lt 005 were considered significant
DetectionofcommonselectionsignatureswasobtainedcomparingrEHHvalues
acrossgenomicregionsacrossbreeds
Breedgroupingaccordingtoproductiontype
Toexamine thesignals common toeachof the twoproduction types (iedairy
andbeef) corehaplotypes thatsharedoneormoreSNP inat least twobreeds
from the same production type were selected The dual purpose Italian
Simmentalwas included inbothdairy (ItalianHolsteinand ItalianBrown)and
beef groups (Piedmontese and Marchigiana) since potentially it possesses
haplotypes selected for both production types All downstream analyses were
performedseparatelyforthedairyandthebeefproductiontypes
Detectionandannotationofcandidategenes
Genome annotation was performed using Biomart
(httpwwwbiomartorgindexhtml) a comprehensive source of gene
annotationformanyspeciesprovidedbyEnsembl(wwwensemblorg)Thelist
ofsharedhaplotypesregions(inbp)fordairyandbeefwasusedasinputfilein
theBiomartwebinterfaceThesetofgenesretrievedbyBiomartwasthenused
as input for theCanonicalpathwayandregulatorynetworkanalyses Ingenuity
PathwayAnalysis toolversion80(IPA IngenuityregSystems IncRedwoodCity
CA httpwwwingenuitycom) and a substantial examination of published
literaturewasusedtoexaminethefunctionalrelationshipsamongtheresulting
genes The IPA operates with a proprietary knowledge database providing
complementarypathwayanalysisforseveralspeciesincludingcattleIntheIPA
17
analyses the significance of each biological function identified was estimated
using Fisherrsquos exact test A Benjamini and Hochberg correction for multiple
testingwas calculated for function and canonical pathways For eachNetwork
IPAcomputedascore(pscore=log10(pvalue))fittingthesetstudiedgenesand
a list of biological functionspresent in the IPAknowledgedatabase The score
considered thenumberofgenes in thenetworkand thesizeof thenetwork to
estimatehowrelevantthisnetworkwasaccordingtothelistofgenesprovided
Thenthescorewasusedtorankthenetworks
Results
DatasetQC
Therepeatabilityobtainedfromthe101replicatespresentinthewholedataset
washigherthan998AfterthetwoQCstages105individualsand9730SNPs
were discarded Moreover after phasing 1292 individuals were removed to
reducethe largenumberofsibfamiliespresent intheredundantdatasetAfter
quality control procedures the final dataset had 44271 SNPs and 1132 514
393 410 and 364 individuals from Italian Holstein Italian Brown Italian
SimmentalMarchigianaandPiedmontesebullsrespectively(Table1)
18
Table1JNumberofanimalsgenotypedbeforeandafterQCTotalnumberofanimalgenotypedandrelativenumberofanimaldiscardedafterQCanalysis
Detectionofselectionsignatures
Sweepv11 softwaredetectedrEHHsignalswitha frequencyhigher than25overlapping the test candidate regions in both datasets corrected or not forpopulation structure In the redundant dataset we found only one significantrEHHsignal(caseincluster)whereasinthenonredundantdatasetweidentified5significantrEHHsignalsoverlappingallthetestcandidateregionsconsidered(caseinclusterpolledclusterMC1R)(Table2)
BreedTotal
Genotyped
ED15
misAN
ED1
REPL
ED1
MENDCleaned
HOL 2179 40 31 5 2093
BRW 775 6 16 4 749
SIM 493 6 6 2 479
MAR 485 37 38 410
PIE 379 5 10 364
19
Table2JComparisonofcandidateregionrEHHinnonJredundantandredundantdatasetrEHH overlapping candidate regions in both corrected (nonredundant) and not corrected
(redundant)datasetsforpopulation
Redundant NonRedundant
Candidate
geneBTA1
Closest
SNP(bp)CH2range CH2Frequency
rEHHJ
log(pval)CH2Frequency
rEHHJ
log(pval)
POLLED
Cluster1 1981154 18974181981154 H1079 093037 H1078 167174
MC1R 18 13657912 1331772014007505 H1069 120096 H1069 077167
KIT_BOVIN 6 72821175 7250492172821175 H1054 030030 H1054 033048
Casein
Cluster6 88427760 8835009888452835 H1047 094139 H1046 148133
Casein
Cluster6 88427760 8835009888452835 H2032 002003 H2033 002003
In total5526567847724388and4049corehaplotypeswitha frequency
higher than 25 were detected on Italian Holstein Italian Brown Italian
Simmental Marchigiana and Piedmontese breeds respectively A total of 838
866 740 692 and 613 core haplotypes were found significant outliers (see
Materials and Methods) in the aforementioned breeds Table 3 shows the
distribution of the total and significant core haplotypes per chromosome and
breed
20
Tableamp3amp(ampCoreamphaplotypesampdistributionampDistributionoftotalandsignificantcorehaplotypesperchromosomeandbreed
ampamp ItalianampampHolsteinamp
ItalianampampBrownamp
ItalianampSimmentalamp Marchigianaamp Piedmonteseamp
BTA1amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
1amp 389 66 389 68 335 49 310 51 288 46
2amp 312 47 314 52 267 44 257 42 238 33
3amp 292 48 313 62 264 46 236 34 227 34
4amp 268 46 279 50 245 41 237 39 229 39
5amp 223 32 231 29 200 30 183 27 171 22
6amp 291 51 307 40 265 45 252 38 233 42
7amp 249 37 253 46 214 39 195 33 204 27
8amp 268 39 270 38 235 35 223 32 209 31
9amp 227 37 228 41 210 37 165 22 187 30
10amp 219 36 234 34 192 25 196 26 168 25
11amp 248 39 246 34 209 30 205 33 187 25
12amp 162 24 176 18 148 21 148 25 135 22
13amp 205 32 200 19 155 18 178 35 135 19
14amp 175 17 189 25 168 29 143 22 129 19
15amp 180 31 185 29 148 23 139 13 130 19
16amp 176 28 192 29 160 25 149 19 127 15
17amp 170 27 183 29 168 31 135 22 123 16
18amp 133 18 153 19 119 14 108 10 82 12
19amp 137 22 130 23 118 16 99 18 92 15
20amp 184 35 172 33 145 23 124 23 124 17
21amp 145 23 147 30 113 18 110 21 94 13
22amp 140 24 145 19 115 21 112 18 104 13
23amp 106 13 100 16 67 10 64 9 57 8
24amp 129 11 140 20 124 17 97 23 101 16
25amp 91 12 98 11 68 10 77 12 70 6
26amp 125 7 114 14 93 12 96 15 90 16
27amp 91 12 91 15 75 11 84 12 60 10
28amp 88 9 96 10 70 10 66 9 55 11
29amp 103 15 103 13 82 10 72 9 67 12
TOTamp 5526 838 5678 866 4772 740 4388 692 4049 613
1BostaurusAutosomes2Detectedcorehaplotypeswithafrequencyinthebreedhigherthan253Significantcorehaplotypesatplt005
Comparisonamptoamppreviouslyampreportedampdataamp
AnumberofstudieshavesearchedselectionsweepsinHolstein(Qanbarietal
2010) Brown (Qanbari et al 2011) and Simmental (Fan et al 2014) breeds
Since different methods are expected to identify signatures having different
characteristicsthecomparisonwithpreviousstudiesislimitedtothosestudies
using thesamemethodand thesamebreed(s)analysed inourstudyOnlyone
study analysed (German) HolsteinRFriesian cattle with rEHH (Qanbari et al
2010)Qanbarietal(2010)reportedcandidategenesandgeneclustersunder
recent positive selection that we compared with our rEHH in dairy breeds
dataset(Table4)
Table4JCorehaplotypesincandidategenesfollowingQanbarietal2011
1BosTaurusChromosome2Corehaplotype
Thenumberofcorehaplotypesfoundinourstudyinthecandidateregionswas
lowerpreviouslyreportedWhenconsideringHolsteinsthetwomostsignificant
candidateregionsinbothstudiesmatch(caseinclusterandthesomatostatinSST
gene)althoughitisimpossibletodetermineifthehaplotypeunderselectionis
the same as this information was not provided in Qanbari et al (2010)
However other genes considered significant in Qanbari et al (with pvalues
ranging from 004 to 010) were not significant in our studyWhen using the
same loose significant threshold (pvalues le 010) ldquosignificantldquo signals were
foundalsoinItalianBrownfortheCaseinCluster(log10pvalue=109)andin
ItalianSimmentalfortheSSTgene(log10pvalue=126)
HOLSTEIN BROWN SIMMENTAL
Candidate
geneBTA1
Closest
SNP
(bp)
CH2rangeCH2
Frequency
rEHH
Hlog(p)CH2range
CH2
Frequency
rEHH
Hlog(p)CH2range
CH2
Frequency
rEHH
Hlog(p)
DGAT1 14 444963236653443936
H1057 011443936763332
H1042 0130007
Casein
Cluster6 88391612
8835009888452835
H1046 1481338832601288452835
H2068 0801098835009888452835
H1044 037022
8835009888452835
H2033 018030
8835009888452835
H3034 022045
GH 19 49652377 GHR 20 33908597
SST 1 813769568128358581376961
H1036 200189 8131845181376961
H1042 126053
8128358581376961
H2029 00630084 8131845181376961
H2042 006027
IGFH1 5 71169823
ABCG2 6 37374911 3731702038256889
H1044 031027
3731702038256889
H1040 029031
Leptin 4 95715500
LPR 3 855692038549710885594551
H1047 0910728549710885794693
H1068 0800638549710885794693
H1063 002005
8549710885594551
H2041 027025
PITH1 1 35756434
22
Signaturessharedbydairyandbybeefbreeds
Significantcorehaplotypeswerealignedacrossbreedsto identifythoseshared
among dairy or beef production types Since breeds can be considered
independentsetsofobservationssharedsignaturesaremorelikelytorepresent
real effects rather than false positives A total of 123 and 142 significant core
haplotypesweresharedbetweenatleasttwodairyorbeefbreedsrespectively
(File S1) Only 82 and 87 of those significant core haplotypes contained genes
considered positional candidates under positive selection In total 22 and
17 of the genomewas under selection for dairy and beef production types
respectivelyTherEHHaveragelengthswere216932and190994bpfordairy
andbeefrespectively
Genesetannotationandnetworkpathwayanalysis
Atotalof244and232annotatedgenesfellwithintheselectedregionsfordairy
andbeefproductiontypesrespectively(Table5)
23
Table5JStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreedsStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreeds
DairyCattle BeefCattle
BTA SelSig
n
Genes1 Sum
SelSignsize
(bp)
Mean
SelSign
size(bp)
SelSi
gn
Genes1 Sum
SelSignsize
(bp)
Mean
SelSign
size(bp)
1 7 11 1521608 138328 7 13 2263213 174093
2 5 9 2216685 246298 8 11 2658549 2416863 2 5 1619336 323867 7 21 3755847 1788504 7 21 5956755 283655 6 11 1761288 1601175 4 17 4506119 265066 5 11 1251127 1137396 2 4 1467738 366934 5 12 5149692 4291417 6 30 4842969 161432 3 28 11889191 4246148 0 0 0 0 4 12 3512215 2926859 4 7 1554863 222123 2 6 1106768 18446110 4 17 8839762 519986 2 3 573138 19104611 6 8 1841802 230225 1 1 100439 10043912 2 8 4244133 530517 1 1 536461 53646113 3 8 1933026 241628 2 8 985376 12317214 0 0 0 0 3 9 692397 7693315 5 9 1415405 157267 2 2 189094 9454716 3 6 1302506 217084 4 6 905719 15095317 3 4 471046 117762 2 9 1551010 17233418 0 0 0 0 3 23 2961718 12877019 3 23 5744143 249745 2 2 116938 5846920 1 2 148886 74443 2 4 627971 15699321 3 7 1930206 275744 1 5 1150760 23015222 3 5 620674 124135 3 10 2411751 24117523 0 0 0 0 1 1 68421 6842124 2 18 9166004 509222 2 4 803770 20094225 2 11 2016602 183327 1 3 329199 10973326 2 4 466134 116534 4 7 776080 11086927 1 1 631792 631792 2 3 1004717 33490628 0 0 0 0 1 1 93464 9346429 2 9 935209 103912 1 5 798300 159660TOT 82 244 65393403 216932 87 232 50024613 190994
1GenesidentifiedGenesintheregionssharedbyallthreedairybreeds(8genes)orallthreebeefbreeds(11genes)foramoredetailedanalysiswereselected(Fig1)
24
Figure1JGenomiclocationoftheselectionsignaturessharedamongstudiedbreedsa)GenesinensembletracksaredisplayedasaredboxcorehaplotypesandSNPsarecolouredinorange(MarchigianaMAR)inviolet(PiedmontesePIE)andpink(SimmentalSIM)b)Genesinensemble tracks are displayed as a red box core haplotypes and SNPs are coloured in blue(HolsteinHOL)ingreen(ItalianBrownBRW)andpink(SimmentalSIM)
All the genes identified were submitted to downstream pathwaynetwork
analyses(Table5Figure1FileS1)Interestinglygenesintheregionssharedby
allthreedairyorbeefbreedsweremanifestedinthemostlikelynetworksforthe
two production aptitudes The IPA analysis detected a total of 12 and 15
networksindairyandbeefbreedsrespectively
Indairybreeds network scores ranged from46 to2 (File S2) Eightnetworks
obtainedscoreshigherthan20andincludedmorethan14moleculeseachThey
wereassociatedtoi)geneexpressionRNAdamageandrepairproteinsynthesis
andposttranslationalmodificationii)cellassemblyorganizationfunctionand
maintenance iii) cell cycle proliferation and cancer iv) carbohydrate
metabolism v) gastrointestinal and neurological disease developmental and
endocrine system disorders A total of 23moleculeswere associatedwith the
highest ranked network (score = 46) In this the immunoglobulins mitogen
25
activatedproteinkinasesandNFkBcomplexesformacentralhublinkinggenes
involved in celltocell signaling and interaction hematological system
developmentandfunctionandimmunecelltraffickingfunctions(Figure2)
Figure2JThemostlikelyIPAnetworkindairycattlebreedsThefunctionsassociatedtothenetworkarecelltocellsignallingandinteractionhematologicalsystem development and function immune cell trafficking Nodes in red correspond to genesidentifiedincorehaplotypesoverlappinginallthethreebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Breast cancer antiNestrogen resistance gene 3 (BARC3) andpituitary glutaminyl
cyclase gene (QPCT) genes were common to all three dairy breeds and are
directlyconnectedwithmammaryglandmetabolisms(Ezuraetal2004GSun
et al 2012) The solute carrier family 2 member 5 (SLC2A5) gene facilitates
glucosefructose transport (Zhao et al 1998) and the zetaNchain (TCR)
26
associated protein kinase 70kDa (ZAP70) gene plays a critical role in Tcell
signaling (Bonnefont et al 2011) Calpain is another important complex that
togetherwithcalpainN3(CAPN3)mediatesepithelialcelldeathduringmammary
gland involution (Wilde et al 1997) Furthermore RAS guanyl nucleotide
releasingprotein(RASGRP1)activatestheErkMAPkinasecascaderegulatesT
cellsandBcellsdevelopmenthomeostasisanddifferentiationandisinvolvedin
the regulation of breast cancer cell (Madani et al 2004 Yasuda et al 2007
Wickramasingheetal2009)Genesinvolvedintheothernetworksarelistedin
FileS2
The scoreof thebeef cattlebreedsnetwork rangedbetween46and2 and the
numberofmoleculespresentineachnetworkrangedfrom23to1(FileS3)Asin
dairy breeds eight networks obtained a score higher than 20 and 13 ormore
molecules each These networkswere associated to i) carbohydrate lipid and
nucleic acidsmetabolism energyproduction and smallmoleculebiochemistry
ii) cardiovascular system development and function iii) cellular and tissue
developmentcellassemblyandorganizationfunctionandmaintenanceandcell
and organmorphology iv) cell cycle celltocell signaling and interaction v)
DNA replication recombination and repair and RNA posttranscriptional
modification vi) dermatological diseases and conditions infectious disease
organismal injury and abnormalities In the network with the highest score
(score=46)chondroitinsulfateproteoglycan4(CSPG4)andsnurportinN1(SNUPN)
genes were shared among all beef cattle breeds investigated CSPG4 gene is
relatedtomeattendernesswhileSNUPN isanimprintedgeneimportantinthe
embryodevelopmentandinvolvedinhumanmuscleatrophy(Narayananetal
2002) (Figure 3) All the genes included in this network were involved in
carbohydratemetabolismsmallmoleculesbiochemistryandcellcycleTheother
networkswere related to important signaling andmetabolic process essential
foranimalgrowth(FileS3)
27
Figure3JThemostlikelyIPAnetworkinbeefcattlebreedsThe functions associated to the network are carbohydrate metabolism small molecule
biochemistry and cell cycle Nodes in red correspond to genes identified in core haplotypes
overlapping in all the three breeds of each production type whereas those in green depict
overlappingcorehaplotypesinatleasttwoofthosebreeds
A total of 6 and 9 statistically significant Canonical Pathways (FDR le 005
log10(FDR) ge 13) were identified using IPA for dairy and beef breeds
respectively(Figure4)
28
Figure4JBarplotofstatisticallysignificantCanonicalPathwaysPvalues were corrected for multiple testing using BenjaminiHochberg method and arepresented in thegraphas log(pvalue)Thebar represents thepercentageof genes in a givenpathway that meet cutoff criteria within the total number of molecules that belong to thefunctiona)BarplotofstatisticallysignificantCanonicalPathways indairycattlebreedsb)BarplotofstatisticallysignificantCanonicalPathwaysinbeefcattlebreeds
Themost significant Canonical Pathway in dairywas for purinemetabolism (
log10(FDR) = 26) supporting the highly synthetic processes in the mammary
epithelium(Figure5)
29
Figure5JGenesdetectedunderrecentpositiveselectionindairycattleinvolvedinpurinemetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Inbeefproductiontheephrinreceptorsignal(log10(FDR)=27)wasthemost
significant Canonical Pathway (Figure 6) Among other functions ephrin is
knowntopromotemuscleprogenitorcellmigrationbeforemitoticactivation(Li
andJohnson2013)AlltheotherCanonicalPathwaysarereportedinFileS4
30
Figure6JGenesdetectedunderrecentpositiveselectioninbeefcattleinvolvedinEphrinmetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Discussion
In this work more than 4000 genotyped bulls of five dairy beef and dualpurposeItalianbreedswerescreenedsearchingforselectionsignaturesindairyand beef production types A strict data quality controlwas applied to reducepossiblesourcesofbiassuchastechnicalgenotypingproblemsandpopulationstructurethatcouldhavelargeimpactonrEHHresultsInfactsincetheimpactofpopulationstructureonrEHHvaluesisstillunexploredwereplicatedpartofouranalyseswithandwithoutcloserelativesinanattempttostudytheeffectofpopulationstructureonthisselectionsignaturemethodInprinciplepopulationstratificationshouldleadtoanoverrepresentationofcertainhaplotypessimplyduetothepresenceoftightpedigreelinks(egsirespassinghalfoftheirgeneticmaterialtotheirsons)ratherthanactualselectionprocessesForthisreasonfor
31
the full analyses all siresons couples were removed after haplotype phasing(retaining only the younger animal) and halfsib familieswere restricted to amaximum of five randomly chosen individuals in an attempt to reduce overrepresentation This assessment was restricted to four regions known to beunderselectioninItalianHolsteinasthecaseinclusterthepolledgeneclusterMC1R and KIT Italian Holstein was selected for two reasons i) it is a highlystructuredbreedaccordingtoourdata(abouthalfofindividualsremovedwereclose relatives) ii) allowed us to compare our results with a previous studyAlthoughtheanalysesconductedonbothredundantandnonredundantdatasetswereable to identify rEHHsignals in these regions thenonredundantdatasetproduced five significant rEHH signals rather than just one in the redundantdataset(Table2)Thisresulthighlightstheconfoundingeffectofthepresenceofcloserelativesinthedatasetandconsequentlytheimprovedabilitytodetectedsignificant selection signature while correcting for population structureInterestingly our results only partially overlap those found by Quanbari et al(2010) These authors detected significant signals at the casein cluster andsomatostatinSSTaswedid but also at ahighernumberof candidate regionsThese inconsistencies may be the effect of different sires included in theanalyses of the different dataset size and to the close relative trimmingprocedure we adopted to decrease the effect of population structure andconsequent bias (Table 4) However poor correlation across investigations isoftenobservedalsointhedeeplyinvestigatedhumanspeciesThisisduetotheuse of different within and between breed statistics that identify selectionsignatureshavingdifferent characteristics (ancientrecent segregatingfixatedunder directionalbalancing selection) to the likely high rate of falsepositivesnegativesresultsandalsotothedifferentconsiderationofpopulationstructureandbackgroundselection(Enardetal2014)The number of total and significant core haplotypes identified by the SweepsoftwarewasthehighestinItalianBrownandthelowestinPiedmontesebreedSinceEHHisheavilybasedonpopulationLDtheaverageLDleveloverdistancewascalculatedineachbreed(Figure7)
32
Figure7JMultiJbreedaveragelinkagedisequilibriumagainstphysicaldistance(inkb)Marchigiana Piedmontese and Italian Simmental breeds present a lower persistence of LD allargerdistancethanItalianHolsteinandItalianBrownbreedsThisanalysishighlightedageneralpositivecorrelationbetweenthe levelofLDover distance and the number of total and significant core haplotypes foundHoweverconsideringthatrEHHisarelativemeasure it ismorelikelythatthehighernumberofsignificantcorehaplotypesidentifiedindairybreedswasduemostlytoahigherselectivepressureindairycomparedtobeefbreedsSignificance tests used in detecting selection signatures should measure theprobability of a statistic to be anoutlier compared to its expecteddistributionunder a neutral model However no reliable neutral model has so far beendevelopedincattlebecauseofthecomplexityofthedemographichistoryofthisspecies (Kemper et al 2014) As a consequence empiric rather than modelbasedsignificancetestaregenerallyusedinthedetectionofselectionsignaturesAccordinglyweconsideredasoutliersvaluesas thosesimply falling in the5plusvarianttailoftherEHHdistributionWekeptthewithinbreedsignificancethresholdloosenoncorrectingitformultipletestingbutconsideredonlysignals
33
eitherexceedingthisthresholdbyatleasttwoordersofmagnitudeorsharedby
twoormoreindependentbreeds
Parallel comparison of results from independent analyses of different breeds
allowed the achievement of a double objective i) to identify putative regions
under(recent)selectioninbreedsbelongingtotwoproductiontypeswhichwas
the main objective of this study and ii) reducing false positives since multi
breed analyses served as internal control Since the rEHH method does not
considerphenotypicinformationasignificantsignalmightarisebecausei)the
core haplotype is actually under a selective pressure ii) the result is a false
positive ie causedby chance population structureor anyotherdriving force
However even considering an unrealistic scenario with no false positives a
proportionofthetotalamountofsignalswouldactuallybeselectionsignatures
due to a selectionpressurenot ondairy or beef traits This becausedairy and
beefbreedsinvestigatedareonlyafewandshareanumberoftraitsnotdirectly
related to dairybeef production Even considering this limitation to our
knowledge this is the first dairy and beef multibreed study applying this
strategytoreducefalsepositivesatthecostofpossiblelossofinformationdue
tohighfalsenegativeratesSignificantsignalsharedbydairyandbybeefbreeds
wereused fordownstreamnetworkanalysesonpositional candidategenes to
search forabiological justification towhatwasobtainedstudying thegenomic
information Only the top network for dairy and the top network for beef
productionarediscussedindetail
Dairybreeds
ThemostsignificantnetworkdetectedindairybreedsisassociatedwithCellTo
CellSignalingandInteractionHematologicalSystemDevelopmentandFunction
andImmuneCellTraffickingfunctions(Figure2)ImmunoglobulinandNFkBact
aspivotalgeneswithinthisnetworkInterestinglyBARC3andQPCTgeneswere
sharedamongallthethreedairybreedsBothgeneshavenotyetbeenstudiedin
cattlealthoughinhumanstudiesthesegenesarewellknowntobelinkedtothe
mammaryglandmetobolismandthecalciumregulationTheformerisinvolved
in integrinmediated cell adhesions and signaling which is required for
mammary gland development and functions (G Sun et al 2012) The latter is
34
associatedwithlowradialbodymineraldensity(BMD)inadultwomen(Ezuraet
al2004)AnotherimportantplayeronthisnetworkisSLC2A5genealsoknown
asGLUT5 SLC2A5 gene acts as fructose transporter in the intestine and has a
significantroleinenergybalanceofdairycows(Burantetal1992)Findingthis
gene in adairy cattle specificnetwork seems surprising since there shouldbe
little need for transporting glucose andor fructose in the ruminant intestinal
tractassimplecarbohydratesaredegradedintovolatilefattyacids(VFA)inthe
rumen (Iqbal et al 2009) However it is often the case that large amount of
starchbypasstherumenincowsfedwithdietsrichincerealgrains(Zhaoetal
1998)Thisbypassedstarchneedstobedigestedinthesmallintestinetoavoid
highlevelsofglucoseconcentrationinthelargeintestine
WedetectedcalpaincomplexandcalpainN3(CAPN3)genesunderpositiverecent
selection in the dairy group as reported also by Utsunomiya et al (2013)
AltoughCalpainisknowntobeinvolvedinpostmortemmeattendernizationitis
also related to dairy metabolism as the muscle breakdown promoted by the
calpain gene provides an energy source for milk production especially at the
beginningoflactation(Kuhlaetal2011)InadditionasreportedbyWildeetal
(1997) calpainNcalpastatin system is related to programmed cell death of
alveolar secretory epithelial cells during lactation Zap70 gene is a cytoplasmic
proteintyrosinekinaseItisrelatedtotheimmunesystemasacomponentofthe
Tcell receptor playing a central role in Tcell responses (Wang et al 2010)
Bonnefontet al (2011) found thatZap70 genewasup regulated (FoldChange
(FC)=82FDR=277E12)onmilksomaticcellsofsheepinfectedbySaureus
andSepidermidisshowinganassociationwithmastitisresistance
Purinemetabolismwasthemostsignificantcanonicalpathwayindairybreeds
In a study conducted in human mammary epthelium Maningat et al (2008)
conductedageneexpressionanalysisonbreastmilkfatglobulesidentifyingthe
PurinemetabolismasthemostsignificantpathwaySynthesisandbreakdownof
purine is essential in the metabolism of the tissues of many organisms
particularlyofthemammaryglandduringlactation
Another interesting canonical pathway was endothelin signaling (Figure 5)
endothelin functions as vasocontrictor and is secreted by endothelial cells
35
(Nussdorferetal1999)Acostaetal(1999)reportedthatincattleendothelinsareinvolvedinthefollicularproductionofprostaglandinsandtheregulationofsteroidogenesis in the mature follicle In a recent study Puglisi et al (2013)confirmed the implication of endothelin in particular EDNRA and endothelinconvertin enzyme 1 in reproductive disorder in bovine They classified theEDNRAaspotentialbiomarkerforfertilityincow
Beefbreeds
Inbeefcattlebreedsthefocaladhesionkinase(FAK)togetherwithintegrinandERK12genesplayacentralroleashubsinthemostsignificantnetwork(Figure3)Theyareinvolvedincellularadhesionandcellmotilityprocessesparticularlyduringskeletalmyogenesis(Nguyenetal2014)Moreovertheoverexpressionof FAK can determine a reduced apoptosis and an increase risk of metastaticcancers(MehlenandPuisieux2006)Thesehubsarenotdirectlyinvolvedinourstudybut sharemany interactors thatare located in region found tobeunderselction such as CSPG4 RB1CC1 MGAT3 CIRPB CSPG4 gene belongs tochondroitin sulfate proteoglycans (CSPGs) family CSPGs are proteoglycansconsistingofaproteincoreandachondroitinsulfatesidechainTheyareknownto be structural components of a variety of tissues includingmuscle andplaykey roles inneuraldevelopmentandglial scar formationTheyare involved incellular processes such as cell adhesion cell growth receptor binding cellmigrationandinteractionswithotherextracellularmatrixconstituents
Manystudiesreporttheroleofproteoglycansinmeattextureofseveralmusclesfromthebovinespecies(Nishimuraetal1996)Dubostetal(2013)highlightadirect involvement of proteoglycans with respect to cooked meat juicinessRB1CC1geneplaysacrucialroleinmusculardifferentiationanditsactivationisa essential for myogenic differentiation (Watanabe et al 2005 p 1)Monoacylglycerol acyltransferase (MGAT3) gene cause the synthesis ofdiacylglycerol (DAG)using2monoacylglyceroland fattyacyl coenzymeAThisenzymaticreactionis fundamental fortheabsorptionofdietaryfat inthesmallintestineSunetal(2012p3)reportedinastudyonfiveChinesecattlebreedsthatMGAT3gene isassociatedwithgrowthtraitsCIRPBgenemaybepartofacompensatorymechanism inmuscles undergoing atrophy It preservesmuscle
36
tissuemassduringcoldshockresponsesaginganddisuse(DupontVersteegden
et al 2008) SNUPN is an imprinted gene and is expressed monoallelically
dependingonitsparentaloriginSNUPNplayimportantrolesinembryosurvival
andpostnatal growth regulation (Smithet al 2012Wanget al 2012)Ephrin
receptorsignalingwasthetopcanonicalpathwayproducebyIPAandhasalsoan
interestingbiologicalinterpretationformeatproduction(Figure6)Indeedthis
pathaway is important for the muscle tissue growth and regeneration
participating in correct positioning and formation of the neural muscular
junction(LiandJohnson2013)
Conclusions
In this study we analysed the selection signatures at genomewide level in 5
Italian cattle breeds Then we used a multibreed approach to identify the
genomicregionssharedamongcattlebreedsselectedfordairyorbeefpurpose
Thisapproachincreasedthepowertopinpointregionsofthegenomethatplay
important roles in economically relevant traits in both dairy and beef cattle
Moreover network and pathways analyses were used to display a detailed
functional characterization of genes mapping in the regions detected under
recentpositiveselection
Particularly in dairy cattle genes under directional selection are related to
feeding adaptation (increasing level of starch in the diet) mammary gland
metabolismandresistencetomastitisThebiologicalfeaturesunderselectionin
beef cattle breeds are involved in animal growth meat texture and juiceness
These results have a logical biological explanation However the frequent
approach of interpreting selection signatures on the basis of the function of
geneslocatednearbythepeaksignalofthestatisticusedispresentlychallenged
Novelinformationinhumanssuggeststhatmostselectedvariationisnotlocated
within genes and coding regions but in regulatory sites identified by the
ENCODEproject(Enardetal2014)Thesemaycontroltheexpressionofentire
genomicregionsorofgeneslocatedatarelavantdistancefromtheselectedsite
makingbiologicalinterpretationmorecomplex
37
InanycasefurtherstudiesusingdenserSNPchipsorwholegenomesequencing
producinginformationnotsubjectedtoascertainmentbias(Qanbarietal2014)
may increase the resolution of our analysis and togetherwith new knowledge
arisingonthecontrolofgeneexpressionvalidateorcorrectourresults
Acknowledgements
ANAFI (Italian Association of Holstein Friesian Breeders) ANAPRI (Italian
Association Simmental Breeders) ANABORAPI (National Association of
PiedmonteseBreeders)ANABIC(AssociationofItalianBeefCattleBreeders)and
ANARB (Italian Brown Cattle Breeders Association) are acknowledged for
providing the biological samples and the necessary information for this work
The Authors are also grateful to SelMol project (MiPAAF Ministero delle
PoliticheAgricoleAlimentarieFortestali)andbytheProZooproject(Regione
LombardiandashDGAgricoltura Fondazione Cariplo Fondazione Banca Popolare di
Lodi)forfunding
Thefundershadnoroleinstudydesigndatacollectionandanalysisdecisionto
publishorpreparationofthemanuscript
References
Acosta TJ Berisha B Ozawa T Sato K Schams D Miyamoto A 1999
Evidence for a Local EndothelinAngiotensinAtrial Natriuretic Peptide
SysteminBovineMature Follicles In Vitro Effects on SteroidHormones
and Prostaglandin Secretion Biol Reprod 61 1419ndash1425
doi101095biolreprod6161419
AjmoneMarsan P Garcia JF Lenstra JA 2010 On the origin of cattle how
aurochs became cattle and colonized theworld Evol Anthropol Issues
NewsRev19148ndash157
38
BonnefontCToufeerMCaubetCFoulonETascaCAurelMRBergonier
D Boullier S RobertGranieacute C Foucras G 2011 Transcriptomic
analysisofmilksomaticcells inmastitisresistantandsusceptiblesheep
upon challenge with Staphylococcus epidermidis and Staphylococcus
aureusBMCGenomics12208
BurantCFTakedaJBrotLarocheEBellGIDavidsonNO1992Fructose
transporter inhumanspermatozoaandsmall intestine isGLUT5 JBiol
Chem26714523ndash14526
DubostAMicolDPicardBLethiasCAnduezaDBauchartDListratA
2013Structuralandbiochemicalcharacteristicsofbovineintramuscular
connective tissue and beef quality Meat Sci 95 555ndash561
doi101016jmeatsci201305040
DupontVersteegden EE Nagarajan R Beggs ML Bearden ED Simpson
PMPetersonCA2008IdentificationofcoldshockproteinRBM3asa
possible regulator of skeletal muscle size through expression profiling
AJP Regul Integr Comp Physiol 295 R1263ndashR1273
doi101152ajpregu904552008
Enard D Messer PW Petrov DA 2014 Genomewide signals of positive
selection in human evolution Genome Res gr164822113
doi101101gr164822113
EzuraYKajitaMIshidaRYoshidaSYoshidaHSuzukiTHosoiTInoue
SShirakiMOrimoHEmiM2004Associationofmultiplenucleotide
variationsinthepituitaryglutaminylcyclasegene(QPCT)withlowradial
BMDinadultwomenJBoneMinRes191296ndash301
Fan H Wu Y Qi X Zhang J Li J Gao X Zhang L Li J Gao H 2014
Genomewide detection of selective signatures in Simmental cattle J
ApplGenet1ndash9doi101007s1335301402006
HayesBJChamberlainAJMaceachernSSavinKMcPartlanHMacLeodI
SethuramanLGoddardME2009Agenomemapofdivergentartificial
selectionbetweenBostaurusdairycattleandBostaurusbeefcattleAnim
Genet40176ndash184doi101111j13652052200801815x
Hayes BJ Lien S Nilsen H Olsen HG Berg P Maceachern S Potter S
Meuwissen THE 2008 The origin of selection signatures on bovine
39
chromosome 6 Anim Genet 39 105ndash111 doi101111j1365
2052200701683x
IqbalSZebeliQMazzolariABertoniGDunnSMYangWZAmetajBN
2009 Feeding barley grain steeped in lactic acid modulates rumen
fermentation patterns and increases milk fat content in dairy cows J
DairySci926023ndash6032doi103168jds20092380
KemperKESaxtonSJBolormaaSHayesBJGoddardME2014Selection
for complex traits leaves littleornoclassic signaturesof selectionBMC
Genomics15246doi1011861471216415246
KimuraM1984Theneutraltheoryofmolecularevolution
KuhlaBNuumlrnbergGAlbrechtDGoumlrsSHammonHMMetgesCC2011InvolvementofSkeletalMuscleProteinGlycogenandFatMetabolismin
the Adaptation on Early Lactation of Dairy Cows J Proteome Res 10
4252ndash4262doi101021pr200425h
Lenstra JAGroeneveld LF EdingHKantanen JWilliams JL Taberlet P
NicolazziELSolknerJSimianerHCianiEGarciaJFBrufordMW
AjmoneMarsan P Weigend S 2012 Molecular tools and analytical
approaches for the characterization of farm animal genetic diversity
AnimGenet43483ndash502
Li J Johnson SE 2013 EphrinA5 promotes bovine muscle progenitor cell
migration before mitotic activation J Anim Sci 91 1086ndash1093
doi102527jas20125728
Ma YL Zhang Q Ding XD 2012 [Detecting selection signatures on X
chromosome in pig through high density SNPs] Yi Chuan Hered
ZhongguoYiChuanXueHuiBianJi341251ndash1260
Madani S Hichami A CharkaouiMalki M Khan NA 2004 Diacylglycerols
Containing Omega 3 and Omega 6 Fatty Acids Bind to RasGRP and
Modulate MAP Kinase Activation J Biol Chem 279 1176ndash1183
doi101074jbcM306252200
Maningat PD Sen P Rijnkels M Sunehag AL Hadsell DL Bray M
Haymond MW 2008 Gene expression in the human mammary
epitheliumduring lactation themilk fat globule transcriptome Physiol
Genomics3712ndash22doi101152physiolgenomics903412008
40
Mehlen P Puisieux A 2006Metastasis a question of life or deathNat Rev
Cancer6449ndash458doi101038nrc1886
NarayananUOspinaJKFreyMRHebertMDMateraAG2002SMNthe
Spinal Muscular Atrophy Protein Forms a PreImport Snrnp Complex
withSnurportin1andImportin HumMolGenet111785ndash1795
Nguyen N Yi JS Park H Lee JS Ko YG 2014 Mitsugumin 53 (MG53)
LigaseUbiquitinatesFocalAdhesionKinaseduringSkeletalMyogenesisJ
BiolChem2893209ndash3216doi101074jbcM113525154
NielsenRHellmannIHubiszMBustamanteCClarkAG2007Recentand
ongoing selection in the human genome Nat Rev Genet 8 857ndash868
doi101038nrg2187
NishimuraTHattoriATakahashiK1996Relationshipbetweendegradation
of proteoglycans andweakening of the intramuscular connective tissue
duringpostmortemageingofbeefMeatSci42251ndash260
Nussdorfer GG Rossi GPMalendowicz LKMazzocchi G 1999Autocrine
ParacrineEndothelinSysteminthePhysiologyandPathologyofSteroid
SecretingTissuesPharmacolRev51403ndash438
OrsquoBrienAMPUtsunomiyaYTMeacuteszaacuterosGBickhartDM LiuGE Tassell
CPV Sonstegard TS Silva MVD Garcia JF Soumllkner J 2014
Assessing signatures of selection through variation in linkage
disequilibrium between taurine and indicine cattle Genet Sel Evol 46
19doi101186129796864619
Pan D Zhang S Jiang J Jiang L Zhang Q Liu J 2013 GenomeWide
DetectionofSelectiveSignatureinChineseHolsteinPLoSONE8e60440
doi101371journalpone0060440
PuglisiRCambuliCCapoferriRGianninoLLukajADuchiRLazzariG
Galli C Feligini M Galli A Bongioni G 2013 Differential gene
expressionincumulusoocytecomplexescollectedbyovumpickupfrom
repeat breeder and normally fertile Holstein Friesian heifers Anim
ReprodSci14126ndash33doi101016janireprosci201307003
Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender D
MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKa
41
tool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795
Qanbari S Gianola D Hayes B Schenkel F Miller S Moore S Thaller GSimianer H 2011 Application of site and haplotypefrequency basedapproachesfordetectingselectionsignaturesincattleBMCGenomics12318doi1011861471216412318
Qanbari S Pausch H Jansen S Somel M Strom TM Fries R Nielsen RSimianer H 2014 Classic Selective Sweeps Revealed by MassiveSequencinginCattlePLoSGenet10doi101371journalpgen1004148
Qanbari S Pimentel ECG Tetens J Thaller G Lichtner P Sharifi ARSimianerH2010AgenomewidescanforsignaturesofrecentselectioninHolsteincattleAnimGenetdoi101111j13652052200902016x
Sabeti PC Reich DE Higgins JM Levine HZ Richter DJ Schaffner SFGabriel SB Platko JV Patterson NJ McDonald GJ Ackerman HCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash7
ScheetPStephensM2006Afastandflexiblestatisticalmodelforlargescalepopulation genotype data applications to inferring missing genotypesandhaplotypicphaseAmJHumGenet78629ndash44
Shahzad K Loor JJ 2012 Application of topdown and bottomup systemsapproaches in ruminantphysiologyandmetabolismCurrGenomics13379
SmithLSuzuki JGoffAFilionFTherrien JMurphyBKohanGhadrHLefebvreRBrisvilleABuczinskiSFecteauGPerecinFMeirellesF2012DevelopmentalandEpigeneticAnomaliesinClonedCattleReprodDomestAnim47107ndash114doi101111j14390531201202063x
Sun G Cheng SYS Chen M Lim CJ Pallen CJ 2012 Protein TyrosinePhosphatase Phosphotyrosyl789 Binds BCAR3 To Position Cas forActivationatIntegrinMediatedFocalAdhesionsMolCellBiol323776ndash3789doi101128MCB0021412
Sun J Zhang C Lan X Lei C ChenH 2012 Exploring polymorphisms andassociationsofthebovineMOGAT3genewithgrowthtraitsGenomeNatl
42
Res Counc Can Geacutenome Cons Natl Rech Can 55 56ndash62doi101139g11077
TaberletPCoissacEPansu J PompanonF 2011Conservationgeneticsofcattle sheep and goats C R Biol On the trail of domesticationsmigrationsandinvasionsinagricultureDominiqueJobGeorgesPelletierJeanClaudePernollet334247ndash254doi101016jcrvi201012007
Utsunomiya YT Perez OrsquoBrien AM Sonstegard TS Van Tassell CP doCarmo AS Meszaros G Solkner J Garcia JF 2013 Detecting lociunder recent positive selection in dairy and beef cattle by combiningdifferentgenomewidescanmethodsPLoSOne8e64280
Voight BF Kudaravalli S Wen X Pritchard JK 2006 A Map of RecentPositive Selection in the Human Genome PLoS Biol 4 e72doi101371journalpbio0040072
WangHKadlecekTAAuYeungBBGoodfellowHESHsuLYFreedmanTSWeissA2010ZAP70AnEssentialKinaseinTcellSignalingColdSpringHarbPerspectBiol2doi101101cshperspecta002279
WangMZhangXKangLJiangCJiangY2012MolecularcharacterizationofporcineNECD SNRPNandUBE3Agenesand imprinting status in theskeletal muscle of neonate pigs Mol Biol Rep 39 9415ndash9422doi101007s1103301218066
WatanabeRChanoTInoueHIsonoTKoiwaiOOkabeH2005Rb1cc1iscritical for myoblast differentiation through Rb1 regulation VirchowsArchIntJPathol447643ndash648doi101007s0042800411831
WickramasingheNSManavalanTTDoughertySMRiggsKALiYKlingeCM 2009 Estradiol downregulates miR21 expression and increasesmiR21targetgeneexpressioninMCF7breastcancercellsNucleicAcidsRes372584ndash2595doi101093nargkp117
WildeCJAddeyCVLiPFernigDG1997Programmedcelldeathinbovinemammary tissue during lactation and involution Exp Physiol 82 943ndash953
YasudaSStevensRLTeradaTTakedaMHashimotoTFukaeJHoritaTKataoka H Atsumi T Koike T 2007 Defective Expression of Ras
43
Guanyl NucleotideReleasing Protein 1 in a Subset of Patients with
SystemicLupusErythematosusJImmunol1794890ndash4900
ZhangCBaileyDKAwadTLiuGXingGCaoMValmeekamVRetiefJ
Matsuzaki H Taub M Seielstad M Kennedy GC 2006 A whole
genome longrange haplotype (WGLRH) test for detecting imprints of
positiveselectioninhumanpopulationsBioinformatics222122ndash8
ZhaoFQOkineEKCheesemanCIShiraziBeecheySPKennellyJJ1998
Glucose transporter gene expression in lactating bovine gastrointestinal
tractJAnimSci762921ndash2929
44
Supplementaryfiles
YoucanfindChapterʹsupplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter02)orscantheQRcode
SupplementaryfileS1ndashSignificantcorehaplotypesandgenessharedin
dairyandbeef
Significantcorehaplotypes(pvaluele005haplotypefrequencyge025)shared
indairyandbeefcattlebreedsandrelativegenesintersectingwiththemThefile
isdividedinto4tablesThefirst2sheets(sheet1and2)shownthesignificant
corehaplotypesfordairyandbeefThelast2sheets(sheet3and4)showngenes
intersectingwiththesignificantcorehaplotypesfordairyandbeef
SupplementaryfileS2ndashNetworksrankingindairybreeds
Networksrankingindairybreedswithrelativemoleculessymbolslistscore
valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop
functionforeachnetwork
SupplementaryfileS3ndashNetworksrankinginbeefbreeds
Networksrankinginbeefbreedswithrelativemoleculessymbolslistscore
valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop
functionforeachnetwork
45
SupplementaryfileS4ndashCanonicalpathwaysrankingindairyorbeefcattle
breeds
Canonicalpathwaysrankingindairy(sheet1)orbeef(sheet2)cattlebreedswithrelativemoleculessymbolslistratioandndashlog10ofthepvaluesforeachcanonicalpathway
46
CHAPTER3Searchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisin
threeItaliancattlebreedsSCapomaccio1MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItalyTheseauthorscontributedequallytothiswork
____________________SubmittedtoAnimalGenetics
Abstract
Genome wide association studies (GWAS) have been widely applied to
disentanglethegeneticbasisofcomplextraits IncattlebreedsclassicalGWAS
approach with medium density markers panels are far to be conclusive
especiallyforcomplextraitsThisisduetotheintrinsiclimitationsofGWASand
to the choices that are made to step from the associationrsquos signals to the
functionalunits
Here we applied a geneHbased association strategy to prioritize genotypeH
phenotype associations found with classical approaches in three Italian dairy
cattlebreedswithdifferentsamplesizes(ItalianHolsteinn=2058ItalianBrown
n=745ItalianSimmentaln=477)formilkproductionandqualitytraits
WhileclassicalsinglemarkerregressionfailedtorevealgenomeHwidesignificant
genotypeHphenotype association except for Italian Holstein the gene based
approach found specific genes in each breed that are associated with milk
physiologyandmammaryglanddevelopment
Since no established method is yet implemented to step from variation to
functional units our strategy may contribute to reveal new genes that play
significant roles in complex traits like thosehere investigated condensing low
signalsofassociationinageneHcentricfashionapproach
Background
Modern cattle breeds have nomore than 200 years of history (Taberlet et al
2008) a time spanwhere selection reproductive isolationand the consequent
geneticdriftshapedallelefrequenciesandlinkagedisequilibriumthroughoutthe
genome This is particularly true for industrial dairy breeds where the joint
application of reproductive technologies and modern genetic evaluation
methodshave imposedahigh selectionpressureona limitednumberof traits
(Taberletetal2008)involvedinmilkquantityandquality
Indeed the milk production industry has been historically ready to acquire
technicalimprovementsfromdifferentfieldsstatisticsbiologyandengineering
50
ForexampleBLUPbasedstrategiesandhighHdensitygenotyping throughSNPHchipsarenowroutinelyincludedinbullevaluationshavingsignificantimpactonselectionefficiencyandpopulationstructureThese techniques have dramatically improved milk yield and compositionwithout knowing the biological basis of the traits under selection To datedespite many efforts and many QTL mapped only a few genes have beeneffectivelyassociatedwithmilkyieldandcomposition(Grisartetal2002Blottetal2003CohenHZinderetal2005)In thiscontextGenomeWideAssociationStudies(GWAS) is themostcommonmethodfordissectingthebiologythatunderliescomplextraits(McCarthyetal2008)TheunderlyingideaofGWASissimplefindmarkerHtraitassociationsexploitingthe linkage disequilibrium (LD) that exists between the causative mutation ndashwhichweignorendashandoneormoreanalyzedmarkerswhichwewellknowInthiswayonecanpinpointgenomicregionscarryingcausalvariantsforanytrait
Although a number of GWAS have been conducted in cattle productive andmorphological traits (some examples (Jiang et al 2010 Cole et al 2011Meredith et al 2012)) and found several genomic regions associated to thesetraits none focused their attention directly on genes involved inmilk biologyMoreoveroneof themajordrawbacksofGWASisrelatedtothescarceresultsportability across populations due to a variety of confounding factorspopulationstructuredifferentialLD levelsbreedspecific selection targetsandeven SNPs ascertainment bias (Clark et al 2005) For instance significantassociationsonfatpercentagewereidentifiedinmanyregionsoftheBostaurusgenome in different breeds Except for genes of major effect there is littleagreement on which regions and H more importantly H genes are actuallyinvolvedinthecontrolofthetraitinvestigatedThisissomehownotsurprisingone favorable allele may segregate in one breed and be fixated in a differentbreed the same allele segregates in both breeds but allelesmay differ or thesegregates in both breeds but the genetic background masks the effect ofsegregation
51
AnotherlimitationinGWASisthestringentsignificancethresholdoftenapplied
tocorrectformultipletestingAsaresulta largeproportionofgeneswithlow
effectaredisregardedwithconsequentoverestimationof thehigheffectgenes
variance(Xu2003)Thisbiasisparticularlyworryingwheninvestigatingtraits
controlledbya largenumberofgeneswithsmalleffectand inhighLDasmilk
yield(GIANOLAETAL2013)
To overcome this kind of limitation different strategies have been invented
usingprior knowledge Interesting ldquocandidatepathwayrdquo approacheshavebeen
conducted(Ravenetal2013)testingforassociationbetweenaphenotypeand
genes inpathwaysthatare likelytobe involvedinthecontrolofthetraitThis
approach reduces thenumberofmarkers tobe tested increases the statistical
poweroftheexperimentrestrictingtheanalysistoexistingknowledge
Other methods based on statistical manipulations or particular experimental
designs have been developed to prioritize or validate genotypeHphenotype
associations For example geneHbased association strategies while restricting
GWA study only to genes and neighboring genomic regionsmay identify new
candidategenesdeservingpostHGWASanalysis(Akulaetal2011Cantoretal
2010 Liu et al 2010) Moreover it is true that classical GWAS downstream
bioinformatics analyses concentrate their efforts in findingmeaningful results
usinggenesasfinaltarget
The aim of ourworkwas to find new candidate genes and novel information
associated to complex traits as milk production (yiled) and quality (fat and
protein percentage) using a geneHcentric genome wide association in three
differentdairybreedsItalianHolsteinItalianBrownandItalianSimmental
Methods
BiologicalSamples
Bulls from the three most represented dairy breeds in Italy (Italian Holstein
ItalianBrownandItalianSimmental)wereselectedfromArtificialInsemination
(AI) bulls progeny tested in Italy until 2010 and forwhich biological samples
52
wereavailableItalianHolsteinbullswerechosenaccordingtofollowingcriteria
i) having the national Selection Index namely production functionality type
(PFT)withreliabilitygt75andii)havingthelowestpossiblerelationshipwith
theothersiresinthesetThiswasachievedbymaximizingthevariabilitywithin
the group On the contrary considering the low number of available Italian
BrownandItalianSimmentalbullstheywereallincludedintheanalysesAfter
thisprocedureatotalof2139ItalianHolstein775ItalianBrownand486Italian
Simmental samples (including quality control replicates)were genotypedwith
theBovineSNP50BeadChipv1(IlluminaSanDiegoCA)
Traitsconsideredintheanalysis
Italian Holstein (ANAFI) Italian Brown (ANARB) and Italian Simmental
(ANAPRI)breedersassociationsprovidedderegressedproofs(DRP)ordaughter
yielddeviations (DYD)hereafter referredasphenotype forall traitsanalyzed
Traitsanalyzedweremilkyieldfatpercentageandproteinpercentage
Genotypingandqualitycontrol
Quality controls (QC) were run independently for each breed using the same
pipelineandthresholds IndetailQCexcluded i)thereplicatesamplewiththe
highernumberofmissinggenotypesii)sampleswithunexpectedlyhigh(ge100)
mendelianerrors(forfatherHsonpairs)iii)sampleswithmorethan5missing
genotypes iv) SNPs with ge 25 missing data v) SNPs with minor allele
frequency(MAF)le5vi)SNPswithmendelianerrorsge25andvii)SNPsout
ofHardyHWeinbergequilibriumtestedbyFisherexacttest(ple0001Bonferroni
corrected)
Analysisofpopulationstructure
MultiDimensionScaling(MDS)wasperformedusingPLINK107(Purcelletal
2007)andresultsplottedwithRscriptsThisanalysisshowninsupplementary
material (FigureS1) exploredpopulationsubstructureandverified thegenetic
homogeneityofthedatasetpriortofurtheranalyses
53
PairwiseLDwascalculatedusingPLINK107(Purcelletal2007)LDdecayand
LD structure on specific regions were calculated with in house R scripts and
snpplotterRpackage(LunaandNicodemus2007)
GWAS
GenomeHwide association analysis was made with the GenABEL package in R
(Team 2014) using GRAMMARHCG (Genome wide Association using Mixed
Model and Regression H Genomic Control) approach (Amin et al 2007
Aulchenkoetal2007)
Basingonpriorknowledge(Minozzietal2013) in ItalianHolsteintheDGAT1
regionwas also treated as fixed effect inGenABEL analysis of the three traits
ResultsusingthismodelarereferredtoasldquoHolsteinDGATrdquo
Manhattan andquantileHquantileplots for each traitwereproduced to allowa
thoroughevaluationoftheGWASresultsAllcalculatedpHvaluesfromtheGWAS
analysiswere retained for the downstream prioritizationwith the geneHbased
association
Workingongenomecoordinates
Coordinatesof themarkerswere liftedover fromBTA40(Elsiketal2009)to
UMD31assembly(Ziminetal2009)usingSnpChimpv1(Nicolazzietal2014)
andgenecoordinateswereretrievedusingBioMartEnsembl
QTLcoordinatesweredownloadedfromtheAnimalQTLdb(Huetal2013) in
bedformat
Operations on genomic intervalswereperformedusing thebedtools suite ver
2162(QuinlanandHall2010)
GeneBasedAssociationanalysis
Toreducetheantagonismbetweensignificancethresholdandfalsepositiverate
of a classical GWA study (Gianola et al 2013) and to facilitate selection
signature comparison across the three breeds investigated we used a geneH
54
centricstatisticthatcollapsesinasinglevalueallthepHvaluesfromSNPsrelated
toagenecorrectingforLDinformation
Our approachderivesdirectly from the softwareVEGAS (VersatileGeneHBased
TestforGenomeHwideAssociationStudies)(Liuetal2010)adaptedtopursue
meaningfulresultsinanyspeciesbeyondhumans
Briefly the geneHwise test statistic is the sum of the n upper tail chiHsquared
valuesofaSNPsubset withonedegreeof freedom(wheren is thenumberof
SNPmappinginthegivengene)WithperfectlinkageequilibriumthegeneHwise
pHvaluewouldbetheoneHsidepHvalueofachiHsquaredistributionwithndegrees
of freedom Otherwise more likely the distribution of the pHvalues has to be
evaluated(withsimulations)andweighted(withLDvalues)(Liuetal2010)
The VEGAS speciesHfree procedure was implemented in a UNIX environment
usingBashandRlanguagesusingEnsemblannotationasinputfilesPLINKdata
typeandpHvaluesforeachSNPfromthepreviouslycitedGRAMMARprocedure
WemappedSNPsonBostaurusgenes(UMD31)andtheirboundaries(100000
bpupHanddownHstream)toaccountforregulatoryregionsOnlygeneswithat
least 2 SNPs mapped were retained Significant entries from the geneHbased
associationweremappedinthecattleQTLdatabasewithbedtools
ResultsandDiscussion
In this study a gene based association strategy was applied to identify new
genomicregionsboostingGWASanalysisresultsformilkproductionandquality
fromthreeItaliandairybreeds
Complextraitslikethosehereevaluatedareaffectedbyafewmajorgeneswith
large effects andmany otherswithmoderate to low effects the latter are not
easily identifiedby genomewide scans inmodern cattle breedsduemainly to
samplesizelimitationwhilesignalfromtheformersarelostingenomescansH
withsomeexceptionsHduerapidfixations
InthisscenariowithmediumdensitySNPpanelsliketheoneusedinthisstudy
associationbasedonsingleSNPregressionorotherldquoclassicalrdquomethodscouldbe
inconclusiveintermsofresults
55
AftertheQCprocess7452058and477samplesand3575938535and38574SNPs were retained for Italian Brown Italian Holstein and Italian Simmentalbulls respectively (Table S1) HolsteinDGAT analysis retained 38534markersoneSNPlessthantheotherHolsteinpanelDescriptive statistics on genotypes for each breed (histogram and barplot forheterozygosis and call rate per individual Figure S7 and multi dimensionalscalingFigureS1)revealedabsenceofconfoundingeffectsWiththesedatasetsatthenominalgenomeHwidepHvaluesignificancethresholdof 005 (ple125 x 10H6 after Bonferroni correction (Burton et al 2007Dudbridge and Gusnanto 2008)) significant associations were found only inItalianHolsteinbetweenmilkyieldandfatpercentageand26SNPsmappingintheproximalregionofBTA14nearbyDGAT1(Table1)
56
Table1PHvaluesoftheSNPsovercomingthegenomewidesignificancethresholdintheHolsteinbreedformilkyieldandfatpercentage(notaccountingforDGAT1effect)SNPsaresortedbypositionsintheUMD31assembly
SNPID BTA Position(Bp) POvalue(Fat) POValue(Milk)
Hapmap30383OBTCO005848 14 1489496 112EH12 361EH12
ARSOBFGLONGSO57820 14 1651311 247EH46 161EH16
ARSOBFGLONGSO34135 14 1675278 857EH17
ARSOBFGLONGSO94706 14 1696470 665EH16
ARSOBFGLONGSO4939 14 1801116 534EH49 106EH20
Hapmap52798Oss46526455 14 1923292 915EH21
UAOIFASAO6878 14 2002873 382EH24
ARSOBFGLONGSO107379 14 2054457 118EH33 664EH15
ARSOBFGLONGSO18365 14 2117455 250EH16
Hapmap30922OBTCO002021 14 2138926 439EH13
Hapmap25384OBTCO001997 14 2217163 390EH10 697EH09
Hapmap24715OBTCO001973 14 2239085 108EH09 369EH09
BTAO35941OnoOrs 14 2276443 402EH13
Hapmap30374OBTCO002159 14 2468020 528EH12
Hapmap30086OBTCO002066 14 2524432 798EH11
ARSOBFGLONGSO74378 14 3640788 161EH08
ARSOBFGLOBACO1511 14 3765019 375EH09
UAOIFASAO9288 14 3956956 811EH09
ARSOBFGLONGSO100480 14 4364952 133EH09
UAOIFASAO5306 14 4468478 458EH08
ThisislikelyduetotheeffectoftheK232ADGAT1mutationthatissegregatingin
thisbreed(Fontanesietal2014)NootherSNPwassignificantlyassociatedto
anyothertraitinthe3breedsinvestigated
SomeSNPswereunderasuggestivesignificancethresholdofple5x10H5Details
on these markers are in supplementary material (Table S3) but they are not
discussedherePlotsdescribingatglancetheassociationresultsrevealingalow
signaltonoiseratioareavailableintheonlinesupplementarymaterial(Figure
S2S3S4andS5respectively for ItalianBrown ItalianHolsteinHolsteinDGAT
andItalianSimmental)
57
Asperthegenecentricanalysiswemappedatotalof192371989919894and
19844 genes on 24616 available on Ensembl (release version 73) in Italian
BrownItalianHolsteinHolsteinDGATandItalianSimmentalrespectivelyGenes
with a geneHwisepHvalue at least 10H5were considered significantly associated
withthestudiedtraitThesignificanceappearedtobeadequatetothehighrate
ofredundancyof theannotationfeaturesandto figureoutthetopassociations
with a reasonable level of confidence Results on these associations per breed
andpertraitaresummarizedinTable2Table3andinsupplementarymaterials
(TableS2andTableS4)ThemajorityofsignificantgenesfoundwiththegeneH
centricanalysismapped inpreviouslycharacterizedQTL thatareassociated to
milkproductionorqualityThesearesummarizedintheTableS5
Table2Numberofsignificantlyassociatedgenesforeachbreedineachtraitinthegenebasedassociationtest
Fat Milkyield Protein
Brown 3 1
Holstein 92 80 1
HolsteinDGAT 1 4 1
Simmental 1 13
58
Tableamp3FeaturessignificantlyassociatedperbreedtraitHolsteinanalysisnotaccountingforDGAT1isshow
ninTableS3
Breedamp
Traitamp
EnsemblampIDamp
GeneWiseampPValueamp
SNPsInGeneamp
BestSNPamp
BestSNP_PValueamp
BTAamp
Startamp(bp)amp
Endamp(bp)amp
GeneampNam
eamp
BROWNamp
FAT
Nosignificantassociationfound
MILK
ENSBTAG00000019237
500EG05
4Hapmap53770Gss46526325
204EG04
23
47333403
47348212
TXNDC5
ENSBTAG00000043918
500EG05
4Hapmap53770Gss46526325
204EG04
23
47337650
47337787
5S_rRNA
ENSBTAG00000046839
500EG05
4Hapmap53770Gss46526325
204EG04
23
47328131
47352621
UncharacterizedProtein
PROTEIN
ENSBTAG00000001260
700EG05
3ARSGBFGLGNGSG116222
441EG03
18
52120387
52124418
PINLYP
amp
HOLSTEINDGATamp
FAT
ENSBTAG00000000369
100EG05
3Hapmap53294Grs29016908
333EG05
594673755
94749093
EPS8
MILK
ENSBTAG00000026896
130EG05
5ARSGBFGLGNGSG104949
385EG04
23
51549721
51553563
FOXF2
ENSBTAG00000005260
400EG05
5Hapmap33288GBTCG033751
120EG03
638120578
38127577
SPP1
ENSBTAG00000006579
500EG05
4ARSGBFGLGBACG2207
632EG05
15
54418559
54459500
P4HA3
ENSBTAG00000007013
700EG05
6UAGIFASAG7665
396EG05
19
5221507
5283308
TOM1L1
PROTEIN
ENSBTAG00000006638
600EG05
3ARSGBFGLGNGSG104774
176EG04
18
56523439
56530499
BCL2L12
amp
SIMMENTHALamp
FAT
ENSBTAG00000037649
900EG05
6Hapmap30001GBTAG142716
529EG05
4120427915
120494453
VIPR2
MILK
ENSBTAG00000017538
100EG05
6ARSGBFGLGBACG13093
587EG06
12
85630428
85630912
Pseudogene
ENSBTAG00000000856
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1766767
1769754
FBXL6
ENSBTAG00000000857
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1763994
1766621
GPR172B
ENSBTAG00000026356
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1795351
1804562
DGAT1
ENSBTAG00000035158
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1770660
1772329
TMEM
249
ENSBTAG00000046208
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1782901
1788087
SCRT1
ENSBTAG00000009811
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1825982
1844587
UncharacterizedProtein
ENSBTAG00000009816
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1838135
1840081
UncharacterizedProtein
ENSBTAG00000014458
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1844664
1894424
MROH1
ENSBTAG00000020751
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1806081
1825793
HSF1
ENSBTAG00000044406
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1883849
1883971
SCARNA15
ENSBTAG00000046889
400EG05
5ARSGBFGLGBACG13093
587EG06
12
85610279
85610762
UncharacterizedProtein
ENSBTAG00000020117
800EG05
3BTAG78494GnoGrs
188EG04
721189879
21196263
ZBTB7A
PROTEIN
Nosignificantassociationfound
amp
59
1 ItalianHolstein
The$ large$ number$ of$ significant$ signals$ was$ observed$ in$ the$ Italian$ Holstein$
especially$ for$ milk$ yield$ and$ fat$ percentage$ As$ shown$ in$ Table$ 2$ the$ strong$
effect$ of$ DGAT1$ in$ Italian$ Holsteins$ observed$ with$ individual$ SNPs$ was$
confirmed$by$the$geneBcentric$approach$In$this$region$almost$the$totality$of$the$
geneBwise$pBvalues$under$the$chosen$threshold$appeared$to$be$influenced$by$this$
major$ gene$ and$ by$ the$ high$ level$ of$ linkage$ disequilibrium$ that$ characterizes$
cosmopolitan$bovine$breeds$(see$Figure$S6)$Within$this$pool$of$genes$it$is$worth$
to$mention$PLEC$ (plectin)$ a$ cell$ surface$ receptor$ gene$ in$ the$RNASE5$ pathway$
(Angiogenin$ encoded$ by$ angiogenin$ ribonuclease$ RNase$ A$ family$ 5)$ Proteins$
encoded$ by$ these$ genes$ that$ are$ present$ in$ milk$ are$ involved$ in$ protein$
synthesis$ and$ act$ as$ growth$ factors$ in$ vitro$Moreover$RNASE5$ seems$ to$ have$
other$ functions$ such$ as$ regulation$ of$ stress$ response$ and$ apoptosis$ and$ even$
angiogenesis$ through$ which$ it$ may$ also$ contribute$ to$ the$ regulate$ milk$
production$(Raven$et$al$2013)$
The$ exclusion$ of$DGAT1$ effect$ (HolsteinDGAT$ results)$ highlighted$ that$ all$ the$
significant$ effects$ on$ fat$ percentage$ found$ for$ genes$ located$ within$
approximately$25$Mb$up$or$downstream$from$the$excluded$SNP$(ARS5BFGL5NGS5
4939)$were$ redundant$ signals$ of$DGAT1$ (Table3$ and$Table$ S2$ and$ S3)$ This$ is$
also$observed$for$milk$yield$due$to$the$genetic$correlations$between$fat$and$milk$$
Also$ PLEC$ was$ not$ significant$ in$ our$ analyses$ when$ taking$ into$ account$ the$
DGAT1$effect$contradicting$previous$findings$in$Holstein$that$report$association$
of$PLEC$gene$with$production$traits$either$accounting$or$not$for$the$effect$of$the$
diglyceride$ acyltransferase$ (Raven$ et$ al$ 2013)$ A$ possible$ explanation$ for$ the$
different$results$between$these$studies$is$a$difference$between$genetic$structures$
of$the$populations$analyzed$To$reduce$false$positive$signals$we$only$evaluated$
the$genes$based$on$HolsteinDGAT$results$for$the$Italian$Holstein$$
11 Milkyield
We$found$genes$associated$with$this$trait$in$Holstein$in$particular$SPP1$P4HA3=
FOXF2$and$TOM1L1$genes$
Among$ these$ the$ osteopontin$ SPP1$ (secreted$ phosphoprotein$ 1)$ is$ directly$
related$to$milk$production$(de$Mello$et$al$2012)$ It$ is$ located$ in$BTA6$nearby$
60
$
ABCG2$ and$ there$ is$ evidence$ that$ polymorphisms$ in$ this$ region$ are$ strongly$
correlated$with$high$production$ levels$of$milk$ in$various$breeds$(CohenBZinder$
et$ al$ 2005)$Osteopontin$SPP1$ is$ a$ secreted$glycophosphoprotein$ that$plays$ a$
crucial$ role$ in$ mammary$ gland$ development$ having$ an$ essential$ role$ also$ in$
mammary$gland$differentiation$and$branching$of$the$mammary$epithelial$ductal$
system$ This$ gene$has$ also$ been$ implicated$ in$many$ types$ of$ cancer$ including$
breast$cancer$ for$ the$potentiality$of$proliferation$and$alveologenesis$ (Hubbard$
et$ al$ 2013)$ It$ is$worth$mentioning$ that$ the$most$ significant$ SNPs$within$ this$
gene$was$Hapmap332885BTC5033751=with=a$pBvalue$of$00012$ largely$over$ the$
genomeBwide$significance$ threshold$This$signal$would$be$ ignored$ in$ the$single$
SNP$approach$
Another$gene$revealed$by$the$geneBbased$association$is$P4HA3$encoding$for$the$
prolyl$4Bhydroxylase$α$polypeptide$III$involved$in$the$correct$folding$of$nascent$
procollagen$ chains$ It$ is$ known$ that$ stromalBepithelial$ interactions$ are$ critical$
for$mammary$gland$development$and$defective$deposition$or$reorganization$of$
extracellular$ matrix$ can$ lead$ to$ incorrect$ mammary$ epithelial$ morphogenesis$
(Gillette$ et$ al$ 2013)$ highlighting$ the$ role$ of$ this$ gene$ in$ the$ healthy$ breast$
tissue=
FOXF2=gene$ (Forkhead$Box$F2)$ is$ a$member$of$ the$ forkhead$box$ transcription$
factors$ known$ to$ be$ involved$ in$ tissue$ development$ as$ part$ of$ the$ ldquoHedgehog$
signaling$pathwayrdquo$This$key$cascade$is$essential$for$the$correct$segmentation$of$
tissues$during$embryonic$development$but$has$been$recently$demonstrated$that$
is$active$in$adult$stem$cells$proliferation$in$mammary$tissue$(Liu$et$al$2006)$In$
particular$ FOXF2$ plays$ a$ key$ role$ in$ extracellular$ matrix$ synthesis$ and$
epithelialBmesenchymal$ interactions$ and$ could$ therefore$ affect$ the$ proper$
development$of$the$mammary$stroma$In$addition$low$levels$of$transcription$of$
this$gene$are$correlated$ to$early$onset$metastasis$ in$breast$cancer$ (Kong$et$al$
2013)$ underlining$ a$ specific$ role$ in$ the$ mammary$ gland$
Lastly$ TOM1L1$ (Target$ Of$ Myb1$ ChickenBLike$ 1)$ is$ another$ gene$ potentially$
involved$ with$ the$ milk$ physiology$ as$ is$ known$ its$ implication$ with$ the$ early$
endosome$ sorting$ coupled$ with$ EGFR= (epidermal$ growth$ factor$ receptor)$
another$ important$player$ in$ lactation$(Fowler$et$al$1995)$TOM1L1$belongs$to$
61
$
the$ VHS$ domain$ containing$ proteins$ that$ are$ involved$ in$ the$ postBGolgi$
trafficking$and$signaling$(Wang$et$al$2010)$
12 Fatpercentage
In$ Italian$ Holstein$ breed$ we$ found$ a$ significant$ association$ with$ or$ without$
DGAT1$ effect$ correction$ to$ the$EPS8$ gene$ located$on$BTA5$at$around$95$Mbp$
This$gene$encodes$for$a$substrate$of$the$already$cited$EGFR$and$is$involved$in$the$
fatty$acid$metabolism$(Fazioli$et$al$1993)$A$recent$study$on$Holstein$Friesian$
(Wang$et$al$2012)$found$strong$association$with$a$variant$located$in$the$second$
intron$of$the$epidermal$growth$factor$receptor$pathway$substrate$enforcing$the$
hypothesis$of$a$true$involvement$of$this$gene$on$fat$percentage$
13 Proteinpercentage
The$BCL2L12$gene$(BclB2Blike$protein$12)$associated$with$protein$percentage$in$
Italian$Holstein$breed$represents$another$interesting$result$The$product$of$this$
gene= is$ an$ important$ oncoprotein$ in$ two$ fundamental$ cell$ cycle$ pathways$
namely$ p53$ and$ cytoplasmic$ caspase$ signaling$ BCL2L12= resides$ in$ both$
cytoplasm$and$nucleus$inhibiting$caspases$3$and$7$and$forming$a$complex$with$
p53$thereby$ inhibiting$p53Bdirected$transcriptomic$changes$upon$DNA$damage$
(Stegh$ and$DePinho$ 2011)$ The$members$ of$ BCL2$ family$ are$ considered$ antiB
apoptotic$genes$and$it$is$obvious$that$dysfunctional$programmed$cell$death$plays$
an$ important$ role$ in$ the$ pathogenesis$ and$ in$ the$ progression$ of$ cancer$Many$
members$of$this$family$were$found$to$be$modulated$in$various$malignancies$and$
BCL2L12= in$ particular$ was$ expressed$ in$ mammary$ gland$ and$ proposed$ as$
prognostic$ cancer$ biomarker$ (Thomadaki$ et$ al$ 2007)$ In$ dairy$ cows$ the$
members$of$BCL2$family$are$involved$in$the$key$events$ in$the$mammary$tissue$
development$ and$ remodeling$ during$ lactation$ and$ involution$ (Stefanon$ et$ al$
2002$p$200)$Members$of$BCL2$family$regulate$the$turnover$of$cell$population$
in$the$mammary$gland$during$lactation$and$its$overexpression$was$found$to$be$
linearly$related$to$milk$production$in$Holstein$cow$(Colitti$et$al$2004)$
$
62
$
2 ItalianSimmental
In$ the$ Italian$ Simmental$ breed$ we$ found$ one$ association$ on$ BTA4$ for$ fat$
percentage$(VIPR2)$and$two$associations$ for$milk$yield$one$ in$BTA7$(ZBTB7A)$
and$ the$ other$ in$ BTA14$ related$ to$ the$ DGAT1$ No$ association$ was$ found$ for$
protein$percentage$
21 Fatpercentage
Among$the$genes$found$significantly$associated$with$production$traits$in$Italian$
Simmental$VIPR2=a$ gene$ encoding$ a$ receptor$ for$ a$ small$ vasoactive$ intestinal$
neuropeptide$ was$ the$ single$ signal$ in$ BTA4$ for$ fat$ percentage$ Vasoactive$
intestinal$ peptide$ (VIP)$ is$ synthesized$ in$ the$pituitary$ gland$ and$ among$other$
functions$ stimulates$ prolactin$ secretion$ and$ is$ involved$ in$ proliferation$
survival$and$differentiation$in$human$breast$cancer$cells$(Valdehita$et$al$2010)$$
An$interesting$novel$finding$is$the$effect$of$immunosuppression$(through$direct$
effect$of$IL56)$on$prolactin$cells$and$its$ability$to$regulate$prolactin$by$acting$on$
pituitary$ VIP$ through$ VIP$ receptors$ (Blanco$ et$ al$ 2013)$ These$ evidences$
suggest$ that$VIPR2$ could$ be$ an$ important$ candidate$ for$ further$ investigations$
through$post$GWAS$approaches$such$as$reBsequencing$or$fine$mapping$
22 Milkyield
Surprisingly$the$DGAT1$gene$was$significantly$associated$with$milk$yield$but$not$
with$fat$percentage$in$Italian$Simmental$breed$Indeed$the$most$significant$SNP$
for$the$milk$yield$trait$in$this$breed$was$the$ARS5BFGL5NGS54939$with$a$pBvalue$
of$ 413EB06$ The$ importance$ of$ this$ gene$ in$ the$ lactation$ is$ largely$ known$
(Grisart$et$al$2002)$$
To$ investigate$DGAT1$ signal$ in$ Simmental$ we$ analyzed$ the$ fine$ LD$ structure$
genomeBwide$ and$ in$ BTA14$ proximal$ region$ in$ all$ the$ three$ breeds$ revealing$
different$patterns$
Observed$LD$decay$on$ the$ three$breeds$genomes$revealed$similar$behavior$ for$
Italian$Holstein$and$Italian$Brown$with$r2$gt$010$$until$ around$ 250$ Kbp$ while$
Italian$Simmental$showed$lower$LD$Distances$higher$that$400$Kbp$revealed$half$
level$of$LD$in$Simmental$(Figure$S6)$
63
$
In$DGAT1$region$LD$behavior$is$very$similar$between$Italian$Holstein$and$Italian$
Brown$ (Figure$ S8)$ while$ association$ values$ are$ really$ different$ because$ ARS5
BFGL5NGS54939=is$fixed$in$Italian$Brown$and$purged$by$QC$process$Interestingly$
association$ patterns$ become$ similar$ when$ DGAT1$ effect$ is$ removed$ from$ the$
Holstein$ panel$ (HolsteinDGAT$ Figure$ S8)$ In$ Simmental$ we$ found$ a$ different$
situation$ very$ low$ LD$ level$ and$ significant$ association$ levels$ with$ ARS5BFGL5
NGS54939$with$ the$milk$ trait$This$ is$probably$due$ to$ the$ low$ frequency$of$ the$
p232K$ allele$ in$ Italian$ Simmental$ (Scotti$ et$ al$ 2010)$ and$ to$ the$ different$
selection$strategies$applied$in$these$breeds$
The$ relatively$ high$ pBvalue$ of$ this$ SNP$ is$ responsible$ for$ the$ statistical$
significance$of$many$of$the$13$genes$found$significant$in$this$breed$for$this$trait$
as$ shown$ in$ Table$ 3$ therefore$ reducing$ the$ gene$ set$ to$ three$ locations$ two$
overlapping$features$indicating$a$pseudogene$in$BTA12$and$a$zinc$finger$protein$
ZBTB7A$in$BTA7$DGAT1$role$and$function$will$be$not$discussed$here$as$already$
done$elsewhere$(Grisart$et$al$2002)$
ZBTB7A=is$part$of$zinc$finger$and$BTB$proteins$(Broad$complex$Tramtrack$and$
Bricabrac)$ transcription$ factors$ with$ emerging$ importance$ for$ their$
involvement$ in$ oncogenesis$ and$ cell$ differentiation$by$ repressing$key$ genes$of$
cell$cycle$regulation$as$retinoblastoma$(Rb)$and=p53=(ldquoJeon$et$al$B$2008$B$ProtoB
oncogene$ FBIB1$ (PokemonZBTB7A)$ Represses$ Trpdfrdquo$ nd)= ZBTB7A$ over$
expression$ is$ shown$ in$primary$ and$ recurrent$ breast$ cancer$ tissues$ leading$ to$
disease$progression$and$poor$survival$(Zu$et$al$2011)$Link$between$this$gene$
polymorphism$and$milk$production$might$be$in$the$proper$cellular$development$
of$ the$ mammary$ gland$ This$ transcription$ factor$ is$ also$ known$ as$ a$ master$
regulator$ of$ B$ and$ TBcell$ lineage$ (Lee$ and$ Maeda$ 2012)$ enforcing$ the$ link$
between$ milk$ production$ and$ immune$ protection$ as$ already$ outlined$ by$ of$
Lemay$ and$ colleagues$ with$ proteomic$ studies$ where$ the$ ldquoactivation$ of$ the$
innate$ immune$systemrdquo$resulted$as$one$of$ the$most$represented$gene$ontology$
categories$(Lemay$et$al$2009)$
$
64
$
3 ItalianBrown
In$ the$ Italian$Brown$breed$we$ found$ one$ association$ on$BTA23$ (TXNDC5)$ for$
milk$yield$and$one$in$BTA18$for$protein$percentage$(PINLYP)$The$three$features$
listed$in$Table$3$for$milk$yield$in$Italian$Brown$were$collapsed$due$to$overlap$$
31 Milkyield
Thioredoxin$ domainBcontaining$ protein$ 5$ (TXNDC5)$ belongs$ to$ thioredoxin$
family$small$ubiquitous$proteins$containing$disulphide$domain$affecting$protein$
folding$with$chaperone$activity$preventing$protein$misBfolding$within$the$lumen$
of$ the$endoplasmic$reticulum$This$multifunctional$organelle$have$a$key$role$ in$
several$ processes$ including$ lipid$ catabolism$ and$ protein$ synthesis$ (RamiacuterezB
Torres$ et$ al$ 2012)$ Because$ of$ its$ disulphide$ bonds$ thioredoxin$ facilitate$ the$
reduction$of$other$proteins$by$cysteine$thiol$disulfide$exchange$This$mechanism$
is$essential$for$detossification$
During$ lactation$ indeed$ an$ enhanced$ detoxification$ level$ is$ functional$ to$
preserve$ milk$ quality$ against$ increased$ metabolic$ activities$ Higher$ metabolic$
levels$ lead$ to$ free$ radicals$ production$ making$ necessary$ the$ maintenance$ of$
redox$ homeostasis$ that$ include$ among$ others$ the$ reduction$ and$ oxidation$ of$
peroxiredoxin$and$thioredoxin$(Lu$and$Holmgren$2014)$
32 Proteinpercentage
For$ protein$ percentage$ in$ Italian$ Brown$ there$ is$ only$ one$ associated$ gene$
PINLYP=is$encoding$a$phospholipase$A2$inhibitor$but$this$protein$product$is$not$
yet$well$studied$ $Many$types$of$phospholipase$A2$were$characterized$and$ it$ is$
known$ that$ they$ are$ mediator$ of$ inflammation$ atherosclerosis$ and$ cancer$ in$
mammals$and$that$plays$an$important$role$in$the$innate$immune$response$(Qiu$
and$Lai$2013)$It$may$be$reasonable$to$imagine$a$role$for$PINLYP$in$the$lactation$
in$light$of$previously$discussed$results$that$link$milk$to$innate$immune$response$
(Lemay$et$al$2009)$
$
Although$ limited$ by$ the$ size$ of$ the$ study$ and$ the$ number$ of$ the$ included$
markers$these$results$represent$both$a$confirmation$and$a$step$forward$towards$
the$comprehension$of$the$biological$bases$of$milk$yield$and$quality$$
65
$
New$ genes$ potentially$ involved$ with$ production$ traits$ in$ dairy$ cows$ were$
tracked$when$GWAS$signals$resulted$too$weak$to$be$considered$in$a$classical$SNP$
based$mapping$with$the$ldquoclosest$gene$to$peakrdquo$or$the$windowBbased$strategy$
It$ is$ worth$ mentioning$ that$ none$ of$ the$ analyzed$ SNPs$ (not$ considering$ the$
Italian$Holstein$with$ARS5BFGL5NGS54939= included$ in$ the$ analysis)$were$ above$
the$ genomeBwide$ significance$ threshold$ in$ single$ SNP$ analysis$ meaning$ that$
important$information$is$often$lost$in$such$studies$$
In$addition$ to$ the$newly$ identified$ loci$we$observed$a$confirmation$of$existing$
associations$ with$ genes$ related$ to$ milk$ production$ not$ only$ dissecting$ gene$
specific$ functions$ but$ from$ mapping$ these$ genes$ in$ previously$ characterized$
regions$(Table$S5)$This$ is$not$trivial$when$using$new$data$analysis$techniques$
especially$ in$ high$ noise$ to$ signal$ ratio$ datasets$ like$ those$ here$ presented$We$
believe$ that$ this$ approach$will$ benefit$ of$data$ from$denser$marker$panels$ (eg$
BovineHD)$ to$ exploit$ local$ linkage$ disequilibrium$ and$ increase$ the$ number$ of$
mapped$genes$with$more$than$two$SNPs$
Results$ obtained$ with$ the$ gene$ centric$ approach$ although$ having$ intrinsic$
limitations$are$not$ influenced$by$a$number$of$subjective$choices$often$made$in$
GWAS$ For$ instance$when$ ldquowindow$mappingrdquo$ is$ used$ to$decrease$ the$noise$ to$
signal$ ratio$ eg$ due$ to$ nearby$ markers$ having$ different$ allele$ frequencies$
windows$size$ is$ to$be$decided$This$can$be$based$on$equal$number$of$markers$
equal$number$of$base$pairs$or$equal$map$distance$in$cM$per$window$The$three$
choices$ do$ not$ coincide$ because$ markers$ are$ not$ equally$ spread$ along$ the$
chromosomes$ and$ recombination$ differs$ in$ different$ genomic$ regions$ In$
addition$ windows$ can$ be$ chosen$ nonBoverlapping$ or$ with$ different$ degree$ of$
overlap$Finally$when$stepping$ from$significant$SNPs$or$windows$ to$candidate$
genes$the$size$of$the$flanking$region$from$which$picking$up$candidate$genes$is$to$
be$decided$as$well$Sometimes$the$gene$closer$to$the$signal$peak$is$selected$ in$
other$cases$all$genes$within$certain$boundaries$are$included$in$further$analyses$
All$ these$ choices$ may$ generate$ misleading$ results$ For$ example$ not$ truly$
associated$genes$may$be$retained$for$further$analyses$disregarding$the$correct$
ones$ In$addition$ a$ large$number$of$ false$positives$can$be$ included$ to$be$ later$
interpreted$with$ the$help$of$network$and$pathway$analysis$As$ a$ consequence$
these$decisions$have$a$direct$impact$in$terms$of$production$and$interpretation$of$
66
$
results$leading$to$conclusions$heavily$based$on$the$enrichment$analysis$and$preB
existing$knowledge$$
With$ the$ gene$ based$ association$ we$ also$ noticed$ some$ advantages$ in$ the$
downstream$data$mining$First$genes$are$functional$units$and$the$golden$targets$
of$ the$ ldquoclassical$ postBGWASrdquo$ bioinformatics$ pipelines$ Hence$ the$ gene$ based$
association$leads$to$genes$with$an$embedded$statistically$robust$framework$This$
is$ a$ ldquoplusrdquo$ in$ studying$ complex$ traits$ because$ one$ can$ concentrate$ on$ specific$
signals$ rather$ than$ cherry$ picking$meaningful$ features$ from$ a$ list$ constructed$
with$the$previously$cited$approaches$$
Moreover$it$is$easier$to$track$back$gene$clusters$to$major$gene$effects$B$like$for$
example$DGAT1$in$a$gene$rich$genome$chunk$many$genes$may$result$significant$
due$to$the$same$SNP$(Table$S2$and$Table$3)$With$the$VEGAS$method$finding$the$
ldquotruerdquo$association$is$facilitated$by$following$the$ldquobest$SNPrdquo$in$the$region$
In$addition$we$found$phenotype$coherent$signals$in$such$situation$where$only$a$
few$SNPs$B$ if$none$ndash$are$above$an$acceptable$statistical$genomeBwide$threshold$
using$a$single$SNP$regression$method$
The$gene$centric$approach$ is$a$ first$and$slight$ improvement$ in$ livestock$GWAS$
analyses$since$we$still$have$a$partial$genome$annotation$ In$addition$ it$ is$now$
known$that$ the$definition$of$gene$ is$ to$be$revised$many$genes$do$not$code$ for$
proteins$ have$ alternative$ starting$ sites$ have$ antisense$ transcripts$ host$
regulatory$ RNA$ are$ subjected$ to$ alternative$ splicing$ and$may$ be$ regulated$ by$
sequences$quite$ far$ away$ from$ their$ coding$ regions$ (Consortium$2012)$Other$
limitations$are$the$need$for$larger$number$of$animals$in$these$type$of$studies$as$
in$this$study$and$the$still$relatively$high$cost$of$sequencing$that$cannot$be$used$
routinely$in$large$experiments$at$least$not$yet$$
$
$
Conclusions
This$study$underlines$the$importance$of$the$implementation$of$new$methods$to$
analyze$complex$traits$with$genomic$data$No$golden$path(s)$is$discovered$yet$to$
walk$from$variation$to$functional$units$The$geneBcentric$analysis$may$contribute$
67
$
to$ reveal$ new$ signals$ throughout$ the$ bovine$ genome$ condensing$ weak$
association$signals$in$chromosome$bits$having$functional$significance$
$
$
Acknowledgements
The$Authors$would$ like$ to$ thank$Dr$ Jimmy$ Liu$ for$ his$ advices$ in$ building$ the$
software$AIA$ANAFI$ANARB$and$ANAPRI$breedersrsquo$associations$ for$ their$help$
and$ support$during$ the$data$analysis$ SelMol$ and$ProZoo$projects$ that$provide$
samples$ The$ Authors$ are$ also$ grateful$ to$MIPAAF$ and$ Cariplo$ Foundation$ for$
funding$
$
$
$
References
Akula$ N$ Baranova$ A$ Seto$ D$ Solka$ J$ Nalls$MA$ Singleton$ A$ Ferrucci$ L$
Tanaka$T$Bandinelli$ S$Cho$YS$Kim$YJ$Lee$ JBY$Han$BBG$Bipolar$
Disorder$ Genome$ Study$ (BiGS)$ Consortium$ The$ Wellcome$ Trust$ CaseB
Control$Consortium$McMahon$FJ$2011$A$NetworkBBased$Approach$to$
Prioritize$ Results$ from$ GenomeBWide$ Association$ Studies$ PLoS$ ONE$ 6$
e24220$doi101371journalpone0024220$
Amin$N$ van$Duijn$ CM$ Aulchenko$ YS$ 2007$ A$Genomic$Background$Based$
Method$ for$ Association$ Analysis$ in$ Related$ Individuals$ PLoS$ ONE$ 2$
e1274$doi101371journalpone0001274$
Aulchenko$YS$de$Koning$DBJ$Haley$C$2007$Genomewide$Rapid$Association$
Using$ Mixed$ Model$ and$ Regression$ A$ Fast$ and$ Simple$ Method$ For$
Genomewide$PedigreeBBased$Quantitative$Trait$Loci$Association$Analysis$
Genetics$177$577ndash585$doi101534genetics107075614$
Blanco$ EJ$ CarreteroBHernaacutendez$ M$ GarciacuteaBBarrado$ J$ IglesiasBOsma$ MC$
Carretero$M$Herrero$JJ$Rubio$M$Riesco$JM$Carretero$J$2013$The$
activity$and$proliferation$of$pituitary$prolactinBpositive$cells$and$pituitary$
68
$
VIPBpositive$ cells$ are$ regulated$by$ interleukin$6$Histol$Histopathol$ 28$
1595ndash1604$
Blott$S$Kim$JBJ$Moisio$S$SchmidtBKuntzel$A$Cornet$A$Berzi$P$Cambisano$
N$Ford$C$Grisart$B$Johnson$D$Karim$L$Simon$P$Snell$R$Spelman$
R$ Wong$ J$ Vilkki$ J$ Georges$ M$ Farnir$ F$ Coppieters$ W$ 2003$
Molecular$ dissection$ of$ a$ quantitative$ trait$ locus$ a$ phenylalanineBtoB
tyrosine$substitution$in$the$transmembrane$domain$of$the$bovine$growth$
hormone$ receptor$ is$ associated$ with$ a$ major$ effect$ on$ milk$ yield$ and$
composition$Genetics$163$253ndash266$
Burton$PR$Clayton$DG$Cardon$LR$Craddock$N$Deloukas$P$Duncanson$A$
Kwiatkowski$ DP$ McCarthy$ MI$ Ouwehand$ WH$ Samani$ NJ$ Todd$
JA$ Donnelly$ P$ Barrett$ JC$ Burton$ PR$ Davison$ D$ Donnelly$ P$
Easton$ D$ Evans$ D$ Leung$ HBT$ Marchini$ JL$ Morris$ AP$ Spencer$
CCA$Tobin$MD$Cardon$LR$Clayton$DG$Attwood$AP$Boorman$JP$
Cant$B$Everson$U$Hussey$JM$Jolley$JD$Knight$AS$Koch$K$Meech$
E$ Nutland$ S$ Prowse$ CV$ Stevens$ HE$ Taylor$ NC$ Walters$ GR$
Walker$NM$Watkins$NA$Winzer$T$Todd$JA$Ouwehand$WH$Jones$
RW$ McArdle$ WL$ Ring$ SM$ Strachan$ DP$ Pembrey$ M$ Breen$ G$
Clair$ DS$ Caesar$ S$ GordonBSmith$ K$ Jones$ L$ Fraser$ C$ Green$ EK$
Grozeva$D$Hamshere$ML$Holmans$PA$Jones$IR$Kirov$G$Moskvina$
V$ Nikolov$ I$ OrsquoDonovan$ MC$ Owen$ MJ$ Craddock$ N$ Collier$ DA$
Elkin$A$Farmer$A$Williamson$R$McGuffin$P$Young$AH$Ferrier$IN$
Ball$SG$Balmforth$AJ$Barrett$JH$Bishop$DT$Iles$MM$Maqbool$A$
Yuldasheva$N$Hall$AS$Braund$PS$Burton$PR$Dixon$RJ$Mangino$
M$ Stevens$ S$ Tobin$ MD$ Thompson$ JR$ Samani$ NJ$ Bredin$ F$
Tremelling$ M$ Parkes$ M$ Drummond$ H$ Lees$ CW$ Nimmo$ ER$
Satsangi$ J$Fisher$SA$Forbes$A$Lewis$CM$Onnie$CM$Prescott$NJ$
Sanderson$J$Mathew$CG$Barbour$J$Mohiuddin$MK$Todhunter$CE$
Mansfield$JC$Ahmad$T$Cummings$FR$Jewell$DP$Webster$J$Brown$
MJ$ Clayton$DG$ Lathrop$GM$ Connell$ J$Dominiczak$A$ Samani$NJ$
Marcano$CAB$Burke$B$Dobson$R$Gungadoo$J$Lee$KL$Munroe$PB$
Newhouse$SJ$Onipinla$A$Wallace$C$Xue$M$Caulfield$M$Farrall$M$
Barton$A$ (braggs)$TB$ in$RG$and$G$Bruce$ IN$Donovan$H$Eyre$S$
69
$
Gilbert$ PD$ Hider$ SL$ Hinks$ AM$ John$ SL$ Potter$ C$ Silman$ AJ$Symmons$ DPM$ Thomson$ W$ Worthington$ J$ Clayton$ DG$ Dunger$DB$ Nutland$ S$ Stevens$ HE$ Walker$ NM$ Widmer$ B$ Todd$ JA$Frayling$TM$Freathy$RM$Lango$H$Perry$JRB$Shields$BM$Weedon$MN$ Hattersley$ AT$ Hitman$ GA$ Walker$ M$ Elliott$ KS$ Groves$ CJ$Lindgren$ CM$ Rayner$ NW$ Timpson$ NJ$ Zeggini$ E$ McCarthy$ MI$Newport$M$Sirugo$G$Lyons$E$Vannberg$F$Hill$AVS$Bradbury$LA$Farrar$ C$ Pointon$ JJ$ Wordsworth$ P$ Brown$ MA$ Franklyn$ JA$Heward$JM$Simmonds$MJ$Gough$SCL$Seal$S$(uk)$BCSC$Stratton$MR$Rahman$N$Ban$M$Goris$A$Sawcer$SJ$Compston$A$Conway$D$Jallow$ M$ Newport$ M$ Sirugo$ G$ Rockett$ KA$ Kwiatkowski$ DP$Bumpstead$ SJ$ Chaney$A$Downes$K$ Ghori$MJR$ Gwilliam$R$Hunt$SE$Inouye$M$Keniry$A$King$E$McGinnis$R$Potter$S$Ravindrarajah$R$ Whittaker$ P$ Widden$ C$ Withers$ D$ Deloukas$ P$ Leung$ HBT$Nutland$S$Stevens$HE$Walker$NM$Todd$JA$Easton$D$Clayton$DG$Burton$PR$Tobin$MD$Barrett$JC$Evans$D$Morris$AP$Cardon$LR$Cardin$NJ$Davison$D$Ferreira$T$PereiraBGale$ J$Hallgrimsdoacutettir$ IB$Howie$ BN$Marchini$ JL$ Spencer$ CCA$ Su$ Z$ Teo$ YY$ Vukcevic$ D$Donnelly$P$Bentley$D$Brown$MA$Cardon$LR$Caulfield$M$Clayton$DG$ Compston$ A$ Craddock$ N$ Deloukas$ P$ Donnelly$ P$ Farrall$ M$Gough$ SCL$ Hall$ AS$ Hattersley$ AT$ Hill$ AVS$ Kwiatkowski$ DP$Mathew$ CG$McCarthy$MI$ Ouwehand$WH$ Parkes$M$ Pembrey$M$Rahman$N$Samani$NJ$Stratton$MR$Todd$ JA$Worthington$ J$2007$GenomeBwide$ association$ study$ of$ 14000$ cases$ of$ seven$ common$diseases$ and$ 3000$ shared$ controls$ Nature$ 447$ 661ndash678$doi101038nature05911$
Cantor$ RM$ Lange$ K$ Sinsheimer$ JS$ 2010$ Prioritizing$ GWAS$ Results$ A$Review$ of$ Statistical$ Methods$ and$ Recommendations$ for$ Their$Application$Am$J$Hum$Genet$86$6ndash22$doi101016jajhg200911017$
Clark$ AG$ Hubisz$ MJ$ Bustamante$ CD$ Williamson$ SH$ Nielsen$ R$ 2005$Ascertainment$ bias$ in$ studies$ of$ human$ genomeBwide$ polymorphism$Genome$Res$15$1496ndash1502$doi101101gr4107905$
70
$
CohenBZinder$M$ Seroussi$ E$ Larkin$ DM$ Loor$ JJ$Wind$ AE$ der$ Lee$ JBH$Drackley$ JK$Band$MR$Hernandez$AG$Shani$M$Lewin$HA$Weller$JI$ Ron$ M$ 2005$ Identification$ of$ a$ missense$ mutation$ in$ the$ bovine$ABCG2$gene$with$a$major$effect$on$ the$QTL$on$ chromosome$6$affecting$milk$yield$and$composition$in$Holstein$cattle$Genome$Res$15$936ndash944$doi101101gr3806705$
Cole$JB$Wiggans$GR$Ma$L$Sonstegard$TS$Lawlor$TJ$Crooker$BA$Tassell$CPV$ Yang$ J$ Wang$ S$ Matukumalli$ LK$ Da$ Y$ 2011$ GenomeBwide$association$ analysis$ of$ thirty$ one$ production$ health$ reproduction$ and$body$ conformation$ traits$ in$ contemporary$ US$ Holstein$ cows$ BMC$Genomics$12$408$doi1011861471B2164B12B408$
Colitti$M$Wilde$CJ$Stefanon$B$2004$Functional$expression$of$bclB2$protein$family$and$AIF$in$bovine$mammary$tissue$in$early$lactation$J$Dairy$Res$71$20ndash27$
Consortium$ TEP$ 2012$ An$ integrated$ encyclopedia$ of$ DNA$ elements$ in$ the$human$genome$Nature$489$57ndash74$doi101038nature11247$
De$Mello$F$Cobuci$JA$Martins$MF$Silva$MVGB$Neto$JB$2012$Association$of$the$polymorphism$g8514CgtT$in$the$osteopontin$gene$(SPP1)$with$milk$yield$ in$ the$ dairy$ cattle$ breed$ Girolando$ Anim$ Genet$ 43$ 647ndash648$doi101111j1365B2052201102312x$
Dudbridge$ F$ Gusnanto$ A$ 2008$ Estimation$ of$ significance$ thresholds$ for$genomewide$ association$ scans$ Genet$ Epidemiol$ 32$ 227ndash234$doi101002gepi20297$
Elsik$ CG$ Tellam$ RL$ Worley$ KC$ 2009$ The$ Genome$ Sequence$ of$ Taurine$Cattle$ A$window$ to$ ruminant$ biology$ and$ evolution$ Science$ 324$ 522ndash528$doi101126science1169588$
Fazioli$ F$ Minichiello$ L$ Matoska$ V$ Castagnino$ P$ Miki$ T$ Wong$ WT$ Di$Fiore$ PP$ 1993$ Eps8$ a$ substrate$ for$ the$ epidermal$ growth$ factor$receptor$kinase$enhances$EGFBdependent$mitogenic$signals$EMBO$J$12$3799ndash3808$
Fontanesi$ L$ Calograve$ DG$ Galimberti$ G$ Negrini$ R$ Marino$ R$ Nardone$ A$AjmoneBMarsan$ P$ Russo$ V$ 2014$ A$ candidate$ gene$ association$ study$
71
$
for$ nine$ economically$ important$ traits$ in$ Italian$ Holstein$ cattle$ Anim$
Genet$nandashna$doi101111age12164$
Fowler$KJ$Walker$F$Alexander$W$Hibbs$ML$Nice$EC$Bohmer$RM$Mann$
GB$ Thumwood$ C$ Maglitto$ R$ Danks$ JA$ 1995$ A$ mutation$ in$ the$
epidermal$growth$factor$receptor$in$wavedB2$mice$has$a$profound$effect$
on$ receptor$ biochemistry$ that$ results$ in$ impaired$ lactation$ Proc$ Natl$
Acad$Sci$92$1465ndash1469$doi101073pnas9251465$
Gianola$ D$ Hospital$ F$ Verrier$ E$ 2013$ Contribution$ of$ an$ additive$ locus$ to$
genetic$variance$when$inheritance$is$multiBfactorial$with$implications$on$
interpretation$ of$ GWAS$ Theor$ Appl$ Genet$ 126$ 1457ndash1472$
doi101007s00122B013B2064B2$
Gillette$ M$ Bray$ K$ Blumenthaler$ A$ VargoBGogola$ T$ 2013$ P190B$ RhoGAP$
Overexpression$ in$ the$ Developing$Mammary$ Epithelium$ Induces$ TGFβB
dependent$ Fibroblast$ Activation$ PLoS$ ONE$ 8$ e65105$
doi101371journalpone0065105$
Grisart$B$Coppieters$W$Farnir$F$Karim$L$Ford$C$Berzi$P$Cambisano$N$
Mni$ M$ Reid$ S$ Simon$ P$ Spelman$ R$ Georges$ M$ Snell$ R$ 2002$
Positional$Candidate$Cloning$of$a$QTL$ in$Dairy$Cattle$ Identification$of$a$
Missense$Mutation$ in$the$Bovine$DGAT1$Gene$with$Major$Effect$on$Milk$
Yield$ and$ Composition$ Genome$ Res$ 12$ 222ndash231$
doi101101gr224202$
Hubbard$NE$Chen$QJ$Sickafoose$LK$Wood$MB$Gregg$ JP$Abrahamsson$
NM$ Engelberg$ JA$ Walls$ JE$ Borowsky$ AD$ 2013$ Transgenic$
Mammary$Epithelial$Osteopontin$(Spp1)$Expression$Induces$Proliferation$
and$ Alveologenesis$ Genes$ Cancer$ 4$ 201ndash212$
doi1011771947601913496813$
Hu$ ZBL$ Park$ CA$ Wu$ XBL$ Reecy$ JM$ 2013$ Animal$ QTLdb$ an$ improved$
database$tool$for$livestock$animal$QTLassociation$data$dissemination$in$
the$ postBgenome$ era$ Nucleic$ Acids$ Res$ 41$ D871ndashD879$
doi101093nargks1150$
Jeon$et$al$ B$2008$ B$ProtoBoncogene$FBIB1$ (PokemonZBTB7A)$Represses$Trpdf$
nd$
72
$
Jiang$ L$ Liu$ J$ Sun$ D$ Ma$ P$ Ding$ X$ Yu$ Y$ Zhang$ Q$ 2010$ Genome$Wide$
Association$ Studies$ for$ Milk$ Production$ Traits$ in$ Chinese$ Holstein$
Population$PLoS$ONE$5$e13661$doi101371journalpone0013661$
Kong$ PBZ$ Yang$ F$ Li$ L$ Li$ XBQ$ Feng$ YBM$ 2013$Decreased$FOXF2$mRNA$
Expression$ Indicates$ EarlyBOnset$ Metastasis$ and$ Poor$ Prognosis$ for$
Breast$ Cancer$ Patients$ with$ Histological$ Grade$ II$ Tumor$ PLoS$ ONE$ 8$
e61591$doi101371journalpone0061591$
Lee$SBU$Maeda$T$2012$POKZBTB$proteins$an$emerging$ family$of$proteins$
that$ regulate$ lymphoid$ development$ and$ function$ Immunol$ Rev$ 247$
107ndash119$
Lemay$ DG$ Lynn$ DJ$ Martin$ WF$ Neville$ MC$ Casey$ TM$ Rincon$ G$
Kriventseva$ EV$ Barris$ WC$ Hinrichs$ AS$ Molenaar$ AJ$ 2009$ The$
bovine$lactation$genome$ insights$ into$the$evolution$of$mammalian$milk$
Genome$Biol$10$R43$
Liu$JZ$Mcrae$AF$Nyholt$DR$Medland$SE$Wray$NR$Brown$KM$2010$A$
versatile$ geneBbased$ test$ for$ genomeBwide$ association$ studies$ Am$ J$
Hum$Genet$87$139$
Liu$S$Dontu$G$Mantle$ID$Patel$S$Ahn$N$Jackson$KW$Suri$P$Wicha$MS$
2006$Hedgehog$Signaling$and$BmiB1$Regulate$SelfBrenewal$of$Normal$and$
Malignant$ Human$ Mammary$ Stem$ Cells$ Cancer$ Res$ 66$ 6063ndash6071$
doi1011580008B5472CANB06B0054$
Lu$ J$ Holmgren$ A$ 2014$ The$ thioredoxin$ superfamily$ in$ oxidative$ protein$
folding$ Antioxid$ Redox$ Signal$ 140202091634000$
doi101089ars20145849$
Luna$ A$ Nicodemus$ KK$ 2007$ snpplotter$ an$ RBbased$ SNPhaplotype$
association$and$linkage$disequilibrium$plotting$package$Bioinforma$Oxf$
Engl$23$774ndash776$doi101093bioinformaticsbtl657$
McCarthy$ MI$ Abecasis$ GR$ Cardon$ LR$ Goldstein$ DB$ Little$ J$ Ioannidis$
JPA$ Hirschhorn$ JN$ 2008$ GenomeBwide$ association$ studies$ for$
complex$traits$consensus$uncertainty$and$challenges$Nat$Rev$Genet$9$
356ndash369$doi101038nrg2344$
Meredith$ BK$ Kearney$ FJ$ Finlay$ EK$ Bradley$ DG$ Fahey$ AG$ Berry$ DP$
Lynn$ DJ$ 2012$ GenomeBwide$ associations$ for$ milk$ production$ and$
73
$
somatic$cell$score$in$HolsteinBFriesian$cattle$in$Ireland$BMC$Genet$13$21$doi1011861471B2156B13B21$
Minozzi$ G$Nicolazzi$ EL$ Stella$ A$ Biffani$ S$Negrini$ R$ Lazzari$ B$ AjmoneBMarsan$ P$Williams$ JL$ 2013$ Genome$Wide$ Analysis$ of$ Fertility$ and$Production$ Traits$ in$ Italian$ Holstein$ Cattle$ PLoS$ ONE$ 8$ e80219$doi101371journalpone0080219$
Nicolazzi$EL$Picciolini$M$Strozzi$F$Schnabel$RD$Lawley$C$Pirani$A$Brew$F$ Stella$ A$ 2014$ SNPchiMp$ a$ database$ to$ disentangle$ the$ SNPchip$jungle$ in$ bovine$ livestock$ BMC$ Genomics$ 15$ 123$ doi1011861471B2164B15B123$
Purcell$ S$ Neale$ B$ ToddBBrown$ K$ Thomas$ L$ Ferreira$ MAR$ Bender$ D$Maller$J$Sklar$P$de$Bakker$PIW$Daly$MJ$Sham$PC$2007$PLINK$a$tool$ set$ for$ wholeBgenome$ association$ and$ populationBbased$ linkage$analyses$Am$J$Hum$Genet$81$559ndash575$doi101086519795$
Qiu$ S$ Lai$ L$ 2013$ Antibacterial$ properties$ of$ recombinant$ human$ nonBpancreatic$secretory$phospholipase$A2$Biochem$Biophys$Res$Commun$441$453ndash456$doi101016jbbrc201310092$
Quinlan$AR$Hall$IM$2010$BEDTools$a$flexible$suite$of$utilities$for$comparing$genomic$ features$ Bioinformatics$ 26$ 841ndash842$doi101093bioinformaticsbtq033$
RamiacuterezBTorres$ A$ BarceloacuteBBatllori$ S$ MartiacutenezBBeamonte$ R$ Navarro$ MA$Surra$ JC$ Arnal$ C$ Guilleacuten$N$ Aciacuten$ S$ Osada$ J$ 2012$ Proteomics$ and$gene$ expression$ analyses$ of$ squaleneBsupplemented$ mice$ identify$microsomal$thioredoxin$domainBcontaining$protein$5$changes$associated$with$ hepatic$ steatosis$ J$ Proteomics$ 77$ 27ndash39$doi101016jjprot201207001$
Raven$ LBA$ Cocks$ BG$ Pryce$ JE$ Cottrell$ JJ$ Hayes$ BJ$ 2013$ Genes$ of$ the$RNASE5$pathway$ contain$ SNP$ associated$with$milk$ production$ traits$ in$dairy$cattle$Genet$Sel$Evol$45$25$doi1011861297B9686B45B25$
Scotti$E$Fontanesi$L$Schiavini$F$La$Mattina$V$Bagnato$A$Russo$V$2010$DGAT1$ pK232A$ polymorphism$ in$ dairy$ and$ dual$ purpose$ Italian$ cattle$breeds$Ital$J$Anim$Sci$9$doi104081ijas2010e16$
74
$
Stefanon$ B$ Colitti$ M$ Gabai$ G$ Knight$ CH$ Wilde$ CJ$ 2002$ Mammary$
apoptosis$and$lactation$persistency$in$dairy$animals$J$Dairy$Res$69$37ndash
52$
Stegh$ AH$ DePinho$ RA$ 2011$ Beyond$ effector$ caspase$ inhibition$ Bcl2L12$
neutralizes$ p53$ signaling$ in$ glioblastoma$ Cell$ Cycle$ 10$ 33ndash38$
doi104161cc10114365$
Taberlet$ P$ Valentini$ A$ Rezaei$ HR$ Naderi$ S$ Pompanon$ F$ Negrini$ R$
AjmoneBMarsan$ P$ 2008$ Are$ cattle$ sheep$ and$ goats$ endangered$
species$Mol$Ecol$17$275ndash284$doi101111j1365B294X200703475x$
Team$RC$ 2014$ R$ A$ Language$ and$Environment$ for$ Statistical$ Computing$ R$
Foundation$for$Statistical$Computing$Vienna$Austria$
Thomadaki$H$ Talieri$M$ Scorilas$ A$ 2007$ Prognostic$ value$ of$ the$ apoptosis$
related$genes$BCL2$and$BCL2L12$in$breast$cancer$Cancer$Lett$247$48ndash
55$doi101016jcanlet200603016$
Valdehita$ A$ Bajo$ AM$ FernaacutendezBMartiacutenez$ AB$ Arenas$ MI$ Vacas$ E$
Valenzuela$ P$ RuiacutezBVillaespesa$ A$ Prieto$ JC$ Carmena$ MJ$ 2010$
Nuclear$ localization$ of$ vasoactive$ intestinal$ peptide$ (VIP)$ receptors$ in$
human$ breast$ cancer$ Peptides$ 31$ 2035ndash2045$
doi101016jpeptides201007024$
Wang$T$Liu$NS$Seet$LBF$Hong$W$2010$The$Emerging$Role$of$VHS$DomainB
Containing$Tom1$Tom1L1$and$Tom1L2$in$Membrane$Trafficking$Traffic$
11$1119ndash1128$doi101111j1600B0854201001098x$
Wang$ X$Wurmser$ C$ Pausch$ H$ Jung$ S$ Reinhardt$ F$ Tetens$ J$ Thaller$ G$
Fries$R$2012$Identification$and$Dissection$of$Four$Major$QTL$Affecting$
Milk$Fat$Content$in$the$German$HolsteinBFriesian$Population$PLoS$ONE$7$
e40711$doi101371journalpone0040711$
Xu$S$2003$Theoretical$Basis$of$the$Beavis$Effect$Genetics$165$2259ndash2268$
Zimin$AV$Delcher$AL$Florea$L$Kelley$DR$Schatz$MC$Puiu$D$Hanrahan$
F$ Pertea$ G$ Tassell$ CPV$ Sonstegard$ TS$ Marccedilais$ G$ Roberts$ M$
Subramanian$ P$ Yorke$ JA$ Salzberg$ SL$ 2009$ A$ wholeBgenome$
assembly$ of$ the$ domestic$ cow$ Bos$ taurus$ Genome$ Biol$ 10$ R42$
doi101186gbB2009B10B4Br42$
75
$
Zu$X$Ma$J$Liu$H$Liu$F$Tan$C$Yu$L$Wang$J$Xie$Z$Cao$D$Jiang$Y$2011$ProBoncogene$ Pokemon$ promotes$ breast$ cancer$ progression$ by$upregulating$ survivin$ expression$ Breast$ Cancer$ Res$ 13$ R26$doi101186bcr2843$ $
76
$
Supplementaryfiles$
You$can$find$Chapter$͵$supplementary$files$at$
- DocTa$(Doctoral$Thesis$Archive)$(httptesionlineunicattit)$
- Google$Drive$(httptinyurlcomPhDMMChapter03)$or$scan$the$QR$code$
$
$
SupplementaryFigureS1Multidimensionalscaling
Spatial$distribution$of$genetic$structure$in$the$analyzed$breeds$$
$
Supplementary$FigureS2GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Brown$
$
Supplementary$FigureS3GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Holstein$
$
Supplementary$FigureS4GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$HolsteinDGAT$
$
Supplementary$FigureS5GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Simmental$
77
$
$Supplementary$FigureS6LDdecayGenomeBwide$LD$decay$for$the$three$breeds$(Holstein$Simmental$Brown)$$Supplementary$FigureS7DescriptiveplotsDescriptive$statistics$for$the$Italian$Brown$Italian$Holstein$and$Italian$Simmental$$Supplementary$FigureS8DGAT1LDFocus$on$the$LD$of$the$DGAT1$region$in$all$the$datasets$(Brown$Holstein$HolsteinDGAT$Simmental)$$Supplementary$TableS1SNPfactNumber$of$subjects$and$SNPs$after$QC$$Supplementary$TableS2GeneBasedAssociationresultsFull$results$of$the$significant$genes$associated$with$the$traits$in$all$the$datasets$$ Sheet$1$Brown$ndash$fat$percentage$$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS3SingleSNPAssociationresultsFull$results$of$the$SNP$that$overcome$the$suggestive$threshold$$$ Sheet$1$Brown$ndash$fat$percentage$
78
$
$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS4GenesandtraitsGeneWise$pBvalues$of$the$discussed$genes$in$all$the$analysis$In$grey$are$evidenced$the$pBvalues$that$overcome$the$significance$threshold$Genes$are$alphabetically$listed$$Supplementary$TableS5GenesinQTLsGenes$from$the$geneBbased$association$mapping$into$known$QTLs$$$$$
79
CHAPTER(4(
MUGBAS(a(species(free(gene(based(association(suite(
S(Capomaccio1(M(Milanesi1(L(Bomba1(et(al(1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122ItalyTheseauthorscontributedequallytothiswork
____________________ApplicationnotesubmittedtoBioinformatics
Abstract(Theassociationstudiesbetweengenomewidemarkersandphenotypes(GWAS)
arenowwidelyusedinmodelandnonmodelspeciesHowevertoolsthatmerge
singlemarkerresultsandtheirprobabilityofbeingassociatedtofunctionalunits
(typicallygenes)areseldomdevelopedforspeciesotherthanhuman
Herewe presentMUGBAS a suite that calculates a generegionbased pvalue
fromagivenannotationsinglemarkerGWAresultandgenotypedatawiththe
objective to ease postGWAS analysis The software is species and annotation
independentfasthighlyparallelizedandreadyforhighdensitymarkerstudies
Availabilityandimplementationhttpsbitbucketorgcapemastermugbas
Introduction(With the availability of highdensity SNP panels Genome Wide Association
Studies (GWAS) have become the gold standard for dissecting the biology of
complextraits(McCarthyetal2008)Thisistrueforhumansandotherspecies
TheunderlyingideaofGWASissimplefindmarkertraitassociationsexploiting
the linkage disequilibrium (LD) that exists between the causative mutation ndash
whichweignorendashandoneormoremarkersofknownchromosomalposition
Whileanumberofwellestablishedapproachescanbeeasilyusedinastandard
GWAS (Aulchenko et al 2007 Kang et al 2010 Purcell et al 2007) to find
marker(s) associated to a particular phenotype several issues have to be
accountedforduringtheupanddownstreamanalyses
Firstly genome wide studies have to take into account multiple testing If
stringent statistical thresholds are applied a proportion ofmoderate or small
effectsactuallyassociatedwiththetraitwillbedisregardedThereforereducing
the number of tests while maintaining all the information provided by high
densitypanelswouldbeaclearadvantageInadditiononceasignificantsignalis
detected different strategiesmaybeused to tag the candidate causative gene
Sometimesthegeneclosertothepeaksignalisselectedinothercasesallgenes
withinaflankingregionofanarbitrarysizeofthesignificantSNP(s)areincluded
82
infurtheranalysesThesechoicesmayendupinbiasresultseitherretaininga
largenumberofnottrulyassociatedgenesordisregardingthecorrectones
Somemethodshaverecentlybeenproposedtoovercometheselimitationsusing
aprioriknowledgeastheldquocandidatepathwayrdquoanalysis(Ravenetal2013)and
the ldquogenebasedrdquo association strategies (Akula et al 2011 Cantor et al 2010
Liuetal2010)TheseapproachesrestrictGWAstudiesonlytoknowngenesor
genesubsetsresizingthemultipletestingproblemandincreasingthepowerof
theanalyses
Usually bioinformatics pipelines are developed for the analysis of human and
modelorganismsandarenotadaptedtononmodelorganismThisisthecaseof
the VEGAS software (Liu et al 2010 Yang et al 2014) developed only for
humansandonlyforgenebasedanalyses
Here we introduce MUGBAS [MUlti species GeneBased Association Suite] a
softwarebasedonVEGASthatusesGWASdataandagivenannotation(typically
genesbutanyregionmaybesuitable)toestimategenebasedandregionbased
associationpvaluesThesuite is speciesandannotation free ready toanalyze
highdensitydatafromanyexperimentaldesigns
Methods(MUGBASisasuiteofscriptsbuiltinBashRPythonandPerlThesuiterequires
several prerequisites that can be easily fulfilled in a LinuxUnixMac
environmentwith a few command lines Further information on how tomeet
theserequirementsaredetailedintheonlinerepository
MUGBAS suite can be invoked launching the Bash wrapper that checks
dependenciesandstoresuserchoices
Required input files are MAPPED files in PLINK format with genotype data
from(atleast200individualsarerecommendedforaproperLDcalculation)an
annotation file inBED format (chromosome startingposition endingposition
nameandusercustomfields)ofthedesiredgenomefeatureandtheGWASresult
file with mandatory headers ldquoSNPNAMErdquo and ldquoPVALUErdquo Multiple association
resultscanbetestedatthesametime ifprovidedinseparatefiles inthesame
directoryAtrialdatasetisdownloadablealongwiththesoftwarereleasePlease
83
notethatforsimplicityweprovideageneannotationbutanyBEDfilewithany
desiredfeaturecanbeused
The program starts the analysis assigning SNPs to any feature of the given
annotationusingbedtools(QuinlanandHall2010)Thisstepiscustomizableby
user defined boundaries (up and downstream) in order to catch regulatory
regionsthatcouldbeinhighLDwiththecurrentgeneregion
Inordertospeeduptheanalysistheprogramsubsamples200individualsfrom
theinputdatasetWethenimplementedtwolevelsofparallelizationoneforthe
LDandonefortheldquogenewiserdquoorldquoregionwiserdquopvaluecalculationThelatteris
particularlyusefulanalyzingmorethanoneGWAresultsfromthesamedataset
BrieflyMUGBASfirstsplitstheLDcalculationinncores(asdefinedbytheuser)
using the foreach doParallel packages (Analytics andWeston 2014a 2014b)
and then distributes traits across cores using GNU Parallel and foreach
doParallel
SinceLDcalculationisaslowprocessVEGASbenefitsofHapMapphase2datato
speed up the processwhile preserving the option for a user defined (human)
population using PLINK In order to expand these analysis to any species LD
calculationinMUGBASisimplementedwithther2fast()functionfromGenABEL
package(Aulchenkoetal2007)whilethegenewisepvalueiscalculatedasin
theVEGASapproachBrieflythegenewiseteststatisticisthesumofthenupper
tailchisquaredvaluesofaSNPsubsetwithonedegreeoffreedom(wheren is
thenumberofSNPmappinginthegivengene)Withperfectlinkageequilibrium
the genewise pvalue would be the onetailed pvalue of a chisquare
distributionwithndegreesof freedomOtherwisemorelikelythedistribution
of the pvalues has to be evaluated (with simulations) andweighted (with LD
values)(Liuetal2010)
Once every generegion has its ldquogenewiserdquo pvalue a FDR qvalue (False
Discovery Rate) statistics is calculated with the base R function padjust()
Results are stored and enrichedwith ldquomanhattanrdquo and ldquoundergroundrdquo plots of
theldquogenewiserdquopvalueandtheqvaluerespectively
84
Results(MUGBAS output is generegionoriented with useful information about the
location of the analyzed feature the ldquoBest SNPrdquo within that feature with its
genomiccoordinatesandtheoriginalpvaluethegeneorregionwisestatistics
(pvalue number of simulations and qvalue) the custom annotation of that
particularentryandthepositionoftheSNPwithrespecttothefeatureandthe
decidedboundariesAll this informationareuseful to track theeffectofhighly
significant markers that could map into multiple region of interest and
effectively pinpoint the most probable location to focus on with data mining
processes
A typical analysis performed with a highdensity panel (gt600K markers)
distributed in 10 midperformance cores (Intel Xeon X5675 307 GHz) on
threetraitssimultaneouslytakesaboutfourhours
Discussion(Generegionbased methods are now widely used and accepted in genomic
research However non modelspecies suffer for the lack of availability of
specifictoolstoenhancediscoveryfromgenomewidedataMUGBASprovidesa
robust and accepted statistics to researchers dealing with species other than
humantoeasilyjumpfromassociatedmarkerstothedesiredfunctionalunits
Acknowledgements(((WewouldliketothankDrJimmyLiuforhisadvicesinbuildingthesuite
Reference(Akula N Baranova A Seto D Solka J NallsMA Singleton A Ferrucci L
TanakaTBandinelli SChoYSKimYJLee JYHanBGBipolar
Disorder Genome Study (BiGS) Consortium The Wellcome Trust Case
85
ControlConsortiumMcMahonFJ2011ANetworkBasedApproachtoPrioritize Results from GenomeWide Association Studies PLoS ONE 6e24220doi101371journalpone0024220
Analytics R Weston S 2014a doParallel Foreach parallel adaptor for theparallelpackage
AnalyticsRWestonS2014bforeachForeachloopingconstructforRAulchenkoYSdeKoningDJHaleyC2007GenomewideRapidAssociation
Using Mixed Model and Regression A Fast and Simple Method ForGenomewidePedigreeBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614
Cantor RM Lange K Sinsheimer JS 2010 Prioritizing GWAS Results AReview of Statistical Methods and Recommendations for TheirApplicationAmJHumGenet866ndash22doi101016jajhg200911017
KangHMSulJHServiceSKZaitlenNAKongSFreimerNBSabattiCEskin E 2010 Variance component model to account for samplestructure in genomewide association studies Nat Genet 42 348ndash354doi101038ng548
LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKM2010Aversatile genebased test for genomewide association studies Am JHumGenet87139
McCarthy MI Abecasis GR Cardon LR Goldstein DB Little J IoannidisJPA Hirschhorn JN 2008 Genomewide association studies forcomplextraitsconsensusuncertaintyandchallengesNatRevGenet9356ndash369doi101038nrg2344
Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender DMallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKatool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795
QuinlanARHallIM2010BEDToolsaflexiblesuiteofutilitiesforcomparinggenomic features Bioinformatics 26 841ndash842doi101093bioinformaticsbtq033
86
Raven LA Cocks BG Pryce JE Cottrell JJ Hayes BJ 2013 Genes of the
RNASE5pathway contain SNP associatedwithmilk production traits in
dairycattleGenetSelEvol4525doi101186129796864525
87
CHAPTER5
Imputationaccuracyisrobusttocattlereferencegenomeupdates
MMilanesi1etal1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122Italy
____________________ShortCommunicationacceptedinAnimalGenetics
SummaryGenotype imputation is routinely applied in a large number of cattle breeds Imputation
hasbecomeaneedduetothelargenumberofSNParrayswithvariabledensity(currently
from2900to777962SNPs)Althoughmanyauthorshavestudiedtheeffectofdifferent
statisticalmethodsonimputationaccuracytheimpactofa(likely)changeinthereference
genomeassemblyonimputationfromlowertohigherdensityhasnotbeendeterminedso
far In this work 1021 Italian Simmental SNP genotypes were reJmapped on the three
mostrecentreferencegenomeassembliesFour imputationmethodswereusedtoassess
theimpactofanupdateinthereferencegenomeAsexpectedthefourmethodsbehaved
differentlywith largedifferences in termsof accuracyUpdatingSNPcoordinateson the
three tested cattle reference genome assemblies determined only a slight variation on
imputationresultswithinmethod
KeywordsImputationreferencegenomeassemblySNPbovine
MaintextFromthepublicationof the firstbovinesinglenucleotidepolymorphism(SNP)array the
wholegenomicscenarioofthisspeciesevolvedrapidlyNowadaysseveralSNPpanelshave
been designed with a range of densities mapping on different reference genome
assemblies and produced by different manufacturers When running analyses with SNP
arrays of different densities or from different commercial companies (eg Illumina and
Affymetrix) an imputation step is often required The impact of different statistical
methods on imputation accuracy has already been assessed (Pei et al 2008) (Marchini
andHowie2010)(Maetal2013)Surprisinglyuptonowtherearenotavailablestudies
investigating the impact of an update on the reference genome assembly on imputation
accuracy This assessment would be of fundamental importance considering that reJ
mappingSNPsdeterminesarearrangementofthehaplotypestructureofthegenotypes(in
somecasesSNPsarereJmappedtoadifferentchromosome)andthatthebovinereference
assembly is likely toberepeatedlyupdatedover timeThisworkwasaimedatassessing
the effect of different reference genome assemblies in cattle As test casewe compared
four different imputation methods from low (Illumina BovineLDv10 Beadchip) to
90
mediumhigh (Illumina BovineSNP50v2 Beadchip) density in the Italian Simmental
populationForeachmethodweassessedthe impacton imputationaccuracyofmapping
SNPs on the BTAU42 BTAU46 (wwwhgscbcmeduotherJmammalsbovineJgenomeJ
project)orUMD31(Ziminetal2009)cattlereferencegenomeassemblies
Atotalof900and121SimmentalbullsweregenotypedwiththeIlluminaBovineSNP50v1
and v2 Beadchips (Illumina Inc San Diego CA) respectively Illumina BovineSNP50v2
Beadchip (hereinafter named 54k) the official reference chip for the Italian Simmental
breedersassociation(ANAPRI)wasconsideredasreferenceSNPpanelTheSNPchiMpv1
tool(Nicolazzietal2014)wasusedtoobtainSNPcoordinatesonBTAU42UMD31and
BTAU46genomeassembliesForeachassemblySNPsnotcontainedinthe54kunmapped
orinthesexualchromosomeswerediscardedThefinaldatasetcontained5118952886
and51290SNPsinBTAU42UMD31andBTAU46respectively
FourimputationmethodswereconsideredPedImpute(Nicolazzietal2013)FindHapv2
(VanRadenetal2011)FImpute(Sargolzaeietal2014)andBeaglev332(Browningand
Browning2007)Thefirstthreeusepedigreeandpopulation(ie linkagedisequilibrium
LD) informationwhereas the fourth uses only LD information Performance of the four
methods testedacrossgenomicassemblieswas comparedusingwrongly imputedalleles
(ldquoerrorsrdquo)andallelicsquaredcorrelation(ldquoallelicJR2rdquo)statisticsAllelicJR2wasdefined
asthesquaredcorrelationbetweenimputedandtrue(eggenotyped)allelesmultipliedby
100 as in Browning amp Browning (2009) The 878 oldest sires were used as training
population whereas the 143 youngest bulls were used as test population SNPs not in
commonwith the lowdensity panelwere set tomissing in the test population The test
population was split in four scenarios individuals with sire and maternal grandsire
genotypedinthereferencepopulation(scenarioAJ84animals)individualswithonlythe
sire genotyped (scenario B J 34 animals) individuals with only the maternal grandsire
genotyped(scenarioCJ13animals)ornoneofcloserelativesgenotyped(scenarioDJ12
animals)Thepopulationusedwashighlystructuredwithnearlyhalfofthebullspresent
in the dataset (490) sired by 115 genotyped sires In addition 35 out of these 115
genotypedsiresweresiredby22bulls(eggrandsires)alsopresentinthedataset
DifferencesacrossassembliesarenothomogeneousFromBTAU42toBTAU4646SNPs
weremapped to different chromosomes and the average difference in physical position
withinchromosomeswasnearly025MbpHowevertherewasnosubstantialdifferencein
the order of markers In fact not considering unmapped SNPs on either of the
aforementionedassembliesadifferenceinthesequentialorderwasobservedforonly134
91
single (ie spread throughout the genome) SNPs On the contrary from UMD31 to
BTAU46 222 SNPsweremapped to different chromosomes and amuch larger average
differenceinSNPsphysicalpositionwasobserved(iemorethan2Mbp)Alsothenumber
ofSNPswithdifferentsequentialorderwasmuchlarger(ie1893SNPs)Howevermore
than50oftherearrangementsinvolvedblocksfrom3tonearly100markersAsaresult
slight differences in overall imputation accuracy from BTA46 to UMD31 rather than
between the two BTAU assemblies were expected across methods On the contrary if
rearrangements are due to issues in the assembly of scaffolds theymight have a direct
(negative) impact on local imputation performances In fact a preliminary analysis on
BTA1consideringlocalerroracrossassembliesandmethodssuggestatleasttwosuch
SNPsclustersat44and138MbponUMD31(FiguresS1S2andS3)
Considering allmethods and scenarios amaximum average variation of 02 inerrors
(Table1)andof09inallelicJR2(Table2)wasobservedacrossreferenceassembliesThe
standarddeviation(SD)oftheimputationaccuracywasstableacrossreferenceassemblies
withinmethodsOnthecontraryalargervariabilitywasobservedacrossmethods
PedImpute and FindHap showed a relatively large average allelicJR2 variability across
scenarios evaluated over the same reference assembly Only PedImpute showed a large
variation in SD across scenarios in errors and allelicJR2 with high variability on
scenarios C and D showing a low performance of thismethodwhen there are no close
relatives(egsiredam)genotypedinthereferencedatasetFImputeinsteadachievedthe
highestaccuracyacrossallreferencegenomesandscenariossurprisinglyobtainingitsbest
resultsJerrorsof14(06SD)andallelicJR2of955(26SD)JinscenarioCwhereonly
onerelativelycloserelativeisgenotyped(maternalgrandJsire)FImputeappeartobeable
todetectshorterhaplotypesfromdistantrelativesmoreaccuratelythananyothermethod
tested in this study thus it is lessdependenton theavailabilityofpedigree information
thantheothertwopedigreeJbasedmethods(Sargolzaeietal2010)
The large variability across methods within assembly was expected although the
differences obtained among PedImpute FindHap and Beagle were larger in this study
comparedtoasimilarstudyinItalianHolstein(Nicolazzietal2013)Thiscoulddependon
a number of factors First the sample size of the test population is small (especially for
groups C and D where less than 15 individuals were present) Second the interaction
between population structure and imputationmethodsmay have a role PedImpute and
FindHap rely much more heavily on pedigree information than FImpute and Beagle
assumesall individualsareunrelatedHowever the lattertwomethods indirectlybenefit
92
greatly of from highly structured populations with large number of relatives ndash close or
distantndashpresent inthereferencepopulation(Sargolzaeietal2014)Anotherinteresting
factor that influences the imputation accuracy may be the persistence of LD at long
distances this is much lower in the Italian Simmental than in the Italian Holstein
population(FigureS4)Informationonrelatives(eitherpedigreeinformationorgenotyped
relativesinthereferencepopulation)isexpectedtoplayabiggerroleifLDisconservedat
shorter distances This hypothesis is supported by the fact that although there is high
variability acrossmethods better performances of the pedigreeJbasedmethods are still
observedinscenariosAandB(andCforFImpute)
Theslightdifferencesinimputationaccuracyobservedacrossreferenceassembliesmaybe
explained by the aforementioned small variation in the SNP blocks across reference
assemblies irrespectively of the rearrangements of single SNPs Considering that SNP
blocksaremostlymaintainedweexpectthatbreedswithlargerLDpersistenceatlonger
distances (ie Holstein Brown) will obtain even better imputation accuracy across
assembliescomparedtowhatreportedinthisstudyinItalianSimmentalHoweverastill
relativelyunexploredaspectof imputationperformance is thedistributionof imputation
errors across the genomeNot considering genotyping errors (which are expected to be
randomly distributed in the genome) a cluster of consecutive markers with high
percentage of imputation error across methods are most probably identifying missJ
assembledregions in thegenome(FiguresS1S2andS3)Althoughthe impactonoverall
imputation accuracy is limited as results in this study show these errorsmight have a
large impact on any analyses relying on specific genomic coordinates (eg genomeJwide
associationstudies)Morecomprehensiveresultsincludingannotationstudiesandwhole
genomesequencingdataarerequiredtoconfirmandinvestigatetheseobservations
Toconcludeonlyasmalleffectofupdatingthereferencegenomeassemblyonimputation
accuracywas observed in Italian Simmental On the other hand as already reported by
other studies the variability of imputation accuracy estimates is much larger across
imputation methods The choice of the most suitable imputation method based on the
breedgeneticbackgrounditscharacteristicsandontheavailabledataisthereforecritical
AcknowledgementsThe research leading to these results has received funding from the European Unions
Seventh Framework Programme (FP72007J2013) under grant agreement ndeg 289592 ndash
93
Gene2FarmPartofthegenotypicdatausedinthisstudywasprovidedbyprojectldquoSelMolrdquo
fundedbytheItalianMinistryofAgriculture(MiPAAF)MMwassupportedbytheDoctoral
SchoolontheAgroJFoodSystem(Agrisystem)of theUniversitagraveCattolicadelSacroCuore
(Italy) FB was supported by the Marie Curie European Reintegration Grant
ldquoNEUTRADAPTrdquo(FP7JPEOPLEJ2009JRG)
ReferencesBrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingandMissingJ
Data Inference for WholeJGenome Association Studies By Use of Localized
HaplotypeClusteringAmJHumGenet811084ndash1097
MaPBroslashndumRFZhangQLundMSSuG2013Comparisonofdifferentmethods
forimputinggenomeJwidemarkergenotypesinSwedishandFinnishRedCattleJ
DairySci964666ndash4677doi103168jds2012J6316
Marchini J Howie B 2010 Genotype imputation for genomeJwide association studies
NatRevGenet11499ndash511doi101038nrg2796
NicolazziELBiffaniSJansenG2013ShortcommunicationImputinggenotypesusing
PedImputefastalgorithmcombiningpedigreeandpopulationinformationJournal
ofDairyScience962649ndash2653doi103168jds2012J6062
NicolazziELPiccioliniMStrozziFSchnabelRDLawleyCPiraniABrewFStella
A 2014 SNPchiMp a database to disentangle the SNPchip jungle in bovine
livestockBMCGenomics15123doi1011861471J2164J15J123
PeiYLiJZhangLPapasianCJDengH2008Analysesandcomparisonofaccuracyof
differentgenotypeimputationmethodsPLoSONE3551
VanRadenPMOrsquoConnellJRWiggansGRWeigelKA2011Genomicevaluationswith
manymoregenotypesGenetSelEvol43
94
Tableamp1ampIm
putationampaccuracyampacrossampassem
bliesampforampPedImputeampFindH
apampFImputeampandampBeagleamppercentageampofampwronglyampim
putedamp
allelesamp
PedImputeamp
FindHapamp
FImputeamp
Beagleamp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
Scenarioamp
Namp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
A1amp
84
20
07
21
07
21
07
32
09
32
09
32
09
15
09
15
09
15
09
24
11
25
11
25
11
B2amp
34
23
08
23
09
22
08
36
11
36
12
36
11
15
07
15
09
14
06
23
08
23
08
23
08
C3amp
13
98
41
98
41
98
41
51
10
52
10
51
10
14
06
14
06
14
06
23
06
23
06
24
06
D4 amp
12
88
49
89
49
88
49
54
11
56
12
54
11
16
11
17
12
16
11
26
15
26
15
26
14
1 Sireandm
aternalgrandsiregenotyped
2 Siregenotypedmate
rnalgrandsireno
tgenotyped
3 Sireno
tgenotypedmate
rnalgrandsiregenotyped
4 Sireandm
aternalgrandsireno
tgenotyped
Tableamp2ampIm
putationampaccuracyampacrossampassem
bliesampforampPedImputeampFindH
apampFImputeampandampBeagleampallelicJR2amp
PedImputeamp
FindHapamp
FImputeamp
Beagleamp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
Scenarioamp
Namp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
A1amp
84
941
20
940
20
941
20
907
24
908
25
907
24
952
26
952
27
952
27
925
33
923
33
925
33
B2amp
34
935
24
934
25
935
24
896
30
896
34
895
32
954
20
954
21
954
20
929
24
928
25
928
24
C3amp
13
759
89
761
88
758
87
848
33
845
30
848
32
955
18
955
19
955
19
931
19
928
19
927
19
D4 amp
12
782
113
777
113
781
112
841
32
832
35
839
33
949
34
947
38
949
35
921
44
919
45
921
44
1 Sireandm
aternalgrandsiregenotyped
2 Siregenotypedmate
rnalgrandsireno
tgenotyped
3 Sireno
tgenotypedmate
rnalgrandsiregenotyped
4 Sireandm
aternalgrandsireno
tgenotyped
95
Supplementaryfiles
YoucanfindChapterͷsupplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter05)orscantheQRcode
SupplementaryFigureS1ErrorpercentageacrossmethodsforBTAU42onBTA1
SupplementaryFigureS2ErrorpercentageacrossmethodsforUMD31onBTA1
SupplementaryFigureS3ErrorpercentageacrossmethodsforBTAU46onBTA1
SupplementaryFigureS4Linkagedisequilibrium(LD)decayoverdistanceinItalian
SimmentalandHolstein
ThedottedlinewithblackdotsindicatesLDdecayoverdistanceinItalianHolstein
whereasthedashedlinewithwhitediamondsindicatesthesamevaluesforItalian
SimmentalItalianHolsteingenotypeswereasubsetofthedatausedinNicolazzietal
(2013)
96
CHAPTER6
QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis
MMilanesi1etal
1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly
____________________Manuscript
Abstract
Outbreed species carry a genetic load of deleterious recessive variants Those
that are lethal survive only in the heterozygous state in the population while
otherssimplyreducethefitnessofindividualshomozygousforthevariantThe
long=termdestinyofthesemutationsistodecreaseinfrequencyduetonegative
selectionandeventuallydisappearbecauseofgeneticdriftHowever theymay
bemaintainedinlivestockpopulationsandevenincreaseinfrequencywhenin
linkage with favourable alleles for traits under selection or occurring in the
genomeofsireshavinghighgeneticvalueforproductivetraitsWesearchedfor
these variants combining exome sequence data from 20 Italian Holstein bulls
sampledfromoppositetailsofthemaleandfemalefertilityEBVdistributionthe
HighDensity(HD)SNPgenotypingof1009progenytestedItalianHolsteinbulls
and information existing in public databases Exome sequencing detected
202956 variants Among these 5167 were classified as ldquodeleteriousrdquo when
annotatedwithSIFTandAnnovarThesewerefilteredbypickingthosemapping
in regions out of equilibrium and lack of homozygotes in the progeny tested
Holstein bulls or having an asymmetric distribution in bulls plus=variant and
minus=variant for fertility EBV The resulting 193 variants in 163 genes were
furtherassessedbycomparativegenomicshaplotypeanalysisandfunctionFor
eachassessmentapositiveornegativescorewasgiventoevidenceinfavouror
against thevariant tobedeleteriousA totalof35variantshadapositive total
score These genes are involved in development cell motility and immune
responseSomeof themareknowntobeexpressed in the testiclesandappear
importantforspermfertilityinmodelspeciesothersareresponsibleforgenetic
defectsinhumanandmousecarryingmutationssimilartotheonesdetectedin
cattleThesegenevariantsarecandidatetobeamongthosethatconstitutethe
ldquogeneticloadrdquoofdeleteriousrecessivegenesintheHolsteinpopulation
100
Introduction
The presence of deleterious recessive variants is well tolerated by outbred
speciesWhenthesevariantsappeartheymayspreadandsurvivealongtimein
the population in the heterozygous state (Simmons and Crow 1977) It is
estimatedthatoutbredindividualshumanincludedcarryonaverageagenetic
loadofmorethan100loss=of=functionvariantsintheirgenome(MacArthuretal
2012)
Theeffectofdeleterious recessivevariantsappearswhendemographicevents
eggeneticbottlenecksreducethegeneticvariabilityofthepopulationorwhen
relatedindividualsareintentionallymatedforbreedingpurposesInthesecases
recessive deleterious reduce the fitness of individuals carrying them in the
homozygous state and in extreme case induce very severe genetic defects
premature death and abortion (McClure et al 2014 Sahana et al 2013
VanRadenetal2011)Theythereforecontributetoexplainthephenomenonof
inbreeding depression and its negative effect on fertility (Charlesworth and
Willis2009)
The search for deleterious recessive variants is particularly interesting in
Holstein cattle Thisbreed is experiencing anegative genetic trend for fertility
traits (Jamrozik et al 2005) it is worth noting that this is occurring while
inbreedingisunderstrictmonitoringandistheoreticallyincreasingataverylow
pace(about005peryear=Mrodeetal2009)Inadditionmatingisplannedto
minimise relationship between animals and the additive genetic variance for
mosttraitsndashincludingmilkproduction=isapparentlynotdecreasinginspiteof
the selection pressure applied by modern schemes (Biffani et al 2002)
indicatingthatoveralldiversityisonlyslightlydecreasing
SelectiontoincreasefertilityischallengingSelectioncriteriagenerallyadopted
to improve this trait are far from the biological mechanisms that govern
reproduction Inaddition thesemechanismsarecomplexandverysensitive to
environmental(management)conditions(Laneetal2013Wathesetal2014)
Asa result fertility traitshave loworvery lowheritability (Pryceetal1997)
andthereforelowresponsetoselectionandslowgeneticprogress
101
Forthesereasons thequest formolecularvariants influencing fertility traits is
on=goingsincetheavailabilityofmoleculartoolsforDNAanalysisAnumberof
investigations tested the effect of functional candidate genes running genome
wide association analyses to find QTLs for fertility traits using hundreds
microsatellite markers in structured families (Ashwell et al 2004) or tens of
thousand SNP panels in whole populations (Aston and Carrell 2009) The
availability of medium density SNP panels allowed the search for deleterious
recessivesevenwithouttheuseofphenotypictraitsbyscanningthegenomein
searchofregionslackingoneofthetwohomozygousclasses(Sahanaetal2013
VanRaden et al 2011) This approach has highlighted a number of genomic
regions likely carrying deleterious recessives This information has immediate
application inmolecular assisted breeding and a relevant scientific interest in
thesearchofcausativevariantsandassessmentoftheirfunctions
Thedevelopmentofnewsequencingtechnologieshasgreatlydecreasedthecost
of sequencing so that nowadays the sequencing of few target individuals is
becoming the gold standard to seek for causal mutations for rare monogenic
defectsinhumansandotherspecies(ChunandFay2009)Oftenonlytheexome
the protein coding part of the genome is sequenced searching for those
mutationsthatseverelyalterproteinstructureandorfunction(Bamshadetal
2011 Li et al 2013) A typical mammalian exome consists in a few tens of
millionnucleotides(around2ofthereferencegenome)comparedtothefew
billionofthewholegenomeThereforewiththesamenumberofreadsexome
sequencinghastheadvantageofamoreaccurategenotypingdueto thehigher
sequencing depth an easier data management lower computation load and
easier interpretation (Bieseckeretal2011Majewskietal2011) Indeed the
numbervariants found inexomesareone=twoordersofmagnitude lowerthan
thatdetectedinwholegenomesandmutationsaredetectedinannotatedgenes
orimmediatelynearbyOntheotherhandtheexomeapproachcanonlyidentify
variantson annotatedgenesVariants inunknownexonsor that occur innon=
coding regulatory features will be missed (Bamshad et al 2011) In humans
these are turning out to be of outmost importance for gene regulation and
downstreamphenotypic effects (ENCODEConsortium2012) but they are still
verypoorlyannotatedinlivestock
102
Herewesequencedtheexomesof20Holsteinbullsprogenytestedselectedfrom
the tails of male and female fertility EBVs (Estimated Breeding Value) Since
livestockfertilityisacomplextraitandcausativemutationscannotbesoughtby
the approach used in monogenic diseases we integrated the sequence
information with high density SNP chip data obtained on a much larger
populationofItalianHolsteinbullstodetectgenescarryingallelescandidateto
be deleterious recessives and confirm their likely deleterious effect by
comparativegenomics
Materialsandmethods
1SamplingandDNAextraction
Semen(straws)andbloodsampleswereprovidedbyItaliancertificatedartificial
insemination centres (UE Directive 88407CEE) thus ethics committee
approvalforthisstudyisnotrequired
A total of 20 ItalianHolstein progeny tested bullswere sampled for complete
exome sequencing Among them 19were chosen from the tails of female and
male fertilityEBV(EstimatedBreedingValue)distribution (Table1) tohave5
animals fromeach tail One animalwas to be in theminus=variant tail of both
traitsBullswereselectedasunrelatedaspossibleSamplingwaspossiblethanks
tothecollaborationwithANAFI(ItalianHolsteinBreederAssociation)
ANAFIdevelopedageneticevaluationfor fertilitycombiningdifferenttraitsas
explained by Biffani and colleagues (Biffani et al 2005) The traits are ldquodays
from calving to first inseminationrdquo ldquocalving intervalrdquo ldquofirst=service non return
rateto56daysrdquoldquoangularityrdquoandldquomatureequivalentmilkyieldat305daysrdquo
MalefertilityisexpressedasanEstimatedRelativeConceptionRate(ERCR)and
isameasureofconceptionrateofaservicesirerelativetoservicesiresofherd
mates(Soggiuetal2013)
103
Table1FemaleandmalefertilityEBVsofthe20animalssequenced
Animal EBVFemaleFertility EBVMaleFertility GroupfriA01 97 =111 NAfriA02 98 =166 MalLfriA03 115 0 FemHfriA04 114 0 FemHfriA05 114 0 FemHfriA06 86 0 FemLfriA07 98 195 MalHfriA08 105 213 MalHfriA09 95 =154 MalLfriA10 95 149 MalHfriA11 102 264 MalHfriA12 91 =475 FemLMalLfriA13 105 =179 MalLfriA14 102 207 MalHfriA15 99 =405 MalLfriA16 85 0 FemLfriA17 87 0 FemLfriA18 121 0 FemHfriA19 117 0 FemHfriA20 88 0 FemL
A total of 1009 Italian Holstein bulls were chosen from the entire population
balancing theirpresence in the tailsof theEBV forkgprotein (PROT) somatic
cell count (SCC) fertility (FERT) and functional longevity (LONG) for high
densitygenotyping(TableS1)
DNAwasisolatedfromwholebloodandsemenstrawssamplesinLGSlaboratory
(CremonaItalywwwlgscrit)usingtheGenomixkitaccordingtoLGSinternal
protocolDNAsforexomecaptureandsequencingwereextractedindependently
tosatisfy therequirement indicatedby IGA technologies (Udine Italy)at least
12mgofDNAataconcentration100ngulwithR260280higherthan18and
R260230near19
2Moleculardataproductionandqualitychecking
21Exomecaptureandsequencing
104
Exomes were captured using the ldquoSureSelected All exon Bovinerdquo (Agilent
Technologies) kit in early version Probes coordinates are based on UMD31
bovinereferencesequence(Ziminetal2009)anddesignedagainstNCBIexons
as well as predicted exons UTRs and miRNAs Number and distribution of
probesperchromosomearedetailedinTableS2
rdquoOvation Ultralow Library Systemsrdquo protocol was followed for library
preparation After quantification with Qubit (Life technologies) and quality
controlwith2100Bioanalyzer(AgilentTechnologies)DNAwassonicatedwith
Bioruptor (Diagenode) to obtain fragments between 150 to 200 bp in length
blunt end=repaired adenylated ligated with paired=end adaptors and purified
followingtheprotocolprovidedbythemanufacturer
LibrarieswerehybridizedwithldquoSureSelectedAllexonBovinerdquoprobesusingthe
Encore Target Capture Module (Nugen) SureSelect Target Enrichment and
Herculase II FusionEnzymewithdNTPCombo (AgilentTechnology) kit Then
capturedDNAwasamplifiedandtaggedwithlibraryspecificindexesQualityof
capturedsampleswastestedwithQubitand2100Bioanalyzer
Paired=end 2X100bp sequencing was performed on Illumina HiSeq2500
(IlluminaSanDiegoCA)atanexpectedcoverageof50X
22Sequencealignment
Adapters and reads shorter than 36bp were trimmed using Trimmomatic
(version032)(Bolgeretal2014)ForeachFastqfilethealignmentandpairing
of the reads to theUMD31 reference sequencewas performedwithBurrows=
WheelerAligner(BWA)(version077=r441)(LiandDurbin2009)applyingalso
trimmingoption (=q5) SAM files resultedwere converted intoBAM file using
SAMtools (version0118) (Lietal2009)Foreach individualBAM fileswere
merged in a single one with Picard (version 1111)
(httpsgithubcombroadinstitutepicard)Duplicatereadswereidentifiedand
markedwithPicardFinallyindividualBAMfileswereindexedwithSAMtools
105
Sequencedepthwasevaluatedsimultaneouslyonall20sequencedanimalsand
onlyonregionstargetedbyprobesusingDepthOfCoverageofGATK(McKennaet
al2010)
23ExomeVariantcallingandqualitycheck
Singlenucleotidepolymorphisms(SNPs)insertionsanddeletions(InDels)were
calledsimultaneouslyonall20sequencedanimalsandonlyonregionstargeted
byprobesusingUnifiedGenotyperofGATK(DePristoetal2011)
All variantswere filteredusingVariantFiltration ofGATK to removeoneswith
lowmappingandorsequencingqualityThechoiceoffiltersandthresholdswas
made in accordance to ldquoGATK best practicerdquo for exome sequencing and
distributionoffiltervalues
bull ldquoSNPclusterrdquoexcludedvariantifmorethan3variantswerefoundin10
bp
bull ldquoBaseQualityRankSumTestrdquoexcludedifitrsquoslowerthen=5ormorethan
5
bull ldquoMappingQualityRankSumTestrdquoexcludedif itrsquoslowerthen=7ormore
than6
bull ldquoRead Position Rank Sum Testrdquo excluded if itrsquos lower then =4 or more
than4
bull ldquoFisherStrandrdquoexcludedifitrsquosmorethan30
bull ldquoDepthrdquoexcluded if itrsquos lower then60X(ieaminimumvalueof3Xper
animals)ormorethan3500X
bull ldquoQualityrdquoexcludedifitrsquoslowerthen40
bull ldquoMappingqualityrdquoexcludedifitrsquoslowerthen30
Filters were applied to all variants but only bi=allelic SNPs were used in the
following analyses Genotypes with ldquoGenotype Depthrdquo lower than 5 and
ldquoGenotype Qualityrdquo lower than 20 were discarded The remaining ones were
converted intoldquoABrdquocode(A=referencealleleandB=alternativeallele)Then
data were transformed in PLINK (Purcell et al 2007) format using a home=
made python script To prepare the working dataset the final quality control
106
removedvariantswithmore than25missingdata (that iswithmore than5
animals out of 20 without genotypes) monomorphic (minor allele frequency
MAFlt1)ornotmappinginautosomesBasicpopulationgeneticsparameters
werecalculatedtoevaluatethepresenceofcrypticstructureinthedatathatmay
influencetheoutcomeofthedownstreamanalyses
In total 24390 SNPs identified in the exomes were also included in the HD
SNPchip panel HD genotype from nineteen out to twenty of the sequenced
animals sequenced was used to compare technologies Genotypes at common
locationswerecomparedacrosstechnologies toevaluatesequencequalityand
theeffectofsequencedepthongenotypecalling
24BovineHDGenotypingBeadChipdataandqualitycheck
AllanimalsweregenotypedusingtheBovineHDGenotypingBeadChip(Illumina
San Diego CA) Genotypes were quality controlled and filtered with standard
threshold Markers with more than 5 missing data and MAF lt 1 were
excluded Animals with more than 5missing data were also excluded Only
autosomal SNPs were retained Thereafter genotypes were phased and the
remainingmissingdata imputedusingBeaglev332(BrowningandBrowning
2007)
Ofthe20animalssequenced12hadalsoBovineHDGenotypingBeadChipand7
BovineSNP50 Genotyping BeadChip version 2 genotypes available For
concordance and haplotype analysis we imputed the 50K data to 800K as
previouslydescribedabove
3Strategyfordetectingcandidatedeleteriousvariants
Wesearchedforcandidatedeleteriousvariantsfollowinganintegratedapproach
that joined the analysis of i) exomes of a few Italian Holstein bulls having
plusvariant and minusvariant EBVs for a target trait ii) HD genotyping of a
population of 1009 progeny tested Italian Holstein bulls iii) the comparative
genomic analysis of 100 vertebrate species The strategy followed for the
107
analysisisshowninFigure1Inthefirststep(Search)welookedforcandidate
deleterious variants in the exomes In step2 (Filter)we filtered candidatesby
assessingwhichof thesevariantseitherhadanon=randomdistributionamong
animals having extreme EBVs or co=mapped with genomic regions carrying
outliersignals inthelargepopulationofprogenytestedbulls Instep3(Asses)
candidateswere further evaluated by i) assessing their conservation across a
panel of 100 vertebrate species ii) searching for associated haplotypes and
evaluating their frequencies in the population of progeny tested bulls iii)
evaluatingtheirassociationwithgeneticdefectsinhumanandmodelspeciesiv)
examining their expression profile and function when known Methods are
belowdescribedindetail
Figure1Strategyfollowedtoidentifydeleteriousvariants
108
31Step1Searchforcandidatedeleteriousvariants
Quest for candidate deleterious variants in exome sequence data by
variantannotation
WeannotatedallvariantsusingANNOVAR(Wangetal2010)andVEP(Variant
Effect Predictor) (McClure et al 2014) software They use the Ensembl gene
annotation database (UMD31 reference sequence release 74) to investigate
locationandpotentialeffectoncodingsequencesofallvariantsVEPintegrates
the SIFT algorithm (Kumar et al 2009) to predict the effect of amino acid
substitutiononproteinfunctionForeachsequencechangesequencehomology
and physic=chemical similarity between the reference amino acid and the
alternative is considered providing a score that evaluates the mutation
tolerabilityVariantswithscorelowerthan005wereclassifiedasldquodeleteriousrdquo
inaccordingtosoftwarerecommendations
SIFTalsocomparedallvariants found in theexomeswith thosepresent in the
dbSNPdatabase(version140)toidentifythosethatwerenovel
32Step2Filteringofcandidatedeleteriousvariants
2a Analysis of deleterious variant distribution among animals having
extremesEBVvaluesformaleandfemalefertilitytraits
Association between variants and the classification in the tails of the EBV
distributionwas estimated byFisherexact testwhich compares frequencies of
alleles (a vs A) in the two tails not assumingH=W equilibriumgenotypic test
that compares the frequency of genotypes (aa vs Aa vs AA) and the recessive
geneactiontestwhichtestsforthedistributionofonehomozygousclassversus
theothers(aavsaAAA)AlltestsareimplementedinPLINKfunctionsThe5
significance threshold was set empirically by permutation (N=100000) either
point=wise that is not corrected for multiple testing (EMP1 = Empirical
Probability1)andthereafterfamily=wisecorrectedformultipletesting(EMP2=
EmpiricalProbability2)
109
GenesexceedingEMP1were considered suggestive and thoseexceedingEMP2
significant candidates to be deleterious recessives and their characteristics
conservationacrossspeciesandfunctionfurtherinvestigated
2b Quest for regions candidate to carry deleterious variants in the
populationanalysedwiththeHDBeadChip
Basic population genetics parameters and Multi Dimensional Scaling were
calculated with GenABEL R package (Aulchenko et al 2007) to evaluate the
presenceofcrypticstructure inthedatathatmayinfluencetheoutcomeofthe
downstreamanalyses
ExpectedandobservedallelicandgenotypicfrequenciesandFisherexacttestof
Hardy=Weinberg proportions (Janis E Wigginton 2005) were calculated with
PLINKon theHDSNPdatasetValueswere averaged in slidingwindowsof10
markersThisvalue(10markers)waschosenaftertestingdifferentwindowsize
as represent a good compromise between noise reduction and resolution (not
shown)
Candidate regions containing deleterious mutations were identified following
twodifferentapproaches
In one approach hereafter refereed to as ldquoOut of Hardy=Weinbergrdquo approach
(OHW)markerssignificantlydeviatingfromofHardy=WeinbergequilibriumatP
le84710=8 (nominalvaluePle005Bonferronicorrected formultiple testing)
werefirstidentifiedWindowswithatleastthreesignificantmarkersbelowthis
thresholdandhavingahomozygotedeficitwereconsideredcandidatestocarry
deleterious mutations In this OHW approach we applied very stringent
constraints to reduce false positive discoveries due to the Wahlund effect
inducedbypopulationsubstructure(seeresultsandFigureS5)
In thesecondapproachhereafterreferred toas ldquoLackOfHomozygotesrdquo (LOH)
approach the differences between observed and expected frequencies of
homozygotesfortheminorallelewerecomputedandaveragedacrossmarkers
included in each sliding window Values were standardized and only regions
carrying three overlapping windows below =3Z score value were considered
110
candidates to carry deleterious mutations Thereafter significant windows
carryingatleastonemarkerincommonwerejoinedtodefineregionboundaries
ExomesequencingknowledgeandHDBeadChipdatawerecrossed inorder to
identifygenescarryingseverevariantslocatedwithinsignificantregionsortheir
proximity (50Kb upstream or downstream) These genes were considered
candidates to be deleterious recessives and their characteristics conservation
acrossspeciesandfunctionwasfurtherinvestigated
33Step3Assessmentofcandidatedeleteriousvariants
Genes carrying variations classified as deleterious by SIFT or ANNOVAR and
eitherassuggestivebyfertilityEBVtailsanalysisorassignificantbyOHWeLOH
analyses of the Holstein bull population were submitted to comparative
genomics population genetics and functional analyses In order to prioritize
genesvariants discoveredwe scored each evaluation using a subjective scale
ranging from =3 to 3 to measuring the evidence in favour (scores 1 to 3) or
against(scores=1to=3)thedeleteriouseffectoftheSNPsThescore0inallcases
indicated the absence of evidence or that a specific assessment failed The
completescorescalethereforerangedfrom=9to9
3aComparativegenomics
For comparative genomics purposes the position of eachmutationwas ldquolifted
overrdquo from UMD31 bovine coordinates to hg19 human coordinates using the
specifictoolatUCSC(Kentetal2002=httpgenomeucscedu)Orthologous
proteins were aligned using the UCSC genome browser and amino acid
conservation assessed in a set of 100 vertebrate species that included human
primates (11 species) euarchontoglires (egmouse14 species) laurasiatheria
(egsheep25species)afrotheira(egelephant6species)othermammals(eg
platypus5species)birds(egchicken14species)sarcopterygii(egturtle8
species) fish (eg zebrafish 16 species) Results of amino acid conservation
analysiswererecordedonlyifhomologousproteinfromatleast50speciescould
beretrievedandalignedThereafteraminoacidconservationwasscored3ifthe
111
aminoacidwasconservedinallvertebrates2ifconservedinallmammalsand1
ifconservedinthelargemajority(atleast90)ofmammalsAscoreof=1was
given when the amino acid was not conserved among species and a score =3
when both variants observed in cattle were carried by other species Finally
wheneithertheliftoverprocessortheproteinalignmentfailedthecomparison
wasconsiderednoninformativeandgivenascore0
3bSearchofhaplotypesassociatedtodeleteriousvariantsandanalysisof
HDpopulationdata
To acquire further evidence we searched for the haplotypes that carry the
deleterious variations and estimated their frequency in the
homozygousheterozygousstatuswithinthebullpopulationgenotypedwiththe
HDBeadChip
Weusedamultistepapproachtofindhaplotypescarryingdeleteriousvariantsi)
from the phased HD BeadChip data we extracted the haplotypes of the 20
sequencedanimalsintheregionofthevariant(100markersoneachside)ii)in
these 20 animals we searched the shorter haplotype able to distinguish
homozygous andor heterozygous animals for the deleterious variant from all
other haplotypes iii) we assessed the frequency of homozygotes and
heterozygotes for thehaplotype in thewholepopulationofbulls characterised
withtheHDBeadChip
Scoresinthiscasewere=3inthecaseofuniquehaplotypesshowinganumberof
homozygotesnonsignificantlydifferentfromthenumberexpectedfromHardy=
Weinbergproportions3inthecaseofuniquehaplotypesshowinganumberof
homozygotes significantly lower than expected 1 in the case of unique
haplotypesneverhomozygousandrare0whenevernouniquehaplotypecould
befound
3cFunctionalanalyses
With PantherDB (Mi et al 2010) resources we classified our gene list into
functionalcategoriesaccordingtotheGeneOntologyclassificationAllsuggestive
112
andcandidateEnsemblGeneIDsubmittedtothebulkanalysiswererecognizedand annotated Beyond the GO term characterization in order to understandpotentialdetrimentaleffectsofmutationsandalsopinpointpotentialcandidatesfor fertility traits we retrieved information for each gene according to thefollowing parameters i) the involvement in known Mendelian or complexdiseases fromOMIMdatabase ii) theavailabilityofknock=outmice iii) and theeventual expression in reproductive tissues from the Gene Expression Atlas(httpwwwebiacukgxa)(Table4)
Alsoforthesedataweusedthesamescaleattributingarangefrom=3pointstogenes having no relation with fertility viability and genetic defects and norelevant phenotypic effects in knock=out mouse models to 3 to gene closelyrelated to fertility and viability associated to genetic defects in human andaffectingfertilityorsurvivalwhenknocked=outInparticularascoreranging=1to+1wasattributedtogenesassociatedtohumansyndromes(=1=noassociationrecordedinOMIMdatabase0=noinformationinOMIM1=associationinOMIM)A score with same range to information collected from knock=out mousephenotypes(=1=knock=outmouseshowsnoimpairedphenotype0=noknockoutmouse has been produced 1= knock=out mouse shows impaired phenotype)Thesamescorewasattributedtomolecularfunctions(=1=functionunrelatedtofertilityandviability0=functionindirectlyrelatedasimmunityandmetabolism1=functiondirectlyrelatedasfertilityanddevelopment)
Results
Exomecaptureandsequencing
A totalof225414probeswereused for capturing thecattleexome (5355Mbabout 2 of UMD31 reference genomes) Mitochondrial (MT) and Ychromosome were not included in probe design The number of probes perchromosomespannedfrom2500onBTA27toalmost12000inBTA3(TableS2)
113
The average exome sequencing depth was 4058 slightly lower than the
expected50XAllanimalshadatleast24Xaveragegenomessequencedbutina
same animal the sequencing depth varied across chromosomes and regions
withinchromosomesThelowestchromosomeaveragevalueobservedwas10X
inBTA25oftheshallowersequencedanimalAveragecoverageperanimaland
chromosomeandoverallmeansareshowninFiguresS1
In total1613461402 raw100bppair=ends sequence readswereproducedA
total of 1425514112 100bp reads were mapped and properly paired to the
reference genome Paired reads per animal ranged between 41973510 and
138840392
Insynthesissequencingdepthwasvariableasexpectedandalmostallanimals
hadsomeregionsofscarceornosequencedepthbutoverallallexomeregions
were sufficiently covered to permit a comprehensive analysis of molecular
variantsoftheanimalsinvestigated
Step1Searchforcandidatedeleteriousvariants
Variantdetectionandannotation
Thesequencingdepthofallvariantscalledonthealignmentofallreadsacross
animalsfollowedaChi=squaredistributionAveragedepthwas79694spanning
from2Xto5000Xwithpeakfrequencyclassat150=250X(FigureS2)
A multistep approach was followed to clean the exome sequence dataset
Cleaning process steps and the number onmarkers retained in each step are
describedinTable2
114
Table2Typeandnumberofmarkersdetectedandretainedafterdatasetcleaning
TypeofvariationNumberofvariations(N)
Rawdata
Aftervariantcleaning
Aftergenotypecleaning(onlyautosomal)
Complex 1213 963 0Deletion 6938 5031 0Insertion 5510 3656 0MultiallelicComplex 5710 3995 0MultiallelicSNP 803 512 0SNP 242988 203079 164277Total 263162 217236 164277whencomparedtothereferencegenome
Atotalof263162variantswereidentifiedAmongthese217236(8255)wereretainedafterapplyingthevariantfilters(seematerialsandmethodssection)
According to GATK classification the vast majority (9348) of variationsidentified were biallelic SNPs followed by insertions and deletions (InDels40) more complex variations and multiallelic SNPs (252) Indels andcomplexvariationsalthoughlessfrequentthanSNPsaffectedarelevantportionoftheexome(49589bp)Thenumberofvariantsdetectedperchromosomewassignificantly correlated to the number of probes used (r=08761P=228x10=10)andofbasescaptured(r=08424P=53x10=19)perchromosomehoweverBTA23BTA27 BTAX had an outlier behaviour the former two containing highervariation(450and445variantsperKb)andthelatterlower(084variantsperKb)thantheaverage(322variantsperKb)
Bialleic SNPswere further filtered for genotype quality (sequence depth gt 5Xandsequencequalitygt20)andBTAXmarkersThefinalautosomalSNPdatasetcounted164277SNPsAmongthese17829biallelicSNPs(109)describedforthefirsttimehavebeensubmittedtodbSNP
Variantsrsquo annotation is shown in Table 3a Variantswere detected in differentgenedomains(exonsintrons3rsquountranslatedregions)aswellasupstreamanddownstreamgenesMorethan31ofthevariationsweredetectedinexonsandamong them more than 41 have the potential to have an effect on genefunctionbyaffectingsplicingoralterproteinstructure(Table3b)
115
Table 3 A) number of autosomal SNPs annotated in each gene region B) number and
classificationofautosomalSNPsalteringgenefunction
A)
Generegion NofSNPs(autosomes)Downstream 1170Exonic 51293Intergenic 49901Intronic 58116ncRNA_exonic 160ncRNA_intronic 3ncRNA_splicing 2Splicing 184Upstream 1445Upstreamdownstream 54UTR3 1345UTR5 604Total 164277
B)
EffectofexonicSNPs NofSNPs(autosomes)Nonsynonymoussinglenucleotidevariation 20839Stopgainsinglenucleotidevariation 213Stoplosssinglenucleotidevariation 10Synonymoussinglenucleotidevariation 30226Unknown 5Total 51293
whencomparedtothereferencegenome
Amongthe51293exonicSNPs4166in2978geneswereclassifiedldquodeleteriousrdquo
by SIFT Among these 3210 in 2356 genes were observed at least twice
ANNOVARidentified223stopgain(213)orloss(10)variantsin217genes
Theaverageconcordanceat24390SNPs in commonbetweenexomesand the
HD Beadchip was 9716 (Table S3) Discordance was due to either alleles
detected by sequencing andnot detected by genotyping (exome=heterozygote
Beadchip=homozygote) and vice=versa (exome=homozygote
Beadchip=heterozygote) Discordances were randomly distributed among
markers and inversely correlated to sequencing depth (Figure S3) indicating
that most errors are likely occurring during sequencing even if occasional
116
Beadchip failure indetectingsomeallelecannotbeexcludedTwoanimalshad
anoutlierbehaviouronehaving846andthesecond719concordanceBoth
animalswereretainedforvariantdiscoverywhiletheywereexcludedfromthe
analysesofEBVtailsWithoutthesetwooutlieranimalsthemeanconcordance
increasesto9938
Multidimensional scaling confirmed the absence of a substructure in animals
sequenced(FigureS4)
Step2Filteringofcandidatedeleteriousvariants
AnalysisofEBVtails
Fisherexact test isgenerallyused to test for theassociationbetweenavariant
and a Mendelian disease Samples are divided in cases and controls and the
association between alleles or genotypes and the disease is tested under
different inheritance models Here we used the same method to test for
association between allelesgenotypes and animals from opposite tails of the
EBVdistributionfori)malefertility(N=4high+4low)ii)femalefertility(N=5
high+4low)andiii)maleandfemalefertility(N=9high+8low)
Genotypes were tested under all inheritance models implemented in PLINK
Giventhelownumberofsamplesnovariantwassignificantaftercorrectionfor
multiple comparisons (EMP2) and only suggestive variants could be found
(significant at 5 at EMP1)We considered suggestive of carrying deleterious
allelesalistof101genescarrying112variants(TableS4)
Detectionofregionscarryingdeleteriousmutationfrom800KSNPdata
After applying quality control parameters 26653 and 146991 markers were
removed from the original dataset for excessive missing data and MAF
respectivelyInaddition17individualsremovedforlowgenotypingrateAswith
exomes only autosomal SNPswere retained for the analyses thus obtaining a
workingdatasetof590680SNPsand992animals
117
MDS(FigureS5)analysis indicates that thedatasethassomesubstructure that
mightaffectdownstreamanalysesWeaccountedfortheriskoffindingmarkers
outofHardy=WeinbergproportionsduetotheWahlundeffectbychoosingvery
stringentsignificancethresholdintheOHWanalyses
To detect regions candidate to carry deleterious recessivemutations we used
two approaches the ldquoOutside Hardy Weinbergrdquo (OHW) and the ldquoLack Of
Homozygotesrdquo (LOH) approaches (see materials and methods) The two
approaches complement each other in the detection of regions carrying less
homozygotes than expected OHW identifies regions in which a few markers
have extreme homozygote deficit while LOH detects regions in which many
markershaveamoderatedeficit
In the SNP dataset 1884 markers were significantly out of Hardy=Weinberg
equilibrium(Ple84710=8)Atotalof485windowswereidentifiedwithatleast
3 markers out of 10 exceeding this threshold Among these 84 contained
markerssignificantbecauseofhomozygotedeficitThese84windowsidentified
11OHWcandidateregionsin9chromosomes(TableS5)
TheLOHapproachidentified2335windowsbelow=3Zscoreofthestandardised
differencebetweenobservedandexpected frequencyofhomozygotes forMAF
Theseidentified326LOHcandidateregionsspreadonallautosomes(TableS6)
Onlyregionsformedbyatleast3singlewindowswereretainedreducingLOH
candidateregionsto240
Deleterious variants (SIFT deleterious and stop codon gainedlost) were
intersectedwithsignificantOHWandLOHregionSevenofthesemappedwithin
OHW sliding windows 36 within LOH sliding windows Among these 5 were
located in regions significant in both OHW and LOH Other 51 deleterious
variants mapped in the immediate vicinity (+=50Kb) of significant windows
These 83 variants affect 66 unique genes candidate to carry deleterious
recessivealleles(TableS7)
Five regionswere common toOHWandLOHoneonBTA4 (LOH=74959610=
75151417 OHW=74970653=75151417) two on BTA7 (LOH=9626435=
10099512 OHW=9628735=10099512 LOH=10575691=11171132
118
OHW=10486424=11087441) one on BTA15 (LOH=80299924=80928311
OHW=80299924=80921274) and one on BTA18 (LOH=28122373=
28185525OHW=28106022=28164693)
Step3Assessingcandidatedeleteriousvariants
After the two filtering step previously described a total of 191 SNPs were
retained out of the 3433 classified as deleterious by SIFT and Annovar and
analysed further These are carried by 163 genes Eighteen genes carried 2
candidateSNPsthreegenes3SNPsandonegene5SNPsInaddition4SNPsin
four geneswere identified as potential candidates in both the analysis of EBV
tailsandofbullpopulation
A totalof12SNPs introducedstopcodonswhile the remaining inducedamino
acid substitutionsor splicing variants In the filtereddataset theproportionof
stop codons on total variation (63) was only slightly higher than the
proportion found in the entire dataset (51 225 stop codon out of 4391
deleteriousSNPs)
Comparativegenomics
Proteinmultiplealignmentwassuccessfulin146outof191comparisonsIn26
casesthemostfrequentaminoacidincattlewasveryhighlyconservedeitherin
allvertebrates(N=9)orinallmammals(N=17)Inother23casestheaminoacid
was conserved in at least 90 of mammalian species In the last class were
generally included amino acids that changed in mammalian species
phylogenetically verydistant from cattle (eg platypus and related species) In
97 instances the amino acid could change quite freely and occasionally both
variantsobservedincattlewerecarriedbyotherspecies
HaplotypeanalysisofHDpopulationdata
The search for haplotypes associated to the191 variants for further testing in
the population identified 101 unique and invariant haplotypes common to all
119
carriersofatargetvariantanddifferentfromallthosefoundinnoncarriersIn
additional36casesanoncompletelyinvarianthaplotypecouldbeassociatedto
themutationConverselyin54instancesareliablehaplotypecouldnotbefound
(table S8) In total 38 haplotypes had no homozygous individuals in the
population Thiswas expected in the case of 33 rather rare haplotypes (fewer
than 100 heterozygotes observed in the population) Conversely 5 haplotypes
were rather frequent (from 170 to almost 300 heterozygotes observed in the
population)andtheexpectationwastofind78to222homozygousindividuals
in thepopulation insteadof thenone foundAnothervarianthadaremarkable
high frequency (35) and a clear outlier behaviour Only 11 homozygous
individualswereobservedoutofthe121expected
Functionalanalysisofcandidategenes
To assess the potential deleterious effects of the variants identified we also
retrieved functional information for each gene according to the following
parametersi)associationtoknownMendelianorcomplexdiseaseslistedinthe
OMIM database ii) availability and effect of gene knock=out in mice iii)
expression profileswith particular attention in reproductive tissues from the
GeneExpressionAtlas(Table4)
Twenty genes out of 163 have a recognized phenotype in humans These are
mostlyseveresyndromesUnfortunatelyonlyafewgeneshaveanull=miceline
producedwithphenotypicdataavailable
According to the expression data available from Ensembl these genes are
generally (with someexceptions)expressed inavarietyof tissueswitha clear
tendencytobehighlyexpressedintestis
120
Tableamp4Informationon163genesanalyzedconcerningeventualphenotypesinmanorknockoutsinmouseHeaderampcodesAbrainBcolonCheartDkidney
EliverFlungGskeletalmuscleHspleenItestisColoursrepresentthegenelevelofexpressionindifferenttissuefromlow(whitegrey)tohigh(blue)NCin
FunctioncolumnmeansldquoNotClassifiedrdquo
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000000079
CCSAP
Cells
9
1
03
1
07
2
06
2
2
NC
ENSBTAG00000000149
USP40
Cells
6
7
2
7
5
7
2
8
9
fertility
ENSBTAG00000000414
FUT5
Nomatch
37
1
cellular
process
ENSBTAG00000000918
ERMARD
Periventricular
nodularheterotopia
6Xlinked
periventricular
heterotopia6q
terminaldeletion
syndrome
Periventricular
nodularheterotopia
Nomatch
5
4
4
9
6
7
4
9
17
cellular
process
ENSBTAG00000000998
SLC39A6
Cells
6
9
2
7
4
10
1
10
9
cellular
process
ENSBTAG00000001133
VWA8
Mice
2
3
6
7
10
3
3
4
4
blood
metabolism
ENSBTAG00000001140
IAH1
httpswwwmousephenotypeor
gdatagenesMGI1914982sect
ionassociations
BOTHSEXadiposetissue
behaviourneurological
growthsizebodyFEMALE
homeostasismetabolismMALE
limbsdigitstailskeleton
9
18
30
60
23
19
12
17
30
cellular
process
ENSBTAG00000001262
IRGQ
Cells
6
1
5
3
06
2
5
04
2
immunity
ENSBTAG00000001286
ELMO3
httpswwwmousephenotypeor
gdatagenesMGI2679007sect
ionassociations
Decreasedstartlereflex
02
27
14
7
7
02
1
fertility
ENSBTAG00000001291
OR8H3
Nomatch
fertility
ENSBTAG00000001500
FIGNL1
Cells
08
2
01
05
07
08
04
2
3
cellular
process
ENSBTAG00000001505
GRK4
Cells
10
7
09
4
6
3
2
6
218
signalling
ENSBTAG00000002220
SPTLC1
Hereditarysensory
andautonomic
neuropathytype1
No
17
27
3
28
12
28
5
19
6
metabolism
ENSBTAG00000002288
NT5DC4
Nomatch
02
cellular
process
ENSBTAG00000002289
NLRP14
No
04
cellular
process
ENSBTAG00000002660
CCDC11
Situsinversustotalis
Nomatch
3
1
06
2
07
3
03
2
54
fertility
121
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000002809
CAPN10
AutosomalRecessive
MentalRetardation
DIABETESMELLITUS
NONINSULIN
DEPEND
ENT1
Cells
2
13
41
33
29
immunity
ENSBTAG00000002966
DNAJC13
Youngadultonset
Parkinsonism
No
5
52
94
62
84
fertility
ENSBTAG00000003231
LIM2
Pulverulentcataract
Cells
01
02
01
02
cellular
process
ENSBTAG00000003508
CFAP69
httpswwwmousephenotypeor
gdatagenesMGI2443778sect
ionassociations
Abnormalcirculatinginsulinlevel
MaleInfertilityDecreasedmean
plateletvolum
eIncreasedleanbody
mass
203
07
1
01
7NC
ENSBTAG00000003523
UGT2B15
Nomatch
34
metabolism
ENSBTAG00000004081
FAT3
No
2
03
01
01
05
signalling
ENSBTAG00000004555
LRP2
DONN
AIBARROW
SYND
ROME
Intellectualdisability
Cells
56
3
01
disease
ENSBTAG00000004772
THEM
4
Cells
8
16
295
27
52
63
metabolism
ENSBTAG00000004860
SLC27A6
Cells
6
02
10
602
207
01
immunity
ENSBTAG00000005021
SEMA5A
Monosom
y5p
Cells
9
04
89
13
08
08
6immunity
ENSBTAG00000005170
GPR114
Mice
101
02
01
09
6
01
fertility
ENSBTAG00000005248
OFCC1
No
01
01
03
01
01
01
03
NC
ENSBTAG00000005340
SERGEF
Cells
7
10
35
15
58
23
cellular
process
ENSBTAG00000005357
VPS11
Mice
14
13
619
717
815
15
cellular
process
ENSBTAG00000005466
C8H8orf58
Nomatch
05
01
08
02
103
02
04
NC
ENSBTAG00000005514
TRIO
Cells
17
66
10
210
88
7immunity
ENSBTAG00000005633
RGNEF
httpswwwmousephenotypeor
gdatagenesMGI1346016sect
ionassociations
Vertebralfusion
36
214
05
801
08
1cellular
process
ENSBTAG00000006021
CEP250
Mice
3
22
43
41
88
metabolism
ENSBTAG00000006027
USP34
Mice
11
63
74
88
912
fertility
ENSBTAG00000006532
CDK5RAP2
Autosomalrecessive
primary
microcephaly
httpswwwmousephenotypeor
gdatagenesMGI2384875sect
ionassociations
skeletonimmunesystem
growthsizebodyhem
atopoietic
system
5
34
92
74
784
developm
ent
ENSBTAG00000007001
SLC26A1
No
03
01
02
32
903
01
07
metabolism
ENSBTAG00000007340
LOC101906349
Nomatch
NC
ENSBTAG00000007568
MEPE
Cells
cellular
process
ENSBTAG00000007910
EXOC7
Cells
13
24
23
22
12
22
15
22
28
cellular
process
122
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000008650
CAMK1D
Cells
24
2
01
06
1
2
03
7
3
signalling
ENSBTAG00000008797
Uncharacterized
Nomatch
09
04
07
04
NC
ENSBTAG00000008871
DDX4
Cells
02
01
01
132
cellular
process
ENSBTAG00000008996
SFI1
Mice
2
1
06
3
04
3
2
5
NC
ENSBTAG00000009014
UPK1B
Mice
2
2
2
5
4
4
2
7
3
fertility
ENSBTAG00000009030
NOX5
Nomatch
04
01
8
01
immunity
ENSBTAG00000009033
TEX37
httpswwwmousephenotypeor
gdatagenesMGI1921471sect
ionassociations
Abnormalstartlereflex
03
83
NC
ENSBTAG00000009064
CSF2RB
Congenitalpulmonary
alveolarproteinosis
Mice
01
2
04
06
2
23
03
17
05
signalling
ENSBTAG00000009144
BPIFA2A
Nomatch
6
06
05
metabolism
ENSBTAG00000009337
PNMAL1
Cells
8
05
01
3
02
03
03
04
3
Immunity
ENSBTAG00000009444
SLC15A5
Cells
01
cellular
process
ENSBTAG00000009568
KANK2
Cells
1
17
30
9
3
31
9
10
3
cellular
process
ENSBTAG00000009569
DOCK6
AdamsOliver
syndrome
Cells
2
5
10
12
5
15
5
6
21
metabolism
ENSBTAG00000009836
CHGA
Mice
140
555
01
2
02
01
07
1
metabolism
ENSBTAG00000010522
Uncharacterized
Nomatch
01
02
02
03
2
metabolism
ENSBTAG00000010601
DOLK
Familialisolated
dilated
cardiomyopathy
No
4
8
6
22
9
6
5
5
12
metabolism
ENSBTAG00000010847
FOXN4
Cells
01
01
metabolism
ENSBTAG00000010954
ART3
Cells
02
1
55
01
08
2
10
06
71
metabolism
ENSBTAG00000011011
SSH2
No
3
09
5
4
2
2
6
7
10
cellular
process
ENSBTAG00000011387
NAT9
Cells
6
7
3
8
6
8
4
6
6
metabolism
ENSBTAG00000011420
CA9
Cells
04
3
02
2
03
1
01
2
56
metabolism
ENSBTAG00000011433
RGP1
httpswwwmousephenotypeor
gdatagenesMGI1915956sect
ionassociations
Completepreweaninglethality
7
19
11
18
8
12
9
8
25
cellular
process
ENSBTAG00000011622
C18orf21
Mice
9
10
6
6
2
10
5
10
34
NC
ENSBTAG00000011683
NUMB
Cells
21
25
8
35
17
47
6
8
19
immunity
ENSBTAG00000011684
RAB11FIP1
Cells
6
01
8
09
11
02
1
4
NC
ENSBTAG00000012053
ACCSL
No
03
01
cellular
process
ENSBTAG00000012314
LDLR
Homozygousfamilial
hypercholesterolemia
Cells
8
33
11
21
17
24
2
15
2
cellular
process
123
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000012326
LOC101906218
Nomatch
11
01
01
09
01
NC
ENSBTAG00000012458
PSAPL1
Cells
01
NC
ENSBTAG00000012565
PCNX
No
3
32
32
33
48
developm
ent
ENSBTAG00000012833
ATG2B
No
4
21
21
21
29
NC
ENSBTAG00000012921
KIF24
Mice
05
07
03
08
02
202
214
metabolism
ENSBTAG00000013568
UEVLD
Cells
5
41
53
42
214
metabolism
ENSBTAG00000013685
KRT31
Mice
cellular
process
ENSBTAG00000013753
DDRGK1
Mice
22
32
18
56
55
25
14
22
37
NC
ENSBTAG00000013789
OR6K6
No
NC
ENSBTAG00000013838
ALLC
Mice
02
01
78
metabolism
ENSBTAG00000013916
HERC2
Mentalretardation
autosomalrecessive
38SKINHAIREYE
PIGM
ENTATION
VARIATIONIN1
Developm
entaldelay
withautismspectrum
disorderandgait
instability
No
8
74
83
74
65
metabolism
ENSBTAG00000014001
MAP1A
Cells
136
23
22
202
43
352
cellular
process
ENSBTAG00000014058
LDB2
No
28
33
12
825
410
3cellular
process
ENSBTAG00000014284
ALPK2
No
01
10
01
304
02
cellular
process
ENSBTAG00000014750
EPB41L4A
httpswwwmousephenotypeor
gdatagenesMGI103007secti
onassociations
Decreasedcirculatingglucoselevel
Abnormalconeelectrophysiology
Immunesystem
s1
22
20
07
12
13
43
NC
ENSBTAG00000014752
AKAP11
httpswwwmousephenotypeor
gdatagenesMGI2684060sect
ionassociations
Nopheno
24
72
63
306
68
cellular
process
ENSBTAG00000014768
ZNF786
Cells
3
23
42
48
33
cellular
process
ENSBTAG00000014794
SCYL3
Cells
6
10
59
613
516
26
cellular
process
ENSBTAG00000015027
Clusterin
httpswwwmousephenotypeor
gdatagenesMGI88423sectio
nassociations
Aggressionvertebralfusion
05
03
03
02
05
3NC
124
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000015210
LOXHD1
DEAFNESS
AUTOSOMAL
RECESSIVE77
Autosomalrecessive
nonsyndromic
sensorineuDominant
lateonsetFuchs
cornealdystrophy
deafnessautosomal
recessivetype77
No
3
03
NC
ENSBTAG00000015294
COX10
Leighsyndrome
Mitochondrial
complexIVdeficiency
(MTC4D)
Cells
4
6
24
5
2
3
14
3
16
cellular
process
ENSBTAG00000015335
BAI3
No
38
03
8
01
01
2
signalling
ENSBTAG00000015490
HS1BP3
Cells
1
10
5
9
4
8
2
1
37
cellular
process
ENSBTAG00000015602
C7H5orf45
Nomatch
10
cellular
process
ENSBTAG00000015839
MAP4
No
27
85
104
35
11
100
82
46
61
cellular
process
ENSBTAG00000015912
DMKN
No
01
04
03
NC
ENSBTAG00000015980
FASN
AutosomalRecessive
MentalRetardation
Cells
19
29
7
12
6
39
6
7
22
metabolism
ENSBTAG00000015985
GPR20
httpswwwmousephenotypeor
gdatagenesMGI2441803sect
ionassociations
Anatomicaldefectsvarioustissues
03
02
08
01
09
signalling
ENSBTAG00000016495
LINS
AutosomalRecessive
MentalRetardation
Cells
6
3
1
5
3
3
1
5
12
NC
ENSBTAG00000016691
LACC1
Mice
2
9
1
9
9
8
08
7
27
NC
ENSBTAG00000017096
ERICH1
Cells
6
4
2
4
2
5
2
5
13
metabolism
ENSBTAG00000017165
MATN2
No
2
7
2
20
08
3
2
2
7
cellular
process
ENSBTAG00000017418
RRP1B
Cells
6
4
4
5
3
7
3
12
159
cellular
process
ENSBTAG00000017436
ALS2CR11
Cells
01
37
disease
ENSBTAG00000017505
PAXIP1
Cells
2
3
2
4
2
4
2
3
12
signalling
ENSBTAG00000017576
GPR144
Nomatch
02
signalling
ENSBTAG00000018528
USP45
No
1
1
04
1
05
06
02
2
04
cellular
process
ENSBTAG00000018810
THBS2
INTERVERTEBRAL
DISCDISEASE
Cells
1
2
04
09
1
09
04
1
2
fertility
ENSBTAG00000019072
PSD4
Cells
7
8
1
1
7
02
6
4
cellular
process
125
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000019265
AGK
Sengerssyndrom
e
Congenitalcataract
hypertrophic
cardiomyopathy
mitochondrial
myopathy
Cells
7
36
43
52
65
metabolism
ENSBTAG00000019752
BPIFA2B
Nomatch
01
1metabolism
ENSBTAG00000019799
FCAM
R
Cells
07
01
02
01
2
7immunity
ENSBTAG00000019961
ATP4B
Cells
12
03
1
05
cellular
process
ENSBTAG00000020414
DHX37
No
3
33
73
73
413
cellular
process
ENSBTAG00000021073
KIAA1549
PILOCYTIC
ASTROCYTOM
ASOMATIC
httpswwwmousephenotypeor
gdatagenesMGI2669829sect
ionassociations
behaviourneurological
homeostasism
etabolism
hematopoieticsystem
3
04
23
01
21
04
3NC
ENSBTAG00000021233
OR5B17
No
fertility
ENSBTAG00000021375
OR10Q1
No
fertility
ENSBTAG00000021643
EFCAB12
Cells
01
1
9cellular
process
ENSBTAG00000021645
MBD4
Cells
8
43
82
51
10
6cellular
process
ENSBTAG00000022509
DNAH
9
No
02
5
05
5metabolism
ENSBTAG00000022858
LOC783561
Nomatch
fertility
ENSBTAG00000026247
PKHD1L1
Cells
02
01
cellular
process
ENSBTAG00000026323
LYSB
Nomatch
68
metabolism
ENSBTAG00000026350
SPATC1
Cells
05
06
01
01
01
39
NC
ENSBTAG00000027390
RTFDC1
Mice
28
19
21
20
922
13
20
49
NC
ENSBTAG00000031115
Uncharacterized
Nomatch
01
cellular
process
ENSBTAG00000031718
OGFR
Mice
6
17
612
517
39
1fertility
ENSBTAG00000032264
Uncharacterized
Nomatch
52
4
22
NC
ENSBTAG00000032481
DAPL1
No
1
6
07
01
13
10
cellular
process
126
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000032527
ERCC6
CEREBROOCULOFACI
OSKELETAL
SYND
ROME1
COCKAYNE
SYND
ROMEBDe
SanctisCacchione
syndromeMACULAR
DEGENERATION
AGERELATED5UV
SENSITIVE
SYND
ROME1COFS
syndromeCockayne
syndrometype1
Cockaynesyndrome
type2Cockayne
syndrometype3UV
sensitivesyndrome
Cockaynesyndrome
typeB(CSB)De
SanctisCacchione
syndrome(DSC)UV
sensitivesyndrome
(UVS)cerebrooculo
facioskeletal
syndrometype1
(COFS1)
Mice
3
205
22
207
42
cellular
process
ENSBTAG00000032821
SCEL
No
01
02
17
04
NC
ENSBTAG00000033954
PRPS1L1
No
35
metabolism
ENSBTAG00000034991
SLX4IP
Cells
6
21
32
207
33
NC
ENSBTAG00000035226
TOR1AIP1
Cells
9
12
513
816
716
83
NC
ENSBTAG00000035309
OR4F15
No
fertility
ENSBTAG00000035708
LOC527795
Nomatch
01
01
23
metabolism
ENSBTAG00000035985
LOC101909743
OR8K1
No
fertility
ENSBTAG00000036019
RIPPLY3
httpswwwmousephenotypeor
gdatagenesMGI2181192sect
ionassociations
hemopoieticsystem
01
04
01
9
01
malformation
ENSBTAG00000037399
Uncharacterized
Nomatch
2
92
4
24
05
5signalling
ENSBTAG00000038606
DBDR
No
02
01
signalling
ENSBTAG00000038831
PHYHD1
No
2
18
21
23
30
18
16
15
1metabolism
ENSBTAG00000039110
OR8U1
No
fertility
ENSBTAG00000039117
FAM179B
Cells
14
206
31
308
33
cellular
process
ENSBTAG00000039520
SIRPB1
Cells
4
08
32
516
05
12
01
signalling
127
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000040568
Uncharacterized
Nomatch
4
307
54
33
25
cellular
process
ENSBTAG00000045558
LOC100299084
Nomatch
1
22
110
10
09
52
fertility
ENSBTAG00000045588
LOC100298356
Nomatch
metabolism
ENSBTAG00000045709
LOC514057
Nomatch
fertility
ENSBTAG00000046175
Uncharacterized
Nomatch
01
01
cellular
process
ENSBTAG00000046228
Olfactory
receptor
Nomatch
fertility
ENSBTAG00000046239
C10orf12
No
03
104
07
108
06
209
cellular
process
ENSBTAG00000046285
OR8I2
No
fertility
ENSBTAG00000046360
Uncharacterized
Nomatch
NC
ENSBTAG00000046423
LOC618007
Nomatch
NC
ENSBTAG00000046527
LOC100139830
Nomatch
fertility
ENSBTAG00000046590
FAM181A
Cells
09
44
NC
ENSBTAG00000046593
Uncharacterized
Nomatch
20
metabolism
ENSBTAG00000046727
Olfactory
receptor
Nomatch
04
fertility
ENSBTAG00000047150
RTEL
Cells
4
23
64
55
616
cellular
process
ENSBTAG00000047166
SH2D7
Mice
03
101
01
1
02
02
2immunity
ENSBTAG00000047174
Uncharacterized
Nomatch
03
7105
02
01
271
05
03
NC
ENSBTAG00000047317
CL43
Nomatch
01
120
01
immunity
ENSBTAG00000047587
Uncharacterized
Nomatch
22
09
09
35
03
4
06
2cellular
process
ENSBTAG00000047661
FABP12
httpswwwmousephenotypeor
gdatagenesMGI1922747sect
ionassociations
Growthsizebody
01
02
69
metabolism
ENSBTAG00000048021
PNMAL2
No
04
03
2
03
03
02
NC
ENSBTAG00000048153
Uncharacterized
Nomatch
2
05
01
02
01
cellular
process
128
Discussion(
Deleterious recessive variants arenaturally present in outbred animal species
Theydonotcreateproblemsaslongastheyremainatlowfrequencyinrandom
mating populations Under these conditions these variants remain in the
heterozygousstateanddonotproducephenotypiceffectsHoweverassoonas
their frequency increases andmatingoccursbetween related individuals their
deleteriouseffectsbecomeapparentandhighlyaffectthewelfareandviabilityof
animalscarryingtheminthehomozygousstate
In industrialdairy cattlebreedscrossbreeding isgenerallyavoidedandbreeds
areinpracticegeneticisolatesinwhichartificialinsemination(AI)iswidelyused
for rapidly spreading thegeneticprogress in thepopulationAsa consequence
elitesireshavemanythousandandsometimeshundredsofthousanddaughters
Under these conditions mating between relatives becomes practically
unavoidable in the longterm In fact inspiteof theclosecontrolof inbreeding
andofthematingsystemgeneticdefectsregularlyappeartothesurfaceSome
examples are the Complex Vertebral Malformation (CVM) and the Bovine
Leucocyte Adhesion Deficiency (BLAD) in Holstein (Schuumltz et al 2008) and
WeaverdiseaseinBrown(McClureetal2013)Inthesecasestheeffectsofthe
recessivedeleteriousallelesareclearandmoleculardiagnostictoolshavebeen
developedtoavoidtheirfurtherspreadingandtoinitiatetheeradicationprocess
However recessive deleterious alleles may produce their effects in a more
deceitful way for example by affecting fertility inducing early lethality or
influence disease susceptibility All these effects are difficult to trackwith the
current methods of phenotype measurements but have an impact on the
economyofthelivestocksectorandonanimalwelfare
InthispaperwesoughtfordeleteriousrecessiveallelesinItalianHolsteincattle
This breed is recently experiencing a decrease in the genetic trend of fertility
traits (Pryce et al 2014) therefore we started our search from the exome
sequences of 20 sires 19 of which having extreme EBV values for male and
femalefertility
130
The choiceof sequencingexomeswasa compromisebetween completenessof
informationaccuracyofvariantandgenotypedetectionandcostExomesclearly
give incomplete information since they miss all variation in regulatory sites
outside gene boundaries (ENCODE Consortium 2012) These may be very
relevantforthecontroloftraitvariationbuttheyarestillverypoorlyannotated
in livestock On the other hand sequencing exomes yields millions instead of
billionsbasepairsadeepersequencingoftargetregionsandaconsequentmore
reliablevariantandgenotypecalling(FigureS1andS2)Theanalysisofexome
data is also less demanding in terms of computer time (thousands instead of
millions variants are to be analysed) and easier in terms of biological
interpretation Exome sequencing revealed to be very effective in the
identificationofanumberofmutationsaffectinghumanhealth(Bamshadetal
2011 Yang et al 2013) All thesewere defects caused bymutations in single
genes producing clear phenotypic effects havingMendelian inheritance These
conditionsdonotholdinourinvestigationthereforewehadtodevisestrategies
to integrate exome data with other information collected from the Holstein
populationandotherspecies
Theoverallprocedureforidentifythecandidatedeleteriousrecessivewasbased
on the rationale that deleterious recessive alleles in coding sequences are
expectedtobei)causedbyseverealterationoftheproteinsencodedbygenes
ii)mostlyifnotexclusivelycarriedattheheterozygousstateinthepopulation
iii)unevenlydistributedinanimalsofhighandlowfertilityiv)likelyconserved
indifferentspeciesbecauseofselectionconstraintv)likelyassociatedtogenetic
defectsinanyspecieswhenseverelydamaged
A stepwise procedure was set up for reducing candidates to a number
manageable for functional analysis (Figure1) In the first stepmutationswere
discoveredinthesecondfilteredandinthirdassessed
Thisprocedureidentifiesonlysomeofthedeleteriousrecessivesexistinginthe
populationanalysedSincewesequencedonlyexomesandnotwholegenomes
ouranalysesislimitedtocodingsequencesandatthemomenttoSNPvariation
Wethereforemissregulatoryvariationsoutsidegenesand theeffectsof indels
andmorecomplexvariations
131
ThenumberofanimalssequencedislimitedDeleteriousrecessivesareexpected
toberareinthepopulationandwehavesampledonly20animalsfromasubset
ofsiresauthorizedtoAI inItalyAlthoughwehavetakenamongthemthose in
theminus tail of fertility trait thatmighthave ahigher genetic load itrsquos likely
thatwehavemissedothervariantsexistinginthepopulationOntheotherhand
variantscarriedbyAIauthorizedbullsaretheonesthatwillhavealargerimpact
onthepopulationifwidelyspreadbyelitebullsandthereforetheyarethemost
interestingfromabreedingpointofview
Most deleterious variants are expected to be very rare and therefore not
detected by the approach here adopted or detected and afterwards discarded
duringthefilteringprocesses
Sequencing is prone to errors Some real variants may have been discarded
duringQCofsequencedataduetopoorsequencequalityorpoorcoverageofthe
region
Someofthevariationsclassifiedasldquonondeleteriousrdquobythesoftwareusedmay
indeedhave an effect In fact synonymousmutation arenon`neutralwhenever
thedifferenttRNAsforasameaminoacidhavedifferentrelativeabundanceand
translationrates(Goymer2007)
1(Searching(candidate(deleterious(variants(in(exomes(
ExomesproducedafairamountofdataAlltogether5355Mbweresequenced
These represent 9998 of the exome sequences targeted by probes Among
theseonly35hadasequencingdepthacrossanimalslt60XSequencingdepth
is a key factor for variant detection and even more so when the genotype of
sequencedanimalsistobeinferredInvestigatingtheconcordancebetweenthe
genotypes detected at more than 20K SNP by both Beadchip and exome
sequencingwe observed a clear inverse relationship between sequence depth
andnumberofdiscordances(TableS3andFigureS3)Theaveragesequencing
depth expected (50X) and realised in practice (4058) in this investigation
yieldedratheraccurategenotypessincetheaverageconcordanceobservedwas
over97and increased toover99whentwooutlieranimalswereremoved
132
from the dataset The reason for the outlier behaviour of these samples waslikely a suboptimal quality of DNA submitted to sequencing Overall almost215000 variants passed quality controls and among them 192440 autosomalSNPs have been investigated in detail As found in other investigations mostvariants were rare (Peloso et al 2014) For example the MAF distributionobservedin46904SNPsclassifiedneutralhasameanof0197andamedianof0167values(FigureS6)
ThefirststepinourresearchwastoidentifySNPscausingseverealterationsinthe proteins coded by genes These were relevant changes in conformationalterationsofsplicingsitesortruncationsofproteinsToaccomplishthistaskwereliedontwosoftwareSIFTandANNOVARthatidentified4902suchvariationsin3423genes Interestingly the subsetofdeleteriousvariantshas significantlylowermean (0156)andmedian (010) compared toneutral variations (FigureS6) This subset is therefore at least enriched in variants under purifyingselectionhavingadeleteriouseffectatthepopulationlevel
Thenumberofdeleteriousallelesdetectedishighanditisunlikelytheyallhavea relevant phenotypic effect even if most variants are rare In fact variantsdeleterious for a protein are not automatically deleterious for an organism asthe protein change induced by the variant although relevant may not affectproteinfunctionvariationmayoccurinonememberofagenefamilysothatitsfunction may be compensated by other family members the protein functionmay be compensated by other proteins or the affected metabolic pathwaycompensated by alternative pathways The challenge was therefore to filter asubsetofvariantslikelyhavingaphenotypiceffecttobeproposedforvalidationatalargerscaleToaccomplishthistaskwesoughtforadditionalinformationinexomesequencesandinalargepopulationofHolsteinsiresgenotypedwiththebovineHDBeadchip
( (
133
2(Filtering(deleterious(variants(
The second step consisted in filtering the several thousand deleterious SNPsidentified in exomes to search for SNPs and genes that worth deeperinvestigationWeusedtwostrategiestoaccomplishthistask
21$ Distribution$ of$ deleterious$ variants$ in$ fertility$ EBV$ plus9variant$ and$
minus9variant$animals$
Fertility traits are controlled by a high number of genes and are highlygenetically heterogeneous Even using a selected subset of variants identifiedfrom exome sequencing classic association analyses would need sample sizesmuch larger that the one available in this investigation to have a reasonablestatistical power (Stitziel et al 2011) The first approachwe appliedwas theanalysisof SNPallelic andgenotypicdistributionbetween theminusandplus`variant tails of male and female fertility EBVs Given the very low number ofanimals sequencedwedecided to use a simplemodel to exclude from furtheranalyses all markers showing no evidence of association with the EBV tailsratherthantofindsignificantassociations
We divided samples in cases (minus`variant animals for fertility EBV) andcontrols (plus`variant animals) and tested all possible models of inheritanceimplemented in thePlinkpackage for tackling simple genetic defectsWe thenused an empirical point`wise significant threshold based on permutations tohighlight SNPs having alleles or genotypes showing an uneven distributionbetweencasesandcontrolsandconsideredtheseasworthfurtherinvestigationAtotalof112SNPsin101genespassedthisfilteringprocess
$
22$Detection$of$regions$carrying$deleterious$mutation$from$800K$SNP$data
The analysis of exomes alone was therefore insufficient to tag recessivedeleteriousgenesinItalianHolsteinandwesoughtforadditionalinformationin1009ItalianHolsteinbullsgenotypedathighdensitythatarepartofthegenomicselection program run by ANAFI The rationale of the approach is that
134
deleterious recessives are expected to be absent or rare in the homozygous
status in the population Hence they may be tagged searching for genome
regions having markers or haplotypes lacking or having significantly less
homozygotesthanexpectedunderaneutralmodel
We used two complementary approaches to identify these candidate regions
One(OHW)searchedforverystrongsignalsproducedbysinglemarkersoutof
Hardy`Weinberg equilibrium at Ple847x10`8 A minimum of three significant
markersinaslidingwindowsof10wassettoreachsignificancetocapturemore
reliablesignalsavoidexcessivesinglemarkernoiseandaccount forapossible
Wahlund effect due to presence of the genetic structure in the population
difficulttoavoidinthesetofprogenytestedbullsanalysedThesecondapproach
(LOH) captured more moderate signals but from multiple markers in larger
regions Both approaches were designed to detect signals in regions in which
deleteriousvariantsarestillinsegregationandarenotexcessivelyrare
Asexpectedthetwoapproachesidentifiedregionsthatonlypartiallyoverlapped
(TablesS5S6eS7)Inparticular5regionsoutofthe11identifiedbyOHWwere
in common to the 240 identified by LOH Two of these regions overlap with
previous investigations that used the BovineSNP50 Beadchip Sahana et al
(2013) found a region candidate to carrypotential lethal haplotypes inNordic
Holstein on BTA7 in the region 6708007`10907840 This interval overlaps
withbothBTA7regionsdetectedinthisstudyVanRadenetal(2011)detected
inUSHolsteinalargeregiononBTA15spanning76Mb`82M(onBTA30)that
includedLRP4 gene responsible for themulefootdefectOuranalyseswith the
HDBeadchipindicatesontheonesidethatthelargeregiondetectedbySahana
etal(2013)mayconsistintwonearbyregionsOntheothersidethesignalwe
detectedonBTA15doesnot includetheLRP4 locusthat iscompletely fixed in
theItalianbullspopulationanalysed
In total83mutations in66genes fell in eitherorbothOHWandLOHregions
These were joined to the SNPs output of the EBV tail analyses for further
assessment
(
135
3(Assessing(filtered(deleterious(variants(
Different characteristics of the filtered deleterious variants were assessed to
identify among them strong candidates to submit to a large`scale validation
studyTheassessmentphasepermittedtogainadditionalinformationinfavour
or against SNP and gene candidature to be deleterious We quantified this
evidenceusingasubjectivescore
31$Comparative$genomics$analysis$
In the case of comparative genomics the score quantified the level of
conservationof thealternativeaminoacidresidues inducedbySNPvariants in
ItalianHolsteincattleComparisonwasrunagainstaset100vertebratespecies
that included 62 mammals The level of conservation is a good proxy for the
expected severity of a mutation In the case of the 46 SNPs that induced the
change of an amino acid highly conserved in the species investigatedwemay
hypothesize a strong selection pressure in favour of the maintenance or the
invariant amino acid at that position Therefore the amino acid change or the
stopcodonweobserved incattlemayaffectnotonlytheproteinstructureand
function but also the phenotype of homozygous animals Interestingly in all
thesecasestheSNPallelecodingforthenonconservedvariantswastheminor
allele
Converselyinallcasesofhighvariabilityofthetargetaminoacidacrossspecies
(asmanyassevendifferentaminoacidshavesometimesbeenobserved)poorly
supports the presence of phenotypic effects due to its change Sometimes the
informationcollectedisagainstthepresenceofadeleteriousmutationasinthe
case of the presence in different species of both alleles identified by exome
sequencingThis suggests thatneither of these is greatly affecting the viability
andfitnessofanimalscarryingthem
Wealsoobservedanumberoffailures(N=45)intheliftoverofSNPcoordinates
fromcattletohumanorinproteinalignmentThishappenedforexampleinthe
caseofuncharacterized cattleproteins thathavenoorthologous in thehuman
genome and of someolfactory receptors that belong to a highly variable gene
136
family Even if the failure of alignment is an indication of poor proteinconservationacrossspeciestheconservationofthetargetaminoacidcouldnotbeassessesandinthesecaseswepreferredtoconsidertheinformationmissingandtoscoreitaccordingly
32$Haplotype$analysis$
Asecondassessmentofthefilteredcandidateswasontheirbehaviourinthesirepopulation The ideal test would be the direct genotyping of the SNPs in thepopulationand theevaluationof theirallelicandgenotypicproportions In theabsenceofthisinformationwefirstinferredtheHDhaplotypeassociatedtoSNPallelesandassessedthebehaviourofthehaplotypeinthepopulationassumingthisasasurrogateoftheSNPbehaviourSincedeleteriousallelesareexpectedtoberareandthis isalsowhatweobserved fromcomparativegenomicanalysesweinvestigatethefrequencyonlyofthehaplotypecarryingtheminorSNPalleleagainsttheexpectationto findthecorrespondinghaplotypeneverhomozygousor a number of times much lower than expected from its frequency in thepopulation The strategy yielded some interesting confirmation but was onlypartiallysuccessful
Firstofalltheidentificationofauniquehaplotypecarryingtherarevariantwasnot always possible or straightforward In some cases carriers of a same rarevariantdidnotshareexactlythesamehaplotypeWeacceptedsomevariationifthe haplotype remained clearly distinguishable from those of animals notcarryingtherarevariantbutatthesametimereducedthestringencyofthetestInquiteanumberofcasesasamehaplotypecarriedbothcandidatedeleteriousSNP variants This may occur for example in the case of recent mutations IntheselastcasesthetestcouldnotbecarriedoutFinallymostrarevariantswereassociatedtorarehaplotypesthatarenotlikelytohavehomozygousindividualsinthepopulationindependentlyoftheSNPeffect
Thisassessmentwas thereforeonly significant fora fewhaplotypes thathadalarge number of heterozygous individuals and no or only a very fewhomozygotesandforthosethathadalargenumberofhomozygotes
137
33$Functional$analysis$
The final assessment was the analysis of the function of genes carrying the
candidate deleterious SNP This has some limitation since knowledge of gene
function is only partial and mainly gained from investigations on species
different from cattle In any case we considered the association to genetic
defectsinhumanandtheinductionofseverephenotypesinmouseknockoutas
a positive indication of the likely phenotypic effect of the severe protein
alterationweobservedincattleInadditiongenefunctionandexpressionprofile
was evaluated in search for evidence of gene involvement in fertility
developmentorprocessesfundamentalforcellandorganismviability
Amonggenesidentified22areassociatedtohumansyndromes(Table4)Eleven
syndromesareneurologicaldiseasesaffectingbraindevelopmentand inducing
mental retardation (eg Donnay`Barrow Syndrome and Autosomal recessive
primary microcephaly) the others influence development (eg
Cerebrooculofacioskeletal Syndrome 1 and Adams`Oliver syndrome) or organ
function(egCongenitalpulmonaryalveolarproteinosis)
Most of the same defects translated into Holstein would induce physical and
behavioural anomalies likely causingyoungbulls tobe excluded fromprogeny
testing Itshould in factbeconsideredthattheanimalsweanalysedareonthe
onesidethosethatwillmostinfluencethegeneticmakeoffuturegenerationbut
are not a random sample of Holstein animals These bulls have been progeny
testedandauthorisedtobeusedinartificialinseminationinItalyCandidatesto
progenytestingderivefromthematingofthebest1siresandbest2damsof
the Italian Holstein population At the age of 4 months they are taken to the
Italian Holstein breeder association (ANAFI) test station where they are
subjected to performance test sanitary controls and behavioural tests
Thereafterthequalityoftheirsemenischeckedbeforeauthorizationisgranted
toenterinprogenytestInthissampleofanimalsitisthereforeexpectedtofind
never at the homozygous state mutations causing lethality physical
malformation abnormal behaviour and semen infertility Also those having a
138
majoreffectonsemenqualityandhighsusceptibilitytodiseasesareexpectedto
berarelycarriedatthehomozygousstatus
Association with human syndromes is a strong evidence of the potential
phenotypiceffectofmutationsalteringthegenestructureHoweveritshouldbe
considered that variants found in humans are different fromvariants found in
cattle and that the severity of the phenotype induced by a deleterious gene is
often variable even within species Therefore we evaluated this information
togetherwiththeotherindirectevidencesofvariantstobedeleterious
WealsoenquiredthewebsiteoftheInternationalMousePhenotypeConsortium
(httpswwwmousephenotypeorg) to search for the effects of null alleles in
mice The few genes forwhichnull`mice line is available are involved in basic
functionsfortheorganismthatspanfromvision(IAH1andEPB41L4A)toaging
(RGP1)andtissuesdevelopment(FABP12RIPPLY3RGNEF)orfertility(CFAP96)
Aswithgenesassociatedtohumansyndromesmostoftheabnormalphenotypes
observedinknockmousewouldhamperyoungbullstoenterprogenytesting
Gene Ontology (GO) was evaluated on the entire gene set No enrichment on
particularcategoriesGOtermswasobservedThemostrepresentedcategories
in Biological Processes were as expected ldquometabolic processrdquo and ldquocellular
processrdquo however lower level terms ldquoreproductionrdquo and ldquodevelopmental
processrdquo are represented with meaningful hits The analysis of Molecular
Function indicates thatmany of the hits are enzymes (ldquocatalytic activityrdquo) and
haveregulatoryroles(ldquobindingrdquoandldquoenzymeregulatoractivityrdquo)Thisresultcan
be interpreted indifferentwaysOn theone sidedeleterious allelesmay exert
their deleterious effect in a number of different cellular and developmental
processes the correct process is one while there is many options to make it
wrongAlsothenumberofgenesinvestigatedisrelativelylowandformanyof
them there is little or no information available (eg 11 genes code for a
completelyuncharacterizedprotein)Inadditionsomeofthemmayturnoutnot
tobereallydeleteriousandaddnoisetotheGOanalysis
As a final step all the previous information (GO expression patterns known
function)havebeenused to estimate a scoremeasuring the importanceof the
gene function forcattle survivalandreproductionThescoringof functionwas
139
particularlydifficultGOcategoriesweregeneralandallfunctionstaggedhavearelevant importance for animal survival andproductionTherefore in this casewedidnrsquotusethewholescorerangeplanned(from`1to1)andgavenonegativescoresApositivescoreof1wasattributedonly togenesplayingan importantroleinfertilityanddevelopment
$
34$Strong$candidate$deleterious$variants$
Thescoringprocessyielded21variantswithatotalscore=035variantswithpositive and 135 with negative values (Figure 2 and Table 5) Although theprocesswasnoterrorfreeasitreliedonincompleteinformationandhadsomedegreeofsubjectivityparticularlyinthechoiceofrelativeweightsitidentifiedaset of variants that may be considered strong candidates to have deleteriouseffects in the Italian Holstein breed Variants were evaluated individually andthose in a same genes had sometimes different scores For example the fivevariantscarriedbyALPK2hadscoresrangingfrom`7to`2
140
Figure2Graphicalrepresentationoftotalscoreforeachvariants
minus505
Variation
Score
1_645923001_645985561_1381698111_1464490851_150972144
2_7478962_270362092_270455392_375933832_555609862_904645543_75418123_110403993_110404003_188945763_1137877703_1203356984_53767714_265405974_265406204_749514994_1033779064_1057246734_1130233264_1178417155_445557055_445557425_757447955_940308955_940309556_207762566_382803486_863518016_927092906_1075769256_1078851146_1090916426_1166055226_1186534167_13339757_56553577_97149147_167941797_168358027_168621947_197404097_197409057_263293538_602606098_603324848_658345648_704101558_760297748_770028908_872798628_1117618168_1128416959_83888819_83889249_237928389_510133929_1047396559_10515702910_171251510_1584300310_2802235210_2802311010_8290312110_8517371111_4630576711_4675390311_4740957711_5979141011_7832201011_8794387511_9548610911_9945389411_9946100112_1190618112_1247843312_1248084512_1413689712_5304908112_9076120913_381566413_1178793413_1178793813_5247168413_5393493013_5393494713_5393985313_5454042513_5504774013_5998820513_6304575013_6316961013_6539670814_197749414_364069514_3556894514_4688575014_5718402214_6863536315_18267915_18331715_18340415_248692
15_3018920915_3504878815_4630194115_7503314015_7503764515_8024448915_8025571015_8028875915_8038974515_8048559315_8055593115_8094968915_8275455615_8292533216_456201716_3831963616_6257435517_5309539517_5311265517_6609153117_7237753218_2562355018_3495651318_4637204118_5215480618_5408127718_5413095518_5781976819_2144410319_3107215219_3111397219_3279084319_4225178519_5139519419_5619117519_5726625120_763980120_2341043120_5888423620_6417173921_623911921_2034587821_2034628921_2683773321_3100359921_5527419521_5582646321_5817338421_5912071921_6048248421_6048251021_6284314222_5241595922_5242173422_5242204622_5242610422_5242666822_5695165722_5697484722_5779767423_4588505224_2132460424_2145336424_4676490924_5032719524_5815817824_5815862024_5815878424_5820400324_5820447925_3743318026_1811648726_2571599626_2576457027_503107827_3283255728_28422828_169040328_3571880728_4408176529_198588129_753052329_753066329_26397931
141
Table5SingleanalysesscoreandtotalforeachcandidatevariantsChrchrom
osom
ePosphysicalposition
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
164592300
ENSBTAG00000009014
UPK1B
02
I1
01
2
164598556
ENSBTAG00000009014
UPK1B
0I1
I1
00
I2
1138169811
ENSBTAG00000002966
DNAJC13
03
10
15
1146449085
ENSBTAG00000017418
RRP1B
I3
I1
I1
00
I5
1150972144
ENSBTAG00000036019
RIPPLY3
I3
I1
I1
I1
1I5
2747896
ENSBTAG00000013916
HERC2
30
10
04
227036209
ENSBTAG00000004555
LRP2
I3
I1
10
0I3
227045539
ENSBTAG00000004555
LRP2
I3
31
00
1
237593383
ENSBTAG00000032481
DAPL1
00
I1
00
I1
255560986
ENSBTAG00000032264
Uncharacterized
3I3
I1
00
I1
290464554
ENSBTAG00000017436
ALS2CR11
0I1
I1
00
I2
37541812
ENSBTAG00000046175
Uncharacterized
00
00
00
311040399
ENSBTAG00000013789
OR6K6
I3
I3
I1
00
I7
311040400
ENSBTAG00000013789
OR6K6
I3
I3
I1
00
I7
318894576
ENSBTAG00000004772
THEM
40
I3
I1
00
I4
3113787770
ENSBTAG00000000149
USP40
11
I1
01
2
3120335698
ENSBTAG00000002809
CAPN10
I3
21
00
0
45376771
ENSBTAG00000001500
FIGNL1
01
I1
00
0
426540597
ENSBTAG00000033954
PRPS1L1
1I3
I1
00
I3
426540620
ENSBTAG00000033954
PRPS1L1
1I1
I1
00
I1
474951499
ENSBTAG00000003508
CFAP69
I3
I3
I1
10
I6
4103377906
ENSBTAG00000021073
KIAA1549
0I1
11
01
4105724673
ENSBTAG00000019265
AGK
01
10
02
142
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
4113023326
ENSBTAG00000014768
ZNF786
I3
I3
I1
00
I7
4117841715
ENSBTAG00000017505
PAXIP1
0I1
I1
00
I2
544555705
ENSBTAG00000026323
LYSB
I3
3I1
00
I1
544555742
ENSBTAG00000026323
LYSB
I3
0I1
00
I4
575744795
ENSBTAG00000009064
CSF2RB
I3
I3
10
0I5
594030895
ENSBTAG00000009444
SLC15A5
I3
I1
I1
00
I5
594030955
ENSBTAG00000009444
SLC15A5
I3
I3
I1
00
I7
620776256
ENSBTAG00000008797
Uncharacterized
I3
I3
00
0I6
638280348
ENSBTAG00000007568
MEPE
0I3
I1
00
I4
686351801
ENSBTAG00000003523
UGT2B15
1I1
I1
00
I1
692709290
ENSBTAG00000010954
ART3
I3
2I1
00
I2
6107576925
ENSBTAG00000038606
DBDR
0I3
I1
00
I4
6107885114
ENSBTAG00000001505
GRK4
I3
1I1
00
I3
6109091642
ENSBTAG00000007001
SLC26A1
1I3
I1
00
I3
6116605522
ENSBTAG00000014058
LDB2
01
I1
00
0
6118653416
ENSBTAG00000012458
PSAPL1
I3
I1
I1
00
I5
71333975
ENSBTAG00000015602
C7H5orf45
I3
I3
I1
00
I7
75655357
ENSBTAG00000045588
LOC100298356
1I3
I1
00
I3
79714914
ENSBTAG00000046423
LOC618007
00
I1
00
I1
716794179
ENSBTAG00000012314
LDLR
02
10
03
716835802
ENSBTAG00000009568
KANK2
01
I1
00
0
716862194
ENSBTAG00000009569
DOCK6
01
10
02
719740409
ENSBTAG00000000414
FUT5
I3
I3
I1
00
I7
719740905
ENSBTAG00000000414
FUT5
I3
I3
I1
00
I7
143
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
726329353
ENSBTAG00000004860
SLC27A6
0I1
I1
00
I2
860260609
ENSBTAG00000011420
CA9
I3
I1
I1
00
I5
860332484
ENSBTAG00000011433
RGP1
I3
0I1
10
I3
865834564
ENSBTAG00000035708
LOC527795
00
I1
00
I1
870410155
ENSBTAG00000005466
C8H8orf58
I3
I3
I1
00
I7
876029774
ENSBTAG00000015027
Clusterin
00
I1
10
0
877002890
ENSBTAG00000012921
KIF24
12
I1
00
2
887279862
ENSBTAG00000002220
SPTLC1
00
10
01
8111761816
ENSBTAG00000006532
CDK5RAP2
I3
I3
11
1I3
8112841695
ENSBTAG00000013838
ALLC
01
I1
00
0
98388881
ENSBTAG00000015335
BAI3
00
I1
00
I1
98388924
ENSBTAG00000015335
BAI3
00
I1
00
I1
923792838
ENSBTAG00000046593
Uncharacterized
03
00
03
951013392
ENSBTAG00000018528
USP45
02
I1
00
1
9104739655
ENSBTAG00000018810
THBS2
03
10
15
9105157029
ENSBTAG00000000918
ERMARD
I3
21
00
0
10
1712515
ENSBTAG00000014750
EPB41L4A
0I1
I1
I1
0I3
10
15843003
ENSBTAG00000009030
NOX5
I3
I3
I1
00
I7
10
28022352
ENSBTAG00000035309
OR4F15
I3
1I1
00
I3
10
28023110
ENSBTAG00000035309
OR4F15
I3
I1
I1
00
I5
10
82903121
ENSBTAG00000012565
PCNX
I3
2I1
01
I1
10
85173711
ENSBTAG00000011683
NUM
BI3
0I1
00
I4
11
46305767
ENSBTAG00000002288
NT5DC4
0I3
I1
00
I4
11
46753903
ENSBTAG00000019072
PSD4
0I1
I1
00
I2
144
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
11
47409577
ENSBTAG00000009033
TEX37
I3
I2
I1
10
I5
11
59791410
ENSBTAG00000006027
USP34
02
I1
01
211
78322010
ENSBTAG00000015490
HS1BP3
00
I1
00
I1
11
87943875
ENSBTAG00000001140
IAH1
I3
3I1
10
011
95486109
ENSBTAG00000017576
GPR144
I3
I3
I1
00
I7
11
99453894
ENSBTAG00000038831
PHYHD1
11
I1
00
111
99461001
ENSBTAG00000010601
DOLK
12
10
04
12
11906181
ENSBTAG00000001133
VWA8
I3
1I1
00
I3
12
12478433
ENSBTAG00000014752
AKAP11
1I3
I1
I1
0I4
12
12480845
ENSBTAG00000014752
AKAP11
1I1
I1
I1
0I2
12
14136897
ENSBTAG00000016691
LACC1
I3
I3
I1
00
I7
12
53049081
ENSBTAG00000032821
SCEL
I3
I3
I1
00
I7
12
90761209
ENSBTAG00000019961
ATP4B
I3
I3
I1
00
I7
13
3815664
ENSBTAG00000034991
SLX4IP
00
I1
00
I1
13
11787934
ENSBTAG00000008650
CAMK
1D
I3
0I1
00
I4
13
11787938
ENSBTAG00000008650
CAMK
1D
I3
0I1
00
I4
13
52471684
ENSBTAG00000013753
DDRGK1
02
I1
00
113
53934930
ENSBTAG00000039520
SIRPB1
0I3
I1
00
I4
13
53934947
ENSBTAG00000039520
SIRPB1
00
I1
00
I1
13
53939853
ENSBTAG00000039520
SIRPB1
0I3
I1
00
I4
13
54540425
ENSBTAG00000047150
RTEL
I3
0I1
00
I4
13
55047740
ENSBTAG00000031718
OGFR
01
I1
00
013
59988205
ENSBTAG00000027390
RTFDC1
01
I1
00
013
63045750
ENSBTAG00000009144
BPIFA2A
0I3
I1
00
I4
145
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
13
63169610
ENSBTAG00000019752
BPIFA2B
0I1
I1
00
I2
13
65396708
ENSBTAG00000006021
CEP250
0I3
I1
00
I4
14
1977494
ENSBTAG00000026350
SPATC1
I3
I3
I1
00
I7
14
3640695
ENSBTAG00000015985
GPR20
00
I1
00
I1
14
35568945
ENSBTAG00000037399
Uncharacterized
30
I1
00
2
14
46885750
ENSBTAG00000047661
FABP12
0I1
I1
00
I2
14
57184022
ENSBTAG00000026247
PKHD1L1
I3
3I1
00
I1
14
68635363
ENSBTAG00000017165
MATN2
01
I1
00
0
15
182679
ENSBTAG00000046228
Olfactoryreceptor
00
I1
0o
I1
15
183317
ENSBTAG00000046228
Olfactoryreceptor
00
I1
00
I1
15
183404
ENSBTAG00000046228
Olfactoryreceptor
00
I1
00
I1
15
248692
ENSBTAG00000045558
LOC100299084
00
I1
0o
I1
15
30189209
ENSBTAG00000005357
VPS11
02
I1
00
1
15
35048788
ENSBTAG00000005340
SERGEF
0I3
I1
00
I4
15
46301941
ENSBTAG00000002289
NLRP14
I3
I3
I1
00
I7
15
75033140
ENSBTAG00000012053
ACCSL
1I3
I1
00
I3
15
75037645
ENSBTAG00000012053
ACCSL
I3
I3
I1
00
I7
15
80244489
ENSBTAG00000046527
LOC100139830
I3
0I1
00
I4
15
80255710
ENSBTAG00000046285
OR8I2
10
I1
00
0
15
80288759
ENSBTAG00000001291
OR8H3
1I1
I1
00
I1
15
80389745
ENSBTAG00000045709
LOC514057
13
I1
00
3
15
80485593
ENSBTAG00000035985
LOC101909743OR8K1
1I1
I1
00
I1
15
80555931
ENSBTAG00000022858
LOC783561
I3
0I1
00
I4
15
80949689
ENSBTAG00000039110
OR8U1
1I3
I1
00
I3
146
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
15
82754556
ENSBTAG00000021375
OR10Q1
01
I1
00
015
82925332
ENSBTAG00000021233
OR5B17
I3
0I1
00
I4
16
4562017
ENSBTAG00000019799
FCAM
RI3
I3
I1
00
I7
16
38319636
ENSBTAG00000014794
SCYL3
I3
0I1
00
I4
16
62574355
ENSBTAG00000035226
TOR1AIP1
00
I1
00
I1
17
53095395
ENSBTAG00000020414
DHX37
1I1
I1
00
I1
17
53112655
ENSBTAG00000020414
DHX37
0I3
I1
00
I4
17
66091531
ENSBTAG00000010847
FOXN4
0I1
I1
00
I2
17
72377532
ENSBTAG00000008996
SFI1
0I3
I1
00
I4
18
25623550
ENSBTAG00000005170
GPR114
0I3
I1
01
I3
18
34956513
ENSBTAG00000001286
ELMO
33
3I1
11
718
46372041
ENSBTAG00000015912
DMKN
0
1I1
00
018
52154806
ENSBTAG00000001262
IRGQ
I3
1I1
00
I3
18
54081277
ENSBTAG00000009337
PNMA
L1
0I3
I1
00
I4
18
54130955
ENSBTAG00000048021
PNMA
L2
I3
1I1
00
I3
18
57819768
ENSBTAG00000003231
LIM2
1
I3
10
0I1
19
21444103
ENSBTAG00000011011
SSH2
I3
I3
I1
00
I7
19
31072152
ENSBTAG00000022509
DNAH
9I3
0I1
00
I4
19
31113972
ENSBTAG00000022509
DNAH
90
0I1
00
I1
19
32790843
ENSBTAG00000015294
COX10
12
10
04
19
42251785
ENSBTAG00000013685
KRT31
I3
0I1
00
I4
19
51395194
ENSBTAG00000015980
FASN
1I1
10
01
19
56191175
ENSBTAG00000007910
EXOC7
3I3
I1
00
I1
19
57266251
ENSBTAG00000011387
NAT9
12
I1
00
2
147
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
20
7639801
ENSBTAG00000005633
RGNEF
1I1
I1
10
0
20
23410431
ENSBTAG00000008871
DDX4
1I3
I1
00
I3
20
58884236
ENSBTAG00000005514
TRIO
I3
0I1
00
I4
20
64171739
ENSBTAG00000005021
SEMA5A
01
10
02
21
6239119
ENSBTAG00000016495
LINS
02
10
03
21
20345878
ENSBTAG00000012326
LOC101906218
00
I1
00
I1
21
20346289
ENSBTAG00000012326
LOC101906218
I3
0I1
00
I4
21
26837733
ENSBTAG00000047587
Uncharacterized
10
00
01
21
31003599
ENSBTAG00000047166
SH2D7
0I3
I1
00
I4
21
55274195
ENSBTAG00000039117
FAM179B
0I3
I1
00
I4
21
55826463
ENSBTAG00000014001
MAP1A
0I3
I1
00
I4
21
58173384
ENSBTAG00000009836
CHGA
I3
I1
I1
00
I5
21
59120719
ENSBTAG00000046590
FAM181A
01
I1
00
0
21
60482484
ENSBTAG00000017096
ERICH1
0I3
I1
00
I4
21
60482510
ENSBTAG00000017096
ERICH1
01
I1
00
0
21
62843142
ENSBTAG00000012833
ATG2B
0I3
I1
00
I4
22
52415959
ENSBTAG00000015839
MAP4
00
I1
00
I1
22
52421734
ENSBTAG00000015839
MAP4
1I1
I1
00
I1
22
52422046
ENSBTAG00000015839
MAP4
I3
I3
I1
00
I7
22
52426104
ENSBTAG00000047174
Uncharacterized
I3
00
00
I3
22
52426668
ENSBTAG00000047174
Uncharacterized
10
00
01
22
56951657
ENSBTAG00000021645
MBD4
1I1
I1
00
I1
22
56974847
ENSBTAG00000021643
EFCAB12
0I1
I1
00
I2
22
57797674
ENSBTAG00000031115
Uncharacterized
01
00
01
148
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
23
45885052
ENSBTAG00000005248
OFCC1
00
I1
00
I1
24
21324604
ENSBTAG00000000998
SLC39A6
02
I1
00
1
24
21453364
ENSBTAG00000011622
C18orf21
I3
I3
I1
00
I7
24
46764909
ENSBTAG00000015210
LOXHD1
0I3
10
0I2
24
50327195
ENSBTAG00000002660
CCDC11
I3
11
01
0
24
58158178
ENSBTAG00000014284
ALPK2
I3
I1
I1
00
I5
24
58158620
ENSBTAG00000014284
ALPK2
0I1
I1
00
I2
24
58158784
ENSBTAG00000014284
ALPK2
I3
I3
I1
00
I7
24
58204003
ENSBTAG00000014284
ALPK2
I3
I3
I1
00
I7
24
58204479
ENSBTAG00000014284
ALPK2
I3
2I1
00
I2
25
37433180
ENSBTAG00000040568
Uncharacterized
00
I1
00
I1
26
18116487
ENSBTAG00000046239
C10orf12
0I1
I1
00
I2
26
25715996
ENSBTAG00000010522
Uncharacterized
10
00
01
26
25764570
ENSBTAG00000046727
Olfactoryreceptor
00
I1
00
I1
27
5031078
ENSBTAG00000007340
LOC101906349
0I1
I1
00
I2
27
32832557
ENSBTAG00000011684
RAB11FIP1
10
I1
00
0
28
284228
ENSBTAG00000000079
CCSAP
0I1
I1
00
I2
28
1690403
ENSBTAG00000048153
Uncharacterized
10
00
01
28
35718807
ENSBTAG00000047317
CL43
1I1
I1
00
I1
28
44081765
ENSBTAG00000032527
ERCC6
I3
01
00
I2
29
1985881
ENSBTAG00000004081
FAT3
30
I1
00
2
29
7530523
ENSBTAG00000046360
Uncharacterized
01
00
01
29
7530663
ENSBTAG00000046360
Uncharacterized
1I1
00
00
29
26397931
ENSBTAG00000013568
UEVLD
I3
I3
I1
00
I7
149
Interestingly13ofthe22genesassociatedtohumansyndromescarryvariants
appearing in the short list of the 35 positives and only 6 in that of 135 with
negative values This in spite of the low weight (from 1 to 1) given to this
parameterEightofthe35positivevariantsareinproteinsstilluncharacterized
Sixvariantshadascoreof4orhigherThehighestscore(totalscore=7)wasfor
ELMO3 (Engulfment and cell motility gene 3) This gene is involved in cell
motility and migration ELMO3 is also a homolog of C( elegans Ced12 that is
required for many developmental processes included gonadal morphogenesis
Mutant for this gene in C( elegans suffer from abnormal development usually
with lethal consequences or sublethal developmental defects including small
embryosdelayeddevelopmentandreducedfertility(Gumiennyetal2001)In
human it is expressed in testis prostate and ovary
(httpwwwgenecardsorgcgibincarddispplgene=ELMO3)
ThegeneDNAJC13(DnaJ(Hsp40)homologsubfamilyCmember13)hadatotal
score of 5 This gene is involved in the onset of Parkinson disease DNAJC13
regulates the dynamics of clathrin coats on early endosomes Cellular analysis
shows that the mutation Arg855Ser identified by VilarintildeoGuumlell et al (2014)
confers a gainoffunction that impairs endosomal transportApossible roleof
DNAJC13geneinTourettesyndromechronicticdisorderisunderinvestigation
(Sundarametal2011)
A second genehad a scoreof 5THBS2 (Thrombospondin2) It belongs to the
thrombospondin family and is a powerful inhibitor of tumour growth and
angiogenesis It is highly expressed in testis in fetal Leydig cell Hirose et al
(2008) found a significant association between an intronic SNP in the THBS2
gene and lumbar disc herniation Thrombospondin2 is also a matricellular
proteinfoundinhumanserumDeletionofTSP2causesagedependentdilated
cardiomyopathy (Hanatani et al 2014) Mutant mice for this gene display a
variety of mutant phenotype Kyriakides et al 1998 suggest that THBS2
modulates the cell surface properties of mesenchymal cells affecting cell
functionssuchasadhesionandmigration
150
Three genes had a score of 4 all three are associated to human syndromes
Mutation in HERC2 (HECT and RLD domain containing E3 ubiquitin protein
ligase2)isinvolvedinautosomalrecessivementalretardation38(Puffenberger
etal2012)Jietal(2000)identifiedinmutantmouseneuromuscularsecretory
vesicledefectsandspermacrosomedefectsMoreovergeneticvariationsinthis
gene are associated with skinhaireye pigmentation variability Mutations in
DOLK gene (dolichol kinase) are associated with dolichol kinase deficiency
(Lefeberetal2011)COX10(cytochromecoxidaseassemblyhomolog10)gene
is associated to mitochondrial complex IV deficiency causing Leigh syndrome
(Antonickaetal2003)
Conclusions)
Exomesequencingthereforeprovidedvaluableinformationoncodingsequence
variants and on their potential effect on protein function As a group variants
classifiedasdeleteriouswererarerthanvariantsclassifiedasneutralSincethis
isanexpectedeffectofpurifyingselectionthissetofvariantwaslikelyenriched
forvariantshavingaphenotypiceffect
Theinclusionintheanalysesofplusandminusvariantanimalsforfertilitytraits
likelypermittedtosequencethemostimportantvariantsinfluencingthesetraits
butthelownumberofanimalssequencedcombinedwiththecomplexityofthe
traitpermittedtofindonlyasuggestiveassociationbetweenSNPsandtraitsItrsquos
likely that amuch larger number of individuals is to be sequenced if genome
wideassociationisthestrategyoneswanttopursue
We explored an alternative strategy to confirm suggestivedeleterious variants
based on indirect evidence of the variant effect by assessing i) evolutionary
constraintsassociatedtotheaminoacidresiduecodedbythevariantii)variant
behaviour in a large population In the absence of direct genotyping of the
candidatevariantsweusedhaplotypeinformationassurrogateiii)functionand
effectsinotherspecies
151
We attempted to quantify these criteria through a score subjective and only
indicativebutpermitted tohighlighted35genes so farneglected in cattlebut
deservingfurtherattentionandpotentiallyhavingdeleteriouseffectsinHolstein
andothercattlebreedsThesemaybepartoftherecessivedeleterioussetthat
constitutethegeneticloadofItalianHolstein
References)AntonickaHLearySCGuercinGHAgarJNHorvathRKennawayNG
HardingCOJakschMShoubridgeEA2003MutationsinCOX10
resultinadefectinmitochondrialhemeAbiosynthesisandaccountfor
multipleearlyonsetclinicalphenotypesassociatedwithisolatedCOX
deficiencyHumMolGenet122693ndash2702doi101093hmgddg284
AshwellMSHeyenDWSonstegardTSVanTassellCPDaYVanRaden
PMRonMWellerJILewinHA2004DetectionofQuantitativeTrait
LociAffectingMilkProductionHealthandReproductiveTraitsin
HolsteinCattleJournalofDairyScience87468ndash475
doi103168jdsS00220302(04)731860
AstonKICarrellDT2009GenomeWideStudyofSingleNucleotide
PolymorphismsAssociatedWithAzoospermiaandSevere
OligozoospermiaJournalofAndrology30711ndash725
doi102164jandrol109007971
AulchenkoYSRipkeSIsaacsADuijnCMvan2007GenABELanRlibrary
forgenomewideassociationanalysisBioinformatics231294ndash1296
doi101093bioinformaticsbtm108
BamshadMJNgSBBighamAWTaborHKEmondMJNickersonDA
ShendureJ2011ExomesequencingasatoolforMendeliandisease
genediscoveryNatRevGenet12745ndash755doi101038nrg3031
BieseckerLGShiannaKVMullikinJC2011Exomesequencingtheexpert
viewGenomeBiology12128doi101186gb2011129128
152
BiffaniSMarusiMBiscariniFCanavesiF2005Developingagenetic
evaluationforfertilityusingangularityandmilkyieldascorrelatedtraits
InterbullBulletin063
BiffaniSSamoreacuteABCanavesiF2002Inbreedingdepressionforproduction
reproductionandfunctionaltraitsinItalianHolsteincattleInstitut
NationaldelaRechercheAgronomique(INRA)pp0ndash4
BolgerAMLohseMUsadelB2014TrimmomaticAflexibletrimmerfor
IlluminaSequenceDataBioinformaticsbtu170
doi101093bioinformaticsbtu170
BrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingand
MissingDataInferenceforWholeGenomeAssociationStudiesByUseof
LocalizedHaplotypeClusteringAmJHumGenet811084ndash1097
CharlesworthDWillisJH2009ThegeneticsofinbreedingdepressionNat
RevGenet10783ndash796doi101038nrg2664
ChunSFayJC2009Identificationofdeleteriousmutationswithinthree
humangenomesGenomeRes191553ndash1561
doi101101gr092619109
ConsortiumTEP2012AnintegratedencyclopediaofDNAelementsinthe
humangenomeNature48957ndash74doi101038nature11247
DePristoMABanksEPoplinRGarimellaKVMaguireJRHartlC
PhilippakisAAdelAngelGRivasMAHannaMMcKennaA
FennellTJKernytskyAMSivachenkoAYCibulskisKGabrielSB
AltshulerDDalyMJ2011Aframeworkforvariationdiscoveryand
genotypingusingnextgenerationDNAsequencingdataNatGenet43
491ndash498doi101038ng806
GoymerP2007SynonymousmutationsbreaktheirsilenceNatRevGenet8
92ndash92doi101038nrg2056
GumiennyTLBrugneraEToselloTrampontACKinchenJMHaneyLB
NishiwakiKWalkSFNemergutMEMacaraIGFrancisRSchedl
TQinYVanAelstLHengartnerMORavichandranKS2001CED
12ELMOaNovelMemberoftheCrkIIDock180RacPathwayIs
RequiredforPhagocytosisandCellMigrationCell10727ndash41
doi101016S00928674(01)005207
153
HanataniSIzumiyaYTakashioSKimuraYArakiSRokutandaTTsujita
KYamamotoETanakaTYamamuroMKojimaSTayamaS
KaikitaKHokimotoSOgawaH2014Circulatingthrombospondin2
reflectsdiseaseseverityandpredictsoutcomeofheartfailurewith
reducedejectionfractionCircJ78903ndash910
HiroseYChibaKKarasugiTNakajimaMKawaguchiYMikamiY
FuruichiTMioFMiyakeAMiyamotoTOzakiKTakahashiA
MizutaHKuboTKimuraTTanakaTToyamaYIkegawaS2008
AfunctionalpolymorphisminTHBS2thataffectsalternativesplicingand
MMPbindingisassociatedwithlumbardischerniationAmJHum
Genet821122ndash1129doi101016jajhg200803013
httpsgenomeucsceduindexhtml[WWWDocument]nd
JamrozikJFatehiJKistemakerGJSchaefferLR2005EstimatesofGenetic
ParametersforCanadianHolsteinFemaleReproductionTraitsJournalof
DairyScience882199ndash2208doi103168jdsS00220302(05)728952
JanisEWiggintonDJC2005AnoteonexacttestsofHardyWeinberg
equilibriumAmericanjournalofhumangenetics76887ndash93
doi101086429864
JiYRebertNAJoslinJMHigginsMJSchultzRANichollsRD2000
StructureoftheHighlyConservedHERC2GeneandofMultiplePartially
DuplicatedParalogsinHumanGenomeRes10319ndash329
KentWJSugnetCWFureyTSRoskinKMPringleTHZahlerAM
HausslerandD2002TheHumanGenomeBrowseratUCSCGenome
Res12996ndash1006doi101101gr229102
KumarPHenikoffSNgPC2009Predictingtheeffectsofcodingnon
synonymousvariantsonproteinfunctionusingtheSIFTalgorithmNat
Protoc41073ndash1081doi101038nprot200986
KyriakidesTRZhuYHSmithLTBainSDYangZLinMTDanielson
KGIozzoRVLaMarcaMMcKinneyCEGinnsEIBornsteinP
1998Micethatlackthrombospondin2displayconnectivetissue
abnormalitiesthatareassociatedwithdisorderedcollagenfibrillogenesis
anincreasedvasculardensityandableedingdiathesisJCellBiol140
419ndash430
154
LaneEACroweMABeltmanMEMoreSJ2013Theinfluenceofcowand
managementfactorsonreproductiveperformanceofIrishseasonal
calvingdairycowsAnimReprodSci14134ndash41
doi101016janireprosci201306019
LefeberDJdeBrouwerAPMMoravaERiemersmaMSchuurs
HoeijmakersJHMAbsmannerBVerrijpKvandenAkkerWMR
HuijbenKSteenbergenGvanReeuwijkJJozwiakAZuckerN
LorberALammensMKnopfCvanBokhovenHGruumlnewaldSLehle
LKapustaLMandelHWeversRA2011AutosomalRecessive
DilatedCardiomyopathyduetoDOLKMutationsResultsfromAbnormal
DystroglycanOMannosylationPLoSGenet7e1002427
doi101371journalpgen1002427
LiHDurbinR2009FastandaccurateshortreadalignmentwithBurrows
WheelertransformBioinformatics251754ndash1760
doi101093bioinformaticsbtp324
LiHHandsakerBWysokerAFennellTRuanJHomerNMarthG
AbecasisGDurbinR1000GenomeProjectDataProcessingSubgroup
2009TheSequenceAlignmentMapformatandSAMtoolsBioinformatics
252078ndash2079doi101093bioinformaticsbtp352
LiMXKwanJSHBaoSYYangWHoSLSongYQShamPC2013
PredictingMendelianDiseaseCausingNonSynonymousSingle
NucleotideVariantsinExomeSequencingStudiesPLoSGenet9
e1003143doi101371journalpgen1003143
MacArthurDGBalasubramanianSFrankishAHuangNMorrisJWalter
KJostinsLHabeggerLPickrellJKMontgomerySBAlbersCA
ZhangZConradDFLunterGZhengHAyubQDePristoMA
BanksEHuMHandsakerRERosenfeldJFromerMJinMMuXJ
KhuranaEYeKKayMSaundersGISunerMMHuntTBarnes
IHAAmidCCarvalhoSilvaDRBignellAHSnowCYngvadottirB
BumpsteadSCooperDNXueYRomeroIGWangJLiYGibbs
RAMcCarrollSADermitzakisETPritchardJKBarrettJCHarrow
JHurlesMEGersteinMBTylerSmithC2012Asystematicsurvey
155
oflossoffunctionvariantsinhumanproteincodinggenesScience335
823ndash828doi101126science1215040
MajewskiJSchwartzentruberJLalondeEMontpetitAJabadoN2011
WhatcanexomesequencingdoforyouJMedGenet48580ndash589
doi101136jmedgenet2011100223
McClureMCBickhartDNullDVanRadenPXuLWiggansGLiuG
SchroederSGlasscockJArmstrongJColeJBVanTassellCP
SonstegardTS2014BovineExomeSequenceAnalysisandTargeted
SNPGenotypingofRecessiveFertilityDefectsBH1HH2andHH3Reveal
aPutativeCausativeMutationinSMC2forHH3PLoSONE9e92769
doi101371journalpone0092769
McClureMKimEBickhartDNullDCooperTColeJWiggansGAjmone
MarsanPColliLSantusELiuGESchroederSMatukumalliLVan
TassellCSonstegardT2013FineMappingforWeaverSyndromein
BrownSwissCattleandtheIdentificationof41ConcordantMutations
acrossNRCAMPNPLA8andCTTNBP2PLoSONE8e59251
doi101371journalpone0059251
McKennaAHannaMBanksESivachenkoACibulskisKKernytskyA
GarimellaKAltshulerDGabrielSDalyMDePristoMA2010The
GenomeAnalysisToolkitAMapReduceframeworkforanalyzingnext
generationDNAsequencingdataGenomeRes201297ndash1303
doi101101gr107524110
MiHDongQMuruganujanAGaudetPLewisSThomasPD2010
PANTHERversion7improvedphylogenetictreesorthologsand
collaborationwiththeGeneOntologyConsortiumNuclAcidsRes38
D204ndashD210doi101093nargkp1019
MrodeRKearneyJFBiffaniSCoffeyMCanavesiF2009Short
communicationGeneticrelationshipsbetweentheHolsteincow
populationsofthreeEuropeandairycountriesJournalofDairyScience
925760ndash5764doi103168jds20081931
PelosoGMAuerPLBisJCVoormanAMorrisonACStitzielNOBrody
JAKhetarpalSACrosbyJRFornageMIsaacsAJakobsdottirJ
FeitosaMFDaviesGHuffmanJEManichaikulADavisBLohman
156
KJoonAYSmithAVGroveMLZanoniPRedonVDemissieS
LawsonKPetersUCarlsonCJacksonRDRyckmanKKMackey
RHRobinsonJGSiscovickDSSchreinerPJMychaleckyjJC
PankowJSHofmanAUitterlindenAGHarrisTBTaylorKD
StaffordJMReynoldsLMMarioniREDehghanAFrancoOHPatel
APLuYHindyGGottesmanOBottingerEPMelanderOOrho
MelanderMLoosRJFDugaSMerliniPAFarrallMGoelA
AsseltaRGirelliDMartinelliNShahSHKrausWELiMRader
DJReillyMPMcPhersonRWatkinsHArdissinoDZhangQWang
JTsaiMYTaylorHACorreaAGriswoldMELangeLAStarrJM
RudanIEiriksdottirGLaunerLJOrdovasJMLevyDChenYDI
ReinerAPHaywardCPolasekODearyIJBoreckiIBLiuY
GudnasonVWilsonJGvanDuijnCMKooperbergCRichSSPsaty
BMRotterJIOrsquoDonnellCJRiceKBoerwinkleEKathiresanS
CupplesLA2014AssociationofLowFrequencyandRareCoding
SequenceVariantswithBloodLipidsandCoronaryHeartDiseasein
56000WhitesandBlacksTheAmericanJournalofHumanGenetics94
223ndash232doi101016jajhg201401009
PryceJEVeerkampRFThompsonRHillWGSimmG1997Genetic
aspectsofcommonhealthdisordersandmeasuresoffertilityinHolstein
FriesiandairycattleAnimalScience65353ndash360
doi101017S1357729800008559
PryceJEWoolastonRBerryDPWallEWintersMButlerRShafferM
2014WorldTrendsinDairyCowFertilityin10ThWorldCongressof
GeneticsAppliedtoLivestockProduction
PuffenbergerEGJinksRNWangHXinBFiorentiniCShermanEA
DegrazioDShawCSougnezCCibulskisKGabrielSKelleyRI
MortonDHStraussKA2012Ahomozygousmissensemutationin
HERC2associatedwithglobaldevelopmentaldelayandautismspectrum
disorderHumMutat331639ndash1646doi101002humu22237
PurcellSNealeBToddBrownKThomasLFerreiraMARBenderD
MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKA
157
ToolSetforWholeGenomeAssociationandPopulationBasedLinkage
AnalysesAmJHumGenet81559ndash575
SahanaGNielsenUSAamandGPLundMSGuldbrandtsenB2013Novel
HarmfulRecessiveHaplotypesIdentifiedforFertilityTraitsinNordic
HolsteinCattlePLoSONE8e82909doi101371journalpone0082909
SchuumltzEScharfensteinMBrenigB2008Implicationofcomplexvertebral
malformationandbovineleukocyteadhesiondeficiencyDNAbased
testingondiseasefrequencyintheHolsteinpopulationJDairySci91
4854ndash4859doi103168jds20081154
SimmonsMJCrowJF1977MutationsAffectingFitnessinDrosophila
PopulationsAnnualReviewofGenetics1149ndash78
doi101146annurevge11120177000405
SoggiuAPirasCHusseinHACanioMDGaviraghiAGalliAUrbaniA
BonizziLRoncadaP2013UnravellingthebullfertilityproteomeMol
BioSyst91188ndash1195doi101039C3MB25494A
StitzielNOKiezunASunyaevS2011Computationalandstatistical
approachestoanalyzingvariantsidentifiedbyexomesequencing
GenomeBiology12227doi101186gb2011129227
SundaramSKHuqAMSunZYuWBennettLWilsonBJBehenME
ChuganiHT2011ExomesequencingofapedigreewithTourette
syndromeorchronicticdisorderAnnNeurol69901ndash904
doi101002ana22398
VanRadenPMOlsonKMNullDJHutchisonJL2011Harmfulrecessive
effectsonfertilitydetectedbyabsenceofhomozygoushaplotypesJournal
ofDairyScience946153ndash6161doi103168jds20114624
VilarintildeoGuumlellCRajputAMilnerwoodAJShahBSzuTuCTrinhJYuI
EncarnacionMMunsieLNTapiaLGustavssonEKChouP
TatarnikovIEvansDMPishottaFTVoltaMBeccanoKellyD
ThompsonCLinMKShermanHEHanHJGuentherBL
WassermanWWBernardVRossCJAppelCresswellSStoesslAJ
RobinsonCADicksonDWRossOAWszolekZKAaslyJOWuR
MHentatiFGibsonRAMcPhersonPSGirardMRajputMRajput
158
AHFarrerMJ2014DNAJC13mutationsinParkinsondiseaseHumMolGenet231794ndash1801doi101093hmgddt570
WangKLiMHakonarsonH2010ANNOVARfunctionalannotationofgeneticvariantsfromhighthroughputsequencingdataNuclAcidsRes38e164ndashe164doi101093nargkq603
WathesDCPollottGEJohnsonKFRichardsonHCookeJS2014HeiferfertilityandcarryoverconsequencesforlifetimeproductionindairyandbeefcattleAnimal8Suppl191ndash104doi101017S1751731114000755
YangYMuznyDMReidJGBainbridgeMNWillisAWardPABraxtonABeutenJXiaFNiuZHardisonMPersonRBekheirniaMRLeducMSKirbyAPhamPScullJWangMDingYPlonSELupskiJRBeaudetALGibbsRAEngCM2013ClinicalWhole
ExomeSequencingfortheDiagnosisofMendelianDisordersNewEnglandJournalofMedicine3691502ndash1511doi101056NEJMoa1306555
ZiminAVDelcherALFloreaLKelleyDRSchatzMCPuiuDHanrahanFPerteaGTassellCPVSonstegardTSMarccedilaisGRobertsMSubramanianPYorkeJASalzbergSL2009Awholegenome
assemblyofthedomesticcowBostaurusGenomeBiology10R42doi101186gb2009104r42
159
Supplementary)files)
YoucanfindChapter6supplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter06)orscantheQRcode
)
Supplementary)Figure)S1)Coverage)analyses)
A)Averagecoverageperanimals
B)Boxplotofchromosomescoverageperanimal
C)BoxplotofanimalcoverageperchromosomeBluelinerepresenttheexpected
50Xcoverage
Supplementary)Figure)S2)Distribution)of)variant)sequencing)depth)
)
Supplementary) Figure) S3) Correlation) between) sequencing) depth) and)
number) of) discordant) genotypes) In) red) values) for) two) outlier) animals)
having)concordance)lt846))
)
Supplementary)Figure)S4)Multidimensional)scaling)of)Italian)Holstein)bulls)
sequenced))
)
Supplementary)Figure)S5)Multidimensional)scaling)of)Italian)Holstein)bulls)
analysed)with)the)800K)SNPchip))
160
Supplementary) Figure) S6) MAF) distribution) on) neutral) (blue)) and)
deleterious)(red))variants))
)
Supplementary) Table) S1) EBV) distribution) on) Italian) Holstein) population)
and)SNPchip)dataset)
Sheet1EBVthresholdvaluetodefineplusandminustailforPROT(kgprotein)
SCC(somaticcellcount)FERT(fertility)andLONG(functional longevity) All
ItalianHolsteinFAbullswereincludeinthisanalysis
Sheet2Numberofdatasetanimalsineachtailforeachtrait
Supplementary)Table)S2)Exome)probes)statistics)number)and)bp)covered)
per)chromosomes)
Supplementary) Table) S3) Concordance) between) animal) genotypes) in)
sequencing)and)SNPchip)data)
Error1 HD data homozygotes ndash Exome data heterozygotes Error2 HD data
heterozygotesndashExomedatahomozygotes
Supplementary) Table) S4) List) of) suggestive) genes) deriving) from) the)
analysis)of)exomes)of)animals) in)the)tails)of)male)and)female)fertility)EBV)
distributions))
Header(code
bull Bfreq_totFrequencyofBalleleinentiredataset
bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset
bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset
bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset
bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset
bull CallRate_totSNPcallratendashentiredataset
bull CallRate_MHSNPcallratendashhighmalefertilitydataset
bull CallRate_MLSNPcallratendashlowmalefertilitydataset
bull CallRate_FHSNPcallratendashhighfemalefertilitydataset
bull CallRate_FLSNPcallratendashhighmalefertilitydataset
bull HWeqHardyWeinbergequilibriumpvalue
bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele
bull Geno_HetNumberofanimalswithheterozygotegenotype
161
bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele
bull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)dataset
bull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)dataset
bull LOW_CRSNPcallratefromlowfertility(maleandfemale)dataset
bull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallele
bull LOW_Het Number of animals from low fertility (male and female) dataset with
heterozygotegenotype
bull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallele
bull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and
female)dataset
bull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)dataset
bull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)dataset
bull HIGH_CRSNPcallratefromhighfertility(maleandfemale)dataset
bull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallele
bull HIGH_Het Number of animals from high fertility (male and female) dataset with
heterozygotegenotype
bull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallele
bull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and
female)dataset
bull Male_TREND inheritance models implemented in PLINK trend test on male fertility
dataset
bull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale
fertilitydataset
bull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male
fertilitydataset
bull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility
dataset
bull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on
femalefertilitydataset
bull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale
fertilitydataset
bull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male
andfemalefertilitydataset
bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test
onmaleandfemalefertilitydataset
162
bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston
maleandfemalefertilitydataset
bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset
bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset
bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset
Supplementary)Table) S5) Regions) candidate) to) carry) deleterious) variants)
identified)by)OHW)approach))
Supplementary)Table) S6) Regions) candidate) to) carry) deleterious) variants)
identify)by)LOH)approach))
Ingreyregionsremovedinfurtheranalysis
Supplementary)Table)S7)List)of)candidate)deleterious)variants)mapping)in)
OHW)and)LOH)regions))Header(code
bull Bfreq_totFrequencyofBalleleinentiredataset
bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset
bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset
bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset
bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset
bull CallRate_totSNPcallratendashentiredataset
bull CallRate_MHSNPcallratendashhighmalefertilitydataset
bull CallRate_MLSNPcallratendashlowmalefertilitydataset
bull CallRate_FHSNPcallratendashhighfemalefertilitydataset
bull CallRate_FLSNPcallratendashhighmalefertilitydataset
bull HWeqHardyWeinbergequilibriumpvalue
bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele
bull Geno_HetNumberofanimalswithheterozygotegenotype
bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele
bull HW_insideDeleteriousmutationsinsideldquoOutofHardyWeinbergrdquoregion
bull HW_outsideDeleteriousmutationsoutsideldquoOutofHardyWeinbergrdquoregion
bull DH_insideDeleteriousmutationsinsideldquoLackofHomozygosityrdquoregion
bull DH_outsideDeleteriousmutationsoutsideldquoLackofHomozygosityrdquoregion
163
bull GeneClass Classifications of gene 1 deleterious mutations inside OHW and LOHregion2deleteriousmutationsinsideOHWorLOHregion
bull 2deleteriousmutationsoutsideOHWandorLOHregionbull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)datasetbull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)datasetbull LOW_CRSNPcallratefromlowfertility(maleandfemale)datasetbull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallelebull LOW_Het Number of animals from low fertility (male and female) dataset with
heterozygotegenotypebull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallelebull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and
female)datasetbull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)datasetbull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)datasetbull HIGH_CRSNPcallratefromhighfertility(maleandfemale)datasetbull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallelebull HIGH_Het Number of animals from high fertility (male and female) dataset with
heterozygotegenotypebull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallelebull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and
female)datasetbull Male_TREND inheritance models implemented in PLINK trend test on male fertility
datasetbull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale
fertilitydatasetbull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male
fertilitydatasetbull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility
datasetbull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on
femalefertilitydatasetbull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale
fertilitydatasetbull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male
andfemalefertilitydataset
164
bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test
onmaleandfemalefertilitydataset
bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston
maleandfemalefertilitydataset
bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset
bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset
bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset
Supplementary)Table)S8)Results)of)haplotype)analysis)))Header(code
bull Chromchromosome
bull PosphysicalpositiononUMD31genome
bull Exome_homonumberofhomozygotefordeleteriousmutationsequencedanimals
bull Exome_hetnumberofheterozygotesequencedanimalsfordeleteriousmutation
bull Outcome NotFound It was impossible to associated a haplotype to the mutation
NotOpt Non completely invariant haplotype could be associated to themutationOK
Putativehaplotype associate tomutationwas found In few case alternativehaplotype
wasreported
bull LenlengthofhaplotypeinnumberofSNPs
bull Pop_numnumbersofselectedhaplotypedetectedinHolsteinpopulation
bull Pop_homo numbers of homozygote animals in Holstein population for selected
haplotype
bull Pop_het numbers of heterozygotes animals in Holstein population for selected
haplotype
bull ExpExpectednumberofhomozygoteanimal
bull Alternative if only one heterozygote sequenced animals for deleteriousmutationwas
foundalltwopossiblehaplotypeswheretested
Supplementary)Table)S9)Descriptive)results)for)the)Biological)Process)(BP)))
NumberofgenesbelongingtoBiologicalProcessGOterms)
Supplementary) Table) S10) Descriptive) results) for) the)Molecular) Function)
(MF))
NumberofgenesbelongingtoMolecularFunctionGOterms
165
CHAPTER7
Conclusion
Since the domestication event occurred in the Fertile Crescent around 10000yearsagothebovinespecieswasselectedbyhumankindtofulfilhisneedsThecreationofbreedsoccurredaround200yearsagointensifiedthedifferentiationbetween populations and promoted the development of intensive selectionschemesTraditionalAandnowadaysgenomicAselectionsareproducingpositivegenetictrendsinmanyproductivetraitsdespitethestillpoorknowledgeaboutphenotype biology A deeper understanding of genome organization and genebiology would increase the accuracy of genomic evaluation by incorporatingpriorknowledgeThisthesisexploresthebovinegenomebyhighAthroughputtechnologiesAsuchasNextGenerationSequencingandHighADensityGenotypingAwithestablishedand innovative procedures to investigate the biology of complex traits and toprovidenewdataandmoleculartoolstobovinebreedingThese goals have been achieved by complementary approaches and differentmethodsInChapter2selectionsignatureswereinvestigatedindependentlyintofive Italian breeds using Illumina BovineSNP50 BeadChip Then amultiAbreedapproach was applied clustering breeds into dairy (Italian Holstein ItalianBrownand Italian Simmental) andbeef (Marchigiana Piedmontese and ItalianSimmental) groups Regions under recent positive selection shared by breedswithasameproductiveaptitudewereinvestigatedinfurtherdetailbyretrievinggenes in the region and analysing their function by ontology and pathwayanalysis In the dairy group selection signals appear nearby genes related tomammarygland(metabolismandresistancetomastitis)andfeedingadaptationIn thebeefgroupgenes inselectedregionsare involved inanimalgrowthandmeatquality(textureandjuiciness)Hencecandidategenesidentifiedbelongtonetworks and pathways important in directing a breed towards either beef ordairyproductionIn Chapter 3 a GWAS approach was used in dairy cattle to correlate SNP tocomplex phenotypes having an economic value Three independent ldquoclassicalrdquosingleAmarkerregressionswereruninthreeItaliandairycattle(ItalianHolsteinItalianBrownandItalianSimmental)UsingthetraditionalsingleSNPapproach
168
only few SNPs were above the nominal genomeAwide significance thresholdwhile applying a geneAbased method identified significant candidate genesassociatedwithmilkphysiology andmammary glanddevelopment in all threepopulationsInterestinglygenesfoundindifferentbreedshavesimilarfunctionbut are not the same suggesting that breed history demography selectioncriteriaandstochasticeventsasgeneticdrifthaveanimpactonthedefinitionofthemainmoleculartargetofselectionschemesCurrent intensive selection strategies rapidly spread the genetic gain in thepopulationbutholdtheassociatedriskofdisseminatingdefectivegenescausinggenetic defects (eg mulefoot complex vertebral malformation and others) ordecreasing fertility In Chapter 6 different technologies and approaches werecombined to obtain filter and prioritize genes candidate to be deleteriousExome sequence datawere exploited to identify deleterious variants affectingItalianHolsteincattleinasetof20animalsextremeforfertilitytraitsMorethan5000 candidate deleterious SNP variants were identified To bypass the smallnumberlimitationandthelackofpowerofanystatisticanalysison20animalsastepwise approachwas used to first filter and then prioritize these candidatedeleterious variants Polymorphisms were retained if having asymmetricdistribution in bulls from opposite tails of fertility EBV distributions andormapping in genomic regions outlier for lack of homozygosity in a 1009 bullpopulation genotyped with the Illumina BovineHD BeadChip A total of 191variants in163genespassed theprocessesThesewereassessedby searchingadditional evidences in favour or against their deleterious effect throughdifferentapproachesi)checkingthevariantconservationacross100vertebratespecies by comparative genomics ii) finding haplotypes associated withcandidatemutationsandevaluatingtheirfrequencyinHolsteinsiii)integratinghumanandmouse functionalannotationsA scorewasproposed toweight theresult of these assessments and rank the variants 35 of them had positivescores These occurred in genes involved in basic biological mechanisms asfertilitydevelopmentand immunityandresultedenriched ingenesassociatedtogeneticdefectsreportedinhumanThesemutationsarestrongcandidatestobe recessive deleterious variants worth to be further investigated in a larger
169
populationtobetterunderstandtheirbiologyimpactandpossibleapplicationinbreedingTo increase the reliability of results based on genomic data and to reduce theimpact of subjective choices in data analysis new toolsapproaches weredeveloped(iethemultiAbreedapproachinSelectionSignaturendashChapter2thepostAGWAS geneAbased approach ndash Chapters 3 and 4) and the existent onescriticallyevaluated(imputationmethodsndashChapter5)In Chapter 4 the postAGWAS geneAbased method used in Chapter 3 wasimplemented in the MUGBAS (MUlti species GeneABased Association Suite)software In contrast to existing software MUGBAS is species and annotationfree AmultipleAtesting correctionwas included in the package in addition tocomputing parallelization for faster processing and plot representation forbetterunderstandingThesoftwareusessingleAmarkerGWASdataandagivenannotation to estimate geneAregionAwise association pAvalues As previouslymentioned in Chapter 3 the ldquoclassicalrdquo GWAS approach (single SNP) foundsignals only in ItalianHolsteinwhileMUGBAS geneAbased approach identifiedsignificantsignalsinallbreedsinvestigatedIn Chapter 5 the impact of different bovine reference sequences on genotypeimputation was investigated In this work Illumina BovineSNP50 BeadChipgenotypesof ItalianSimmentalbullswerereAmappedonthethreemostrecentreferencegenomeassembliesWecomparedfourdifferent imputationmethodsfromlow(IlluminaBovineLDv10Beadchip)tohighdensityThetestpopulationwasdividedintofoursubpopulationsonthebasisofrelationshipandgenotypedataavailabilityUpdatingSNPcoordinatesonthe three testedcattlereferencegenome assemblies determined only a slight variation on imputation resultswithinmethodsOntheotherhandlargedifferencesintermsofaccuracywereobservedamongthefourimputationmethodstestedGenetic structure of industrial breeds was evaluated assessing LinkageDisequilibrium (LD) Since LD decay is correlated to intensity of selection weexpected different behaviours in the breeds analysed Indeed as evident inChapter 2 ndash Figure 7 breeds subjected to lower selection intensity (ie
170
MarchigianaPiedmonteseandItalianSimmental)displayalowerpersistenceof
LDatgreaterdistancescomparedtothosesubjectedtohigherselectionpressure
(ieItalianHolsteinandItalianBrown)Chapter3investigatesLDintheDGAT1
region of dairy cattle breeds In Italian Simmental LD is lower compared to
Italian Holstein and Italian Brown In spite of this significant association
betweentheDGAT1regionandmilktraitswasfoundbothinItalianSimmental
andinItalianHolsteinInItalianBrownnoassociationwasfoundbecauseDGAT1
isfixed
Data and results presented in this thesis represent a clear example of the
progress of molecular technologies occurred in last few years spanning from
mediumdensitygenotypingtonextgenerationsequencing
Thebiologylandscapesofsomecomplextraits(iemilkproductionandquality
fertility)were improved finding candidate genes andpathwaysnotpreviously
associated to the phenotype in the bovine species Moreover insights on
deleterious mutations were inferred through a deep and integrated analysis
usingNGSdata
This new knowledge could be used to improve cattle breeding For example
deleteriousrecessivevariantscouldbeincludedinbullevaluationsbybreedersrsquo
associations to reduce the genetic load of deleterious genes in parallel to
improvingproductivity
While technological progress has been impressivewe are still only scratching
thesurfaceofanimalbiologyNewfrontiersarebeingopenedbywholegenome
sequencing of many animals (eg in the 1000 bull genome project) and by
stepping beyond sequencing (a number of epigenomic projects in animals are
following the footprints of the human ENCODE project) The trend is towards
unravelling complexity and data production ability is to be flanked by data
storingandprocessingandaboveallbynewmethodsandtoolsfordataanalysis
171
ACKNOWLEDGMENTS
ldquoNon egrave la destinazione ma il viaggio che contardquo Questi tre anni sono stati un gran bel viaggio di crescita personale e ricchi diesperienze Non egrave stato un viaggio solitarioma condiviso con vecchie e nuoveconoscenzechemisembradoverosomenzionareInnanzitutto vorrei ringraziare il prof Paolo AjmoneMarsan chemi ha dato lapossibilitagrave di fare questa esperienza Grazie per gli insegnamenti e per leopportunitagravechemihaoffertoSperodinonavertraditolesueaspettativeAgli attuali colleghi Lorenzo Stefano Licia Elisa Riccardo ed Elia e a quellipassati Nicola Fatima Raffaele Ezequiel e Barbara va il mio piugrave sincero edaffettuoso ringraziamento per aver condiviso anche solo in parte questomiopercorso Grazie per avermi sopportato (e non egrave poco) guidato (non solo nelcampolavorativo)efattovivereinunmodospecialelrsquoambientedilavoroUnsincerograzievaatuttiimieicolleghidelXXVIIcicloAgrisystemRicorderogravesempretuttiimomentichehopotutocondividereconvoi UngrazieancheachihafattopartedellamiaesperienzasvedeseGraziedicuoreaCarlFlavioChristianEricaatuttiiragazziitalianiatuttiicolleghildquosvedesirdquoea tutte le altre persone che ho conosciuto Mi avete fatto vivere appienolrsquoesperienzaesteraefattotornareacasaconqualcheconsapevolezzainpiugrave UnsentitoringraziamentoatuttiglialtricolleghiconcuihocollaboratoinquestianniRingrazio tutti gli amici e conoscenti che mi sono stati vicino per avermisupportatoesopportatoognunoamodoproprio Ultimomanonperimportanza ilringraziamentochevaatuttalamiafamigliaper avermi aiutato spronato sostenuto sempre e in particolare in questaesperienzaSemplicementegrazie PS Adirlatuttaancheladestinazionenonegravepoicosigravemale
173
- INDEX
- CHAPTER 1 - Introduction
- CHAPTER 2 - Relative extended haplotype homozygosity (rEHH) signals across breeds reveal dairy and beef specific signatures of selection
- CHAPTER 3 - Searching new signals for productive traits in three Italian cattle breeds
- CHAPTER 4 - MUGBAS a species free gene based ased association suite
- CHAPTER 5 - Imputation accuracy is robust to cattle reference genome updates
- CHAPTER 6 - Quest for deleterious recessive variants in Italian Holstein bulls combining exome sequencing and high density SNP analysis
- CHAPTER 7 - Conclusion
- ACKNOWLEDGMENTS
-
Background
The bovine species counts almost 15 billion of heads classified in some 800
breeds worldwide (httpfaostat3faoorg) The species is adapted to a wide
varietyofdifferentenvironmentandhusbandrysystemsandisoftenrearedfor
multiple purposes (Taberlet et al 2008) draft power milk meat leather
productionandotheruses(Elsiketal2009)
European cattle (Bos$ taurus) derive from a single domestication event of the
auroch (Bos$primigenius)occurred in theFertileCrescent around10000years
ago(Taberletetal2011)FollowingdomesticationcattlefollowedtheNeolithic
expansionofagricultureandcolonisedEurope(seeAjmoneMMarsanetal2010
for a review) Many thousand years of natural and artificial selection have
shaped cattle phenotype and genotype to meet environmental conditions and
human needs and together with demographic events and genetic drift have
started thedifferentiation of the bovine species in diverse and locally adapted
populations This differentiation process accelerated in the last two centuries
whenthebreedconceptwasinventedsothatnowthisspeciescountshundreds
of different breeds with high diversity in coat colour body size production
aptitudeadaptationtoclimate tolerancetodiseaseetc(Taberletetal2008
Elsiketal2009TheBovineHapMapConsortium2009)
Thisdiversityisnowbeingprogressivelylostwiththeextinctionofmanylocal
populationsmainlyforeconomicreasonsSelectioneffortshavethereforebeen
focussedonafewoutperformingindustrialbreedsthatarepresentlyselectedfor
specialiseddairyorbeefproductionandsometimesdoublepurpose
In these breeds breeding schemes are well developed and have been
traditionally based on the genetic evaluation of sires and dams using data
collectedfromperformanceandprogenytestingandunbiasedlinearprediction
modelsThesearefoundedontheFisherinfinitesimalmodellingofquantitative
trait variation and need no genomic information to estimate breeding values
(EBVs) The genetic progress recorded using this approach has been relevant
particularly for traits having medium to high heritability In Italian Holstein
genetic trends for themain production traits (milk protein and fat yield) are
positiveHoweveraccurateestimateofbreedingvalueshasahighcostinterms
4
of investment and time so that 5 to 6 years are needed to have a dairy bullprogenytested(wwwanafiit)Theadventofgenomicselection(Meuwissenetal2001)markedamilestoneindairycattlebreedingTheideawasdevelopedbeforethemoleculartoolswereinplaceforimplementitinpracticeandhasbecomearealityonlyrecentlyInthisapproach progeny tested bulls are genotyped at many thousand loci and thevalue of their EBVs used to estimate the value of each allele at each locusgenotyped These values are then used to predict the genetic value of younganimals on the basis of their genotype As a result animals from the newgeneration receive an EBV with the same accuracy of estimates obtained byprogenytestingbutmuchearlierThislowerthegenerationintervalandgreatlyincreasestherateofgeneticprogress indairypopulationsThe initial ideawasso good that nowadays genomic selection is widely applied in industrialcountriestothemajordairycattlebreedsWiththedecreaseofgenotypingcostit is likely that its use will be soon expanded to animals selected for otherpurposes (eg beef cattle) and other species (eg pigs) However genomicselection is an extension of traditional selection It is extremely useful inbreedingforcomplextraitsbutgivesnobiologicalinformationonthegenesthatcontrolthemIfthetraditionalselectionviewsasiregenomeasabigblackboxhaving a value but no clue why genomic selection views the same genomedisassembled inmany small boxes but still black and stillwithno clueon thereasonoftheirvalueIt is reasonable to think that a deeper understanding of the biologicalmechanisms controlling traits may further increase the accuracy of genomicevaluationsIndeedthenewtechnologiesoffernewopportunitiesinthissensebyloweringcostsandincreasingprecisionofdataproductionThis thesis is exploring the use of these technologies for retrieving biologicalinformationfromgenomicdataItusesmethodsnowconsideredldquotraditionalrdquoingenomics (GWAS and analysis of selection sweeps) and less traditional ones(gene centered GWAS and exome analysis) All data used here have beenproducedwithin twonationalprojectson farmanimalgenomics fundedby theItalianMinistryofAgriculture
5
M SELMOL ldquoRicerca e innovazione nelle attivitagrave di miglioramento genetico
animalemediante tecniche di geneticamolecolare per la competitivitagrave del
sistemazootecniconazionaleIrdquo(ResearchandInnovationinanimalbreeding
sheepgoatbuffalohorseanddonkeyI)
M INNOVAGENldquoRicercaeinnovazionenelleattivitagravedimiglioramentogenetico
animalemediante tecniche di geneticamolecolare per la competitivitagrave del
sistema zootecnico nazionale IIrdquo (Research and Innovation in animal
breedingsheepgoatbuffalohorseanddonkeyII)
In these projects the research group I work with M Istituto di Zootecnica at
Universitagrave Cattolica del Sacro Cuore di Piacenza M coordinated all the research
activitiesconcerningtheapplicationofgenomicstechniquesindairycattle
Contentsofthethesis
The thesis is structured into a short introduction (this first chapter) 5 core
chapters(Chapter2to6)andashortConclusion(Chapter7)Withtheexception
of Chapter 6 still in preparation the other core chapters containmanuscripts
that at time of writing are under review for their publication in international
peerMreviewedjournalsormanuscript
InChapter2selectionsignatureindairyandbeefcattleareinvestigatedwiththe
Illumina BovineSNP50 Genotyping BeadChip Five Italian cattle breeds (Italian
Holstein Italian Brown Italian Simmental Marchigiana and Piedmontese) are
analyzedwiththerEHHmethod(Sabetietal2002)tofindsignaturesofrecent
selectionCandidategenesinthevicinityofsignaturessharedbyeitherdairyor
beefbreeds are identified and theirpossible role indairy andbeefproduction
discussed
In Chapter 3 a GWAS analysis is run to investigate production traits in three
Italian dairy cattle breeds (Italian Holstein Italian Brown Italian Simmental)
Data are analysed with a well established method (Amin et al 2007)
6
implemented in theGenABELRpackage (Aulchenkoet al 2007) andwith the
newgeneMbasedmethodAssociationisinvestigatedbetween50KSNPsandmilk
production(milkyield)andqualitytraits(percentageofmilkproteinandfat)
InChapter4thegeneMbasedmethodusedinthepreviouschapter3inspiredby
theVEGASapproach(Liuetal2010)isdescribedThesoftwarewasdeveloped
andreMcodedinourlaboratoryinordertopermittheanalysisofanyspeciesin
additiontohumanMultiMtestingcorrectionandplotrepresentationcompletethe
toolsmadeavailable
InChapter5wetesttheimpactoftheuseofdifferentreferencesequencesinthe
imputation step Italian Simmental data are used to test variation in genomeM
wide inputation accuracy using different cattle reference sequences (BTAU42
UMD31 and BTAU46) To increase the analyses reliability four different
inputationsoftwaresaretested
InChapter6Holsteindataareanalysedcombiningexomesequencesand800K
SNPs data to seek deleterious recessive variantsWhile in natural populations
the destiny of these variants is to disappear in livestock breeds they can be
maintained and disseminated if carried by a highly productive sire The
intersection of deleterious mutations identified by sequencing and lack of
homozygousregionsidentifiedbytheanalysisofthepopulationwiththeHDSNP
chipidentifiedgenescandidatetocarryrecessivedeleteriousvariants
Objective
Themainobjectiveofthethesiswastoinvestigatethebovinegenomewiththe
aid new highMthroughput technology and to explore new strategies for linking
biology (genes and their function) to traits This linkage once validated may
increase the accuracy of genomic prediction and find application in selection
schemesegineradicatingthemostdeleteriousvariantsandorintheplanning
ofmatings
7
ReferencesAjmoneMMarsanPGarciaJFLenstraJA2010OntheoriginofcattleHow
aurochsbecamecattleandcolonizedtheworldEvolAnthropol19148ndash157doi101002evan20267
AminNvanDuijnCMAulchenkoYS2007AGenomicBackgroundBasedMethodforAssociationAnalysisinRelatedIndividualsPLoSONE2e1274doi101371journalpone0001274
AulchenkoYSKoningDMJdeHaleyC2007GenomewideRapidAssociationUsingMixedModelandRegressionAFastandSimpleMethodForGenomewidePedigreeMBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614
ElsikCGTellamRLWorleyKC2009TheGenomeSequenceofTaurineCattleAWindowtoRuminantBiologyandEvolutionScience324522ndash528doi101126science1169588
LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKMHaywardNKMontgomeryGWVisscherPMMartinNGMacgregorS2010AVersatileGeneMBasedTestforGenomeMwideAssociationStudiesTheAmericanJournalofHumanGenetics87139ndash145doi101016jajhg201006009
MeuwissenTHHayesBJGoddardME2001PredictionoftotalgeneticvalueusinggenomeMwidedensemarkermapsGenetics1571819ndash1829
SabetiPCReichDEHigginsJMLevineHZPRichterDJSchaffnerSFGabrielSBPlatkoJVPattersonNJMcDonaldGJAckermanHCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash837doi101038nature01140
TaberletPCoissacEPansuJPompanonF2011ConservationgeneticsofcattlesheepandgoatsCRBiol334247ndash254doi101016jcrvi201012007
8
TaberletPValentiniARezaeiHRNaderiSPompanonFNegriniR
AjmoneMMarsanP2008Arecattlesheepandgoatsendangered
speciesMolEcol17275ndash284doi101111j1365M294X200703475x
TheBovineHapMapConsortium2009GenomeMWideSurveyofSNPVariation
UncoverstheGeneticStructureofCattleBreedsScience324528ndash532
doi101126science1167936
9
CHAPTER2
Relative(extendedamphaplotype(homozygosity)(rEHH)ampsignalsampacrossampbreedsamprevealampdairyampand$beef$specificampsignaturesampofampselection
LBomba1ELNicolazzi2MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly2FondazioneParcoTecnologicoPadanoViaEinsteinLocCascinaCodazza26900LodiItalyTheseauthorscontributedequallytothiswork
____________________SubmittedtoGeneticSelectionEvolution
Abstract
Background
Nowadays many methods have been implemented to scan for signatures of
selection at genomescale level within and across breeds Extended haplotype
homozygosity is a proficient approach to detect genome regions under recent
selective pressure within breeds The objective of this study is to identify
common regions under positive recent selection shared by most represented
dairyandbeefItaliancattlebreeds
Results
A total of 3220 animals from Italian Holstein (2179) Italian Brown (775)
Simmental (493) Marchigiana (485) and Piedmontese (379) breeds were
genotyped with the Illumina BovineSNP50 BeadChip v1 in the frame of the
ItalianlivestockgenomicprojectsldquoSelMolrdquoandldquoProzoordquoAfterstandardcleaning
proceduresgenotypeswerephasedandcorehaplotypesidentifiedThedecayof
Linkage Disequilibrium (LD) for each core haplotype was assessedmeasuring
the extended haplotype homozygosity (EHH) To correct for the lack of good
estimatesof local recombinationrates therelativeEHH(rEHH)wascalculated
for each corehaplotypeGenomic regionsunder recentpositive selectionwere
identifiedasthosehighlyfrequentcorehaplotypeswithsignificantrEHHvalues
Significant independentcorehaplotypeswerealignedacrossbreeds to identify
signalsofrecentselectionsharedbydairyandsharedbybeefbreedsOverall82
and 87 common regions under selectionwere detected among dairy and beef
cattlebreedsrespectivelyBioinformaticsanalysisidentified244and232genes
mapping in these common genomic regions Pathway and network analysis of
significant genes revealed molecular functions biologically related to signal of
selectionsdetectedinmilkormeatproductiontypes
Conclusions
Resultssuggest that themultibreedapproach isamethodto identifyselection
signatures in cattle breeds under selection for the same production goal
12
allowing to better pinpoint the genomic regions of interest in dairy and beef
production
Background
The advent of genomic technologies and the consequent availability of single
nucleotidepolymorphism(SNP)datahaveenabledthestudyofselectionevents
on the cattle genome (Hayes et al 2008 Qanbari et al 2010) Such selection
signals arise from environmental or anthropogenic pressures and help to
understand the causes that led to breed formation These studies are usually
addressedinaldquotopdownrdquoapproach(ShahzadandLoor2012)fromgenotypeto
phenotypeinwhichgenomicdataisstatisticallyevaluatedtodetectdirectional
selectionThephenotypeneededtorunselectionsignaturesisintendedinavery
generalsenseabreedaproductionaptitudeorevenageographicareaoforigin
Thisapproachholdsthepotentialto investigatetraitsasadaptationtoextreme
climatesanddifferentfeedingandhusbandrysystemsresiliencetodiseasesetc
thatareveryexpensivedifficultandsometimesimpossibletostudywithclassic
GWAS (GenomeWide Association Study) approaches Therefore it is expected
thatgenomic informationretrievedinthesestudiesonlypartiallyoverlapsthat
identified by GWAS on specific traits Searching for these ldquosignatures of
selectionrdquo is of interest for understanding molecular mechanisms underlying
important biological processes and providing new insights to functional
biologicalinformation(egdiseaseandorimportanteconomictraits(Nielsenet
al 2007)) To date many methods have been implemented to scan these
selectionsignaturesatthegenomiclevelMethodsdifferwithrespecttoseveral
properties (Lenstra et al 2012) there are methods that perform analyses
withinbreedoracrossbreedsthatcomparealleleorhaplotypefrequenciesand
size that detect recent or ancient selection events either still segregating or
fixatedintopopulations(Lenstraetal2012OrsquoBrienetal2014Utsunomiyaet
al2013)Differentmethodshavedifferentsensitivityandrobustnesseg they
maybeinfluencedatadifferentextentbymarkerascertainmentbiasanduneven
distributionofrecombinationhotspotsalongthegenome
13
Extended haplotype homozygosity (EHH) is a method that has been used in
many animal species including cattle (Pan et al 2013 Qanbari et al 2010
Zhangetal2006)TheEHHmethodwasfirstdevelopedbySabetietal(2002)
for human population analysis The main objective of this methodology is to
identifylongrangehaplotypesinheritedwithoutrecombinationeventsMorein
detailunderaneutralevolutionmodelchangesinallelefrequenciesaredriven
onlybygeneticdriftInthisscenarioanewvariantwouldrequirealongtimeto
reach high frequency in the population and the Linkage Disequilibrium (LD)
surroundingitwoulddecayduetorecombinationevents(Kimura1984)Inthe
caseofpositiveselectionaquickriseinfrequencyofabeneficialmutationina
relatively short time will preserve the original haplotype structure as the
number of recombination events would be limited Therefore a signature of
positiveselectionisdefinedasaregionwithanalleleinlongrangeLDlocatedin
anuncommonlyhighfrequencyhaplotype
One of themost interesting features of thismethod is that it is able to detect
genomic regions that are under recent selection These recent selection
signatures can be therefore used to identify the genomic regions involved in
breedformationInadditionEHHmethoddoesnotrequireanydefinitionofan
ancestralallele(asneededintheintegratedhaplotypescoreiHS(Voightetal
2006) and is suited to be applied to SNP data because it is less sensitive to
ascertainmentbiasthanothermethods(Nielsenetal2007)HowevertheEHH
methodgeneratesa largenumberof falsepositiveandnegative resultsdue to
heterogeneous recombination rates along the genome (Qanbari et al 2010)
Moreover another drawback not specific of EHH but shared by all selection
signaturemethod is the lack of robust inferences able to discern true signals
fromthosegeneratedbychance(Kemperetal2014)
To partially account for these limitations Sabeti et al (2002) developed the
relative extended haplotype homozygosity (rEHH) including an empirical
methodologytoobtainsignificantsignalsTherEHHofaputativecorehaplotype
compares its original EHH valuewith that of other haplotypes at that specific
locus using all other haplotypes as control for local variation in the
recombination rate Therefore it identifies genomic regions carrying variants
14
under selection that are still segregating in the population analysed Although
thesemethodswere conceived for human populations theywere successfully
appliedto livestockspeciessuchaspig(Maetal2012)andcattle(Qanbariet
al2010)
After domestication about 10000 years ago in the fertile crescent taurine
cattle have colonized Europe and Africa and have been selected for different
human needs (Taberlet et al 2011) Particularly in the last centuries the
anthropic pressure led to the formation of hundreds of specialized breeds
adapted to different environmental conditions and linked to local tradition
constitutingagenepooldeservingattentionforconservation(AjmoneMarsanet
al2010)Someofthesebreedshavebeenselectedspecificallyfordairybeefor
bothproductiontypesfollowingstrongartificialselectionforthesetraits(Hayes
etal2009)Thepresentstudyaimsat identifyingsignalsof recentdirectional
selectionusingrEHHmethodindairyandbeefproductiontypesusingdatafrom
five Italian dairy beef and dual purpose cattle breeds We focused on the
significant core haplotypes scanned by rEHH that are shared among breeds of
thesameproductiontypeThenweusedabioinformaticsapproachto identify
positionalcandidategeneslyingwithinthegenomicregionsunderselectionand
investigatetheirbiologicalrole
Methods
SampleAnimalsandGenotyping
A total of 4311 bulls of five Italian dairy beef anddual purpose breedswere
genotyped with the Illumina BovineSNP50 BeadChip v1 (Illumina San Diego
CA)joiningthegenotypingeffortsoftwoItalianprojects(namelyldquoSelMolrdquoand
ldquoProzoordquo)Thedatasetincluded101replicatesand773siresonspairsusedfor
downstreamqualitycheckofdataproducedIndetailgenotypesof2954dairy
(2179 ItalianHolsteinand775 ItalianBrown)864beef (485Marchigianaand
379 Piedmontese) and 493 dual purpose (Italian Simmental) bulls were
availableDataqualitycontrol(QC)wasperformedintwostagesafirststageon
15
animals independently for each breed applying the same methods and
thresholdsandasecondstageappliedonmarkersacrossall theindividualsof
thedatasetThefirststageexcludedindividualswithunexpectedlyhigh(gt02)
Mendelian errors for fatherson couples andwith low call rates (lt95)The
secondstageexcludedi)SNPswithle25missingvaluesinthewholedataset
orcompletelymissing inonebreed ii)SNPswithminorallele frequencyle5
and iii) SNPs in sexual chromosomes and with unassigned chromosome or
physicalposition
EstimationofrEHH
HaplotypeswereobtainedusingfastPHASEsoftwareconsideringdefaultoptions
(ScheetandStephens2006)andwererunbreedwiseandchromosomewisein
eachbreedPedigreeinformationforallbullswereprovidedbytheirrespective
breederassociations andwereused to filteroutdirect relatives (in fatherson
pairsthesonwasmaintainedinthedatasetandthefatherdiscarded)andover
representedfamilies(amaximumof5randomlychosenindividualsperhalfsib
familywasallowed)Thefinaldatasetcontainingtheseldquolessrelatedrdquoanimalswill
behenceforthcalled ldquononredundantrdquodatasetThisnonredundantdatasetwas
used also to calculate within breed pairwise wholegenome Linkage
Disequilibrium(LD)Ther2statisticforallpairsofmarkerswasobtainedusing
PLINK software v107 (Purcell et al 2007)Thedecayof LDwas assessedby
averagingr2valuesupto1Mbdistance
To test if the population structure influenced the detection of rEHH we
replicated all analyses on the whole Italian Holstein dataset without any
population correction (ie ldquoredundantrdquo dataset) focusing on genes or gene
clustersthatarewellknowntobeunderrecentselectionincattle(ieldquocandidate
regionsrdquo) In particularwe focused on the casein polled gene cluster and coat
colourgenes(MC1RandKIT(Kemperetal2014Qanbarietal2010))
The EHH and rEHH calculations were performed using Sweep v11 software
(Sabetietal2002)Somedefaultprogramsettingshadtobemodifiedtoadapt
theseanalyses to thebovinegenomeas thesoftwarewasoriginallydeveloped
forhumangeneticanalysesSpecificallylocalrecombinationratesbetweenSNPs
16
wereapproximatedto1cMperMbEHHandrEHHcalculationswereperformed
breedwise and chromosomewise using an automatic core selection with
defaultoptions(ieconsideringlongestnonoverlappingcoresandlimitingcores
toatleast3andnomorethan20SNPs)asinQanbarietal(2010)AlthoughEHH
and rEHH values were obtained for all core haplotypes only those with
frequency ge 25 were retained for further analyses The (empirical) rEHH
significancethresholdwasobtainedbydividingtherEHHvaluesinto20binsof
5 frequency range each logtransforming withinbin values to achieve
normality Core haplotypes with pvalues lt 005 were considered significant
DetectionofcommonselectionsignatureswasobtainedcomparingrEHHvalues
acrossgenomicregionsacrossbreeds
Breedgroupingaccordingtoproductiontype
Toexamine thesignals common toeachof the twoproduction types (iedairy
andbeef) corehaplotypes thatsharedoneormoreSNP inat least twobreeds
from the same production type were selected The dual purpose Italian
Simmentalwas included inbothdairy (ItalianHolsteinand ItalianBrown)and
beef groups (Piedmontese and Marchigiana) since potentially it possesses
haplotypes selected for both production types All downstream analyses were
performedseparatelyforthedairyandthebeefproductiontypes
Detectionandannotationofcandidategenes
Genome annotation was performed using Biomart
(httpwwwbiomartorgindexhtml) a comprehensive source of gene
annotationformanyspeciesprovidedbyEnsembl(wwwensemblorg)Thelist
ofsharedhaplotypesregions(inbp)fordairyandbeefwasusedasinputfilein
theBiomartwebinterfaceThesetofgenesretrievedbyBiomartwasthenused
as input for theCanonicalpathwayandregulatorynetworkanalyses Ingenuity
PathwayAnalysis toolversion80(IPA IngenuityregSystems IncRedwoodCity
CA httpwwwingenuitycom) and a substantial examination of published
literaturewasusedtoexaminethefunctionalrelationshipsamongtheresulting
genes The IPA operates with a proprietary knowledge database providing
complementarypathwayanalysisforseveralspeciesincludingcattleIntheIPA
17
analyses the significance of each biological function identified was estimated
using Fisherrsquos exact test A Benjamini and Hochberg correction for multiple
testingwas calculated for function and canonical pathways For eachNetwork
IPAcomputedascore(pscore=log10(pvalue))fittingthesetstudiedgenesand
a list of biological functionspresent in the IPAknowledgedatabase The score
considered thenumberofgenes in thenetworkand thesizeof thenetwork to
estimatehowrelevantthisnetworkwasaccordingtothelistofgenesprovided
Thenthescorewasusedtorankthenetworks
Results
DatasetQC
Therepeatabilityobtainedfromthe101replicatespresentinthewholedataset
washigherthan998AfterthetwoQCstages105individualsand9730SNPs
were discarded Moreover after phasing 1292 individuals were removed to
reducethe largenumberofsibfamiliespresent intheredundantdatasetAfter
quality control procedures the final dataset had 44271 SNPs and 1132 514
393 410 and 364 individuals from Italian Holstein Italian Brown Italian
SimmentalMarchigianaandPiedmontesebullsrespectively(Table1)
18
Table1JNumberofanimalsgenotypedbeforeandafterQCTotalnumberofanimalgenotypedandrelativenumberofanimaldiscardedafterQCanalysis
Detectionofselectionsignatures
Sweepv11 softwaredetectedrEHHsignalswitha frequencyhigher than25overlapping the test candidate regions in both datasets corrected or not forpopulation structure In the redundant dataset we found only one significantrEHHsignal(caseincluster)whereasinthenonredundantdatasetweidentified5significantrEHHsignalsoverlappingallthetestcandidateregionsconsidered(caseinclusterpolledclusterMC1R)(Table2)
BreedTotal
Genotyped
ED15
misAN
ED1
REPL
ED1
MENDCleaned
HOL 2179 40 31 5 2093
BRW 775 6 16 4 749
SIM 493 6 6 2 479
MAR 485 37 38 410
PIE 379 5 10 364
19
Table2JComparisonofcandidateregionrEHHinnonJredundantandredundantdatasetrEHH overlapping candidate regions in both corrected (nonredundant) and not corrected
(redundant)datasetsforpopulation
Redundant NonRedundant
Candidate
geneBTA1
Closest
SNP(bp)CH2range CH2Frequency
rEHHJ
log(pval)CH2Frequency
rEHHJ
log(pval)
POLLED
Cluster1 1981154 18974181981154 H1079 093037 H1078 167174
MC1R 18 13657912 1331772014007505 H1069 120096 H1069 077167
KIT_BOVIN 6 72821175 7250492172821175 H1054 030030 H1054 033048
Casein
Cluster6 88427760 8835009888452835 H1047 094139 H1046 148133
Casein
Cluster6 88427760 8835009888452835 H2032 002003 H2033 002003
In total5526567847724388and4049corehaplotypeswitha frequency
higher than 25 were detected on Italian Holstein Italian Brown Italian
Simmental Marchigiana and Piedmontese breeds respectively A total of 838
866 740 692 and 613 core haplotypes were found significant outliers (see
Materials and Methods) in the aforementioned breeds Table 3 shows the
distribution of the total and significant core haplotypes per chromosome and
breed
20
Tableamp3amp(ampCoreamphaplotypesampdistributionampDistributionoftotalandsignificantcorehaplotypesperchromosomeandbreed
ampamp ItalianampampHolsteinamp
ItalianampampBrownamp
ItalianampSimmentalamp Marchigianaamp Piedmonteseamp
BTA1amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
1amp 389 66 389 68 335 49 310 51 288 46
2amp 312 47 314 52 267 44 257 42 238 33
3amp 292 48 313 62 264 46 236 34 227 34
4amp 268 46 279 50 245 41 237 39 229 39
5amp 223 32 231 29 200 30 183 27 171 22
6amp 291 51 307 40 265 45 252 38 233 42
7amp 249 37 253 46 214 39 195 33 204 27
8amp 268 39 270 38 235 35 223 32 209 31
9amp 227 37 228 41 210 37 165 22 187 30
10amp 219 36 234 34 192 25 196 26 168 25
11amp 248 39 246 34 209 30 205 33 187 25
12amp 162 24 176 18 148 21 148 25 135 22
13amp 205 32 200 19 155 18 178 35 135 19
14amp 175 17 189 25 168 29 143 22 129 19
15amp 180 31 185 29 148 23 139 13 130 19
16amp 176 28 192 29 160 25 149 19 127 15
17amp 170 27 183 29 168 31 135 22 123 16
18amp 133 18 153 19 119 14 108 10 82 12
19amp 137 22 130 23 118 16 99 18 92 15
20amp 184 35 172 33 145 23 124 23 124 17
21amp 145 23 147 30 113 18 110 21 94 13
22amp 140 24 145 19 115 21 112 18 104 13
23amp 106 13 100 16 67 10 64 9 57 8
24amp 129 11 140 20 124 17 97 23 101 16
25amp 91 12 98 11 68 10 77 12 70 6
26amp 125 7 114 14 93 12 96 15 90 16
27amp 91 12 91 15 75 11 84 12 60 10
28amp 88 9 96 10 70 10 66 9 55 11
29amp 103 15 103 13 82 10 72 9 67 12
TOTamp 5526 838 5678 866 4772 740 4388 692 4049 613
1BostaurusAutosomes2Detectedcorehaplotypeswithafrequencyinthebreedhigherthan253Significantcorehaplotypesatplt005
Comparisonamptoamppreviouslyampreportedampdataamp
AnumberofstudieshavesearchedselectionsweepsinHolstein(Qanbarietal
2010) Brown (Qanbari et al 2011) and Simmental (Fan et al 2014) breeds
Since different methods are expected to identify signatures having different
characteristicsthecomparisonwithpreviousstudiesislimitedtothosestudies
using thesamemethodand thesamebreed(s)analysed inourstudyOnlyone
study analysed (German) HolsteinRFriesian cattle with rEHH (Qanbari et al
2010)Qanbarietal(2010)reportedcandidategenesandgeneclustersunder
recent positive selection that we compared with our rEHH in dairy breeds
dataset(Table4)
Table4JCorehaplotypesincandidategenesfollowingQanbarietal2011
1BosTaurusChromosome2Corehaplotype
Thenumberofcorehaplotypesfoundinourstudyinthecandidateregionswas
lowerpreviouslyreportedWhenconsideringHolsteinsthetwomostsignificant
candidateregionsinbothstudiesmatch(caseinclusterandthesomatostatinSST
gene)althoughitisimpossibletodetermineifthehaplotypeunderselectionis
the same as this information was not provided in Qanbari et al (2010)
However other genes considered significant in Qanbari et al (with pvalues
ranging from 004 to 010) were not significant in our studyWhen using the
same loose significant threshold (pvalues le 010) ldquosignificantldquo signals were
foundalsoinItalianBrownfortheCaseinCluster(log10pvalue=109)andin
ItalianSimmentalfortheSSTgene(log10pvalue=126)
HOLSTEIN BROWN SIMMENTAL
Candidate
geneBTA1
Closest
SNP
(bp)
CH2rangeCH2
Frequency
rEHH
Hlog(p)CH2range
CH2
Frequency
rEHH
Hlog(p)CH2range
CH2
Frequency
rEHH
Hlog(p)
DGAT1 14 444963236653443936
H1057 011443936763332
H1042 0130007
Casein
Cluster6 88391612
8835009888452835
H1046 1481338832601288452835
H2068 0801098835009888452835
H1044 037022
8835009888452835
H2033 018030
8835009888452835
H3034 022045
GH 19 49652377 GHR 20 33908597
SST 1 813769568128358581376961
H1036 200189 8131845181376961
H1042 126053
8128358581376961
H2029 00630084 8131845181376961
H2042 006027
IGFH1 5 71169823
ABCG2 6 37374911 3731702038256889
H1044 031027
3731702038256889
H1040 029031
Leptin 4 95715500
LPR 3 855692038549710885594551
H1047 0910728549710885794693
H1068 0800638549710885794693
H1063 002005
8549710885594551
H2041 027025
PITH1 1 35756434
22
Signaturessharedbydairyandbybeefbreeds
Significantcorehaplotypeswerealignedacrossbreedsto identifythoseshared
among dairy or beef production types Since breeds can be considered
independentsetsofobservationssharedsignaturesaremorelikelytorepresent
real effects rather than false positives A total of 123 and 142 significant core
haplotypesweresharedbetweenatleasttwodairyorbeefbreedsrespectively
(File S1) Only 82 and 87 of those significant core haplotypes contained genes
considered positional candidates under positive selection In total 22 and
17 of the genomewas under selection for dairy and beef production types
respectivelyTherEHHaveragelengthswere216932and190994bpfordairy
andbeefrespectively
Genesetannotationandnetworkpathwayanalysis
Atotalof244and232annotatedgenesfellwithintheselectedregionsfordairy
andbeefproductiontypesrespectively(Table5)
23
Table5JStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreedsStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreeds
DairyCattle BeefCattle
BTA SelSig
n
Genes1 Sum
SelSignsize
(bp)
Mean
SelSign
size(bp)
SelSi
gn
Genes1 Sum
SelSignsize
(bp)
Mean
SelSign
size(bp)
1 7 11 1521608 138328 7 13 2263213 174093
2 5 9 2216685 246298 8 11 2658549 2416863 2 5 1619336 323867 7 21 3755847 1788504 7 21 5956755 283655 6 11 1761288 1601175 4 17 4506119 265066 5 11 1251127 1137396 2 4 1467738 366934 5 12 5149692 4291417 6 30 4842969 161432 3 28 11889191 4246148 0 0 0 0 4 12 3512215 2926859 4 7 1554863 222123 2 6 1106768 18446110 4 17 8839762 519986 2 3 573138 19104611 6 8 1841802 230225 1 1 100439 10043912 2 8 4244133 530517 1 1 536461 53646113 3 8 1933026 241628 2 8 985376 12317214 0 0 0 0 3 9 692397 7693315 5 9 1415405 157267 2 2 189094 9454716 3 6 1302506 217084 4 6 905719 15095317 3 4 471046 117762 2 9 1551010 17233418 0 0 0 0 3 23 2961718 12877019 3 23 5744143 249745 2 2 116938 5846920 1 2 148886 74443 2 4 627971 15699321 3 7 1930206 275744 1 5 1150760 23015222 3 5 620674 124135 3 10 2411751 24117523 0 0 0 0 1 1 68421 6842124 2 18 9166004 509222 2 4 803770 20094225 2 11 2016602 183327 1 3 329199 10973326 2 4 466134 116534 4 7 776080 11086927 1 1 631792 631792 2 3 1004717 33490628 0 0 0 0 1 1 93464 9346429 2 9 935209 103912 1 5 798300 159660TOT 82 244 65393403 216932 87 232 50024613 190994
1GenesidentifiedGenesintheregionssharedbyallthreedairybreeds(8genes)orallthreebeefbreeds(11genes)foramoredetailedanalysiswereselected(Fig1)
24
Figure1JGenomiclocationoftheselectionsignaturessharedamongstudiedbreedsa)GenesinensembletracksaredisplayedasaredboxcorehaplotypesandSNPsarecolouredinorange(MarchigianaMAR)inviolet(PiedmontesePIE)andpink(SimmentalSIM)b)Genesinensemble tracks are displayed as a red box core haplotypes and SNPs are coloured in blue(HolsteinHOL)ingreen(ItalianBrownBRW)andpink(SimmentalSIM)
All the genes identified were submitted to downstream pathwaynetwork
analyses(Table5Figure1FileS1)Interestinglygenesintheregionssharedby
allthreedairyorbeefbreedsweremanifestedinthemostlikelynetworksforthe
two production aptitudes The IPA analysis detected a total of 12 and 15
networksindairyandbeefbreedsrespectively
Indairybreeds network scores ranged from46 to2 (File S2) Eightnetworks
obtainedscoreshigherthan20andincludedmorethan14moleculeseachThey
wereassociatedtoi)geneexpressionRNAdamageandrepairproteinsynthesis
andposttranslationalmodificationii)cellassemblyorganizationfunctionand
maintenance iii) cell cycle proliferation and cancer iv) carbohydrate
metabolism v) gastrointestinal and neurological disease developmental and
endocrine system disorders A total of 23moleculeswere associatedwith the
highest ranked network (score = 46) In this the immunoglobulins mitogen
25
activatedproteinkinasesandNFkBcomplexesformacentralhublinkinggenes
involved in celltocell signaling and interaction hematological system
developmentandfunctionandimmunecelltraffickingfunctions(Figure2)
Figure2JThemostlikelyIPAnetworkindairycattlebreedsThefunctionsassociatedtothenetworkarecelltocellsignallingandinteractionhematologicalsystem development and function immune cell trafficking Nodes in red correspond to genesidentifiedincorehaplotypesoverlappinginallthethreebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Breast cancer antiNestrogen resistance gene 3 (BARC3) andpituitary glutaminyl
cyclase gene (QPCT) genes were common to all three dairy breeds and are
directlyconnectedwithmammaryglandmetabolisms(Ezuraetal2004GSun
et al 2012) The solute carrier family 2 member 5 (SLC2A5) gene facilitates
glucosefructose transport (Zhao et al 1998) and the zetaNchain (TCR)
26
associated protein kinase 70kDa (ZAP70) gene plays a critical role in Tcell
signaling (Bonnefont et al 2011) Calpain is another important complex that
togetherwithcalpainN3(CAPN3)mediatesepithelialcelldeathduringmammary
gland involution (Wilde et al 1997) Furthermore RAS guanyl nucleotide
releasingprotein(RASGRP1)activatestheErkMAPkinasecascaderegulatesT
cellsandBcellsdevelopmenthomeostasisanddifferentiationandisinvolvedin
the regulation of breast cancer cell (Madani et al 2004 Yasuda et al 2007
Wickramasingheetal2009)Genesinvolvedintheothernetworksarelistedin
FileS2
The scoreof thebeef cattlebreedsnetwork rangedbetween46and2 and the
numberofmoleculespresentineachnetworkrangedfrom23to1(FileS3)Asin
dairy breeds eight networks obtained a score higher than 20 and 13 ormore
molecules each These networkswere associated to i) carbohydrate lipid and
nucleic acidsmetabolism energyproduction and smallmoleculebiochemistry
ii) cardiovascular system development and function iii) cellular and tissue
developmentcellassemblyandorganizationfunctionandmaintenanceandcell
and organmorphology iv) cell cycle celltocell signaling and interaction v)
DNA replication recombination and repair and RNA posttranscriptional
modification vi) dermatological diseases and conditions infectious disease
organismal injury and abnormalities In the network with the highest score
(score=46)chondroitinsulfateproteoglycan4(CSPG4)andsnurportinN1(SNUPN)
genes were shared among all beef cattle breeds investigated CSPG4 gene is
relatedtomeattendernesswhileSNUPN isanimprintedgeneimportantinthe
embryodevelopmentandinvolvedinhumanmuscleatrophy(Narayananetal
2002) (Figure 3) All the genes included in this network were involved in
carbohydratemetabolismsmallmoleculesbiochemistryandcellcycleTheother
networkswere related to important signaling andmetabolic process essential
foranimalgrowth(FileS3)
27
Figure3JThemostlikelyIPAnetworkinbeefcattlebreedsThe functions associated to the network are carbohydrate metabolism small molecule
biochemistry and cell cycle Nodes in red correspond to genes identified in core haplotypes
overlapping in all the three breeds of each production type whereas those in green depict
overlappingcorehaplotypesinatleasttwoofthosebreeds
A total of 6 and 9 statistically significant Canonical Pathways (FDR le 005
log10(FDR) ge 13) were identified using IPA for dairy and beef breeds
respectively(Figure4)
28
Figure4JBarplotofstatisticallysignificantCanonicalPathwaysPvalues were corrected for multiple testing using BenjaminiHochberg method and arepresented in thegraphas log(pvalue)Thebar represents thepercentageof genes in a givenpathway that meet cutoff criteria within the total number of molecules that belong to thefunctiona)BarplotofstatisticallysignificantCanonicalPathways indairycattlebreedsb)BarplotofstatisticallysignificantCanonicalPathwaysinbeefcattlebreeds
Themost significant Canonical Pathway in dairywas for purinemetabolism (
log10(FDR) = 26) supporting the highly synthetic processes in the mammary
epithelium(Figure5)
29
Figure5JGenesdetectedunderrecentpositiveselectionindairycattleinvolvedinpurinemetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Inbeefproductiontheephrinreceptorsignal(log10(FDR)=27)wasthemost
significant Canonical Pathway (Figure 6) Among other functions ephrin is
knowntopromotemuscleprogenitorcellmigrationbeforemitoticactivation(Li
andJohnson2013)AlltheotherCanonicalPathwaysarereportedinFileS4
30
Figure6JGenesdetectedunderrecentpositiveselectioninbeefcattleinvolvedinEphrinmetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Discussion
In this work more than 4000 genotyped bulls of five dairy beef and dualpurposeItalianbreedswerescreenedsearchingforselectionsignaturesindairyand beef production types A strict data quality controlwas applied to reducepossiblesourcesofbiassuchastechnicalgenotypingproblemsandpopulationstructurethatcouldhavelargeimpactonrEHHresultsInfactsincetheimpactofpopulationstructureonrEHHvaluesisstillunexploredwereplicatedpartofouranalyseswithandwithoutcloserelativesinanattempttostudytheeffectofpopulationstructureonthisselectionsignaturemethodInprinciplepopulationstratificationshouldleadtoanoverrepresentationofcertainhaplotypessimplyduetothepresenceoftightpedigreelinks(egsirespassinghalfoftheirgeneticmaterialtotheirsons)ratherthanactualselectionprocessesForthisreasonfor
31
the full analyses all siresons couples were removed after haplotype phasing(retaining only the younger animal) and halfsib familieswere restricted to amaximum of five randomly chosen individuals in an attempt to reduce overrepresentation This assessment was restricted to four regions known to beunderselectioninItalianHolsteinasthecaseinclusterthepolledgeneclusterMC1R and KIT Italian Holstein was selected for two reasons i) it is a highlystructuredbreedaccordingtoourdata(abouthalfofindividualsremovedwereclose relatives) ii) allowed us to compare our results with a previous studyAlthoughtheanalysesconductedonbothredundantandnonredundantdatasetswereable to identify rEHHsignals in these regions thenonredundantdatasetproduced five significant rEHH signals rather than just one in the redundantdataset(Table2)Thisresulthighlightstheconfoundingeffectofthepresenceofcloserelativesinthedatasetandconsequentlytheimprovedabilitytodetectedsignificant selection signature while correcting for population structureInterestingly our results only partially overlap those found by Quanbari et al(2010) These authors detected significant signals at the casein cluster andsomatostatinSSTaswedid but also at ahighernumberof candidate regionsThese inconsistencies may be the effect of different sires included in theanalyses of the different dataset size and to the close relative trimmingprocedure we adopted to decrease the effect of population structure andconsequent bias (Table 4) However poor correlation across investigations isoftenobservedalsointhedeeplyinvestigatedhumanspeciesThisisduetotheuse of different within and between breed statistics that identify selectionsignatureshavingdifferent characteristics (ancientrecent segregatingfixatedunder directionalbalancing selection) to the likely high rate of falsepositivesnegativesresultsandalsotothedifferentconsiderationofpopulationstructureandbackgroundselection(Enardetal2014)The number of total and significant core haplotypes identified by the SweepsoftwarewasthehighestinItalianBrownandthelowestinPiedmontesebreedSinceEHHisheavilybasedonpopulationLDtheaverageLDleveloverdistancewascalculatedineachbreed(Figure7)
32
Figure7JMultiJbreedaveragelinkagedisequilibriumagainstphysicaldistance(inkb)Marchigiana Piedmontese and Italian Simmental breeds present a lower persistence of LD allargerdistancethanItalianHolsteinandItalianBrownbreedsThisanalysishighlightedageneralpositivecorrelationbetweenthe levelofLDover distance and the number of total and significant core haplotypes foundHoweverconsideringthatrEHHisarelativemeasure it ismorelikelythatthehighernumberofsignificantcorehaplotypesidentifiedindairybreedswasduemostlytoahigherselectivepressureindairycomparedtobeefbreedsSignificance tests used in detecting selection signatures should measure theprobability of a statistic to be anoutlier compared to its expecteddistributionunder a neutral model However no reliable neutral model has so far beendevelopedincattlebecauseofthecomplexityofthedemographichistoryofthisspecies (Kemper et al 2014) As a consequence empiric rather than modelbasedsignificancetestaregenerallyusedinthedetectionofselectionsignaturesAccordinglyweconsideredasoutliersvaluesas thosesimply falling in the5plusvarianttailoftherEHHdistributionWekeptthewithinbreedsignificancethresholdloosenoncorrectingitformultipletestingbutconsideredonlysignals
33
eitherexceedingthisthresholdbyatleasttwoordersofmagnitudeorsharedby
twoormoreindependentbreeds
Parallel comparison of results from independent analyses of different breeds
allowed the achievement of a double objective i) to identify putative regions
under(recent)selectioninbreedsbelongingtotwoproductiontypeswhichwas
the main objective of this study and ii) reducing false positives since multi
breed analyses served as internal control Since the rEHH method does not
considerphenotypicinformationasignificantsignalmightarisebecausei)the
core haplotype is actually under a selective pressure ii) the result is a false
positive ie causedby chance population structureor anyotherdriving force
However even considering an unrealistic scenario with no false positives a
proportionofthetotalamountofsignalswouldactuallybeselectionsignatures
due to a selectionpressurenot ondairy or beef traits This becausedairy and
beefbreedsinvestigatedareonlyafewandshareanumberoftraitsnotdirectly
related to dairybeef production Even considering this limitation to our
knowledge this is the first dairy and beef multibreed study applying this
strategytoreducefalsepositivesatthecostofpossiblelossofinformationdue
tohighfalsenegativeratesSignificantsignalsharedbydairyandbybeefbreeds
wereused fordownstreamnetworkanalysesonpositional candidategenes to
search forabiological justification towhatwasobtainedstudying thegenomic
information Only the top network for dairy and the top network for beef
productionarediscussedindetail
Dairybreeds
ThemostsignificantnetworkdetectedindairybreedsisassociatedwithCellTo
CellSignalingandInteractionHematologicalSystemDevelopmentandFunction
andImmuneCellTraffickingfunctions(Figure2)ImmunoglobulinandNFkBact
aspivotalgeneswithinthisnetworkInterestinglyBARC3andQPCTgeneswere
sharedamongallthethreedairybreedsBothgeneshavenotyetbeenstudiedin
cattlealthoughinhumanstudiesthesegenesarewellknowntobelinkedtothe
mammaryglandmetobolismandthecalciumregulationTheformerisinvolved
in integrinmediated cell adhesions and signaling which is required for
mammary gland development and functions (G Sun et al 2012) The latter is
34
associatedwithlowradialbodymineraldensity(BMD)inadultwomen(Ezuraet
al2004)AnotherimportantplayeronthisnetworkisSLC2A5genealsoknown
asGLUT5 SLC2A5 gene acts as fructose transporter in the intestine and has a
significantroleinenergybalanceofdairycows(Burantetal1992)Findingthis
gene in adairy cattle specificnetwork seems surprising since there shouldbe
little need for transporting glucose andor fructose in the ruminant intestinal
tractassimplecarbohydratesaredegradedintovolatilefattyacids(VFA)inthe
rumen (Iqbal et al 2009) However it is often the case that large amount of
starchbypasstherumenincowsfedwithdietsrichincerealgrains(Zhaoetal
1998)Thisbypassedstarchneedstobedigestedinthesmallintestinetoavoid
highlevelsofglucoseconcentrationinthelargeintestine
WedetectedcalpaincomplexandcalpainN3(CAPN3)genesunderpositiverecent
selection in the dairy group as reported also by Utsunomiya et al (2013)
AltoughCalpainisknowntobeinvolvedinpostmortemmeattendernizationitis
also related to dairy metabolism as the muscle breakdown promoted by the
calpain gene provides an energy source for milk production especially at the
beginningoflactation(Kuhlaetal2011)InadditionasreportedbyWildeetal
(1997) calpainNcalpastatin system is related to programmed cell death of
alveolar secretory epithelial cells during lactation Zap70 gene is a cytoplasmic
proteintyrosinekinaseItisrelatedtotheimmunesystemasacomponentofthe
Tcell receptor playing a central role in Tcell responses (Wang et al 2010)
Bonnefontet al (2011) found thatZap70 genewasup regulated (FoldChange
(FC)=82FDR=277E12)onmilksomaticcellsofsheepinfectedbySaureus
andSepidermidisshowinganassociationwithmastitisresistance
Purinemetabolismwasthemostsignificantcanonicalpathwayindairybreeds
In a study conducted in human mammary epthelium Maningat et al (2008)
conductedageneexpressionanalysisonbreastmilkfatglobulesidentifyingthe
PurinemetabolismasthemostsignificantpathwaySynthesisandbreakdownof
purine is essential in the metabolism of the tissues of many organisms
particularlyofthemammaryglandduringlactation
Another interesting canonical pathway was endothelin signaling (Figure 5)
endothelin functions as vasocontrictor and is secreted by endothelial cells
35
(Nussdorferetal1999)Acostaetal(1999)reportedthatincattleendothelinsareinvolvedinthefollicularproductionofprostaglandinsandtheregulationofsteroidogenesis in the mature follicle In a recent study Puglisi et al (2013)confirmed the implication of endothelin in particular EDNRA and endothelinconvertin enzyme 1 in reproductive disorder in bovine They classified theEDNRAaspotentialbiomarkerforfertilityincow
Beefbreeds
Inbeefcattlebreedsthefocaladhesionkinase(FAK)togetherwithintegrinandERK12genesplayacentralroleashubsinthemostsignificantnetwork(Figure3)Theyareinvolvedincellularadhesionandcellmotilityprocessesparticularlyduringskeletalmyogenesis(Nguyenetal2014)Moreovertheoverexpressionof FAK can determine a reduced apoptosis and an increase risk of metastaticcancers(MehlenandPuisieux2006)Thesehubsarenotdirectlyinvolvedinourstudybut sharemany interactors thatare located in region found tobeunderselction such as CSPG4 RB1CC1 MGAT3 CIRPB CSPG4 gene belongs tochondroitin sulfate proteoglycans (CSPGs) family CSPGs are proteoglycansconsistingofaproteincoreandachondroitinsulfatesidechainTheyareknownto be structural components of a variety of tissues includingmuscle andplaykey roles inneuraldevelopmentandglial scar formationTheyare involved incellular processes such as cell adhesion cell growth receptor binding cellmigrationandinteractionswithotherextracellularmatrixconstituents
Manystudiesreporttheroleofproteoglycansinmeattextureofseveralmusclesfromthebovinespecies(Nishimuraetal1996)Dubostetal(2013)highlightadirect involvement of proteoglycans with respect to cooked meat juicinessRB1CC1geneplaysacrucialroleinmusculardifferentiationanditsactivationisa essential for myogenic differentiation (Watanabe et al 2005 p 1)Monoacylglycerol acyltransferase (MGAT3) gene cause the synthesis ofdiacylglycerol (DAG)using2monoacylglyceroland fattyacyl coenzymeAThisenzymaticreactionis fundamental fortheabsorptionofdietaryfat inthesmallintestineSunetal(2012p3)reportedinastudyonfiveChinesecattlebreedsthatMGAT3gene isassociatedwithgrowthtraitsCIRPBgenemaybepartofacompensatorymechanism inmuscles undergoing atrophy It preservesmuscle
36
tissuemassduringcoldshockresponsesaginganddisuse(DupontVersteegden
et al 2008) SNUPN is an imprinted gene and is expressed monoallelically
dependingonitsparentaloriginSNUPNplayimportantrolesinembryosurvival
andpostnatal growth regulation (Smithet al 2012Wanget al 2012)Ephrin
receptorsignalingwasthetopcanonicalpathwayproducebyIPAandhasalsoan
interestingbiologicalinterpretationformeatproduction(Figure6)Indeedthis
pathaway is important for the muscle tissue growth and regeneration
participating in correct positioning and formation of the neural muscular
junction(LiandJohnson2013)
Conclusions
In this study we analysed the selection signatures at genomewide level in 5
Italian cattle breeds Then we used a multibreed approach to identify the
genomicregionssharedamongcattlebreedsselectedfordairyorbeefpurpose
Thisapproachincreasedthepowertopinpointregionsofthegenomethatplay
important roles in economically relevant traits in both dairy and beef cattle
Moreover network and pathways analyses were used to display a detailed
functional characterization of genes mapping in the regions detected under
recentpositiveselection
Particularly in dairy cattle genes under directional selection are related to
feeding adaptation (increasing level of starch in the diet) mammary gland
metabolismandresistencetomastitisThebiologicalfeaturesunderselectionin
beef cattle breeds are involved in animal growth meat texture and juiceness
These results have a logical biological explanation However the frequent
approach of interpreting selection signatures on the basis of the function of
geneslocatednearbythepeaksignalofthestatisticusedispresentlychallenged
Novelinformationinhumanssuggeststhatmostselectedvariationisnotlocated
within genes and coding regions but in regulatory sites identified by the
ENCODEproject(Enardetal2014)Thesemaycontroltheexpressionofentire
genomicregionsorofgeneslocatedatarelavantdistancefromtheselectedsite
makingbiologicalinterpretationmorecomplex
37
InanycasefurtherstudiesusingdenserSNPchipsorwholegenomesequencing
producinginformationnotsubjectedtoascertainmentbias(Qanbarietal2014)
may increase the resolution of our analysis and togetherwith new knowledge
arisingonthecontrolofgeneexpressionvalidateorcorrectourresults
Acknowledgements
ANAFI (Italian Association of Holstein Friesian Breeders) ANAPRI (Italian
Association Simmental Breeders) ANABORAPI (National Association of
PiedmonteseBreeders)ANABIC(AssociationofItalianBeefCattleBreeders)and
ANARB (Italian Brown Cattle Breeders Association) are acknowledged for
providing the biological samples and the necessary information for this work
The Authors are also grateful to SelMol project (MiPAAF Ministero delle
PoliticheAgricoleAlimentarieFortestali)andbytheProZooproject(Regione
LombardiandashDGAgricoltura Fondazione Cariplo Fondazione Banca Popolare di
Lodi)forfunding
Thefundershadnoroleinstudydesigndatacollectionandanalysisdecisionto
publishorpreparationofthemanuscript
References
Acosta TJ Berisha B Ozawa T Sato K Schams D Miyamoto A 1999
Evidence for a Local EndothelinAngiotensinAtrial Natriuretic Peptide
SysteminBovineMature Follicles In Vitro Effects on SteroidHormones
and Prostaglandin Secretion Biol Reprod 61 1419ndash1425
doi101095biolreprod6161419
AjmoneMarsan P Garcia JF Lenstra JA 2010 On the origin of cattle how
aurochs became cattle and colonized theworld Evol Anthropol Issues
NewsRev19148ndash157
38
BonnefontCToufeerMCaubetCFoulonETascaCAurelMRBergonier
D Boullier S RobertGranieacute C Foucras G 2011 Transcriptomic
analysisofmilksomaticcells inmastitisresistantandsusceptiblesheep
upon challenge with Staphylococcus epidermidis and Staphylococcus
aureusBMCGenomics12208
BurantCFTakedaJBrotLarocheEBellGIDavidsonNO1992Fructose
transporter inhumanspermatozoaandsmall intestine isGLUT5 JBiol
Chem26714523ndash14526
DubostAMicolDPicardBLethiasCAnduezaDBauchartDListratA
2013Structuralandbiochemicalcharacteristicsofbovineintramuscular
connective tissue and beef quality Meat Sci 95 555ndash561
doi101016jmeatsci201305040
DupontVersteegden EE Nagarajan R Beggs ML Bearden ED Simpson
PMPetersonCA2008IdentificationofcoldshockproteinRBM3asa
possible regulator of skeletal muscle size through expression profiling
AJP Regul Integr Comp Physiol 295 R1263ndashR1273
doi101152ajpregu904552008
Enard D Messer PW Petrov DA 2014 Genomewide signals of positive
selection in human evolution Genome Res gr164822113
doi101101gr164822113
EzuraYKajitaMIshidaRYoshidaSYoshidaHSuzukiTHosoiTInoue
SShirakiMOrimoHEmiM2004Associationofmultiplenucleotide
variationsinthepituitaryglutaminylcyclasegene(QPCT)withlowradial
BMDinadultwomenJBoneMinRes191296ndash301
Fan H Wu Y Qi X Zhang J Li J Gao X Zhang L Li J Gao H 2014
Genomewide detection of selective signatures in Simmental cattle J
ApplGenet1ndash9doi101007s1335301402006
HayesBJChamberlainAJMaceachernSSavinKMcPartlanHMacLeodI
SethuramanLGoddardME2009Agenomemapofdivergentartificial
selectionbetweenBostaurusdairycattleandBostaurusbeefcattleAnim
Genet40176ndash184doi101111j13652052200801815x
Hayes BJ Lien S Nilsen H Olsen HG Berg P Maceachern S Potter S
Meuwissen THE 2008 The origin of selection signatures on bovine
39
chromosome 6 Anim Genet 39 105ndash111 doi101111j1365
2052200701683x
IqbalSZebeliQMazzolariABertoniGDunnSMYangWZAmetajBN
2009 Feeding barley grain steeped in lactic acid modulates rumen
fermentation patterns and increases milk fat content in dairy cows J
DairySci926023ndash6032doi103168jds20092380
KemperKESaxtonSJBolormaaSHayesBJGoddardME2014Selection
for complex traits leaves littleornoclassic signaturesof selectionBMC
Genomics15246doi1011861471216415246
KimuraM1984Theneutraltheoryofmolecularevolution
KuhlaBNuumlrnbergGAlbrechtDGoumlrsSHammonHMMetgesCC2011InvolvementofSkeletalMuscleProteinGlycogenandFatMetabolismin
the Adaptation on Early Lactation of Dairy Cows J Proteome Res 10
4252ndash4262doi101021pr200425h
Lenstra JAGroeneveld LF EdingHKantanen JWilliams JL Taberlet P
NicolazziELSolknerJSimianerHCianiEGarciaJFBrufordMW
AjmoneMarsan P Weigend S 2012 Molecular tools and analytical
approaches for the characterization of farm animal genetic diversity
AnimGenet43483ndash502
Li J Johnson SE 2013 EphrinA5 promotes bovine muscle progenitor cell
migration before mitotic activation J Anim Sci 91 1086ndash1093
doi102527jas20125728
Ma YL Zhang Q Ding XD 2012 [Detecting selection signatures on X
chromosome in pig through high density SNPs] Yi Chuan Hered
ZhongguoYiChuanXueHuiBianJi341251ndash1260
Madani S Hichami A CharkaouiMalki M Khan NA 2004 Diacylglycerols
Containing Omega 3 and Omega 6 Fatty Acids Bind to RasGRP and
Modulate MAP Kinase Activation J Biol Chem 279 1176ndash1183
doi101074jbcM306252200
Maningat PD Sen P Rijnkels M Sunehag AL Hadsell DL Bray M
Haymond MW 2008 Gene expression in the human mammary
epitheliumduring lactation themilk fat globule transcriptome Physiol
Genomics3712ndash22doi101152physiolgenomics903412008
40
Mehlen P Puisieux A 2006Metastasis a question of life or deathNat Rev
Cancer6449ndash458doi101038nrc1886
NarayananUOspinaJKFreyMRHebertMDMateraAG2002SMNthe
Spinal Muscular Atrophy Protein Forms a PreImport Snrnp Complex
withSnurportin1andImportin HumMolGenet111785ndash1795
Nguyen N Yi JS Park H Lee JS Ko YG 2014 Mitsugumin 53 (MG53)
LigaseUbiquitinatesFocalAdhesionKinaseduringSkeletalMyogenesisJ
BiolChem2893209ndash3216doi101074jbcM113525154
NielsenRHellmannIHubiszMBustamanteCClarkAG2007Recentand
ongoing selection in the human genome Nat Rev Genet 8 857ndash868
doi101038nrg2187
NishimuraTHattoriATakahashiK1996Relationshipbetweendegradation
of proteoglycans andweakening of the intramuscular connective tissue
duringpostmortemageingofbeefMeatSci42251ndash260
Nussdorfer GG Rossi GPMalendowicz LKMazzocchi G 1999Autocrine
ParacrineEndothelinSysteminthePhysiologyandPathologyofSteroid
SecretingTissuesPharmacolRev51403ndash438
OrsquoBrienAMPUtsunomiyaYTMeacuteszaacuterosGBickhartDM LiuGE Tassell
CPV Sonstegard TS Silva MVD Garcia JF Soumllkner J 2014
Assessing signatures of selection through variation in linkage
disequilibrium between taurine and indicine cattle Genet Sel Evol 46
19doi101186129796864619
Pan D Zhang S Jiang J Jiang L Zhang Q Liu J 2013 GenomeWide
DetectionofSelectiveSignatureinChineseHolsteinPLoSONE8e60440
doi101371journalpone0060440
PuglisiRCambuliCCapoferriRGianninoLLukajADuchiRLazzariG
Galli C Feligini M Galli A Bongioni G 2013 Differential gene
expressionincumulusoocytecomplexescollectedbyovumpickupfrom
repeat breeder and normally fertile Holstein Friesian heifers Anim
ReprodSci14126ndash33doi101016janireprosci201307003
Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender D
MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKa
41
tool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795
Qanbari S Gianola D Hayes B Schenkel F Miller S Moore S Thaller GSimianer H 2011 Application of site and haplotypefrequency basedapproachesfordetectingselectionsignaturesincattleBMCGenomics12318doi1011861471216412318
Qanbari S Pausch H Jansen S Somel M Strom TM Fries R Nielsen RSimianer H 2014 Classic Selective Sweeps Revealed by MassiveSequencinginCattlePLoSGenet10doi101371journalpgen1004148
Qanbari S Pimentel ECG Tetens J Thaller G Lichtner P Sharifi ARSimianerH2010AgenomewidescanforsignaturesofrecentselectioninHolsteincattleAnimGenetdoi101111j13652052200902016x
Sabeti PC Reich DE Higgins JM Levine HZ Richter DJ Schaffner SFGabriel SB Platko JV Patterson NJ McDonald GJ Ackerman HCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash7
ScheetPStephensM2006Afastandflexiblestatisticalmodelforlargescalepopulation genotype data applications to inferring missing genotypesandhaplotypicphaseAmJHumGenet78629ndash44
Shahzad K Loor JJ 2012 Application of topdown and bottomup systemsapproaches in ruminantphysiologyandmetabolismCurrGenomics13379
SmithLSuzuki JGoffAFilionFTherrien JMurphyBKohanGhadrHLefebvreRBrisvilleABuczinskiSFecteauGPerecinFMeirellesF2012DevelopmentalandEpigeneticAnomaliesinClonedCattleReprodDomestAnim47107ndash114doi101111j14390531201202063x
Sun G Cheng SYS Chen M Lim CJ Pallen CJ 2012 Protein TyrosinePhosphatase Phosphotyrosyl789 Binds BCAR3 To Position Cas forActivationatIntegrinMediatedFocalAdhesionsMolCellBiol323776ndash3789doi101128MCB0021412
Sun J Zhang C Lan X Lei C ChenH 2012 Exploring polymorphisms andassociationsofthebovineMOGAT3genewithgrowthtraitsGenomeNatl
42
Res Counc Can Geacutenome Cons Natl Rech Can 55 56ndash62doi101139g11077
TaberletPCoissacEPansu J PompanonF 2011Conservationgeneticsofcattle sheep and goats C R Biol On the trail of domesticationsmigrationsandinvasionsinagricultureDominiqueJobGeorgesPelletierJeanClaudePernollet334247ndash254doi101016jcrvi201012007
Utsunomiya YT Perez OrsquoBrien AM Sonstegard TS Van Tassell CP doCarmo AS Meszaros G Solkner J Garcia JF 2013 Detecting lociunder recent positive selection in dairy and beef cattle by combiningdifferentgenomewidescanmethodsPLoSOne8e64280
Voight BF Kudaravalli S Wen X Pritchard JK 2006 A Map of RecentPositive Selection in the Human Genome PLoS Biol 4 e72doi101371journalpbio0040072
WangHKadlecekTAAuYeungBBGoodfellowHESHsuLYFreedmanTSWeissA2010ZAP70AnEssentialKinaseinTcellSignalingColdSpringHarbPerspectBiol2doi101101cshperspecta002279
WangMZhangXKangLJiangCJiangY2012MolecularcharacterizationofporcineNECD SNRPNandUBE3Agenesand imprinting status in theskeletal muscle of neonate pigs Mol Biol Rep 39 9415ndash9422doi101007s1103301218066
WatanabeRChanoTInoueHIsonoTKoiwaiOOkabeH2005Rb1cc1iscritical for myoblast differentiation through Rb1 regulation VirchowsArchIntJPathol447643ndash648doi101007s0042800411831
WickramasingheNSManavalanTTDoughertySMRiggsKALiYKlingeCM 2009 Estradiol downregulates miR21 expression and increasesmiR21targetgeneexpressioninMCF7breastcancercellsNucleicAcidsRes372584ndash2595doi101093nargkp117
WildeCJAddeyCVLiPFernigDG1997Programmedcelldeathinbovinemammary tissue during lactation and involution Exp Physiol 82 943ndash953
YasudaSStevensRLTeradaTTakedaMHashimotoTFukaeJHoritaTKataoka H Atsumi T Koike T 2007 Defective Expression of Ras
43
Guanyl NucleotideReleasing Protein 1 in a Subset of Patients with
SystemicLupusErythematosusJImmunol1794890ndash4900
ZhangCBaileyDKAwadTLiuGXingGCaoMValmeekamVRetiefJ
Matsuzaki H Taub M Seielstad M Kennedy GC 2006 A whole
genome longrange haplotype (WGLRH) test for detecting imprints of
positiveselectioninhumanpopulationsBioinformatics222122ndash8
ZhaoFQOkineEKCheesemanCIShiraziBeecheySPKennellyJJ1998
Glucose transporter gene expression in lactating bovine gastrointestinal
tractJAnimSci762921ndash2929
44
Supplementaryfiles
YoucanfindChapterʹsupplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter02)orscantheQRcode
SupplementaryfileS1ndashSignificantcorehaplotypesandgenessharedin
dairyandbeef
Significantcorehaplotypes(pvaluele005haplotypefrequencyge025)shared
indairyandbeefcattlebreedsandrelativegenesintersectingwiththemThefile
isdividedinto4tablesThefirst2sheets(sheet1and2)shownthesignificant
corehaplotypesfordairyandbeefThelast2sheets(sheet3and4)showngenes
intersectingwiththesignificantcorehaplotypesfordairyandbeef
SupplementaryfileS2ndashNetworksrankingindairybreeds
Networksrankingindairybreedswithrelativemoleculessymbolslistscore
valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop
functionforeachnetwork
SupplementaryfileS3ndashNetworksrankinginbeefbreeds
Networksrankinginbeefbreedswithrelativemoleculessymbolslistscore
valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop
functionforeachnetwork
45
SupplementaryfileS4ndashCanonicalpathwaysrankingindairyorbeefcattle
breeds
Canonicalpathwaysrankingindairy(sheet1)orbeef(sheet2)cattlebreedswithrelativemoleculessymbolslistratioandndashlog10ofthepvaluesforeachcanonicalpathway
46
CHAPTER3Searchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisin
threeItaliancattlebreedsSCapomaccio1MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItalyTheseauthorscontributedequallytothiswork
____________________SubmittedtoAnimalGenetics
Abstract
Genome wide association studies (GWAS) have been widely applied to
disentanglethegeneticbasisofcomplextraits IncattlebreedsclassicalGWAS
approach with medium density markers panels are far to be conclusive
especiallyforcomplextraitsThisisduetotheintrinsiclimitationsofGWASand
to the choices that are made to step from the associationrsquos signals to the
functionalunits
Here we applied a geneHbased association strategy to prioritize genotypeH
phenotype associations found with classical approaches in three Italian dairy
cattlebreedswithdifferentsamplesizes(ItalianHolsteinn=2058ItalianBrown
n=745ItalianSimmentaln=477)formilkproductionandqualitytraits
WhileclassicalsinglemarkerregressionfailedtorevealgenomeHwidesignificant
genotypeHphenotype association except for Italian Holstein the gene based
approach found specific genes in each breed that are associated with milk
physiologyandmammaryglanddevelopment
Since no established method is yet implemented to step from variation to
functional units our strategy may contribute to reveal new genes that play
significant roles in complex traits like thosehere investigated condensing low
signalsofassociationinageneHcentricfashionapproach
Background
Modern cattle breeds have nomore than 200 years of history (Taberlet et al
2008) a time spanwhere selection reproductive isolationand the consequent
geneticdriftshapedallelefrequenciesandlinkagedisequilibriumthroughoutthe
genome This is particularly true for industrial dairy breeds where the joint
application of reproductive technologies and modern genetic evaluation
methodshave imposedahigh selectionpressureona limitednumberof traits
(Taberletetal2008)involvedinmilkquantityandquality
Indeed the milk production industry has been historically ready to acquire
technicalimprovementsfromdifferentfieldsstatisticsbiologyandengineering
50
ForexampleBLUPbasedstrategiesandhighHdensitygenotyping throughSNPHchipsarenowroutinelyincludedinbullevaluationshavingsignificantimpactonselectionefficiencyandpopulationstructureThese techniques have dramatically improved milk yield and compositionwithout knowing the biological basis of the traits under selection To datedespite many efforts and many QTL mapped only a few genes have beeneffectivelyassociatedwithmilkyieldandcomposition(Grisartetal2002Blottetal2003CohenHZinderetal2005)In thiscontextGenomeWideAssociationStudies(GWAS) is themostcommonmethodfordissectingthebiologythatunderliescomplextraits(McCarthyetal2008)TheunderlyingideaofGWASissimplefindmarkerHtraitassociationsexploitingthe linkage disequilibrium (LD) that exists between the causative mutation ndashwhichweignorendashandoneormoreanalyzedmarkerswhichwewellknowInthiswayonecanpinpointgenomicregionscarryingcausalvariantsforanytrait
Although a number of GWAS have been conducted in cattle productive andmorphological traits (some examples (Jiang et al 2010 Cole et al 2011Meredith et al 2012)) and found several genomic regions associated to thesetraits none focused their attention directly on genes involved inmilk biologyMoreoveroneof themajordrawbacksofGWASisrelatedtothescarceresultsportability across populations due to a variety of confounding factorspopulationstructuredifferentialLD levelsbreedspecific selection targetsandeven SNPs ascertainment bias (Clark et al 2005) For instance significantassociationsonfatpercentagewereidentifiedinmanyregionsoftheBostaurusgenome in different breeds Except for genes of major effect there is littleagreement on which regions and H more importantly H genes are actuallyinvolvedinthecontrolofthetraitinvestigatedThisissomehownotsurprisingone favorable allele may segregate in one breed and be fixated in a differentbreed the same allele segregates in both breeds but allelesmay differ or thesegregates in both breeds but the genetic background masks the effect ofsegregation
51
AnotherlimitationinGWASisthestringentsignificancethresholdoftenapplied
tocorrectformultipletestingAsaresulta largeproportionofgeneswithlow
effectaredisregardedwithconsequentoverestimationof thehigheffectgenes
variance(Xu2003)Thisbiasisparticularlyworryingwheninvestigatingtraits
controlledbya largenumberofgeneswithsmalleffectand inhighLDasmilk
yield(GIANOLAETAL2013)
To overcome this kind of limitation different strategies have been invented
usingprior knowledge Interesting ldquocandidatepathwayrdquo approacheshavebeen
conducted(Ravenetal2013)testingforassociationbetweenaphenotypeand
genes inpathwaysthatare likelytobe involvedinthecontrolofthetraitThis
approach reduces thenumberofmarkers tobe tested increases the statistical
poweroftheexperimentrestrictingtheanalysistoexistingknowledge
Other methods based on statistical manipulations or particular experimental
designs have been developed to prioritize or validate genotypeHphenotype
associations For example geneHbased association strategies while restricting
GWA study only to genes and neighboring genomic regionsmay identify new
candidategenesdeservingpostHGWASanalysis(Akulaetal2011Cantoretal
2010 Liu et al 2010) Moreover it is true that classical GWAS downstream
bioinformatics analyses concentrate their efforts in findingmeaningful results
usinggenesasfinaltarget
The aim of ourworkwas to find new candidate genes and novel information
associated to complex traits as milk production (yiled) and quality (fat and
protein percentage) using a geneHcentric genome wide association in three
differentdairybreedsItalianHolsteinItalianBrownandItalianSimmental
Methods
BiologicalSamples
Bulls from the three most represented dairy breeds in Italy (Italian Holstein
ItalianBrownandItalianSimmental)wereselectedfromArtificialInsemination
(AI) bulls progeny tested in Italy until 2010 and forwhich biological samples
52
wereavailableItalianHolsteinbullswerechosenaccordingtofollowingcriteria
i) having the national Selection Index namely production functionality type
(PFT)withreliabilitygt75andii)havingthelowestpossiblerelationshipwith
theothersiresinthesetThiswasachievedbymaximizingthevariabilitywithin
the group On the contrary considering the low number of available Italian
BrownandItalianSimmentalbullstheywereallincludedintheanalysesAfter
thisprocedureatotalof2139ItalianHolstein775ItalianBrownand486Italian
Simmental samples (including quality control replicates)were genotypedwith
theBovineSNP50BeadChipv1(IlluminaSanDiegoCA)
Traitsconsideredintheanalysis
Italian Holstein (ANAFI) Italian Brown (ANARB) and Italian Simmental
(ANAPRI)breedersassociationsprovidedderegressedproofs(DRP)ordaughter
yielddeviations (DYD)hereafter referredasphenotype forall traitsanalyzed
Traitsanalyzedweremilkyieldfatpercentageandproteinpercentage
Genotypingandqualitycontrol
Quality controls (QC) were run independently for each breed using the same
pipelineandthresholds IndetailQCexcluded i)thereplicatesamplewiththe
highernumberofmissinggenotypesii)sampleswithunexpectedlyhigh(ge100)
mendelianerrors(forfatherHsonpairs)iii)sampleswithmorethan5missing
genotypes iv) SNPs with ge 25 missing data v) SNPs with minor allele
frequency(MAF)le5vi)SNPswithmendelianerrorsge25andvii)SNPsout
ofHardyHWeinbergequilibriumtestedbyFisherexacttest(ple0001Bonferroni
corrected)
Analysisofpopulationstructure
MultiDimensionScaling(MDS)wasperformedusingPLINK107(Purcelletal
2007)andresultsplottedwithRscriptsThisanalysisshowninsupplementary
material (FigureS1) exploredpopulationsubstructureandverified thegenetic
homogeneityofthedatasetpriortofurtheranalyses
53
PairwiseLDwascalculatedusingPLINK107(Purcelletal2007)LDdecayand
LD structure on specific regions were calculated with in house R scripts and
snpplotterRpackage(LunaandNicodemus2007)
GWAS
GenomeHwide association analysis was made with the GenABEL package in R
(Team 2014) using GRAMMARHCG (Genome wide Association using Mixed
Model and Regression H Genomic Control) approach (Amin et al 2007
Aulchenkoetal2007)
Basingonpriorknowledge(Minozzietal2013) in ItalianHolsteintheDGAT1
regionwas also treated as fixed effect inGenABEL analysis of the three traits
ResultsusingthismodelarereferredtoasldquoHolsteinDGATrdquo
Manhattan andquantileHquantileplots for each traitwereproduced to allowa
thoroughevaluationoftheGWASresultsAllcalculatedpHvaluesfromtheGWAS
analysiswere retained for the downstream prioritizationwith the geneHbased
association
Workingongenomecoordinates
Coordinatesof themarkerswere liftedover fromBTA40(Elsiketal2009)to
UMD31assembly(Ziminetal2009)usingSnpChimpv1(Nicolazzietal2014)
andgenecoordinateswereretrievedusingBioMartEnsembl
QTLcoordinatesweredownloadedfromtheAnimalQTLdb(Huetal2013) in
bedformat
Operations on genomic intervalswereperformedusing thebedtools suite ver
2162(QuinlanandHall2010)
GeneBasedAssociationanalysis
Toreducetheantagonismbetweensignificancethresholdandfalsepositiverate
of a classical GWA study (Gianola et al 2013) and to facilitate selection
signature comparison across the three breeds investigated we used a geneH
54
centricstatisticthatcollapsesinasinglevalueallthepHvaluesfromSNPsrelated
toagenecorrectingforLDinformation
Our approachderivesdirectly from the softwareVEGAS (VersatileGeneHBased
TestforGenomeHwideAssociationStudies)(Liuetal2010)adaptedtopursue
meaningfulresultsinanyspeciesbeyondhumans
Briefly the geneHwise test statistic is the sum of the n upper tail chiHsquared
valuesofaSNPsubset withonedegreeof freedom(wheren is thenumberof
SNPmappinginthegivengene)WithperfectlinkageequilibriumthegeneHwise
pHvaluewouldbetheoneHsidepHvalueofachiHsquaredistributionwithndegrees
of freedom Otherwise more likely the distribution of the pHvalues has to be
evaluated(withsimulations)andweighted(withLDvalues)(Liuetal2010)
The VEGAS speciesHfree procedure was implemented in a UNIX environment
usingBashandRlanguagesusingEnsemblannotationasinputfilesPLINKdata
typeandpHvaluesforeachSNPfromthepreviouslycitedGRAMMARprocedure
WemappedSNPsonBostaurusgenes(UMD31)andtheirboundaries(100000
bpupHanddownHstream)toaccountforregulatoryregionsOnlygeneswithat
least 2 SNPs mapped were retained Significant entries from the geneHbased
associationweremappedinthecattleQTLdatabasewithbedtools
ResultsandDiscussion
In this study a gene based association strategy was applied to identify new
genomicregionsboostingGWASanalysisresultsformilkproductionandquality
fromthreeItaliandairybreeds
Complextraitslikethosehereevaluatedareaffectedbyafewmajorgeneswith
large effects andmany otherswithmoderate to low effects the latter are not
easily identifiedby genomewide scans inmodern cattle breedsduemainly to
samplesizelimitationwhilesignalfromtheformersarelostingenomescansH
withsomeexceptionsHduerapidfixations
InthisscenariowithmediumdensitySNPpanelsliketheoneusedinthisstudy
associationbasedonsingleSNPregressionorotherldquoclassicalrdquomethodscouldbe
inconclusiveintermsofresults
55
AftertheQCprocess7452058and477samplesand3575938535and38574SNPs were retained for Italian Brown Italian Holstein and Italian Simmentalbulls respectively (Table S1) HolsteinDGAT analysis retained 38534markersoneSNPlessthantheotherHolsteinpanelDescriptive statistics on genotypes for each breed (histogram and barplot forheterozygosis and call rate per individual Figure S7 and multi dimensionalscalingFigureS1)revealedabsenceofconfoundingeffectsWiththesedatasetsatthenominalgenomeHwidepHvaluesignificancethresholdof 005 (ple125 x 10H6 after Bonferroni correction (Burton et al 2007Dudbridge and Gusnanto 2008)) significant associations were found only inItalianHolsteinbetweenmilkyieldandfatpercentageand26SNPsmappingintheproximalregionofBTA14nearbyDGAT1(Table1)
56
Table1PHvaluesoftheSNPsovercomingthegenomewidesignificancethresholdintheHolsteinbreedformilkyieldandfatpercentage(notaccountingforDGAT1effect)SNPsaresortedbypositionsintheUMD31assembly
SNPID BTA Position(Bp) POvalue(Fat) POValue(Milk)
Hapmap30383OBTCO005848 14 1489496 112EH12 361EH12
ARSOBFGLONGSO57820 14 1651311 247EH46 161EH16
ARSOBFGLONGSO34135 14 1675278 857EH17
ARSOBFGLONGSO94706 14 1696470 665EH16
ARSOBFGLONGSO4939 14 1801116 534EH49 106EH20
Hapmap52798Oss46526455 14 1923292 915EH21
UAOIFASAO6878 14 2002873 382EH24
ARSOBFGLONGSO107379 14 2054457 118EH33 664EH15
ARSOBFGLONGSO18365 14 2117455 250EH16
Hapmap30922OBTCO002021 14 2138926 439EH13
Hapmap25384OBTCO001997 14 2217163 390EH10 697EH09
Hapmap24715OBTCO001973 14 2239085 108EH09 369EH09
BTAO35941OnoOrs 14 2276443 402EH13
Hapmap30374OBTCO002159 14 2468020 528EH12
Hapmap30086OBTCO002066 14 2524432 798EH11
ARSOBFGLONGSO74378 14 3640788 161EH08
ARSOBFGLOBACO1511 14 3765019 375EH09
UAOIFASAO9288 14 3956956 811EH09
ARSOBFGLONGSO100480 14 4364952 133EH09
UAOIFASAO5306 14 4468478 458EH08
ThisislikelyduetotheeffectoftheK232ADGAT1mutationthatissegregatingin
thisbreed(Fontanesietal2014)NootherSNPwassignificantlyassociatedto
anyothertraitinthe3breedsinvestigated
SomeSNPswereunderasuggestivesignificancethresholdofple5x10H5Details
on these markers are in supplementary material (Table S3) but they are not
discussedherePlotsdescribingatglancetheassociationresultsrevealingalow
signaltonoiseratioareavailableintheonlinesupplementarymaterial(Figure
S2S3S4andS5respectively for ItalianBrown ItalianHolsteinHolsteinDGAT
andItalianSimmental)
57
Asperthegenecentricanalysiswemappedatotalof192371989919894and
19844 genes on 24616 available on Ensembl (release version 73) in Italian
BrownItalianHolsteinHolsteinDGATandItalianSimmentalrespectivelyGenes
with a geneHwisepHvalue at least 10H5were considered significantly associated
withthestudiedtraitThesignificanceappearedtobeadequatetothehighrate
ofredundancyof theannotationfeaturesandto figureoutthetopassociations
with a reasonable level of confidence Results on these associations per breed
andpertraitaresummarizedinTable2Table3andinsupplementarymaterials
(TableS2andTableS4)ThemajorityofsignificantgenesfoundwiththegeneH
centricanalysismapped inpreviouslycharacterizedQTL thatareassociated to
milkproductionorqualityThesearesummarizedintheTableS5
Table2Numberofsignificantlyassociatedgenesforeachbreedineachtraitinthegenebasedassociationtest
Fat Milkyield Protein
Brown 3 1
Holstein 92 80 1
HolsteinDGAT 1 4 1
Simmental 1 13
58
Tableamp3FeaturessignificantlyassociatedperbreedtraitHolsteinanalysisnotaccountingforDGAT1isshow
ninTableS3
Breedamp
Traitamp
EnsemblampIDamp
GeneWiseampPValueamp
SNPsInGeneamp
BestSNPamp
BestSNP_PValueamp
BTAamp
Startamp(bp)amp
Endamp(bp)amp
GeneampNam
eamp
BROWNamp
FAT
Nosignificantassociationfound
MILK
ENSBTAG00000019237
500EG05
4Hapmap53770Gss46526325
204EG04
23
47333403
47348212
TXNDC5
ENSBTAG00000043918
500EG05
4Hapmap53770Gss46526325
204EG04
23
47337650
47337787
5S_rRNA
ENSBTAG00000046839
500EG05
4Hapmap53770Gss46526325
204EG04
23
47328131
47352621
UncharacterizedProtein
PROTEIN
ENSBTAG00000001260
700EG05
3ARSGBFGLGNGSG116222
441EG03
18
52120387
52124418
PINLYP
amp
HOLSTEINDGATamp
FAT
ENSBTAG00000000369
100EG05
3Hapmap53294Grs29016908
333EG05
594673755
94749093
EPS8
MILK
ENSBTAG00000026896
130EG05
5ARSGBFGLGNGSG104949
385EG04
23
51549721
51553563
FOXF2
ENSBTAG00000005260
400EG05
5Hapmap33288GBTCG033751
120EG03
638120578
38127577
SPP1
ENSBTAG00000006579
500EG05
4ARSGBFGLGBACG2207
632EG05
15
54418559
54459500
P4HA3
ENSBTAG00000007013
700EG05
6UAGIFASAG7665
396EG05
19
5221507
5283308
TOM1L1
PROTEIN
ENSBTAG00000006638
600EG05
3ARSGBFGLGNGSG104774
176EG04
18
56523439
56530499
BCL2L12
amp
SIMMENTHALamp
FAT
ENSBTAG00000037649
900EG05
6Hapmap30001GBTAG142716
529EG05
4120427915
120494453
VIPR2
MILK
ENSBTAG00000017538
100EG05
6ARSGBFGLGBACG13093
587EG06
12
85630428
85630912
Pseudogene
ENSBTAG00000000856
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1766767
1769754
FBXL6
ENSBTAG00000000857
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1763994
1766621
GPR172B
ENSBTAG00000026356
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1795351
1804562
DGAT1
ENSBTAG00000035158
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1770660
1772329
TMEM
249
ENSBTAG00000046208
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1782901
1788087
SCRT1
ENSBTAG00000009811
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1825982
1844587
UncharacterizedProtein
ENSBTAG00000009816
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1838135
1840081
UncharacterizedProtein
ENSBTAG00000014458
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1844664
1894424
MROH1
ENSBTAG00000020751
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1806081
1825793
HSF1
ENSBTAG00000044406
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1883849
1883971
SCARNA15
ENSBTAG00000046889
400EG05
5ARSGBFGLGBACG13093
587EG06
12
85610279
85610762
UncharacterizedProtein
ENSBTAG00000020117
800EG05
3BTAG78494GnoGrs
188EG04
721189879
21196263
ZBTB7A
PROTEIN
Nosignificantassociationfound
amp
59
1 ItalianHolstein
The$ large$ number$ of$ significant$ signals$ was$ observed$ in$ the$ Italian$ Holstein$
especially$ for$ milk$ yield$ and$ fat$ percentage$ As$ shown$ in$ Table$ 2$ the$ strong$
effect$ of$ DGAT1$ in$ Italian$ Holsteins$ observed$ with$ individual$ SNPs$ was$
confirmed$by$the$geneBcentric$approach$In$this$region$almost$the$totality$of$the$
geneBwise$pBvalues$under$the$chosen$threshold$appeared$to$be$influenced$by$this$
major$ gene$ and$ by$ the$ high$ level$ of$ linkage$ disequilibrium$ that$ characterizes$
cosmopolitan$bovine$breeds$(see$Figure$S6)$Within$this$pool$of$genes$it$is$worth$
to$mention$PLEC$ (plectin)$ a$ cell$ surface$ receptor$ gene$ in$ the$RNASE5$ pathway$
(Angiogenin$ encoded$ by$ angiogenin$ ribonuclease$ RNase$ A$ family$ 5)$ Proteins$
encoded$ by$ these$ genes$ that$ are$ present$ in$ milk$ are$ involved$ in$ protein$
synthesis$ and$ act$ as$ growth$ factors$ in$ vitro$Moreover$RNASE5$ seems$ to$ have$
other$ functions$ such$ as$ regulation$ of$ stress$ response$ and$ apoptosis$ and$ even$
angiogenesis$ through$ which$ it$ may$ also$ contribute$ to$ the$ regulate$ milk$
production$(Raven$et$al$2013)$
The$ exclusion$ of$DGAT1$ effect$ (HolsteinDGAT$ results)$ highlighted$ that$ all$ the$
significant$ effects$ on$ fat$ percentage$ found$ for$ genes$ located$ within$
approximately$25$Mb$up$or$downstream$from$the$excluded$SNP$(ARS5BFGL5NGS5
4939)$were$ redundant$ signals$ of$DGAT1$ (Table3$ and$Table$ S2$ and$ S3)$ This$ is$
also$observed$for$milk$yield$due$to$the$genetic$correlations$between$fat$and$milk$$
Also$ PLEC$ was$ not$ significant$ in$ our$ analyses$ when$ taking$ into$ account$ the$
DGAT1$effect$contradicting$previous$findings$in$Holstein$that$report$association$
of$PLEC$gene$with$production$traits$either$accounting$or$not$for$the$effect$of$the$
diglyceride$ acyltransferase$ (Raven$ et$ al$ 2013)$ A$ possible$ explanation$ for$ the$
different$results$between$these$studies$is$a$difference$between$genetic$structures$
of$the$populations$analyzed$To$reduce$false$positive$signals$we$only$evaluated$
the$genes$based$on$HolsteinDGAT$results$for$the$Italian$Holstein$$
11 Milkyield
We$found$genes$associated$with$this$trait$in$Holstein$in$particular$SPP1$P4HA3=
FOXF2$and$TOM1L1$genes$
Among$ these$ the$ osteopontin$ SPP1$ (secreted$ phosphoprotein$ 1)$ is$ directly$
related$to$milk$production$(de$Mello$et$al$2012)$ It$ is$ located$ in$BTA6$nearby$
60
$
ABCG2$ and$ there$ is$ evidence$ that$ polymorphisms$ in$ this$ region$ are$ strongly$
correlated$with$high$production$ levels$of$milk$ in$various$breeds$(CohenBZinder$
et$ al$ 2005)$Osteopontin$SPP1$ is$ a$ secreted$glycophosphoprotein$ that$plays$ a$
crucial$ role$ in$ mammary$ gland$ development$ having$ an$ essential$ role$ also$ in$
mammary$gland$differentiation$and$branching$of$the$mammary$epithelial$ductal$
system$ This$ gene$has$ also$ been$ implicated$ in$many$ types$ of$ cancer$ including$
breast$cancer$ for$ the$potentiality$of$proliferation$and$alveologenesis$ (Hubbard$
et$ al$ 2013)$ It$ is$worth$mentioning$ that$ the$most$ significant$ SNPs$within$ this$
gene$was$Hapmap332885BTC5033751=with=a$pBvalue$of$00012$ largely$over$ the$
genomeBwide$significance$ threshold$This$signal$would$be$ ignored$ in$ the$single$
SNP$approach$
Another$gene$revealed$by$the$geneBbased$association$is$P4HA3$encoding$for$the$
prolyl$4Bhydroxylase$α$polypeptide$III$involved$in$the$correct$folding$of$nascent$
procollagen$ chains$ It$ is$ known$ that$ stromalBepithelial$ interactions$ are$ critical$
for$mammary$gland$development$and$defective$deposition$or$reorganization$of$
extracellular$ matrix$ can$ lead$ to$ incorrect$ mammary$ epithelial$ morphogenesis$
(Gillette$ et$ al$ 2013)$ highlighting$ the$ role$ of$ this$ gene$ in$ the$ healthy$ breast$
tissue=
FOXF2=gene$ (Forkhead$Box$F2)$ is$ a$member$of$ the$ forkhead$box$ transcription$
factors$ known$ to$ be$ involved$ in$ tissue$ development$ as$ part$ of$ the$ ldquoHedgehog$
signaling$pathwayrdquo$This$key$cascade$is$essential$for$the$correct$segmentation$of$
tissues$during$embryonic$development$but$has$been$recently$demonstrated$that$
is$active$in$adult$stem$cells$proliferation$in$mammary$tissue$(Liu$et$al$2006)$In$
particular$ FOXF2$ plays$ a$ key$ role$ in$ extracellular$ matrix$ synthesis$ and$
epithelialBmesenchymal$ interactions$ and$ could$ therefore$ affect$ the$ proper$
development$of$the$mammary$stroma$In$addition$low$levels$of$transcription$of$
this$gene$are$correlated$ to$early$onset$metastasis$ in$breast$cancer$ (Kong$et$al$
2013)$ underlining$ a$ specific$ role$ in$ the$ mammary$ gland$
Lastly$ TOM1L1$ (Target$ Of$ Myb1$ ChickenBLike$ 1)$ is$ another$ gene$ potentially$
involved$ with$ the$ milk$ physiology$ as$ is$ known$ its$ implication$ with$ the$ early$
endosome$ sorting$ coupled$ with$ EGFR= (epidermal$ growth$ factor$ receptor)$
another$ important$player$ in$ lactation$(Fowler$et$al$1995)$TOM1L1$belongs$to$
61
$
the$ VHS$ domain$ containing$ proteins$ that$ are$ involved$ in$ the$ postBGolgi$
trafficking$and$signaling$(Wang$et$al$2010)$
12 Fatpercentage
In$ Italian$ Holstein$ breed$ we$ found$ a$ significant$ association$ with$ or$ without$
DGAT1$ effect$ correction$ to$ the$EPS8$ gene$ located$on$BTA5$at$around$95$Mbp$
This$gene$encodes$for$a$substrate$of$the$already$cited$EGFR$and$is$involved$in$the$
fatty$acid$metabolism$(Fazioli$et$al$1993)$A$recent$study$on$Holstein$Friesian$
(Wang$et$al$2012)$found$strong$association$with$a$variant$located$in$the$second$
intron$of$the$epidermal$growth$factor$receptor$pathway$substrate$enforcing$the$
hypothesis$of$a$true$involvement$of$this$gene$on$fat$percentage$
13 Proteinpercentage
The$BCL2L12$gene$(BclB2Blike$protein$12)$associated$with$protein$percentage$in$
Italian$Holstein$breed$represents$another$interesting$result$The$product$of$this$
gene= is$ an$ important$ oncoprotein$ in$ two$ fundamental$ cell$ cycle$ pathways$
namely$ p53$ and$ cytoplasmic$ caspase$ signaling$ BCL2L12= resides$ in$ both$
cytoplasm$and$nucleus$inhibiting$caspases$3$and$7$and$forming$a$complex$with$
p53$thereby$ inhibiting$p53Bdirected$transcriptomic$changes$upon$DNA$damage$
(Stegh$ and$DePinho$ 2011)$ The$members$ of$ BCL2$ family$ are$ considered$ antiB
apoptotic$genes$and$it$is$obvious$that$dysfunctional$programmed$cell$death$plays$
an$ important$ role$ in$ the$ pathogenesis$ and$ in$ the$ progression$ of$ cancer$Many$
members$of$this$family$were$found$to$be$modulated$in$various$malignancies$and$
BCL2L12= in$ particular$ was$ expressed$ in$ mammary$ gland$ and$ proposed$ as$
prognostic$ cancer$ biomarker$ (Thomadaki$ et$ al$ 2007)$ In$ dairy$ cows$ the$
members$of$BCL2$family$are$involved$in$the$key$events$ in$the$mammary$tissue$
development$ and$ remodeling$ during$ lactation$ and$ involution$ (Stefanon$ et$ al$
2002$p$200)$Members$of$BCL2$family$regulate$the$turnover$of$cell$population$
in$the$mammary$gland$during$lactation$and$its$overexpression$was$found$to$be$
linearly$related$to$milk$production$in$Holstein$cow$(Colitti$et$al$2004)$
$
62
$
2 ItalianSimmental
In$ the$ Italian$ Simmental$ breed$ we$ found$ one$ association$ on$ BTA4$ for$ fat$
percentage$(VIPR2)$and$two$associations$ for$milk$yield$one$ in$BTA7$(ZBTB7A)$
and$ the$ other$ in$ BTA14$ related$ to$ the$ DGAT1$ No$ association$ was$ found$ for$
protein$percentage$
21 Fatpercentage
Among$the$genes$found$significantly$associated$with$production$traits$in$Italian$
Simmental$VIPR2=a$ gene$ encoding$ a$ receptor$ for$ a$ small$ vasoactive$ intestinal$
neuropeptide$ was$ the$ single$ signal$ in$ BTA4$ for$ fat$ percentage$ Vasoactive$
intestinal$ peptide$ (VIP)$ is$ synthesized$ in$ the$pituitary$ gland$ and$ among$other$
functions$ stimulates$ prolactin$ secretion$ and$ is$ involved$ in$ proliferation$
survival$and$differentiation$in$human$breast$cancer$cells$(Valdehita$et$al$2010)$$
An$interesting$novel$finding$is$the$effect$of$immunosuppression$(through$direct$
effect$of$IL56)$on$prolactin$cells$and$its$ability$to$regulate$prolactin$by$acting$on$
pituitary$ VIP$ through$ VIP$ receptors$ (Blanco$ et$ al$ 2013)$ These$ evidences$
suggest$ that$VIPR2$ could$ be$ an$ important$ candidate$ for$ further$ investigations$
through$post$GWAS$approaches$such$as$reBsequencing$or$fine$mapping$
22 Milkyield
Surprisingly$the$DGAT1$gene$was$significantly$associated$with$milk$yield$but$not$
with$fat$percentage$in$Italian$Simmental$breed$Indeed$the$most$significant$SNP$
for$the$milk$yield$trait$in$this$breed$was$the$ARS5BFGL5NGS54939$with$a$pBvalue$
of$ 413EB06$ The$ importance$ of$ this$ gene$ in$ the$ lactation$ is$ largely$ known$
(Grisart$et$al$2002)$$
To$ investigate$DGAT1$ signal$ in$ Simmental$ we$ analyzed$ the$ fine$ LD$ structure$
genomeBwide$ and$ in$ BTA14$ proximal$ region$ in$ all$ the$ three$ breeds$ revealing$
different$patterns$
Observed$LD$decay$on$ the$ three$breeds$genomes$revealed$similar$behavior$ for$
Italian$Holstein$and$Italian$Brown$with$r2$gt$010$$until$ around$ 250$ Kbp$ while$
Italian$Simmental$showed$lower$LD$Distances$higher$that$400$Kbp$revealed$half$
level$of$LD$in$Simmental$(Figure$S6)$
63
$
In$DGAT1$region$LD$behavior$is$very$similar$between$Italian$Holstein$and$Italian$
Brown$ (Figure$ S8)$ while$ association$ values$ are$ really$ different$ because$ ARS5
BFGL5NGS54939=is$fixed$in$Italian$Brown$and$purged$by$QC$process$Interestingly$
association$ patterns$ become$ similar$ when$ DGAT1$ effect$ is$ removed$ from$ the$
Holstein$ panel$ (HolsteinDGAT$ Figure$ S8)$ In$ Simmental$ we$ found$ a$ different$
situation$ very$ low$ LD$ level$ and$ significant$ association$ levels$ with$ ARS5BFGL5
NGS54939$with$ the$milk$ trait$This$ is$probably$due$ to$ the$ low$ frequency$of$ the$
p232K$ allele$ in$ Italian$ Simmental$ (Scotti$ et$ al$ 2010)$ and$ to$ the$ different$
selection$strategies$applied$in$these$breeds$
The$ relatively$ high$ pBvalue$ of$ this$ SNP$ is$ responsible$ for$ the$ statistical$
significance$of$many$of$the$13$genes$found$significant$in$this$breed$for$this$trait$
as$ shown$ in$ Table$ 3$ therefore$ reducing$ the$ gene$ set$ to$ three$ locations$ two$
overlapping$features$indicating$a$pseudogene$in$BTA12$and$a$zinc$finger$protein$
ZBTB7A$in$BTA7$DGAT1$role$and$function$will$be$not$discussed$here$as$already$
done$elsewhere$(Grisart$et$al$2002)$
ZBTB7A=is$part$of$zinc$finger$and$BTB$proteins$(Broad$complex$Tramtrack$and$
Bricabrac)$ transcription$ factors$ with$ emerging$ importance$ for$ their$
involvement$ in$ oncogenesis$ and$ cell$ differentiation$by$ repressing$key$ genes$of$
cell$cycle$regulation$as$retinoblastoma$(Rb)$and=p53=(ldquoJeon$et$al$B$2008$B$ProtoB
oncogene$ FBIB1$ (PokemonZBTB7A)$ Represses$ Trpdfrdquo$ nd)= ZBTB7A$ over$
expression$ is$ shown$ in$primary$ and$ recurrent$ breast$ cancer$ tissues$ leading$ to$
disease$progression$and$poor$survival$(Zu$et$al$2011)$Link$between$this$gene$
polymorphism$and$milk$production$might$be$in$the$proper$cellular$development$
of$ the$ mammary$ gland$ This$ transcription$ factor$ is$ also$ known$ as$ a$ master$
regulator$ of$ B$ and$ TBcell$ lineage$ (Lee$ and$ Maeda$ 2012)$ enforcing$ the$ link$
between$ milk$ production$ and$ immune$ protection$ as$ already$ outlined$ by$ of$
Lemay$ and$ colleagues$ with$ proteomic$ studies$ where$ the$ ldquoactivation$ of$ the$
innate$ immune$systemrdquo$resulted$as$one$of$ the$most$represented$gene$ontology$
categories$(Lemay$et$al$2009)$
$
64
$
3 ItalianBrown
In$ the$ Italian$Brown$breed$we$ found$ one$ association$ on$BTA23$ (TXNDC5)$ for$
milk$yield$and$one$in$BTA18$for$protein$percentage$(PINLYP)$The$three$features$
listed$in$Table$3$for$milk$yield$in$Italian$Brown$were$collapsed$due$to$overlap$$
31 Milkyield
Thioredoxin$ domainBcontaining$ protein$ 5$ (TXNDC5)$ belongs$ to$ thioredoxin$
family$small$ubiquitous$proteins$containing$disulphide$domain$affecting$protein$
folding$with$chaperone$activity$preventing$protein$misBfolding$within$the$lumen$
of$ the$endoplasmic$reticulum$This$multifunctional$organelle$have$a$key$role$ in$
several$ processes$ including$ lipid$ catabolism$ and$ protein$ synthesis$ (RamiacuterezB
Torres$ et$ al$ 2012)$ Because$ of$ its$ disulphide$ bonds$ thioredoxin$ facilitate$ the$
reduction$of$other$proteins$by$cysteine$thiol$disulfide$exchange$This$mechanism$
is$essential$for$detossification$
During$ lactation$ indeed$ an$ enhanced$ detoxification$ level$ is$ functional$ to$
preserve$ milk$ quality$ against$ increased$ metabolic$ activities$ Higher$ metabolic$
levels$ lead$ to$ free$ radicals$ production$ making$ necessary$ the$ maintenance$ of$
redox$ homeostasis$ that$ include$ among$ others$ the$ reduction$ and$ oxidation$ of$
peroxiredoxin$and$thioredoxin$(Lu$and$Holmgren$2014)$
32 Proteinpercentage
For$ protein$ percentage$ in$ Italian$ Brown$ there$ is$ only$ one$ associated$ gene$
PINLYP=is$encoding$a$phospholipase$A2$inhibitor$but$this$protein$product$is$not$
yet$well$studied$ $Many$types$of$phospholipase$A2$were$characterized$and$ it$ is$
known$ that$ they$ are$ mediator$ of$ inflammation$ atherosclerosis$ and$ cancer$ in$
mammals$and$that$plays$an$important$role$in$the$innate$immune$response$(Qiu$
and$Lai$2013)$It$may$be$reasonable$to$imagine$a$role$for$PINLYP$in$the$lactation$
in$light$of$previously$discussed$results$that$link$milk$to$innate$immune$response$
(Lemay$et$al$2009)$
$
Although$ limited$ by$ the$ size$ of$ the$ study$ and$ the$ number$ of$ the$ included$
markers$these$results$represent$both$a$confirmation$and$a$step$forward$towards$
the$comprehension$of$the$biological$bases$of$milk$yield$and$quality$$
65
$
New$ genes$ potentially$ involved$ with$ production$ traits$ in$ dairy$ cows$ were$
tracked$when$GWAS$signals$resulted$too$weak$to$be$considered$in$a$classical$SNP$
based$mapping$with$the$ldquoclosest$gene$to$peakrdquo$or$the$windowBbased$strategy$
It$ is$ worth$ mentioning$ that$ none$ of$ the$ analyzed$ SNPs$ (not$ considering$ the$
Italian$Holstein$with$ARS5BFGL5NGS54939= included$ in$ the$ analysis)$were$ above$
the$ genomeBwide$ significance$ threshold$ in$ single$ SNP$ analysis$ meaning$ that$
important$information$is$often$lost$in$such$studies$$
In$addition$ to$ the$newly$ identified$ loci$we$observed$a$confirmation$of$existing$
associations$ with$ genes$ related$ to$ milk$ production$ not$ only$ dissecting$ gene$
specific$ functions$ but$ from$ mapping$ these$ genes$ in$ previously$ characterized$
regions$(Table$S5)$This$ is$not$trivial$when$using$new$data$analysis$techniques$
especially$ in$ high$ noise$ to$ signal$ ratio$ datasets$ like$ those$ here$ presented$We$
believe$ that$ this$ approach$will$ benefit$ of$data$ from$denser$marker$panels$ (eg$
BovineHD)$ to$ exploit$ local$ linkage$ disequilibrium$ and$ increase$ the$ number$ of$
mapped$genes$with$more$than$two$SNPs$
Results$ obtained$ with$ the$ gene$ centric$ approach$ although$ having$ intrinsic$
limitations$are$not$ influenced$by$a$number$of$subjective$choices$often$made$in$
GWAS$ For$ instance$when$ ldquowindow$mappingrdquo$ is$ used$ to$decrease$ the$noise$ to$
signal$ ratio$ eg$ due$ to$ nearby$ markers$ having$ different$ allele$ frequencies$
windows$size$ is$ to$be$decided$This$can$be$based$on$equal$number$of$markers$
equal$number$of$base$pairs$or$equal$map$distance$in$cM$per$window$The$three$
choices$ do$ not$ coincide$ because$ markers$ are$ not$ equally$ spread$ along$ the$
chromosomes$ and$ recombination$ differs$ in$ different$ genomic$ regions$ In$
addition$ windows$ can$ be$ chosen$ nonBoverlapping$ or$ with$ different$ degree$ of$
overlap$Finally$when$stepping$ from$significant$SNPs$or$windows$ to$candidate$
genes$the$size$of$the$flanking$region$from$which$picking$up$candidate$genes$is$to$
be$decided$as$well$Sometimes$the$gene$closer$to$the$signal$peak$is$selected$ in$
other$cases$all$genes$within$certain$boundaries$are$included$in$further$analyses$
All$ these$ choices$ may$ generate$ misleading$ results$ For$ example$ not$ truly$
associated$genes$may$be$retained$for$further$analyses$disregarding$the$correct$
ones$ In$addition$ a$ large$number$of$ false$positives$can$be$ included$ to$be$ later$
interpreted$with$ the$help$of$network$and$pathway$analysis$As$ a$ consequence$
these$decisions$have$a$direct$impact$in$terms$of$production$and$interpretation$of$
66
$
results$leading$to$conclusions$heavily$based$on$the$enrichment$analysis$and$preB
existing$knowledge$$
With$ the$ gene$ based$ association$ we$ also$ noticed$ some$ advantages$ in$ the$
downstream$data$mining$First$genes$are$functional$units$and$the$golden$targets$
of$ the$ ldquoclassical$ postBGWASrdquo$ bioinformatics$ pipelines$ Hence$ the$ gene$ based$
association$leads$to$genes$with$an$embedded$statistically$robust$framework$This$
is$ a$ ldquoplusrdquo$ in$ studying$ complex$ traits$ because$ one$ can$ concentrate$ on$ specific$
signals$ rather$ than$ cherry$ picking$meaningful$ features$ from$ a$ list$ constructed$
with$the$previously$cited$approaches$$
Moreover$it$is$easier$to$track$back$gene$clusters$to$major$gene$effects$B$like$for$
example$DGAT1$in$a$gene$rich$genome$chunk$many$genes$may$result$significant$
due$to$the$same$SNP$(Table$S2$and$Table$3)$With$the$VEGAS$method$finding$the$
ldquotruerdquo$association$is$facilitated$by$following$the$ldquobest$SNPrdquo$in$the$region$
In$addition$we$found$phenotype$coherent$signals$in$such$situation$where$only$a$
few$SNPs$B$ if$none$ndash$are$above$an$acceptable$statistical$genomeBwide$threshold$
using$a$single$SNP$regression$method$
The$gene$centric$approach$ is$a$ first$and$slight$ improvement$ in$ livestock$GWAS$
analyses$since$we$still$have$a$partial$genome$annotation$ In$addition$ it$ is$now$
known$that$ the$definition$of$gene$ is$ to$be$revised$many$genes$do$not$code$ for$
proteins$ have$ alternative$ starting$ sites$ have$ antisense$ transcripts$ host$
regulatory$ RNA$ are$ subjected$ to$ alternative$ splicing$ and$may$ be$ regulated$ by$
sequences$quite$ far$ away$ from$ their$ coding$ regions$ (Consortium$2012)$Other$
limitations$are$the$need$for$larger$number$of$animals$in$these$type$of$studies$as$
in$this$study$and$the$still$relatively$high$cost$of$sequencing$that$cannot$be$used$
routinely$in$large$experiments$at$least$not$yet$$
$
$
Conclusions
This$study$underlines$the$importance$of$the$implementation$of$new$methods$to$
analyze$complex$traits$with$genomic$data$No$golden$path(s)$is$discovered$yet$to$
walk$from$variation$to$functional$units$The$geneBcentric$analysis$may$contribute$
67
$
to$ reveal$ new$ signals$ throughout$ the$ bovine$ genome$ condensing$ weak$
association$signals$in$chromosome$bits$having$functional$significance$
$
$
Acknowledgements
The$Authors$would$ like$ to$ thank$Dr$ Jimmy$ Liu$ for$ his$ advices$ in$ building$ the$
software$AIA$ANAFI$ANARB$and$ANAPRI$breedersrsquo$associations$ for$ their$help$
and$ support$during$ the$data$analysis$ SelMol$ and$ProZoo$projects$ that$provide$
samples$ The$ Authors$ are$ also$ grateful$ to$MIPAAF$ and$ Cariplo$ Foundation$ for$
funding$
$
$
$
References
Akula$ N$ Baranova$ A$ Seto$ D$ Solka$ J$ Nalls$MA$ Singleton$ A$ Ferrucci$ L$
Tanaka$T$Bandinelli$ S$Cho$YS$Kim$YJ$Lee$ JBY$Han$BBG$Bipolar$
Disorder$ Genome$ Study$ (BiGS)$ Consortium$ The$ Wellcome$ Trust$ CaseB
Control$Consortium$McMahon$FJ$2011$A$NetworkBBased$Approach$to$
Prioritize$ Results$ from$ GenomeBWide$ Association$ Studies$ PLoS$ ONE$ 6$
e24220$doi101371journalpone0024220$
Amin$N$ van$Duijn$ CM$ Aulchenko$ YS$ 2007$ A$Genomic$Background$Based$
Method$ for$ Association$ Analysis$ in$ Related$ Individuals$ PLoS$ ONE$ 2$
e1274$doi101371journalpone0001274$
Aulchenko$YS$de$Koning$DBJ$Haley$C$2007$Genomewide$Rapid$Association$
Using$ Mixed$ Model$ and$ Regression$ A$ Fast$ and$ Simple$ Method$ For$
Genomewide$PedigreeBBased$Quantitative$Trait$Loci$Association$Analysis$
Genetics$177$577ndash585$doi101534genetics107075614$
Blanco$ EJ$ CarreteroBHernaacutendez$ M$ GarciacuteaBBarrado$ J$ IglesiasBOsma$ MC$
Carretero$M$Herrero$JJ$Rubio$M$Riesco$JM$Carretero$J$2013$The$
activity$and$proliferation$of$pituitary$prolactinBpositive$cells$and$pituitary$
68
$
VIPBpositive$ cells$ are$ regulated$by$ interleukin$6$Histol$Histopathol$ 28$
1595ndash1604$
Blott$S$Kim$JBJ$Moisio$S$SchmidtBKuntzel$A$Cornet$A$Berzi$P$Cambisano$
N$Ford$C$Grisart$B$Johnson$D$Karim$L$Simon$P$Snell$R$Spelman$
R$ Wong$ J$ Vilkki$ J$ Georges$ M$ Farnir$ F$ Coppieters$ W$ 2003$
Molecular$ dissection$ of$ a$ quantitative$ trait$ locus$ a$ phenylalanineBtoB
tyrosine$substitution$in$the$transmembrane$domain$of$the$bovine$growth$
hormone$ receptor$ is$ associated$ with$ a$ major$ effect$ on$ milk$ yield$ and$
composition$Genetics$163$253ndash266$
Burton$PR$Clayton$DG$Cardon$LR$Craddock$N$Deloukas$P$Duncanson$A$
Kwiatkowski$ DP$ McCarthy$ MI$ Ouwehand$ WH$ Samani$ NJ$ Todd$
JA$ Donnelly$ P$ Barrett$ JC$ Burton$ PR$ Davison$ D$ Donnelly$ P$
Easton$ D$ Evans$ D$ Leung$ HBT$ Marchini$ JL$ Morris$ AP$ Spencer$
CCA$Tobin$MD$Cardon$LR$Clayton$DG$Attwood$AP$Boorman$JP$
Cant$B$Everson$U$Hussey$JM$Jolley$JD$Knight$AS$Koch$K$Meech$
E$ Nutland$ S$ Prowse$ CV$ Stevens$ HE$ Taylor$ NC$ Walters$ GR$
Walker$NM$Watkins$NA$Winzer$T$Todd$JA$Ouwehand$WH$Jones$
RW$ McArdle$ WL$ Ring$ SM$ Strachan$ DP$ Pembrey$ M$ Breen$ G$
Clair$ DS$ Caesar$ S$ GordonBSmith$ K$ Jones$ L$ Fraser$ C$ Green$ EK$
Grozeva$D$Hamshere$ML$Holmans$PA$Jones$IR$Kirov$G$Moskvina$
V$ Nikolov$ I$ OrsquoDonovan$ MC$ Owen$ MJ$ Craddock$ N$ Collier$ DA$
Elkin$A$Farmer$A$Williamson$R$McGuffin$P$Young$AH$Ferrier$IN$
Ball$SG$Balmforth$AJ$Barrett$JH$Bishop$DT$Iles$MM$Maqbool$A$
Yuldasheva$N$Hall$AS$Braund$PS$Burton$PR$Dixon$RJ$Mangino$
M$ Stevens$ S$ Tobin$ MD$ Thompson$ JR$ Samani$ NJ$ Bredin$ F$
Tremelling$ M$ Parkes$ M$ Drummond$ H$ Lees$ CW$ Nimmo$ ER$
Satsangi$ J$Fisher$SA$Forbes$A$Lewis$CM$Onnie$CM$Prescott$NJ$
Sanderson$J$Mathew$CG$Barbour$J$Mohiuddin$MK$Todhunter$CE$
Mansfield$JC$Ahmad$T$Cummings$FR$Jewell$DP$Webster$J$Brown$
MJ$ Clayton$DG$ Lathrop$GM$ Connell$ J$Dominiczak$A$ Samani$NJ$
Marcano$CAB$Burke$B$Dobson$R$Gungadoo$J$Lee$KL$Munroe$PB$
Newhouse$SJ$Onipinla$A$Wallace$C$Xue$M$Caulfield$M$Farrall$M$
Barton$A$ (braggs)$TB$ in$RG$and$G$Bruce$ IN$Donovan$H$Eyre$S$
69
$
Gilbert$ PD$ Hider$ SL$ Hinks$ AM$ John$ SL$ Potter$ C$ Silman$ AJ$Symmons$ DPM$ Thomson$ W$ Worthington$ J$ Clayton$ DG$ Dunger$DB$ Nutland$ S$ Stevens$ HE$ Walker$ NM$ Widmer$ B$ Todd$ JA$Frayling$TM$Freathy$RM$Lango$H$Perry$JRB$Shields$BM$Weedon$MN$ Hattersley$ AT$ Hitman$ GA$ Walker$ M$ Elliott$ KS$ Groves$ CJ$Lindgren$ CM$ Rayner$ NW$ Timpson$ NJ$ Zeggini$ E$ McCarthy$ MI$Newport$M$Sirugo$G$Lyons$E$Vannberg$F$Hill$AVS$Bradbury$LA$Farrar$ C$ Pointon$ JJ$ Wordsworth$ P$ Brown$ MA$ Franklyn$ JA$Heward$JM$Simmonds$MJ$Gough$SCL$Seal$S$(uk)$BCSC$Stratton$MR$Rahman$N$Ban$M$Goris$A$Sawcer$SJ$Compston$A$Conway$D$Jallow$ M$ Newport$ M$ Sirugo$ G$ Rockett$ KA$ Kwiatkowski$ DP$Bumpstead$ SJ$ Chaney$A$Downes$K$ Ghori$MJR$ Gwilliam$R$Hunt$SE$Inouye$M$Keniry$A$King$E$McGinnis$R$Potter$S$Ravindrarajah$R$ Whittaker$ P$ Widden$ C$ Withers$ D$ Deloukas$ P$ Leung$ HBT$Nutland$S$Stevens$HE$Walker$NM$Todd$JA$Easton$D$Clayton$DG$Burton$PR$Tobin$MD$Barrett$JC$Evans$D$Morris$AP$Cardon$LR$Cardin$NJ$Davison$D$Ferreira$T$PereiraBGale$ J$Hallgrimsdoacutettir$ IB$Howie$ BN$Marchini$ JL$ Spencer$ CCA$ Su$ Z$ Teo$ YY$ Vukcevic$ D$Donnelly$P$Bentley$D$Brown$MA$Cardon$LR$Caulfield$M$Clayton$DG$ Compston$ A$ Craddock$ N$ Deloukas$ P$ Donnelly$ P$ Farrall$ M$Gough$ SCL$ Hall$ AS$ Hattersley$ AT$ Hill$ AVS$ Kwiatkowski$ DP$Mathew$ CG$McCarthy$MI$ Ouwehand$WH$ Parkes$M$ Pembrey$M$Rahman$N$Samani$NJ$Stratton$MR$Todd$ JA$Worthington$ J$2007$GenomeBwide$ association$ study$ of$ 14000$ cases$ of$ seven$ common$diseases$ and$ 3000$ shared$ controls$ Nature$ 447$ 661ndash678$doi101038nature05911$
Cantor$ RM$ Lange$ K$ Sinsheimer$ JS$ 2010$ Prioritizing$ GWAS$ Results$ A$Review$ of$ Statistical$ Methods$ and$ Recommendations$ for$ Their$Application$Am$J$Hum$Genet$86$6ndash22$doi101016jajhg200911017$
Clark$ AG$ Hubisz$ MJ$ Bustamante$ CD$ Williamson$ SH$ Nielsen$ R$ 2005$Ascertainment$ bias$ in$ studies$ of$ human$ genomeBwide$ polymorphism$Genome$Res$15$1496ndash1502$doi101101gr4107905$
70
$
CohenBZinder$M$ Seroussi$ E$ Larkin$ DM$ Loor$ JJ$Wind$ AE$ der$ Lee$ JBH$Drackley$ JK$Band$MR$Hernandez$AG$Shani$M$Lewin$HA$Weller$JI$ Ron$ M$ 2005$ Identification$ of$ a$ missense$ mutation$ in$ the$ bovine$ABCG2$gene$with$a$major$effect$on$ the$QTL$on$ chromosome$6$affecting$milk$yield$and$composition$in$Holstein$cattle$Genome$Res$15$936ndash944$doi101101gr3806705$
Cole$JB$Wiggans$GR$Ma$L$Sonstegard$TS$Lawlor$TJ$Crooker$BA$Tassell$CPV$ Yang$ J$ Wang$ S$ Matukumalli$ LK$ Da$ Y$ 2011$ GenomeBwide$association$ analysis$ of$ thirty$ one$ production$ health$ reproduction$ and$body$ conformation$ traits$ in$ contemporary$ US$ Holstein$ cows$ BMC$Genomics$12$408$doi1011861471B2164B12B408$
Colitti$M$Wilde$CJ$Stefanon$B$2004$Functional$expression$of$bclB2$protein$family$and$AIF$in$bovine$mammary$tissue$in$early$lactation$J$Dairy$Res$71$20ndash27$
Consortium$ TEP$ 2012$ An$ integrated$ encyclopedia$ of$ DNA$ elements$ in$ the$human$genome$Nature$489$57ndash74$doi101038nature11247$
De$Mello$F$Cobuci$JA$Martins$MF$Silva$MVGB$Neto$JB$2012$Association$of$the$polymorphism$g8514CgtT$in$the$osteopontin$gene$(SPP1)$with$milk$yield$ in$ the$ dairy$ cattle$ breed$ Girolando$ Anim$ Genet$ 43$ 647ndash648$doi101111j1365B2052201102312x$
Dudbridge$ F$ Gusnanto$ A$ 2008$ Estimation$ of$ significance$ thresholds$ for$genomewide$ association$ scans$ Genet$ Epidemiol$ 32$ 227ndash234$doi101002gepi20297$
Elsik$ CG$ Tellam$ RL$ Worley$ KC$ 2009$ The$ Genome$ Sequence$ of$ Taurine$Cattle$ A$window$ to$ ruminant$ biology$ and$ evolution$ Science$ 324$ 522ndash528$doi101126science1169588$
Fazioli$ F$ Minichiello$ L$ Matoska$ V$ Castagnino$ P$ Miki$ T$ Wong$ WT$ Di$Fiore$ PP$ 1993$ Eps8$ a$ substrate$ for$ the$ epidermal$ growth$ factor$receptor$kinase$enhances$EGFBdependent$mitogenic$signals$EMBO$J$12$3799ndash3808$
Fontanesi$ L$ Calograve$ DG$ Galimberti$ G$ Negrini$ R$ Marino$ R$ Nardone$ A$AjmoneBMarsan$ P$ Russo$ V$ 2014$ A$ candidate$ gene$ association$ study$
71
$
for$ nine$ economically$ important$ traits$ in$ Italian$ Holstein$ cattle$ Anim$
Genet$nandashna$doi101111age12164$
Fowler$KJ$Walker$F$Alexander$W$Hibbs$ML$Nice$EC$Bohmer$RM$Mann$
GB$ Thumwood$ C$ Maglitto$ R$ Danks$ JA$ 1995$ A$ mutation$ in$ the$
epidermal$growth$factor$receptor$in$wavedB2$mice$has$a$profound$effect$
on$ receptor$ biochemistry$ that$ results$ in$ impaired$ lactation$ Proc$ Natl$
Acad$Sci$92$1465ndash1469$doi101073pnas9251465$
Gianola$ D$ Hospital$ F$ Verrier$ E$ 2013$ Contribution$ of$ an$ additive$ locus$ to$
genetic$variance$when$inheritance$is$multiBfactorial$with$implications$on$
interpretation$ of$ GWAS$ Theor$ Appl$ Genet$ 126$ 1457ndash1472$
doi101007s00122B013B2064B2$
Gillette$ M$ Bray$ K$ Blumenthaler$ A$ VargoBGogola$ T$ 2013$ P190B$ RhoGAP$
Overexpression$ in$ the$ Developing$Mammary$ Epithelium$ Induces$ TGFβB
dependent$ Fibroblast$ Activation$ PLoS$ ONE$ 8$ e65105$
doi101371journalpone0065105$
Grisart$B$Coppieters$W$Farnir$F$Karim$L$Ford$C$Berzi$P$Cambisano$N$
Mni$ M$ Reid$ S$ Simon$ P$ Spelman$ R$ Georges$ M$ Snell$ R$ 2002$
Positional$Candidate$Cloning$of$a$QTL$ in$Dairy$Cattle$ Identification$of$a$
Missense$Mutation$ in$the$Bovine$DGAT1$Gene$with$Major$Effect$on$Milk$
Yield$ and$ Composition$ Genome$ Res$ 12$ 222ndash231$
doi101101gr224202$
Hubbard$NE$Chen$QJ$Sickafoose$LK$Wood$MB$Gregg$ JP$Abrahamsson$
NM$ Engelberg$ JA$ Walls$ JE$ Borowsky$ AD$ 2013$ Transgenic$
Mammary$Epithelial$Osteopontin$(Spp1)$Expression$Induces$Proliferation$
and$ Alveologenesis$ Genes$ Cancer$ 4$ 201ndash212$
doi1011771947601913496813$
Hu$ ZBL$ Park$ CA$ Wu$ XBL$ Reecy$ JM$ 2013$ Animal$ QTLdb$ an$ improved$
database$tool$for$livestock$animal$QTLassociation$data$dissemination$in$
the$ postBgenome$ era$ Nucleic$ Acids$ Res$ 41$ D871ndashD879$
doi101093nargks1150$
Jeon$et$al$ B$2008$ B$ProtoBoncogene$FBIB1$ (PokemonZBTB7A)$Represses$Trpdf$
nd$
72
$
Jiang$ L$ Liu$ J$ Sun$ D$ Ma$ P$ Ding$ X$ Yu$ Y$ Zhang$ Q$ 2010$ Genome$Wide$
Association$ Studies$ for$ Milk$ Production$ Traits$ in$ Chinese$ Holstein$
Population$PLoS$ONE$5$e13661$doi101371journalpone0013661$
Kong$ PBZ$ Yang$ F$ Li$ L$ Li$ XBQ$ Feng$ YBM$ 2013$Decreased$FOXF2$mRNA$
Expression$ Indicates$ EarlyBOnset$ Metastasis$ and$ Poor$ Prognosis$ for$
Breast$ Cancer$ Patients$ with$ Histological$ Grade$ II$ Tumor$ PLoS$ ONE$ 8$
e61591$doi101371journalpone0061591$
Lee$SBU$Maeda$T$2012$POKZBTB$proteins$an$emerging$ family$of$proteins$
that$ regulate$ lymphoid$ development$ and$ function$ Immunol$ Rev$ 247$
107ndash119$
Lemay$ DG$ Lynn$ DJ$ Martin$ WF$ Neville$ MC$ Casey$ TM$ Rincon$ G$
Kriventseva$ EV$ Barris$ WC$ Hinrichs$ AS$ Molenaar$ AJ$ 2009$ The$
bovine$lactation$genome$ insights$ into$the$evolution$of$mammalian$milk$
Genome$Biol$10$R43$
Liu$JZ$Mcrae$AF$Nyholt$DR$Medland$SE$Wray$NR$Brown$KM$2010$A$
versatile$ geneBbased$ test$ for$ genomeBwide$ association$ studies$ Am$ J$
Hum$Genet$87$139$
Liu$S$Dontu$G$Mantle$ID$Patel$S$Ahn$N$Jackson$KW$Suri$P$Wicha$MS$
2006$Hedgehog$Signaling$and$BmiB1$Regulate$SelfBrenewal$of$Normal$and$
Malignant$ Human$ Mammary$ Stem$ Cells$ Cancer$ Res$ 66$ 6063ndash6071$
doi1011580008B5472CANB06B0054$
Lu$ J$ Holmgren$ A$ 2014$ The$ thioredoxin$ superfamily$ in$ oxidative$ protein$
folding$ Antioxid$ Redox$ Signal$ 140202091634000$
doi101089ars20145849$
Luna$ A$ Nicodemus$ KK$ 2007$ snpplotter$ an$ RBbased$ SNPhaplotype$
association$and$linkage$disequilibrium$plotting$package$Bioinforma$Oxf$
Engl$23$774ndash776$doi101093bioinformaticsbtl657$
McCarthy$ MI$ Abecasis$ GR$ Cardon$ LR$ Goldstein$ DB$ Little$ J$ Ioannidis$
JPA$ Hirschhorn$ JN$ 2008$ GenomeBwide$ association$ studies$ for$
complex$traits$consensus$uncertainty$and$challenges$Nat$Rev$Genet$9$
356ndash369$doi101038nrg2344$
Meredith$ BK$ Kearney$ FJ$ Finlay$ EK$ Bradley$ DG$ Fahey$ AG$ Berry$ DP$
Lynn$ DJ$ 2012$ GenomeBwide$ associations$ for$ milk$ production$ and$
73
$
somatic$cell$score$in$HolsteinBFriesian$cattle$in$Ireland$BMC$Genet$13$21$doi1011861471B2156B13B21$
Minozzi$ G$Nicolazzi$ EL$ Stella$ A$ Biffani$ S$Negrini$ R$ Lazzari$ B$ AjmoneBMarsan$ P$Williams$ JL$ 2013$ Genome$Wide$ Analysis$ of$ Fertility$ and$Production$ Traits$ in$ Italian$ Holstein$ Cattle$ PLoS$ ONE$ 8$ e80219$doi101371journalpone0080219$
Nicolazzi$EL$Picciolini$M$Strozzi$F$Schnabel$RD$Lawley$C$Pirani$A$Brew$F$ Stella$ A$ 2014$ SNPchiMp$ a$ database$ to$ disentangle$ the$ SNPchip$jungle$ in$ bovine$ livestock$ BMC$ Genomics$ 15$ 123$ doi1011861471B2164B15B123$
Purcell$ S$ Neale$ B$ ToddBBrown$ K$ Thomas$ L$ Ferreira$ MAR$ Bender$ D$Maller$J$Sklar$P$de$Bakker$PIW$Daly$MJ$Sham$PC$2007$PLINK$a$tool$ set$ for$ wholeBgenome$ association$ and$ populationBbased$ linkage$analyses$Am$J$Hum$Genet$81$559ndash575$doi101086519795$
Qiu$ S$ Lai$ L$ 2013$ Antibacterial$ properties$ of$ recombinant$ human$ nonBpancreatic$secretory$phospholipase$A2$Biochem$Biophys$Res$Commun$441$453ndash456$doi101016jbbrc201310092$
Quinlan$AR$Hall$IM$2010$BEDTools$a$flexible$suite$of$utilities$for$comparing$genomic$ features$ Bioinformatics$ 26$ 841ndash842$doi101093bioinformaticsbtq033$
RamiacuterezBTorres$ A$ BarceloacuteBBatllori$ S$ MartiacutenezBBeamonte$ R$ Navarro$ MA$Surra$ JC$ Arnal$ C$ Guilleacuten$N$ Aciacuten$ S$ Osada$ J$ 2012$ Proteomics$ and$gene$ expression$ analyses$ of$ squaleneBsupplemented$ mice$ identify$microsomal$thioredoxin$domainBcontaining$protein$5$changes$associated$with$ hepatic$ steatosis$ J$ Proteomics$ 77$ 27ndash39$doi101016jjprot201207001$
Raven$ LBA$ Cocks$ BG$ Pryce$ JE$ Cottrell$ JJ$ Hayes$ BJ$ 2013$ Genes$ of$ the$RNASE5$pathway$ contain$ SNP$ associated$with$milk$ production$ traits$ in$dairy$cattle$Genet$Sel$Evol$45$25$doi1011861297B9686B45B25$
Scotti$E$Fontanesi$L$Schiavini$F$La$Mattina$V$Bagnato$A$Russo$V$2010$DGAT1$ pK232A$ polymorphism$ in$ dairy$ and$ dual$ purpose$ Italian$ cattle$breeds$Ital$J$Anim$Sci$9$doi104081ijas2010e16$
74
$
Stefanon$ B$ Colitti$ M$ Gabai$ G$ Knight$ CH$ Wilde$ CJ$ 2002$ Mammary$
apoptosis$and$lactation$persistency$in$dairy$animals$J$Dairy$Res$69$37ndash
52$
Stegh$ AH$ DePinho$ RA$ 2011$ Beyond$ effector$ caspase$ inhibition$ Bcl2L12$
neutralizes$ p53$ signaling$ in$ glioblastoma$ Cell$ Cycle$ 10$ 33ndash38$
doi104161cc10114365$
Taberlet$ P$ Valentini$ A$ Rezaei$ HR$ Naderi$ S$ Pompanon$ F$ Negrini$ R$
AjmoneBMarsan$ P$ 2008$ Are$ cattle$ sheep$ and$ goats$ endangered$
species$Mol$Ecol$17$275ndash284$doi101111j1365B294X200703475x$
Team$RC$ 2014$ R$ A$ Language$ and$Environment$ for$ Statistical$ Computing$ R$
Foundation$for$Statistical$Computing$Vienna$Austria$
Thomadaki$H$ Talieri$M$ Scorilas$ A$ 2007$ Prognostic$ value$ of$ the$ apoptosis$
related$genes$BCL2$and$BCL2L12$in$breast$cancer$Cancer$Lett$247$48ndash
55$doi101016jcanlet200603016$
Valdehita$ A$ Bajo$ AM$ FernaacutendezBMartiacutenez$ AB$ Arenas$ MI$ Vacas$ E$
Valenzuela$ P$ RuiacutezBVillaespesa$ A$ Prieto$ JC$ Carmena$ MJ$ 2010$
Nuclear$ localization$ of$ vasoactive$ intestinal$ peptide$ (VIP)$ receptors$ in$
human$ breast$ cancer$ Peptides$ 31$ 2035ndash2045$
doi101016jpeptides201007024$
Wang$T$Liu$NS$Seet$LBF$Hong$W$2010$The$Emerging$Role$of$VHS$DomainB
Containing$Tom1$Tom1L1$and$Tom1L2$in$Membrane$Trafficking$Traffic$
11$1119ndash1128$doi101111j1600B0854201001098x$
Wang$ X$Wurmser$ C$ Pausch$ H$ Jung$ S$ Reinhardt$ F$ Tetens$ J$ Thaller$ G$
Fries$R$2012$Identification$and$Dissection$of$Four$Major$QTL$Affecting$
Milk$Fat$Content$in$the$German$HolsteinBFriesian$Population$PLoS$ONE$7$
e40711$doi101371journalpone0040711$
Xu$S$2003$Theoretical$Basis$of$the$Beavis$Effect$Genetics$165$2259ndash2268$
Zimin$AV$Delcher$AL$Florea$L$Kelley$DR$Schatz$MC$Puiu$D$Hanrahan$
F$ Pertea$ G$ Tassell$ CPV$ Sonstegard$ TS$ Marccedilais$ G$ Roberts$ M$
Subramanian$ P$ Yorke$ JA$ Salzberg$ SL$ 2009$ A$ wholeBgenome$
assembly$ of$ the$ domestic$ cow$ Bos$ taurus$ Genome$ Biol$ 10$ R42$
doi101186gbB2009B10B4Br42$
75
$
Zu$X$Ma$J$Liu$H$Liu$F$Tan$C$Yu$L$Wang$J$Xie$Z$Cao$D$Jiang$Y$2011$ProBoncogene$ Pokemon$ promotes$ breast$ cancer$ progression$ by$upregulating$ survivin$ expression$ Breast$ Cancer$ Res$ 13$ R26$doi101186bcr2843$ $
76
$
Supplementaryfiles$
You$can$find$Chapter$͵$supplementary$files$at$
- DocTa$(Doctoral$Thesis$Archive)$(httptesionlineunicattit)$
- Google$Drive$(httptinyurlcomPhDMMChapter03)$or$scan$the$QR$code$
$
$
SupplementaryFigureS1Multidimensionalscaling
Spatial$distribution$of$genetic$structure$in$the$analyzed$breeds$$
$
Supplementary$FigureS2GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Brown$
$
Supplementary$FigureS3GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Holstein$
$
Supplementary$FigureS4GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$HolsteinDGAT$
$
Supplementary$FigureS5GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Simmental$
77
$
$Supplementary$FigureS6LDdecayGenomeBwide$LD$decay$for$the$three$breeds$(Holstein$Simmental$Brown)$$Supplementary$FigureS7DescriptiveplotsDescriptive$statistics$for$the$Italian$Brown$Italian$Holstein$and$Italian$Simmental$$Supplementary$FigureS8DGAT1LDFocus$on$the$LD$of$the$DGAT1$region$in$all$the$datasets$(Brown$Holstein$HolsteinDGAT$Simmental)$$Supplementary$TableS1SNPfactNumber$of$subjects$and$SNPs$after$QC$$Supplementary$TableS2GeneBasedAssociationresultsFull$results$of$the$significant$genes$associated$with$the$traits$in$all$the$datasets$$ Sheet$1$Brown$ndash$fat$percentage$$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS3SingleSNPAssociationresultsFull$results$of$the$SNP$that$overcome$the$suggestive$threshold$$$ Sheet$1$Brown$ndash$fat$percentage$
78
$
$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS4GenesandtraitsGeneWise$pBvalues$of$the$discussed$genes$in$all$the$analysis$In$grey$are$evidenced$the$pBvalues$that$overcome$the$significance$threshold$Genes$are$alphabetically$listed$$Supplementary$TableS5GenesinQTLsGenes$from$the$geneBbased$association$mapping$into$known$QTLs$$$$$
79
CHAPTER(4(
MUGBAS(a(species(free(gene(based(association(suite(
S(Capomaccio1(M(Milanesi1(L(Bomba1(et(al(1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122ItalyTheseauthorscontributedequallytothiswork
____________________ApplicationnotesubmittedtoBioinformatics
Abstract(Theassociationstudiesbetweengenomewidemarkersandphenotypes(GWAS)
arenowwidelyusedinmodelandnonmodelspeciesHowevertoolsthatmerge
singlemarkerresultsandtheirprobabilityofbeingassociatedtofunctionalunits
(typicallygenes)areseldomdevelopedforspeciesotherthanhuman
Herewe presentMUGBAS a suite that calculates a generegionbased pvalue
fromagivenannotationsinglemarkerGWAresultandgenotypedatawiththe
objective to ease postGWAS analysis The software is species and annotation
independentfasthighlyparallelizedandreadyforhighdensitymarkerstudies
Availabilityandimplementationhttpsbitbucketorgcapemastermugbas
Introduction(With the availability of highdensity SNP panels Genome Wide Association
Studies (GWAS) have become the gold standard for dissecting the biology of
complextraits(McCarthyetal2008)Thisistrueforhumansandotherspecies
TheunderlyingideaofGWASissimplefindmarkertraitassociationsexploiting
the linkage disequilibrium (LD) that exists between the causative mutation ndash
whichweignorendashandoneormoremarkersofknownchromosomalposition
Whileanumberofwellestablishedapproachescanbeeasilyusedinastandard
GWAS (Aulchenko et al 2007 Kang et al 2010 Purcell et al 2007) to find
marker(s) associated to a particular phenotype several issues have to be
accountedforduringtheupanddownstreamanalyses
Firstly genome wide studies have to take into account multiple testing If
stringent statistical thresholds are applied a proportion ofmoderate or small
effectsactuallyassociatedwiththetraitwillbedisregardedThereforereducing
the number of tests while maintaining all the information provided by high
densitypanelswouldbeaclearadvantageInadditiononceasignificantsignalis
detected different strategiesmaybeused to tag the candidate causative gene
Sometimesthegeneclosertothepeaksignalisselectedinothercasesallgenes
withinaflankingregionofanarbitrarysizeofthesignificantSNP(s)areincluded
82
infurtheranalysesThesechoicesmayendupinbiasresultseitherretaininga
largenumberofnottrulyassociatedgenesordisregardingthecorrectones
Somemethodshaverecentlybeenproposedtoovercometheselimitationsusing
aprioriknowledgeastheldquocandidatepathwayrdquoanalysis(Ravenetal2013)and
the ldquogenebasedrdquo association strategies (Akula et al 2011 Cantor et al 2010
Liuetal2010)TheseapproachesrestrictGWAstudiesonlytoknowngenesor
genesubsetsresizingthemultipletestingproblemandincreasingthepowerof
theanalyses
Usually bioinformatics pipelines are developed for the analysis of human and
modelorganismsandarenotadaptedtononmodelorganismThisisthecaseof
the VEGAS software (Liu et al 2010 Yang et al 2014) developed only for
humansandonlyforgenebasedanalyses
Here we introduce MUGBAS [MUlti species GeneBased Association Suite] a
softwarebasedonVEGASthatusesGWASdataandagivenannotation(typically
genesbutanyregionmaybesuitable)toestimategenebasedandregionbased
associationpvaluesThesuite is speciesandannotation free ready toanalyze
highdensitydatafromanyexperimentaldesigns
Methods(MUGBASisasuiteofscriptsbuiltinBashRPythonandPerlThesuiterequires
several prerequisites that can be easily fulfilled in a LinuxUnixMac
environmentwith a few command lines Further information on how tomeet
theserequirementsaredetailedintheonlinerepository
MUGBAS suite can be invoked launching the Bash wrapper that checks
dependenciesandstoresuserchoices
Required input files are MAPPED files in PLINK format with genotype data
from(atleast200individualsarerecommendedforaproperLDcalculation)an
annotation file inBED format (chromosome startingposition endingposition
nameandusercustomfields)ofthedesiredgenomefeatureandtheGWASresult
file with mandatory headers ldquoSNPNAMErdquo and ldquoPVALUErdquo Multiple association
resultscanbetestedatthesametime ifprovidedinseparatefiles inthesame
directoryAtrialdatasetisdownloadablealongwiththesoftwarereleasePlease
83
notethatforsimplicityweprovideageneannotationbutanyBEDfilewithany
desiredfeaturecanbeused
The program starts the analysis assigning SNPs to any feature of the given
annotationusingbedtools(QuinlanandHall2010)Thisstepiscustomizableby
user defined boundaries (up and downstream) in order to catch regulatory
regionsthatcouldbeinhighLDwiththecurrentgeneregion
Inordertospeeduptheanalysistheprogramsubsamples200individualsfrom
theinputdatasetWethenimplementedtwolevelsofparallelizationoneforthe
LDandonefortheldquogenewiserdquoorldquoregionwiserdquopvaluecalculationThelatteris
particularlyusefulanalyzingmorethanoneGWAresultsfromthesamedataset
BrieflyMUGBASfirstsplitstheLDcalculationinncores(asdefinedbytheuser)
using the foreach doParallel packages (Analytics andWeston 2014a 2014b)
and then distributes traits across cores using GNU Parallel and foreach
doParallel
SinceLDcalculationisaslowprocessVEGASbenefitsofHapMapphase2datato
speed up the processwhile preserving the option for a user defined (human)
population using PLINK In order to expand these analysis to any species LD
calculationinMUGBASisimplementedwithther2fast()functionfromGenABEL
package(Aulchenkoetal2007)whilethegenewisepvalueiscalculatedasin
theVEGASapproachBrieflythegenewiseteststatisticisthesumofthenupper
tailchisquaredvaluesofaSNPsubsetwithonedegreeoffreedom(wheren is
thenumberofSNPmappinginthegivengene)Withperfectlinkageequilibrium
the genewise pvalue would be the onetailed pvalue of a chisquare
distributionwithndegreesof freedomOtherwisemorelikelythedistribution
of the pvalues has to be evaluated (with simulations) andweighted (with LD
values)(Liuetal2010)
Once every generegion has its ldquogenewiserdquo pvalue a FDR qvalue (False
Discovery Rate) statistics is calculated with the base R function padjust()
Results are stored and enrichedwith ldquomanhattanrdquo and ldquoundergroundrdquo plots of
theldquogenewiserdquopvalueandtheqvaluerespectively
84
Results(MUGBAS output is generegionoriented with useful information about the
location of the analyzed feature the ldquoBest SNPrdquo within that feature with its
genomiccoordinatesandtheoriginalpvaluethegeneorregionwisestatistics
(pvalue number of simulations and qvalue) the custom annotation of that
particularentryandthepositionoftheSNPwithrespecttothefeatureandthe
decidedboundariesAll this informationareuseful to track theeffectofhighly
significant markers that could map into multiple region of interest and
effectively pinpoint the most probable location to focus on with data mining
processes
A typical analysis performed with a highdensity panel (gt600K markers)
distributed in 10 midperformance cores (Intel Xeon X5675 307 GHz) on
threetraitssimultaneouslytakesaboutfourhours
Discussion(Generegionbased methods are now widely used and accepted in genomic
research However non modelspecies suffer for the lack of availability of
specifictoolstoenhancediscoveryfromgenomewidedataMUGBASprovidesa
robust and accepted statistics to researchers dealing with species other than
humantoeasilyjumpfromassociatedmarkerstothedesiredfunctionalunits
Acknowledgements(((WewouldliketothankDrJimmyLiuforhisadvicesinbuildingthesuite
Reference(Akula N Baranova A Seto D Solka J NallsMA Singleton A Ferrucci L
TanakaTBandinelli SChoYSKimYJLee JYHanBGBipolar
Disorder Genome Study (BiGS) Consortium The Wellcome Trust Case
85
ControlConsortiumMcMahonFJ2011ANetworkBasedApproachtoPrioritize Results from GenomeWide Association Studies PLoS ONE 6e24220doi101371journalpone0024220
Analytics R Weston S 2014a doParallel Foreach parallel adaptor for theparallelpackage
AnalyticsRWestonS2014bforeachForeachloopingconstructforRAulchenkoYSdeKoningDJHaleyC2007GenomewideRapidAssociation
Using Mixed Model and Regression A Fast and Simple Method ForGenomewidePedigreeBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614
Cantor RM Lange K Sinsheimer JS 2010 Prioritizing GWAS Results AReview of Statistical Methods and Recommendations for TheirApplicationAmJHumGenet866ndash22doi101016jajhg200911017
KangHMSulJHServiceSKZaitlenNAKongSFreimerNBSabattiCEskin E 2010 Variance component model to account for samplestructure in genomewide association studies Nat Genet 42 348ndash354doi101038ng548
LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKM2010Aversatile genebased test for genomewide association studies Am JHumGenet87139
McCarthy MI Abecasis GR Cardon LR Goldstein DB Little J IoannidisJPA Hirschhorn JN 2008 Genomewide association studies forcomplextraitsconsensusuncertaintyandchallengesNatRevGenet9356ndash369doi101038nrg2344
Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender DMallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKatool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795
QuinlanARHallIM2010BEDToolsaflexiblesuiteofutilitiesforcomparinggenomic features Bioinformatics 26 841ndash842doi101093bioinformaticsbtq033
86
Raven LA Cocks BG Pryce JE Cottrell JJ Hayes BJ 2013 Genes of the
RNASE5pathway contain SNP associatedwithmilk production traits in
dairycattleGenetSelEvol4525doi101186129796864525
87
CHAPTER5
Imputationaccuracyisrobusttocattlereferencegenomeupdates
MMilanesi1etal1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122Italy
____________________ShortCommunicationacceptedinAnimalGenetics
SummaryGenotype imputation is routinely applied in a large number of cattle breeds Imputation
hasbecomeaneedduetothelargenumberofSNParrayswithvariabledensity(currently
from2900to777962SNPs)Althoughmanyauthorshavestudiedtheeffectofdifferent
statisticalmethodsonimputationaccuracytheimpactofa(likely)changeinthereference
genomeassemblyonimputationfromlowertohigherdensityhasnotbeendeterminedso
far In this work 1021 Italian Simmental SNP genotypes were reJmapped on the three
mostrecentreferencegenomeassembliesFour imputationmethodswereusedtoassess
theimpactofanupdateinthereferencegenomeAsexpectedthefourmethodsbehaved
differentlywith largedifferences in termsof accuracyUpdatingSNPcoordinateson the
three tested cattle reference genome assemblies determined only a slight variation on
imputationresultswithinmethod
KeywordsImputationreferencegenomeassemblySNPbovine
MaintextFromthepublicationof the firstbovinesinglenucleotidepolymorphism(SNP)array the
wholegenomicscenarioofthisspeciesevolvedrapidlyNowadaysseveralSNPpanelshave
been designed with a range of densities mapping on different reference genome
assemblies and produced by different manufacturers When running analyses with SNP
arrays of different densities or from different commercial companies (eg Illumina and
Affymetrix) an imputation step is often required The impact of different statistical
methods on imputation accuracy has already been assessed (Pei et al 2008) (Marchini
andHowie2010)(Maetal2013)Surprisinglyuptonowtherearenotavailablestudies
investigating the impact of an update on the reference genome assembly on imputation
accuracy This assessment would be of fundamental importance considering that reJ
mappingSNPsdeterminesarearrangementofthehaplotypestructureofthegenotypes(in
somecasesSNPsarereJmappedtoadifferentchromosome)andthatthebovinereference
assembly is likely toberepeatedlyupdatedover timeThisworkwasaimedatassessing
the effect of different reference genome assemblies in cattle As test casewe compared
four different imputation methods from low (Illumina BovineLDv10 Beadchip) to
90
mediumhigh (Illumina BovineSNP50v2 Beadchip) density in the Italian Simmental
populationForeachmethodweassessedthe impacton imputationaccuracyofmapping
SNPs on the BTAU42 BTAU46 (wwwhgscbcmeduotherJmammalsbovineJgenomeJ
project)orUMD31(Ziminetal2009)cattlereferencegenomeassemblies
Atotalof900and121SimmentalbullsweregenotypedwiththeIlluminaBovineSNP50v1
and v2 Beadchips (Illumina Inc San Diego CA) respectively Illumina BovineSNP50v2
Beadchip (hereinafter named 54k) the official reference chip for the Italian Simmental
breedersassociation(ANAPRI)wasconsideredasreferenceSNPpanelTheSNPchiMpv1
tool(Nicolazzietal2014)wasusedtoobtainSNPcoordinatesonBTAU42UMD31and
BTAU46genomeassembliesForeachassemblySNPsnotcontainedinthe54kunmapped
orinthesexualchromosomeswerediscardedThefinaldatasetcontained5118952886
and51290SNPsinBTAU42UMD31andBTAU46respectively
FourimputationmethodswereconsideredPedImpute(Nicolazzietal2013)FindHapv2
(VanRadenetal2011)FImpute(Sargolzaeietal2014)andBeaglev332(Browningand
Browning2007)Thefirstthreeusepedigreeandpopulation(ie linkagedisequilibrium
LD) informationwhereas the fourth uses only LD information Performance of the four
methods testedacrossgenomicassemblieswas comparedusingwrongly imputedalleles
(ldquoerrorsrdquo)andallelicsquaredcorrelation(ldquoallelicJR2rdquo)statisticsAllelicJR2wasdefined
asthesquaredcorrelationbetweenimputedandtrue(eggenotyped)allelesmultipliedby
100 as in Browning amp Browning (2009) The 878 oldest sires were used as training
population whereas the 143 youngest bulls were used as test population SNPs not in
commonwith the lowdensity panelwere set tomissing in the test population The test
population was split in four scenarios individuals with sire and maternal grandsire
genotypedinthereferencepopulation(scenarioAJ84animals)individualswithonlythe
sire genotyped (scenario B J 34 animals) individuals with only the maternal grandsire
genotyped(scenarioCJ13animals)ornoneofcloserelativesgenotyped(scenarioDJ12
animals)Thepopulationusedwashighlystructuredwithnearlyhalfofthebullspresent
in the dataset (490) sired by 115 genotyped sires In addition 35 out of these 115
genotypedsiresweresiredby22bulls(eggrandsires)alsopresentinthedataset
DifferencesacrossassembliesarenothomogeneousFromBTAU42toBTAU4646SNPs
weremapped to different chromosomes and the average difference in physical position
withinchromosomeswasnearly025MbpHowevertherewasnosubstantialdifferencein
the order of markers In fact not considering unmapped SNPs on either of the
aforementionedassembliesadifferenceinthesequentialorderwasobservedforonly134
91
single (ie spread throughout the genome) SNPs On the contrary from UMD31 to
BTAU46 222 SNPsweremapped to different chromosomes and amuch larger average
differenceinSNPsphysicalpositionwasobserved(iemorethan2Mbp)Alsothenumber
ofSNPswithdifferentsequentialorderwasmuchlarger(ie1893SNPs)Howevermore
than50oftherearrangementsinvolvedblocksfrom3tonearly100markersAsaresult
slight differences in overall imputation accuracy from BTA46 to UMD31 rather than
between the two BTAU assemblies were expected across methods On the contrary if
rearrangements are due to issues in the assembly of scaffolds theymight have a direct
(negative) impact on local imputation performances In fact a preliminary analysis on
BTA1consideringlocalerroracrossassembliesandmethodssuggestatleasttwosuch
SNPsclustersat44and138MbponUMD31(FiguresS1S2andS3)
Considering allmethods and scenarios amaximum average variation of 02 inerrors
(Table1)andof09inallelicJR2(Table2)wasobservedacrossreferenceassembliesThe
standarddeviation(SD)oftheimputationaccuracywasstableacrossreferenceassemblies
withinmethodsOnthecontraryalargervariabilitywasobservedacrossmethods
PedImpute and FindHap showed a relatively large average allelicJR2 variability across
scenarios evaluated over the same reference assembly Only PedImpute showed a large
variation in SD across scenarios in errors and allelicJR2 with high variability on
scenarios C and D showing a low performance of thismethodwhen there are no close
relatives(egsiredam)genotypedinthereferencedatasetFImputeinsteadachievedthe
highestaccuracyacrossallreferencegenomesandscenariossurprisinglyobtainingitsbest
resultsJerrorsof14(06SD)andallelicJR2of955(26SD)JinscenarioCwhereonly
onerelativelycloserelativeisgenotyped(maternalgrandJsire)FImputeappeartobeable
todetectshorterhaplotypesfromdistantrelativesmoreaccuratelythananyothermethod
tested in this study thus it is lessdependenton theavailabilityofpedigree information
thantheothertwopedigreeJbasedmethods(Sargolzaeietal2010)
The large variability across methods within assembly was expected although the
differences obtained among PedImpute FindHap and Beagle were larger in this study
comparedtoasimilarstudyinItalianHolstein(Nicolazzietal2013)Thiscoulddependon
a number of factors First the sample size of the test population is small (especially for
groups C and D where less than 15 individuals were present) Second the interaction
between population structure and imputationmethodsmay have a role PedImpute and
FindHap rely much more heavily on pedigree information than FImpute and Beagle
assumesall individualsareunrelatedHowever the lattertwomethods indirectlybenefit
92
greatly of from highly structured populations with large number of relatives ndash close or
distantndashpresent inthereferencepopulation(Sargolzaeietal2014)Anotherinteresting
factor that influences the imputation accuracy may be the persistence of LD at long
distances this is much lower in the Italian Simmental than in the Italian Holstein
population(FigureS4)Informationonrelatives(eitherpedigreeinformationorgenotyped
relativesinthereferencepopulation)isexpectedtoplayabiggerroleifLDisconservedat
shorter distances This hypothesis is supported by the fact that although there is high
variability acrossmethods better performances of the pedigreeJbasedmethods are still
observedinscenariosAandB(andCforFImpute)
Theslightdifferencesinimputationaccuracyobservedacrossreferenceassembliesmaybe
explained by the aforementioned small variation in the SNP blocks across reference
assemblies irrespectively of the rearrangements of single SNPs Considering that SNP
blocksaremostlymaintainedweexpectthatbreedswithlargerLDpersistenceatlonger
distances (ie Holstein Brown) will obtain even better imputation accuracy across
assembliescomparedtowhatreportedinthisstudyinItalianSimmentalHoweverastill
relativelyunexploredaspectof imputationperformance is thedistributionof imputation
errors across the genomeNot considering genotyping errors (which are expected to be
randomly distributed in the genome) a cluster of consecutive markers with high
percentage of imputation error across methods are most probably identifying missJ
assembledregions in thegenome(FiguresS1S2andS3)Althoughthe impactonoverall
imputation accuracy is limited as results in this study show these errorsmight have a
large impact on any analyses relying on specific genomic coordinates (eg genomeJwide
associationstudies)Morecomprehensiveresultsincludingannotationstudiesandwhole
genomesequencingdataarerequiredtoconfirmandinvestigatetheseobservations
Toconcludeonlyasmalleffectofupdatingthereferencegenomeassemblyonimputation
accuracywas observed in Italian Simmental On the other hand as already reported by
other studies the variability of imputation accuracy estimates is much larger across
imputation methods The choice of the most suitable imputation method based on the
breedgeneticbackgrounditscharacteristicsandontheavailabledataisthereforecritical
AcknowledgementsThe research leading to these results has received funding from the European Unions
Seventh Framework Programme (FP72007J2013) under grant agreement ndeg 289592 ndash
93
Gene2FarmPartofthegenotypicdatausedinthisstudywasprovidedbyprojectldquoSelMolrdquo
fundedbytheItalianMinistryofAgriculture(MiPAAF)MMwassupportedbytheDoctoral
SchoolontheAgroJFoodSystem(Agrisystem)of theUniversitagraveCattolicadelSacroCuore
(Italy) FB was supported by the Marie Curie European Reintegration Grant
ldquoNEUTRADAPTrdquo(FP7JPEOPLEJ2009JRG)
ReferencesBrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingandMissingJ
Data Inference for WholeJGenome Association Studies By Use of Localized
HaplotypeClusteringAmJHumGenet811084ndash1097
MaPBroslashndumRFZhangQLundMSSuG2013Comparisonofdifferentmethods
forimputinggenomeJwidemarkergenotypesinSwedishandFinnishRedCattleJ
DairySci964666ndash4677doi103168jds2012J6316
Marchini J Howie B 2010 Genotype imputation for genomeJwide association studies
NatRevGenet11499ndash511doi101038nrg2796
NicolazziELBiffaniSJansenG2013ShortcommunicationImputinggenotypesusing
PedImputefastalgorithmcombiningpedigreeandpopulationinformationJournal
ofDairyScience962649ndash2653doi103168jds2012J6062
NicolazziELPiccioliniMStrozziFSchnabelRDLawleyCPiraniABrewFStella
A 2014 SNPchiMp a database to disentangle the SNPchip jungle in bovine
livestockBMCGenomics15123doi1011861471J2164J15J123
PeiYLiJZhangLPapasianCJDengH2008Analysesandcomparisonofaccuracyof
differentgenotypeimputationmethodsPLoSONE3551
VanRadenPMOrsquoConnellJRWiggansGRWeigelKA2011Genomicevaluationswith
manymoregenotypesGenetSelEvol43
94
Tableamp1ampIm
putationampaccuracyampacrossampassem
bliesampforampPedImputeampFindH
apampFImputeampandampBeagleamppercentageampofampwronglyampim
putedamp
allelesamp
PedImputeamp
FindHapamp
FImputeamp
Beagleamp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
Scenarioamp
Namp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
A1amp
84
20
07
21
07
21
07
32
09
32
09
32
09
15
09
15
09
15
09
24
11
25
11
25
11
B2amp
34
23
08
23
09
22
08
36
11
36
12
36
11
15
07
15
09
14
06
23
08
23
08
23
08
C3amp
13
98
41
98
41
98
41
51
10
52
10
51
10
14
06
14
06
14
06
23
06
23
06
24
06
D4 amp
12
88
49
89
49
88
49
54
11
56
12
54
11
16
11
17
12
16
11
26
15
26
15
26
14
1 Sireandm
aternalgrandsiregenotyped
2 Siregenotypedmate
rnalgrandsireno
tgenotyped
3 Sireno
tgenotypedmate
rnalgrandsiregenotyped
4 Sireandm
aternalgrandsireno
tgenotyped
Tableamp2ampIm
putationampaccuracyampacrossampassem
bliesampforampPedImputeampFindH
apampFImputeampandampBeagleampallelicJR2amp
PedImputeamp
FindHapamp
FImputeamp
Beagleamp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
Scenarioamp
Namp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
A1amp
84
941
20
940
20
941
20
907
24
908
25
907
24
952
26
952
27
952
27
925
33
923
33
925
33
B2amp
34
935
24
934
25
935
24
896
30
896
34
895
32
954
20
954
21
954
20
929
24
928
25
928
24
C3amp
13
759
89
761
88
758
87
848
33
845
30
848
32
955
18
955
19
955
19
931
19
928
19
927
19
D4 amp
12
782
113
777
113
781
112
841
32
832
35
839
33
949
34
947
38
949
35
921
44
919
45
921
44
1 Sireandm
aternalgrandsiregenotyped
2 Siregenotypedmate
rnalgrandsireno
tgenotyped
3 Sireno
tgenotypedmate
rnalgrandsiregenotyped
4 Sireandm
aternalgrandsireno
tgenotyped
95
Supplementaryfiles
YoucanfindChapterͷsupplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter05)orscantheQRcode
SupplementaryFigureS1ErrorpercentageacrossmethodsforBTAU42onBTA1
SupplementaryFigureS2ErrorpercentageacrossmethodsforUMD31onBTA1
SupplementaryFigureS3ErrorpercentageacrossmethodsforBTAU46onBTA1
SupplementaryFigureS4Linkagedisequilibrium(LD)decayoverdistanceinItalian
SimmentalandHolstein
ThedottedlinewithblackdotsindicatesLDdecayoverdistanceinItalianHolstein
whereasthedashedlinewithwhitediamondsindicatesthesamevaluesforItalian
SimmentalItalianHolsteingenotypeswereasubsetofthedatausedinNicolazzietal
(2013)
96
CHAPTER6
QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis
MMilanesi1etal
1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly
____________________Manuscript
Abstract
Outbreed species carry a genetic load of deleterious recessive variants Those
that are lethal survive only in the heterozygous state in the population while
otherssimplyreducethefitnessofindividualshomozygousforthevariantThe
long=termdestinyofthesemutationsistodecreaseinfrequencyduetonegative
selectionandeventuallydisappearbecauseofgeneticdriftHowever theymay
bemaintainedinlivestockpopulationsandevenincreaseinfrequencywhenin
linkage with favourable alleles for traits under selection or occurring in the
genomeofsireshavinghighgeneticvalueforproductivetraitsWesearchedfor
these variants combining exome sequence data from 20 Italian Holstein bulls
sampledfromoppositetailsofthemaleandfemalefertilityEBVdistributionthe
HighDensity(HD)SNPgenotypingof1009progenytestedItalianHolsteinbulls
and information existing in public databases Exome sequencing detected
202956 variants Among these 5167 were classified as ldquodeleteriousrdquo when
annotatedwithSIFTandAnnovarThesewerefilteredbypickingthosemapping
in regions out of equilibrium and lack of homozygotes in the progeny tested
Holstein bulls or having an asymmetric distribution in bulls plus=variant and
minus=variant for fertility EBV The resulting 193 variants in 163 genes were
furtherassessedbycomparativegenomicshaplotypeanalysisandfunctionFor
eachassessmentapositiveornegativescorewasgiventoevidenceinfavouror
against thevariant tobedeleteriousA totalof35variantshadapositive total
score These genes are involved in development cell motility and immune
responseSomeof themareknowntobeexpressed in the testiclesandappear
importantforspermfertilityinmodelspeciesothersareresponsibleforgenetic
defectsinhumanandmousecarryingmutationssimilartotheonesdetectedin
cattleThesegenevariantsarecandidatetobeamongthosethatconstitutethe
ldquogeneticloadrdquoofdeleteriousrecessivegenesintheHolsteinpopulation
100
Introduction
The presence of deleterious recessive variants is well tolerated by outbred
speciesWhenthesevariantsappeartheymayspreadandsurvivealongtimein
the population in the heterozygous state (Simmons and Crow 1977) It is
estimatedthatoutbredindividualshumanincludedcarryonaverageagenetic
loadofmorethan100loss=of=functionvariantsintheirgenome(MacArthuretal
2012)
Theeffectofdeleterious recessivevariantsappearswhendemographicevents
eggeneticbottlenecksreducethegeneticvariabilityofthepopulationorwhen
relatedindividualsareintentionallymatedforbreedingpurposesInthesecases
recessive deleterious reduce the fitness of individuals carrying them in the
homozygous state and in extreme case induce very severe genetic defects
premature death and abortion (McClure et al 2014 Sahana et al 2013
VanRadenetal2011)Theythereforecontributetoexplainthephenomenonof
inbreeding depression and its negative effect on fertility (Charlesworth and
Willis2009)
The search for deleterious recessive variants is particularly interesting in
Holstein cattle Thisbreed is experiencing anegative genetic trend for fertility
traits (Jamrozik et al 2005) it is worth noting that this is occurring while
inbreedingisunderstrictmonitoringandistheoreticallyincreasingataverylow
pace(about005peryear=Mrodeetal2009)Inadditionmatingisplannedto
minimise relationship between animals and the additive genetic variance for
mosttraitsndashincludingmilkproduction=isapparentlynotdecreasinginspiteof
the selection pressure applied by modern schemes (Biffani et al 2002)
indicatingthatoveralldiversityisonlyslightlydecreasing
SelectiontoincreasefertilityischallengingSelectioncriteriagenerallyadopted
to improve this trait are far from the biological mechanisms that govern
reproduction Inaddition thesemechanismsarecomplexandverysensitive to
environmental(management)conditions(Laneetal2013Wathesetal2014)
Asa result fertility traitshave loworvery lowheritability (Pryceetal1997)
andthereforelowresponsetoselectionandslowgeneticprogress
101
Forthesereasons thequest formolecularvariants influencing fertility traits is
on=goingsincetheavailabilityofmoleculartoolsforDNAanalysisAnumberof
investigations tested the effect of functional candidate genes running genome
wide association analyses to find QTLs for fertility traits using hundreds
microsatellite markers in structured families (Ashwell et al 2004) or tens of
thousand SNP panels in whole populations (Aston and Carrell 2009) The
availability of medium density SNP panels allowed the search for deleterious
recessivesevenwithouttheuseofphenotypictraitsbyscanningthegenomein
searchofregionslackingoneofthetwohomozygousclasses(Sahanaetal2013
VanRaden et al 2011) This approach has highlighted a number of genomic
regions likely carrying deleterious recessives This information has immediate
application inmolecular assisted breeding and a relevant scientific interest in
thesearchofcausativevariantsandassessmentoftheirfunctions
Thedevelopmentofnewsequencingtechnologieshasgreatlydecreasedthecost
of sequencing so that nowadays the sequencing of few target individuals is
becoming the gold standard to seek for causal mutations for rare monogenic
defectsinhumansandotherspecies(ChunandFay2009)Oftenonlytheexome
the protein coding part of the genome is sequenced searching for those
mutationsthatseverelyalterproteinstructureandorfunction(Bamshadetal
2011 Li et al 2013) A typical mammalian exome consists in a few tens of
millionnucleotides(around2ofthereferencegenome)comparedtothefew
billionofthewholegenomeThereforewiththesamenumberofreadsexome
sequencinghastheadvantageofamoreaccurategenotypingdueto thehigher
sequencing depth an easier data management lower computation load and
easier interpretation (Bieseckeretal2011Majewskietal2011) Indeed the
numbervariants found inexomesareone=twoordersofmagnitude lowerthan
thatdetectedinwholegenomesandmutationsaredetectedinannotatedgenes
orimmediatelynearbyOntheotherhandtheexomeapproachcanonlyidentify
variantson annotatedgenesVariants inunknownexonsor that occur innon=
coding regulatory features will be missed (Bamshad et al 2011) In humans
these are turning out to be of outmost importance for gene regulation and
downstreamphenotypic effects (ENCODEConsortium2012) but they are still
verypoorlyannotatedinlivestock
102
Herewesequencedtheexomesof20Holsteinbullsprogenytestedselectedfrom
the tails of male and female fertility EBVs (Estimated Breeding Value) Since
livestockfertilityisacomplextraitandcausativemutationscannotbesoughtby
the approach used in monogenic diseases we integrated the sequence
information with high density SNP chip data obtained on a much larger
populationofItalianHolsteinbullstodetectgenescarryingallelescandidateto
be deleterious recessives and confirm their likely deleterious effect by
comparativegenomics
Materialsandmethods
1SamplingandDNAextraction
Semen(straws)andbloodsampleswereprovidedbyItaliancertificatedartificial
insemination centres (UE Directive 88407CEE) thus ethics committee
approvalforthisstudyisnotrequired
A total of 20 ItalianHolstein progeny tested bullswere sampled for complete
exome sequencing Among them 19were chosen from the tails of female and
male fertilityEBV(EstimatedBreedingValue)distribution (Table1) tohave5
animals fromeach tail One animalwas to be in theminus=variant tail of both
traitsBullswereselectedasunrelatedaspossibleSamplingwaspossiblethanks
tothecollaborationwithANAFI(ItalianHolsteinBreederAssociation)
ANAFIdevelopedageneticevaluationfor fertilitycombiningdifferenttraitsas
explained by Biffani and colleagues (Biffani et al 2005) The traits are ldquodays
from calving to first inseminationrdquo ldquocalving intervalrdquo ldquofirst=service non return
rateto56daysrdquoldquoangularityrdquoandldquomatureequivalentmilkyieldat305daysrdquo
MalefertilityisexpressedasanEstimatedRelativeConceptionRate(ERCR)and
isameasureofconceptionrateofaservicesirerelativetoservicesiresofherd
mates(Soggiuetal2013)
103
Table1FemaleandmalefertilityEBVsofthe20animalssequenced
Animal EBVFemaleFertility EBVMaleFertility GroupfriA01 97 =111 NAfriA02 98 =166 MalLfriA03 115 0 FemHfriA04 114 0 FemHfriA05 114 0 FemHfriA06 86 0 FemLfriA07 98 195 MalHfriA08 105 213 MalHfriA09 95 =154 MalLfriA10 95 149 MalHfriA11 102 264 MalHfriA12 91 =475 FemLMalLfriA13 105 =179 MalLfriA14 102 207 MalHfriA15 99 =405 MalLfriA16 85 0 FemLfriA17 87 0 FemLfriA18 121 0 FemHfriA19 117 0 FemHfriA20 88 0 FemL
A total of 1009 Italian Holstein bulls were chosen from the entire population
balancing theirpresence in the tailsof theEBV forkgprotein (PROT) somatic
cell count (SCC) fertility (FERT) and functional longevity (LONG) for high
densitygenotyping(TableS1)
DNAwasisolatedfromwholebloodandsemenstrawssamplesinLGSlaboratory
(CremonaItalywwwlgscrit)usingtheGenomixkitaccordingtoLGSinternal
protocolDNAsforexomecaptureandsequencingwereextractedindependently
tosatisfy therequirement indicatedby IGA technologies (Udine Italy)at least
12mgofDNAataconcentration100ngulwithR260280higherthan18and
R260230near19
2Moleculardataproductionandqualitychecking
21Exomecaptureandsequencing
104
Exomes were captured using the ldquoSureSelected All exon Bovinerdquo (Agilent
Technologies) kit in early version Probes coordinates are based on UMD31
bovinereferencesequence(Ziminetal2009)anddesignedagainstNCBIexons
as well as predicted exons UTRs and miRNAs Number and distribution of
probesperchromosomearedetailedinTableS2
rdquoOvation Ultralow Library Systemsrdquo protocol was followed for library
preparation After quantification with Qubit (Life technologies) and quality
controlwith2100Bioanalyzer(AgilentTechnologies)DNAwassonicatedwith
Bioruptor (Diagenode) to obtain fragments between 150 to 200 bp in length
blunt end=repaired adenylated ligated with paired=end adaptors and purified
followingtheprotocolprovidedbythemanufacturer
LibrarieswerehybridizedwithldquoSureSelectedAllexonBovinerdquoprobesusingthe
Encore Target Capture Module (Nugen) SureSelect Target Enrichment and
Herculase II FusionEnzymewithdNTPCombo (AgilentTechnology) kit Then
capturedDNAwasamplifiedandtaggedwithlibraryspecificindexesQualityof
capturedsampleswastestedwithQubitand2100Bioanalyzer
Paired=end 2X100bp sequencing was performed on Illumina HiSeq2500
(IlluminaSanDiegoCA)atanexpectedcoverageof50X
22Sequencealignment
Adapters and reads shorter than 36bp were trimmed using Trimmomatic
(version032)(Bolgeretal2014)ForeachFastqfilethealignmentandpairing
of the reads to theUMD31 reference sequencewas performedwithBurrows=
WheelerAligner(BWA)(version077=r441)(LiandDurbin2009)applyingalso
trimmingoption (=q5) SAM files resultedwere converted intoBAM file using
SAMtools (version0118) (Lietal2009)Foreach individualBAM fileswere
merged in a single one with Picard (version 1111)
(httpsgithubcombroadinstitutepicard)Duplicatereadswereidentifiedand
markedwithPicardFinallyindividualBAMfileswereindexedwithSAMtools
105
Sequencedepthwasevaluatedsimultaneouslyonall20sequencedanimalsand
onlyonregionstargetedbyprobesusingDepthOfCoverageofGATK(McKennaet
al2010)
23ExomeVariantcallingandqualitycheck
Singlenucleotidepolymorphisms(SNPs)insertionsanddeletions(InDels)were
calledsimultaneouslyonall20sequencedanimalsandonlyonregionstargeted
byprobesusingUnifiedGenotyperofGATK(DePristoetal2011)
All variantswere filteredusingVariantFiltration ofGATK to removeoneswith
lowmappingandorsequencingqualityThechoiceoffiltersandthresholdswas
made in accordance to ldquoGATK best practicerdquo for exome sequencing and
distributionoffiltervalues
bull ldquoSNPclusterrdquoexcludedvariantifmorethan3variantswerefoundin10
bp
bull ldquoBaseQualityRankSumTestrdquoexcludedifitrsquoslowerthen=5ormorethan
5
bull ldquoMappingQualityRankSumTestrdquoexcludedif itrsquoslowerthen=7ormore
than6
bull ldquoRead Position Rank Sum Testrdquo excluded if itrsquos lower then =4 or more
than4
bull ldquoFisherStrandrdquoexcludedifitrsquosmorethan30
bull ldquoDepthrdquoexcluded if itrsquos lower then60X(ieaminimumvalueof3Xper
animals)ormorethan3500X
bull ldquoQualityrdquoexcludedifitrsquoslowerthen40
bull ldquoMappingqualityrdquoexcludedifitrsquoslowerthen30
Filters were applied to all variants but only bi=allelic SNPs were used in the
following analyses Genotypes with ldquoGenotype Depthrdquo lower than 5 and
ldquoGenotype Qualityrdquo lower than 20 were discarded The remaining ones were
converted intoldquoABrdquocode(A=referencealleleandB=alternativeallele)Then
data were transformed in PLINK (Purcell et al 2007) format using a home=
made python script To prepare the working dataset the final quality control
106
removedvariantswithmore than25missingdata (that iswithmore than5
animals out of 20 without genotypes) monomorphic (minor allele frequency
MAFlt1)ornotmappinginautosomesBasicpopulationgeneticsparameters
werecalculatedtoevaluatethepresenceofcrypticstructureinthedatathatmay
influencetheoutcomeofthedownstreamanalyses
In total 24390 SNPs identified in the exomes were also included in the HD
SNPchip panel HD genotype from nineteen out to twenty of the sequenced
animals sequenced was used to compare technologies Genotypes at common
locationswerecomparedacrosstechnologies toevaluatesequencequalityand
theeffectofsequencedepthongenotypecalling
24BovineHDGenotypingBeadChipdataandqualitycheck
AllanimalsweregenotypedusingtheBovineHDGenotypingBeadChip(Illumina
San Diego CA) Genotypes were quality controlled and filtered with standard
threshold Markers with more than 5 missing data and MAF lt 1 were
excluded Animals with more than 5missing data were also excluded Only
autosomal SNPs were retained Thereafter genotypes were phased and the
remainingmissingdata imputedusingBeaglev332(BrowningandBrowning
2007)
Ofthe20animalssequenced12hadalsoBovineHDGenotypingBeadChipand7
BovineSNP50 Genotyping BeadChip version 2 genotypes available For
concordance and haplotype analysis we imputed the 50K data to 800K as
previouslydescribedabove
3Strategyfordetectingcandidatedeleteriousvariants
Wesearchedforcandidatedeleteriousvariantsfollowinganintegratedapproach
that joined the analysis of i) exomes of a few Italian Holstein bulls having
plusvariant and minusvariant EBVs for a target trait ii) HD genotyping of a
population of 1009 progeny tested Italian Holstein bulls iii) the comparative
genomic analysis of 100 vertebrate species The strategy followed for the
107
analysisisshowninFigure1Inthefirststep(Search)welookedforcandidate
deleterious variants in the exomes In step2 (Filter)we filtered candidatesby
assessingwhichof thesevariantseitherhadanon=randomdistributionamong
animals having extreme EBVs or co=mapped with genomic regions carrying
outliersignals inthelargepopulationofprogenytestedbulls Instep3(Asses)
candidateswere further evaluated by i) assessing their conservation across a
panel of 100 vertebrate species ii) searching for associated haplotypes and
evaluating their frequencies in the population of progeny tested bulls iii)
evaluatingtheirassociationwithgeneticdefectsinhumanandmodelspeciesiv)
examining their expression profile and function when known Methods are
belowdescribedindetail
Figure1Strategyfollowedtoidentifydeleteriousvariants
108
31Step1Searchforcandidatedeleteriousvariants
Quest for candidate deleterious variants in exome sequence data by
variantannotation
WeannotatedallvariantsusingANNOVAR(Wangetal2010)andVEP(Variant
Effect Predictor) (McClure et al 2014) software They use the Ensembl gene
annotation database (UMD31 reference sequence release 74) to investigate
locationandpotentialeffectoncodingsequencesofallvariantsVEPintegrates
the SIFT algorithm (Kumar et al 2009) to predict the effect of amino acid
substitutiononproteinfunctionForeachsequencechangesequencehomology
and physic=chemical similarity between the reference amino acid and the
alternative is considered providing a score that evaluates the mutation
tolerabilityVariantswithscorelowerthan005wereclassifiedasldquodeleteriousrdquo
inaccordingtosoftwarerecommendations
SIFTalsocomparedallvariants found in theexomeswith thosepresent in the
dbSNPdatabase(version140)toidentifythosethatwerenovel
32Step2Filteringofcandidatedeleteriousvariants
2a Analysis of deleterious variant distribution among animals having
extremesEBVvaluesformaleandfemalefertilitytraits
Association between variants and the classification in the tails of the EBV
distributionwas estimated byFisherexact testwhich compares frequencies of
alleles (a vs A) in the two tails not assumingH=W equilibriumgenotypic test
that compares the frequency of genotypes (aa vs Aa vs AA) and the recessive
geneactiontestwhichtestsforthedistributionofonehomozygousclassversus
theothers(aavsaAAA)AlltestsareimplementedinPLINKfunctionsThe5
significance threshold was set empirically by permutation (N=100000) either
point=wise that is not corrected for multiple testing (EMP1 = Empirical
Probability1)andthereafterfamily=wisecorrectedformultipletesting(EMP2=
EmpiricalProbability2)
109
GenesexceedingEMP1were considered suggestive and thoseexceedingEMP2
significant candidates to be deleterious recessives and their characteristics
conservationacrossspeciesandfunctionfurtherinvestigated
2b Quest for regions candidate to carry deleterious variants in the
populationanalysedwiththeHDBeadChip
Basic population genetics parameters and Multi Dimensional Scaling were
calculated with GenABEL R package (Aulchenko et al 2007) to evaluate the
presenceofcrypticstructure inthedatathatmayinfluencetheoutcomeofthe
downstreamanalyses
ExpectedandobservedallelicandgenotypicfrequenciesandFisherexacttestof
Hardy=Weinberg proportions (Janis E Wigginton 2005) were calculated with
PLINKon theHDSNPdatasetValueswere averaged in slidingwindowsof10
markersThisvalue(10markers)waschosenaftertestingdifferentwindowsize
as represent a good compromise between noise reduction and resolution (not
shown)
Candidate regions containing deleterious mutations were identified following
twodifferentapproaches
In one approach hereafter refereed to as ldquoOut of Hardy=Weinbergrdquo approach
(OHW)markerssignificantlydeviatingfromofHardy=WeinbergequilibriumatP
le84710=8 (nominalvaluePle005Bonferronicorrected formultiple testing)
werefirstidentifiedWindowswithatleastthreesignificantmarkersbelowthis
thresholdandhavingahomozygotedeficitwereconsideredcandidatestocarry
deleterious mutations In this OHW approach we applied very stringent
constraints to reduce false positive discoveries due to the Wahlund effect
inducedbypopulationsubstructure(seeresultsandFigureS5)
In thesecondapproachhereafterreferred toas ldquoLackOfHomozygotesrdquo (LOH)
approach the differences between observed and expected frequencies of
homozygotesfortheminorallelewerecomputedandaveragedacrossmarkers
included in each sliding window Values were standardized and only regions
carrying three overlapping windows below =3Z score value were considered
110
candidates to carry deleterious mutations Thereafter significant windows
carryingatleastonemarkerincommonwerejoinedtodefineregionboundaries
ExomesequencingknowledgeandHDBeadChipdatawerecrossed inorder to
identifygenescarryingseverevariantslocatedwithinsignificantregionsortheir
proximity (50Kb upstream or downstream) These genes were considered
candidates to be deleterious recessives and their characteristics conservation
acrossspeciesandfunctionwasfurtherinvestigated
33Step3Assessmentofcandidatedeleteriousvariants
Genes carrying variations classified as deleterious by SIFT or ANNOVAR and
eitherassuggestivebyfertilityEBVtailsanalysisorassignificantbyOHWeLOH
analyses of the Holstein bull population were submitted to comparative
genomics population genetics and functional analyses In order to prioritize
genesvariants discoveredwe scored each evaluation using a subjective scale
ranging from =3 to 3 to measuring the evidence in favour (scores 1 to 3) or
against(scores=1to=3)thedeleteriouseffectoftheSNPsThescore0inallcases
indicated the absence of evidence or that a specific assessment failed The
completescorescalethereforerangedfrom=9to9
3aComparativegenomics
For comparative genomics purposes the position of eachmutationwas ldquolifted
overrdquo from UMD31 bovine coordinates to hg19 human coordinates using the
specifictoolatUCSC(Kentetal2002=httpgenomeucscedu)Orthologous
proteins were aligned using the UCSC genome browser and amino acid
conservation assessed in a set of 100 vertebrate species that included human
primates (11 species) euarchontoglires (egmouse14 species) laurasiatheria
(egsheep25species)afrotheira(egelephant6species)othermammals(eg
platypus5species)birds(egchicken14species)sarcopterygii(egturtle8
species) fish (eg zebrafish 16 species) Results of amino acid conservation
analysiswererecordedonlyifhomologousproteinfromatleast50speciescould
beretrievedandalignedThereafteraminoacidconservationwasscored3ifthe
111
aminoacidwasconservedinallvertebrates2ifconservedinallmammalsand1
ifconservedinthelargemajority(atleast90)ofmammalsAscoreof=1was
given when the amino acid was not conserved among species and a score =3
when both variants observed in cattle were carried by other species Finally
wheneithertheliftoverprocessortheproteinalignmentfailedthecomparison
wasconsiderednoninformativeandgivenascore0
3bSearchofhaplotypesassociatedtodeleteriousvariantsandanalysisof
HDpopulationdata
To acquire further evidence we searched for the haplotypes that carry the
deleterious variations and estimated their frequency in the
homozygousheterozygousstatuswithinthebullpopulationgenotypedwiththe
HDBeadChip
Weusedamultistepapproachtofindhaplotypescarryingdeleteriousvariantsi)
from the phased HD BeadChip data we extracted the haplotypes of the 20
sequencedanimalsintheregionofthevariant(100markersoneachside)ii)in
these 20 animals we searched the shorter haplotype able to distinguish
homozygous andor heterozygous animals for the deleterious variant from all
other haplotypes iii) we assessed the frequency of homozygotes and
heterozygotes for thehaplotype in thewholepopulationofbulls characterised
withtheHDBeadChip
Scoresinthiscasewere=3inthecaseofuniquehaplotypesshowinganumberof
homozygotesnonsignificantlydifferentfromthenumberexpectedfromHardy=
Weinbergproportions3inthecaseofuniquehaplotypesshowinganumberof
homozygotes significantly lower than expected 1 in the case of unique
haplotypesneverhomozygousandrare0whenevernouniquehaplotypecould
befound
3cFunctionalanalyses
With PantherDB (Mi et al 2010) resources we classified our gene list into
functionalcategoriesaccordingtotheGeneOntologyclassificationAllsuggestive
112
andcandidateEnsemblGeneIDsubmittedtothebulkanalysiswererecognizedand annotated Beyond the GO term characterization in order to understandpotentialdetrimentaleffectsofmutationsandalsopinpointpotentialcandidatesfor fertility traits we retrieved information for each gene according to thefollowing parameters i) the involvement in known Mendelian or complexdiseases fromOMIMdatabase ii) theavailabilityofknock=outmice iii) and theeventual expression in reproductive tissues from the Gene Expression Atlas(httpwwwebiacukgxa)(Table4)
Alsoforthesedataweusedthesamescaleattributingarangefrom=3pointstogenes having no relation with fertility viability and genetic defects and norelevant phenotypic effects in knock=out mouse models to 3 to gene closelyrelated to fertility and viability associated to genetic defects in human andaffectingfertilityorsurvivalwhenknocked=outInparticularascoreranging=1to+1wasattributedtogenesassociatedtohumansyndromes(=1=noassociationrecordedinOMIMdatabase0=noinformationinOMIM1=associationinOMIM)A score with same range to information collected from knock=out mousephenotypes(=1=knock=outmouseshowsnoimpairedphenotype0=noknockoutmouse has been produced 1= knock=out mouse shows impaired phenotype)Thesamescorewasattributedtomolecularfunctions(=1=functionunrelatedtofertilityandviability0=functionindirectlyrelatedasimmunityandmetabolism1=functiondirectlyrelatedasfertilityanddevelopment)
Results
Exomecaptureandsequencing
A totalof225414probeswereused for capturing thecattleexome (5355Mbabout 2 of UMD31 reference genomes) Mitochondrial (MT) and Ychromosome were not included in probe design The number of probes perchromosomespannedfrom2500onBTA27toalmost12000inBTA3(TableS2)
113
The average exome sequencing depth was 4058 slightly lower than the
expected50XAllanimalshadatleast24Xaveragegenomessequencedbutina
same animal the sequencing depth varied across chromosomes and regions
withinchromosomesThelowestchromosomeaveragevalueobservedwas10X
inBTA25oftheshallowersequencedanimalAveragecoverageperanimaland
chromosomeandoverallmeansareshowninFiguresS1
In total1613461402 raw100bppair=ends sequence readswereproducedA
total of 1425514112 100bp reads were mapped and properly paired to the
reference genome Paired reads per animal ranged between 41973510 and
138840392
Insynthesissequencingdepthwasvariableasexpectedandalmostallanimals
hadsomeregionsofscarceornosequencedepthbutoverallallexomeregions
were sufficiently covered to permit a comprehensive analysis of molecular
variantsoftheanimalsinvestigated
Step1Searchforcandidatedeleteriousvariants
Variantdetectionandannotation
Thesequencingdepthofallvariantscalledonthealignmentofallreadsacross
animalsfollowedaChi=squaredistributionAveragedepthwas79694spanning
from2Xto5000Xwithpeakfrequencyclassat150=250X(FigureS2)
A multistep approach was followed to clean the exome sequence dataset
Cleaning process steps and the number onmarkers retained in each step are
describedinTable2
114
Table2Typeandnumberofmarkersdetectedandretainedafterdatasetcleaning
TypeofvariationNumberofvariations(N)
Rawdata
Aftervariantcleaning
Aftergenotypecleaning(onlyautosomal)
Complex 1213 963 0Deletion 6938 5031 0Insertion 5510 3656 0MultiallelicComplex 5710 3995 0MultiallelicSNP 803 512 0SNP 242988 203079 164277Total 263162 217236 164277whencomparedtothereferencegenome
Atotalof263162variantswereidentifiedAmongthese217236(8255)wereretainedafterapplyingthevariantfilters(seematerialsandmethodssection)
According to GATK classification the vast majority (9348) of variationsidentified were biallelic SNPs followed by insertions and deletions (InDels40) more complex variations and multiallelic SNPs (252) Indels andcomplexvariationsalthoughlessfrequentthanSNPsaffectedarelevantportionoftheexome(49589bp)Thenumberofvariantsdetectedperchromosomewassignificantly correlated to the number of probes used (r=08761P=228x10=10)andofbasescaptured(r=08424P=53x10=19)perchromosomehoweverBTA23BTA27 BTAX had an outlier behaviour the former two containing highervariation(450and445variantsperKb)andthelatterlower(084variantsperKb)thantheaverage(322variantsperKb)
Bialleic SNPswere further filtered for genotype quality (sequence depth gt 5Xandsequencequalitygt20)andBTAXmarkersThefinalautosomalSNPdatasetcounted164277SNPsAmongthese17829biallelicSNPs(109)describedforthefirsttimehavebeensubmittedtodbSNP
Variantsrsquo annotation is shown in Table 3a Variantswere detected in differentgenedomains(exonsintrons3rsquountranslatedregions)aswellasupstreamanddownstreamgenesMorethan31ofthevariationsweredetectedinexonsandamong them more than 41 have the potential to have an effect on genefunctionbyaffectingsplicingoralterproteinstructure(Table3b)
115
Table 3 A) number of autosomal SNPs annotated in each gene region B) number and
classificationofautosomalSNPsalteringgenefunction
A)
Generegion NofSNPs(autosomes)Downstream 1170Exonic 51293Intergenic 49901Intronic 58116ncRNA_exonic 160ncRNA_intronic 3ncRNA_splicing 2Splicing 184Upstream 1445Upstreamdownstream 54UTR3 1345UTR5 604Total 164277
B)
EffectofexonicSNPs NofSNPs(autosomes)Nonsynonymoussinglenucleotidevariation 20839Stopgainsinglenucleotidevariation 213Stoplosssinglenucleotidevariation 10Synonymoussinglenucleotidevariation 30226Unknown 5Total 51293
whencomparedtothereferencegenome
Amongthe51293exonicSNPs4166in2978geneswereclassifiedldquodeleteriousrdquo
by SIFT Among these 3210 in 2356 genes were observed at least twice
ANNOVARidentified223stopgain(213)orloss(10)variantsin217genes
Theaverageconcordanceat24390SNPs in commonbetweenexomesand the
HD Beadchip was 9716 (Table S3) Discordance was due to either alleles
detected by sequencing andnot detected by genotyping (exome=heterozygote
Beadchip=homozygote) and vice=versa (exome=homozygote
Beadchip=heterozygote) Discordances were randomly distributed among
markers and inversely correlated to sequencing depth (Figure S3) indicating
that most errors are likely occurring during sequencing even if occasional
116
Beadchip failure indetectingsomeallelecannotbeexcludedTwoanimalshad
anoutlierbehaviouronehaving846andthesecond719concordanceBoth
animalswereretainedforvariantdiscoverywhiletheywereexcludedfromthe
analysesofEBVtailsWithoutthesetwooutlieranimalsthemeanconcordance
increasesto9938
Multidimensional scaling confirmed the absence of a substructure in animals
sequenced(FigureS4)
Step2Filteringofcandidatedeleteriousvariants
AnalysisofEBVtails
Fisherexact test isgenerallyused to test for theassociationbetweenavariant
and a Mendelian disease Samples are divided in cases and controls and the
association between alleles or genotypes and the disease is tested under
different inheritance models Here we used the same method to test for
association between allelesgenotypes and animals from opposite tails of the
EBVdistributionfori)malefertility(N=4high+4low)ii)femalefertility(N=5
high+4low)andiii)maleandfemalefertility(N=9high+8low)
Genotypes were tested under all inheritance models implemented in PLINK
Giventhelownumberofsamplesnovariantwassignificantaftercorrectionfor
multiple comparisons (EMP2) and only suggestive variants could be found
(significant at 5 at EMP1)We considered suggestive of carrying deleterious
allelesalistof101genescarrying112variants(TableS4)
Detectionofregionscarryingdeleteriousmutationfrom800KSNPdata
After applying quality control parameters 26653 and 146991 markers were
removed from the original dataset for excessive missing data and MAF
respectivelyInaddition17individualsremovedforlowgenotypingrateAswith
exomes only autosomal SNPswere retained for the analyses thus obtaining a
workingdatasetof590680SNPsand992animals
117
MDS(FigureS5)analysis indicates that thedatasethassomesubstructure that
mightaffectdownstreamanalysesWeaccountedfortheriskoffindingmarkers
outofHardy=WeinbergproportionsduetotheWahlundeffectbychoosingvery
stringentsignificancethresholdintheOHWanalyses
To detect regions candidate to carry deleterious recessivemutations we used
two approaches the ldquoOutside Hardy Weinbergrdquo (OHW) and the ldquoLack Of
Homozygotesrdquo (LOH) approaches (see materials and methods) The two
approaches complement each other in the detection of regions carrying less
homozygotes than expected OHW identifies regions in which a few markers
have extreme homozygote deficit while LOH detects regions in which many
markershaveamoderatedeficit
In the SNP dataset 1884 markers were significantly out of Hardy=Weinberg
equilibrium(Ple84710=8)Atotalof485windowswereidentifiedwithatleast
3 markers out of 10 exceeding this threshold Among these 84 contained
markerssignificantbecauseofhomozygotedeficitThese84windowsidentified
11OHWcandidateregionsin9chromosomes(TableS5)
TheLOHapproachidentified2335windowsbelow=3Zscoreofthestandardised
differencebetweenobservedandexpected frequencyofhomozygotes forMAF
Theseidentified326LOHcandidateregionsspreadonallautosomes(TableS6)
Onlyregionsformedbyatleast3singlewindowswereretainedreducingLOH
candidateregionsto240
Deleterious variants (SIFT deleterious and stop codon gainedlost) were
intersectedwithsignificantOHWandLOHregionSevenofthesemappedwithin
OHW sliding windows 36 within LOH sliding windows Among these 5 were
located in regions significant in both OHW and LOH Other 51 deleterious
variants mapped in the immediate vicinity (+=50Kb) of significant windows
These 83 variants affect 66 unique genes candidate to carry deleterious
recessivealleles(TableS7)
Five regionswere common toOHWandLOHoneonBTA4 (LOH=74959610=
75151417 OHW=74970653=75151417) two on BTA7 (LOH=9626435=
10099512 OHW=9628735=10099512 LOH=10575691=11171132
118
OHW=10486424=11087441) one on BTA15 (LOH=80299924=80928311
OHW=80299924=80921274) and one on BTA18 (LOH=28122373=
28185525OHW=28106022=28164693)
Step3Assessingcandidatedeleteriousvariants
After the two filtering step previously described a total of 191 SNPs were
retained out of the 3433 classified as deleterious by SIFT and Annovar and
analysed further These are carried by 163 genes Eighteen genes carried 2
candidateSNPsthreegenes3SNPsandonegene5SNPsInaddition4SNPsin
four geneswere identified as potential candidates in both the analysis of EBV
tailsandofbullpopulation
A totalof12SNPs introducedstopcodonswhile the remaining inducedamino
acid substitutionsor splicing variants In the filtereddataset theproportionof
stop codons on total variation (63) was only slightly higher than the
proportion found in the entire dataset (51 225 stop codon out of 4391
deleteriousSNPs)
Comparativegenomics
Proteinmultiplealignmentwassuccessfulin146outof191comparisonsIn26
casesthemostfrequentaminoacidincattlewasveryhighlyconservedeitherin
allvertebrates(N=9)orinallmammals(N=17)Inother23casestheaminoacid
was conserved in at least 90 of mammalian species In the last class were
generally included amino acids that changed in mammalian species
phylogenetically verydistant from cattle (eg platypus and related species) In
97 instances the amino acid could change quite freely and occasionally both
variantsobservedincattlewerecarriedbyotherspecies
HaplotypeanalysisofHDpopulationdata
The search for haplotypes associated to the191 variants for further testing in
the population identified 101 unique and invariant haplotypes common to all
119
carriersofatargetvariantanddifferentfromallthosefoundinnoncarriersIn
additional36casesanoncompletelyinvarianthaplotypecouldbeassociatedto
themutationConverselyin54instancesareliablehaplotypecouldnotbefound
(table S8) In total 38 haplotypes had no homozygous individuals in the
population Thiswas expected in the case of 33 rather rare haplotypes (fewer
than 100 heterozygotes observed in the population) Conversely 5 haplotypes
were rather frequent (from 170 to almost 300 heterozygotes observed in the
population)andtheexpectationwastofind78to222homozygousindividuals
in thepopulation insteadof thenone foundAnothervarianthadaremarkable
high frequency (35) and a clear outlier behaviour Only 11 homozygous
individualswereobservedoutofthe121expected
Functionalanalysisofcandidategenes
To assess the potential deleterious effects of the variants identified we also
retrieved functional information for each gene according to the following
parametersi)associationtoknownMendelianorcomplexdiseaseslistedinthe
OMIM database ii) availability and effect of gene knock=out in mice iii)
expression profileswith particular attention in reproductive tissues from the
GeneExpressionAtlas(Table4)
Twenty genes out of 163 have a recognized phenotype in humans These are
mostlyseveresyndromesUnfortunatelyonlyafewgeneshaveanull=miceline
producedwithphenotypicdataavailable
According to the expression data available from Ensembl these genes are
generally (with someexceptions)expressed inavarietyof tissueswitha clear
tendencytobehighlyexpressedintestis
120
Tableamp4Informationon163genesanalyzedconcerningeventualphenotypesinmanorknockoutsinmouseHeaderampcodesAbrainBcolonCheartDkidney
EliverFlungGskeletalmuscleHspleenItestisColoursrepresentthegenelevelofexpressionindifferenttissuefromlow(whitegrey)tohigh(blue)NCin
FunctioncolumnmeansldquoNotClassifiedrdquo
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000000079
CCSAP
Cells
9
1
03
1
07
2
06
2
2
NC
ENSBTAG00000000149
USP40
Cells
6
7
2
7
5
7
2
8
9
fertility
ENSBTAG00000000414
FUT5
Nomatch
37
1
cellular
process
ENSBTAG00000000918
ERMARD
Periventricular
nodularheterotopia
6Xlinked
periventricular
heterotopia6q
terminaldeletion
syndrome
Periventricular
nodularheterotopia
Nomatch
5
4
4
9
6
7
4
9
17
cellular
process
ENSBTAG00000000998
SLC39A6
Cells
6
9
2
7
4
10
1
10
9
cellular
process
ENSBTAG00000001133
VWA8
Mice
2
3
6
7
10
3
3
4
4
blood
metabolism
ENSBTAG00000001140
IAH1
httpswwwmousephenotypeor
gdatagenesMGI1914982sect
ionassociations
BOTHSEXadiposetissue
behaviourneurological
growthsizebodyFEMALE
homeostasismetabolismMALE
limbsdigitstailskeleton
9
18
30
60
23
19
12
17
30
cellular
process
ENSBTAG00000001262
IRGQ
Cells
6
1
5
3
06
2
5
04
2
immunity
ENSBTAG00000001286
ELMO3
httpswwwmousephenotypeor
gdatagenesMGI2679007sect
ionassociations
Decreasedstartlereflex
02
27
14
7
7
02
1
fertility
ENSBTAG00000001291
OR8H3
Nomatch
fertility
ENSBTAG00000001500
FIGNL1
Cells
08
2
01
05
07
08
04
2
3
cellular
process
ENSBTAG00000001505
GRK4
Cells
10
7
09
4
6
3
2
6
218
signalling
ENSBTAG00000002220
SPTLC1
Hereditarysensory
andautonomic
neuropathytype1
No
17
27
3
28
12
28
5
19
6
metabolism
ENSBTAG00000002288
NT5DC4
Nomatch
02
cellular
process
ENSBTAG00000002289
NLRP14
No
04
cellular
process
ENSBTAG00000002660
CCDC11
Situsinversustotalis
Nomatch
3
1
06
2
07
3
03
2
54
fertility
121
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000002809
CAPN10
AutosomalRecessive
MentalRetardation
DIABETESMELLITUS
NONINSULIN
DEPEND
ENT1
Cells
2
13
41
33
29
immunity
ENSBTAG00000002966
DNAJC13
Youngadultonset
Parkinsonism
No
5
52
94
62
84
fertility
ENSBTAG00000003231
LIM2
Pulverulentcataract
Cells
01
02
01
02
cellular
process
ENSBTAG00000003508
CFAP69
httpswwwmousephenotypeor
gdatagenesMGI2443778sect
ionassociations
Abnormalcirculatinginsulinlevel
MaleInfertilityDecreasedmean
plateletvolum
eIncreasedleanbody
mass
203
07
1
01
7NC
ENSBTAG00000003523
UGT2B15
Nomatch
34
metabolism
ENSBTAG00000004081
FAT3
No
2
03
01
01
05
signalling
ENSBTAG00000004555
LRP2
DONN
AIBARROW
SYND
ROME
Intellectualdisability
Cells
56
3
01
disease
ENSBTAG00000004772
THEM
4
Cells
8
16
295
27
52
63
metabolism
ENSBTAG00000004860
SLC27A6
Cells
6
02
10
602
207
01
immunity
ENSBTAG00000005021
SEMA5A
Monosom
y5p
Cells
9
04
89
13
08
08
6immunity
ENSBTAG00000005170
GPR114
Mice
101
02
01
09
6
01
fertility
ENSBTAG00000005248
OFCC1
No
01
01
03
01
01
01
03
NC
ENSBTAG00000005340
SERGEF
Cells
7
10
35
15
58
23
cellular
process
ENSBTAG00000005357
VPS11
Mice
14
13
619
717
815
15
cellular
process
ENSBTAG00000005466
C8H8orf58
Nomatch
05
01
08
02
103
02
04
NC
ENSBTAG00000005514
TRIO
Cells
17
66
10
210
88
7immunity
ENSBTAG00000005633
RGNEF
httpswwwmousephenotypeor
gdatagenesMGI1346016sect
ionassociations
Vertebralfusion
36
214
05
801
08
1cellular
process
ENSBTAG00000006021
CEP250
Mice
3
22
43
41
88
metabolism
ENSBTAG00000006027
USP34
Mice
11
63
74
88
912
fertility
ENSBTAG00000006532
CDK5RAP2
Autosomalrecessive
primary
microcephaly
httpswwwmousephenotypeor
gdatagenesMGI2384875sect
ionassociations
skeletonimmunesystem
growthsizebodyhem
atopoietic
system
5
34
92
74
784
developm
ent
ENSBTAG00000007001
SLC26A1
No
03
01
02
32
903
01
07
metabolism
ENSBTAG00000007340
LOC101906349
Nomatch
NC
ENSBTAG00000007568
MEPE
Cells
cellular
process
ENSBTAG00000007910
EXOC7
Cells
13
24
23
22
12
22
15
22
28
cellular
process
122
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000008650
CAMK1D
Cells
24
2
01
06
1
2
03
7
3
signalling
ENSBTAG00000008797
Uncharacterized
Nomatch
09
04
07
04
NC
ENSBTAG00000008871
DDX4
Cells
02
01
01
132
cellular
process
ENSBTAG00000008996
SFI1
Mice
2
1
06
3
04
3
2
5
NC
ENSBTAG00000009014
UPK1B
Mice
2
2
2
5
4
4
2
7
3
fertility
ENSBTAG00000009030
NOX5
Nomatch
04
01
8
01
immunity
ENSBTAG00000009033
TEX37
httpswwwmousephenotypeor
gdatagenesMGI1921471sect
ionassociations
Abnormalstartlereflex
03
83
NC
ENSBTAG00000009064
CSF2RB
Congenitalpulmonary
alveolarproteinosis
Mice
01
2
04
06
2
23
03
17
05
signalling
ENSBTAG00000009144
BPIFA2A
Nomatch
6
06
05
metabolism
ENSBTAG00000009337
PNMAL1
Cells
8
05
01
3
02
03
03
04
3
Immunity
ENSBTAG00000009444
SLC15A5
Cells
01
cellular
process
ENSBTAG00000009568
KANK2
Cells
1
17
30
9
3
31
9
10
3
cellular
process
ENSBTAG00000009569
DOCK6
AdamsOliver
syndrome
Cells
2
5
10
12
5
15
5
6
21
metabolism
ENSBTAG00000009836
CHGA
Mice
140
555
01
2
02
01
07
1
metabolism
ENSBTAG00000010522
Uncharacterized
Nomatch
01
02
02
03
2
metabolism
ENSBTAG00000010601
DOLK
Familialisolated
dilated
cardiomyopathy
No
4
8
6
22
9
6
5
5
12
metabolism
ENSBTAG00000010847
FOXN4
Cells
01
01
metabolism
ENSBTAG00000010954
ART3
Cells
02
1
55
01
08
2
10
06
71
metabolism
ENSBTAG00000011011
SSH2
No
3
09
5
4
2
2
6
7
10
cellular
process
ENSBTAG00000011387
NAT9
Cells
6
7
3
8
6
8
4
6
6
metabolism
ENSBTAG00000011420
CA9
Cells
04
3
02
2
03
1
01
2
56
metabolism
ENSBTAG00000011433
RGP1
httpswwwmousephenotypeor
gdatagenesMGI1915956sect
ionassociations
Completepreweaninglethality
7
19
11
18
8
12
9
8
25
cellular
process
ENSBTAG00000011622
C18orf21
Mice
9
10
6
6
2
10
5
10
34
NC
ENSBTAG00000011683
NUMB
Cells
21
25
8
35
17
47
6
8
19
immunity
ENSBTAG00000011684
RAB11FIP1
Cells
6
01
8
09
11
02
1
4
NC
ENSBTAG00000012053
ACCSL
No
03
01
cellular
process
ENSBTAG00000012314
LDLR
Homozygousfamilial
hypercholesterolemia
Cells
8
33
11
21
17
24
2
15
2
cellular
process
123
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000012326
LOC101906218
Nomatch
11
01
01
09
01
NC
ENSBTAG00000012458
PSAPL1
Cells
01
NC
ENSBTAG00000012565
PCNX
No
3
32
32
33
48
developm
ent
ENSBTAG00000012833
ATG2B
No
4
21
21
21
29
NC
ENSBTAG00000012921
KIF24
Mice
05
07
03
08
02
202
214
metabolism
ENSBTAG00000013568
UEVLD
Cells
5
41
53
42
214
metabolism
ENSBTAG00000013685
KRT31
Mice
cellular
process
ENSBTAG00000013753
DDRGK1
Mice
22
32
18
56
55
25
14
22
37
NC
ENSBTAG00000013789
OR6K6
No
NC
ENSBTAG00000013838
ALLC
Mice
02
01
78
metabolism
ENSBTAG00000013916
HERC2
Mentalretardation
autosomalrecessive
38SKINHAIREYE
PIGM
ENTATION
VARIATIONIN1
Developm
entaldelay
withautismspectrum
disorderandgait
instability
No
8
74
83
74
65
metabolism
ENSBTAG00000014001
MAP1A
Cells
136
23
22
202
43
352
cellular
process
ENSBTAG00000014058
LDB2
No
28
33
12
825
410
3cellular
process
ENSBTAG00000014284
ALPK2
No
01
10
01
304
02
cellular
process
ENSBTAG00000014750
EPB41L4A
httpswwwmousephenotypeor
gdatagenesMGI103007secti
onassociations
Decreasedcirculatingglucoselevel
Abnormalconeelectrophysiology
Immunesystem
s1
22
20
07
12
13
43
NC
ENSBTAG00000014752
AKAP11
httpswwwmousephenotypeor
gdatagenesMGI2684060sect
ionassociations
Nopheno
24
72
63
306
68
cellular
process
ENSBTAG00000014768
ZNF786
Cells
3
23
42
48
33
cellular
process
ENSBTAG00000014794
SCYL3
Cells
6
10
59
613
516
26
cellular
process
ENSBTAG00000015027
Clusterin
httpswwwmousephenotypeor
gdatagenesMGI88423sectio
nassociations
Aggressionvertebralfusion
05
03
03
02
05
3NC
124
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000015210
LOXHD1
DEAFNESS
AUTOSOMAL
RECESSIVE77
Autosomalrecessive
nonsyndromic
sensorineuDominant
lateonsetFuchs
cornealdystrophy
deafnessautosomal
recessivetype77
No
3
03
NC
ENSBTAG00000015294
COX10
Leighsyndrome
Mitochondrial
complexIVdeficiency
(MTC4D)
Cells
4
6
24
5
2
3
14
3
16
cellular
process
ENSBTAG00000015335
BAI3
No
38
03
8
01
01
2
signalling
ENSBTAG00000015490
HS1BP3
Cells
1
10
5
9
4
8
2
1
37
cellular
process
ENSBTAG00000015602
C7H5orf45
Nomatch
10
cellular
process
ENSBTAG00000015839
MAP4
No
27
85
104
35
11
100
82
46
61
cellular
process
ENSBTAG00000015912
DMKN
No
01
04
03
NC
ENSBTAG00000015980
FASN
AutosomalRecessive
MentalRetardation
Cells
19
29
7
12
6
39
6
7
22
metabolism
ENSBTAG00000015985
GPR20
httpswwwmousephenotypeor
gdatagenesMGI2441803sect
ionassociations
Anatomicaldefectsvarioustissues
03
02
08
01
09
signalling
ENSBTAG00000016495
LINS
AutosomalRecessive
MentalRetardation
Cells
6
3
1
5
3
3
1
5
12
NC
ENSBTAG00000016691
LACC1
Mice
2
9
1
9
9
8
08
7
27
NC
ENSBTAG00000017096
ERICH1
Cells
6
4
2
4
2
5
2
5
13
metabolism
ENSBTAG00000017165
MATN2
No
2
7
2
20
08
3
2
2
7
cellular
process
ENSBTAG00000017418
RRP1B
Cells
6
4
4
5
3
7
3
12
159
cellular
process
ENSBTAG00000017436
ALS2CR11
Cells
01
37
disease
ENSBTAG00000017505
PAXIP1
Cells
2
3
2
4
2
4
2
3
12
signalling
ENSBTAG00000017576
GPR144
Nomatch
02
signalling
ENSBTAG00000018528
USP45
No
1
1
04
1
05
06
02
2
04
cellular
process
ENSBTAG00000018810
THBS2
INTERVERTEBRAL
DISCDISEASE
Cells
1
2
04
09
1
09
04
1
2
fertility
ENSBTAG00000019072
PSD4
Cells
7
8
1
1
7
02
6
4
cellular
process
125
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000019265
AGK
Sengerssyndrom
e
Congenitalcataract
hypertrophic
cardiomyopathy
mitochondrial
myopathy
Cells
7
36
43
52
65
metabolism
ENSBTAG00000019752
BPIFA2B
Nomatch
01
1metabolism
ENSBTAG00000019799
FCAM
R
Cells
07
01
02
01
2
7immunity
ENSBTAG00000019961
ATP4B
Cells
12
03
1
05
cellular
process
ENSBTAG00000020414
DHX37
No
3
33
73
73
413
cellular
process
ENSBTAG00000021073
KIAA1549
PILOCYTIC
ASTROCYTOM
ASOMATIC
httpswwwmousephenotypeor
gdatagenesMGI2669829sect
ionassociations
behaviourneurological
homeostasism
etabolism
hematopoieticsystem
3
04
23
01
21
04
3NC
ENSBTAG00000021233
OR5B17
No
fertility
ENSBTAG00000021375
OR10Q1
No
fertility
ENSBTAG00000021643
EFCAB12
Cells
01
1
9cellular
process
ENSBTAG00000021645
MBD4
Cells
8
43
82
51
10
6cellular
process
ENSBTAG00000022509
DNAH
9
No
02
5
05
5metabolism
ENSBTAG00000022858
LOC783561
Nomatch
fertility
ENSBTAG00000026247
PKHD1L1
Cells
02
01
cellular
process
ENSBTAG00000026323
LYSB
Nomatch
68
metabolism
ENSBTAG00000026350
SPATC1
Cells
05
06
01
01
01
39
NC
ENSBTAG00000027390
RTFDC1
Mice
28
19
21
20
922
13
20
49
NC
ENSBTAG00000031115
Uncharacterized
Nomatch
01
cellular
process
ENSBTAG00000031718
OGFR
Mice
6
17
612
517
39
1fertility
ENSBTAG00000032264
Uncharacterized
Nomatch
52
4
22
NC
ENSBTAG00000032481
DAPL1
No
1
6
07
01
13
10
cellular
process
126
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000032527
ERCC6
CEREBROOCULOFACI
OSKELETAL
SYND
ROME1
COCKAYNE
SYND
ROMEBDe
SanctisCacchione
syndromeMACULAR
DEGENERATION
AGERELATED5UV
SENSITIVE
SYND
ROME1COFS
syndromeCockayne
syndrometype1
Cockaynesyndrome
type2Cockayne
syndrometype3UV
sensitivesyndrome
Cockaynesyndrome
typeB(CSB)De
SanctisCacchione
syndrome(DSC)UV
sensitivesyndrome
(UVS)cerebrooculo
facioskeletal
syndrometype1
(COFS1)
Mice
3
205
22
207
42
cellular
process
ENSBTAG00000032821
SCEL
No
01
02
17
04
NC
ENSBTAG00000033954
PRPS1L1
No
35
metabolism
ENSBTAG00000034991
SLX4IP
Cells
6
21
32
207
33
NC
ENSBTAG00000035226
TOR1AIP1
Cells
9
12
513
816
716
83
NC
ENSBTAG00000035309
OR4F15
No
fertility
ENSBTAG00000035708
LOC527795
Nomatch
01
01
23
metabolism
ENSBTAG00000035985
LOC101909743
OR8K1
No
fertility
ENSBTAG00000036019
RIPPLY3
httpswwwmousephenotypeor
gdatagenesMGI2181192sect
ionassociations
hemopoieticsystem
01
04
01
9
01
malformation
ENSBTAG00000037399
Uncharacterized
Nomatch
2
92
4
24
05
5signalling
ENSBTAG00000038606
DBDR
No
02
01
signalling
ENSBTAG00000038831
PHYHD1
No
2
18
21
23
30
18
16
15
1metabolism
ENSBTAG00000039110
OR8U1
No
fertility
ENSBTAG00000039117
FAM179B
Cells
14
206
31
308
33
cellular
process
ENSBTAG00000039520
SIRPB1
Cells
4
08
32
516
05
12
01
signalling
127
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000040568
Uncharacterized
Nomatch
4
307
54
33
25
cellular
process
ENSBTAG00000045558
LOC100299084
Nomatch
1
22
110
10
09
52
fertility
ENSBTAG00000045588
LOC100298356
Nomatch
metabolism
ENSBTAG00000045709
LOC514057
Nomatch
fertility
ENSBTAG00000046175
Uncharacterized
Nomatch
01
01
cellular
process
ENSBTAG00000046228
Olfactory
receptor
Nomatch
fertility
ENSBTAG00000046239
C10orf12
No
03
104
07
108
06
209
cellular
process
ENSBTAG00000046285
OR8I2
No
fertility
ENSBTAG00000046360
Uncharacterized
Nomatch
NC
ENSBTAG00000046423
LOC618007
Nomatch
NC
ENSBTAG00000046527
LOC100139830
Nomatch
fertility
ENSBTAG00000046590
FAM181A
Cells
09
44
NC
ENSBTAG00000046593
Uncharacterized
Nomatch
20
metabolism
ENSBTAG00000046727
Olfactory
receptor
Nomatch
04
fertility
ENSBTAG00000047150
RTEL
Cells
4
23
64
55
616
cellular
process
ENSBTAG00000047166
SH2D7
Mice
03
101
01
1
02
02
2immunity
ENSBTAG00000047174
Uncharacterized
Nomatch
03
7105
02
01
271
05
03
NC
ENSBTAG00000047317
CL43
Nomatch
01
120
01
immunity
ENSBTAG00000047587
Uncharacterized
Nomatch
22
09
09
35
03
4
06
2cellular
process
ENSBTAG00000047661
FABP12
httpswwwmousephenotypeor
gdatagenesMGI1922747sect
ionassociations
Growthsizebody
01
02
69
metabolism
ENSBTAG00000048021
PNMAL2
No
04
03
2
03
03
02
NC
ENSBTAG00000048153
Uncharacterized
Nomatch
2
05
01
02
01
cellular
process
128
Discussion(
Deleterious recessive variants arenaturally present in outbred animal species
Theydonotcreateproblemsaslongastheyremainatlowfrequencyinrandom
mating populations Under these conditions these variants remain in the
heterozygousstateanddonotproducephenotypiceffectsHoweverassoonas
their frequency increases andmatingoccursbetween related individuals their
deleteriouseffectsbecomeapparentandhighlyaffectthewelfareandviabilityof
animalscarryingtheminthehomozygousstate
In industrialdairy cattlebreedscrossbreeding isgenerallyavoidedandbreeds
areinpracticegeneticisolatesinwhichartificialinsemination(AI)iswidelyused
for rapidly spreading thegeneticprogress in thepopulationAsa consequence
elitesireshavemanythousandandsometimeshundredsofthousanddaughters
Under these conditions mating between relatives becomes practically
unavoidable in the longterm In fact inspiteof theclosecontrolof inbreeding
andofthematingsystemgeneticdefectsregularlyappeartothesurfaceSome
examples are the Complex Vertebral Malformation (CVM) and the Bovine
Leucocyte Adhesion Deficiency (BLAD) in Holstein (Schuumltz et al 2008) and
WeaverdiseaseinBrown(McClureetal2013)Inthesecasestheeffectsofthe
recessivedeleteriousallelesareclearandmoleculardiagnostictoolshavebeen
developedtoavoidtheirfurtherspreadingandtoinitiatetheeradicationprocess
However recessive deleterious alleles may produce their effects in a more
deceitful way for example by affecting fertility inducing early lethality or
influence disease susceptibility All these effects are difficult to trackwith the
current methods of phenotype measurements but have an impact on the
economyofthelivestocksectorandonanimalwelfare
InthispaperwesoughtfordeleteriousrecessiveallelesinItalianHolsteincattle
This breed is recently experiencing a decrease in the genetic trend of fertility
traits (Pryce et al 2014) therefore we started our search from the exome
sequences of 20 sires 19 of which having extreme EBV values for male and
femalefertility
130
The choiceof sequencingexomeswasa compromisebetween completenessof
informationaccuracyofvariantandgenotypedetectionandcostExomesclearly
give incomplete information since they miss all variation in regulatory sites
outside gene boundaries (ENCODE Consortium 2012) These may be very
relevantforthecontroloftraitvariationbuttheyarestillverypoorlyannotated
in livestock On the other hand sequencing exomes yields millions instead of
billionsbasepairsadeepersequencingoftargetregionsandaconsequentmore
reliablevariantandgenotypecalling(FigureS1andS2)Theanalysisofexome
data is also less demanding in terms of computer time (thousands instead of
millions variants are to be analysed) and easier in terms of biological
interpretation Exome sequencing revealed to be very effective in the
identificationofanumberofmutationsaffectinghumanhealth(Bamshadetal
2011 Yang et al 2013) All thesewere defects caused bymutations in single
genes producing clear phenotypic effects havingMendelian inheritance These
conditionsdonotholdinourinvestigationthereforewehadtodevisestrategies
to integrate exome data with other information collected from the Holstein
populationandotherspecies
Theoverallprocedureforidentifythecandidatedeleteriousrecessivewasbased
on the rationale that deleterious recessive alleles in coding sequences are
expectedtobei)causedbyseverealterationoftheproteinsencodedbygenes
ii)mostlyifnotexclusivelycarriedattheheterozygousstateinthepopulation
iii)unevenlydistributedinanimalsofhighandlowfertilityiv)likelyconserved
indifferentspeciesbecauseofselectionconstraintv)likelyassociatedtogenetic
defectsinanyspecieswhenseverelydamaged
A stepwise procedure was set up for reducing candidates to a number
manageable for functional analysis (Figure1) In the first stepmutationswere
discoveredinthesecondfilteredandinthirdassessed
Thisprocedureidentifiesonlysomeofthedeleteriousrecessivesexistinginthe
populationanalysedSincewesequencedonlyexomesandnotwholegenomes
ouranalysesislimitedtocodingsequencesandatthemomenttoSNPvariation
Wethereforemissregulatoryvariationsoutsidegenesand theeffectsof indels
andmorecomplexvariations
131
ThenumberofanimalssequencedislimitedDeleteriousrecessivesareexpected
toberareinthepopulationandwehavesampledonly20animalsfromasubset
ofsiresauthorizedtoAI inItalyAlthoughwehavetakenamongthemthose in
theminus tail of fertility trait thatmighthave ahigher genetic load itrsquos likely
thatwehavemissedothervariantsexistinginthepopulationOntheotherhand
variantscarriedbyAIauthorizedbullsaretheonesthatwillhavealargerimpact
onthepopulationifwidelyspreadbyelitebullsandthereforetheyarethemost
interestingfromabreedingpointofview
Most deleterious variants are expected to be very rare and therefore not
detected by the approach here adopted or detected and afterwards discarded
duringthefilteringprocesses
Sequencing is prone to errors Some real variants may have been discarded
duringQCofsequencedataduetopoorsequencequalityorpoorcoverageofthe
region
Someofthevariationsclassifiedasldquonondeleteriousrdquobythesoftwareusedmay
indeedhave an effect In fact synonymousmutation arenon`neutralwhenever
thedifferenttRNAsforasameaminoacidhavedifferentrelativeabundanceand
translationrates(Goymer2007)
1(Searching(candidate(deleterious(variants(in(exomes(
ExomesproducedafairamountofdataAlltogether5355Mbweresequenced
These represent 9998 of the exome sequences targeted by probes Among
theseonly35hadasequencingdepthacrossanimalslt60XSequencingdepth
is a key factor for variant detection and even more so when the genotype of
sequencedanimalsistobeinferredInvestigatingtheconcordancebetweenthe
genotypes detected at more than 20K SNP by both Beadchip and exome
sequencingwe observed a clear inverse relationship between sequence depth
andnumberofdiscordances(TableS3andFigureS3)Theaveragesequencing
depth expected (50X) and realised in practice (4058) in this investigation
yieldedratheraccurategenotypessincetheaverageconcordanceobservedwas
over97and increased toover99whentwooutlieranimalswereremoved
132
from the dataset The reason for the outlier behaviour of these samples waslikely a suboptimal quality of DNA submitted to sequencing Overall almost215000 variants passed quality controls and among them 192440 autosomalSNPs have been investigated in detail As found in other investigations mostvariants were rare (Peloso et al 2014) For example the MAF distributionobservedin46904SNPsclassifiedneutralhasameanof0197andamedianof0167values(FigureS6)
ThefirststepinourresearchwastoidentifySNPscausingseverealterationsinthe proteins coded by genes These were relevant changes in conformationalterationsofsplicingsitesortruncationsofproteinsToaccomplishthistaskwereliedontwosoftwareSIFTandANNOVARthatidentified4902suchvariationsin3423genes Interestingly the subsetofdeleteriousvariantshas significantlylowermean (0156)andmedian (010) compared toneutral variations (FigureS6) This subset is therefore at least enriched in variants under purifyingselectionhavingadeleteriouseffectatthepopulationlevel
Thenumberofdeleteriousallelesdetectedishighanditisunlikelytheyallhavea relevant phenotypic effect even if most variants are rare In fact variantsdeleterious for a protein are not automatically deleterious for an organism asthe protein change induced by the variant although relevant may not affectproteinfunctionvariationmayoccurinonememberofagenefamilysothatitsfunction may be compensated by other family members the protein functionmay be compensated by other proteins or the affected metabolic pathwaycompensated by alternative pathways The challenge was therefore to filter asubsetofvariantslikelyhavingaphenotypiceffecttobeproposedforvalidationatalargerscaleToaccomplishthistaskwesoughtforadditionalinformationinexomesequencesandinalargepopulationofHolsteinsiresgenotypedwiththebovineHDBeadchip
( (
133
2(Filtering(deleterious(variants(
The second step consisted in filtering the several thousand deleterious SNPsidentified in exomes to search for SNPs and genes that worth deeperinvestigationWeusedtwostrategiestoaccomplishthistask
21$ Distribution$ of$ deleterious$ variants$ in$ fertility$ EBV$ plus9variant$ and$
minus9variant$animals$
Fertility traits are controlled by a high number of genes and are highlygenetically heterogeneous Even using a selected subset of variants identifiedfrom exome sequencing classic association analyses would need sample sizesmuch larger that the one available in this investigation to have a reasonablestatistical power (Stitziel et al 2011) The first approachwe appliedwas theanalysisof SNPallelic andgenotypicdistributionbetween theminusandplus`variant tails of male and female fertility EBVs Given the very low number ofanimals sequencedwedecided to use a simplemodel to exclude from furtheranalyses all markers showing no evidence of association with the EBV tailsratherthantofindsignificantassociations
We divided samples in cases (minus`variant animals for fertility EBV) andcontrols (plus`variant animals) and tested all possible models of inheritanceimplemented in thePlinkpackage for tackling simple genetic defectsWe thenused an empirical point`wise significant threshold based on permutations tohighlight SNPs having alleles or genotypes showing an uneven distributionbetweencasesandcontrolsandconsideredtheseasworthfurtherinvestigationAtotalof112SNPsin101genespassedthisfilteringprocess
$
22$Detection$of$regions$carrying$deleterious$mutation$from$800K$SNP$data
The analysis of exomes alone was therefore insufficient to tag recessivedeleteriousgenesinItalianHolsteinandwesoughtforadditionalinformationin1009ItalianHolsteinbullsgenotypedathighdensitythatarepartofthegenomicselection program run by ANAFI The rationale of the approach is that
134
deleterious recessives are expected to be absent or rare in the homozygous
status in the population Hence they may be tagged searching for genome
regions having markers or haplotypes lacking or having significantly less
homozygotesthanexpectedunderaneutralmodel
We used two complementary approaches to identify these candidate regions
One(OHW)searchedforverystrongsignalsproducedbysinglemarkersoutof
Hardy`Weinberg equilibrium at Ple847x10`8 A minimum of three significant
markersinaslidingwindowsof10wassettoreachsignificancetocapturemore
reliablesignalsavoidexcessivesinglemarkernoiseandaccount forapossible
Wahlund effect due to presence of the genetic structure in the population
difficulttoavoidinthesetofprogenytestedbullsanalysedThesecondapproach
(LOH) captured more moderate signals but from multiple markers in larger
regions Both approaches were designed to detect signals in regions in which
deleteriousvariantsarestillinsegregationandarenotexcessivelyrare
Asexpectedthetwoapproachesidentifiedregionsthatonlypartiallyoverlapped
(TablesS5S6eS7)Inparticular5regionsoutofthe11identifiedbyOHWwere
in common to the 240 identified by LOH Two of these regions overlap with
previous investigations that used the BovineSNP50 Beadchip Sahana et al
(2013) found a region candidate to carrypotential lethal haplotypes inNordic
Holstein on BTA7 in the region 6708007`10907840 This interval overlaps
withbothBTA7regionsdetectedinthisstudyVanRadenetal(2011)detected
inUSHolsteinalargeregiononBTA15spanning76Mb`82M(onBTA30)that
includedLRP4 gene responsible for themulefootdefectOuranalyseswith the
HDBeadchipindicatesontheonesidethatthelargeregiondetectedbySahana
etal(2013)mayconsistintwonearbyregionsOntheothersidethesignalwe
detectedonBTA15doesnot includetheLRP4 locusthat iscompletely fixed in
theItalianbullspopulationanalysed
In total83mutations in66genes fell in eitherorbothOHWandLOHregions
These were joined to the SNPs output of the EBV tail analyses for further
assessment
(
135
3(Assessing(filtered(deleterious(variants(
Different characteristics of the filtered deleterious variants were assessed to
identify among them strong candidates to submit to a large`scale validation
studyTheassessmentphasepermittedtogainadditionalinformationinfavour
or against SNP and gene candidature to be deleterious We quantified this
evidenceusingasubjectivescore
31$Comparative$genomics$analysis$
In the case of comparative genomics the score quantified the level of
conservationof thealternativeaminoacidresidues inducedbySNPvariants in
ItalianHolsteincattleComparisonwasrunagainstaset100vertebratespecies
that included 62 mammals The level of conservation is a good proxy for the
expected severity of a mutation In the case of the 46 SNPs that induced the
change of an amino acid highly conserved in the species investigatedwemay
hypothesize a strong selection pressure in favour of the maintenance or the
invariant amino acid at that position Therefore the amino acid change or the
stopcodonweobserved incattlemayaffectnotonlytheproteinstructureand
function but also the phenotype of homozygous animals Interestingly in all
thesecasestheSNPallelecodingforthenonconservedvariantswastheminor
allele
Converselyinallcasesofhighvariabilityofthetargetaminoacidacrossspecies
(asmanyassevendifferentaminoacidshavesometimesbeenobserved)poorly
supports the presence of phenotypic effects due to its change Sometimes the
informationcollectedisagainstthepresenceofadeleteriousmutationasinthe
case of the presence in different species of both alleles identified by exome
sequencingThis suggests thatneither of these is greatly affecting the viability
andfitnessofanimalscarryingthem
Wealsoobservedanumberoffailures(N=45)intheliftoverofSNPcoordinates
fromcattletohumanorinproteinalignmentThishappenedforexampleinthe
caseofuncharacterized cattleproteins thathavenoorthologous in thehuman
genome and of someolfactory receptors that belong to a highly variable gene
136
family Even if the failure of alignment is an indication of poor proteinconservationacrossspeciestheconservationofthetargetaminoacidcouldnotbeassessesandinthesecaseswepreferredtoconsidertheinformationmissingandtoscoreitaccordingly
32$Haplotype$analysis$
Asecondassessmentofthefilteredcandidateswasontheirbehaviourinthesirepopulation The ideal test would be the direct genotyping of the SNPs in thepopulationand theevaluationof theirallelicandgenotypicproportions In theabsenceofthisinformationwefirstinferredtheHDhaplotypeassociatedtoSNPallelesandassessedthebehaviourofthehaplotypeinthepopulationassumingthisasasurrogateoftheSNPbehaviourSincedeleteriousallelesareexpectedtoberareandthis isalsowhatweobserved fromcomparativegenomicanalysesweinvestigatethefrequencyonlyofthehaplotypecarryingtheminorSNPalleleagainsttheexpectationto findthecorrespondinghaplotypeneverhomozygousor a number of times much lower than expected from its frequency in thepopulation The strategy yielded some interesting confirmation but was onlypartiallysuccessful
Firstofalltheidentificationofauniquehaplotypecarryingtherarevariantwasnot always possible or straightforward In some cases carriers of a same rarevariantdidnotshareexactlythesamehaplotypeWeacceptedsomevariationifthe haplotype remained clearly distinguishable from those of animals notcarryingtherarevariantbutatthesametimereducedthestringencyofthetestInquiteanumberofcasesasamehaplotypecarriedbothcandidatedeleteriousSNP variants This may occur for example in the case of recent mutations IntheselastcasesthetestcouldnotbecarriedoutFinallymostrarevariantswereassociatedtorarehaplotypesthatarenotlikelytohavehomozygousindividualsinthepopulationindependentlyoftheSNPeffect
Thisassessmentwas thereforeonly significant fora fewhaplotypes thathadalarge number of heterozygous individuals and no or only a very fewhomozygotesandforthosethathadalargenumberofhomozygotes
137
33$Functional$analysis$
The final assessment was the analysis of the function of genes carrying the
candidate deleterious SNP This has some limitation since knowledge of gene
function is only partial and mainly gained from investigations on species
different from cattle In any case we considered the association to genetic
defectsinhumanandtheinductionofseverephenotypesinmouseknockoutas
a positive indication of the likely phenotypic effect of the severe protein
alterationweobservedincattleInadditiongenefunctionandexpressionprofile
was evaluated in search for evidence of gene involvement in fertility
developmentorprocessesfundamentalforcellandorganismviability
Amonggenesidentified22areassociatedtohumansyndromes(Table4)Eleven
syndromesareneurologicaldiseasesaffectingbraindevelopmentand inducing
mental retardation (eg Donnay`Barrow Syndrome and Autosomal recessive
primary microcephaly) the others influence development (eg
Cerebrooculofacioskeletal Syndrome 1 and Adams`Oliver syndrome) or organ
function(egCongenitalpulmonaryalveolarproteinosis)
Most of the same defects translated into Holstein would induce physical and
behavioural anomalies likely causingyoungbulls tobe excluded fromprogeny
testing Itshould in factbeconsideredthattheanimalsweanalysedareonthe
onesidethosethatwillmostinfluencethegeneticmakeoffuturegenerationbut
are not a random sample of Holstein animals These bulls have been progeny
testedandauthorisedtobeusedinartificialinseminationinItalyCandidatesto
progenytestingderivefromthematingofthebest1siresandbest2damsof
the Italian Holstein population At the age of 4 months they are taken to the
Italian Holstein breeder association (ANAFI) test station where they are
subjected to performance test sanitary controls and behavioural tests
Thereafterthequalityoftheirsemenischeckedbeforeauthorizationisgranted
toenterinprogenytestInthissampleofanimalsitisthereforeexpectedtofind
never at the homozygous state mutations causing lethality physical
malformation abnormal behaviour and semen infertility Also those having a
138
majoreffectonsemenqualityandhighsusceptibilitytodiseasesareexpectedto
berarelycarriedatthehomozygousstatus
Association with human syndromes is a strong evidence of the potential
phenotypiceffectofmutationsalteringthegenestructureHoweveritshouldbe
considered that variants found in humans are different fromvariants found in
cattle and that the severity of the phenotype induced by a deleterious gene is
often variable even within species Therefore we evaluated this information
togetherwiththeotherindirectevidencesofvariantstobedeleterious
WealsoenquiredthewebsiteoftheInternationalMousePhenotypeConsortium
(httpswwwmousephenotypeorg) to search for the effects of null alleles in
mice The few genes forwhichnull`mice line is available are involved in basic
functionsfortheorganismthatspanfromvision(IAH1andEPB41L4A)toaging
(RGP1)andtissuesdevelopment(FABP12RIPPLY3RGNEF)orfertility(CFAP96)
Aswithgenesassociatedtohumansyndromesmostoftheabnormalphenotypes
observedinknockmousewouldhamperyoungbullstoenterprogenytesting
Gene Ontology (GO) was evaluated on the entire gene set No enrichment on
particularcategoriesGOtermswasobservedThemostrepresentedcategories
in Biological Processes were as expected ldquometabolic processrdquo and ldquocellular
processrdquo however lower level terms ldquoreproductionrdquo and ldquodevelopmental
processrdquo are represented with meaningful hits The analysis of Molecular
Function indicates thatmany of the hits are enzymes (ldquocatalytic activityrdquo) and
haveregulatoryroles(ldquobindingrdquoandldquoenzymeregulatoractivityrdquo)Thisresultcan
be interpreted indifferentwaysOn theone sidedeleterious allelesmay exert
their deleterious effect in a number of different cellular and developmental
processes the correct process is one while there is many options to make it
wrongAlsothenumberofgenesinvestigatedisrelativelylowandformanyof
them there is little or no information available (eg 11 genes code for a
completelyuncharacterizedprotein)Inadditionsomeofthemmayturnoutnot
tobereallydeleteriousandaddnoisetotheGOanalysis
As a final step all the previous information (GO expression patterns known
function)havebeenused to estimate a scoremeasuring the importanceof the
gene function forcattle survivalandreproductionThescoringof functionwas
139
particularlydifficultGOcategoriesweregeneralandallfunctionstaggedhavearelevant importance for animal survival andproductionTherefore in this casewedidnrsquotusethewholescorerangeplanned(from`1to1)andgavenonegativescoresApositivescoreof1wasattributedonly togenesplayingan importantroleinfertilityanddevelopment
$
34$Strong$candidate$deleterious$variants$
Thescoringprocessyielded21variantswithatotalscore=035variantswithpositive and 135 with negative values (Figure 2 and Table 5) Although theprocesswasnoterrorfreeasitreliedonincompleteinformationandhadsomedegreeofsubjectivityparticularlyinthechoiceofrelativeweightsitidentifiedaset of variants that may be considered strong candidates to have deleteriouseffects in the Italian Holstein breed Variants were evaluated individually andthose in a same genes had sometimes different scores For example the fivevariantscarriedbyALPK2hadscoresrangingfrom`7to`2
140
Figure2Graphicalrepresentationoftotalscoreforeachvariants
minus505
Variation
Score
1_645923001_645985561_1381698111_1464490851_150972144
2_7478962_270362092_270455392_375933832_555609862_904645543_75418123_110403993_110404003_188945763_1137877703_1203356984_53767714_265405974_265406204_749514994_1033779064_1057246734_1130233264_1178417155_445557055_445557425_757447955_940308955_940309556_207762566_382803486_863518016_927092906_1075769256_1078851146_1090916426_1166055226_1186534167_13339757_56553577_97149147_167941797_168358027_168621947_197404097_197409057_263293538_602606098_603324848_658345648_704101558_760297748_770028908_872798628_1117618168_1128416959_83888819_83889249_237928389_510133929_1047396559_10515702910_171251510_1584300310_2802235210_2802311010_8290312110_8517371111_4630576711_4675390311_4740957711_5979141011_7832201011_8794387511_9548610911_9945389411_9946100112_1190618112_1247843312_1248084512_1413689712_5304908112_9076120913_381566413_1178793413_1178793813_5247168413_5393493013_5393494713_5393985313_5454042513_5504774013_5998820513_6304575013_6316961013_6539670814_197749414_364069514_3556894514_4688575014_5718402214_6863536315_18267915_18331715_18340415_248692
15_3018920915_3504878815_4630194115_7503314015_7503764515_8024448915_8025571015_8028875915_8038974515_8048559315_8055593115_8094968915_8275455615_8292533216_456201716_3831963616_6257435517_5309539517_5311265517_6609153117_7237753218_2562355018_3495651318_4637204118_5215480618_5408127718_5413095518_5781976819_2144410319_3107215219_3111397219_3279084319_4225178519_5139519419_5619117519_5726625120_763980120_2341043120_5888423620_6417173921_623911921_2034587821_2034628921_2683773321_3100359921_5527419521_5582646321_5817338421_5912071921_6048248421_6048251021_6284314222_5241595922_5242173422_5242204622_5242610422_5242666822_5695165722_5697484722_5779767423_4588505224_2132460424_2145336424_4676490924_5032719524_5815817824_5815862024_5815878424_5820400324_5820447925_3743318026_1811648726_2571599626_2576457027_503107827_3283255728_28422828_169040328_3571880728_4408176529_198588129_753052329_753066329_26397931
141
Table5SingleanalysesscoreandtotalforeachcandidatevariantsChrchrom
osom
ePosphysicalposition
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
164592300
ENSBTAG00000009014
UPK1B
02
I1
01
2
164598556
ENSBTAG00000009014
UPK1B
0I1
I1
00
I2
1138169811
ENSBTAG00000002966
DNAJC13
03
10
15
1146449085
ENSBTAG00000017418
RRP1B
I3
I1
I1
00
I5
1150972144
ENSBTAG00000036019
RIPPLY3
I3
I1
I1
I1
1I5
2747896
ENSBTAG00000013916
HERC2
30
10
04
227036209
ENSBTAG00000004555
LRP2
I3
I1
10
0I3
227045539
ENSBTAG00000004555
LRP2
I3
31
00
1
237593383
ENSBTAG00000032481
DAPL1
00
I1
00
I1
255560986
ENSBTAG00000032264
Uncharacterized
3I3
I1
00
I1
290464554
ENSBTAG00000017436
ALS2CR11
0I1
I1
00
I2
37541812
ENSBTAG00000046175
Uncharacterized
00
00
00
311040399
ENSBTAG00000013789
OR6K6
I3
I3
I1
00
I7
311040400
ENSBTAG00000013789
OR6K6
I3
I3
I1
00
I7
318894576
ENSBTAG00000004772
THEM
40
I3
I1
00
I4
3113787770
ENSBTAG00000000149
USP40
11
I1
01
2
3120335698
ENSBTAG00000002809
CAPN10
I3
21
00
0
45376771
ENSBTAG00000001500
FIGNL1
01
I1
00
0
426540597
ENSBTAG00000033954
PRPS1L1
1I3
I1
00
I3
426540620
ENSBTAG00000033954
PRPS1L1
1I1
I1
00
I1
474951499
ENSBTAG00000003508
CFAP69
I3
I3
I1
10
I6
4103377906
ENSBTAG00000021073
KIAA1549
0I1
11
01
4105724673
ENSBTAG00000019265
AGK
01
10
02
142
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
4113023326
ENSBTAG00000014768
ZNF786
I3
I3
I1
00
I7
4117841715
ENSBTAG00000017505
PAXIP1
0I1
I1
00
I2
544555705
ENSBTAG00000026323
LYSB
I3
3I1
00
I1
544555742
ENSBTAG00000026323
LYSB
I3
0I1
00
I4
575744795
ENSBTAG00000009064
CSF2RB
I3
I3
10
0I5
594030895
ENSBTAG00000009444
SLC15A5
I3
I1
I1
00
I5
594030955
ENSBTAG00000009444
SLC15A5
I3
I3
I1
00
I7
620776256
ENSBTAG00000008797
Uncharacterized
I3
I3
00
0I6
638280348
ENSBTAG00000007568
MEPE
0I3
I1
00
I4
686351801
ENSBTAG00000003523
UGT2B15
1I1
I1
00
I1
692709290
ENSBTAG00000010954
ART3
I3
2I1
00
I2
6107576925
ENSBTAG00000038606
DBDR
0I3
I1
00
I4
6107885114
ENSBTAG00000001505
GRK4
I3
1I1
00
I3
6109091642
ENSBTAG00000007001
SLC26A1
1I3
I1
00
I3
6116605522
ENSBTAG00000014058
LDB2
01
I1
00
0
6118653416
ENSBTAG00000012458
PSAPL1
I3
I1
I1
00
I5
71333975
ENSBTAG00000015602
C7H5orf45
I3
I3
I1
00
I7
75655357
ENSBTAG00000045588
LOC100298356
1I3
I1
00
I3
79714914
ENSBTAG00000046423
LOC618007
00
I1
00
I1
716794179
ENSBTAG00000012314
LDLR
02
10
03
716835802
ENSBTAG00000009568
KANK2
01
I1
00
0
716862194
ENSBTAG00000009569
DOCK6
01
10
02
719740409
ENSBTAG00000000414
FUT5
I3
I3
I1
00
I7
719740905
ENSBTAG00000000414
FUT5
I3
I3
I1
00
I7
143
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
726329353
ENSBTAG00000004860
SLC27A6
0I1
I1
00
I2
860260609
ENSBTAG00000011420
CA9
I3
I1
I1
00
I5
860332484
ENSBTAG00000011433
RGP1
I3
0I1
10
I3
865834564
ENSBTAG00000035708
LOC527795
00
I1
00
I1
870410155
ENSBTAG00000005466
C8H8orf58
I3
I3
I1
00
I7
876029774
ENSBTAG00000015027
Clusterin
00
I1
10
0
877002890
ENSBTAG00000012921
KIF24
12
I1
00
2
887279862
ENSBTAG00000002220
SPTLC1
00
10
01
8111761816
ENSBTAG00000006532
CDK5RAP2
I3
I3
11
1I3
8112841695
ENSBTAG00000013838
ALLC
01
I1
00
0
98388881
ENSBTAG00000015335
BAI3
00
I1
00
I1
98388924
ENSBTAG00000015335
BAI3
00
I1
00
I1
923792838
ENSBTAG00000046593
Uncharacterized
03
00
03
951013392
ENSBTAG00000018528
USP45
02
I1
00
1
9104739655
ENSBTAG00000018810
THBS2
03
10
15
9105157029
ENSBTAG00000000918
ERMARD
I3
21
00
0
10
1712515
ENSBTAG00000014750
EPB41L4A
0I1
I1
I1
0I3
10
15843003
ENSBTAG00000009030
NOX5
I3
I3
I1
00
I7
10
28022352
ENSBTAG00000035309
OR4F15
I3
1I1
00
I3
10
28023110
ENSBTAG00000035309
OR4F15
I3
I1
I1
00
I5
10
82903121
ENSBTAG00000012565
PCNX
I3
2I1
01
I1
10
85173711
ENSBTAG00000011683
NUM
BI3
0I1
00
I4
11
46305767
ENSBTAG00000002288
NT5DC4
0I3
I1
00
I4
11
46753903
ENSBTAG00000019072
PSD4
0I1
I1
00
I2
144
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
11
47409577
ENSBTAG00000009033
TEX37
I3
I2
I1
10
I5
11
59791410
ENSBTAG00000006027
USP34
02
I1
01
211
78322010
ENSBTAG00000015490
HS1BP3
00
I1
00
I1
11
87943875
ENSBTAG00000001140
IAH1
I3
3I1
10
011
95486109
ENSBTAG00000017576
GPR144
I3
I3
I1
00
I7
11
99453894
ENSBTAG00000038831
PHYHD1
11
I1
00
111
99461001
ENSBTAG00000010601
DOLK
12
10
04
12
11906181
ENSBTAG00000001133
VWA8
I3
1I1
00
I3
12
12478433
ENSBTAG00000014752
AKAP11
1I3
I1
I1
0I4
12
12480845
ENSBTAG00000014752
AKAP11
1I1
I1
I1
0I2
12
14136897
ENSBTAG00000016691
LACC1
I3
I3
I1
00
I7
12
53049081
ENSBTAG00000032821
SCEL
I3
I3
I1
00
I7
12
90761209
ENSBTAG00000019961
ATP4B
I3
I3
I1
00
I7
13
3815664
ENSBTAG00000034991
SLX4IP
00
I1
00
I1
13
11787934
ENSBTAG00000008650
CAMK
1D
I3
0I1
00
I4
13
11787938
ENSBTAG00000008650
CAMK
1D
I3
0I1
00
I4
13
52471684
ENSBTAG00000013753
DDRGK1
02
I1
00
113
53934930
ENSBTAG00000039520
SIRPB1
0I3
I1
00
I4
13
53934947
ENSBTAG00000039520
SIRPB1
00
I1
00
I1
13
53939853
ENSBTAG00000039520
SIRPB1
0I3
I1
00
I4
13
54540425
ENSBTAG00000047150
RTEL
I3
0I1
00
I4
13
55047740
ENSBTAG00000031718
OGFR
01
I1
00
013
59988205
ENSBTAG00000027390
RTFDC1
01
I1
00
013
63045750
ENSBTAG00000009144
BPIFA2A
0I3
I1
00
I4
145
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
13
63169610
ENSBTAG00000019752
BPIFA2B
0I1
I1
00
I2
13
65396708
ENSBTAG00000006021
CEP250
0I3
I1
00
I4
14
1977494
ENSBTAG00000026350
SPATC1
I3
I3
I1
00
I7
14
3640695
ENSBTAG00000015985
GPR20
00
I1
00
I1
14
35568945
ENSBTAG00000037399
Uncharacterized
30
I1
00
2
14
46885750
ENSBTAG00000047661
FABP12
0I1
I1
00
I2
14
57184022
ENSBTAG00000026247
PKHD1L1
I3
3I1
00
I1
14
68635363
ENSBTAG00000017165
MATN2
01
I1
00
0
15
182679
ENSBTAG00000046228
Olfactoryreceptor
00
I1
0o
I1
15
183317
ENSBTAG00000046228
Olfactoryreceptor
00
I1
00
I1
15
183404
ENSBTAG00000046228
Olfactoryreceptor
00
I1
00
I1
15
248692
ENSBTAG00000045558
LOC100299084
00
I1
0o
I1
15
30189209
ENSBTAG00000005357
VPS11
02
I1
00
1
15
35048788
ENSBTAG00000005340
SERGEF
0I3
I1
00
I4
15
46301941
ENSBTAG00000002289
NLRP14
I3
I3
I1
00
I7
15
75033140
ENSBTAG00000012053
ACCSL
1I3
I1
00
I3
15
75037645
ENSBTAG00000012053
ACCSL
I3
I3
I1
00
I7
15
80244489
ENSBTAG00000046527
LOC100139830
I3
0I1
00
I4
15
80255710
ENSBTAG00000046285
OR8I2
10
I1
00
0
15
80288759
ENSBTAG00000001291
OR8H3
1I1
I1
00
I1
15
80389745
ENSBTAG00000045709
LOC514057
13
I1
00
3
15
80485593
ENSBTAG00000035985
LOC101909743OR8K1
1I1
I1
00
I1
15
80555931
ENSBTAG00000022858
LOC783561
I3
0I1
00
I4
15
80949689
ENSBTAG00000039110
OR8U1
1I3
I1
00
I3
146
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
15
82754556
ENSBTAG00000021375
OR10Q1
01
I1
00
015
82925332
ENSBTAG00000021233
OR5B17
I3
0I1
00
I4
16
4562017
ENSBTAG00000019799
FCAM
RI3
I3
I1
00
I7
16
38319636
ENSBTAG00000014794
SCYL3
I3
0I1
00
I4
16
62574355
ENSBTAG00000035226
TOR1AIP1
00
I1
00
I1
17
53095395
ENSBTAG00000020414
DHX37
1I1
I1
00
I1
17
53112655
ENSBTAG00000020414
DHX37
0I3
I1
00
I4
17
66091531
ENSBTAG00000010847
FOXN4
0I1
I1
00
I2
17
72377532
ENSBTAG00000008996
SFI1
0I3
I1
00
I4
18
25623550
ENSBTAG00000005170
GPR114
0I3
I1
01
I3
18
34956513
ENSBTAG00000001286
ELMO
33
3I1
11
718
46372041
ENSBTAG00000015912
DMKN
0
1I1
00
018
52154806
ENSBTAG00000001262
IRGQ
I3
1I1
00
I3
18
54081277
ENSBTAG00000009337
PNMA
L1
0I3
I1
00
I4
18
54130955
ENSBTAG00000048021
PNMA
L2
I3
1I1
00
I3
18
57819768
ENSBTAG00000003231
LIM2
1
I3
10
0I1
19
21444103
ENSBTAG00000011011
SSH2
I3
I3
I1
00
I7
19
31072152
ENSBTAG00000022509
DNAH
9I3
0I1
00
I4
19
31113972
ENSBTAG00000022509
DNAH
90
0I1
00
I1
19
32790843
ENSBTAG00000015294
COX10
12
10
04
19
42251785
ENSBTAG00000013685
KRT31
I3
0I1
00
I4
19
51395194
ENSBTAG00000015980
FASN
1I1
10
01
19
56191175
ENSBTAG00000007910
EXOC7
3I3
I1
00
I1
19
57266251
ENSBTAG00000011387
NAT9
12
I1
00
2
147
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
20
7639801
ENSBTAG00000005633
RGNEF
1I1
I1
10
0
20
23410431
ENSBTAG00000008871
DDX4
1I3
I1
00
I3
20
58884236
ENSBTAG00000005514
TRIO
I3
0I1
00
I4
20
64171739
ENSBTAG00000005021
SEMA5A
01
10
02
21
6239119
ENSBTAG00000016495
LINS
02
10
03
21
20345878
ENSBTAG00000012326
LOC101906218
00
I1
00
I1
21
20346289
ENSBTAG00000012326
LOC101906218
I3
0I1
00
I4
21
26837733
ENSBTAG00000047587
Uncharacterized
10
00
01
21
31003599
ENSBTAG00000047166
SH2D7
0I3
I1
00
I4
21
55274195
ENSBTAG00000039117
FAM179B
0I3
I1
00
I4
21
55826463
ENSBTAG00000014001
MAP1A
0I3
I1
00
I4
21
58173384
ENSBTAG00000009836
CHGA
I3
I1
I1
00
I5
21
59120719
ENSBTAG00000046590
FAM181A
01
I1
00
0
21
60482484
ENSBTAG00000017096
ERICH1
0I3
I1
00
I4
21
60482510
ENSBTAG00000017096
ERICH1
01
I1
00
0
21
62843142
ENSBTAG00000012833
ATG2B
0I3
I1
00
I4
22
52415959
ENSBTAG00000015839
MAP4
00
I1
00
I1
22
52421734
ENSBTAG00000015839
MAP4
1I1
I1
00
I1
22
52422046
ENSBTAG00000015839
MAP4
I3
I3
I1
00
I7
22
52426104
ENSBTAG00000047174
Uncharacterized
I3
00
00
I3
22
52426668
ENSBTAG00000047174
Uncharacterized
10
00
01
22
56951657
ENSBTAG00000021645
MBD4
1I1
I1
00
I1
22
56974847
ENSBTAG00000021643
EFCAB12
0I1
I1
00
I2
22
57797674
ENSBTAG00000031115
Uncharacterized
01
00
01
148
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
23
45885052
ENSBTAG00000005248
OFCC1
00
I1
00
I1
24
21324604
ENSBTAG00000000998
SLC39A6
02
I1
00
1
24
21453364
ENSBTAG00000011622
C18orf21
I3
I3
I1
00
I7
24
46764909
ENSBTAG00000015210
LOXHD1
0I3
10
0I2
24
50327195
ENSBTAG00000002660
CCDC11
I3
11
01
0
24
58158178
ENSBTAG00000014284
ALPK2
I3
I1
I1
00
I5
24
58158620
ENSBTAG00000014284
ALPK2
0I1
I1
00
I2
24
58158784
ENSBTAG00000014284
ALPK2
I3
I3
I1
00
I7
24
58204003
ENSBTAG00000014284
ALPK2
I3
I3
I1
00
I7
24
58204479
ENSBTAG00000014284
ALPK2
I3
2I1
00
I2
25
37433180
ENSBTAG00000040568
Uncharacterized
00
I1
00
I1
26
18116487
ENSBTAG00000046239
C10orf12
0I1
I1
00
I2
26
25715996
ENSBTAG00000010522
Uncharacterized
10
00
01
26
25764570
ENSBTAG00000046727
Olfactoryreceptor
00
I1
00
I1
27
5031078
ENSBTAG00000007340
LOC101906349
0I1
I1
00
I2
27
32832557
ENSBTAG00000011684
RAB11FIP1
10
I1
00
0
28
284228
ENSBTAG00000000079
CCSAP
0I1
I1
00
I2
28
1690403
ENSBTAG00000048153
Uncharacterized
10
00
01
28
35718807
ENSBTAG00000047317
CL43
1I1
I1
00
I1
28
44081765
ENSBTAG00000032527
ERCC6
I3
01
00
I2
29
1985881
ENSBTAG00000004081
FAT3
30
I1
00
2
29
7530523
ENSBTAG00000046360
Uncharacterized
01
00
01
29
7530663
ENSBTAG00000046360
Uncharacterized
1I1
00
00
29
26397931
ENSBTAG00000013568
UEVLD
I3
I3
I1
00
I7
149
Interestingly13ofthe22genesassociatedtohumansyndromescarryvariants
appearing in the short list of the 35 positives and only 6 in that of 135 with
negative values This in spite of the low weight (from 1 to 1) given to this
parameterEightofthe35positivevariantsareinproteinsstilluncharacterized
Sixvariantshadascoreof4orhigherThehighestscore(totalscore=7)wasfor
ELMO3 (Engulfment and cell motility gene 3) This gene is involved in cell
motility and migration ELMO3 is also a homolog of C( elegans Ced12 that is
required for many developmental processes included gonadal morphogenesis
Mutant for this gene in C( elegans suffer from abnormal development usually
with lethal consequences or sublethal developmental defects including small
embryosdelayeddevelopmentandreducedfertility(Gumiennyetal2001)In
human it is expressed in testis prostate and ovary
(httpwwwgenecardsorgcgibincarddispplgene=ELMO3)
ThegeneDNAJC13(DnaJ(Hsp40)homologsubfamilyCmember13)hadatotal
score of 5 This gene is involved in the onset of Parkinson disease DNAJC13
regulates the dynamics of clathrin coats on early endosomes Cellular analysis
shows that the mutation Arg855Ser identified by VilarintildeoGuumlell et al (2014)
confers a gainoffunction that impairs endosomal transportApossible roleof
DNAJC13geneinTourettesyndromechronicticdisorderisunderinvestigation
(Sundarametal2011)
A second genehad a scoreof 5THBS2 (Thrombospondin2) It belongs to the
thrombospondin family and is a powerful inhibitor of tumour growth and
angiogenesis It is highly expressed in testis in fetal Leydig cell Hirose et al
(2008) found a significant association between an intronic SNP in the THBS2
gene and lumbar disc herniation Thrombospondin2 is also a matricellular
proteinfoundinhumanserumDeletionofTSP2causesagedependentdilated
cardiomyopathy (Hanatani et al 2014) Mutant mice for this gene display a
variety of mutant phenotype Kyriakides et al 1998 suggest that THBS2
modulates the cell surface properties of mesenchymal cells affecting cell
functionssuchasadhesionandmigration
150
Three genes had a score of 4 all three are associated to human syndromes
Mutation in HERC2 (HECT and RLD domain containing E3 ubiquitin protein
ligase2)isinvolvedinautosomalrecessivementalretardation38(Puffenberger
etal2012)Jietal(2000)identifiedinmutantmouseneuromuscularsecretory
vesicledefectsandspermacrosomedefectsMoreovergeneticvariationsinthis
gene are associated with skinhaireye pigmentation variability Mutations in
DOLK gene (dolichol kinase) are associated with dolichol kinase deficiency
(Lefeberetal2011)COX10(cytochromecoxidaseassemblyhomolog10)gene
is associated to mitochondrial complex IV deficiency causing Leigh syndrome
(Antonickaetal2003)
Conclusions)
Exomesequencingthereforeprovidedvaluableinformationoncodingsequence
variants and on their potential effect on protein function As a group variants
classifiedasdeleteriouswererarerthanvariantsclassifiedasneutralSincethis
isanexpectedeffectofpurifyingselectionthissetofvariantwaslikelyenriched
forvariantshavingaphenotypiceffect
Theinclusionintheanalysesofplusandminusvariantanimalsforfertilitytraits
likelypermittedtosequencethemostimportantvariantsinfluencingthesetraits
butthelownumberofanimalssequencedcombinedwiththecomplexityofthe
traitpermittedtofindonlyasuggestiveassociationbetweenSNPsandtraitsItrsquos
likely that amuch larger number of individuals is to be sequenced if genome
wideassociationisthestrategyoneswanttopursue
We explored an alternative strategy to confirm suggestivedeleterious variants
based on indirect evidence of the variant effect by assessing i) evolutionary
constraintsassociatedtotheaminoacidresiduecodedbythevariantii)variant
behaviour in a large population In the absence of direct genotyping of the
candidatevariantsweusedhaplotypeinformationassurrogateiii)functionand
effectsinotherspecies
151
We attempted to quantify these criteria through a score subjective and only
indicativebutpermitted tohighlighted35genes so farneglected in cattlebut
deservingfurtherattentionandpotentiallyhavingdeleteriouseffectsinHolstein
andothercattlebreedsThesemaybepartoftherecessivedeleterioussetthat
constitutethegeneticloadofItalianHolstein
References)AntonickaHLearySCGuercinGHAgarJNHorvathRKennawayNG
HardingCOJakschMShoubridgeEA2003MutationsinCOX10
resultinadefectinmitochondrialhemeAbiosynthesisandaccountfor
multipleearlyonsetclinicalphenotypesassociatedwithisolatedCOX
deficiencyHumMolGenet122693ndash2702doi101093hmgddg284
AshwellMSHeyenDWSonstegardTSVanTassellCPDaYVanRaden
PMRonMWellerJILewinHA2004DetectionofQuantitativeTrait
LociAffectingMilkProductionHealthandReproductiveTraitsin
HolsteinCattleJournalofDairyScience87468ndash475
doi103168jdsS00220302(04)731860
AstonKICarrellDT2009GenomeWideStudyofSingleNucleotide
PolymorphismsAssociatedWithAzoospermiaandSevere
OligozoospermiaJournalofAndrology30711ndash725
doi102164jandrol109007971
AulchenkoYSRipkeSIsaacsADuijnCMvan2007GenABELanRlibrary
forgenomewideassociationanalysisBioinformatics231294ndash1296
doi101093bioinformaticsbtm108
BamshadMJNgSBBighamAWTaborHKEmondMJNickersonDA
ShendureJ2011ExomesequencingasatoolforMendeliandisease
genediscoveryNatRevGenet12745ndash755doi101038nrg3031
BieseckerLGShiannaKVMullikinJC2011Exomesequencingtheexpert
viewGenomeBiology12128doi101186gb2011129128
152
BiffaniSMarusiMBiscariniFCanavesiF2005Developingagenetic
evaluationforfertilityusingangularityandmilkyieldascorrelatedtraits
InterbullBulletin063
BiffaniSSamoreacuteABCanavesiF2002Inbreedingdepressionforproduction
reproductionandfunctionaltraitsinItalianHolsteincattleInstitut
NationaldelaRechercheAgronomique(INRA)pp0ndash4
BolgerAMLohseMUsadelB2014TrimmomaticAflexibletrimmerfor
IlluminaSequenceDataBioinformaticsbtu170
doi101093bioinformaticsbtu170
BrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingand
MissingDataInferenceforWholeGenomeAssociationStudiesByUseof
LocalizedHaplotypeClusteringAmJHumGenet811084ndash1097
CharlesworthDWillisJH2009ThegeneticsofinbreedingdepressionNat
RevGenet10783ndash796doi101038nrg2664
ChunSFayJC2009Identificationofdeleteriousmutationswithinthree
humangenomesGenomeRes191553ndash1561
doi101101gr092619109
ConsortiumTEP2012AnintegratedencyclopediaofDNAelementsinthe
humangenomeNature48957ndash74doi101038nature11247
DePristoMABanksEPoplinRGarimellaKVMaguireJRHartlC
PhilippakisAAdelAngelGRivasMAHannaMMcKennaA
FennellTJKernytskyAMSivachenkoAYCibulskisKGabrielSB
AltshulerDDalyMJ2011Aframeworkforvariationdiscoveryand
genotypingusingnextgenerationDNAsequencingdataNatGenet43
491ndash498doi101038ng806
GoymerP2007SynonymousmutationsbreaktheirsilenceNatRevGenet8
92ndash92doi101038nrg2056
GumiennyTLBrugneraEToselloTrampontACKinchenJMHaneyLB
NishiwakiKWalkSFNemergutMEMacaraIGFrancisRSchedl
TQinYVanAelstLHengartnerMORavichandranKS2001CED
12ELMOaNovelMemberoftheCrkIIDock180RacPathwayIs
RequiredforPhagocytosisandCellMigrationCell10727ndash41
doi101016S00928674(01)005207
153
HanataniSIzumiyaYTakashioSKimuraYArakiSRokutandaTTsujita
KYamamotoETanakaTYamamuroMKojimaSTayamaS
KaikitaKHokimotoSOgawaH2014Circulatingthrombospondin2
reflectsdiseaseseverityandpredictsoutcomeofheartfailurewith
reducedejectionfractionCircJ78903ndash910
HiroseYChibaKKarasugiTNakajimaMKawaguchiYMikamiY
FuruichiTMioFMiyakeAMiyamotoTOzakiKTakahashiA
MizutaHKuboTKimuraTTanakaTToyamaYIkegawaS2008
AfunctionalpolymorphisminTHBS2thataffectsalternativesplicingand
MMPbindingisassociatedwithlumbardischerniationAmJHum
Genet821122ndash1129doi101016jajhg200803013
httpsgenomeucsceduindexhtml[WWWDocument]nd
JamrozikJFatehiJKistemakerGJSchaefferLR2005EstimatesofGenetic
ParametersforCanadianHolsteinFemaleReproductionTraitsJournalof
DairyScience882199ndash2208doi103168jdsS00220302(05)728952
JanisEWiggintonDJC2005AnoteonexacttestsofHardyWeinberg
equilibriumAmericanjournalofhumangenetics76887ndash93
doi101086429864
JiYRebertNAJoslinJMHigginsMJSchultzRANichollsRD2000
StructureoftheHighlyConservedHERC2GeneandofMultiplePartially
DuplicatedParalogsinHumanGenomeRes10319ndash329
KentWJSugnetCWFureyTSRoskinKMPringleTHZahlerAM
HausslerandD2002TheHumanGenomeBrowseratUCSCGenome
Res12996ndash1006doi101101gr229102
KumarPHenikoffSNgPC2009Predictingtheeffectsofcodingnon
synonymousvariantsonproteinfunctionusingtheSIFTalgorithmNat
Protoc41073ndash1081doi101038nprot200986
KyriakidesTRZhuYHSmithLTBainSDYangZLinMTDanielson
KGIozzoRVLaMarcaMMcKinneyCEGinnsEIBornsteinP
1998Micethatlackthrombospondin2displayconnectivetissue
abnormalitiesthatareassociatedwithdisorderedcollagenfibrillogenesis
anincreasedvasculardensityandableedingdiathesisJCellBiol140
419ndash430
154
LaneEACroweMABeltmanMEMoreSJ2013Theinfluenceofcowand
managementfactorsonreproductiveperformanceofIrishseasonal
calvingdairycowsAnimReprodSci14134ndash41
doi101016janireprosci201306019
LefeberDJdeBrouwerAPMMoravaERiemersmaMSchuurs
HoeijmakersJHMAbsmannerBVerrijpKvandenAkkerWMR
HuijbenKSteenbergenGvanReeuwijkJJozwiakAZuckerN
LorberALammensMKnopfCvanBokhovenHGruumlnewaldSLehle
LKapustaLMandelHWeversRA2011AutosomalRecessive
DilatedCardiomyopathyduetoDOLKMutationsResultsfromAbnormal
DystroglycanOMannosylationPLoSGenet7e1002427
doi101371journalpgen1002427
LiHDurbinR2009FastandaccurateshortreadalignmentwithBurrows
WheelertransformBioinformatics251754ndash1760
doi101093bioinformaticsbtp324
LiHHandsakerBWysokerAFennellTRuanJHomerNMarthG
AbecasisGDurbinR1000GenomeProjectDataProcessingSubgroup
2009TheSequenceAlignmentMapformatandSAMtoolsBioinformatics
252078ndash2079doi101093bioinformaticsbtp352
LiMXKwanJSHBaoSYYangWHoSLSongYQShamPC2013
PredictingMendelianDiseaseCausingNonSynonymousSingle
NucleotideVariantsinExomeSequencingStudiesPLoSGenet9
e1003143doi101371journalpgen1003143
MacArthurDGBalasubramanianSFrankishAHuangNMorrisJWalter
KJostinsLHabeggerLPickrellJKMontgomerySBAlbersCA
ZhangZConradDFLunterGZhengHAyubQDePristoMA
BanksEHuMHandsakerRERosenfeldJFromerMJinMMuXJ
KhuranaEYeKKayMSaundersGISunerMMHuntTBarnes
IHAAmidCCarvalhoSilvaDRBignellAHSnowCYngvadottirB
BumpsteadSCooperDNXueYRomeroIGWangJLiYGibbs
RAMcCarrollSADermitzakisETPritchardJKBarrettJCHarrow
JHurlesMEGersteinMBTylerSmithC2012Asystematicsurvey
155
oflossoffunctionvariantsinhumanproteincodinggenesScience335
823ndash828doi101126science1215040
MajewskiJSchwartzentruberJLalondeEMontpetitAJabadoN2011
WhatcanexomesequencingdoforyouJMedGenet48580ndash589
doi101136jmedgenet2011100223
McClureMCBickhartDNullDVanRadenPXuLWiggansGLiuG
SchroederSGlasscockJArmstrongJColeJBVanTassellCP
SonstegardTS2014BovineExomeSequenceAnalysisandTargeted
SNPGenotypingofRecessiveFertilityDefectsBH1HH2andHH3Reveal
aPutativeCausativeMutationinSMC2forHH3PLoSONE9e92769
doi101371journalpone0092769
McClureMKimEBickhartDNullDCooperTColeJWiggansGAjmone
MarsanPColliLSantusELiuGESchroederSMatukumalliLVan
TassellCSonstegardT2013FineMappingforWeaverSyndromein
BrownSwissCattleandtheIdentificationof41ConcordantMutations
acrossNRCAMPNPLA8andCTTNBP2PLoSONE8e59251
doi101371journalpone0059251
McKennaAHannaMBanksESivachenkoACibulskisKKernytskyA
GarimellaKAltshulerDGabrielSDalyMDePristoMA2010The
GenomeAnalysisToolkitAMapReduceframeworkforanalyzingnext
generationDNAsequencingdataGenomeRes201297ndash1303
doi101101gr107524110
MiHDongQMuruganujanAGaudetPLewisSThomasPD2010
PANTHERversion7improvedphylogenetictreesorthologsand
collaborationwiththeGeneOntologyConsortiumNuclAcidsRes38
D204ndashD210doi101093nargkp1019
MrodeRKearneyJFBiffaniSCoffeyMCanavesiF2009Short
communicationGeneticrelationshipsbetweentheHolsteincow
populationsofthreeEuropeandairycountriesJournalofDairyScience
925760ndash5764doi103168jds20081931
PelosoGMAuerPLBisJCVoormanAMorrisonACStitzielNOBrody
JAKhetarpalSACrosbyJRFornageMIsaacsAJakobsdottirJ
FeitosaMFDaviesGHuffmanJEManichaikulADavisBLohman
156
KJoonAYSmithAVGroveMLZanoniPRedonVDemissieS
LawsonKPetersUCarlsonCJacksonRDRyckmanKKMackey
RHRobinsonJGSiscovickDSSchreinerPJMychaleckyjJC
PankowJSHofmanAUitterlindenAGHarrisTBTaylorKD
StaffordJMReynoldsLMMarioniREDehghanAFrancoOHPatel
APLuYHindyGGottesmanOBottingerEPMelanderOOrho
MelanderMLoosRJFDugaSMerliniPAFarrallMGoelA
AsseltaRGirelliDMartinelliNShahSHKrausWELiMRader
DJReillyMPMcPhersonRWatkinsHArdissinoDZhangQWang
JTsaiMYTaylorHACorreaAGriswoldMELangeLAStarrJM
RudanIEiriksdottirGLaunerLJOrdovasJMLevyDChenYDI
ReinerAPHaywardCPolasekODearyIJBoreckiIBLiuY
GudnasonVWilsonJGvanDuijnCMKooperbergCRichSSPsaty
BMRotterJIOrsquoDonnellCJRiceKBoerwinkleEKathiresanS
CupplesLA2014AssociationofLowFrequencyandRareCoding
SequenceVariantswithBloodLipidsandCoronaryHeartDiseasein
56000WhitesandBlacksTheAmericanJournalofHumanGenetics94
223ndash232doi101016jajhg201401009
PryceJEVeerkampRFThompsonRHillWGSimmG1997Genetic
aspectsofcommonhealthdisordersandmeasuresoffertilityinHolstein
FriesiandairycattleAnimalScience65353ndash360
doi101017S1357729800008559
PryceJEWoolastonRBerryDPWallEWintersMButlerRShafferM
2014WorldTrendsinDairyCowFertilityin10ThWorldCongressof
GeneticsAppliedtoLivestockProduction
PuffenbergerEGJinksRNWangHXinBFiorentiniCShermanEA
DegrazioDShawCSougnezCCibulskisKGabrielSKelleyRI
MortonDHStraussKA2012Ahomozygousmissensemutationin
HERC2associatedwithglobaldevelopmentaldelayandautismspectrum
disorderHumMutat331639ndash1646doi101002humu22237
PurcellSNealeBToddBrownKThomasLFerreiraMARBenderD
MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKA
157
ToolSetforWholeGenomeAssociationandPopulationBasedLinkage
AnalysesAmJHumGenet81559ndash575
SahanaGNielsenUSAamandGPLundMSGuldbrandtsenB2013Novel
HarmfulRecessiveHaplotypesIdentifiedforFertilityTraitsinNordic
HolsteinCattlePLoSONE8e82909doi101371journalpone0082909
SchuumltzEScharfensteinMBrenigB2008Implicationofcomplexvertebral
malformationandbovineleukocyteadhesiondeficiencyDNAbased
testingondiseasefrequencyintheHolsteinpopulationJDairySci91
4854ndash4859doi103168jds20081154
SimmonsMJCrowJF1977MutationsAffectingFitnessinDrosophila
PopulationsAnnualReviewofGenetics1149ndash78
doi101146annurevge11120177000405
SoggiuAPirasCHusseinHACanioMDGaviraghiAGalliAUrbaniA
BonizziLRoncadaP2013UnravellingthebullfertilityproteomeMol
BioSyst91188ndash1195doi101039C3MB25494A
StitzielNOKiezunASunyaevS2011Computationalandstatistical
approachestoanalyzingvariantsidentifiedbyexomesequencing
GenomeBiology12227doi101186gb2011129227
SundaramSKHuqAMSunZYuWBennettLWilsonBJBehenME
ChuganiHT2011ExomesequencingofapedigreewithTourette
syndromeorchronicticdisorderAnnNeurol69901ndash904
doi101002ana22398
VanRadenPMOlsonKMNullDJHutchisonJL2011Harmfulrecessive
effectsonfertilitydetectedbyabsenceofhomozygoushaplotypesJournal
ofDairyScience946153ndash6161doi103168jds20114624
VilarintildeoGuumlellCRajputAMilnerwoodAJShahBSzuTuCTrinhJYuI
EncarnacionMMunsieLNTapiaLGustavssonEKChouP
TatarnikovIEvansDMPishottaFTVoltaMBeccanoKellyD
ThompsonCLinMKShermanHEHanHJGuentherBL
WassermanWWBernardVRossCJAppelCresswellSStoesslAJ
RobinsonCADicksonDWRossOAWszolekZKAaslyJOWuR
MHentatiFGibsonRAMcPhersonPSGirardMRajputMRajput
158
AHFarrerMJ2014DNAJC13mutationsinParkinsondiseaseHumMolGenet231794ndash1801doi101093hmgddt570
WangKLiMHakonarsonH2010ANNOVARfunctionalannotationofgeneticvariantsfromhighthroughputsequencingdataNuclAcidsRes38e164ndashe164doi101093nargkq603
WathesDCPollottGEJohnsonKFRichardsonHCookeJS2014HeiferfertilityandcarryoverconsequencesforlifetimeproductionindairyandbeefcattleAnimal8Suppl191ndash104doi101017S1751731114000755
YangYMuznyDMReidJGBainbridgeMNWillisAWardPABraxtonABeutenJXiaFNiuZHardisonMPersonRBekheirniaMRLeducMSKirbyAPhamPScullJWangMDingYPlonSELupskiJRBeaudetALGibbsRAEngCM2013ClinicalWhole
ExomeSequencingfortheDiagnosisofMendelianDisordersNewEnglandJournalofMedicine3691502ndash1511doi101056NEJMoa1306555
ZiminAVDelcherALFloreaLKelleyDRSchatzMCPuiuDHanrahanFPerteaGTassellCPVSonstegardTSMarccedilaisGRobertsMSubramanianPYorkeJASalzbergSL2009Awholegenome
assemblyofthedomesticcowBostaurusGenomeBiology10R42doi101186gb2009104r42
159
Supplementary)files)
YoucanfindChapter6supplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter06)orscantheQRcode
)
Supplementary)Figure)S1)Coverage)analyses)
A)Averagecoverageperanimals
B)Boxplotofchromosomescoverageperanimal
C)BoxplotofanimalcoverageperchromosomeBluelinerepresenttheexpected
50Xcoverage
Supplementary)Figure)S2)Distribution)of)variant)sequencing)depth)
)
Supplementary) Figure) S3) Correlation) between) sequencing) depth) and)
number) of) discordant) genotypes) In) red) values) for) two) outlier) animals)
having)concordance)lt846))
)
Supplementary)Figure)S4)Multidimensional)scaling)of)Italian)Holstein)bulls)
sequenced))
)
Supplementary)Figure)S5)Multidimensional)scaling)of)Italian)Holstein)bulls)
analysed)with)the)800K)SNPchip))
160
Supplementary) Figure) S6) MAF) distribution) on) neutral) (blue)) and)
deleterious)(red))variants))
)
Supplementary) Table) S1) EBV) distribution) on) Italian) Holstein) population)
and)SNPchip)dataset)
Sheet1EBVthresholdvaluetodefineplusandminustailforPROT(kgprotein)
SCC(somaticcellcount)FERT(fertility)andLONG(functional longevity) All
ItalianHolsteinFAbullswereincludeinthisanalysis
Sheet2Numberofdatasetanimalsineachtailforeachtrait
Supplementary)Table)S2)Exome)probes)statistics)number)and)bp)covered)
per)chromosomes)
Supplementary) Table) S3) Concordance) between) animal) genotypes) in)
sequencing)and)SNPchip)data)
Error1 HD data homozygotes ndash Exome data heterozygotes Error2 HD data
heterozygotesndashExomedatahomozygotes
Supplementary) Table) S4) List) of) suggestive) genes) deriving) from) the)
analysis)of)exomes)of)animals) in)the)tails)of)male)and)female)fertility)EBV)
distributions))
Header(code
bull Bfreq_totFrequencyofBalleleinentiredataset
bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset
bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset
bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset
bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset
bull CallRate_totSNPcallratendashentiredataset
bull CallRate_MHSNPcallratendashhighmalefertilitydataset
bull CallRate_MLSNPcallratendashlowmalefertilitydataset
bull CallRate_FHSNPcallratendashhighfemalefertilitydataset
bull CallRate_FLSNPcallratendashhighmalefertilitydataset
bull HWeqHardyWeinbergequilibriumpvalue
bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele
bull Geno_HetNumberofanimalswithheterozygotegenotype
161
bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele
bull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)dataset
bull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)dataset
bull LOW_CRSNPcallratefromlowfertility(maleandfemale)dataset
bull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallele
bull LOW_Het Number of animals from low fertility (male and female) dataset with
heterozygotegenotype
bull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallele
bull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and
female)dataset
bull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)dataset
bull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)dataset
bull HIGH_CRSNPcallratefromhighfertility(maleandfemale)dataset
bull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallele
bull HIGH_Het Number of animals from high fertility (male and female) dataset with
heterozygotegenotype
bull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallele
bull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and
female)dataset
bull Male_TREND inheritance models implemented in PLINK trend test on male fertility
dataset
bull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale
fertilitydataset
bull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male
fertilitydataset
bull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility
dataset
bull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on
femalefertilitydataset
bull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale
fertilitydataset
bull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male
andfemalefertilitydataset
bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test
onmaleandfemalefertilitydataset
162
bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston
maleandfemalefertilitydataset
bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset
bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset
bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset
Supplementary)Table) S5) Regions) candidate) to) carry) deleterious) variants)
identified)by)OHW)approach))
Supplementary)Table) S6) Regions) candidate) to) carry) deleterious) variants)
identify)by)LOH)approach))
Ingreyregionsremovedinfurtheranalysis
Supplementary)Table)S7)List)of)candidate)deleterious)variants)mapping)in)
OHW)and)LOH)regions))Header(code
bull Bfreq_totFrequencyofBalleleinentiredataset
bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset
bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset
bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset
bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset
bull CallRate_totSNPcallratendashentiredataset
bull CallRate_MHSNPcallratendashhighmalefertilitydataset
bull CallRate_MLSNPcallratendashlowmalefertilitydataset
bull CallRate_FHSNPcallratendashhighfemalefertilitydataset
bull CallRate_FLSNPcallratendashhighmalefertilitydataset
bull HWeqHardyWeinbergequilibriumpvalue
bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele
bull Geno_HetNumberofanimalswithheterozygotegenotype
bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele
bull HW_insideDeleteriousmutationsinsideldquoOutofHardyWeinbergrdquoregion
bull HW_outsideDeleteriousmutationsoutsideldquoOutofHardyWeinbergrdquoregion
bull DH_insideDeleteriousmutationsinsideldquoLackofHomozygosityrdquoregion
bull DH_outsideDeleteriousmutationsoutsideldquoLackofHomozygosityrdquoregion
163
bull GeneClass Classifications of gene 1 deleterious mutations inside OHW and LOHregion2deleteriousmutationsinsideOHWorLOHregion
bull 2deleteriousmutationsoutsideOHWandorLOHregionbull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)datasetbull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)datasetbull LOW_CRSNPcallratefromlowfertility(maleandfemale)datasetbull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallelebull LOW_Het Number of animals from low fertility (male and female) dataset with
heterozygotegenotypebull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallelebull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and
female)datasetbull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)datasetbull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)datasetbull HIGH_CRSNPcallratefromhighfertility(maleandfemale)datasetbull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallelebull HIGH_Het Number of animals from high fertility (male and female) dataset with
heterozygotegenotypebull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallelebull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and
female)datasetbull Male_TREND inheritance models implemented in PLINK trend test on male fertility
datasetbull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale
fertilitydatasetbull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male
fertilitydatasetbull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility
datasetbull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on
femalefertilitydatasetbull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale
fertilitydatasetbull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male
andfemalefertilitydataset
164
bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test
onmaleandfemalefertilitydataset
bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston
maleandfemalefertilitydataset
bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset
bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset
bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset
Supplementary)Table)S8)Results)of)haplotype)analysis)))Header(code
bull Chromchromosome
bull PosphysicalpositiononUMD31genome
bull Exome_homonumberofhomozygotefordeleteriousmutationsequencedanimals
bull Exome_hetnumberofheterozygotesequencedanimalsfordeleteriousmutation
bull Outcome NotFound It was impossible to associated a haplotype to the mutation
NotOpt Non completely invariant haplotype could be associated to themutationOK
Putativehaplotype associate tomutationwas found In few case alternativehaplotype
wasreported
bull LenlengthofhaplotypeinnumberofSNPs
bull Pop_numnumbersofselectedhaplotypedetectedinHolsteinpopulation
bull Pop_homo numbers of homozygote animals in Holstein population for selected
haplotype
bull Pop_het numbers of heterozygotes animals in Holstein population for selected
haplotype
bull ExpExpectednumberofhomozygoteanimal
bull Alternative if only one heterozygote sequenced animals for deleteriousmutationwas
foundalltwopossiblehaplotypeswheretested
Supplementary)Table)S9)Descriptive)results)for)the)Biological)Process)(BP)))
NumberofgenesbelongingtoBiologicalProcessGOterms)
Supplementary) Table) S10) Descriptive) results) for) the)Molecular) Function)
(MF))
NumberofgenesbelongingtoMolecularFunctionGOterms
165
CHAPTER7
Conclusion
Since the domestication event occurred in the Fertile Crescent around 10000yearsagothebovinespecieswasselectedbyhumankindtofulfilhisneedsThecreationofbreedsoccurredaround200yearsagointensifiedthedifferentiationbetween populations and promoted the development of intensive selectionschemesTraditionalAandnowadaysgenomicAselectionsareproducingpositivegenetictrendsinmanyproductivetraitsdespitethestillpoorknowledgeaboutphenotype biology A deeper understanding of genome organization and genebiology would increase the accuracy of genomic evaluation by incorporatingpriorknowledgeThisthesisexploresthebovinegenomebyhighAthroughputtechnologiesAsuchasNextGenerationSequencingandHighADensityGenotypingAwithestablishedand innovative procedures to investigate the biology of complex traits and toprovidenewdataandmoleculartoolstobovinebreedingThese goals have been achieved by complementary approaches and differentmethodsInChapter2selectionsignatureswereinvestigatedindependentlyintofive Italian breeds using Illumina BovineSNP50 BeadChip Then amultiAbreedapproach was applied clustering breeds into dairy (Italian Holstein ItalianBrownand Italian Simmental) andbeef (Marchigiana Piedmontese and ItalianSimmental) groups Regions under recent positive selection shared by breedswithasameproductiveaptitudewereinvestigatedinfurtherdetailbyretrievinggenes in the region and analysing their function by ontology and pathwayanalysis In the dairy group selection signals appear nearby genes related tomammarygland(metabolismandresistancetomastitis)andfeedingadaptationIn thebeefgroupgenes inselectedregionsare involved inanimalgrowthandmeatquality(textureandjuiciness)Hencecandidategenesidentifiedbelongtonetworks and pathways important in directing a breed towards either beef ordairyproductionIn Chapter 3 a GWAS approach was used in dairy cattle to correlate SNP tocomplex phenotypes having an economic value Three independent ldquoclassicalrdquosingleAmarkerregressionswereruninthreeItaliandairycattle(ItalianHolsteinItalianBrownandItalianSimmental)UsingthetraditionalsingleSNPapproach
168
only few SNPs were above the nominal genomeAwide significance thresholdwhile applying a geneAbased method identified significant candidate genesassociatedwithmilkphysiology andmammary glanddevelopment in all threepopulationsInterestinglygenesfoundindifferentbreedshavesimilarfunctionbut are not the same suggesting that breed history demography selectioncriteriaandstochasticeventsasgeneticdrifthaveanimpactonthedefinitionofthemainmoleculartargetofselectionschemesCurrent intensive selection strategies rapidly spread the genetic gain in thepopulationbutholdtheassociatedriskofdisseminatingdefectivegenescausinggenetic defects (eg mulefoot complex vertebral malformation and others) ordecreasing fertility In Chapter 6 different technologies and approaches werecombined to obtain filter and prioritize genes candidate to be deleteriousExome sequence datawere exploited to identify deleterious variants affectingItalianHolsteincattleinasetof20animalsextremeforfertilitytraitsMorethan5000 candidate deleterious SNP variants were identified To bypass the smallnumberlimitationandthelackofpowerofanystatisticanalysison20animalsastepwise approachwas used to first filter and then prioritize these candidatedeleterious variants Polymorphisms were retained if having asymmetricdistribution in bulls from opposite tails of fertility EBV distributions andormapping in genomic regions outlier for lack of homozygosity in a 1009 bullpopulation genotyped with the Illumina BovineHD BeadChip A total of 191variants in163genespassed theprocessesThesewereassessedby searchingadditional evidences in favour or against their deleterious effect throughdifferentapproachesi)checkingthevariantconservationacross100vertebratespecies by comparative genomics ii) finding haplotypes associated withcandidatemutationsandevaluatingtheirfrequencyinHolsteinsiii)integratinghumanandmouse functionalannotationsA scorewasproposed toweight theresult of these assessments and rank the variants 35 of them had positivescores These occurred in genes involved in basic biological mechanisms asfertilitydevelopmentand immunityandresultedenriched ingenesassociatedtogeneticdefectsreportedinhumanThesemutationsarestrongcandidatestobe recessive deleterious variants worth to be further investigated in a larger
169
populationtobetterunderstandtheirbiologyimpactandpossibleapplicationinbreedingTo increase the reliability of results based on genomic data and to reduce theimpact of subjective choices in data analysis new toolsapproaches weredeveloped(iethemultiAbreedapproachinSelectionSignaturendashChapter2thepostAGWAS geneAbased approach ndash Chapters 3 and 4) and the existent onescriticallyevaluated(imputationmethodsndashChapter5)In Chapter 4 the postAGWAS geneAbased method used in Chapter 3 wasimplemented in the MUGBAS (MUlti species GeneABased Association Suite)software In contrast to existing software MUGBAS is species and annotationfree AmultipleAtesting correctionwas included in the package in addition tocomputing parallelization for faster processing and plot representation forbetterunderstandingThesoftwareusessingleAmarkerGWASdataandagivenannotation to estimate geneAregionAwise association pAvalues As previouslymentioned in Chapter 3 the ldquoclassicalrdquo GWAS approach (single SNP) foundsignals only in ItalianHolsteinwhileMUGBAS geneAbased approach identifiedsignificantsignalsinallbreedsinvestigatedIn Chapter 5 the impact of different bovine reference sequences on genotypeimputation was investigated In this work Illumina BovineSNP50 BeadChipgenotypesof ItalianSimmentalbullswerereAmappedonthethreemostrecentreferencegenomeassembliesWecomparedfourdifferent imputationmethodsfromlow(IlluminaBovineLDv10Beadchip)tohighdensityThetestpopulationwasdividedintofoursubpopulationsonthebasisofrelationshipandgenotypedataavailabilityUpdatingSNPcoordinatesonthe three testedcattlereferencegenome assemblies determined only a slight variation on imputation resultswithinmethodsOntheotherhandlargedifferencesintermsofaccuracywereobservedamongthefourimputationmethodstestedGenetic structure of industrial breeds was evaluated assessing LinkageDisequilibrium (LD) Since LD decay is correlated to intensity of selection weexpected different behaviours in the breeds analysed Indeed as evident inChapter 2 ndash Figure 7 breeds subjected to lower selection intensity (ie
170
MarchigianaPiedmonteseandItalianSimmental)displayalowerpersistenceof
LDatgreaterdistancescomparedtothosesubjectedtohigherselectionpressure
(ieItalianHolsteinandItalianBrown)Chapter3investigatesLDintheDGAT1
region of dairy cattle breeds In Italian Simmental LD is lower compared to
Italian Holstein and Italian Brown In spite of this significant association
betweentheDGAT1regionandmilktraitswasfoundbothinItalianSimmental
andinItalianHolsteinInItalianBrownnoassociationwasfoundbecauseDGAT1
isfixed
Data and results presented in this thesis represent a clear example of the
progress of molecular technologies occurred in last few years spanning from
mediumdensitygenotypingtonextgenerationsequencing
Thebiologylandscapesofsomecomplextraits(iemilkproductionandquality
fertility)were improved finding candidate genes andpathwaysnotpreviously
associated to the phenotype in the bovine species Moreover insights on
deleterious mutations were inferred through a deep and integrated analysis
usingNGSdata
This new knowledge could be used to improve cattle breeding For example
deleteriousrecessivevariantscouldbeincludedinbullevaluationsbybreedersrsquo
associations to reduce the genetic load of deleterious genes in parallel to
improvingproductivity
While technological progress has been impressivewe are still only scratching
thesurfaceofanimalbiologyNewfrontiersarebeingopenedbywholegenome
sequencing of many animals (eg in the 1000 bull genome project) and by
stepping beyond sequencing (a number of epigenomic projects in animals are
following the footprints of the human ENCODE project) The trend is towards
unravelling complexity and data production ability is to be flanked by data
storingandprocessingandaboveallbynewmethodsandtoolsfordataanalysis
171
ACKNOWLEDGMENTS
ldquoNon egrave la destinazione ma il viaggio che contardquo Questi tre anni sono stati un gran bel viaggio di crescita personale e ricchi diesperienze Non egrave stato un viaggio solitarioma condiviso con vecchie e nuoveconoscenzechemisembradoverosomenzionareInnanzitutto vorrei ringraziare il prof Paolo AjmoneMarsan chemi ha dato lapossibilitagrave di fare questa esperienza Grazie per gli insegnamenti e per leopportunitagravechemihaoffertoSperodinonavertraditolesueaspettativeAgli attuali colleghi Lorenzo Stefano Licia Elisa Riccardo ed Elia e a quellipassati Nicola Fatima Raffaele Ezequiel e Barbara va il mio piugrave sincero edaffettuoso ringraziamento per aver condiviso anche solo in parte questomiopercorso Grazie per avermi sopportato (e non egrave poco) guidato (non solo nelcampolavorativo)efattovivereinunmodospecialelrsquoambientedilavoroUnsincerograzievaatuttiimieicolleghidelXXVIIcicloAgrisystemRicorderogravesempretuttiimomentichehopotutocondividereconvoi UngrazieancheachihafattopartedellamiaesperienzasvedeseGraziedicuoreaCarlFlavioChristianEricaatuttiiragazziitalianiatuttiicolleghildquosvedesirdquoea tutte le altre persone che ho conosciuto Mi avete fatto vivere appienolrsquoesperienzaesteraefattotornareacasaconqualcheconsapevolezzainpiugrave UnsentitoringraziamentoatuttiglialtricolleghiconcuihocollaboratoinquestianniRingrazio tutti gli amici e conoscenti che mi sono stati vicino per avermisupportatoesopportatoognunoamodoproprio Ultimomanonperimportanza ilringraziamentochevaatuttalamiafamigliaper avermi aiutato spronato sostenuto sempre e in particolare in questaesperienzaSemplicementegrazie PS Adirlatuttaancheladestinazionenonegravepoicosigravemale
173
- INDEX
- CHAPTER 1 - Introduction
- CHAPTER 2 - Relative extended haplotype homozygosity (rEHH) signals across breeds reveal dairy and beef specific signatures of selection
- CHAPTER 3 - Searching new signals for productive traits in three Italian cattle breeds
- CHAPTER 4 - MUGBAS a species free gene based ased association suite
- CHAPTER 5 - Imputation accuracy is robust to cattle reference genome updates
- CHAPTER 6 - Quest for deleterious recessive variants in Italian Holstein bulls combining exome sequencing and high density SNP analysis
- CHAPTER 7 - Conclusion
- ACKNOWLEDGMENTS
-
of investment and time so that 5 to 6 years are needed to have a dairy bullprogenytested(wwwanafiit)Theadventofgenomicselection(Meuwissenetal2001)markedamilestoneindairycattlebreedingTheideawasdevelopedbeforethemoleculartoolswereinplaceforimplementitinpracticeandhasbecomearealityonlyrecentlyInthisapproach progeny tested bulls are genotyped at many thousand loci and thevalue of their EBVs used to estimate the value of each allele at each locusgenotyped These values are then used to predict the genetic value of younganimals on the basis of their genotype As a result animals from the newgeneration receive an EBV with the same accuracy of estimates obtained byprogenytestingbutmuchearlierThislowerthegenerationintervalandgreatlyincreasestherateofgeneticprogress indairypopulationsThe initial ideawasso good that nowadays genomic selection is widely applied in industrialcountriestothemajordairycattlebreedsWiththedecreaseofgenotypingcostit is likely that its use will be soon expanded to animals selected for otherpurposes (eg beef cattle) and other species (eg pigs) However genomicselection is an extension of traditional selection It is extremely useful inbreedingforcomplextraitsbutgivesnobiologicalinformationonthegenesthatcontrolthemIfthetraditionalselectionviewsasiregenomeasabigblackboxhaving a value but no clue why genomic selection views the same genomedisassembled inmany small boxes but still black and stillwithno clueon thereasonoftheirvalueIt is reasonable to think that a deeper understanding of the biologicalmechanisms controlling traits may further increase the accuracy of genomicevaluationsIndeedthenewtechnologiesoffernewopportunitiesinthissensebyloweringcostsandincreasingprecisionofdataproductionThis thesis is exploring the use of these technologies for retrieving biologicalinformationfromgenomicdataItusesmethodsnowconsideredldquotraditionalrdquoingenomics (GWAS and analysis of selection sweeps) and less traditional ones(gene centered GWAS and exome analysis) All data used here have beenproducedwithin twonationalprojectson farmanimalgenomics fundedby theItalianMinistryofAgriculture
5
M SELMOL ldquoRicerca e innovazione nelle attivitagrave di miglioramento genetico
animalemediante tecniche di geneticamolecolare per la competitivitagrave del
sistemazootecniconazionaleIrdquo(ResearchandInnovationinanimalbreeding
sheepgoatbuffalohorseanddonkeyI)
M INNOVAGENldquoRicercaeinnovazionenelleattivitagravedimiglioramentogenetico
animalemediante tecniche di geneticamolecolare per la competitivitagrave del
sistema zootecnico nazionale IIrdquo (Research and Innovation in animal
breedingsheepgoatbuffalohorseanddonkeyII)
In these projects the research group I work with M Istituto di Zootecnica at
Universitagrave Cattolica del Sacro Cuore di Piacenza M coordinated all the research
activitiesconcerningtheapplicationofgenomicstechniquesindairycattle
Contentsofthethesis
The thesis is structured into a short introduction (this first chapter) 5 core
chapters(Chapter2to6)andashortConclusion(Chapter7)Withtheexception
of Chapter 6 still in preparation the other core chapters containmanuscripts
that at time of writing are under review for their publication in international
peerMreviewedjournalsormanuscript
InChapter2selectionsignatureindairyandbeefcattleareinvestigatedwiththe
Illumina BovineSNP50 Genotyping BeadChip Five Italian cattle breeds (Italian
Holstein Italian Brown Italian Simmental Marchigiana and Piedmontese) are
analyzedwiththerEHHmethod(Sabetietal2002)tofindsignaturesofrecent
selectionCandidategenesinthevicinityofsignaturessharedbyeitherdairyor
beefbreeds are identified and theirpossible role indairy andbeefproduction
discussed
In Chapter 3 a GWAS analysis is run to investigate production traits in three
Italian dairy cattle breeds (Italian Holstein Italian Brown Italian Simmental)
Data are analysed with a well established method (Amin et al 2007)
6
implemented in theGenABELRpackage (Aulchenkoet al 2007) andwith the
newgeneMbasedmethodAssociationisinvestigatedbetween50KSNPsandmilk
production(milkyield)andqualitytraits(percentageofmilkproteinandfat)
InChapter4thegeneMbasedmethodusedinthepreviouschapter3inspiredby
theVEGASapproach(Liuetal2010)isdescribedThesoftwarewasdeveloped
andreMcodedinourlaboratoryinordertopermittheanalysisofanyspeciesin
additiontohumanMultiMtestingcorrectionandplotrepresentationcompletethe
toolsmadeavailable
InChapter5wetesttheimpactoftheuseofdifferentreferencesequencesinthe
imputation step Italian Simmental data are used to test variation in genomeM
wide inputation accuracy using different cattle reference sequences (BTAU42
UMD31 and BTAU46) To increase the analyses reliability four different
inputationsoftwaresaretested
InChapter6Holsteindataareanalysedcombiningexomesequencesand800K
SNPs data to seek deleterious recessive variantsWhile in natural populations
the destiny of these variants is to disappear in livestock breeds they can be
maintained and disseminated if carried by a highly productive sire The
intersection of deleterious mutations identified by sequencing and lack of
homozygousregionsidentifiedbytheanalysisofthepopulationwiththeHDSNP
chipidentifiedgenescandidatetocarryrecessivedeleteriousvariants
Objective
Themainobjectiveofthethesiswastoinvestigatethebovinegenomewiththe
aid new highMthroughput technology and to explore new strategies for linking
biology (genes and their function) to traits This linkage once validated may
increase the accuracy of genomic prediction and find application in selection
schemesegineradicatingthemostdeleteriousvariantsandorintheplanning
ofmatings
7
ReferencesAjmoneMMarsanPGarciaJFLenstraJA2010OntheoriginofcattleHow
aurochsbecamecattleandcolonizedtheworldEvolAnthropol19148ndash157doi101002evan20267
AminNvanDuijnCMAulchenkoYS2007AGenomicBackgroundBasedMethodforAssociationAnalysisinRelatedIndividualsPLoSONE2e1274doi101371journalpone0001274
AulchenkoYSKoningDMJdeHaleyC2007GenomewideRapidAssociationUsingMixedModelandRegressionAFastandSimpleMethodForGenomewidePedigreeMBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614
ElsikCGTellamRLWorleyKC2009TheGenomeSequenceofTaurineCattleAWindowtoRuminantBiologyandEvolutionScience324522ndash528doi101126science1169588
LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKMHaywardNKMontgomeryGWVisscherPMMartinNGMacgregorS2010AVersatileGeneMBasedTestforGenomeMwideAssociationStudiesTheAmericanJournalofHumanGenetics87139ndash145doi101016jajhg201006009
MeuwissenTHHayesBJGoddardME2001PredictionoftotalgeneticvalueusinggenomeMwidedensemarkermapsGenetics1571819ndash1829
SabetiPCReichDEHigginsJMLevineHZPRichterDJSchaffnerSFGabrielSBPlatkoJVPattersonNJMcDonaldGJAckermanHCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash837doi101038nature01140
TaberletPCoissacEPansuJPompanonF2011ConservationgeneticsofcattlesheepandgoatsCRBiol334247ndash254doi101016jcrvi201012007
8
TaberletPValentiniARezaeiHRNaderiSPompanonFNegriniR
AjmoneMMarsanP2008Arecattlesheepandgoatsendangered
speciesMolEcol17275ndash284doi101111j1365M294X200703475x
TheBovineHapMapConsortium2009GenomeMWideSurveyofSNPVariation
UncoverstheGeneticStructureofCattleBreedsScience324528ndash532
doi101126science1167936
9
CHAPTER2
Relative(extendedamphaplotype(homozygosity)(rEHH)ampsignalsampacrossampbreedsamprevealampdairyampand$beef$specificampsignaturesampofampselection
LBomba1ELNicolazzi2MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly2FondazioneParcoTecnologicoPadanoViaEinsteinLocCascinaCodazza26900LodiItalyTheseauthorscontributedequallytothiswork
____________________SubmittedtoGeneticSelectionEvolution
Abstract
Background
Nowadays many methods have been implemented to scan for signatures of
selection at genomescale level within and across breeds Extended haplotype
homozygosity is a proficient approach to detect genome regions under recent
selective pressure within breeds The objective of this study is to identify
common regions under positive recent selection shared by most represented
dairyandbeefItaliancattlebreeds
Results
A total of 3220 animals from Italian Holstein (2179) Italian Brown (775)
Simmental (493) Marchigiana (485) and Piedmontese (379) breeds were
genotyped with the Illumina BovineSNP50 BeadChip v1 in the frame of the
ItalianlivestockgenomicprojectsldquoSelMolrdquoandldquoProzoordquoAfterstandardcleaning
proceduresgenotypeswerephasedandcorehaplotypesidentifiedThedecayof
Linkage Disequilibrium (LD) for each core haplotype was assessedmeasuring
the extended haplotype homozygosity (EHH) To correct for the lack of good
estimatesof local recombinationrates therelativeEHH(rEHH)wascalculated
for each corehaplotypeGenomic regionsunder recentpositive selectionwere
identifiedasthosehighlyfrequentcorehaplotypeswithsignificantrEHHvalues
Significant independentcorehaplotypeswerealignedacrossbreeds to identify
signalsofrecentselectionsharedbydairyandsharedbybeefbreedsOverall82
and 87 common regions under selectionwere detected among dairy and beef
cattlebreedsrespectivelyBioinformaticsanalysisidentified244and232genes
mapping in these common genomic regions Pathway and network analysis of
significant genes revealed molecular functions biologically related to signal of
selectionsdetectedinmilkormeatproductiontypes
Conclusions
Resultssuggest that themultibreedapproach isamethodto identifyselection
signatures in cattle breeds under selection for the same production goal
12
allowing to better pinpoint the genomic regions of interest in dairy and beef
production
Background
The advent of genomic technologies and the consequent availability of single
nucleotidepolymorphism(SNP)datahaveenabledthestudyofselectionevents
on the cattle genome (Hayes et al 2008 Qanbari et al 2010) Such selection
signals arise from environmental or anthropogenic pressures and help to
understand the causes that led to breed formation These studies are usually
addressedinaldquotopdownrdquoapproach(ShahzadandLoor2012)fromgenotypeto
phenotypeinwhichgenomicdataisstatisticallyevaluatedtodetectdirectional
selectionThephenotypeneededtorunselectionsignaturesisintendedinavery
generalsenseabreedaproductionaptitudeorevenageographicareaoforigin
Thisapproachholdsthepotentialto investigatetraitsasadaptationtoextreme
climatesanddifferentfeedingandhusbandrysystemsresiliencetodiseasesetc
thatareveryexpensivedifficultandsometimesimpossibletostudywithclassic
GWAS (GenomeWide Association Study) approaches Therefore it is expected
thatgenomic informationretrievedinthesestudiesonlypartiallyoverlapsthat
identified by GWAS on specific traits Searching for these ldquosignatures of
selectionrdquo is of interest for understanding molecular mechanisms underlying
important biological processes and providing new insights to functional
biologicalinformation(egdiseaseandorimportanteconomictraits(Nielsenet
al 2007)) To date many methods have been implemented to scan these
selectionsignaturesatthegenomiclevelMethodsdifferwithrespecttoseveral
properties (Lenstra et al 2012) there are methods that perform analyses
withinbreedoracrossbreedsthatcomparealleleorhaplotypefrequenciesand
size that detect recent or ancient selection events either still segregating or
fixatedintopopulations(Lenstraetal2012OrsquoBrienetal2014Utsunomiyaet
al2013)Differentmethodshavedifferentsensitivityandrobustnesseg they
maybeinfluencedatadifferentextentbymarkerascertainmentbiasanduneven
distributionofrecombinationhotspotsalongthegenome
13
Extended haplotype homozygosity (EHH) is a method that has been used in
many animal species including cattle (Pan et al 2013 Qanbari et al 2010
Zhangetal2006)TheEHHmethodwasfirstdevelopedbySabetietal(2002)
for human population analysis The main objective of this methodology is to
identifylongrangehaplotypesinheritedwithoutrecombinationeventsMorein
detailunderaneutralevolutionmodelchangesinallelefrequenciesaredriven
onlybygeneticdriftInthisscenarioanewvariantwouldrequirealongtimeto
reach high frequency in the population and the Linkage Disequilibrium (LD)
surroundingitwoulddecayduetorecombinationevents(Kimura1984)Inthe
caseofpositiveselectionaquickriseinfrequencyofabeneficialmutationina
relatively short time will preserve the original haplotype structure as the
number of recombination events would be limited Therefore a signature of
positiveselectionisdefinedasaregionwithanalleleinlongrangeLDlocatedin
anuncommonlyhighfrequencyhaplotype
One of themost interesting features of thismethod is that it is able to detect
genomic regions that are under recent selection These recent selection
signatures can be therefore used to identify the genomic regions involved in
breedformationInadditionEHHmethoddoesnotrequireanydefinitionofan
ancestralallele(asneededintheintegratedhaplotypescoreiHS(Voightetal
2006) and is suited to be applied to SNP data because it is less sensitive to
ascertainmentbiasthanothermethods(Nielsenetal2007)HowevertheEHH
methodgeneratesa largenumberof falsepositiveandnegative resultsdue to
heterogeneous recombination rates along the genome (Qanbari et al 2010)
Moreover another drawback not specific of EHH but shared by all selection
signaturemethod is the lack of robust inferences able to discern true signals
fromthosegeneratedbychance(Kemperetal2014)
To partially account for these limitations Sabeti et al (2002) developed the
relative extended haplotype homozygosity (rEHH) including an empirical
methodologytoobtainsignificantsignalsTherEHHofaputativecorehaplotype
compares its original EHH valuewith that of other haplotypes at that specific
locus using all other haplotypes as control for local variation in the
recombination rate Therefore it identifies genomic regions carrying variants
14
under selection that are still segregating in the population analysed Although
thesemethodswere conceived for human populations theywere successfully
appliedto livestockspeciessuchaspig(Maetal2012)andcattle(Qanbariet
al2010)
After domestication about 10000 years ago in the fertile crescent taurine
cattle have colonized Europe and Africa and have been selected for different
human needs (Taberlet et al 2011) Particularly in the last centuries the
anthropic pressure led to the formation of hundreds of specialized breeds
adapted to different environmental conditions and linked to local tradition
constitutingagenepooldeservingattentionforconservation(AjmoneMarsanet
al2010)Someofthesebreedshavebeenselectedspecificallyfordairybeefor
bothproductiontypesfollowingstrongartificialselectionforthesetraits(Hayes
etal2009)Thepresentstudyaimsat identifyingsignalsof recentdirectional
selectionusingrEHHmethodindairyandbeefproductiontypesusingdatafrom
five Italian dairy beef and dual purpose cattle breeds We focused on the
significant core haplotypes scanned by rEHH that are shared among breeds of
thesameproductiontypeThenweusedabioinformaticsapproachto identify
positionalcandidategeneslyingwithinthegenomicregionsunderselectionand
investigatetheirbiologicalrole
Methods
SampleAnimalsandGenotyping
A total of 4311 bulls of five Italian dairy beef anddual purpose breedswere
genotyped with the Illumina BovineSNP50 BeadChip v1 (Illumina San Diego
CA)joiningthegenotypingeffortsoftwoItalianprojects(namelyldquoSelMolrdquoand
ldquoProzoordquo)Thedatasetincluded101replicatesand773siresonspairsusedfor
downstreamqualitycheckofdataproducedIndetailgenotypesof2954dairy
(2179 ItalianHolsteinand775 ItalianBrown)864beef (485Marchigianaand
379 Piedmontese) and 493 dual purpose (Italian Simmental) bulls were
availableDataqualitycontrol(QC)wasperformedintwostagesafirststageon
15
animals independently for each breed applying the same methods and
thresholdsandasecondstageappliedonmarkersacrossall theindividualsof
thedatasetThefirststageexcludedindividualswithunexpectedlyhigh(gt02)
Mendelian errors for fatherson couples andwith low call rates (lt95)The
secondstageexcludedi)SNPswithle25missingvaluesinthewholedataset
orcompletelymissing inonebreed ii)SNPswithminorallele frequencyle5
and iii) SNPs in sexual chromosomes and with unassigned chromosome or
physicalposition
EstimationofrEHH
HaplotypeswereobtainedusingfastPHASEsoftwareconsideringdefaultoptions
(ScheetandStephens2006)andwererunbreedwiseandchromosomewisein
eachbreedPedigreeinformationforallbullswereprovidedbytheirrespective
breederassociations andwereused to filteroutdirect relatives (in fatherson
pairsthesonwasmaintainedinthedatasetandthefatherdiscarded)andover
representedfamilies(amaximumof5randomlychosenindividualsperhalfsib
familywasallowed)Thefinaldatasetcontainingtheseldquolessrelatedrdquoanimalswill
behenceforthcalled ldquononredundantrdquodatasetThisnonredundantdatasetwas
used also to calculate within breed pairwise wholegenome Linkage
Disequilibrium(LD)Ther2statisticforallpairsofmarkerswasobtainedusing
PLINK software v107 (Purcell et al 2007)Thedecayof LDwas assessedby
averagingr2valuesupto1Mbdistance
To test if the population structure influenced the detection of rEHH we
replicated all analyses on the whole Italian Holstein dataset without any
population correction (ie ldquoredundantrdquo dataset) focusing on genes or gene
clustersthatarewellknowntobeunderrecentselectionincattle(ieldquocandidate
regionsrdquo) In particularwe focused on the casein polled gene cluster and coat
colourgenes(MC1RandKIT(Kemperetal2014Qanbarietal2010))
The EHH and rEHH calculations were performed using Sweep v11 software
(Sabetietal2002)Somedefaultprogramsettingshadtobemodifiedtoadapt
theseanalyses to thebovinegenomeas thesoftwarewasoriginallydeveloped
forhumangeneticanalysesSpecificallylocalrecombinationratesbetweenSNPs
16
wereapproximatedto1cMperMbEHHandrEHHcalculationswereperformed
breedwise and chromosomewise using an automatic core selection with
defaultoptions(ieconsideringlongestnonoverlappingcoresandlimitingcores
toatleast3andnomorethan20SNPs)asinQanbarietal(2010)AlthoughEHH
and rEHH values were obtained for all core haplotypes only those with
frequency ge 25 were retained for further analyses The (empirical) rEHH
significancethresholdwasobtainedbydividingtherEHHvaluesinto20binsof
5 frequency range each logtransforming withinbin values to achieve
normality Core haplotypes with pvalues lt 005 were considered significant
DetectionofcommonselectionsignatureswasobtainedcomparingrEHHvalues
acrossgenomicregionsacrossbreeds
Breedgroupingaccordingtoproductiontype
Toexamine thesignals common toeachof the twoproduction types (iedairy
andbeef) corehaplotypes thatsharedoneormoreSNP inat least twobreeds
from the same production type were selected The dual purpose Italian
Simmentalwas included inbothdairy (ItalianHolsteinand ItalianBrown)and
beef groups (Piedmontese and Marchigiana) since potentially it possesses
haplotypes selected for both production types All downstream analyses were
performedseparatelyforthedairyandthebeefproductiontypes
Detectionandannotationofcandidategenes
Genome annotation was performed using Biomart
(httpwwwbiomartorgindexhtml) a comprehensive source of gene
annotationformanyspeciesprovidedbyEnsembl(wwwensemblorg)Thelist
ofsharedhaplotypesregions(inbp)fordairyandbeefwasusedasinputfilein
theBiomartwebinterfaceThesetofgenesretrievedbyBiomartwasthenused
as input for theCanonicalpathwayandregulatorynetworkanalyses Ingenuity
PathwayAnalysis toolversion80(IPA IngenuityregSystems IncRedwoodCity
CA httpwwwingenuitycom) and a substantial examination of published
literaturewasusedtoexaminethefunctionalrelationshipsamongtheresulting
genes The IPA operates with a proprietary knowledge database providing
complementarypathwayanalysisforseveralspeciesincludingcattleIntheIPA
17
analyses the significance of each biological function identified was estimated
using Fisherrsquos exact test A Benjamini and Hochberg correction for multiple
testingwas calculated for function and canonical pathways For eachNetwork
IPAcomputedascore(pscore=log10(pvalue))fittingthesetstudiedgenesand
a list of biological functionspresent in the IPAknowledgedatabase The score
considered thenumberofgenes in thenetworkand thesizeof thenetwork to
estimatehowrelevantthisnetworkwasaccordingtothelistofgenesprovided
Thenthescorewasusedtorankthenetworks
Results
DatasetQC
Therepeatabilityobtainedfromthe101replicatespresentinthewholedataset
washigherthan998AfterthetwoQCstages105individualsand9730SNPs
were discarded Moreover after phasing 1292 individuals were removed to
reducethe largenumberofsibfamiliespresent intheredundantdatasetAfter
quality control procedures the final dataset had 44271 SNPs and 1132 514
393 410 and 364 individuals from Italian Holstein Italian Brown Italian
SimmentalMarchigianaandPiedmontesebullsrespectively(Table1)
18
Table1JNumberofanimalsgenotypedbeforeandafterQCTotalnumberofanimalgenotypedandrelativenumberofanimaldiscardedafterQCanalysis
Detectionofselectionsignatures
Sweepv11 softwaredetectedrEHHsignalswitha frequencyhigher than25overlapping the test candidate regions in both datasets corrected or not forpopulation structure In the redundant dataset we found only one significantrEHHsignal(caseincluster)whereasinthenonredundantdatasetweidentified5significantrEHHsignalsoverlappingallthetestcandidateregionsconsidered(caseinclusterpolledclusterMC1R)(Table2)
BreedTotal
Genotyped
ED15
misAN
ED1
REPL
ED1
MENDCleaned
HOL 2179 40 31 5 2093
BRW 775 6 16 4 749
SIM 493 6 6 2 479
MAR 485 37 38 410
PIE 379 5 10 364
19
Table2JComparisonofcandidateregionrEHHinnonJredundantandredundantdatasetrEHH overlapping candidate regions in both corrected (nonredundant) and not corrected
(redundant)datasetsforpopulation
Redundant NonRedundant
Candidate
geneBTA1
Closest
SNP(bp)CH2range CH2Frequency
rEHHJ
log(pval)CH2Frequency
rEHHJ
log(pval)
POLLED
Cluster1 1981154 18974181981154 H1079 093037 H1078 167174
MC1R 18 13657912 1331772014007505 H1069 120096 H1069 077167
KIT_BOVIN 6 72821175 7250492172821175 H1054 030030 H1054 033048
Casein
Cluster6 88427760 8835009888452835 H1047 094139 H1046 148133
Casein
Cluster6 88427760 8835009888452835 H2032 002003 H2033 002003
In total5526567847724388and4049corehaplotypeswitha frequency
higher than 25 were detected on Italian Holstein Italian Brown Italian
Simmental Marchigiana and Piedmontese breeds respectively A total of 838
866 740 692 and 613 core haplotypes were found significant outliers (see
Materials and Methods) in the aforementioned breeds Table 3 shows the
distribution of the total and significant core haplotypes per chromosome and
breed
20
Tableamp3amp(ampCoreamphaplotypesampdistributionampDistributionoftotalandsignificantcorehaplotypesperchromosomeandbreed
ampamp ItalianampampHolsteinamp
ItalianampampBrownamp
ItalianampSimmentalamp Marchigianaamp Piedmonteseamp
BTA1amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
Hampgtamp252amp
SignampH3amp
1amp 389 66 389 68 335 49 310 51 288 46
2amp 312 47 314 52 267 44 257 42 238 33
3amp 292 48 313 62 264 46 236 34 227 34
4amp 268 46 279 50 245 41 237 39 229 39
5amp 223 32 231 29 200 30 183 27 171 22
6amp 291 51 307 40 265 45 252 38 233 42
7amp 249 37 253 46 214 39 195 33 204 27
8amp 268 39 270 38 235 35 223 32 209 31
9amp 227 37 228 41 210 37 165 22 187 30
10amp 219 36 234 34 192 25 196 26 168 25
11amp 248 39 246 34 209 30 205 33 187 25
12amp 162 24 176 18 148 21 148 25 135 22
13amp 205 32 200 19 155 18 178 35 135 19
14amp 175 17 189 25 168 29 143 22 129 19
15amp 180 31 185 29 148 23 139 13 130 19
16amp 176 28 192 29 160 25 149 19 127 15
17amp 170 27 183 29 168 31 135 22 123 16
18amp 133 18 153 19 119 14 108 10 82 12
19amp 137 22 130 23 118 16 99 18 92 15
20amp 184 35 172 33 145 23 124 23 124 17
21amp 145 23 147 30 113 18 110 21 94 13
22amp 140 24 145 19 115 21 112 18 104 13
23amp 106 13 100 16 67 10 64 9 57 8
24amp 129 11 140 20 124 17 97 23 101 16
25amp 91 12 98 11 68 10 77 12 70 6
26amp 125 7 114 14 93 12 96 15 90 16
27amp 91 12 91 15 75 11 84 12 60 10
28amp 88 9 96 10 70 10 66 9 55 11
29amp 103 15 103 13 82 10 72 9 67 12
TOTamp 5526 838 5678 866 4772 740 4388 692 4049 613
1BostaurusAutosomes2Detectedcorehaplotypeswithafrequencyinthebreedhigherthan253Significantcorehaplotypesatplt005
Comparisonamptoamppreviouslyampreportedampdataamp
AnumberofstudieshavesearchedselectionsweepsinHolstein(Qanbarietal
2010) Brown (Qanbari et al 2011) and Simmental (Fan et al 2014) breeds
Since different methods are expected to identify signatures having different
characteristicsthecomparisonwithpreviousstudiesislimitedtothosestudies
using thesamemethodand thesamebreed(s)analysed inourstudyOnlyone
study analysed (German) HolsteinRFriesian cattle with rEHH (Qanbari et al
2010)Qanbarietal(2010)reportedcandidategenesandgeneclustersunder
recent positive selection that we compared with our rEHH in dairy breeds
dataset(Table4)
Table4JCorehaplotypesincandidategenesfollowingQanbarietal2011
1BosTaurusChromosome2Corehaplotype
Thenumberofcorehaplotypesfoundinourstudyinthecandidateregionswas
lowerpreviouslyreportedWhenconsideringHolsteinsthetwomostsignificant
candidateregionsinbothstudiesmatch(caseinclusterandthesomatostatinSST
gene)althoughitisimpossibletodetermineifthehaplotypeunderselectionis
the same as this information was not provided in Qanbari et al (2010)
However other genes considered significant in Qanbari et al (with pvalues
ranging from 004 to 010) were not significant in our studyWhen using the
same loose significant threshold (pvalues le 010) ldquosignificantldquo signals were
foundalsoinItalianBrownfortheCaseinCluster(log10pvalue=109)andin
ItalianSimmentalfortheSSTgene(log10pvalue=126)
HOLSTEIN BROWN SIMMENTAL
Candidate
geneBTA1
Closest
SNP
(bp)
CH2rangeCH2
Frequency
rEHH
Hlog(p)CH2range
CH2
Frequency
rEHH
Hlog(p)CH2range
CH2
Frequency
rEHH
Hlog(p)
DGAT1 14 444963236653443936
H1057 011443936763332
H1042 0130007
Casein
Cluster6 88391612
8835009888452835
H1046 1481338832601288452835
H2068 0801098835009888452835
H1044 037022
8835009888452835
H2033 018030
8835009888452835
H3034 022045
GH 19 49652377 GHR 20 33908597
SST 1 813769568128358581376961
H1036 200189 8131845181376961
H1042 126053
8128358581376961
H2029 00630084 8131845181376961
H2042 006027
IGFH1 5 71169823
ABCG2 6 37374911 3731702038256889
H1044 031027
3731702038256889
H1040 029031
Leptin 4 95715500
LPR 3 855692038549710885594551
H1047 0910728549710885794693
H1068 0800638549710885794693
H1063 002005
8549710885594551
H2041 027025
PITH1 1 35756434
22
Signaturessharedbydairyandbybeefbreeds
Significantcorehaplotypeswerealignedacrossbreedsto identifythoseshared
among dairy or beef production types Since breeds can be considered
independentsetsofobservationssharedsignaturesaremorelikelytorepresent
real effects rather than false positives A total of 123 and 142 significant core
haplotypesweresharedbetweenatleasttwodairyorbeefbreedsrespectively
(File S1) Only 82 and 87 of those significant core haplotypes contained genes
considered positional candidates under positive selection In total 22 and
17 of the genomewas under selection for dairy and beef production types
respectivelyTherEHHaveragelengthswere216932and190994bpfordairy
andbeefrespectively
Genesetannotationandnetworkpathwayanalysis
Atotalof244and232annotatedgenesfellwithintheselectedregionsfordairy
andbeefproductiontypesrespectively(Table5)
23
Table5JStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreedsStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreeds
DairyCattle BeefCattle
BTA SelSig
n
Genes1 Sum
SelSignsize
(bp)
Mean
SelSign
size(bp)
SelSi
gn
Genes1 Sum
SelSignsize
(bp)
Mean
SelSign
size(bp)
1 7 11 1521608 138328 7 13 2263213 174093
2 5 9 2216685 246298 8 11 2658549 2416863 2 5 1619336 323867 7 21 3755847 1788504 7 21 5956755 283655 6 11 1761288 1601175 4 17 4506119 265066 5 11 1251127 1137396 2 4 1467738 366934 5 12 5149692 4291417 6 30 4842969 161432 3 28 11889191 4246148 0 0 0 0 4 12 3512215 2926859 4 7 1554863 222123 2 6 1106768 18446110 4 17 8839762 519986 2 3 573138 19104611 6 8 1841802 230225 1 1 100439 10043912 2 8 4244133 530517 1 1 536461 53646113 3 8 1933026 241628 2 8 985376 12317214 0 0 0 0 3 9 692397 7693315 5 9 1415405 157267 2 2 189094 9454716 3 6 1302506 217084 4 6 905719 15095317 3 4 471046 117762 2 9 1551010 17233418 0 0 0 0 3 23 2961718 12877019 3 23 5744143 249745 2 2 116938 5846920 1 2 148886 74443 2 4 627971 15699321 3 7 1930206 275744 1 5 1150760 23015222 3 5 620674 124135 3 10 2411751 24117523 0 0 0 0 1 1 68421 6842124 2 18 9166004 509222 2 4 803770 20094225 2 11 2016602 183327 1 3 329199 10973326 2 4 466134 116534 4 7 776080 11086927 1 1 631792 631792 2 3 1004717 33490628 0 0 0 0 1 1 93464 9346429 2 9 935209 103912 1 5 798300 159660TOT 82 244 65393403 216932 87 232 50024613 190994
1GenesidentifiedGenesintheregionssharedbyallthreedairybreeds(8genes)orallthreebeefbreeds(11genes)foramoredetailedanalysiswereselected(Fig1)
24
Figure1JGenomiclocationoftheselectionsignaturessharedamongstudiedbreedsa)GenesinensembletracksaredisplayedasaredboxcorehaplotypesandSNPsarecolouredinorange(MarchigianaMAR)inviolet(PiedmontesePIE)andpink(SimmentalSIM)b)Genesinensemble tracks are displayed as a red box core haplotypes and SNPs are coloured in blue(HolsteinHOL)ingreen(ItalianBrownBRW)andpink(SimmentalSIM)
All the genes identified were submitted to downstream pathwaynetwork
analyses(Table5Figure1FileS1)Interestinglygenesintheregionssharedby
allthreedairyorbeefbreedsweremanifestedinthemostlikelynetworksforthe
two production aptitudes The IPA analysis detected a total of 12 and 15
networksindairyandbeefbreedsrespectively
Indairybreeds network scores ranged from46 to2 (File S2) Eightnetworks
obtainedscoreshigherthan20andincludedmorethan14moleculeseachThey
wereassociatedtoi)geneexpressionRNAdamageandrepairproteinsynthesis
andposttranslationalmodificationii)cellassemblyorganizationfunctionand
maintenance iii) cell cycle proliferation and cancer iv) carbohydrate
metabolism v) gastrointestinal and neurological disease developmental and
endocrine system disorders A total of 23moleculeswere associatedwith the
highest ranked network (score = 46) In this the immunoglobulins mitogen
25
activatedproteinkinasesandNFkBcomplexesformacentralhublinkinggenes
involved in celltocell signaling and interaction hematological system
developmentandfunctionandimmunecelltraffickingfunctions(Figure2)
Figure2JThemostlikelyIPAnetworkindairycattlebreedsThefunctionsassociatedtothenetworkarecelltocellsignallingandinteractionhematologicalsystem development and function immune cell trafficking Nodes in red correspond to genesidentifiedincorehaplotypesoverlappinginallthethreebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Breast cancer antiNestrogen resistance gene 3 (BARC3) andpituitary glutaminyl
cyclase gene (QPCT) genes were common to all three dairy breeds and are
directlyconnectedwithmammaryglandmetabolisms(Ezuraetal2004GSun
et al 2012) The solute carrier family 2 member 5 (SLC2A5) gene facilitates
glucosefructose transport (Zhao et al 1998) and the zetaNchain (TCR)
26
associated protein kinase 70kDa (ZAP70) gene plays a critical role in Tcell
signaling (Bonnefont et al 2011) Calpain is another important complex that
togetherwithcalpainN3(CAPN3)mediatesepithelialcelldeathduringmammary
gland involution (Wilde et al 1997) Furthermore RAS guanyl nucleotide
releasingprotein(RASGRP1)activatestheErkMAPkinasecascaderegulatesT
cellsandBcellsdevelopmenthomeostasisanddifferentiationandisinvolvedin
the regulation of breast cancer cell (Madani et al 2004 Yasuda et al 2007
Wickramasingheetal2009)Genesinvolvedintheothernetworksarelistedin
FileS2
The scoreof thebeef cattlebreedsnetwork rangedbetween46and2 and the
numberofmoleculespresentineachnetworkrangedfrom23to1(FileS3)Asin
dairy breeds eight networks obtained a score higher than 20 and 13 ormore
molecules each These networkswere associated to i) carbohydrate lipid and
nucleic acidsmetabolism energyproduction and smallmoleculebiochemistry
ii) cardiovascular system development and function iii) cellular and tissue
developmentcellassemblyandorganizationfunctionandmaintenanceandcell
and organmorphology iv) cell cycle celltocell signaling and interaction v)
DNA replication recombination and repair and RNA posttranscriptional
modification vi) dermatological diseases and conditions infectious disease
organismal injury and abnormalities In the network with the highest score
(score=46)chondroitinsulfateproteoglycan4(CSPG4)andsnurportinN1(SNUPN)
genes were shared among all beef cattle breeds investigated CSPG4 gene is
relatedtomeattendernesswhileSNUPN isanimprintedgeneimportantinthe
embryodevelopmentandinvolvedinhumanmuscleatrophy(Narayananetal
2002) (Figure 3) All the genes included in this network were involved in
carbohydratemetabolismsmallmoleculesbiochemistryandcellcycleTheother
networkswere related to important signaling andmetabolic process essential
foranimalgrowth(FileS3)
27
Figure3JThemostlikelyIPAnetworkinbeefcattlebreedsThe functions associated to the network are carbohydrate metabolism small molecule
biochemistry and cell cycle Nodes in red correspond to genes identified in core haplotypes
overlapping in all the three breeds of each production type whereas those in green depict
overlappingcorehaplotypesinatleasttwoofthosebreeds
A total of 6 and 9 statistically significant Canonical Pathways (FDR le 005
log10(FDR) ge 13) were identified using IPA for dairy and beef breeds
respectively(Figure4)
28
Figure4JBarplotofstatisticallysignificantCanonicalPathwaysPvalues were corrected for multiple testing using BenjaminiHochberg method and arepresented in thegraphas log(pvalue)Thebar represents thepercentageof genes in a givenpathway that meet cutoff criteria within the total number of molecules that belong to thefunctiona)BarplotofstatisticallysignificantCanonicalPathways indairycattlebreedsb)BarplotofstatisticallysignificantCanonicalPathwaysinbeefcattlebreeds
Themost significant Canonical Pathway in dairywas for purinemetabolism (
log10(FDR) = 26) supporting the highly synthetic processes in the mammary
epithelium(Figure5)
29
Figure5JGenesdetectedunderrecentpositiveselectionindairycattleinvolvedinpurinemetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Inbeefproductiontheephrinreceptorsignal(log10(FDR)=27)wasthemost
significant Canonical Pathway (Figure 6) Among other functions ephrin is
knowntopromotemuscleprogenitorcellmigrationbeforemitoticactivation(Li
andJohnson2013)AlltheotherCanonicalPathwaysarereportedinFileS4
30
Figure6JGenesdetectedunderrecentpositiveselectioninbeefcattleinvolvedinEphrinmetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds
Discussion
In this work more than 4000 genotyped bulls of five dairy beef and dualpurposeItalianbreedswerescreenedsearchingforselectionsignaturesindairyand beef production types A strict data quality controlwas applied to reducepossiblesourcesofbiassuchastechnicalgenotypingproblemsandpopulationstructurethatcouldhavelargeimpactonrEHHresultsInfactsincetheimpactofpopulationstructureonrEHHvaluesisstillunexploredwereplicatedpartofouranalyseswithandwithoutcloserelativesinanattempttostudytheeffectofpopulationstructureonthisselectionsignaturemethodInprinciplepopulationstratificationshouldleadtoanoverrepresentationofcertainhaplotypessimplyduetothepresenceoftightpedigreelinks(egsirespassinghalfoftheirgeneticmaterialtotheirsons)ratherthanactualselectionprocessesForthisreasonfor
31
the full analyses all siresons couples were removed after haplotype phasing(retaining only the younger animal) and halfsib familieswere restricted to amaximum of five randomly chosen individuals in an attempt to reduce overrepresentation This assessment was restricted to four regions known to beunderselectioninItalianHolsteinasthecaseinclusterthepolledgeneclusterMC1R and KIT Italian Holstein was selected for two reasons i) it is a highlystructuredbreedaccordingtoourdata(abouthalfofindividualsremovedwereclose relatives) ii) allowed us to compare our results with a previous studyAlthoughtheanalysesconductedonbothredundantandnonredundantdatasetswereable to identify rEHHsignals in these regions thenonredundantdatasetproduced five significant rEHH signals rather than just one in the redundantdataset(Table2)Thisresulthighlightstheconfoundingeffectofthepresenceofcloserelativesinthedatasetandconsequentlytheimprovedabilitytodetectedsignificant selection signature while correcting for population structureInterestingly our results only partially overlap those found by Quanbari et al(2010) These authors detected significant signals at the casein cluster andsomatostatinSSTaswedid but also at ahighernumberof candidate regionsThese inconsistencies may be the effect of different sires included in theanalyses of the different dataset size and to the close relative trimmingprocedure we adopted to decrease the effect of population structure andconsequent bias (Table 4) However poor correlation across investigations isoftenobservedalsointhedeeplyinvestigatedhumanspeciesThisisduetotheuse of different within and between breed statistics that identify selectionsignatureshavingdifferent characteristics (ancientrecent segregatingfixatedunder directionalbalancing selection) to the likely high rate of falsepositivesnegativesresultsandalsotothedifferentconsiderationofpopulationstructureandbackgroundselection(Enardetal2014)The number of total and significant core haplotypes identified by the SweepsoftwarewasthehighestinItalianBrownandthelowestinPiedmontesebreedSinceEHHisheavilybasedonpopulationLDtheaverageLDleveloverdistancewascalculatedineachbreed(Figure7)
32
Figure7JMultiJbreedaveragelinkagedisequilibriumagainstphysicaldistance(inkb)Marchigiana Piedmontese and Italian Simmental breeds present a lower persistence of LD allargerdistancethanItalianHolsteinandItalianBrownbreedsThisanalysishighlightedageneralpositivecorrelationbetweenthe levelofLDover distance and the number of total and significant core haplotypes foundHoweverconsideringthatrEHHisarelativemeasure it ismorelikelythatthehighernumberofsignificantcorehaplotypesidentifiedindairybreedswasduemostlytoahigherselectivepressureindairycomparedtobeefbreedsSignificance tests used in detecting selection signatures should measure theprobability of a statistic to be anoutlier compared to its expecteddistributionunder a neutral model However no reliable neutral model has so far beendevelopedincattlebecauseofthecomplexityofthedemographichistoryofthisspecies (Kemper et al 2014) As a consequence empiric rather than modelbasedsignificancetestaregenerallyusedinthedetectionofselectionsignaturesAccordinglyweconsideredasoutliersvaluesas thosesimply falling in the5plusvarianttailoftherEHHdistributionWekeptthewithinbreedsignificancethresholdloosenoncorrectingitformultipletestingbutconsideredonlysignals
33
eitherexceedingthisthresholdbyatleasttwoordersofmagnitudeorsharedby
twoormoreindependentbreeds
Parallel comparison of results from independent analyses of different breeds
allowed the achievement of a double objective i) to identify putative regions
under(recent)selectioninbreedsbelongingtotwoproductiontypeswhichwas
the main objective of this study and ii) reducing false positives since multi
breed analyses served as internal control Since the rEHH method does not
considerphenotypicinformationasignificantsignalmightarisebecausei)the
core haplotype is actually under a selective pressure ii) the result is a false
positive ie causedby chance population structureor anyotherdriving force
However even considering an unrealistic scenario with no false positives a
proportionofthetotalamountofsignalswouldactuallybeselectionsignatures
due to a selectionpressurenot ondairy or beef traits This becausedairy and
beefbreedsinvestigatedareonlyafewandshareanumberoftraitsnotdirectly
related to dairybeef production Even considering this limitation to our
knowledge this is the first dairy and beef multibreed study applying this
strategytoreducefalsepositivesatthecostofpossiblelossofinformationdue
tohighfalsenegativeratesSignificantsignalsharedbydairyandbybeefbreeds
wereused fordownstreamnetworkanalysesonpositional candidategenes to
search forabiological justification towhatwasobtainedstudying thegenomic
information Only the top network for dairy and the top network for beef
productionarediscussedindetail
Dairybreeds
ThemostsignificantnetworkdetectedindairybreedsisassociatedwithCellTo
CellSignalingandInteractionHematologicalSystemDevelopmentandFunction
andImmuneCellTraffickingfunctions(Figure2)ImmunoglobulinandNFkBact
aspivotalgeneswithinthisnetworkInterestinglyBARC3andQPCTgeneswere
sharedamongallthethreedairybreedsBothgeneshavenotyetbeenstudiedin
cattlealthoughinhumanstudiesthesegenesarewellknowntobelinkedtothe
mammaryglandmetobolismandthecalciumregulationTheformerisinvolved
in integrinmediated cell adhesions and signaling which is required for
mammary gland development and functions (G Sun et al 2012) The latter is
34
associatedwithlowradialbodymineraldensity(BMD)inadultwomen(Ezuraet
al2004)AnotherimportantplayeronthisnetworkisSLC2A5genealsoknown
asGLUT5 SLC2A5 gene acts as fructose transporter in the intestine and has a
significantroleinenergybalanceofdairycows(Burantetal1992)Findingthis
gene in adairy cattle specificnetwork seems surprising since there shouldbe
little need for transporting glucose andor fructose in the ruminant intestinal
tractassimplecarbohydratesaredegradedintovolatilefattyacids(VFA)inthe
rumen (Iqbal et al 2009) However it is often the case that large amount of
starchbypasstherumenincowsfedwithdietsrichincerealgrains(Zhaoetal
1998)Thisbypassedstarchneedstobedigestedinthesmallintestinetoavoid
highlevelsofglucoseconcentrationinthelargeintestine
WedetectedcalpaincomplexandcalpainN3(CAPN3)genesunderpositiverecent
selection in the dairy group as reported also by Utsunomiya et al (2013)
AltoughCalpainisknowntobeinvolvedinpostmortemmeattendernizationitis
also related to dairy metabolism as the muscle breakdown promoted by the
calpain gene provides an energy source for milk production especially at the
beginningoflactation(Kuhlaetal2011)InadditionasreportedbyWildeetal
(1997) calpainNcalpastatin system is related to programmed cell death of
alveolar secretory epithelial cells during lactation Zap70 gene is a cytoplasmic
proteintyrosinekinaseItisrelatedtotheimmunesystemasacomponentofthe
Tcell receptor playing a central role in Tcell responses (Wang et al 2010)
Bonnefontet al (2011) found thatZap70 genewasup regulated (FoldChange
(FC)=82FDR=277E12)onmilksomaticcellsofsheepinfectedbySaureus
andSepidermidisshowinganassociationwithmastitisresistance
Purinemetabolismwasthemostsignificantcanonicalpathwayindairybreeds
In a study conducted in human mammary epthelium Maningat et al (2008)
conductedageneexpressionanalysisonbreastmilkfatglobulesidentifyingthe
PurinemetabolismasthemostsignificantpathwaySynthesisandbreakdownof
purine is essential in the metabolism of the tissues of many organisms
particularlyofthemammaryglandduringlactation
Another interesting canonical pathway was endothelin signaling (Figure 5)
endothelin functions as vasocontrictor and is secreted by endothelial cells
35
(Nussdorferetal1999)Acostaetal(1999)reportedthatincattleendothelinsareinvolvedinthefollicularproductionofprostaglandinsandtheregulationofsteroidogenesis in the mature follicle In a recent study Puglisi et al (2013)confirmed the implication of endothelin in particular EDNRA and endothelinconvertin enzyme 1 in reproductive disorder in bovine They classified theEDNRAaspotentialbiomarkerforfertilityincow
Beefbreeds
Inbeefcattlebreedsthefocaladhesionkinase(FAK)togetherwithintegrinandERK12genesplayacentralroleashubsinthemostsignificantnetwork(Figure3)Theyareinvolvedincellularadhesionandcellmotilityprocessesparticularlyduringskeletalmyogenesis(Nguyenetal2014)Moreovertheoverexpressionof FAK can determine a reduced apoptosis and an increase risk of metastaticcancers(MehlenandPuisieux2006)Thesehubsarenotdirectlyinvolvedinourstudybut sharemany interactors thatare located in region found tobeunderselction such as CSPG4 RB1CC1 MGAT3 CIRPB CSPG4 gene belongs tochondroitin sulfate proteoglycans (CSPGs) family CSPGs are proteoglycansconsistingofaproteincoreandachondroitinsulfatesidechainTheyareknownto be structural components of a variety of tissues includingmuscle andplaykey roles inneuraldevelopmentandglial scar formationTheyare involved incellular processes such as cell adhesion cell growth receptor binding cellmigrationandinteractionswithotherextracellularmatrixconstituents
Manystudiesreporttheroleofproteoglycansinmeattextureofseveralmusclesfromthebovinespecies(Nishimuraetal1996)Dubostetal(2013)highlightadirect involvement of proteoglycans with respect to cooked meat juicinessRB1CC1geneplaysacrucialroleinmusculardifferentiationanditsactivationisa essential for myogenic differentiation (Watanabe et al 2005 p 1)Monoacylglycerol acyltransferase (MGAT3) gene cause the synthesis ofdiacylglycerol (DAG)using2monoacylglyceroland fattyacyl coenzymeAThisenzymaticreactionis fundamental fortheabsorptionofdietaryfat inthesmallintestineSunetal(2012p3)reportedinastudyonfiveChinesecattlebreedsthatMGAT3gene isassociatedwithgrowthtraitsCIRPBgenemaybepartofacompensatorymechanism inmuscles undergoing atrophy It preservesmuscle
36
tissuemassduringcoldshockresponsesaginganddisuse(DupontVersteegden
et al 2008) SNUPN is an imprinted gene and is expressed monoallelically
dependingonitsparentaloriginSNUPNplayimportantrolesinembryosurvival
andpostnatal growth regulation (Smithet al 2012Wanget al 2012)Ephrin
receptorsignalingwasthetopcanonicalpathwayproducebyIPAandhasalsoan
interestingbiologicalinterpretationformeatproduction(Figure6)Indeedthis
pathaway is important for the muscle tissue growth and regeneration
participating in correct positioning and formation of the neural muscular
junction(LiandJohnson2013)
Conclusions
In this study we analysed the selection signatures at genomewide level in 5
Italian cattle breeds Then we used a multibreed approach to identify the
genomicregionssharedamongcattlebreedsselectedfordairyorbeefpurpose
Thisapproachincreasedthepowertopinpointregionsofthegenomethatplay
important roles in economically relevant traits in both dairy and beef cattle
Moreover network and pathways analyses were used to display a detailed
functional characterization of genes mapping in the regions detected under
recentpositiveselection
Particularly in dairy cattle genes under directional selection are related to
feeding adaptation (increasing level of starch in the diet) mammary gland
metabolismandresistencetomastitisThebiologicalfeaturesunderselectionin
beef cattle breeds are involved in animal growth meat texture and juiceness
These results have a logical biological explanation However the frequent
approach of interpreting selection signatures on the basis of the function of
geneslocatednearbythepeaksignalofthestatisticusedispresentlychallenged
Novelinformationinhumanssuggeststhatmostselectedvariationisnotlocated
within genes and coding regions but in regulatory sites identified by the
ENCODEproject(Enardetal2014)Thesemaycontroltheexpressionofentire
genomicregionsorofgeneslocatedatarelavantdistancefromtheselectedsite
makingbiologicalinterpretationmorecomplex
37
InanycasefurtherstudiesusingdenserSNPchipsorwholegenomesequencing
producinginformationnotsubjectedtoascertainmentbias(Qanbarietal2014)
may increase the resolution of our analysis and togetherwith new knowledge
arisingonthecontrolofgeneexpressionvalidateorcorrectourresults
Acknowledgements
ANAFI (Italian Association of Holstein Friesian Breeders) ANAPRI (Italian
Association Simmental Breeders) ANABORAPI (National Association of
PiedmonteseBreeders)ANABIC(AssociationofItalianBeefCattleBreeders)and
ANARB (Italian Brown Cattle Breeders Association) are acknowledged for
providing the biological samples and the necessary information for this work
The Authors are also grateful to SelMol project (MiPAAF Ministero delle
PoliticheAgricoleAlimentarieFortestali)andbytheProZooproject(Regione
LombardiandashDGAgricoltura Fondazione Cariplo Fondazione Banca Popolare di
Lodi)forfunding
Thefundershadnoroleinstudydesigndatacollectionandanalysisdecisionto
publishorpreparationofthemanuscript
References
Acosta TJ Berisha B Ozawa T Sato K Schams D Miyamoto A 1999
Evidence for a Local EndothelinAngiotensinAtrial Natriuretic Peptide
SysteminBovineMature Follicles In Vitro Effects on SteroidHormones
and Prostaglandin Secretion Biol Reprod 61 1419ndash1425
doi101095biolreprod6161419
AjmoneMarsan P Garcia JF Lenstra JA 2010 On the origin of cattle how
aurochs became cattle and colonized theworld Evol Anthropol Issues
NewsRev19148ndash157
38
BonnefontCToufeerMCaubetCFoulonETascaCAurelMRBergonier
D Boullier S RobertGranieacute C Foucras G 2011 Transcriptomic
analysisofmilksomaticcells inmastitisresistantandsusceptiblesheep
upon challenge with Staphylococcus epidermidis and Staphylococcus
aureusBMCGenomics12208
BurantCFTakedaJBrotLarocheEBellGIDavidsonNO1992Fructose
transporter inhumanspermatozoaandsmall intestine isGLUT5 JBiol
Chem26714523ndash14526
DubostAMicolDPicardBLethiasCAnduezaDBauchartDListratA
2013Structuralandbiochemicalcharacteristicsofbovineintramuscular
connective tissue and beef quality Meat Sci 95 555ndash561
doi101016jmeatsci201305040
DupontVersteegden EE Nagarajan R Beggs ML Bearden ED Simpson
PMPetersonCA2008IdentificationofcoldshockproteinRBM3asa
possible regulator of skeletal muscle size through expression profiling
AJP Regul Integr Comp Physiol 295 R1263ndashR1273
doi101152ajpregu904552008
Enard D Messer PW Petrov DA 2014 Genomewide signals of positive
selection in human evolution Genome Res gr164822113
doi101101gr164822113
EzuraYKajitaMIshidaRYoshidaSYoshidaHSuzukiTHosoiTInoue
SShirakiMOrimoHEmiM2004Associationofmultiplenucleotide
variationsinthepituitaryglutaminylcyclasegene(QPCT)withlowradial
BMDinadultwomenJBoneMinRes191296ndash301
Fan H Wu Y Qi X Zhang J Li J Gao X Zhang L Li J Gao H 2014
Genomewide detection of selective signatures in Simmental cattle J
ApplGenet1ndash9doi101007s1335301402006
HayesBJChamberlainAJMaceachernSSavinKMcPartlanHMacLeodI
SethuramanLGoddardME2009Agenomemapofdivergentartificial
selectionbetweenBostaurusdairycattleandBostaurusbeefcattleAnim
Genet40176ndash184doi101111j13652052200801815x
Hayes BJ Lien S Nilsen H Olsen HG Berg P Maceachern S Potter S
Meuwissen THE 2008 The origin of selection signatures on bovine
39
chromosome 6 Anim Genet 39 105ndash111 doi101111j1365
2052200701683x
IqbalSZebeliQMazzolariABertoniGDunnSMYangWZAmetajBN
2009 Feeding barley grain steeped in lactic acid modulates rumen
fermentation patterns and increases milk fat content in dairy cows J
DairySci926023ndash6032doi103168jds20092380
KemperKESaxtonSJBolormaaSHayesBJGoddardME2014Selection
for complex traits leaves littleornoclassic signaturesof selectionBMC
Genomics15246doi1011861471216415246
KimuraM1984Theneutraltheoryofmolecularevolution
KuhlaBNuumlrnbergGAlbrechtDGoumlrsSHammonHMMetgesCC2011InvolvementofSkeletalMuscleProteinGlycogenandFatMetabolismin
the Adaptation on Early Lactation of Dairy Cows J Proteome Res 10
4252ndash4262doi101021pr200425h
Lenstra JAGroeneveld LF EdingHKantanen JWilliams JL Taberlet P
NicolazziELSolknerJSimianerHCianiEGarciaJFBrufordMW
AjmoneMarsan P Weigend S 2012 Molecular tools and analytical
approaches for the characterization of farm animal genetic diversity
AnimGenet43483ndash502
Li J Johnson SE 2013 EphrinA5 promotes bovine muscle progenitor cell
migration before mitotic activation J Anim Sci 91 1086ndash1093
doi102527jas20125728
Ma YL Zhang Q Ding XD 2012 [Detecting selection signatures on X
chromosome in pig through high density SNPs] Yi Chuan Hered
ZhongguoYiChuanXueHuiBianJi341251ndash1260
Madani S Hichami A CharkaouiMalki M Khan NA 2004 Diacylglycerols
Containing Omega 3 and Omega 6 Fatty Acids Bind to RasGRP and
Modulate MAP Kinase Activation J Biol Chem 279 1176ndash1183
doi101074jbcM306252200
Maningat PD Sen P Rijnkels M Sunehag AL Hadsell DL Bray M
Haymond MW 2008 Gene expression in the human mammary
epitheliumduring lactation themilk fat globule transcriptome Physiol
Genomics3712ndash22doi101152physiolgenomics903412008
40
Mehlen P Puisieux A 2006Metastasis a question of life or deathNat Rev
Cancer6449ndash458doi101038nrc1886
NarayananUOspinaJKFreyMRHebertMDMateraAG2002SMNthe
Spinal Muscular Atrophy Protein Forms a PreImport Snrnp Complex
withSnurportin1andImportin HumMolGenet111785ndash1795
Nguyen N Yi JS Park H Lee JS Ko YG 2014 Mitsugumin 53 (MG53)
LigaseUbiquitinatesFocalAdhesionKinaseduringSkeletalMyogenesisJ
BiolChem2893209ndash3216doi101074jbcM113525154
NielsenRHellmannIHubiszMBustamanteCClarkAG2007Recentand
ongoing selection in the human genome Nat Rev Genet 8 857ndash868
doi101038nrg2187
NishimuraTHattoriATakahashiK1996Relationshipbetweendegradation
of proteoglycans andweakening of the intramuscular connective tissue
duringpostmortemageingofbeefMeatSci42251ndash260
Nussdorfer GG Rossi GPMalendowicz LKMazzocchi G 1999Autocrine
ParacrineEndothelinSysteminthePhysiologyandPathologyofSteroid
SecretingTissuesPharmacolRev51403ndash438
OrsquoBrienAMPUtsunomiyaYTMeacuteszaacuterosGBickhartDM LiuGE Tassell
CPV Sonstegard TS Silva MVD Garcia JF Soumllkner J 2014
Assessing signatures of selection through variation in linkage
disequilibrium between taurine and indicine cattle Genet Sel Evol 46
19doi101186129796864619
Pan D Zhang S Jiang J Jiang L Zhang Q Liu J 2013 GenomeWide
DetectionofSelectiveSignatureinChineseHolsteinPLoSONE8e60440
doi101371journalpone0060440
PuglisiRCambuliCCapoferriRGianninoLLukajADuchiRLazzariG
Galli C Feligini M Galli A Bongioni G 2013 Differential gene
expressionincumulusoocytecomplexescollectedbyovumpickupfrom
repeat breeder and normally fertile Holstein Friesian heifers Anim
ReprodSci14126ndash33doi101016janireprosci201307003
Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender D
MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKa
41
tool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795
Qanbari S Gianola D Hayes B Schenkel F Miller S Moore S Thaller GSimianer H 2011 Application of site and haplotypefrequency basedapproachesfordetectingselectionsignaturesincattleBMCGenomics12318doi1011861471216412318
Qanbari S Pausch H Jansen S Somel M Strom TM Fries R Nielsen RSimianer H 2014 Classic Selective Sweeps Revealed by MassiveSequencinginCattlePLoSGenet10doi101371journalpgen1004148
Qanbari S Pimentel ECG Tetens J Thaller G Lichtner P Sharifi ARSimianerH2010AgenomewidescanforsignaturesofrecentselectioninHolsteincattleAnimGenetdoi101111j13652052200902016x
Sabeti PC Reich DE Higgins JM Levine HZ Richter DJ Schaffner SFGabriel SB Platko JV Patterson NJ McDonald GJ Ackerman HCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash7
ScheetPStephensM2006Afastandflexiblestatisticalmodelforlargescalepopulation genotype data applications to inferring missing genotypesandhaplotypicphaseAmJHumGenet78629ndash44
Shahzad K Loor JJ 2012 Application of topdown and bottomup systemsapproaches in ruminantphysiologyandmetabolismCurrGenomics13379
SmithLSuzuki JGoffAFilionFTherrien JMurphyBKohanGhadrHLefebvreRBrisvilleABuczinskiSFecteauGPerecinFMeirellesF2012DevelopmentalandEpigeneticAnomaliesinClonedCattleReprodDomestAnim47107ndash114doi101111j14390531201202063x
Sun G Cheng SYS Chen M Lim CJ Pallen CJ 2012 Protein TyrosinePhosphatase Phosphotyrosyl789 Binds BCAR3 To Position Cas forActivationatIntegrinMediatedFocalAdhesionsMolCellBiol323776ndash3789doi101128MCB0021412
Sun J Zhang C Lan X Lei C ChenH 2012 Exploring polymorphisms andassociationsofthebovineMOGAT3genewithgrowthtraitsGenomeNatl
42
Res Counc Can Geacutenome Cons Natl Rech Can 55 56ndash62doi101139g11077
TaberletPCoissacEPansu J PompanonF 2011Conservationgeneticsofcattle sheep and goats C R Biol On the trail of domesticationsmigrationsandinvasionsinagricultureDominiqueJobGeorgesPelletierJeanClaudePernollet334247ndash254doi101016jcrvi201012007
Utsunomiya YT Perez OrsquoBrien AM Sonstegard TS Van Tassell CP doCarmo AS Meszaros G Solkner J Garcia JF 2013 Detecting lociunder recent positive selection in dairy and beef cattle by combiningdifferentgenomewidescanmethodsPLoSOne8e64280
Voight BF Kudaravalli S Wen X Pritchard JK 2006 A Map of RecentPositive Selection in the Human Genome PLoS Biol 4 e72doi101371journalpbio0040072
WangHKadlecekTAAuYeungBBGoodfellowHESHsuLYFreedmanTSWeissA2010ZAP70AnEssentialKinaseinTcellSignalingColdSpringHarbPerspectBiol2doi101101cshperspecta002279
WangMZhangXKangLJiangCJiangY2012MolecularcharacterizationofporcineNECD SNRPNandUBE3Agenesand imprinting status in theskeletal muscle of neonate pigs Mol Biol Rep 39 9415ndash9422doi101007s1103301218066
WatanabeRChanoTInoueHIsonoTKoiwaiOOkabeH2005Rb1cc1iscritical for myoblast differentiation through Rb1 regulation VirchowsArchIntJPathol447643ndash648doi101007s0042800411831
WickramasingheNSManavalanTTDoughertySMRiggsKALiYKlingeCM 2009 Estradiol downregulates miR21 expression and increasesmiR21targetgeneexpressioninMCF7breastcancercellsNucleicAcidsRes372584ndash2595doi101093nargkp117
WildeCJAddeyCVLiPFernigDG1997Programmedcelldeathinbovinemammary tissue during lactation and involution Exp Physiol 82 943ndash953
YasudaSStevensRLTeradaTTakedaMHashimotoTFukaeJHoritaTKataoka H Atsumi T Koike T 2007 Defective Expression of Ras
43
Guanyl NucleotideReleasing Protein 1 in a Subset of Patients with
SystemicLupusErythematosusJImmunol1794890ndash4900
ZhangCBaileyDKAwadTLiuGXingGCaoMValmeekamVRetiefJ
Matsuzaki H Taub M Seielstad M Kennedy GC 2006 A whole
genome longrange haplotype (WGLRH) test for detecting imprints of
positiveselectioninhumanpopulationsBioinformatics222122ndash8
ZhaoFQOkineEKCheesemanCIShiraziBeecheySPKennellyJJ1998
Glucose transporter gene expression in lactating bovine gastrointestinal
tractJAnimSci762921ndash2929
44
Supplementaryfiles
YoucanfindChapterʹsupplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter02)orscantheQRcode
SupplementaryfileS1ndashSignificantcorehaplotypesandgenessharedin
dairyandbeef
Significantcorehaplotypes(pvaluele005haplotypefrequencyge025)shared
indairyandbeefcattlebreedsandrelativegenesintersectingwiththemThefile
isdividedinto4tablesThefirst2sheets(sheet1and2)shownthesignificant
corehaplotypesfordairyandbeefThelast2sheets(sheet3and4)showngenes
intersectingwiththesignificantcorehaplotypesfordairyandbeef
SupplementaryfileS2ndashNetworksrankingindairybreeds
Networksrankingindairybreedswithrelativemoleculessymbolslistscore
valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop
functionforeachnetwork
SupplementaryfileS3ndashNetworksrankinginbeefbreeds
Networksrankinginbeefbreedswithrelativemoleculessymbolslistscore
valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop
functionforeachnetwork
45
SupplementaryfileS4ndashCanonicalpathwaysrankingindairyorbeefcattle
breeds
Canonicalpathwaysrankingindairy(sheet1)orbeef(sheet2)cattlebreedswithrelativemoleculessymbolslistratioandndashlog10ofthepvaluesforeachcanonicalpathway
46
CHAPTER3Searchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisin
threeItaliancattlebreedsSCapomaccio1MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItalyTheseauthorscontributedequallytothiswork
____________________SubmittedtoAnimalGenetics
Abstract
Genome wide association studies (GWAS) have been widely applied to
disentanglethegeneticbasisofcomplextraits IncattlebreedsclassicalGWAS
approach with medium density markers panels are far to be conclusive
especiallyforcomplextraitsThisisduetotheintrinsiclimitationsofGWASand
to the choices that are made to step from the associationrsquos signals to the
functionalunits
Here we applied a geneHbased association strategy to prioritize genotypeH
phenotype associations found with classical approaches in three Italian dairy
cattlebreedswithdifferentsamplesizes(ItalianHolsteinn=2058ItalianBrown
n=745ItalianSimmentaln=477)formilkproductionandqualitytraits
WhileclassicalsinglemarkerregressionfailedtorevealgenomeHwidesignificant
genotypeHphenotype association except for Italian Holstein the gene based
approach found specific genes in each breed that are associated with milk
physiologyandmammaryglanddevelopment
Since no established method is yet implemented to step from variation to
functional units our strategy may contribute to reveal new genes that play
significant roles in complex traits like thosehere investigated condensing low
signalsofassociationinageneHcentricfashionapproach
Background
Modern cattle breeds have nomore than 200 years of history (Taberlet et al
2008) a time spanwhere selection reproductive isolationand the consequent
geneticdriftshapedallelefrequenciesandlinkagedisequilibriumthroughoutthe
genome This is particularly true for industrial dairy breeds where the joint
application of reproductive technologies and modern genetic evaluation
methodshave imposedahigh selectionpressureona limitednumberof traits
(Taberletetal2008)involvedinmilkquantityandquality
Indeed the milk production industry has been historically ready to acquire
technicalimprovementsfromdifferentfieldsstatisticsbiologyandengineering
50
ForexampleBLUPbasedstrategiesandhighHdensitygenotyping throughSNPHchipsarenowroutinelyincludedinbullevaluationshavingsignificantimpactonselectionefficiencyandpopulationstructureThese techniques have dramatically improved milk yield and compositionwithout knowing the biological basis of the traits under selection To datedespite many efforts and many QTL mapped only a few genes have beeneffectivelyassociatedwithmilkyieldandcomposition(Grisartetal2002Blottetal2003CohenHZinderetal2005)In thiscontextGenomeWideAssociationStudies(GWAS) is themostcommonmethodfordissectingthebiologythatunderliescomplextraits(McCarthyetal2008)TheunderlyingideaofGWASissimplefindmarkerHtraitassociationsexploitingthe linkage disequilibrium (LD) that exists between the causative mutation ndashwhichweignorendashandoneormoreanalyzedmarkerswhichwewellknowInthiswayonecanpinpointgenomicregionscarryingcausalvariantsforanytrait
Although a number of GWAS have been conducted in cattle productive andmorphological traits (some examples (Jiang et al 2010 Cole et al 2011Meredith et al 2012)) and found several genomic regions associated to thesetraits none focused their attention directly on genes involved inmilk biologyMoreoveroneof themajordrawbacksofGWASisrelatedtothescarceresultsportability across populations due to a variety of confounding factorspopulationstructuredifferentialLD levelsbreedspecific selection targetsandeven SNPs ascertainment bias (Clark et al 2005) For instance significantassociationsonfatpercentagewereidentifiedinmanyregionsoftheBostaurusgenome in different breeds Except for genes of major effect there is littleagreement on which regions and H more importantly H genes are actuallyinvolvedinthecontrolofthetraitinvestigatedThisissomehownotsurprisingone favorable allele may segregate in one breed and be fixated in a differentbreed the same allele segregates in both breeds but allelesmay differ or thesegregates in both breeds but the genetic background masks the effect ofsegregation
51
AnotherlimitationinGWASisthestringentsignificancethresholdoftenapplied
tocorrectformultipletestingAsaresulta largeproportionofgeneswithlow
effectaredisregardedwithconsequentoverestimationof thehigheffectgenes
variance(Xu2003)Thisbiasisparticularlyworryingwheninvestigatingtraits
controlledbya largenumberofgeneswithsmalleffectand inhighLDasmilk
yield(GIANOLAETAL2013)
To overcome this kind of limitation different strategies have been invented
usingprior knowledge Interesting ldquocandidatepathwayrdquo approacheshavebeen
conducted(Ravenetal2013)testingforassociationbetweenaphenotypeand
genes inpathwaysthatare likelytobe involvedinthecontrolofthetraitThis
approach reduces thenumberofmarkers tobe tested increases the statistical
poweroftheexperimentrestrictingtheanalysistoexistingknowledge
Other methods based on statistical manipulations or particular experimental
designs have been developed to prioritize or validate genotypeHphenotype
associations For example geneHbased association strategies while restricting
GWA study only to genes and neighboring genomic regionsmay identify new
candidategenesdeservingpostHGWASanalysis(Akulaetal2011Cantoretal
2010 Liu et al 2010) Moreover it is true that classical GWAS downstream
bioinformatics analyses concentrate their efforts in findingmeaningful results
usinggenesasfinaltarget
The aim of ourworkwas to find new candidate genes and novel information
associated to complex traits as milk production (yiled) and quality (fat and
protein percentage) using a geneHcentric genome wide association in three
differentdairybreedsItalianHolsteinItalianBrownandItalianSimmental
Methods
BiologicalSamples
Bulls from the three most represented dairy breeds in Italy (Italian Holstein
ItalianBrownandItalianSimmental)wereselectedfromArtificialInsemination
(AI) bulls progeny tested in Italy until 2010 and forwhich biological samples
52
wereavailableItalianHolsteinbullswerechosenaccordingtofollowingcriteria
i) having the national Selection Index namely production functionality type
(PFT)withreliabilitygt75andii)havingthelowestpossiblerelationshipwith
theothersiresinthesetThiswasachievedbymaximizingthevariabilitywithin
the group On the contrary considering the low number of available Italian
BrownandItalianSimmentalbullstheywereallincludedintheanalysesAfter
thisprocedureatotalof2139ItalianHolstein775ItalianBrownand486Italian
Simmental samples (including quality control replicates)were genotypedwith
theBovineSNP50BeadChipv1(IlluminaSanDiegoCA)
Traitsconsideredintheanalysis
Italian Holstein (ANAFI) Italian Brown (ANARB) and Italian Simmental
(ANAPRI)breedersassociationsprovidedderegressedproofs(DRP)ordaughter
yielddeviations (DYD)hereafter referredasphenotype forall traitsanalyzed
Traitsanalyzedweremilkyieldfatpercentageandproteinpercentage
Genotypingandqualitycontrol
Quality controls (QC) were run independently for each breed using the same
pipelineandthresholds IndetailQCexcluded i)thereplicatesamplewiththe
highernumberofmissinggenotypesii)sampleswithunexpectedlyhigh(ge100)
mendelianerrors(forfatherHsonpairs)iii)sampleswithmorethan5missing
genotypes iv) SNPs with ge 25 missing data v) SNPs with minor allele
frequency(MAF)le5vi)SNPswithmendelianerrorsge25andvii)SNPsout
ofHardyHWeinbergequilibriumtestedbyFisherexacttest(ple0001Bonferroni
corrected)
Analysisofpopulationstructure
MultiDimensionScaling(MDS)wasperformedusingPLINK107(Purcelletal
2007)andresultsplottedwithRscriptsThisanalysisshowninsupplementary
material (FigureS1) exploredpopulationsubstructureandverified thegenetic
homogeneityofthedatasetpriortofurtheranalyses
53
PairwiseLDwascalculatedusingPLINK107(Purcelletal2007)LDdecayand
LD structure on specific regions were calculated with in house R scripts and
snpplotterRpackage(LunaandNicodemus2007)
GWAS
GenomeHwide association analysis was made with the GenABEL package in R
(Team 2014) using GRAMMARHCG (Genome wide Association using Mixed
Model and Regression H Genomic Control) approach (Amin et al 2007
Aulchenkoetal2007)
Basingonpriorknowledge(Minozzietal2013) in ItalianHolsteintheDGAT1
regionwas also treated as fixed effect inGenABEL analysis of the three traits
ResultsusingthismodelarereferredtoasldquoHolsteinDGATrdquo
Manhattan andquantileHquantileplots for each traitwereproduced to allowa
thoroughevaluationoftheGWASresultsAllcalculatedpHvaluesfromtheGWAS
analysiswere retained for the downstream prioritizationwith the geneHbased
association
Workingongenomecoordinates
Coordinatesof themarkerswere liftedover fromBTA40(Elsiketal2009)to
UMD31assembly(Ziminetal2009)usingSnpChimpv1(Nicolazzietal2014)
andgenecoordinateswereretrievedusingBioMartEnsembl
QTLcoordinatesweredownloadedfromtheAnimalQTLdb(Huetal2013) in
bedformat
Operations on genomic intervalswereperformedusing thebedtools suite ver
2162(QuinlanandHall2010)
GeneBasedAssociationanalysis
Toreducetheantagonismbetweensignificancethresholdandfalsepositiverate
of a classical GWA study (Gianola et al 2013) and to facilitate selection
signature comparison across the three breeds investigated we used a geneH
54
centricstatisticthatcollapsesinasinglevalueallthepHvaluesfromSNPsrelated
toagenecorrectingforLDinformation
Our approachderivesdirectly from the softwareVEGAS (VersatileGeneHBased
TestforGenomeHwideAssociationStudies)(Liuetal2010)adaptedtopursue
meaningfulresultsinanyspeciesbeyondhumans
Briefly the geneHwise test statistic is the sum of the n upper tail chiHsquared
valuesofaSNPsubset withonedegreeof freedom(wheren is thenumberof
SNPmappinginthegivengene)WithperfectlinkageequilibriumthegeneHwise
pHvaluewouldbetheoneHsidepHvalueofachiHsquaredistributionwithndegrees
of freedom Otherwise more likely the distribution of the pHvalues has to be
evaluated(withsimulations)andweighted(withLDvalues)(Liuetal2010)
The VEGAS speciesHfree procedure was implemented in a UNIX environment
usingBashandRlanguagesusingEnsemblannotationasinputfilesPLINKdata
typeandpHvaluesforeachSNPfromthepreviouslycitedGRAMMARprocedure
WemappedSNPsonBostaurusgenes(UMD31)andtheirboundaries(100000
bpupHanddownHstream)toaccountforregulatoryregionsOnlygeneswithat
least 2 SNPs mapped were retained Significant entries from the geneHbased
associationweremappedinthecattleQTLdatabasewithbedtools
ResultsandDiscussion
In this study a gene based association strategy was applied to identify new
genomicregionsboostingGWASanalysisresultsformilkproductionandquality
fromthreeItaliandairybreeds
Complextraitslikethosehereevaluatedareaffectedbyafewmajorgeneswith
large effects andmany otherswithmoderate to low effects the latter are not
easily identifiedby genomewide scans inmodern cattle breedsduemainly to
samplesizelimitationwhilesignalfromtheformersarelostingenomescansH
withsomeexceptionsHduerapidfixations
InthisscenariowithmediumdensitySNPpanelsliketheoneusedinthisstudy
associationbasedonsingleSNPregressionorotherldquoclassicalrdquomethodscouldbe
inconclusiveintermsofresults
55
AftertheQCprocess7452058and477samplesand3575938535and38574SNPs were retained for Italian Brown Italian Holstein and Italian Simmentalbulls respectively (Table S1) HolsteinDGAT analysis retained 38534markersoneSNPlessthantheotherHolsteinpanelDescriptive statistics on genotypes for each breed (histogram and barplot forheterozygosis and call rate per individual Figure S7 and multi dimensionalscalingFigureS1)revealedabsenceofconfoundingeffectsWiththesedatasetsatthenominalgenomeHwidepHvaluesignificancethresholdof 005 (ple125 x 10H6 after Bonferroni correction (Burton et al 2007Dudbridge and Gusnanto 2008)) significant associations were found only inItalianHolsteinbetweenmilkyieldandfatpercentageand26SNPsmappingintheproximalregionofBTA14nearbyDGAT1(Table1)
56
Table1PHvaluesoftheSNPsovercomingthegenomewidesignificancethresholdintheHolsteinbreedformilkyieldandfatpercentage(notaccountingforDGAT1effect)SNPsaresortedbypositionsintheUMD31assembly
SNPID BTA Position(Bp) POvalue(Fat) POValue(Milk)
Hapmap30383OBTCO005848 14 1489496 112EH12 361EH12
ARSOBFGLONGSO57820 14 1651311 247EH46 161EH16
ARSOBFGLONGSO34135 14 1675278 857EH17
ARSOBFGLONGSO94706 14 1696470 665EH16
ARSOBFGLONGSO4939 14 1801116 534EH49 106EH20
Hapmap52798Oss46526455 14 1923292 915EH21
UAOIFASAO6878 14 2002873 382EH24
ARSOBFGLONGSO107379 14 2054457 118EH33 664EH15
ARSOBFGLONGSO18365 14 2117455 250EH16
Hapmap30922OBTCO002021 14 2138926 439EH13
Hapmap25384OBTCO001997 14 2217163 390EH10 697EH09
Hapmap24715OBTCO001973 14 2239085 108EH09 369EH09
BTAO35941OnoOrs 14 2276443 402EH13
Hapmap30374OBTCO002159 14 2468020 528EH12
Hapmap30086OBTCO002066 14 2524432 798EH11
ARSOBFGLONGSO74378 14 3640788 161EH08
ARSOBFGLOBACO1511 14 3765019 375EH09
UAOIFASAO9288 14 3956956 811EH09
ARSOBFGLONGSO100480 14 4364952 133EH09
UAOIFASAO5306 14 4468478 458EH08
ThisislikelyduetotheeffectoftheK232ADGAT1mutationthatissegregatingin
thisbreed(Fontanesietal2014)NootherSNPwassignificantlyassociatedto
anyothertraitinthe3breedsinvestigated
SomeSNPswereunderasuggestivesignificancethresholdofple5x10H5Details
on these markers are in supplementary material (Table S3) but they are not
discussedherePlotsdescribingatglancetheassociationresultsrevealingalow
signaltonoiseratioareavailableintheonlinesupplementarymaterial(Figure
S2S3S4andS5respectively for ItalianBrown ItalianHolsteinHolsteinDGAT
andItalianSimmental)
57
Asperthegenecentricanalysiswemappedatotalof192371989919894and
19844 genes on 24616 available on Ensembl (release version 73) in Italian
BrownItalianHolsteinHolsteinDGATandItalianSimmentalrespectivelyGenes
with a geneHwisepHvalue at least 10H5were considered significantly associated
withthestudiedtraitThesignificanceappearedtobeadequatetothehighrate
ofredundancyof theannotationfeaturesandto figureoutthetopassociations
with a reasonable level of confidence Results on these associations per breed
andpertraitaresummarizedinTable2Table3andinsupplementarymaterials
(TableS2andTableS4)ThemajorityofsignificantgenesfoundwiththegeneH
centricanalysismapped inpreviouslycharacterizedQTL thatareassociated to
milkproductionorqualityThesearesummarizedintheTableS5
Table2Numberofsignificantlyassociatedgenesforeachbreedineachtraitinthegenebasedassociationtest
Fat Milkyield Protein
Brown 3 1
Holstein 92 80 1
HolsteinDGAT 1 4 1
Simmental 1 13
58
Tableamp3FeaturessignificantlyassociatedperbreedtraitHolsteinanalysisnotaccountingforDGAT1isshow
ninTableS3
Breedamp
Traitamp
EnsemblampIDamp
GeneWiseampPValueamp
SNPsInGeneamp
BestSNPamp
BestSNP_PValueamp
BTAamp
Startamp(bp)amp
Endamp(bp)amp
GeneampNam
eamp
BROWNamp
FAT
Nosignificantassociationfound
MILK
ENSBTAG00000019237
500EG05
4Hapmap53770Gss46526325
204EG04
23
47333403
47348212
TXNDC5
ENSBTAG00000043918
500EG05
4Hapmap53770Gss46526325
204EG04
23
47337650
47337787
5S_rRNA
ENSBTAG00000046839
500EG05
4Hapmap53770Gss46526325
204EG04
23
47328131
47352621
UncharacterizedProtein
PROTEIN
ENSBTAG00000001260
700EG05
3ARSGBFGLGNGSG116222
441EG03
18
52120387
52124418
PINLYP
amp
HOLSTEINDGATamp
FAT
ENSBTAG00000000369
100EG05
3Hapmap53294Grs29016908
333EG05
594673755
94749093
EPS8
MILK
ENSBTAG00000026896
130EG05
5ARSGBFGLGNGSG104949
385EG04
23
51549721
51553563
FOXF2
ENSBTAG00000005260
400EG05
5Hapmap33288GBTCG033751
120EG03
638120578
38127577
SPP1
ENSBTAG00000006579
500EG05
4ARSGBFGLGBACG2207
632EG05
15
54418559
54459500
P4HA3
ENSBTAG00000007013
700EG05
6UAGIFASAG7665
396EG05
19
5221507
5283308
TOM1L1
PROTEIN
ENSBTAG00000006638
600EG05
3ARSGBFGLGNGSG104774
176EG04
18
56523439
56530499
BCL2L12
amp
SIMMENTHALamp
FAT
ENSBTAG00000037649
900EG05
6Hapmap30001GBTAG142716
529EG05
4120427915
120494453
VIPR2
MILK
ENSBTAG00000017538
100EG05
6ARSGBFGLGBACG13093
587EG06
12
85630428
85630912
Pseudogene
ENSBTAG00000000856
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1766767
1769754
FBXL6
ENSBTAG00000000857
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1763994
1766621
GPR172B
ENSBTAG00000026356
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1795351
1804562
DGAT1
ENSBTAG00000035158
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1770660
1772329
TMEM
249
ENSBTAG00000046208
200EG05
2ARSGBFGLGNGSG4939
413EG06
14
1782901
1788087
SCRT1
ENSBTAG00000009811
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1825982
1844587
UncharacterizedProtein
ENSBTAG00000009816
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1838135
1840081
UncharacterizedProtein
ENSBTAG00000014458
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1844664
1894424
MROH1
ENSBTAG00000020751
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1806081
1825793
HSF1
ENSBTAG00000044406
400EG05
2ARSGBFGLGNGSG4939
413EG06
14
1883849
1883971
SCARNA15
ENSBTAG00000046889
400EG05
5ARSGBFGLGBACG13093
587EG06
12
85610279
85610762
UncharacterizedProtein
ENSBTAG00000020117
800EG05
3BTAG78494GnoGrs
188EG04
721189879
21196263
ZBTB7A
PROTEIN
Nosignificantassociationfound
amp
59
1 ItalianHolstein
The$ large$ number$ of$ significant$ signals$ was$ observed$ in$ the$ Italian$ Holstein$
especially$ for$ milk$ yield$ and$ fat$ percentage$ As$ shown$ in$ Table$ 2$ the$ strong$
effect$ of$ DGAT1$ in$ Italian$ Holsteins$ observed$ with$ individual$ SNPs$ was$
confirmed$by$the$geneBcentric$approach$In$this$region$almost$the$totality$of$the$
geneBwise$pBvalues$under$the$chosen$threshold$appeared$to$be$influenced$by$this$
major$ gene$ and$ by$ the$ high$ level$ of$ linkage$ disequilibrium$ that$ characterizes$
cosmopolitan$bovine$breeds$(see$Figure$S6)$Within$this$pool$of$genes$it$is$worth$
to$mention$PLEC$ (plectin)$ a$ cell$ surface$ receptor$ gene$ in$ the$RNASE5$ pathway$
(Angiogenin$ encoded$ by$ angiogenin$ ribonuclease$ RNase$ A$ family$ 5)$ Proteins$
encoded$ by$ these$ genes$ that$ are$ present$ in$ milk$ are$ involved$ in$ protein$
synthesis$ and$ act$ as$ growth$ factors$ in$ vitro$Moreover$RNASE5$ seems$ to$ have$
other$ functions$ such$ as$ regulation$ of$ stress$ response$ and$ apoptosis$ and$ even$
angiogenesis$ through$ which$ it$ may$ also$ contribute$ to$ the$ regulate$ milk$
production$(Raven$et$al$2013)$
The$ exclusion$ of$DGAT1$ effect$ (HolsteinDGAT$ results)$ highlighted$ that$ all$ the$
significant$ effects$ on$ fat$ percentage$ found$ for$ genes$ located$ within$
approximately$25$Mb$up$or$downstream$from$the$excluded$SNP$(ARS5BFGL5NGS5
4939)$were$ redundant$ signals$ of$DGAT1$ (Table3$ and$Table$ S2$ and$ S3)$ This$ is$
also$observed$for$milk$yield$due$to$the$genetic$correlations$between$fat$and$milk$$
Also$ PLEC$ was$ not$ significant$ in$ our$ analyses$ when$ taking$ into$ account$ the$
DGAT1$effect$contradicting$previous$findings$in$Holstein$that$report$association$
of$PLEC$gene$with$production$traits$either$accounting$or$not$for$the$effect$of$the$
diglyceride$ acyltransferase$ (Raven$ et$ al$ 2013)$ A$ possible$ explanation$ for$ the$
different$results$between$these$studies$is$a$difference$between$genetic$structures$
of$the$populations$analyzed$To$reduce$false$positive$signals$we$only$evaluated$
the$genes$based$on$HolsteinDGAT$results$for$the$Italian$Holstein$$
11 Milkyield
We$found$genes$associated$with$this$trait$in$Holstein$in$particular$SPP1$P4HA3=
FOXF2$and$TOM1L1$genes$
Among$ these$ the$ osteopontin$ SPP1$ (secreted$ phosphoprotein$ 1)$ is$ directly$
related$to$milk$production$(de$Mello$et$al$2012)$ It$ is$ located$ in$BTA6$nearby$
60
$
ABCG2$ and$ there$ is$ evidence$ that$ polymorphisms$ in$ this$ region$ are$ strongly$
correlated$with$high$production$ levels$of$milk$ in$various$breeds$(CohenBZinder$
et$ al$ 2005)$Osteopontin$SPP1$ is$ a$ secreted$glycophosphoprotein$ that$plays$ a$
crucial$ role$ in$ mammary$ gland$ development$ having$ an$ essential$ role$ also$ in$
mammary$gland$differentiation$and$branching$of$the$mammary$epithelial$ductal$
system$ This$ gene$has$ also$ been$ implicated$ in$many$ types$ of$ cancer$ including$
breast$cancer$ for$ the$potentiality$of$proliferation$and$alveologenesis$ (Hubbard$
et$ al$ 2013)$ It$ is$worth$mentioning$ that$ the$most$ significant$ SNPs$within$ this$
gene$was$Hapmap332885BTC5033751=with=a$pBvalue$of$00012$ largely$over$ the$
genomeBwide$significance$ threshold$This$signal$would$be$ ignored$ in$ the$single$
SNP$approach$
Another$gene$revealed$by$the$geneBbased$association$is$P4HA3$encoding$for$the$
prolyl$4Bhydroxylase$α$polypeptide$III$involved$in$the$correct$folding$of$nascent$
procollagen$ chains$ It$ is$ known$ that$ stromalBepithelial$ interactions$ are$ critical$
for$mammary$gland$development$and$defective$deposition$or$reorganization$of$
extracellular$ matrix$ can$ lead$ to$ incorrect$ mammary$ epithelial$ morphogenesis$
(Gillette$ et$ al$ 2013)$ highlighting$ the$ role$ of$ this$ gene$ in$ the$ healthy$ breast$
tissue=
FOXF2=gene$ (Forkhead$Box$F2)$ is$ a$member$of$ the$ forkhead$box$ transcription$
factors$ known$ to$ be$ involved$ in$ tissue$ development$ as$ part$ of$ the$ ldquoHedgehog$
signaling$pathwayrdquo$This$key$cascade$is$essential$for$the$correct$segmentation$of$
tissues$during$embryonic$development$but$has$been$recently$demonstrated$that$
is$active$in$adult$stem$cells$proliferation$in$mammary$tissue$(Liu$et$al$2006)$In$
particular$ FOXF2$ plays$ a$ key$ role$ in$ extracellular$ matrix$ synthesis$ and$
epithelialBmesenchymal$ interactions$ and$ could$ therefore$ affect$ the$ proper$
development$of$the$mammary$stroma$In$addition$low$levels$of$transcription$of$
this$gene$are$correlated$ to$early$onset$metastasis$ in$breast$cancer$ (Kong$et$al$
2013)$ underlining$ a$ specific$ role$ in$ the$ mammary$ gland$
Lastly$ TOM1L1$ (Target$ Of$ Myb1$ ChickenBLike$ 1)$ is$ another$ gene$ potentially$
involved$ with$ the$ milk$ physiology$ as$ is$ known$ its$ implication$ with$ the$ early$
endosome$ sorting$ coupled$ with$ EGFR= (epidermal$ growth$ factor$ receptor)$
another$ important$player$ in$ lactation$(Fowler$et$al$1995)$TOM1L1$belongs$to$
61
$
the$ VHS$ domain$ containing$ proteins$ that$ are$ involved$ in$ the$ postBGolgi$
trafficking$and$signaling$(Wang$et$al$2010)$
12 Fatpercentage
In$ Italian$ Holstein$ breed$ we$ found$ a$ significant$ association$ with$ or$ without$
DGAT1$ effect$ correction$ to$ the$EPS8$ gene$ located$on$BTA5$at$around$95$Mbp$
This$gene$encodes$for$a$substrate$of$the$already$cited$EGFR$and$is$involved$in$the$
fatty$acid$metabolism$(Fazioli$et$al$1993)$A$recent$study$on$Holstein$Friesian$
(Wang$et$al$2012)$found$strong$association$with$a$variant$located$in$the$second$
intron$of$the$epidermal$growth$factor$receptor$pathway$substrate$enforcing$the$
hypothesis$of$a$true$involvement$of$this$gene$on$fat$percentage$
13 Proteinpercentage
The$BCL2L12$gene$(BclB2Blike$protein$12)$associated$with$protein$percentage$in$
Italian$Holstein$breed$represents$another$interesting$result$The$product$of$this$
gene= is$ an$ important$ oncoprotein$ in$ two$ fundamental$ cell$ cycle$ pathways$
namely$ p53$ and$ cytoplasmic$ caspase$ signaling$ BCL2L12= resides$ in$ both$
cytoplasm$and$nucleus$inhibiting$caspases$3$and$7$and$forming$a$complex$with$
p53$thereby$ inhibiting$p53Bdirected$transcriptomic$changes$upon$DNA$damage$
(Stegh$ and$DePinho$ 2011)$ The$members$ of$ BCL2$ family$ are$ considered$ antiB
apoptotic$genes$and$it$is$obvious$that$dysfunctional$programmed$cell$death$plays$
an$ important$ role$ in$ the$ pathogenesis$ and$ in$ the$ progression$ of$ cancer$Many$
members$of$this$family$were$found$to$be$modulated$in$various$malignancies$and$
BCL2L12= in$ particular$ was$ expressed$ in$ mammary$ gland$ and$ proposed$ as$
prognostic$ cancer$ biomarker$ (Thomadaki$ et$ al$ 2007)$ In$ dairy$ cows$ the$
members$of$BCL2$family$are$involved$in$the$key$events$ in$the$mammary$tissue$
development$ and$ remodeling$ during$ lactation$ and$ involution$ (Stefanon$ et$ al$
2002$p$200)$Members$of$BCL2$family$regulate$the$turnover$of$cell$population$
in$the$mammary$gland$during$lactation$and$its$overexpression$was$found$to$be$
linearly$related$to$milk$production$in$Holstein$cow$(Colitti$et$al$2004)$
$
62
$
2 ItalianSimmental
In$ the$ Italian$ Simmental$ breed$ we$ found$ one$ association$ on$ BTA4$ for$ fat$
percentage$(VIPR2)$and$two$associations$ for$milk$yield$one$ in$BTA7$(ZBTB7A)$
and$ the$ other$ in$ BTA14$ related$ to$ the$ DGAT1$ No$ association$ was$ found$ for$
protein$percentage$
21 Fatpercentage
Among$the$genes$found$significantly$associated$with$production$traits$in$Italian$
Simmental$VIPR2=a$ gene$ encoding$ a$ receptor$ for$ a$ small$ vasoactive$ intestinal$
neuropeptide$ was$ the$ single$ signal$ in$ BTA4$ for$ fat$ percentage$ Vasoactive$
intestinal$ peptide$ (VIP)$ is$ synthesized$ in$ the$pituitary$ gland$ and$ among$other$
functions$ stimulates$ prolactin$ secretion$ and$ is$ involved$ in$ proliferation$
survival$and$differentiation$in$human$breast$cancer$cells$(Valdehita$et$al$2010)$$
An$interesting$novel$finding$is$the$effect$of$immunosuppression$(through$direct$
effect$of$IL56)$on$prolactin$cells$and$its$ability$to$regulate$prolactin$by$acting$on$
pituitary$ VIP$ through$ VIP$ receptors$ (Blanco$ et$ al$ 2013)$ These$ evidences$
suggest$ that$VIPR2$ could$ be$ an$ important$ candidate$ for$ further$ investigations$
through$post$GWAS$approaches$such$as$reBsequencing$or$fine$mapping$
22 Milkyield
Surprisingly$the$DGAT1$gene$was$significantly$associated$with$milk$yield$but$not$
with$fat$percentage$in$Italian$Simmental$breed$Indeed$the$most$significant$SNP$
for$the$milk$yield$trait$in$this$breed$was$the$ARS5BFGL5NGS54939$with$a$pBvalue$
of$ 413EB06$ The$ importance$ of$ this$ gene$ in$ the$ lactation$ is$ largely$ known$
(Grisart$et$al$2002)$$
To$ investigate$DGAT1$ signal$ in$ Simmental$ we$ analyzed$ the$ fine$ LD$ structure$
genomeBwide$ and$ in$ BTA14$ proximal$ region$ in$ all$ the$ three$ breeds$ revealing$
different$patterns$
Observed$LD$decay$on$ the$ three$breeds$genomes$revealed$similar$behavior$ for$
Italian$Holstein$and$Italian$Brown$with$r2$gt$010$$until$ around$ 250$ Kbp$ while$
Italian$Simmental$showed$lower$LD$Distances$higher$that$400$Kbp$revealed$half$
level$of$LD$in$Simmental$(Figure$S6)$
63
$
In$DGAT1$region$LD$behavior$is$very$similar$between$Italian$Holstein$and$Italian$
Brown$ (Figure$ S8)$ while$ association$ values$ are$ really$ different$ because$ ARS5
BFGL5NGS54939=is$fixed$in$Italian$Brown$and$purged$by$QC$process$Interestingly$
association$ patterns$ become$ similar$ when$ DGAT1$ effect$ is$ removed$ from$ the$
Holstein$ panel$ (HolsteinDGAT$ Figure$ S8)$ In$ Simmental$ we$ found$ a$ different$
situation$ very$ low$ LD$ level$ and$ significant$ association$ levels$ with$ ARS5BFGL5
NGS54939$with$ the$milk$ trait$This$ is$probably$due$ to$ the$ low$ frequency$of$ the$
p232K$ allele$ in$ Italian$ Simmental$ (Scotti$ et$ al$ 2010)$ and$ to$ the$ different$
selection$strategies$applied$in$these$breeds$
The$ relatively$ high$ pBvalue$ of$ this$ SNP$ is$ responsible$ for$ the$ statistical$
significance$of$many$of$the$13$genes$found$significant$in$this$breed$for$this$trait$
as$ shown$ in$ Table$ 3$ therefore$ reducing$ the$ gene$ set$ to$ three$ locations$ two$
overlapping$features$indicating$a$pseudogene$in$BTA12$and$a$zinc$finger$protein$
ZBTB7A$in$BTA7$DGAT1$role$and$function$will$be$not$discussed$here$as$already$
done$elsewhere$(Grisart$et$al$2002)$
ZBTB7A=is$part$of$zinc$finger$and$BTB$proteins$(Broad$complex$Tramtrack$and$
Bricabrac)$ transcription$ factors$ with$ emerging$ importance$ for$ their$
involvement$ in$ oncogenesis$ and$ cell$ differentiation$by$ repressing$key$ genes$of$
cell$cycle$regulation$as$retinoblastoma$(Rb)$and=p53=(ldquoJeon$et$al$B$2008$B$ProtoB
oncogene$ FBIB1$ (PokemonZBTB7A)$ Represses$ Trpdfrdquo$ nd)= ZBTB7A$ over$
expression$ is$ shown$ in$primary$ and$ recurrent$ breast$ cancer$ tissues$ leading$ to$
disease$progression$and$poor$survival$(Zu$et$al$2011)$Link$between$this$gene$
polymorphism$and$milk$production$might$be$in$the$proper$cellular$development$
of$ the$ mammary$ gland$ This$ transcription$ factor$ is$ also$ known$ as$ a$ master$
regulator$ of$ B$ and$ TBcell$ lineage$ (Lee$ and$ Maeda$ 2012)$ enforcing$ the$ link$
between$ milk$ production$ and$ immune$ protection$ as$ already$ outlined$ by$ of$
Lemay$ and$ colleagues$ with$ proteomic$ studies$ where$ the$ ldquoactivation$ of$ the$
innate$ immune$systemrdquo$resulted$as$one$of$ the$most$represented$gene$ontology$
categories$(Lemay$et$al$2009)$
$
64
$
3 ItalianBrown
In$ the$ Italian$Brown$breed$we$ found$ one$ association$ on$BTA23$ (TXNDC5)$ for$
milk$yield$and$one$in$BTA18$for$protein$percentage$(PINLYP)$The$three$features$
listed$in$Table$3$for$milk$yield$in$Italian$Brown$were$collapsed$due$to$overlap$$
31 Milkyield
Thioredoxin$ domainBcontaining$ protein$ 5$ (TXNDC5)$ belongs$ to$ thioredoxin$
family$small$ubiquitous$proteins$containing$disulphide$domain$affecting$protein$
folding$with$chaperone$activity$preventing$protein$misBfolding$within$the$lumen$
of$ the$endoplasmic$reticulum$This$multifunctional$organelle$have$a$key$role$ in$
several$ processes$ including$ lipid$ catabolism$ and$ protein$ synthesis$ (RamiacuterezB
Torres$ et$ al$ 2012)$ Because$ of$ its$ disulphide$ bonds$ thioredoxin$ facilitate$ the$
reduction$of$other$proteins$by$cysteine$thiol$disulfide$exchange$This$mechanism$
is$essential$for$detossification$
During$ lactation$ indeed$ an$ enhanced$ detoxification$ level$ is$ functional$ to$
preserve$ milk$ quality$ against$ increased$ metabolic$ activities$ Higher$ metabolic$
levels$ lead$ to$ free$ radicals$ production$ making$ necessary$ the$ maintenance$ of$
redox$ homeostasis$ that$ include$ among$ others$ the$ reduction$ and$ oxidation$ of$
peroxiredoxin$and$thioredoxin$(Lu$and$Holmgren$2014)$
32 Proteinpercentage
For$ protein$ percentage$ in$ Italian$ Brown$ there$ is$ only$ one$ associated$ gene$
PINLYP=is$encoding$a$phospholipase$A2$inhibitor$but$this$protein$product$is$not$
yet$well$studied$ $Many$types$of$phospholipase$A2$were$characterized$and$ it$ is$
known$ that$ they$ are$ mediator$ of$ inflammation$ atherosclerosis$ and$ cancer$ in$
mammals$and$that$plays$an$important$role$in$the$innate$immune$response$(Qiu$
and$Lai$2013)$It$may$be$reasonable$to$imagine$a$role$for$PINLYP$in$the$lactation$
in$light$of$previously$discussed$results$that$link$milk$to$innate$immune$response$
(Lemay$et$al$2009)$
$
Although$ limited$ by$ the$ size$ of$ the$ study$ and$ the$ number$ of$ the$ included$
markers$these$results$represent$both$a$confirmation$and$a$step$forward$towards$
the$comprehension$of$the$biological$bases$of$milk$yield$and$quality$$
65
$
New$ genes$ potentially$ involved$ with$ production$ traits$ in$ dairy$ cows$ were$
tracked$when$GWAS$signals$resulted$too$weak$to$be$considered$in$a$classical$SNP$
based$mapping$with$the$ldquoclosest$gene$to$peakrdquo$or$the$windowBbased$strategy$
It$ is$ worth$ mentioning$ that$ none$ of$ the$ analyzed$ SNPs$ (not$ considering$ the$
Italian$Holstein$with$ARS5BFGL5NGS54939= included$ in$ the$ analysis)$were$ above$
the$ genomeBwide$ significance$ threshold$ in$ single$ SNP$ analysis$ meaning$ that$
important$information$is$often$lost$in$such$studies$$
In$addition$ to$ the$newly$ identified$ loci$we$observed$a$confirmation$of$existing$
associations$ with$ genes$ related$ to$ milk$ production$ not$ only$ dissecting$ gene$
specific$ functions$ but$ from$ mapping$ these$ genes$ in$ previously$ characterized$
regions$(Table$S5)$This$ is$not$trivial$when$using$new$data$analysis$techniques$
especially$ in$ high$ noise$ to$ signal$ ratio$ datasets$ like$ those$ here$ presented$We$
believe$ that$ this$ approach$will$ benefit$ of$data$ from$denser$marker$panels$ (eg$
BovineHD)$ to$ exploit$ local$ linkage$ disequilibrium$ and$ increase$ the$ number$ of$
mapped$genes$with$more$than$two$SNPs$
Results$ obtained$ with$ the$ gene$ centric$ approach$ although$ having$ intrinsic$
limitations$are$not$ influenced$by$a$number$of$subjective$choices$often$made$in$
GWAS$ For$ instance$when$ ldquowindow$mappingrdquo$ is$ used$ to$decrease$ the$noise$ to$
signal$ ratio$ eg$ due$ to$ nearby$ markers$ having$ different$ allele$ frequencies$
windows$size$ is$ to$be$decided$This$can$be$based$on$equal$number$of$markers$
equal$number$of$base$pairs$or$equal$map$distance$in$cM$per$window$The$three$
choices$ do$ not$ coincide$ because$ markers$ are$ not$ equally$ spread$ along$ the$
chromosomes$ and$ recombination$ differs$ in$ different$ genomic$ regions$ In$
addition$ windows$ can$ be$ chosen$ nonBoverlapping$ or$ with$ different$ degree$ of$
overlap$Finally$when$stepping$ from$significant$SNPs$or$windows$ to$candidate$
genes$the$size$of$the$flanking$region$from$which$picking$up$candidate$genes$is$to$
be$decided$as$well$Sometimes$the$gene$closer$to$the$signal$peak$is$selected$ in$
other$cases$all$genes$within$certain$boundaries$are$included$in$further$analyses$
All$ these$ choices$ may$ generate$ misleading$ results$ For$ example$ not$ truly$
associated$genes$may$be$retained$for$further$analyses$disregarding$the$correct$
ones$ In$addition$ a$ large$number$of$ false$positives$can$be$ included$ to$be$ later$
interpreted$with$ the$help$of$network$and$pathway$analysis$As$ a$ consequence$
these$decisions$have$a$direct$impact$in$terms$of$production$and$interpretation$of$
66
$
results$leading$to$conclusions$heavily$based$on$the$enrichment$analysis$and$preB
existing$knowledge$$
With$ the$ gene$ based$ association$ we$ also$ noticed$ some$ advantages$ in$ the$
downstream$data$mining$First$genes$are$functional$units$and$the$golden$targets$
of$ the$ ldquoclassical$ postBGWASrdquo$ bioinformatics$ pipelines$ Hence$ the$ gene$ based$
association$leads$to$genes$with$an$embedded$statistically$robust$framework$This$
is$ a$ ldquoplusrdquo$ in$ studying$ complex$ traits$ because$ one$ can$ concentrate$ on$ specific$
signals$ rather$ than$ cherry$ picking$meaningful$ features$ from$ a$ list$ constructed$
with$the$previously$cited$approaches$$
Moreover$it$is$easier$to$track$back$gene$clusters$to$major$gene$effects$B$like$for$
example$DGAT1$in$a$gene$rich$genome$chunk$many$genes$may$result$significant$
due$to$the$same$SNP$(Table$S2$and$Table$3)$With$the$VEGAS$method$finding$the$
ldquotruerdquo$association$is$facilitated$by$following$the$ldquobest$SNPrdquo$in$the$region$
In$addition$we$found$phenotype$coherent$signals$in$such$situation$where$only$a$
few$SNPs$B$ if$none$ndash$are$above$an$acceptable$statistical$genomeBwide$threshold$
using$a$single$SNP$regression$method$
The$gene$centric$approach$ is$a$ first$and$slight$ improvement$ in$ livestock$GWAS$
analyses$since$we$still$have$a$partial$genome$annotation$ In$addition$ it$ is$now$
known$that$ the$definition$of$gene$ is$ to$be$revised$many$genes$do$not$code$ for$
proteins$ have$ alternative$ starting$ sites$ have$ antisense$ transcripts$ host$
regulatory$ RNA$ are$ subjected$ to$ alternative$ splicing$ and$may$ be$ regulated$ by$
sequences$quite$ far$ away$ from$ their$ coding$ regions$ (Consortium$2012)$Other$
limitations$are$the$need$for$larger$number$of$animals$in$these$type$of$studies$as$
in$this$study$and$the$still$relatively$high$cost$of$sequencing$that$cannot$be$used$
routinely$in$large$experiments$at$least$not$yet$$
$
$
Conclusions
This$study$underlines$the$importance$of$the$implementation$of$new$methods$to$
analyze$complex$traits$with$genomic$data$No$golden$path(s)$is$discovered$yet$to$
walk$from$variation$to$functional$units$The$geneBcentric$analysis$may$contribute$
67
$
to$ reveal$ new$ signals$ throughout$ the$ bovine$ genome$ condensing$ weak$
association$signals$in$chromosome$bits$having$functional$significance$
$
$
Acknowledgements
The$Authors$would$ like$ to$ thank$Dr$ Jimmy$ Liu$ for$ his$ advices$ in$ building$ the$
software$AIA$ANAFI$ANARB$and$ANAPRI$breedersrsquo$associations$ for$ their$help$
and$ support$during$ the$data$analysis$ SelMol$ and$ProZoo$projects$ that$provide$
samples$ The$ Authors$ are$ also$ grateful$ to$MIPAAF$ and$ Cariplo$ Foundation$ for$
funding$
$
$
$
References
Akula$ N$ Baranova$ A$ Seto$ D$ Solka$ J$ Nalls$MA$ Singleton$ A$ Ferrucci$ L$
Tanaka$T$Bandinelli$ S$Cho$YS$Kim$YJ$Lee$ JBY$Han$BBG$Bipolar$
Disorder$ Genome$ Study$ (BiGS)$ Consortium$ The$ Wellcome$ Trust$ CaseB
Control$Consortium$McMahon$FJ$2011$A$NetworkBBased$Approach$to$
Prioritize$ Results$ from$ GenomeBWide$ Association$ Studies$ PLoS$ ONE$ 6$
e24220$doi101371journalpone0024220$
Amin$N$ van$Duijn$ CM$ Aulchenko$ YS$ 2007$ A$Genomic$Background$Based$
Method$ for$ Association$ Analysis$ in$ Related$ Individuals$ PLoS$ ONE$ 2$
e1274$doi101371journalpone0001274$
Aulchenko$YS$de$Koning$DBJ$Haley$C$2007$Genomewide$Rapid$Association$
Using$ Mixed$ Model$ and$ Regression$ A$ Fast$ and$ Simple$ Method$ For$
Genomewide$PedigreeBBased$Quantitative$Trait$Loci$Association$Analysis$
Genetics$177$577ndash585$doi101534genetics107075614$
Blanco$ EJ$ CarreteroBHernaacutendez$ M$ GarciacuteaBBarrado$ J$ IglesiasBOsma$ MC$
Carretero$M$Herrero$JJ$Rubio$M$Riesco$JM$Carretero$J$2013$The$
activity$and$proliferation$of$pituitary$prolactinBpositive$cells$and$pituitary$
68
$
VIPBpositive$ cells$ are$ regulated$by$ interleukin$6$Histol$Histopathol$ 28$
1595ndash1604$
Blott$S$Kim$JBJ$Moisio$S$SchmidtBKuntzel$A$Cornet$A$Berzi$P$Cambisano$
N$Ford$C$Grisart$B$Johnson$D$Karim$L$Simon$P$Snell$R$Spelman$
R$ Wong$ J$ Vilkki$ J$ Georges$ M$ Farnir$ F$ Coppieters$ W$ 2003$
Molecular$ dissection$ of$ a$ quantitative$ trait$ locus$ a$ phenylalanineBtoB
tyrosine$substitution$in$the$transmembrane$domain$of$the$bovine$growth$
hormone$ receptor$ is$ associated$ with$ a$ major$ effect$ on$ milk$ yield$ and$
composition$Genetics$163$253ndash266$
Burton$PR$Clayton$DG$Cardon$LR$Craddock$N$Deloukas$P$Duncanson$A$
Kwiatkowski$ DP$ McCarthy$ MI$ Ouwehand$ WH$ Samani$ NJ$ Todd$
JA$ Donnelly$ P$ Barrett$ JC$ Burton$ PR$ Davison$ D$ Donnelly$ P$
Easton$ D$ Evans$ D$ Leung$ HBT$ Marchini$ JL$ Morris$ AP$ Spencer$
CCA$Tobin$MD$Cardon$LR$Clayton$DG$Attwood$AP$Boorman$JP$
Cant$B$Everson$U$Hussey$JM$Jolley$JD$Knight$AS$Koch$K$Meech$
E$ Nutland$ S$ Prowse$ CV$ Stevens$ HE$ Taylor$ NC$ Walters$ GR$
Walker$NM$Watkins$NA$Winzer$T$Todd$JA$Ouwehand$WH$Jones$
RW$ McArdle$ WL$ Ring$ SM$ Strachan$ DP$ Pembrey$ M$ Breen$ G$
Clair$ DS$ Caesar$ S$ GordonBSmith$ K$ Jones$ L$ Fraser$ C$ Green$ EK$
Grozeva$D$Hamshere$ML$Holmans$PA$Jones$IR$Kirov$G$Moskvina$
V$ Nikolov$ I$ OrsquoDonovan$ MC$ Owen$ MJ$ Craddock$ N$ Collier$ DA$
Elkin$A$Farmer$A$Williamson$R$McGuffin$P$Young$AH$Ferrier$IN$
Ball$SG$Balmforth$AJ$Barrett$JH$Bishop$DT$Iles$MM$Maqbool$A$
Yuldasheva$N$Hall$AS$Braund$PS$Burton$PR$Dixon$RJ$Mangino$
M$ Stevens$ S$ Tobin$ MD$ Thompson$ JR$ Samani$ NJ$ Bredin$ F$
Tremelling$ M$ Parkes$ M$ Drummond$ H$ Lees$ CW$ Nimmo$ ER$
Satsangi$ J$Fisher$SA$Forbes$A$Lewis$CM$Onnie$CM$Prescott$NJ$
Sanderson$J$Mathew$CG$Barbour$J$Mohiuddin$MK$Todhunter$CE$
Mansfield$JC$Ahmad$T$Cummings$FR$Jewell$DP$Webster$J$Brown$
MJ$ Clayton$DG$ Lathrop$GM$ Connell$ J$Dominiczak$A$ Samani$NJ$
Marcano$CAB$Burke$B$Dobson$R$Gungadoo$J$Lee$KL$Munroe$PB$
Newhouse$SJ$Onipinla$A$Wallace$C$Xue$M$Caulfield$M$Farrall$M$
Barton$A$ (braggs)$TB$ in$RG$and$G$Bruce$ IN$Donovan$H$Eyre$S$
69
$
Gilbert$ PD$ Hider$ SL$ Hinks$ AM$ John$ SL$ Potter$ C$ Silman$ AJ$Symmons$ DPM$ Thomson$ W$ Worthington$ J$ Clayton$ DG$ Dunger$DB$ Nutland$ S$ Stevens$ HE$ Walker$ NM$ Widmer$ B$ Todd$ JA$Frayling$TM$Freathy$RM$Lango$H$Perry$JRB$Shields$BM$Weedon$MN$ Hattersley$ AT$ Hitman$ GA$ Walker$ M$ Elliott$ KS$ Groves$ CJ$Lindgren$ CM$ Rayner$ NW$ Timpson$ NJ$ Zeggini$ E$ McCarthy$ MI$Newport$M$Sirugo$G$Lyons$E$Vannberg$F$Hill$AVS$Bradbury$LA$Farrar$ C$ Pointon$ JJ$ Wordsworth$ P$ Brown$ MA$ Franklyn$ JA$Heward$JM$Simmonds$MJ$Gough$SCL$Seal$S$(uk)$BCSC$Stratton$MR$Rahman$N$Ban$M$Goris$A$Sawcer$SJ$Compston$A$Conway$D$Jallow$ M$ Newport$ M$ Sirugo$ G$ Rockett$ KA$ Kwiatkowski$ DP$Bumpstead$ SJ$ Chaney$A$Downes$K$ Ghori$MJR$ Gwilliam$R$Hunt$SE$Inouye$M$Keniry$A$King$E$McGinnis$R$Potter$S$Ravindrarajah$R$ Whittaker$ P$ Widden$ C$ Withers$ D$ Deloukas$ P$ Leung$ HBT$Nutland$S$Stevens$HE$Walker$NM$Todd$JA$Easton$D$Clayton$DG$Burton$PR$Tobin$MD$Barrett$JC$Evans$D$Morris$AP$Cardon$LR$Cardin$NJ$Davison$D$Ferreira$T$PereiraBGale$ J$Hallgrimsdoacutettir$ IB$Howie$ BN$Marchini$ JL$ Spencer$ CCA$ Su$ Z$ Teo$ YY$ Vukcevic$ D$Donnelly$P$Bentley$D$Brown$MA$Cardon$LR$Caulfield$M$Clayton$DG$ Compston$ A$ Craddock$ N$ Deloukas$ P$ Donnelly$ P$ Farrall$ M$Gough$ SCL$ Hall$ AS$ Hattersley$ AT$ Hill$ AVS$ Kwiatkowski$ DP$Mathew$ CG$McCarthy$MI$ Ouwehand$WH$ Parkes$M$ Pembrey$M$Rahman$N$Samani$NJ$Stratton$MR$Todd$ JA$Worthington$ J$2007$GenomeBwide$ association$ study$ of$ 14000$ cases$ of$ seven$ common$diseases$ and$ 3000$ shared$ controls$ Nature$ 447$ 661ndash678$doi101038nature05911$
Cantor$ RM$ Lange$ K$ Sinsheimer$ JS$ 2010$ Prioritizing$ GWAS$ Results$ A$Review$ of$ Statistical$ Methods$ and$ Recommendations$ for$ Their$Application$Am$J$Hum$Genet$86$6ndash22$doi101016jajhg200911017$
Clark$ AG$ Hubisz$ MJ$ Bustamante$ CD$ Williamson$ SH$ Nielsen$ R$ 2005$Ascertainment$ bias$ in$ studies$ of$ human$ genomeBwide$ polymorphism$Genome$Res$15$1496ndash1502$doi101101gr4107905$
70
$
CohenBZinder$M$ Seroussi$ E$ Larkin$ DM$ Loor$ JJ$Wind$ AE$ der$ Lee$ JBH$Drackley$ JK$Band$MR$Hernandez$AG$Shani$M$Lewin$HA$Weller$JI$ Ron$ M$ 2005$ Identification$ of$ a$ missense$ mutation$ in$ the$ bovine$ABCG2$gene$with$a$major$effect$on$ the$QTL$on$ chromosome$6$affecting$milk$yield$and$composition$in$Holstein$cattle$Genome$Res$15$936ndash944$doi101101gr3806705$
Cole$JB$Wiggans$GR$Ma$L$Sonstegard$TS$Lawlor$TJ$Crooker$BA$Tassell$CPV$ Yang$ J$ Wang$ S$ Matukumalli$ LK$ Da$ Y$ 2011$ GenomeBwide$association$ analysis$ of$ thirty$ one$ production$ health$ reproduction$ and$body$ conformation$ traits$ in$ contemporary$ US$ Holstein$ cows$ BMC$Genomics$12$408$doi1011861471B2164B12B408$
Colitti$M$Wilde$CJ$Stefanon$B$2004$Functional$expression$of$bclB2$protein$family$and$AIF$in$bovine$mammary$tissue$in$early$lactation$J$Dairy$Res$71$20ndash27$
Consortium$ TEP$ 2012$ An$ integrated$ encyclopedia$ of$ DNA$ elements$ in$ the$human$genome$Nature$489$57ndash74$doi101038nature11247$
De$Mello$F$Cobuci$JA$Martins$MF$Silva$MVGB$Neto$JB$2012$Association$of$the$polymorphism$g8514CgtT$in$the$osteopontin$gene$(SPP1)$with$milk$yield$ in$ the$ dairy$ cattle$ breed$ Girolando$ Anim$ Genet$ 43$ 647ndash648$doi101111j1365B2052201102312x$
Dudbridge$ F$ Gusnanto$ A$ 2008$ Estimation$ of$ significance$ thresholds$ for$genomewide$ association$ scans$ Genet$ Epidemiol$ 32$ 227ndash234$doi101002gepi20297$
Elsik$ CG$ Tellam$ RL$ Worley$ KC$ 2009$ The$ Genome$ Sequence$ of$ Taurine$Cattle$ A$window$ to$ ruminant$ biology$ and$ evolution$ Science$ 324$ 522ndash528$doi101126science1169588$
Fazioli$ F$ Minichiello$ L$ Matoska$ V$ Castagnino$ P$ Miki$ T$ Wong$ WT$ Di$Fiore$ PP$ 1993$ Eps8$ a$ substrate$ for$ the$ epidermal$ growth$ factor$receptor$kinase$enhances$EGFBdependent$mitogenic$signals$EMBO$J$12$3799ndash3808$
Fontanesi$ L$ Calograve$ DG$ Galimberti$ G$ Negrini$ R$ Marino$ R$ Nardone$ A$AjmoneBMarsan$ P$ Russo$ V$ 2014$ A$ candidate$ gene$ association$ study$
71
$
for$ nine$ economically$ important$ traits$ in$ Italian$ Holstein$ cattle$ Anim$
Genet$nandashna$doi101111age12164$
Fowler$KJ$Walker$F$Alexander$W$Hibbs$ML$Nice$EC$Bohmer$RM$Mann$
GB$ Thumwood$ C$ Maglitto$ R$ Danks$ JA$ 1995$ A$ mutation$ in$ the$
epidermal$growth$factor$receptor$in$wavedB2$mice$has$a$profound$effect$
on$ receptor$ biochemistry$ that$ results$ in$ impaired$ lactation$ Proc$ Natl$
Acad$Sci$92$1465ndash1469$doi101073pnas9251465$
Gianola$ D$ Hospital$ F$ Verrier$ E$ 2013$ Contribution$ of$ an$ additive$ locus$ to$
genetic$variance$when$inheritance$is$multiBfactorial$with$implications$on$
interpretation$ of$ GWAS$ Theor$ Appl$ Genet$ 126$ 1457ndash1472$
doi101007s00122B013B2064B2$
Gillette$ M$ Bray$ K$ Blumenthaler$ A$ VargoBGogola$ T$ 2013$ P190B$ RhoGAP$
Overexpression$ in$ the$ Developing$Mammary$ Epithelium$ Induces$ TGFβB
dependent$ Fibroblast$ Activation$ PLoS$ ONE$ 8$ e65105$
doi101371journalpone0065105$
Grisart$B$Coppieters$W$Farnir$F$Karim$L$Ford$C$Berzi$P$Cambisano$N$
Mni$ M$ Reid$ S$ Simon$ P$ Spelman$ R$ Georges$ M$ Snell$ R$ 2002$
Positional$Candidate$Cloning$of$a$QTL$ in$Dairy$Cattle$ Identification$of$a$
Missense$Mutation$ in$the$Bovine$DGAT1$Gene$with$Major$Effect$on$Milk$
Yield$ and$ Composition$ Genome$ Res$ 12$ 222ndash231$
doi101101gr224202$
Hubbard$NE$Chen$QJ$Sickafoose$LK$Wood$MB$Gregg$ JP$Abrahamsson$
NM$ Engelberg$ JA$ Walls$ JE$ Borowsky$ AD$ 2013$ Transgenic$
Mammary$Epithelial$Osteopontin$(Spp1)$Expression$Induces$Proliferation$
and$ Alveologenesis$ Genes$ Cancer$ 4$ 201ndash212$
doi1011771947601913496813$
Hu$ ZBL$ Park$ CA$ Wu$ XBL$ Reecy$ JM$ 2013$ Animal$ QTLdb$ an$ improved$
database$tool$for$livestock$animal$QTLassociation$data$dissemination$in$
the$ postBgenome$ era$ Nucleic$ Acids$ Res$ 41$ D871ndashD879$
doi101093nargks1150$
Jeon$et$al$ B$2008$ B$ProtoBoncogene$FBIB1$ (PokemonZBTB7A)$Represses$Trpdf$
nd$
72
$
Jiang$ L$ Liu$ J$ Sun$ D$ Ma$ P$ Ding$ X$ Yu$ Y$ Zhang$ Q$ 2010$ Genome$Wide$
Association$ Studies$ for$ Milk$ Production$ Traits$ in$ Chinese$ Holstein$
Population$PLoS$ONE$5$e13661$doi101371journalpone0013661$
Kong$ PBZ$ Yang$ F$ Li$ L$ Li$ XBQ$ Feng$ YBM$ 2013$Decreased$FOXF2$mRNA$
Expression$ Indicates$ EarlyBOnset$ Metastasis$ and$ Poor$ Prognosis$ for$
Breast$ Cancer$ Patients$ with$ Histological$ Grade$ II$ Tumor$ PLoS$ ONE$ 8$
e61591$doi101371journalpone0061591$
Lee$SBU$Maeda$T$2012$POKZBTB$proteins$an$emerging$ family$of$proteins$
that$ regulate$ lymphoid$ development$ and$ function$ Immunol$ Rev$ 247$
107ndash119$
Lemay$ DG$ Lynn$ DJ$ Martin$ WF$ Neville$ MC$ Casey$ TM$ Rincon$ G$
Kriventseva$ EV$ Barris$ WC$ Hinrichs$ AS$ Molenaar$ AJ$ 2009$ The$
bovine$lactation$genome$ insights$ into$the$evolution$of$mammalian$milk$
Genome$Biol$10$R43$
Liu$JZ$Mcrae$AF$Nyholt$DR$Medland$SE$Wray$NR$Brown$KM$2010$A$
versatile$ geneBbased$ test$ for$ genomeBwide$ association$ studies$ Am$ J$
Hum$Genet$87$139$
Liu$S$Dontu$G$Mantle$ID$Patel$S$Ahn$N$Jackson$KW$Suri$P$Wicha$MS$
2006$Hedgehog$Signaling$and$BmiB1$Regulate$SelfBrenewal$of$Normal$and$
Malignant$ Human$ Mammary$ Stem$ Cells$ Cancer$ Res$ 66$ 6063ndash6071$
doi1011580008B5472CANB06B0054$
Lu$ J$ Holmgren$ A$ 2014$ The$ thioredoxin$ superfamily$ in$ oxidative$ protein$
folding$ Antioxid$ Redox$ Signal$ 140202091634000$
doi101089ars20145849$
Luna$ A$ Nicodemus$ KK$ 2007$ snpplotter$ an$ RBbased$ SNPhaplotype$
association$and$linkage$disequilibrium$plotting$package$Bioinforma$Oxf$
Engl$23$774ndash776$doi101093bioinformaticsbtl657$
McCarthy$ MI$ Abecasis$ GR$ Cardon$ LR$ Goldstein$ DB$ Little$ J$ Ioannidis$
JPA$ Hirschhorn$ JN$ 2008$ GenomeBwide$ association$ studies$ for$
complex$traits$consensus$uncertainty$and$challenges$Nat$Rev$Genet$9$
356ndash369$doi101038nrg2344$
Meredith$ BK$ Kearney$ FJ$ Finlay$ EK$ Bradley$ DG$ Fahey$ AG$ Berry$ DP$
Lynn$ DJ$ 2012$ GenomeBwide$ associations$ for$ milk$ production$ and$
73
$
somatic$cell$score$in$HolsteinBFriesian$cattle$in$Ireland$BMC$Genet$13$21$doi1011861471B2156B13B21$
Minozzi$ G$Nicolazzi$ EL$ Stella$ A$ Biffani$ S$Negrini$ R$ Lazzari$ B$ AjmoneBMarsan$ P$Williams$ JL$ 2013$ Genome$Wide$ Analysis$ of$ Fertility$ and$Production$ Traits$ in$ Italian$ Holstein$ Cattle$ PLoS$ ONE$ 8$ e80219$doi101371journalpone0080219$
Nicolazzi$EL$Picciolini$M$Strozzi$F$Schnabel$RD$Lawley$C$Pirani$A$Brew$F$ Stella$ A$ 2014$ SNPchiMp$ a$ database$ to$ disentangle$ the$ SNPchip$jungle$ in$ bovine$ livestock$ BMC$ Genomics$ 15$ 123$ doi1011861471B2164B15B123$
Purcell$ S$ Neale$ B$ ToddBBrown$ K$ Thomas$ L$ Ferreira$ MAR$ Bender$ D$Maller$J$Sklar$P$de$Bakker$PIW$Daly$MJ$Sham$PC$2007$PLINK$a$tool$ set$ for$ wholeBgenome$ association$ and$ populationBbased$ linkage$analyses$Am$J$Hum$Genet$81$559ndash575$doi101086519795$
Qiu$ S$ Lai$ L$ 2013$ Antibacterial$ properties$ of$ recombinant$ human$ nonBpancreatic$secretory$phospholipase$A2$Biochem$Biophys$Res$Commun$441$453ndash456$doi101016jbbrc201310092$
Quinlan$AR$Hall$IM$2010$BEDTools$a$flexible$suite$of$utilities$for$comparing$genomic$ features$ Bioinformatics$ 26$ 841ndash842$doi101093bioinformaticsbtq033$
RamiacuterezBTorres$ A$ BarceloacuteBBatllori$ S$ MartiacutenezBBeamonte$ R$ Navarro$ MA$Surra$ JC$ Arnal$ C$ Guilleacuten$N$ Aciacuten$ S$ Osada$ J$ 2012$ Proteomics$ and$gene$ expression$ analyses$ of$ squaleneBsupplemented$ mice$ identify$microsomal$thioredoxin$domainBcontaining$protein$5$changes$associated$with$ hepatic$ steatosis$ J$ Proteomics$ 77$ 27ndash39$doi101016jjprot201207001$
Raven$ LBA$ Cocks$ BG$ Pryce$ JE$ Cottrell$ JJ$ Hayes$ BJ$ 2013$ Genes$ of$ the$RNASE5$pathway$ contain$ SNP$ associated$with$milk$ production$ traits$ in$dairy$cattle$Genet$Sel$Evol$45$25$doi1011861297B9686B45B25$
Scotti$E$Fontanesi$L$Schiavini$F$La$Mattina$V$Bagnato$A$Russo$V$2010$DGAT1$ pK232A$ polymorphism$ in$ dairy$ and$ dual$ purpose$ Italian$ cattle$breeds$Ital$J$Anim$Sci$9$doi104081ijas2010e16$
74
$
Stefanon$ B$ Colitti$ M$ Gabai$ G$ Knight$ CH$ Wilde$ CJ$ 2002$ Mammary$
apoptosis$and$lactation$persistency$in$dairy$animals$J$Dairy$Res$69$37ndash
52$
Stegh$ AH$ DePinho$ RA$ 2011$ Beyond$ effector$ caspase$ inhibition$ Bcl2L12$
neutralizes$ p53$ signaling$ in$ glioblastoma$ Cell$ Cycle$ 10$ 33ndash38$
doi104161cc10114365$
Taberlet$ P$ Valentini$ A$ Rezaei$ HR$ Naderi$ S$ Pompanon$ F$ Negrini$ R$
AjmoneBMarsan$ P$ 2008$ Are$ cattle$ sheep$ and$ goats$ endangered$
species$Mol$Ecol$17$275ndash284$doi101111j1365B294X200703475x$
Team$RC$ 2014$ R$ A$ Language$ and$Environment$ for$ Statistical$ Computing$ R$
Foundation$for$Statistical$Computing$Vienna$Austria$
Thomadaki$H$ Talieri$M$ Scorilas$ A$ 2007$ Prognostic$ value$ of$ the$ apoptosis$
related$genes$BCL2$and$BCL2L12$in$breast$cancer$Cancer$Lett$247$48ndash
55$doi101016jcanlet200603016$
Valdehita$ A$ Bajo$ AM$ FernaacutendezBMartiacutenez$ AB$ Arenas$ MI$ Vacas$ E$
Valenzuela$ P$ RuiacutezBVillaespesa$ A$ Prieto$ JC$ Carmena$ MJ$ 2010$
Nuclear$ localization$ of$ vasoactive$ intestinal$ peptide$ (VIP)$ receptors$ in$
human$ breast$ cancer$ Peptides$ 31$ 2035ndash2045$
doi101016jpeptides201007024$
Wang$T$Liu$NS$Seet$LBF$Hong$W$2010$The$Emerging$Role$of$VHS$DomainB
Containing$Tom1$Tom1L1$and$Tom1L2$in$Membrane$Trafficking$Traffic$
11$1119ndash1128$doi101111j1600B0854201001098x$
Wang$ X$Wurmser$ C$ Pausch$ H$ Jung$ S$ Reinhardt$ F$ Tetens$ J$ Thaller$ G$
Fries$R$2012$Identification$and$Dissection$of$Four$Major$QTL$Affecting$
Milk$Fat$Content$in$the$German$HolsteinBFriesian$Population$PLoS$ONE$7$
e40711$doi101371journalpone0040711$
Xu$S$2003$Theoretical$Basis$of$the$Beavis$Effect$Genetics$165$2259ndash2268$
Zimin$AV$Delcher$AL$Florea$L$Kelley$DR$Schatz$MC$Puiu$D$Hanrahan$
F$ Pertea$ G$ Tassell$ CPV$ Sonstegard$ TS$ Marccedilais$ G$ Roberts$ M$
Subramanian$ P$ Yorke$ JA$ Salzberg$ SL$ 2009$ A$ wholeBgenome$
assembly$ of$ the$ domestic$ cow$ Bos$ taurus$ Genome$ Biol$ 10$ R42$
doi101186gbB2009B10B4Br42$
75
$
Zu$X$Ma$J$Liu$H$Liu$F$Tan$C$Yu$L$Wang$J$Xie$Z$Cao$D$Jiang$Y$2011$ProBoncogene$ Pokemon$ promotes$ breast$ cancer$ progression$ by$upregulating$ survivin$ expression$ Breast$ Cancer$ Res$ 13$ R26$doi101186bcr2843$ $
76
$
Supplementaryfiles$
You$can$find$Chapter$͵$supplementary$files$at$
- DocTa$(Doctoral$Thesis$Archive)$(httptesionlineunicattit)$
- Google$Drive$(httptinyurlcomPhDMMChapter03)$or$scan$the$QR$code$
$
$
SupplementaryFigureS1Multidimensionalscaling
Spatial$distribution$of$genetic$structure$in$the$analyzed$breeds$$
$
Supplementary$FigureS2GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Brown$
$
Supplementary$FigureS3GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Holstein$
$
Supplementary$FigureS4GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$HolsteinDGAT$
$
Supplementary$FigureS5GWASplot
Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$
Protein$percentage$in$the$Italian$Simmental$
77
$
$Supplementary$FigureS6LDdecayGenomeBwide$LD$decay$for$the$three$breeds$(Holstein$Simmental$Brown)$$Supplementary$FigureS7DescriptiveplotsDescriptive$statistics$for$the$Italian$Brown$Italian$Holstein$and$Italian$Simmental$$Supplementary$FigureS8DGAT1LDFocus$on$the$LD$of$the$DGAT1$region$in$all$the$datasets$(Brown$Holstein$HolsteinDGAT$Simmental)$$Supplementary$TableS1SNPfactNumber$of$subjects$and$SNPs$after$QC$$Supplementary$TableS2GeneBasedAssociationresultsFull$results$of$the$significant$genes$associated$with$the$traits$in$all$the$datasets$$ Sheet$1$Brown$ndash$fat$percentage$$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS3SingleSNPAssociationresultsFull$results$of$the$SNP$that$overcome$the$suggestive$threshold$$$ Sheet$1$Brown$ndash$fat$percentage$
78
$
$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS4GenesandtraitsGeneWise$pBvalues$of$the$discussed$genes$in$all$the$analysis$In$grey$are$evidenced$the$pBvalues$that$overcome$the$significance$threshold$Genes$are$alphabetically$listed$$Supplementary$TableS5GenesinQTLsGenes$from$the$geneBbased$association$mapping$into$known$QTLs$$$$$
79
CHAPTER(4(
MUGBAS(a(species(free(gene(based(association(suite(
S(Capomaccio1(M(Milanesi1(L(Bomba1(et(al(1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122ItalyTheseauthorscontributedequallytothiswork
____________________ApplicationnotesubmittedtoBioinformatics
Abstract(Theassociationstudiesbetweengenomewidemarkersandphenotypes(GWAS)
arenowwidelyusedinmodelandnonmodelspeciesHowevertoolsthatmerge
singlemarkerresultsandtheirprobabilityofbeingassociatedtofunctionalunits
(typicallygenes)areseldomdevelopedforspeciesotherthanhuman
Herewe presentMUGBAS a suite that calculates a generegionbased pvalue
fromagivenannotationsinglemarkerGWAresultandgenotypedatawiththe
objective to ease postGWAS analysis The software is species and annotation
independentfasthighlyparallelizedandreadyforhighdensitymarkerstudies
Availabilityandimplementationhttpsbitbucketorgcapemastermugbas
Introduction(With the availability of highdensity SNP panels Genome Wide Association
Studies (GWAS) have become the gold standard for dissecting the biology of
complextraits(McCarthyetal2008)Thisistrueforhumansandotherspecies
TheunderlyingideaofGWASissimplefindmarkertraitassociationsexploiting
the linkage disequilibrium (LD) that exists between the causative mutation ndash
whichweignorendashandoneormoremarkersofknownchromosomalposition
Whileanumberofwellestablishedapproachescanbeeasilyusedinastandard
GWAS (Aulchenko et al 2007 Kang et al 2010 Purcell et al 2007) to find
marker(s) associated to a particular phenotype several issues have to be
accountedforduringtheupanddownstreamanalyses
Firstly genome wide studies have to take into account multiple testing If
stringent statistical thresholds are applied a proportion ofmoderate or small
effectsactuallyassociatedwiththetraitwillbedisregardedThereforereducing
the number of tests while maintaining all the information provided by high
densitypanelswouldbeaclearadvantageInadditiononceasignificantsignalis
detected different strategiesmaybeused to tag the candidate causative gene
Sometimesthegeneclosertothepeaksignalisselectedinothercasesallgenes
withinaflankingregionofanarbitrarysizeofthesignificantSNP(s)areincluded
82
infurtheranalysesThesechoicesmayendupinbiasresultseitherretaininga
largenumberofnottrulyassociatedgenesordisregardingthecorrectones
Somemethodshaverecentlybeenproposedtoovercometheselimitationsusing
aprioriknowledgeastheldquocandidatepathwayrdquoanalysis(Ravenetal2013)and
the ldquogenebasedrdquo association strategies (Akula et al 2011 Cantor et al 2010
Liuetal2010)TheseapproachesrestrictGWAstudiesonlytoknowngenesor
genesubsetsresizingthemultipletestingproblemandincreasingthepowerof
theanalyses
Usually bioinformatics pipelines are developed for the analysis of human and
modelorganismsandarenotadaptedtononmodelorganismThisisthecaseof
the VEGAS software (Liu et al 2010 Yang et al 2014) developed only for
humansandonlyforgenebasedanalyses
Here we introduce MUGBAS [MUlti species GeneBased Association Suite] a
softwarebasedonVEGASthatusesGWASdataandagivenannotation(typically
genesbutanyregionmaybesuitable)toestimategenebasedandregionbased
associationpvaluesThesuite is speciesandannotation free ready toanalyze
highdensitydatafromanyexperimentaldesigns
Methods(MUGBASisasuiteofscriptsbuiltinBashRPythonandPerlThesuiterequires
several prerequisites that can be easily fulfilled in a LinuxUnixMac
environmentwith a few command lines Further information on how tomeet
theserequirementsaredetailedintheonlinerepository
MUGBAS suite can be invoked launching the Bash wrapper that checks
dependenciesandstoresuserchoices
Required input files are MAPPED files in PLINK format with genotype data
from(atleast200individualsarerecommendedforaproperLDcalculation)an
annotation file inBED format (chromosome startingposition endingposition
nameandusercustomfields)ofthedesiredgenomefeatureandtheGWASresult
file with mandatory headers ldquoSNPNAMErdquo and ldquoPVALUErdquo Multiple association
resultscanbetestedatthesametime ifprovidedinseparatefiles inthesame
directoryAtrialdatasetisdownloadablealongwiththesoftwarereleasePlease
83
notethatforsimplicityweprovideageneannotationbutanyBEDfilewithany
desiredfeaturecanbeused
The program starts the analysis assigning SNPs to any feature of the given
annotationusingbedtools(QuinlanandHall2010)Thisstepiscustomizableby
user defined boundaries (up and downstream) in order to catch regulatory
regionsthatcouldbeinhighLDwiththecurrentgeneregion
Inordertospeeduptheanalysistheprogramsubsamples200individualsfrom
theinputdatasetWethenimplementedtwolevelsofparallelizationoneforthe
LDandonefortheldquogenewiserdquoorldquoregionwiserdquopvaluecalculationThelatteris
particularlyusefulanalyzingmorethanoneGWAresultsfromthesamedataset
BrieflyMUGBASfirstsplitstheLDcalculationinncores(asdefinedbytheuser)
using the foreach doParallel packages (Analytics andWeston 2014a 2014b)
and then distributes traits across cores using GNU Parallel and foreach
doParallel
SinceLDcalculationisaslowprocessVEGASbenefitsofHapMapphase2datato
speed up the processwhile preserving the option for a user defined (human)
population using PLINK In order to expand these analysis to any species LD
calculationinMUGBASisimplementedwithther2fast()functionfromGenABEL
package(Aulchenkoetal2007)whilethegenewisepvalueiscalculatedasin
theVEGASapproachBrieflythegenewiseteststatisticisthesumofthenupper
tailchisquaredvaluesofaSNPsubsetwithonedegreeoffreedom(wheren is
thenumberofSNPmappinginthegivengene)Withperfectlinkageequilibrium
the genewise pvalue would be the onetailed pvalue of a chisquare
distributionwithndegreesof freedomOtherwisemorelikelythedistribution
of the pvalues has to be evaluated (with simulations) andweighted (with LD
values)(Liuetal2010)
Once every generegion has its ldquogenewiserdquo pvalue a FDR qvalue (False
Discovery Rate) statistics is calculated with the base R function padjust()
Results are stored and enrichedwith ldquomanhattanrdquo and ldquoundergroundrdquo plots of
theldquogenewiserdquopvalueandtheqvaluerespectively
84
Results(MUGBAS output is generegionoriented with useful information about the
location of the analyzed feature the ldquoBest SNPrdquo within that feature with its
genomiccoordinatesandtheoriginalpvaluethegeneorregionwisestatistics
(pvalue number of simulations and qvalue) the custom annotation of that
particularentryandthepositionoftheSNPwithrespecttothefeatureandthe
decidedboundariesAll this informationareuseful to track theeffectofhighly
significant markers that could map into multiple region of interest and
effectively pinpoint the most probable location to focus on with data mining
processes
A typical analysis performed with a highdensity panel (gt600K markers)
distributed in 10 midperformance cores (Intel Xeon X5675 307 GHz) on
threetraitssimultaneouslytakesaboutfourhours
Discussion(Generegionbased methods are now widely used and accepted in genomic
research However non modelspecies suffer for the lack of availability of
specifictoolstoenhancediscoveryfromgenomewidedataMUGBASprovidesa
robust and accepted statistics to researchers dealing with species other than
humantoeasilyjumpfromassociatedmarkerstothedesiredfunctionalunits
Acknowledgements(((WewouldliketothankDrJimmyLiuforhisadvicesinbuildingthesuite
Reference(Akula N Baranova A Seto D Solka J NallsMA Singleton A Ferrucci L
TanakaTBandinelli SChoYSKimYJLee JYHanBGBipolar
Disorder Genome Study (BiGS) Consortium The Wellcome Trust Case
85
ControlConsortiumMcMahonFJ2011ANetworkBasedApproachtoPrioritize Results from GenomeWide Association Studies PLoS ONE 6e24220doi101371journalpone0024220
Analytics R Weston S 2014a doParallel Foreach parallel adaptor for theparallelpackage
AnalyticsRWestonS2014bforeachForeachloopingconstructforRAulchenkoYSdeKoningDJHaleyC2007GenomewideRapidAssociation
Using Mixed Model and Regression A Fast and Simple Method ForGenomewidePedigreeBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614
Cantor RM Lange K Sinsheimer JS 2010 Prioritizing GWAS Results AReview of Statistical Methods and Recommendations for TheirApplicationAmJHumGenet866ndash22doi101016jajhg200911017
KangHMSulJHServiceSKZaitlenNAKongSFreimerNBSabattiCEskin E 2010 Variance component model to account for samplestructure in genomewide association studies Nat Genet 42 348ndash354doi101038ng548
LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKM2010Aversatile genebased test for genomewide association studies Am JHumGenet87139
McCarthy MI Abecasis GR Cardon LR Goldstein DB Little J IoannidisJPA Hirschhorn JN 2008 Genomewide association studies forcomplextraitsconsensusuncertaintyandchallengesNatRevGenet9356ndash369doi101038nrg2344
Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender DMallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKatool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795
QuinlanARHallIM2010BEDToolsaflexiblesuiteofutilitiesforcomparinggenomic features Bioinformatics 26 841ndash842doi101093bioinformaticsbtq033
86
Raven LA Cocks BG Pryce JE Cottrell JJ Hayes BJ 2013 Genes of the
RNASE5pathway contain SNP associatedwithmilk production traits in
dairycattleGenetSelEvol4525doi101186129796864525
87
CHAPTER5
Imputationaccuracyisrobusttocattlereferencegenomeupdates
MMilanesi1etal1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122Italy
____________________ShortCommunicationacceptedinAnimalGenetics
SummaryGenotype imputation is routinely applied in a large number of cattle breeds Imputation
hasbecomeaneedduetothelargenumberofSNParrayswithvariabledensity(currently
from2900to777962SNPs)Althoughmanyauthorshavestudiedtheeffectofdifferent
statisticalmethodsonimputationaccuracytheimpactofa(likely)changeinthereference
genomeassemblyonimputationfromlowertohigherdensityhasnotbeendeterminedso
far In this work 1021 Italian Simmental SNP genotypes were reJmapped on the three
mostrecentreferencegenomeassembliesFour imputationmethodswereusedtoassess
theimpactofanupdateinthereferencegenomeAsexpectedthefourmethodsbehaved
differentlywith largedifferences in termsof accuracyUpdatingSNPcoordinateson the
three tested cattle reference genome assemblies determined only a slight variation on
imputationresultswithinmethod
KeywordsImputationreferencegenomeassemblySNPbovine
MaintextFromthepublicationof the firstbovinesinglenucleotidepolymorphism(SNP)array the
wholegenomicscenarioofthisspeciesevolvedrapidlyNowadaysseveralSNPpanelshave
been designed with a range of densities mapping on different reference genome
assemblies and produced by different manufacturers When running analyses with SNP
arrays of different densities or from different commercial companies (eg Illumina and
Affymetrix) an imputation step is often required The impact of different statistical
methods on imputation accuracy has already been assessed (Pei et al 2008) (Marchini
andHowie2010)(Maetal2013)Surprisinglyuptonowtherearenotavailablestudies
investigating the impact of an update on the reference genome assembly on imputation
accuracy This assessment would be of fundamental importance considering that reJ
mappingSNPsdeterminesarearrangementofthehaplotypestructureofthegenotypes(in
somecasesSNPsarereJmappedtoadifferentchromosome)andthatthebovinereference
assembly is likely toberepeatedlyupdatedover timeThisworkwasaimedatassessing
the effect of different reference genome assemblies in cattle As test casewe compared
four different imputation methods from low (Illumina BovineLDv10 Beadchip) to
90
mediumhigh (Illumina BovineSNP50v2 Beadchip) density in the Italian Simmental
populationForeachmethodweassessedthe impacton imputationaccuracyofmapping
SNPs on the BTAU42 BTAU46 (wwwhgscbcmeduotherJmammalsbovineJgenomeJ
project)orUMD31(Ziminetal2009)cattlereferencegenomeassemblies
Atotalof900and121SimmentalbullsweregenotypedwiththeIlluminaBovineSNP50v1
and v2 Beadchips (Illumina Inc San Diego CA) respectively Illumina BovineSNP50v2
Beadchip (hereinafter named 54k) the official reference chip for the Italian Simmental
breedersassociation(ANAPRI)wasconsideredasreferenceSNPpanelTheSNPchiMpv1
tool(Nicolazzietal2014)wasusedtoobtainSNPcoordinatesonBTAU42UMD31and
BTAU46genomeassembliesForeachassemblySNPsnotcontainedinthe54kunmapped
orinthesexualchromosomeswerediscardedThefinaldatasetcontained5118952886
and51290SNPsinBTAU42UMD31andBTAU46respectively
FourimputationmethodswereconsideredPedImpute(Nicolazzietal2013)FindHapv2
(VanRadenetal2011)FImpute(Sargolzaeietal2014)andBeaglev332(Browningand
Browning2007)Thefirstthreeusepedigreeandpopulation(ie linkagedisequilibrium
LD) informationwhereas the fourth uses only LD information Performance of the four
methods testedacrossgenomicassemblieswas comparedusingwrongly imputedalleles
(ldquoerrorsrdquo)andallelicsquaredcorrelation(ldquoallelicJR2rdquo)statisticsAllelicJR2wasdefined
asthesquaredcorrelationbetweenimputedandtrue(eggenotyped)allelesmultipliedby
100 as in Browning amp Browning (2009) The 878 oldest sires were used as training
population whereas the 143 youngest bulls were used as test population SNPs not in
commonwith the lowdensity panelwere set tomissing in the test population The test
population was split in four scenarios individuals with sire and maternal grandsire
genotypedinthereferencepopulation(scenarioAJ84animals)individualswithonlythe
sire genotyped (scenario B J 34 animals) individuals with only the maternal grandsire
genotyped(scenarioCJ13animals)ornoneofcloserelativesgenotyped(scenarioDJ12
animals)Thepopulationusedwashighlystructuredwithnearlyhalfofthebullspresent
in the dataset (490) sired by 115 genotyped sires In addition 35 out of these 115
genotypedsiresweresiredby22bulls(eggrandsires)alsopresentinthedataset
DifferencesacrossassembliesarenothomogeneousFromBTAU42toBTAU4646SNPs
weremapped to different chromosomes and the average difference in physical position
withinchromosomeswasnearly025MbpHowevertherewasnosubstantialdifferencein
the order of markers In fact not considering unmapped SNPs on either of the
aforementionedassembliesadifferenceinthesequentialorderwasobservedforonly134
91
single (ie spread throughout the genome) SNPs On the contrary from UMD31 to
BTAU46 222 SNPsweremapped to different chromosomes and amuch larger average
differenceinSNPsphysicalpositionwasobserved(iemorethan2Mbp)Alsothenumber
ofSNPswithdifferentsequentialorderwasmuchlarger(ie1893SNPs)Howevermore
than50oftherearrangementsinvolvedblocksfrom3tonearly100markersAsaresult
slight differences in overall imputation accuracy from BTA46 to UMD31 rather than
between the two BTAU assemblies were expected across methods On the contrary if
rearrangements are due to issues in the assembly of scaffolds theymight have a direct
(negative) impact on local imputation performances In fact a preliminary analysis on
BTA1consideringlocalerroracrossassembliesandmethodssuggestatleasttwosuch
SNPsclustersat44and138MbponUMD31(FiguresS1S2andS3)
Considering allmethods and scenarios amaximum average variation of 02 inerrors
(Table1)andof09inallelicJR2(Table2)wasobservedacrossreferenceassembliesThe
standarddeviation(SD)oftheimputationaccuracywasstableacrossreferenceassemblies
withinmethodsOnthecontraryalargervariabilitywasobservedacrossmethods
PedImpute and FindHap showed a relatively large average allelicJR2 variability across
scenarios evaluated over the same reference assembly Only PedImpute showed a large
variation in SD across scenarios in errors and allelicJR2 with high variability on
scenarios C and D showing a low performance of thismethodwhen there are no close
relatives(egsiredam)genotypedinthereferencedatasetFImputeinsteadachievedthe
highestaccuracyacrossallreferencegenomesandscenariossurprisinglyobtainingitsbest
resultsJerrorsof14(06SD)andallelicJR2of955(26SD)JinscenarioCwhereonly
onerelativelycloserelativeisgenotyped(maternalgrandJsire)FImputeappeartobeable
todetectshorterhaplotypesfromdistantrelativesmoreaccuratelythananyothermethod
tested in this study thus it is lessdependenton theavailabilityofpedigree information
thantheothertwopedigreeJbasedmethods(Sargolzaeietal2010)
The large variability across methods within assembly was expected although the
differences obtained among PedImpute FindHap and Beagle were larger in this study
comparedtoasimilarstudyinItalianHolstein(Nicolazzietal2013)Thiscoulddependon
a number of factors First the sample size of the test population is small (especially for
groups C and D where less than 15 individuals were present) Second the interaction
between population structure and imputationmethodsmay have a role PedImpute and
FindHap rely much more heavily on pedigree information than FImpute and Beagle
assumesall individualsareunrelatedHowever the lattertwomethods indirectlybenefit
92
greatly of from highly structured populations with large number of relatives ndash close or
distantndashpresent inthereferencepopulation(Sargolzaeietal2014)Anotherinteresting
factor that influences the imputation accuracy may be the persistence of LD at long
distances this is much lower in the Italian Simmental than in the Italian Holstein
population(FigureS4)Informationonrelatives(eitherpedigreeinformationorgenotyped
relativesinthereferencepopulation)isexpectedtoplayabiggerroleifLDisconservedat
shorter distances This hypothesis is supported by the fact that although there is high
variability acrossmethods better performances of the pedigreeJbasedmethods are still
observedinscenariosAandB(andCforFImpute)
Theslightdifferencesinimputationaccuracyobservedacrossreferenceassembliesmaybe
explained by the aforementioned small variation in the SNP blocks across reference
assemblies irrespectively of the rearrangements of single SNPs Considering that SNP
blocksaremostlymaintainedweexpectthatbreedswithlargerLDpersistenceatlonger
distances (ie Holstein Brown) will obtain even better imputation accuracy across
assembliescomparedtowhatreportedinthisstudyinItalianSimmentalHoweverastill
relativelyunexploredaspectof imputationperformance is thedistributionof imputation
errors across the genomeNot considering genotyping errors (which are expected to be
randomly distributed in the genome) a cluster of consecutive markers with high
percentage of imputation error across methods are most probably identifying missJ
assembledregions in thegenome(FiguresS1S2andS3)Althoughthe impactonoverall
imputation accuracy is limited as results in this study show these errorsmight have a
large impact on any analyses relying on specific genomic coordinates (eg genomeJwide
associationstudies)Morecomprehensiveresultsincludingannotationstudiesandwhole
genomesequencingdataarerequiredtoconfirmandinvestigatetheseobservations
Toconcludeonlyasmalleffectofupdatingthereferencegenomeassemblyonimputation
accuracywas observed in Italian Simmental On the other hand as already reported by
other studies the variability of imputation accuracy estimates is much larger across
imputation methods The choice of the most suitable imputation method based on the
breedgeneticbackgrounditscharacteristicsandontheavailabledataisthereforecritical
AcknowledgementsThe research leading to these results has received funding from the European Unions
Seventh Framework Programme (FP72007J2013) under grant agreement ndeg 289592 ndash
93
Gene2FarmPartofthegenotypicdatausedinthisstudywasprovidedbyprojectldquoSelMolrdquo
fundedbytheItalianMinistryofAgriculture(MiPAAF)MMwassupportedbytheDoctoral
SchoolontheAgroJFoodSystem(Agrisystem)of theUniversitagraveCattolicadelSacroCuore
(Italy) FB was supported by the Marie Curie European Reintegration Grant
ldquoNEUTRADAPTrdquo(FP7JPEOPLEJ2009JRG)
ReferencesBrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingandMissingJ
Data Inference for WholeJGenome Association Studies By Use of Localized
HaplotypeClusteringAmJHumGenet811084ndash1097
MaPBroslashndumRFZhangQLundMSSuG2013Comparisonofdifferentmethods
forimputinggenomeJwidemarkergenotypesinSwedishandFinnishRedCattleJ
DairySci964666ndash4677doi103168jds2012J6316
Marchini J Howie B 2010 Genotype imputation for genomeJwide association studies
NatRevGenet11499ndash511doi101038nrg2796
NicolazziELBiffaniSJansenG2013ShortcommunicationImputinggenotypesusing
PedImputefastalgorithmcombiningpedigreeandpopulationinformationJournal
ofDairyScience962649ndash2653doi103168jds2012J6062
NicolazziELPiccioliniMStrozziFSchnabelRDLawleyCPiraniABrewFStella
A 2014 SNPchiMp a database to disentangle the SNPchip jungle in bovine
livestockBMCGenomics15123doi1011861471J2164J15J123
PeiYLiJZhangLPapasianCJDengH2008Analysesandcomparisonofaccuracyof
differentgenotypeimputationmethodsPLoSONE3551
VanRadenPMOrsquoConnellJRWiggansGRWeigelKA2011Genomicevaluationswith
manymoregenotypesGenetSelEvol43
94
Tableamp1ampIm
putationampaccuracyampacrossampassem
bliesampforampPedImputeampFindH
apampFImputeampandampBeagleamppercentageampofampwronglyampim
putedamp
allelesamp
PedImputeamp
FindHapamp
FImputeamp
Beagleamp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
Scenarioamp
Namp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
A1amp
84
20
07
21
07
21
07
32
09
32
09
32
09
15
09
15
09
15
09
24
11
25
11
25
11
B2amp
34
23
08
23
09
22
08
36
11
36
12
36
11
15
07
15
09
14
06
23
08
23
08
23
08
C3amp
13
98
41
98
41
98
41
51
10
52
10
51
10
14
06
14
06
14
06
23
06
23
06
24
06
D4 amp
12
88
49
89
49
88
49
54
11
56
12
54
11
16
11
17
12
16
11
26
15
26
15
26
14
1 Sireandm
aternalgrandsiregenotyped
2 Siregenotypedmate
rnalgrandsireno
tgenotyped
3 Sireno
tgenotypedmate
rnalgrandsiregenotyped
4 Sireandm
aternalgrandsireno
tgenotyped
Tableamp2ampIm
putationampaccuracyampacrossampassem
bliesampforampPedImputeampFindH
apampFImputeampandampBeagleampallelicJR2amp
PedImputeamp
FindHapamp
FImputeamp
Beagleamp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
BTAU
amp42amp
UMDamp31amp
BTAU
amp46amp
Scenarioamp
Namp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
Meanamp
SDamp
A1amp
84
941
20
940
20
941
20
907
24
908
25
907
24
952
26
952
27
952
27
925
33
923
33
925
33
B2amp
34
935
24
934
25
935
24
896
30
896
34
895
32
954
20
954
21
954
20
929
24
928
25
928
24
C3amp
13
759
89
761
88
758
87
848
33
845
30
848
32
955
18
955
19
955
19
931
19
928
19
927
19
D4 amp
12
782
113
777
113
781
112
841
32
832
35
839
33
949
34
947
38
949
35
921
44
919
45
921
44
1 Sireandm
aternalgrandsiregenotyped
2 Siregenotypedmate
rnalgrandsireno
tgenotyped
3 Sireno
tgenotypedmate
rnalgrandsiregenotyped
4 Sireandm
aternalgrandsireno
tgenotyped
95
Supplementaryfiles
YoucanfindChapterͷsupplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter05)orscantheQRcode
SupplementaryFigureS1ErrorpercentageacrossmethodsforBTAU42onBTA1
SupplementaryFigureS2ErrorpercentageacrossmethodsforUMD31onBTA1
SupplementaryFigureS3ErrorpercentageacrossmethodsforBTAU46onBTA1
SupplementaryFigureS4Linkagedisequilibrium(LD)decayoverdistanceinItalian
SimmentalandHolstein
ThedottedlinewithblackdotsindicatesLDdecayoverdistanceinItalianHolstein
whereasthedashedlinewithwhitediamondsindicatesthesamevaluesforItalian
SimmentalItalianHolsteingenotypeswereasubsetofthedatausedinNicolazzietal
(2013)
96
CHAPTER6
QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis
MMilanesi1etal
1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly
____________________Manuscript
Abstract
Outbreed species carry a genetic load of deleterious recessive variants Those
that are lethal survive only in the heterozygous state in the population while
otherssimplyreducethefitnessofindividualshomozygousforthevariantThe
long=termdestinyofthesemutationsistodecreaseinfrequencyduetonegative
selectionandeventuallydisappearbecauseofgeneticdriftHowever theymay
bemaintainedinlivestockpopulationsandevenincreaseinfrequencywhenin
linkage with favourable alleles for traits under selection or occurring in the
genomeofsireshavinghighgeneticvalueforproductivetraitsWesearchedfor
these variants combining exome sequence data from 20 Italian Holstein bulls
sampledfromoppositetailsofthemaleandfemalefertilityEBVdistributionthe
HighDensity(HD)SNPgenotypingof1009progenytestedItalianHolsteinbulls
and information existing in public databases Exome sequencing detected
202956 variants Among these 5167 were classified as ldquodeleteriousrdquo when
annotatedwithSIFTandAnnovarThesewerefilteredbypickingthosemapping
in regions out of equilibrium and lack of homozygotes in the progeny tested
Holstein bulls or having an asymmetric distribution in bulls plus=variant and
minus=variant for fertility EBV The resulting 193 variants in 163 genes were
furtherassessedbycomparativegenomicshaplotypeanalysisandfunctionFor
eachassessmentapositiveornegativescorewasgiventoevidenceinfavouror
against thevariant tobedeleteriousA totalof35variantshadapositive total
score These genes are involved in development cell motility and immune
responseSomeof themareknowntobeexpressed in the testiclesandappear
importantforspermfertilityinmodelspeciesothersareresponsibleforgenetic
defectsinhumanandmousecarryingmutationssimilartotheonesdetectedin
cattleThesegenevariantsarecandidatetobeamongthosethatconstitutethe
ldquogeneticloadrdquoofdeleteriousrecessivegenesintheHolsteinpopulation
100
Introduction
The presence of deleterious recessive variants is well tolerated by outbred
speciesWhenthesevariantsappeartheymayspreadandsurvivealongtimein
the population in the heterozygous state (Simmons and Crow 1977) It is
estimatedthatoutbredindividualshumanincludedcarryonaverageagenetic
loadofmorethan100loss=of=functionvariantsintheirgenome(MacArthuretal
2012)
Theeffectofdeleterious recessivevariantsappearswhendemographicevents
eggeneticbottlenecksreducethegeneticvariabilityofthepopulationorwhen
relatedindividualsareintentionallymatedforbreedingpurposesInthesecases
recessive deleterious reduce the fitness of individuals carrying them in the
homozygous state and in extreme case induce very severe genetic defects
premature death and abortion (McClure et al 2014 Sahana et al 2013
VanRadenetal2011)Theythereforecontributetoexplainthephenomenonof
inbreeding depression and its negative effect on fertility (Charlesworth and
Willis2009)
The search for deleterious recessive variants is particularly interesting in
Holstein cattle Thisbreed is experiencing anegative genetic trend for fertility
traits (Jamrozik et al 2005) it is worth noting that this is occurring while
inbreedingisunderstrictmonitoringandistheoreticallyincreasingataverylow
pace(about005peryear=Mrodeetal2009)Inadditionmatingisplannedto
minimise relationship between animals and the additive genetic variance for
mosttraitsndashincludingmilkproduction=isapparentlynotdecreasinginspiteof
the selection pressure applied by modern schemes (Biffani et al 2002)
indicatingthatoveralldiversityisonlyslightlydecreasing
SelectiontoincreasefertilityischallengingSelectioncriteriagenerallyadopted
to improve this trait are far from the biological mechanisms that govern
reproduction Inaddition thesemechanismsarecomplexandverysensitive to
environmental(management)conditions(Laneetal2013Wathesetal2014)
Asa result fertility traitshave loworvery lowheritability (Pryceetal1997)
andthereforelowresponsetoselectionandslowgeneticprogress
101
Forthesereasons thequest formolecularvariants influencing fertility traits is
on=goingsincetheavailabilityofmoleculartoolsforDNAanalysisAnumberof
investigations tested the effect of functional candidate genes running genome
wide association analyses to find QTLs for fertility traits using hundreds
microsatellite markers in structured families (Ashwell et al 2004) or tens of
thousand SNP panels in whole populations (Aston and Carrell 2009) The
availability of medium density SNP panels allowed the search for deleterious
recessivesevenwithouttheuseofphenotypictraitsbyscanningthegenomein
searchofregionslackingoneofthetwohomozygousclasses(Sahanaetal2013
VanRaden et al 2011) This approach has highlighted a number of genomic
regions likely carrying deleterious recessives This information has immediate
application inmolecular assisted breeding and a relevant scientific interest in
thesearchofcausativevariantsandassessmentoftheirfunctions
Thedevelopmentofnewsequencingtechnologieshasgreatlydecreasedthecost
of sequencing so that nowadays the sequencing of few target individuals is
becoming the gold standard to seek for causal mutations for rare monogenic
defectsinhumansandotherspecies(ChunandFay2009)Oftenonlytheexome
the protein coding part of the genome is sequenced searching for those
mutationsthatseverelyalterproteinstructureandorfunction(Bamshadetal
2011 Li et al 2013) A typical mammalian exome consists in a few tens of
millionnucleotides(around2ofthereferencegenome)comparedtothefew
billionofthewholegenomeThereforewiththesamenumberofreadsexome
sequencinghastheadvantageofamoreaccurategenotypingdueto thehigher
sequencing depth an easier data management lower computation load and
easier interpretation (Bieseckeretal2011Majewskietal2011) Indeed the
numbervariants found inexomesareone=twoordersofmagnitude lowerthan
thatdetectedinwholegenomesandmutationsaredetectedinannotatedgenes
orimmediatelynearbyOntheotherhandtheexomeapproachcanonlyidentify
variantson annotatedgenesVariants inunknownexonsor that occur innon=
coding regulatory features will be missed (Bamshad et al 2011) In humans
these are turning out to be of outmost importance for gene regulation and
downstreamphenotypic effects (ENCODEConsortium2012) but they are still
verypoorlyannotatedinlivestock
102
Herewesequencedtheexomesof20Holsteinbullsprogenytestedselectedfrom
the tails of male and female fertility EBVs (Estimated Breeding Value) Since
livestockfertilityisacomplextraitandcausativemutationscannotbesoughtby
the approach used in monogenic diseases we integrated the sequence
information with high density SNP chip data obtained on a much larger
populationofItalianHolsteinbullstodetectgenescarryingallelescandidateto
be deleterious recessives and confirm their likely deleterious effect by
comparativegenomics
Materialsandmethods
1SamplingandDNAextraction
Semen(straws)andbloodsampleswereprovidedbyItaliancertificatedartificial
insemination centres (UE Directive 88407CEE) thus ethics committee
approvalforthisstudyisnotrequired
A total of 20 ItalianHolstein progeny tested bullswere sampled for complete
exome sequencing Among them 19were chosen from the tails of female and
male fertilityEBV(EstimatedBreedingValue)distribution (Table1) tohave5
animals fromeach tail One animalwas to be in theminus=variant tail of both
traitsBullswereselectedasunrelatedaspossibleSamplingwaspossiblethanks
tothecollaborationwithANAFI(ItalianHolsteinBreederAssociation)
ANAFIdevelopedageneticevaluationfor fertilitycombiningdifferenttraitsas
explained by Biffani and colleagues (Biffani et al 2005) The traits are ldquodays
from calving to first inseminationrdquo ldquocalving intervalrdquo ldquofirst=service non return
rateto56daysrdquoldquoangularityrdquoandldquomatureequivalentmilkyieldat305daysrdquo
MalefertilityisexpressedasanEstimatedRelativeConceptionRate(ERCR)and
isameasureofconceptionrateofaservicesirerelativetoservicesiresofherd
mates(Soggiuetal2013)
103
Table1FemaleandmalefertilityEBVsofthe20animalssequenced
Animal EBVFemaleFertility EBVMaleFertility GroupfriA01 97 =111 NAfriA02 98 =166 MalLfriA03 115 0 FemHfriA04 114 0 FemHfriA05 114 0 FemHfriA06 86 0 FemLfriA07 98 195 MalHfriA08 105 213 MalHfriA09 95 =154 MalLfriA10 95 149 MalHfriA11 102 264 MalHfriA12 91 =475 FemLMalLfriA13 105 =179 MalLfriA14 102 207 MalHfriA15 99 =405 MalLfriA16 85 0 FemLfriA17 87 0 FemLfriA18 121 0 FemHfriA19 117 0 FemHfriA20 88 0 FemL
A total of 1009 Italian Holstein bulls were chosen from the entire population
balancing theirpresence in the tailsof theEBV forkgprotein (PROT) somatic
cell count (SCC) fertility (FERT) and functional longevity (LONG) for high
densitygenotyping(TableS1)
DNAwasisolatedfromwholebloodandsemenstrawssamplesinLGSlaboratory
(CremonaItalywwwlgscrit)usingtheGenomixkitaccordingtoLGSinternal
protocolDNAsforexomecaptureandsequencingwereextractedindependently
tosatisfy therequirement indicatedby IGA technologies (Udine Italy)at least
12mgofDNAataconcentration100ngulwithR260280higherthan18and
R260230near19
2Moleculardataproductionandqualitychecking
21Exomecaptureandsequencing
104
Exomes were captured using the ldquoSureSelected All exon Bovinerdquo (Agilent
Technologies) kit in early version Probes coordinates are based on UMD31
bovinereferencesequence(Ziminetal2009)anddesignedagainstNCBIexons
as well as predicted exons UTRs and miRNAs Number and distribution of
probesperchromosomearedetailedinTableS2
rdquoOvation Ultralow Library Systemsrdquo protocol was followed for library
preparation After quantification with Qubit (Life technologies) and quality
controlwith2100Bioanalyzer(AgilentTechnologies)DNAwassonicatedwith
Bioruptor (Diagenode) to obtain fragments between 150 to 200 bp in length
blunt end=repaired adenylated ligated with paired=end adaptors and purified
followingtheprotocolprovidedbythemanufacturer
LibrarieswerehybridizedwithldquoSureSelectedAllexonBovinerdquoprobesusingthe
Encore Target Capture Module (Nugen) SureSelect Target Enrichment and
Herculase II FusionEnzymewithdNTPCombo (AgilentTechnology) kit Then
capturedDNAwasamplifiedandtaggedwithlibraryspecificindexesQualityof
capturedsampleswastestedwithQubitand2100Bioanalyzer
Paired=end 2X100bp sequencing was performed on Illumina HiSeq2500
(IlluminaSanDiegoCA)atanexpectedcoverageof50X
22Sequencealignment
Adapters and reads shorter than 36bp were trimmed using Trimmomatic
(version032)(Bolgeretal2014)ForeachFastqfilethealignmentandpairing
of the reads to theUMD31 reference sequencewas performedwithBurrows=
WheelerAligner(BWA)(version077=r441)(LiandDurbin2009)applyingalso
trimmingoption (=q5) SAM files resultedwere converted intoBAM file using
SAMtools (version0118) (Lietal2009)Foreach individualBAM fileswere
merged in a single one with Picard (version 1111)
(httpsgithubcombroadinstitutepicard)Duplicatereadswereidentifiedand
markedwithPicardFinallyindividualBAMfileswereindexedwithSAMtools
105
Sequencedepthwasevaluatedsimultaneouslyonall20sequencedanimalsand
onlyonregionstargetedbyprobesusingDepthOfCoverageofGATK(McKennaet
al2010)
23ExomeVariantcallingandqualitycheck
Singlenucleotidepolymorphisms(SNPs)insertionsanddeletions(InDels)were
calledsimultaneouslyonall20sequencedanimalsandonlyonregionstargeted
byprobesusingUnifiedGenotyperofGATK(DePristoetal2011)
All variantswere filteredusingVariantFiltration ofGATK to removeoneswith
lowmappingandorsequencingqualityThechoiceoffiltersandthresholdswas
made in accordance to ldquoGATK best practicerdquo for exome sequencing and
distributionoffiltervalues
bull ldquoSNPclusterrdquoexcludedvariantifmorethan3variantswerefoundin10
bp
bull ldquoBaseQualityRankSumTestrdquoexcludedifitrsquoslowerthen=5ormorethan
5
bull ldquoMappingQualityRankSumTestrdquoexcludedif itrsquoslowerthen=7ormore
than6
bull ldquoRead Position Rank Sum Testrdquo excluded if itrsquos lower then =4 or more
than4
bull ldquoFisherStrandrdquoexcludedifitrsquosmorethan30
bull ldquoDepthrdquoexcluded if itrsquos lower then60X(ieaminimumvalueof3Xper
animals)ormorethan3500X
bull ldquoQualityrdquoexcludedifitrsquoslowerthen40
bull ldquoMappingqualityrdquoexcludedifitrsquoslowerthen30
Filters were applied to all variants but only bi=allelic SNPs were used in the
following analyses Genotypes with ldquoGenotype Depthrdquo lower than 5 and
ldquoGenotype Qualityrdquo lower than 20 were discarded The remaining ones were
converted intoldquoABrdquocode(A=referencealleleandB=alternativeallele)Then
data were transformed in PLINK (Purcell et al 2007) format using a home=
made python script To prepare the working dataset the final quality control
106
removedvariantswithmore than25missingdata (that iswithmore than5
animals out of 20 without genotypes) monomorphic (minor allele frequency
MAFlt1)ornotmappinginautosomesBasicpopulationgeneticsparameters
werecalculatedtoevaluatethepresenceofcrypticstructureinthedatathatmay
influencetheoutcomeofthedownstreamanalyses
In total 24390 SNPs identified in the exomes were also included in the HD
SNPchip panel HD genotype from nineteen out to twenty of the sequenced
animals sequenced was used to compare technologies Genotypes at common
locationswerecomparedacrosstechnologies toevaluatesequencequalityand
theeffectofsequencedepthongenotypecalling
24BovineHDGenotypingBeadChipdataandqualitycheck
AllanimalsweregenotypedusingtheBovineHDGenotypingBeadChip(Illumina
San Diego CA) Genotypes were quality controlled and filtered with standard
threshold Markers with more than 5 missing data and MAF lt 1 were
excluded Animals with more than 5missing data were also excluded Only
autosomal SNPs were retained Thereafter genotypes were phased and the
remainingmissingdata imputedusingBeaglev332(BrowningandBrowning
2007)
Ofthe20animalssequenced12hadalsoBovineHDGenotypingBeadChipand7
BovineSNP50 Genotyping BeadChip version 2 genotypes available For
concordance and haplotype analysis we imputed the 50K data to 800K as
previouslydescribedabove
3Strategyfordetectingcandidatedeleteriousvariants
Wesearchedforcandidatedeleteriousvariantsfollowinganintegratedapproach
that joined the analysis of i) exomes of a few Italian Holstein bulls having
plusvariant and minusvariant EBVs for a target trait ii) HD genotyping of a
population of 1009 progeny tested Italian Holstein bulls iii) the comparative
genomic analysis of 100 vertebrate species The strategy followed for the
107
analysisisshowninFigure1Inthefirststep(Search)welookedforcandidate
deleterious variants in the exomes In step2 (Filter)we filtered candidatesby
assessingwhichof thesevariantseitherhadanon=randomdistributionamong
animals having extreme EBVs or co=mapped with genomic regions carrying
outliersignals inthelargepopulationofprogenytestedbulls Instep3(Asses)
candidateswere further evaluated by i) assessing their conservation across a
panel of 100 vertebrate species ii) searching for associated haplotypes and
evaluating their frequencies in the population of progeny tested bulls iii)
evaluatingtheirassociationwithgeneticdefectsinhumanandmodelspeciesiv)
examining their expression profile and function when known Methods are
belowdescribedindetail
Figure1Strategyfollowedtoidentifydeleteriousvariants
108
31Step1Searchforcandidatedeleteriousvariants
Quest for candidate deleterious variants in exome sequence data by
variantannotation
WeannotatedallvariantsusingANNOVAR(Wangetal2010)andVEP(Variant
Effect Predictor) (McClure et al 2014) software They use the Ensembl gene
annotation database (UMD31 reference sequence release 74) to investigate
locationandpotentialeffectoncodingsequencesofallvariantsVEPintegrates
the SIFT algorithm (Kumar et al 2009) to predict the effect of amino acid
substitutiononproteinfunctionForeachsequencechangesequencehomology
and physic=chemical similarity between the reference amino acid and the
alternative is considered providing a score that evaluates the mutation
tolerabilityVariantswithscorelowerthan005wereclassifiedasldquodeleteriousrdquo
inaccordingtosoftwarerecommendations
SIFTalsocomparedallvariants found in theexomeswith thosepresent in the
dbSNPdatabase(version140)toidentifythosethatwerenovel
32Step2Filteringofcandidatedeleteriousvariants
2a Analysis of deleterious variant distribution among animals having
extremesEBVvaluesformaleandfemalefertilitytraits
Association between variants and the classification in the tails of the EBV
distributionwas estimated byFisherexact testwhich compares frequencies of
alleles (a vs A) in the two tails not assumingH=W equilibriumgenotypic test
that compares the frequency of genotypes (aa vs Aa vs AA) and the recessive
geneactiontestwhichtestsforthedistributionofonehomozygousclassversus
theothers(aavsaAAA)AlltestsareimplementedinPLINKfunctionsThe5
significance threshold was set empirically by permutation (N=100000) either
point=wise that is not corrected for multiple testing (EMP1 = Empirical
Probability1)andthereafterfamily=wisecorrectedformultipletesting(EMP2=
EmpiricalProbability2)
109
GenesexceedingEMP1were considered suggestive and thoseexceedingEMP2
significant candidates to be deleterious recessives and their characteristics
conservationacrossspeciesandfunctionfurtherinvestigated
2b Quest for regions candidate to carry deleterious variants in the
populationanalysedwiththeHDBeadChip
Basic population genetics parameters and Multi Dimensional Scaling were
calculated with GenABEL R package (Aulchenko et al 2007) to evaluate the
presenceofcrypticstructure inthedatathatmayinfluencetheoutcomeofthe
downstreamanalyses
ExpectedandobservedallelicandgenotypicfrequenciesandFisherexacttestof
Hardy=Weinberg proportions (Janis E Wigginton 2005) were calculated with
PLINKon theHDSNPdatasetValueswere averaged in slidingwindowsof10
markersThisvalue(10markers)waschosenaftertestingdifferentwindowsize
as represent a good compromise between noise reduction and resolution (not
shown)
Candidate regions containing deleterious mutations were identified following
twodifferentapproaches
In one approach hereafter refereed to as ldquoOut of Hardy=Weinbergrdquo approach
(OHW)markerssignificantlydeviatingfromofHardy=WeinbergequilibriumatP
le84710=8 (nominalvaluePle005Bonferronicorrected formultiple testing)
werefirstidentifiedWindowswithatleastthreesignificantmarkersbelowthis
thresholdandhavingahomozygotedeficitwereconsideredcandidatestocarry
deleterious mutations In this OHW approach we applied very stringent
constraints to reduce false positive discoveries due to the Wahlund effect
inducedbypopulationsubstructure(seeresultsandFigureS5)
In thesecondapproachhereafterreferred toas ldquoLackOfHomozygotesrdquo (LOH)
approach the differences between observed and expected frequencies of
homozygotesfortheminorallelewerecomputedandaveragedacrossmarkers
included in each sliding window Values were standardized and only regions
carrying three overlapping windows below =3Z score value were considered
110
candidates to carry deleterious mutations Thereafter significant windows
carryingatleastonemarkerincommonwerejoinedtodefineregionboundaries
ExomesequencingknowledgeandHDBeadChipdatawerecrossed inorder to
identifygenescarryingseverevariantslocatedwithinsignificantregionsortheir
proximity (50Kb upstream or downstream) These genes were considered
candidates to be deleterious recessives and their characteristics conservation
acrossspeciesandfunctionwasfurtherinvestigated
33Step3Assessmentofcandidatedeleteriousvariants
Genes carrying variations classified as deleterious by SIFT or ANNOVAR and
eitherassuggestivebyfertilityEBVtailsanalysisorassignificantbyOHWeLOH
analyses of the Holstein bull population were submitted to comparative
genomics population genetics and functional analyses In order to prioritize
genesvariants discoveredwe scored each evaluation using a subjective scale
ranging from =3 to 3 to measuring the evidence in favour (scores 1 to 3) or
against(scores=1to=3)thedeleteriouseffectoftheSNPsThescore0inallcases
indicated the absence of evidence or that a specific assessment failed The
completescorescalethereforerangedfrom=9to9
3aComparativegenomics
For comparative genomics purposes the position of eachmutationwas ldquolifted
overrdquo from UMD31 bovine coordinates to hg19 human coordinates using the
specifictoolatUCSC(Kentetal2002=httpgenomeucscedu)Orthologous
proteins were aligned using the UCSC genome browser and amino acid
conservation assessed in a set of 100 vertebrate species that included human
primates (11 species) euarchontoglires (egmouse14 species) laurasiatheria
(egsheep25species)afrotheira(egelephant6species)othermammals(eg
platypus5species)birds(egchicken14species)sarcopterygii(egturtle8
species) fish (eg zebrafish 16 species) Results of amino acid conservation
analysiswererecordedonlyifhomologousproteinfromatleast50speciescould
beretrievedandalignedThereafteraminoacidconservationwasscored3ifthe
111
aminoacidwasconservedinallvertebrates2ifconservedinallmammalsand1
ifconservedinthelargemajority(atleast90)ofmammalsAscoreof=1was
given when the amino acid was not conserved among species and a score =3
when both variants observed in cattle were carried by other species Finally
wheneithertheliftoverprocessortheproteinalignmentfailedthecomparison
wasconsiderednoninformativeandgivenascore0
3bSearchofhaplotypesassociatedtodeleteriousvariantsandanalysisof
HDpopulationdata
To acquire further evidence we searched for the haplotypes that carry the
deleterious variations and estimated their frequency in the
homozygousheterozygousstatuswithinthebullpopulationgenotypedwiththe
HDBeadChip
Weusedamultistepapproachtofindhaplotypescarryingdeleteriousvariantsi)
from the phased HD BeadChip data we extracted the haplotypes of the 20
sequencedanimalsintheregionofthevariant(100markersoneachside)ii)in
these 20 animals we searched the shorter haplotype able to distinguish
homozygous andor heterozygous animals for the deleterious variant from all
other haplotypes iii) we assessed the frequency of homozygotes and
heterozygotes for thehaplotype in thewholepopulationofbulls characterised
withtheHDBeadChip
Scoresinthiscasewere=3inthecaseofuniquehaplotypesshowinganumberof
homozygotesnonsignificantlydifferentfromthenumberexpectedfromHardy=
Weinbergproportions3inthecaseofuniquehaplotypesshowinganumberof
homozygotes significantly lower than expected 1 in the case of unique
haplotypesneverhomozygousandrare0whenevernouniquehaplotypecould
befound
3cFunctionalanalyses
With PantherDB (Mi et al 2010) resources we classified our gene list into
functionalcategoriesaccordingtotheGeneOntologyclassificationAllsuggestive
112
andcandidateEnsemblGeneIDsubmittedtothebulkanalysiswererecognizedand annotated Beyond the GO term characterization in order to understandpotentialdetrimentaleffectsofmutationsandalsopinpointpotentialcandidatesfor fertility traits we retrieved information for each gene according to thefollowing parameters i) the involvement in known Mendelian or complexdiseases fromOMIMdatabase ii) theavailabilityofknock=outmice iii) and theeventual expression in reproductive tissues from the Gene Expression Atlas(httpwwwebiacukgxa)(Table4)
Alsoforthesedataweusedthesamescaleattributingarangefrom=3pointstogenes having no relation with fertility viability and genetic defects and norelevant phenotypic effects in knock=out mouse models to 3 to gene closelyrelated to fertility and viability associated to genetic defects in human andaffectingfertilityorsurvivalwhenknocked=outInparticularascoreranging=1to+1wasattributedtogenesassociatedtohumansyndromes(=1=noassociationrecordedinOMIMdatabase0=noinformationinOMIM1=associationinOMIM)A score with same range to information collected from knock=out mousephenotypes(=1=knock=outmouseshowsnoimpairedphenotype0=noknockoutmouse has been produced 1= knock=out mouse shows impaired phenotype)Thesamescorewasattributedtomolecularfunctions(=1=functionunrelatedtofertilityandviability0=functionindirectlyrelatedasimmunityandmetabolism1=functiondirectlyrelatedasfertilityanddevelopment)
Results
Exomecaptureandsequencing
A totalof225414probeswereused for capturing thecattleexome (5355Mbabout 2 of UMD31 reference genomes) Mitochondrial (MT) and Ychromosome were not included in probe design The number of probes perchromosomespannedfrom2500onBTA27toalmost12000inBTA3(TableS2)
113
The average exome sequencing depth was 4058 slightly lower than the
expected50XAllanimalshadatleast24Xaveragegenomessequencedbutina
same animal the sequencing depth varied across chromosomes and regions
withinchromosomesThelowestchromosomeaveragevalueobservedwas10X
inBTA25oftheshallowersequencedanimalAveragecoverageperanimaland
chromosomeandoverallmeansareshowninFiguresS1
In total1613461402 raw100bppair=ends sequence readswereproducedA
total of 1425514112 100bp reads were mapped and properly paired to the
reference genome Paired reads per animal ranged between 41973510 and
138840392
Insynthesissequencingdepthwasvariableasexpectedandalmostallanimals
hadsomeregionsofscarceornosequencedepthbutoverallallexomeregions
were sufficiently covered to permit a comprehensive analysis of molecular
variantsoftheanimalsinvestigated
Step1Searchforcandidatedeleteriousvariants
Variantdetectionandannotation
Thesequencingdepthofallvariantscalledonthealignmentofallreadsacross
animalsfollowedaChi=squaredistributionAveragedepthwas79694spanning
from2Xto5000Xwithpeakfrequencyclassat150=250X(FigureS2)
A multistep approach was followed to clean the exome sequence dataset
Cleaning process steps and the number onmarkers retained in each step are
describedinTable2
114
Table2Typeandnumberofmarkersdetectedandretainedafterdatasetcleaning
TypeofvariationNumberofvariations(N)
Rawdata
Aftervariantcleaning
Aftergenotypecleaning(onlyautosomal)
Complex 1213 963 0Deletion 6938 5031 0Insertion 5510 3656 0MultiallelicComplex 5710 3995 0MultiallelicSNP 803 512 0SNP 242988 203079 164277Total 263162 217236 164277whencomparedtothereferencegenome
Atotalof263162variantswereidentifiedAmongthese217236(8255)wereretainedafterapplyingthevariantfilters(seematerialsandmethodssection)
According to GATK classification the vast majority (9348) of variationsidentified were biallelic SNPs followed by insertions and deletions (InDels40) more complex variations and multiallelic SNPs (252) Indels andcomplexvariationsalthoughlessfrequentthanSNPsaffectedarelevantportionoftheexome(49589bp)Thenumberofvariantsdetectedperchromosomewassignificantly correlated to the number of probes used (r=08761P=228x10=10)andofbasescaptured(r=08424P=53x10=19)perchromosomehoweverBTA23BTA27 BTAX had an outlier behaviour the former two containing highervariation(450and445variantsperKb)andthelatterlower(084variantsperKb)thantheaverage(322variantsperKb)
Bialleic SNPswere further filtered for genotype quality (sequence depth gt 5Xandsequencequalitygt20)andBTAXmarkersThefinalautosomalSNPdatasetcounted164277SNPsAmongthese17829biallelicSNPs(109)describedforthefirsttimehavebeensubmittedtodbSNP
Variantsrsquo annotation is shown in Table 3a Variantswere detected in differentgenedomains(exonsintrons3rsquountranslatedregions)aswellasupstreamanddownstreamgenesMorethan31ofthevariationsweredetectedinexonsandamong them more than 41 have the potential to have an effect on genefunctionbyaffectingsplicingoralterproteinstructure(Table3b)
115
Table 3 A) number of autosomal SNPs annotated in each gene region B) number and
classificationofautosomalSNPsalteringgenefunction
A)
Generegion NofSNPs(autosomes)Downstream 1170Exonic 51293Intergenic 49901Intronic 58116ncRNA_exonic 160ncRNA_intronic 3ncRNA_splicing 2Splicing 184Upstream 1445Upstreamdownstream 54UTR3 1345UTR5 604Total 164277
B)
EffectofexonicSNPs NofSNPs(autosomes)Nonsynonymoussinglenucleotidevariation 20839Stopgainsinglenucleotidevariation 213Stoplosssinglenucleotidevariation 10Synonymoussinglenucleotidevariation 30226Unknown 5Total 51293
whencomparedtothereferencegenome
Amongthe51293exonicSNPs4166in2978geneswereclassifiedldquodeleteriousrdquo
by SIFT Among these 3210 in 2356 genes were observed at least twice
ANNOVARidentified223stopgain(213)orloss(10)variantsin217genes
Theaverageconcordanceat24390SNPs in commonbetweenexomesand the
HD Beadchip was 9716 (Table S3) Discordance was due to either alleles
detected by sequencing andnot detected by genotyping (exome=heterozygote
Beadchip=homozygote) and vice=versa (exome=homozygote
Beadchip=heterozygote) Discordances were randomly distributed among
markers and inversely correlated to sequencing depth (Figure S3) indicating
that most errors are likely occurring during sequencing even if occasional
116
Beadchip failure indetectingsomeallelecannotbeexcludedTwoanimalshad
anoutlierbehaviouronehaving846andthesecond719concordanceBoth
animalswereretainedforvariantdiscoverywhiletheywereexcludedfromthe
analysesofEBVtailsWithoutthesetwooutlieranimalsthemeanconcordance
increasesto9938
Multidimensional scaling confirmed the absence of a substructure in animals
sequenced(FigureS4)
Step2Filteringofcandidatedeleteriousvariants
AnalysisofEBVtails
Fisherexact test isgenerallyused to test for theassociationbetweenavariant
and a Mendelian disease Samples are divided in cases and controls and the
association between alleles or genotypes and the disease is tested under
different inheritance models Here we used the same method to test for
association between allelesgenotypes and animals from opposite tails of the
EBVdistributionfori)malefertility(N=4high+4low)ii)femalefertility(N=5
high+4low)andiii)maleandfemalefertility(N=9high+8low)
Genotypes were tested under all inheritance models implemented in PLINK
Giventhelownumberofsamplesnovariantwassignificantaftercorrectionfor
multiple comparisons (EMP2) and only suggestive variants could be found
(significant at 5 at EMP1)We considered suggestive of carrying deleterious
allelesalistof101genescarrying112variants(TableS4)
Detectionofregionscarryingdeleteriousmutationfrom800KSNPdata
After applying quality control parameters 26653 and 146991 markers were
removed from the original dataset for excessive missing data and MAF
respectivelyInaddition17individualsremovedforlowgenotypingrateAswith
exomes only autosomal SNPswere retained for the analyses thus obtaining a
workingdatasetof590680SNPsand992animals
117
MDS(FigureS5)analysis indicates that thedatasethassomesubstructure that
mightaffectdownstreamanalysesWeaccountedfortheriskoffindingmarkers
outofHardy=WeinbergproportionsduetotheWahlundeffectbychoosingvery
stringentsignificancethresholdintheOHWanalyses
To detect regions candidate to carry deleterious recessivemutations we used
two approaches the ldquoOutside Hardy Weinbergrdquo (OHW) and the ldquoLack Of
Homozygotesrdquo (LOH) approaches (see materials and methods) The two
approaches complement each other in the detection of regions carrying less
homozygotes than expected OHW identifies regions in which a few markers
have extreme homozygote deficit while LOH detects regions in which many
markershaveamoderatedeficit
In the SNP dataset 1884 markers were significantly out of Hardy=Weinberg
equilibrium(Ple84710=8)Atotalof485windowswereidentifiedwithatleast
3 markers out of 10 exceeding this threshold Among these 84 contained
markerssignificantbecauseofhomozygotedeficitThese84windowsidentified
11OHWcandidateregionsin9chromosomes(TableS5)
TheLOHapproachidentified2335windowsbelow=3Zscoreofthestandardised
differencebetweenobservedandexpected frequencyofhomozygotes forMAF
Theseidentified326LOHcandidateregionsspreadonallautosomes(TableS6)
Onlyregionsformedbyatleast3singlewindowswereretainedreducingLOH
candidateregionsto240
Deleterious variants (SIFT deleterious and stop codon gainedlost) were
intersectedwithsignificantOHWandLOHregionSevenofthesemappedwithin
OHW sliding windows 36 within LOH sliding windows Among these 5 were
located in regions significant in both OHW and LOH Other 51 deleterious
variants mapped in the immediate vicinity (+=50Kb) of significant windows
These 83 variants affect 66 unique genes candidate to carry deleterious
recessivealleles(TableS7)
Five regionswere common toOHWandLOHoneonBTA4 (LOH=74959610=
75151417 OHW=74970653=75151417) two on BTA7 (LOH=9626435=
10099512 OHW=9628735=10099512 LOH=10575691=11171132
118
OHW=10486424=11087441) one on BTA15 (LOH=80299924=80928311
OHW=80299924=80921274) and one on BTA18 (LOH=28122373=
28185525OHW=28106022=28164693)
Step3Assessingcandidatedeleteriousvariants
After the two filtering step previously described a total of 191 SNPs were
retained out of the 3433 classified as deleterious by SIFT and Annovar and
analysed further These are carried by 163 genes Eighteen genes carried 2
candidateSNPsthreegenes3SNPsandonegene5SNPsInaddition4SNPsin
four geneswere identified as potential candidates in both the analysis of EBV
tailsandofbullpopulation
A totalof12SNPs introducedstopcodonswhile the remaining inducedamino
acid substitutionsor splicing variants In the filtereddataset theproportionof
stop codons on total variation (63) was only slightly higher than the
proportion found in the entire dataset (51 225 stop codon out of 4391
deleteriousSNPs)
Comparativegenomics
Proteinmultiplealignmentwassuccessfulin146outof191comparisonsIn26
casesthemostfrequentaminoacidincattlewasveryhighlyconservedeitherin
allvertebrates(N=9)orinallmammals(N=17)Inother23casestheaminoacid
was conserved in at least 90 of mammalian species In the last class were
generally included amino acids that changed in mammalian species
phylogenetically verydistant from cattle (eg platypus and related species) In
97 instances the amino acid could change quite freely and occasionally both
variantsobservedincattlewerecarriedbyotherspecies
HaplotypeanalysisofHDpopulationdata
The search for haplotypes associated to the191 variants for further testing in
the population identified 101 unique and invariant haplotypes common to all
119
carriersofatargetvariantanddifferentfromallthosefoundinnoncarriersIn
additional36casesanoncompletelyinvarianthaplotypecouldbeassociatedto
themutationConverselyin54instancesareliablehaplotypecouldnotbefound
(table S8) In total 38 haplotypes had no homozygous individuals in the
population Thiswas expected in the case of 33 rather rare haplotypes (fewer
than 100 heterozygotes observed in the population) Conversely 5 haplotypes
were rather frequent (from 170 to almost 300 heterozygotes observed in the
population)andtheexpectationwastofind78to222homozygousindividuals
in thepopulation insteadof thenone foundAnothervarianthadaremarkable
high frequency (35) and a clear outlier behaviour Only 11 homozygous
individualswereobservedoutofthe121expected
Functionalanalysisofcandidategenes
To assess the potential deleterious effects of the variants identified we also
retrieved functional information for each gene according to the following
parametersi)associationtoknownMendelianorcomplexdiseaseslistedinthe
OMIM database ii) availability and effect of gene knock=out in mice iii)
expression profileswith particular attention in reproductive tissues from the
GeneExpressionAtlas(Table4)
Twenty genes out of 163 have a recognized phenotype in humans These are
mostlyseveresyndromesUnfortunatelyonlyafewgeneshaveanull=miceline
producedwithphenotypicdataavailable
According to the expression data available from Ensembl these genes are
generally (with someexceptions)expressed inavarietyof tissueswitha clear
tendencytobehighlyexpressedintestis
120
Tableamp4Informationon163genesanalyzedconcerningeventualphenotypesinmanorknockoutsinmouseHeaderampcodesAbrainBcolonCheartDkidney
EliverFlungGskeletalmuscleHspleenItestisColoursrepresentthegenelevelofexpressionindifferenttissuefromlow(whitegrey)tohigh(blue)NCin
FunctioncolumnmeansldquoNotClassifiedrdquo
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000000079
CCSAP
Cells
9
1
03
1
07
2
06
2
2
NC
ENSBTAG00000000149
USP40
Cells
6
7
2
7
5
7
2
8
9
fertility
ENSBTAG00000000414
FUT5
Nomatch
37
1
cellular
process
ENSBTAG00000000918
ERMARD
Periventricular
nodularheterotopia
6Xlinked
periventricular
heterotopia6q
terminaldeletion
syndrome
Periventricular
nodularheterotopia
Nomatch
5
4
4
9
6
7
4
9
17
cellular
process
ENSBTAG00000000998
SLC39A6
Cells
6
9
2
7
4
10
1
10
9
cellular
process
ENSBTAG00000001133
VWA8
Mice
2
3
6
7
10
3
3
4
4
blood
metabolism
ENSBTAG00000001140
IAH1
httpswwwmousephenotypeor
gdatagenesMGI1914982sect
ionassociations
BOTHSEXadiposetissue
behaviourneurological
growthsizebodyFEMALE
homeostasismetabolismMALE
limbsdigitstailskeleton
9
18
30
60
23
19
12
17
30
cellular
process
ENSBTAG00000001262
IRGQ
Cells
6
1
5
3
06
2
5
04
2
immunity
ENSBTAG00000001286
ELMO3
httpswwwmousephenotypeor
gdatagenesMGI2679007sect
ionassociations
Decreasedstartlereflex
02
27
14
7
7
02
1
fertility
ENSBTAG00000001291
OR8H3
Nomatch
fertility
ENSBTAG00000001500
FIGNL1
Cells
08
2
01
05
07
08
04
2
3
cellular
process
ENSBTAG00000001505
GRK4
Cells
10
7
09
4
6
3
2
6
218
signalling
ENSBTAG00000002220
SPTLC1
Hereditarysensory
andautonomic
neuropathytype1
No
17
27
3
28
12
28
5
19
6
metabolism
ENSBTAG00000002288
NT5DC4
Nomatch
02
cellular
process
ENSBTAG00000002289
NLRP14
No
04
cellular
process
ENSBTAG00000002660
CCDC11
Situsinversustotalis
Nomatch
3
1
06
2
07
3
03
2
54
fertility
121
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000002809
CAPN10
AutosomalRecessive
MentalRetardation
DIABETESMELLITUS
NONINSULIN
DEPEND
ENT1
Cells
2
13
41
33
29
immunity
ENSBTAG00000002966
DNAJC13
Youngadultonset
Parkinsonism
No
5
52
94
62
84
fertility
ENSBTAG00000003231
LIM2
Pulverulentcataract
Cells
01
02
01
02
cellular
process
ENSBTAG00000003508
CFAP69
httpswwwmousephenotypeor
gdatagenesMGI2443778sect
ionassociations
Abnormalcirculatinginsulinlevel
MaleInfertilityDecreasedmean
plateletvolum
eIncreasedleanbody
mass
203
07
1
01
7NC
ENSBTAG00000003523
UGT2B15
Nomatch
34
metabolism
ENSBTAG00000004081
FAT3
No
2
03
01
01
05
signalling
ENSBTAG00000004555
LRP2
DONN
AIBARROW
SYND
ROME
Intellectualdisability
Cells
56
3
01
disease
ENSBTAG00000004772
THEM
4
Cells
8
16
295
27
52
63
metabolism
ENSBTAG00000004860
SLC27A6
Cells
6
02
10
602
207
01
immunity
ENSBTAG00000005021
SEMA5A
Monosom
y5p
Cells
9
04
89
13
08
08
6immunity
ENSBTAG00000005170
GPR114
Mice
101
02
01
09
6
01
fertility
ENSBTAG00000005248
OFCC1
No
01
01
03
01
01
01
03
NC
ENSBTAG00000005340
SERGEF
Cells
7
10
35
15
58
23
cellular
process
ENSBTAG00000005357
VPS11
Mice
14
13
619
717
815
15
cellular
process
ENSBTAG00000005466
C8H8orf58
Nomatch
05
01
08
02
103
02
04
NC
ENSBTAG00000005514
TRIO
Cells
17
66
10
210
88
7immunity
ENSBTAG00000005633
RGNEF
httpswwwmousephenotypeor
gdatagenesMGI1346016sect
ionassociations
Vertebralfusion
36
214
05
801
08
1cellular
process
ENSBTAG00000006021
CEP250
Mice
3
22
43
41
88
metabolism
ENSBTAG00000006027
USP34
Mice
11
63
74
88
912
fertility
ENSBTAG00000006532
CDK5RAP2
Autosomalrecessive
primary
microcephaly
httpswwwmousephenotypeor
gdatagenesMGI2384875sect
ionassociations
skeletonimmunesystem
growthsizebodyhem
atopoietic
system
5
34
92
74
784
developm
ent
ENSBTAG00000007001
SLC26A1
No
03
01
02
32
903
01
07
metabolism
ENSBTAG00000007340
LOC101906349
Nomatch
NC
ENSBTAG00000007568
MEPE
Cells
cellular
process
ENSBTAG00000007910
EXOC7
Cells
13
24
23
22
12
22
15
22
28
cellular
process
122
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000008650
CAMK1D
Cells
24
2
01
06
1
2
03
7
3
signalling
ENSBTAG00000008797
Uncharacterized
Nomatch
09
04
07
04
NC
ENSBTAG00000008871
DDX4
Cells
02
01
01
132
cellular
process
ENSBTAG00000008996
SFI1
Mice
2
1
06
3
04
3
2
5
NC
ENSBTAG00000009014
UPK1B
Mice
2
2
2
5
4
4
2
7
3
fertility
ENSBTAG00000009030
NOX5
Nomatch
04
01
8
01
immunity
ENSBTAG00000009033
TEX37
httpswwwmousephenotypeor
gdatagenesMGI1921471sect
ionassociations
Abnormalstartlereflex
03
83
NC
ENSBTAG00000009064
CSF2RB
Congenitalpulmonary
alveolarproteinosis
Mice
01
2
04
06
2
23
03
17
05
signalling
ENSBTAG00000009144
BPIFA2A
Nomatch
6
06
05
metabolism
ENSBTAG00000009337
PNMAL1
Cells
8
05
01
3
02
03
03
04
3
Immunity
ENSBTAG00000009444
SLC15A5
Cells
01
cellular
process
ENSBTAG00000009568
KANK2
Cells
1
17
30
9
3
31
9
10
3
cellular
process
ENSBTAG00000009569
DOCK6
AdamsOliver
syndrome
Cells
2
5
10
12
5
15
5
6
21
metabolism
ENSBTAG00000009836
CHGA
Mice
140
555
01
2
02
01
07
1
metabolism
ENSBTAG00000010522
Uncharacterized
Nomatch
01
02
02
03
2
metabolism
ENSBTAG00000010601
DOLK
Familialisolated
dilated
cardiomyopathy
No
4
8
6
22
9
6
5
5
12
metabolism
ENSBTAG00000010847
FOXN4
Cells
01
01
metabolism
ENSBTAG00000010954
ART3
Cells
02
1
55
01
08
2
10
06
71
metabolism
ENSBTAG00000011011
SSH2
No
3
09
5
4
2
2
6
7
10
cellular
process
ENSBTAG00000011387
NAT9
Cells
6
7
3
8
6
8
4
6
6
metabolism
ENSBTAG00000011420
CA9
Cells
04
3
02
2
03
1
01
2
56
metabolism
ENSBTAG00000011433
RGP1
httpswwwmousephenotypeor
gdatagenesMGI1915956sect
ionassociations
Completepreweaninglethality
7
19
11
18
8
12
9
8
25
cellular
process
ENSBTAG00000011622
C18orf21
Mice
9
10
6
6
2
10
5
10
34
NC
ENSBTAG00000011683
NUMB
Cells
21
25
8
35
17
47
6
8
19
immunity
ENSBTAG00000011684
RAB11FIP1
Cells
6
01
8
09
11
02
1
4
NC
ENSBTAG00000012053
ACCSL
No
03
01
cellular
process
ENSBTAG00000012314
LDLR
Homozygousfamilial
hypercholesterolemia
Cells
8
33
11
21
17
24
2
15
2
cellular
process
123
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000012326
LOC101906218
Nomatch
11
01
01
09
01
NC
ENSBTAG00000012458
PSAPL1
Cells
01
NC
ENSBTAG00000012565
PCNX
No
3
32
32
33
48
developm
ent
ENSBTAG00000012833
ATG2B
No
4
21
21
21
29
NC
ENSBTAG00000012921
KIF24
Mice
05
07
03
08
02
202
214
metabolism
ENSBTAG00000013568
UEVLD
Cells
5
41
53
42
214
metabolism
ENSBTAG00000013685
KRT31
Mice
cellular
process
ENSBTAG00000013753
DDRGK1
Mice
22
32
18
56
55
25
14
22
37
NC
ENSBTAG00000013789
OR6K6
No
NC
ENSBTAG00000013838
ALLC
Mice
02
01
78
metabolism
ENSBTAG00000013916
HERC2
Mentalretardation
autosomalrecessive
38SKINHAIREYE
PIGM
ENTATION
VARIATIONIN1
Developm
entaldelay
withautismspectrum
disorderandgait
instability
No
8
74
83
74
65
metabolism
ENSBTAG00000014001
MAP1A
Cells
136
23
22
202
43
352
cellular
process
ENSBTAG00000014058
LDB2
No
28
33
12
825
410
3cellular
process
ENSBTAG00000014284
ALPK2
No
01
10
01
304
02
cellular
process
ENSBTAG00000014750
EPB41L4A
httpswwwmousephenotypeor
gdatagenesMGI103007secti
onassociations
Decreasedcirculatingglucoselevel
Abnormalconeelectrophysiology
Immunesystem
s1
22
20
07
12
13
43
NC
ENSBTAG00000014752
AKAP11
httpswwwmousephenotypeor
gdatagenesMGI2684060sect
ionassociations
Nopheno
24
72
63
306
68
cellular
process
ENSBTAG00000014768
ZNF786
Cells
3
23
42
48
33
cellular
process
ENSBTAG00000014794
SCYL3
Cells
6
10
59
613
516
26
cellular
process
ENSBTAG00000015027
Clusterin
httpswwwmousephenotypeor
gdatagenesMGI88423sectio
nassociations
Aggressionvertebralfusion
05
03
03
02
05
3NC
124
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000015210
LOXHD1
DEAFNESS
AUTOSOMAL
RECESSIVE77
Autosomalrecessive
nonsyndromic
sensorineuDominant
lateonsetFuchs
cornealdystrophy
deafnessautosomal
recessivetype77
No
3
03
NC
ENSBTAG00000015294
COX10
Leighsyndrome
Mitochondrial
complexIVdeficiency
(MTC4D)
Cells
4
6
24
5
2
3
14
3
16
cellular
process
ENSBTAG00000015335
BAI3
No
38
03
8
01
01
2
signalling
ENSBTAG00000015490
HS1BP3
Cells
1
10
5
9
4
8
2
1
37
cellular
process
ENSBTAG00000015602
C7H5orf45
Nomatch
10
cellular
process
ENSBTAG00000015839
MAP4
No
27
85
104
35
11
100
82
46
61
cellular
process
ENSBTAG00000015912
DMKN
No
01
04
03
NC
ENSBTAG00000015980
FASN
AutosomalRecessive
MentalRetardation
Cells
19
29
7
12
6
39
6
7
22
metabolism
ENSBTAG00000015985
GPR20
httpswwwmousephenotypeor
gdatagenesMGI2441803sect
ionassociations
Anatomicaldefectsvarioustissues
03
02
08
01
09
signalling
ENSBTAG00000016495
LINS
AutosomalRecessive
MentalRetardation
Cells
6
3
1
5
3
3
1
5
12
NC
ENSBTAG00000016691
LACC1
Mice
2
9
1
9
9
8
08
7
27
NC
ENSBTAG00000017096
ERICH1
Cells
6
4
2
4
2
5
2
5
13
metabolism
ENSBTAG00000017165
MATN2
No
2
7
2
20
08
3
2
2
7
cellular
process
ENSBTAG00000017418
RRP1B
Cells
6
4
4
5
3
7
3
12
159
cellular
process
ENSBTAG00000017436
ALS2CR11
Cells
01
37
disease
ENSBTAG00000017505
PAXIP1
Cells
2
3
2
4
2
4
2
3
12
signalling
ENSBTAG00000017576
GPR144
Nomatch
02
signalling
ENSBTAG00000018528
USP45
No
1
1
04
1
05
06
02
2
04
cellular
process
ENSBTAG00000018810
THBS2
INTERVERTEBRAL
DISCDISEASE
Cells
1
2
04
09
1
09
04
1
2
fertility
ENSBTAG00000019072
PSD4
Cells
7
8
1
1
7
02
6
4
cellular
process
125
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000019265
AGK
Sengerssyndrom
e
Congenitalcataract
hypertrophic
cardiomyopathy
mitochondrial
myopathy
Cells
7
36
43
52
65
metabolism
ENSBTAG00000019752
BPIFA2B
Nomatch
01
1metabolism
ENSBTAG00000019799
FCAM
R
Cells
07
01
02
01
2
7immunity
ENSBTAG00000019961
ATP4B
Cells
12
03
1
05
cellular
process
ENSBTAG00000020414
DHX37
No
3
33
73
73
413
cellular
process
ENSBTAG00000021073
KIAA1549
PILOCYTIC
ASTROCYTOM
ASOMATIC
httpswwwmousephenotypeor
gdatagenesMGI2669829sect
ionassociations
behaviourneurological
homeostasism
etabolism
hematopoieticsystem
3
04
23
01
21
04
3NC
ENSBTAG00000021233
OR5B17
No
fertility
ENSBTAG00000021375
OR10Q1
No
fertility
ENSBTAG00000021643
EFCAB12
Cells
01
1
9cellular
process
ENSBTAG00000021645
MBD4
Cells
8
43
82
51
10
6cellular
process
ENSBTAG00000022509
DNAH
9
No
02
5
05
5metabolism
ENSBTAG00000022858
LOC783561
Nomatch
fertility
ENSBTAG00000026247
PKHD1L1
Cells
02
01
cellular
process
ENSBTAG00000026323
LYSB
Nomatch
68
metabolism
ENSBTAG00000026350
SPATC1
Cells
05
06
01
01
01
39
NC
ENSBTAG00000027390
RTFDC1
Mice
28
19
21
20
922
13
20
49
NC
ENSBTAG00000031115
Uncharacterized
Nomatch
01
cellular
process
ENSBTAG00000031718
OGFR
Mice
6
17
612
517
39
1fertility
ENSBTAG00000032264
Uncharacterized
Nomatch
52
4
22
NC
ENSBTAG00000032481
DAPL1
No
1
6
07
01
13
10
cellular
process
126
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000032527
ERCC6
CEREBROOCULOFACI
OSKELETAL
SYND
ROME1
COCKAYNE
SYND
ROMEBDe
SanctisCacchione
syndromeMACULAR
DEGENERATION
AGERELATED5UV
SENSITIVE
SYND
ROME1COFS
syndromeCockayne
syndrometype1
Cockaynesyndrome
type2Cockayne
syndrometype3UV
sensitivesyndrome
Cockaynesyndrome
typeB(CSB)De
SanctisCacchione
syndrome(DSC)UV
sensitivesyndrome
(UVS)cerebrooculo
facioskeletal
syndrometype1
(COFS1)
Mice
3
205
22
207
42
cellular
process
ENSBTAG00000032821
SCEL
No
01
02
17
04
NC
ENSBTAG00000033954
PRPS1L1
No
35
metabolism
ENSBTAG00000034991
SLX4IP
Cells
6
21
32
207
33
NC
ENSBTAG00000035226
TOR1AIP1
Cells
9
12
513
816
716
83
NC
ENSBTAG00000035309
OR4F15
No
fertility
ENSBTAG00000035708
LOC527795
Nomatch
01
01
23
metabolism
ENSBTAG00000035985
LOC101909743
OR8K1
No
fertility
ENSBTAG00000036019
RIPPLY3
httpswwwmousephenotypeor
gdatagenesMGI2181192sect
ionassociations
hemopoieticsystem
01
04
01
9
01
malformation
ENSBTAG00000037399
Uncharacterized
Nomatch
2
92
4
24
05
5signalling
ENSBTAG00000038606
DBDR
No
02
01
signalling
ENSBTAG00000038831
PHYHD1
No
2
18
21
23
30
18
16
15
1metabolism
ENSBTAG00000039110
OR8U1
No
fertility
ENSBTAG00000039117
FAM179B
Cells
14
206
31
308
33
cellular
process
ENSBTAG00000039520
SIRPB1
Cells
4
08
32
516
05
12
01
signalling
127
Geneamp
Geneampnam
eampHum
anampdefectamp
Knock5OutsampMouseamp
MouseampPhenotypeamp
AampBamp
CampDamp
EampFamp
GampHamp
IampFunctionamp
ENSBTAG00000040568
Uncharacterized
Nomatch
4
307
54
33
25
cellular
process
ENSBTAG00000045558
LOC100299084
Nomatch
1
22
110
10
09
52
fertility
ENSBTAG00000045588
LOC100298356
Nomatch
metabolism
ENSBTAG00000045709
LOC514057
Nomatch
fertility
ENSBTAG00000046175
Uncharacterized
Nomatch
01
01
cellular
process
ENSBTAG00000046228
Olfactory
receptor
Nomatch
fertility
ENSBTAG00000046239
C10orf12
No
03
104
07
108
06
209
cellular
process
ENSBTAG00000046285
OR8I2
No
fertility
ENSBTAG00000046360
Uncharacterized
Nomatch
NC
ENSBTAG00000046423
LOC618007
Nomatch
NC
ENSBTAG00000046527
LOC100139830
Nomatch
fertility
ENSBTAG00000046590
FAM181A
Cells
09
44
NC
ENSBTAG00000046593
Uncharacterized
Nomatch
20
metabolism
ENSBTAG00000046727
Olfactory
receptor
Nomatch
04
fertility
ENSBTAG00000047150
RTEL
Cells
4
23
64
55
616
cellular
process
ENSBTAG00000047166
SH2D7
Mice
03
101
01
1
02
02
2immunity
ENSBTAG00000047174
Uncharacterized
Nomatch
03
7105
02
01
271
05
03
NC
ENSBTAG00000047317
CL43
Nomatch
01
120
01
immunity
ENSBTAG00000047587
Uncharacterized
Nomatch
22
09
09
35
03
4
06
2cellular
process
ENSBTAG00000047661
FABP12
httpswwwmousephenotypeor
gdatagenesMGI1922747sect
ionassociations
Growthsizebody
01
02
69
metabolism
ENSBTAG00000048021
PNMAL2
No
04
03
2
03
03
02
NC
ENSBTAG00000048153
Uncharacterized
Nomatch
2
05
01
02
01
cellular
process
128
Discussion(
Deleterious recessive variants arenaturally present in outbred animal species
Theydonotcreateproblemsaslongastheyremainatlowfrequencyinrandom
mating populations Under these conditions these variants remain in the
heterozygousstateanddonotproducephenotypiceffectsHoweverassoonas
their frequency increases andmatingoccursbetween related individuals their
deleteriouseffectsbecomeapparentandhighlyaffectthewelfareandviabilityof
animalscarryingtheminthehomozygousstate
In industrialdairy cattlebreedscrossbreeding isgenerallyavoidedandbreeds
areinpracticegeneticisolatesinwhichartificialinsemination(AI)iswidelyused
for rapidly spreading thegeneticprogress in thepopulationAsa consequence
elitesireshavemanythousandandsometimeshundredsofthousanddaughters
Under these conditions mating between relatives becomes practically
unavoidable in the longterm In fact inspiteof theclosecontrolof inbreeding
andofthematingsystemgeneticdefectsregularlyappeartothesurfaceSome
examples are the Complex Vertebral Malformation (CVM) and the Bovine
Leucocyte Adhesion Deficiency (BLAD) in Holstein (Schuumltz et al 2008) and
WeaverdiseaseinBrown(McClureetal2013)Inthesecasestheeffectsofthe
recessivedeleteriousallelesareclearandmoleculardiagnostictoolshavebeen
developedtoavoidtheirfurtherspreadingandtoinitiatetheeradicationprocess
However recessive deleterious alleles may produce their effects in a more
deceitful way for example by affecting fertility inducing early lethality or
influence disease susceptibility All these effects are difficult to trackwith the
current methods of phenotype measurements but have an impact on the
economyofthelivestocksectorandonanimalwelfare
InthispaperwesoughtfordeleteriousrecessiveallelesinItalianHolsteincattle
This breed is recently experiencing a decrease in the genetic trend of fertility
traits (Pryce et al 2014) therefore we started our search from the exome
sequences of 20 sires 19 of which having extreme EBV values for male and
femalefertility
130
The choiceof sequencingexomeswasa compromisebetween completenessof
informationaccuracyofvariantandgenotypedetectionandcostExomesclearly
give incomplete information since they miss all variation in regulatory sites
outside gene boundaries (ENCODE Consortium 2012) These may be very
relevantforthecontroloftraitvariationbuttheyarestillverypoorlyannotated
in livestock On the other hand sequencing exomes yields millions instead of
billionsbasepairsadeepersequencingoftargetregionsandaconsequentmore
reliablevariantandgenotypecalling(FigureS1andS2)Theanalysisofexome
data is also less demanding in terms of computer time (thousands instead of
millions variants are to be analysed) and easier in terms of biological
interpretation Exome sequencing revealed to be very effective in the
identificationofanumberofmutationsaffectinghumanhealth(Bamshadetal
2011 Yang et al 2013) All thesewere defects caused bymutations in single
genes producing clear phenotypic effects havingMendelian inheritance These
conditionsdonotholdinourinvestigationthereforewehadtodevisestrategies
to integrate exome data with other information collected from the Holstein
populationandotherspecies
Theoverallprocedureforidentifythecandidatedeleteriousrecessivewasbased
on the rationale that deleterious recessive alleles in coding sequences are
expectedtobei)causedbyseverealterationoftheproteinsencodedbygenes
ii)mostlyifnotexclusivelycarriedattheheterozygousstateinthepopulation
iii)unevenlydistributedinanimalsofhighandlowfertilityiv)likelyconserved
indifferentspeciesbecauseofselectionconstraintv)likelyassociatedtogenetic
defectsinanyspecieswhenseverelydamaged
A stepwise procedure was set up for reducing candidates to a number
manageable for functional analysis (Figure1) In the first stepmutationswere
discoveredinthesecondfilteredandinthirdassessed
Thisprocedureidentifiesonlysomeofthedeleteriousrecessivesexistinginthe
populationanalysedSincewesequencedonlyexomesandnotwholegenomes
ouranalysesislimitedtocodingsequencesandatthemomenttoSNPvariation
Wethereforemissregulatoryvariationsoutsidegenesand theeffectsof indels
andmorecomplexvariations
131
ThenumberofanimalssequencedislimitedDeleteriousrecessivesareexpected
toberareinthepopulationandwehavesampledonly20animalsfromasubset
ofsiresauthorizedtoAI inItalyAlthoughwehavetakenamongthemthose in
theminus tail of fertility trait thatmighthave ahigher genetic load itrsquos likely
thatwehavemissedothervariantsexistinginthepopulationOntheotherhand
variantscarriedbyAIauthorizedbullsaretheonesthatwillhavealargerimpact
onthepopulationifwidelyspreadbyelitebullsandthereforetheyarethemost
interestingfromabreedingpointofview
Most deleterious variants are expected to be very rare and therefore not
detected by the approach here adopted or detected and afterwards discarded
duringthefilteringprocesses
Sequencing is prone to errors Some real variants may have been discarded
duringQCofsequencedataduetopoorsequencequalityorpoorcoverageofthe
region
Someofthevariationsclassifiedasldquonondeleteriousrdquobythesoftwareusedmay
indeedhave an effect In fact synonymousmutation arenon`neutralwhenever
thedifferenttRNAsforasameaminoacidhavedifferentrelativeabundanceand
translationrates(Goymer2007)
1(Searching(candidate(deleterious(variants(in(exomes(
ExomesproducedafairamountofdataAlltogether5355Mbweresequenced
These represent 9998 of the exome sequences targeted by probes Among
theseonly35hadasequencingdepthacrossanimalslt60XSequencingdepth
is a key factor for variant detection and even more so when the genotype of
sequencedanimalsistobeinferredInvestigatingtheconcordancebetweenthe
genotypes detected at more than 20K SNP by both Beadchip and exome
sequencingwe observed a clear inverse relationship between sequence depth
andnumberofdiscordances(TableS3andFigureS3)Theaveragesequencing
depth expected (50X) and realised in practice (4058) in this investigation
yieldedratheraccurategenotypessincetheaverageconcordanceobservedwas
over97and increased toover99whentwooutlieranimalswereremoved
132
from the dataset The reason for the outlier behaviour of these samples waslikely a suboptimal quality of DNA submitted to sequencing Overall almost215000 variants passed quality controls and among them 192440 autosomalSNPs have been investigated in detail As found in other investigations mostvariants were rare (Peloso et al 2014) For example the MAF distributionobservedin46904SNPsclassifiedneutralhasameanof0197andamedianof0167values(FigureS6)
ThefirststepinourresearchwastoidentifySNPscausingseverealterationsinthe proteins coded by genes These were relevant changes in conformationalterationsofsplicingsitesortruncationsofproteinsToaccomplishthistaskwereliedontwosoftwareSIFTandANNOVARthatidentified4902suchvariationsin3423genes Interestingly the subsetofdeleteriousvariantshas significantlylowermean (0156)andmedian (010) compared toneutral variations (FigureS6) This subset is therefore at least enriched in variants under purifyingselectionhavingadeleteriouseffectatthepopulationlevel
Thenumberofdeleteriousallelesdetectedishighanditisunlikelytheyallhavea relevant phenotypic effect even if most variants are rare In fact variantsdeleterious for a protein are not automatically deleterious for an organism asthe protein change induced by the variant although relevant may not affectproteinfunctionvariationmayoccurinonememberofagenefamilysothatitsfunction may be compensated by other family members the protein functionmay be compensated by other proteins or the affected metabolic pathwaycompensated by alternative pathways The challenge was therefore to filter asubsetofvariantslikelyhavingaphenotypiceffecttobeproposedforvalidationatalargerscaleToaccomplishthistaskwesoughtforadditionalinformationinexomesequencesandinalargepopulationofHolsteinsiresgenotypedwiththebovineHDBeadchip
( (
133
2(Filtering(deleterious(variants(
The second step consisted in filtering the several thousand deleterious SNPsidentified in exomes to search for SNPs and genes that worth deeperinvestigationWeusedtwostrategiestoaccomplishthistask
21$ Distribution$ of$ deleterious$ variants$ in$ fertility$ EBV$ plus9variant$ and$
minus9variant$animals$
Fertility traits are controlled by a high number of genes and are highlygenetically heterogeneous Even using a selected subset of variants identifiedfrom exome sequencing classic association analyses would need sample sizesmuch larger that the one available in this investigation to have a reasonablestatistical power (Stitziel et al 2011) The first approachwe appliedwas theanalysisof SNPallelic andgenotypicdistributionbetween theminusandplus`variant tails of male and female fertility EBVs Given the very low number ofanimals sequencedwedecided to use a simplemodel to exclude from furtheranalyses all markers showing no evidence of association with the EBV tailsratherthantofindsignificantassociations
We divided samples in cases (minus`variant animals for fertility EBV) andcontrols (plus`variant animals) and tested all possible models of inheritanceimplemented in thePlinkpackage for tackling simple genetic defectsWe thenused an empirical point`wise significant threshold based on permutations tohighlight SNPs having alleles or genotypes showing an uneven distributionbetweencasesandcontrolsandconsideredtheseasworthfurtherinvestigationAtotalof112SNPsin101genespassedthisfilteringprocess
$
22$Detection$of$regions$carrying$deleterious$mutation$from$800K$SNP$data
The analysis of exomes alone was therefore insufficient to tag recessivedeleteriousgenesinItalianHolsteinandwesoughtforadditionalinformationin1009ItalianHolsteinbullsgenotypedathighdensitythatarepartofthegenomicselection program run by ANAFI The rationale of the approach is that
134
deleterious recessives are expected to be absent or rare in the homozygous
status in the population Hence they may be tagged searching for genome
regions having markers or haplotypes lacking or having significantly less
homozygotesthanexpectedunderaneutralmodel
We used two complementary approaches to identify these candidate regions
One(OHW)searchedforverystrongsignalsproducedbysinglemarkersoutof
Hardy`Weinberg equilibrium at Ple847x10`8 A minimum of three significant
markersinaslidingwindowsof10wassettoreachsignificancetocapturemore
reliablesignalsavoidexcessivesinglemarkernoiseandaccount forapossible
Wahlund effect due to presence of the genetic structure in the population
difficulttoavoidinthesetofprogenytestedbullsanalysedThesecondapproach
(LOH) captured more moderate signals but from multiple markers in larger
regions Both approaches were designed to detect signals in regions in which
deleteriousvariantsarestillinsegregationandarenotexcessivelyrare
Asexpectedthetwoapproachesidentifiedregionsthatonlypartiallyoverlapped
(TablesS5S6eS7)Inparticular5regionsoutofthe11identifiedbyOHWwere
in common to the 240 identified by LOH Two of these regions overlap with
previous investigations that used the BovineSNP50 Beadchip Sahana et al
(2013) found a region candidate to carrypotential lethal haplotypes inNordic
Holstein on BTA7 in the region 6708007`10907840 This interval overlaps
withbothBTA7regionsdetectedinthisstudyVanRadenetal(2011)detected
inUSHolsteinalargeregiononBTA15spanning76Mb`82M(onBTA30)that
includedLRP4 gene responsible for themulefootdefectOuranalyseswith the
HDBeadchipindicatesontheonesidethatthelargeregiondetectedbySahana
etal(2013)mayconsistintwonearbyregionsOntheothersidethesignalwe
detectedonBTA15doesnot includetheLRP4 locusthat iscompletely fixed in
theItalianbullspopulationanalysed
In total83mutations in66genes fell in eitherorbothOHWandLOHregions
These were joined to the SNPs output of the EBV tail analyses for further
assessment
(
135
3(Assessing(filtered(deleterious(variants(
Different characteristics of the filtered deleterious variants were assessed to
identify among them strong candidates to submit to a large`scale validation
studyTheassessmentphasepermittedtogainadditionalinformationinfavour
or against SNP and gene candidature to be deleterious We quantified this
evidenceusingasubjectivescore
31$Comparative$genomics$analysis$
In the case of comparative genomics the score quantified the level of
conservationof thealternativeaminoacidresidues inducedbySNPvariants in
ItalianHolsteincattleComparisonwasrunagainstaset100vertebratespecies
that included 62 mammals The level of conservation is a good proxy for the
expected severity of a mutation In the case of the 46 SNPs that induced the
change of an amino acid highly conserved in the species investigatedwemay
hypothesize a strong selection pressure in favour of the maintenance or the
invariant amino acid at that position Therefore the amino acid change or the
stopcodonweobserved incattlemayaffectnotonlytheproteinstructureand
function but also the phenotype of homozygous animals Interestingly in all
thesecasestheSNPallelecodingforthenonconservedvariantswastheminor
allele
Converselyinallcasesofhighvariabilityofthetargetaminoacidacrossspecies
(asmanyassevendifferentaminoacidshavesometimesbeenobserved)poorly
supports the presence of phenotypic effects due to its change Sometimes the
informationcollectedisagainstthepresenceofadeleteriousmutationasinthe
case of the presence in different species of both alleles identified by exome
sequencingThis suggests thatneither of these is greatly affecting the viability
andfitnessofanimalscarryingthem
Wealsoobservedanumberoffailures(N=45)intheliftoverofSNPcoordinates
fromcattletohumanorinproteinalignmentThishappenedforexampleinthe
caseofuncharacterized cattleproteins thathavenoorthologous in thehuman
genome and of someolfactory receptors that belong to a highly variable gene
136
family Even if the failure of alignment is an indication of poor proteinconservationacrossspeciestheconservationofthetargetaminoacidcouldnotbeassessesandinthesecaseswepreferredtoconsidertheinformationmissingandtoscoreitaccordingly
32$Haplotype$analysis$
Asecondassessmentofthefilteredcandidateswasontheirbehaviourinthesirepopulation The ideal test would be the direct genotyping of the SNPs in thepopulationand theevaluationof theirallelicandgenotypicproportions In theabsenceofthisinformationwefirstinferredtheHDhaplotypeassociatedtoSNPallelesandassessedthebehaviourofthehaplotypeinthepopulationassumingthisasasurrogateoftheSNPbehaviourSincedeleteriousallelesareexpectedtoberareandthis isalsowhatweobserved fromcomparativegenomicanalysesweinvestigatethefrequencyonlyofthehaplotypecarryingtheminorSNPalleleagainsttheexpectationto findthecorrespondinghaplotypeneverhomozygousor a number of times much lower than expected from its frequency in thepopulation The strategy yielded some interesting confirmation but was onlypartiallysuccessful
Firstofalltheidentificationofauniquehaplotypecarryingtherarevariantwasnot always possible or straightforward In some cases carriers of a same rarevariantdidnotshareexactlythesamehaplotypeWeacceptedsomevariationifthe haplotype remained clearly distinguishable from those of animals notcarryingtherarevariantbutatthesametimereducedthestringencyofthetestInquiteanumberofcasesasamehaplotypecarriedbothcandidatedeleteriousSNP variants This may occur for example in the case of recent mutations IntheselastcasesthetestcouldnotbecarriedoutFinallymostrarevariantswereassociatedtorarehaplotypesthatarenotlikelytohavehomozygousindividualsinthepopulationindependentlyoftheSNPeffect
Thisassessmentwas thereforeonly significant fora fewhaplotypes thathadalarge number of heterozygous individuals and no or only a very fewhomozygotesandforthosethathadalargenumberofhomozygotes
137
33$Functional$analysis$
The final assessment was the analysis of the function of genes carrying the
candidate deleterious SNP This has some limitation since knowledge of gene
function is only partial and mainly gained from investigations on species
different from cattle In any case we considered the association to genetic
defectsinhumanandtheinductionofseverephenotypesinmouseknockoutas
a positive indication of the likely phenotypic effect of the severe protein
alterationweobservedincattleInadditiongenefunctionandexpressionprofile
was evaluated in search for evidence of gene involvement in fertility
developmentorprocessesfundamentalforcellandorganismviability
Amonggenesidentified22areassociatedtohumansyndromes(Table4)Eleven
syndromesareneurologicaldiseasesaffectingbraindevelopmentand inducing
mental retardation (eg Donnay`Barrow Syndrome and Autosomal recessive
primary microcephaly) the others influence development (eg
Cerebrooculofacioskeletal Syndrome 1 and Adams`Oliver syndrome) or organ
function(egCongenitalpulmonaryalveolarproteinosis)
Most of the same defects translated into Holstein would induce physical and
behavioural anomalies likely causingyoungbulls tobe excluded fromprogeny
testing Itshould in factbeconsideredthattheanimalsweanalysedareonthe
onesidethosethatwillmostinfluencethegeneticmakeoffuturegenerationbut
are not a random sample of Holstein animals These bulls have been progeny
testedandauthorisedtobeusedinartificialinseminationinItalyCandidatesto
progenytestingderivefromthematingofthebest1siresandbest2damsof
the Italian Holstein population At the age of 4 months they are taken to the
Italian Holstein breeder association (ANAFI) test station where they are
subjected to performance test sanitary controls and behavioural tests
Thereafterthequalityoftheirsemenischeckedbeforeauthorizationisgranted
toenterinprogenytestInthissampleofanimalsitisthereforeexpectedtofind
never at the homozygous state mutations causing lethality physical
malformation abnormal behaviour and semen infertility Also those having a
138
majoreffectonsemenqualityandhighsusceptibilitytodiseasesareexpectedto
berarelycarriedatthehomozygousstatus
Association with human syndromes is a strong evidence of the potential
phenotypiceffectofmutationsalteringthegenestructureHoweveritshouldbe
considered that variants found in humans are different fromvariants found in
cattle and that the severity of the phenotype induced by a deleterious gene is
often variable even within species Therefore we evaluated this information
togetherwiththeotherindirectevidencesofvariantstobedeleterious
WealsoenquiredthewebsiteoftheInternationalMousePhenotypeConsortium
(httpswwwmousephenotypeorg) to search for the effects of null alleles in
mice The few genes forwhichnull`mice line is available are involved in basic
functionsfortheorganismthatspanfromvision(IAH1andEPB41L4A)toaging
(RGP1)andtissuesdevelopment(FABP12RIPPLY3RGNEF)orfertility(CFAP96)
Aswithgenesassociatedtohumansyndromesmostoftheabnormalphenotypes
observedinknockmousewouldhamperyoungbullstoenterprogenytesting
Gene Ontology (GO) was evaluated on the entire gene set No enrichment on
particularcategoriesGOtermswasobservedThemostrepresentedcategories
in Biological Processes were as expected ldquometabolic processrdquo and ldquocellular
processrdquo however lower level terms ldquoreproductionrdquo and ldquodevelopmental
processrdquo are represented with meaningful hits The analysis of Molecular
Function indicates thatmany of the hits are enzymes (ldquocatalytic activityrdquo) and
haveregulatoryroles(ldquobindingrdquoandldquoenzymeregulatoractivityrdquo)Thisresultcan
be interpreted indifferentwaysOn theone sidedeleterious allelesmay exert
their deleterious effect in a number of different cellular and developmental
processes the correct process is one while there is many options to make it
wrongAlsothenumberofgenesinvestigatedisrelativelylowandformanyof
them there is little or no information available (eg 11 genes code for a
completelyuncharacterizedprotein)Inadditionsomeofthemmayturnoutnot
tobereallydeleteriousandaddnoisetotheGOanalysis
As a final step all the previous information (GO expression patterns known
function)havebeenused to estimate a scoremeasuring the importanceof the
gene function forcattle survivalandreproductionThescoringof functionwas
139
particularlydifficultGOcategoriesweregeneralandallfunctionstaggedhavearelevant importance for animal survival andproductionTherefore in this casewedidnrsquotusethewholescorerangeplanned(from`1to1)andgavenonegativescoresApositivescoreof1wasattributedonly togenesplayingan importantroleinfertilityanddevelopment
$
34$Strong$candidate$deleterious$variants$
Thescoringprocessyielded21variantswithatotalscore=035variantswithpositive and 135 with negative values (Figure 2 and Table 5) Although theprocesswasnoterrorfreeasitreliedonincompleteinformationandhadsomedegreeofsubjectivityparticularlyinthechoiceofrelativeweightsitidentifiedaset of variants that may be considered strong candidates to have deleteriouseffects in the Italian Holstein breed Variants were evaluated individually andthose in a same genes had sometimes different scores For example the fivevariantscarriedbyALPK2hadscoresrangingfrom`7to`2
140
Figure2Graphicalrepresentationoftotalscoreforeachvariants
minus505
Variation
Score
1_645923001_645985561_1381698111_1464490851_150972144
2_7478962_270362092_270455392_375933832_555609862_904645543_75418123_110403993_110404003_188945763_1137877703_1203356984_53767714_265405974_265406204_749514994_1033779064_1057246734_1130233264_1178417155_445557055_445557425_757447955_940308955_940309556_207762566_382803486_863518016_927092906_1075769256_1078851146_1090916426_1166055226_1186534167_13339757_56553577_97149147_167941797_168358027_168621947_197404097_197409057_263293538_602606098_603324848_658345648_704101558_760297748_770028908_872798628_1117618168_1128416959_83888819_83889249_237928389_510133929_1047396559_10515702910_171251510_1584300310_2802235210_2802311010_8290312110_8517371111_4630576711_4675390311_4740957711_5979141011_7832201011_8794387511_9548610911_9945389411_9946100112_1190618112_1247843312_1248084512_1413689712_5304908112_9076120913_381566413_1178793413_1178793813_5247168413_5393493013_5393494713_5393985313_5454042513_5504774013_5998820513_6304575013_6316961013_6539670814_197749414_364069514_3556894514_4688575014_5718402214_6863536315_18267915_18331715_18340415_248692
15_3018920915_3504878815_4630194115_7503314015_7503764515_8024448915_8025571015_8028875915_8038974515_8048559315_8055593115_8094968915_8275455615_8292533216_456201716_3831963616_6257435517_5309539517_5311265517_6609153117_7237753218_2562355018_3495651318_4637204118_5215480618_5408127718_5413095518_5781976819_2144410319_3107215219_3111397219_3279084319_4225178519_5139519419_5619117519_5726625120_763980120_2341043120_5888423620_6417173921_623911921_2034587821_2034628921_2683773321_3100359921_5527419521_5582646321_5817338421_5912071921_6048248421_6048251021_6284314222_5241595922_5242173422_5242204622_5242610422_5242666822_5695165722_5697484722_5779767423_4588505224_2132460424_2145336424_4676490924_5032719524_5815817824_5815862024_5815878424_5820400324_5820447925_3743318026_1811648726_2571599626_2576457027_503107827_3283255728_28422828_169040328_3571880728_4408176529_198588129_753052329_753066329_26397931
141
Table5SingleanalysesscoreandtotalforeachcandidatevariantsChrchrom
osom
ePosphysicalposition
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
164592300
ENSBTAG00000009014
UPK1B
02
I1
01
2
164598556
ENSBTAG00000009014
UPK1B
0I1
I1
00
I2
1138169811
ENSBTAG00000002966
DNAJC13
03
10
15
1146449085
ENSBTAG00000017418
RRP1B
I3
I1
I1
00
I5
1150972144
ENSBTAG00000036019
RIPPLY3
I3
I1
I1
I1
1I5
2747896
ENSBTAG00000013916
HERC2
30
10
04
227036209
ENSBTAG00000004555
LRP2
I3
I1
10
0I3
227045539
ENSBTAG00000004555
LRP2
I3
31
00
1
237593383
ENSBTAG00000032481
DAPL1
00
I1
00
I1
255560986
ENSBTAG00000032264
Uncharacterized
3I3
I1
00
I1
290464554
ENSBTAG00000017436
ALS2CR11
0I1
I1
00
I2
37541812
ENSBTAG00000046175
Uncharacterized
00
00
00
311040399
ENSBTAG00000013789
OR6K6
I3
I3
I1
00
I7
311040400
ENSBTAG00000013789
OR6K6
I3
I3
I1
00
I7
318894576
ENSBTAG00000004772
THEM
40
I3
I1
00
I4
3113787770
ENSBTAG00000000149
USP40
11
I1
01
2
3120335698
ENSBTAG00000002809
CAPN10
I3
21
00
0
45376771
ENSBTAG00000001500
FIGNL1
01
I1
00
0
426540597
ENSBTAG00000033954
PRPS1L1
1I3
I1
00
I3
426540620
ENSBTAG00000033954
PRPS1L1
1I1
I1
00
I1
474951499
ENSBTAG00000003508
CFAP69
I3
I3
I1
10
I6
4103377906
ENSBTAG00000021073
KIAA1549
0I1
11
01
4105724673
ENSBTAG00000019265
AGK
01
10
02
142
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
4113023326
ENSBTAG00000014768
ZNF786
I3
I3
I1
00
I7
4117841715
ENSBTAG00000017505
PAXIP1
0I1
I1
00
I2
544555705
ENSBTAG00000026323
LYSB
I3
3I1
00
I1
544555742
ENSBTAG00000026323
LYSB
I3
0I1
00
I4
575744795
ENSBTAG00000009064
CSF2RB
I3
I3
10
0I5
594030895
ENSBTAG00000009444
SLC15A5
I3
I1
I1
00
I5
594030955
ENSBTAG00000009444
SLC15A5
I3
I3
I1
00
I7
620776256
ENSBTAG00000008797
Uncharacterized
I3
I3
00
0I6
638280348
ENSBTAG00000007568
MEPE
0I3
I1
00
I4
686351801
ENSBTAG00000003523
UGT2B15
1I1
I1
00
I1
692709290
ENSBTAG00000010954
ART3
I3
2I1
00
I2
6107576925
ENSBTAG00000038606
DBDR
0I3
I1
00
I4
6107885114
ENSBTAG00000001505
GRK4
I3
1I1
00
I3
6109091642
ENSBTAG00000007001
SLC26A1
1I3
I1
00
I3
6116605522
ENSBTAG00000014058
LDB2
01
I1
00
0
6118653416
ENSBTAG00000012458
PSAPL1
I3
I1
I1
00
I5
71333975
ENSBTAG00000015602
C7H5orf45
I3
I3
I1
00
I7
75655357
ENSBTAG00000045588
LOC100298356
1I3
I1
00
I3
79714914
ENSBTAG00000046423
LOC618007
00
I1
00
I1
716794179
ENSBTAG00000012314
LDLR
02
10
03
716835802
ENSBTAG00000009568
KANK2
01
I1
00
0
716862194
ENSBTAG00000009569
DOCK6
01
10
02
719740409
ENSBTAG00000000414
FUT5
I3
I3
I1
00
I7
719740905
ENSBTAG00000000414
FUT5
I3
I3
I1
00
I7
143
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
726329353
ENSBTAG00000004860
SLC27A6
0I1
I1
00
I2
860260609
ENSBTAG00000011420
CA9
I3
I1
I1
00
I5
860332484
ENSBTAG00000011433
RGP1
I3
0I1
10
I3
865834564
ENSBTAG00000035708
LOC527795
00
I1
00
I1
870410155
ENSBTAG00000005466
C8H8orf58
I3
I3
I1
00
I7
876029774
ENSBTAG00000015027
Clusterin
00
I1
10
0
877002890
ENSBTAG00000012921
KIF24
12
I1
00
2
887279862
ENSBTAG00000002220
SPTLC1
00
10
01
8111761816
ENSBTAG00000006532
CDK5RAP2
I3
I3
11
1I3
8112841695
ENSBTAG00000013838
ALLC
01
I1
00
0
98388881
ENSBTAG00000015335
BAI3
00
I1
00
I1
98388924
ENSBTAG00000015335
BAI3
00
I1
00
I1
923792838
ENSBTAG00000046593
Uncharacterized
03
00
03
951013392
ENSBTAG00000018528
USP45
02
I1
00
1
9104739655
ENSBTAG00000018810
THBS2
03
10
15
9105157029
ENSBTAG00000000918
ERMARD
I3
21
00
0
10
1712515
ENSBTAG00000014750
EPB41L4A
0I1
I1
I1
0I3
10
15843003
ENSBTAG00000009030
NOX5
I3
I3
I1
00
I7
10
28022352
ENSBTAG00000035309
OR4F15
I3
1I1
00
I3
10
28023110
ENSBTAG00000035309
OR4F15
I3
I1
I1
00
I5
10
82903121
ENSBTAG00000012565
PCNX
I3
2I1
01
I1
10
85173711
ENSBTAG00000011683
NUM
BI3
0I1
00
I4
11
46305767
ENSBTAG00000002288
NT5DC4
0I3
I1
00
I4
11
46753903
ENSBTAG00000019072
PSD4
0I1
I1
00
I2
144
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
11
47409577
ENSBTAG00000009033
TEX37
I3
I2
I1
10
I5
11
59791410
ENSBTAG00000006027
USP34
02
I1
01
211
78322010
ENSBTAG00000015490
HS1BP3
00
I1
00
I1
11
87943875
ENSBTAG00000001140
IAH1
I3
3I1
10
011
95486109
ENSBTAG00000017576
GPR144
I3
I3
I1
00
I7
11
99453894
ENSBTAG00000038831
PHYHD1
11
I1
00
111
99461001
ENSBTAG00000010601
DOLK
12
10
04
12
11906181
ENSBTAG00000001133
VWA8
I3
1I1
00
I3
12
12478433
ENSBTAG00000014752
AKAP11
1I3
I1
I1
0I4
12
12480845
ENSBTAG00000014752
AKAP11
1I1
I1
I1
0I2
12
14136897
ENSBTAG00000016691
LACC1
I3
I3
I1
00
I7
12
53049081
ENSBTAG00000032821
SCEL
I3
I3
I1
00
I7
12
90761209
ENSBTAG00000019961
ATP4B
I3
I3
I1
00
I7
13
3815664
ENSBTAG00000034991
SLX4IP
00
I1
00
I1
13
11787934
ENSBTAG00000008650
CAMK
1D
I3
0I1
00
I4
13
11787938
ENSBTAG00000008650
CAMK
1D
I3
0I1
00
I4
13
52471684
ENSBTAG00000013753
DDRGK1
02
I1
00
113
53934930
ENSBTAG00000039520
SIRPB1
0I3
I1
00
I4
13
53934947
ENSBTAG00000039520
SIRPB1
00
I1
00
I1
13
53939853
ENSBTAG00000039520
SIRPB1
0I3
I1
00
I4
13
54540425
ENSBTAG00000047150
RTEL
I3
0I1
00
I4
13
55047740
ENSBTAG00000031718
OGFR
01
I1
00
013
59988205
ENSBTAG00000027390
RTFDC1
01
I1
00
013
63045750
ENSBTAG00000009144
BPIFA2A
0I3
I1
00
I4
145
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
13
63169610
ENSBTAG00000019752
BPIFA2B
0I1
I1
00
I2
13
65396708
ENSBTAG00000006021
CEP250
0I3
I1
00
I4
14
1977494
ENSBTAG00000026350
SPATC1
I3
I3
I1
00
I7
14
3640695
ENSBTAG00000015985
GPR20
00
I1
00
I1
14
35568945
ENSBTAG00000037399
Uncharacterized
30
I1
00
2
14
46885750
ENSBTAG00000047661
FABP12
0I1
I1
00
I2
14
57184022
ENSBTAG00000026247
PKHD1L1
I3
3I1
00
I1
14
68635363
ENSBTAG00000017165
MATN2
01
I1
00
0
15
182679
ENSBTAG00000046228
Olfactoryreceptor
00
I1
0o
I1
15
183317
ENSBTAG00000046228
Olfactoryreceptor
00
I1
00
I1
15
183404
ENSBTAG00000046228
Olfactoryreceptor
00
I1
00
I1
15
248692
ENSBTAG00000045558
LOC100299084
00
I1
0o
I1
15
30189209
ENSBTAG00000005357
VPS11
02
I1
00
1
15
35048788
ENSBTAG00000005340
SERGEF
0I3
I1
00
I4
15
46301941
ENSBTAG00000002289
NLRP14
I3
I3
I1
00
I7
15
75033140
ENSBTAG00000012053
ACCSL
1I3
I1
00
I3
15
75037645
ENSBTAG00000012053
ACCSL
I3
I3
I1
00
I7
15
80244489
ENSBTAG00000046527
LOC100139830
I3
0I1
00
I4
15
80255710
ENSBTAG00000046285
OR8I2
10
I1
00
0
15
80288759
ENSBTAG00000001291
OR8H3
1I1
I1
00
I1
15
80389745
ENSBTAG00000045709
LOC514057
13
I1
00
3
15
80485593
ENSBTAG00000035985
LOC101909743OR8K1
1I1
I1
00
I1
15
80555931
ENSBTAG00000022858
LOC783561
I3
0I1
00
I4
15
80949689
ENSBTAG00000039110
OR8U1
1I3
I1
00
I3
146
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
15
82754556
ENSBTAG00000021375
OR10Q1
01
I1
00
015
82925332
ENSBTAG00000021233
OR5B17
I3
0I1
00
I4
16
4562017
ENSBTAG00000019799
FCAM
RI3
I3
I1
00
I7
16
38319636
ENSBTAG00000014794
SCYL3
I3
0I1
00
I4
16
62574355
ENSBTAG00000035226
TOR1AIP1
00
I1
00
I1
17
53095395
ENSBTAG00000020414
DHX37
1I1
I1
00
I1
17
53112655
ENSBTAG00000020414
DHX37
0I3
I1
00
I4
17
66091531
ENSBTAG00000010847
FOXN4
0I1
I1
00
I2
17
72377532
ENSBTAG00000008996
SFI1
0I3
I1
00
I4
18
25623550
ENSBTAG00000005170
GPR114
0I3
I1
01
I3
18
34956513
ENSBTAG00000001286
ELMO
33
3I1
11
718
46372041
ENSBTAG00000015912
DMKN
0
1I1
00
018
52154806
ENSBTAG00000001262
IRGQ
I3
1I1
00
I3
18
54081277
ENSBTAG00000009337
PNMA
L1
0I3
I1
00
I4
18
54130955
ENSBTAG00000048021
PNMA
L2
I3
1I1
00
I3
18
57819768
ENSBTAG00000003231
LIM2
1
I3
10
0I1
19
21444103
ENSBTAG00000011011
SSH2
I3
I3
I1
00
I7
19
31072152
ENSBTAG00000022509
DNAH
9I3
0I1
00
I4
19
31113972
ENSBTAG00000022509
DNAH
90
0I1
00
I1
19
32790843
ENSBTAG00000015294
COX10
12
10
04
19
42251785
ENSBTAG00000013685
KRT31
I3
0I1
00
I4
19
51395194
ENSBTAG00000015980
FASN
1I1
10
01
19
56191175
ENSBTAG00000007910
EXOC7
3I3
I1
00
I1
19
57266251
ENSBTAG00000011387
NAT9
12
I1
00
2
147
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
20
7639801
ENSBTAG00000005633
RGNEF
1I1
I1
10
0
20
23410431
ENSBTAG00000008871
DDX4
1I3
I1
00
I3
20
58884236
ENSBTAG00000005514
TRIO
I3
0I1
00
I4
20
64171739
ENSBTAG00000005021
SEMA5A
01
10
02
21
6239119
ENSBTAG00000016495
LINS
02
10
03
21
20345878
ENSBTAG00000012326
LOC101906218
00
I1
00
I1
21
20346289
ENSBTAG00000012326
LOC101906218
I3
0I1
00
I4
21
26837733
ENSBTAG00000047587
Uncharacterized
10
00
01
21
31003599
ENSBTAG00000047166
SH2D7
0I3
I1
00
I4
21
55274195
ENSBTAG00000039117
FAM179B
0I3
I1
00
I4
21
55826463
ENSBTAG00000014001
MAP1A
0I3
I1
00
I4
21
58173384
ENSBTAG00000009836
CHGA
I3
I1
I1
00
I5
21
59120719
ENSBTAG00000046590
FAM181A
01
I1
00
0
21
60482484
ENSBTAG00000017096
ERICH1
0I3
I1
00
I4
21
60482510
ENSBTAG00000017096
ERICH1
01
I1
00
0
21
62843142
ENSBTAG00000012833
ATG2B
0I3
I1
00
I4
22
52415959
ENSBTAG00000015839
MAP4
00
I1
00
I1
22
52421734
ENSBTAG00000015839
MAP4
1I1
I1
00
I1
22
52422046
ENSBTAG00000015839
MAP4
I3
I3
I1
00
I7
22
52426104
ENSBTAG00000047174
Uncharacterized
I3
00
00
I3
22
52426668
ENSBTAG00000047174
Uncharacterized
10
00
01
22
56951657
ENSBTAG00000021645
MBD4
1I1
I1
00
I1
22
56974847
ENSBTAG00000021643
EFCAB12
0I1
I1
00
I2
22
57797674
ENSBTAG00000031115
Uncharacterized
01
00
01
148
Mutation
Score
Chr
Pos
Gene
Genenam
eHaplotypes
Conservation
Hum
andefect
Mousemodel
Function
TotalScore
23
45885052
ENSBTAG00000005248
OFCC1
00
I1
00
I1
24
21324604
ENSBTAG00000000998
SLC39A6
02
I1
00
1
24
21453364
ENSBTAG00000011622
C18orf21
I3
I3
I1
00
I7
24
46764909
ENSBTAG00000015210
LOXHD1
0I3
10
0I2
24
50327195
ENSBTAG00000002660
CCDC11
I3
11
01
0
24
58158178
ENSBTAG00000014284
ALPK2
I3
I1
I1
00
I5
24
58158620
ENSBTAG00000014284
ALPK2
0I1
I1
00
I2
24
58158784
ENSBTAG00000014284
ALPK2
I3
I3
I1
00
I7
24
58204003
ENSBTAG00000014284
ALPK2
I3
I3
I1
00
I7
24
58204479
ENSBTAG00000014284
ALPK2
I3
2I1
00
I2
25
37433180
ENSBTAG00000040568
Uncharacterized
00
I1
00
I1
26
18116487
ENSBTAG00000046239
C10orf12
0I1
I1
00
I2
26
25715996
ENSBTAG00000010522
Uncharacterized
10
00
01
26
25764570
ENSBTAG00000046727
Olfactoryreceptor
00
I1
00
I1
27
5031078
ENSBTAG00000007340
LOC101906349
0I1
I1
00
I2
27
32832557
ENSBTAG00000011684
RAB11FIP1
10
I1
00
0
28
284228
ENSBTAG00000000079
CCSAP
0I1
I1
00
I2
28
1690403
ENSBTAG00000048153
Uncharacterized
10
00
01
28
35718807
ENSBTAG00000047317
CL43
1I1
I1
00
I1
28
44081765
ENSBTAG00000032527
ERCC6
I3
01
00
I2
29
1985881
ENSBTAG00000004081
FAT3
30
I1
00
2
29
7530523
ENSBTAG00000046360
Uncharacterized
01
00
01
29
7530663
ENSBTAG00000046360
Uncharacterized
1I1
00
00
29
26397931
ENSBTAG00000013568
UEVLD
I3
I3
I1
00
I7
149
Interestingly13ofthe22genesassociatedtohumansyndromescarryvariants
appearing in the short list of the 35 positives and only 6 in that of 135 with
negative values This in spite of the low weight (from 1 to 1) given to this
parameterEightofthe35positivevariantsareinproteinsstilluncharacterized
Sixvariantshadascoreof4orhigherThehighestscore(totalscore=7)wasfor
ELMO3 (Engulfment and cell motility gene 3) This gene is involved in cell
motility and migration ELMO3 is also a homolog of C( elegans Ced12 that is
required for many developmental processes included gonadal morphogenesis
Mutant for this gene in C( elegans suffer from abnormal development usually
with lethal consequences or sublethal developmental defects including small
embryosdelayeddevelopmentandreducedfertility(Gumiennyetal2001)In
human it is expressed in testis prostate and ovary
(httpwwwgenecardsorgcgibincarddispplgene=ELMO3)
ThegeneDNAJC13(DnaJ(Hsp40)homologsubfamilyCmember13)hadatotal
score of 5 This gene is involved in the onset of Parkinson disease DNAJC13
regulates the dynamics of clathrin coats on early endosomes Cellular analysis
shows that the mutation Arg855Ser identified by VilarintildeoGuumlell et al (2014)
confers a gainoffunction that impairs endosomal transportApossible roleof
DNAJC13geneinTourettesyndromechronicticdisorderisunderinvestigation
(Sundarametal2011)
A second genehad a scoreof 5THBS2 (Thrombospondin2) It belongs to the
thrombospondin family and is a powerful inhibitor of tumour growth and
angiogenesis It is highly expressed in testis in fetal Leydig cell Hirose et al
(2008) found a significant association between an intronic SNP in the THBS2
gene and lumbar disc herniation Thrombospondin2 is also a matricellular
proteinfoundinhumanserumDeletionofTSP2causesagedependentdilated
cardiomyopathy (Hanatani et al 2014) Mutant mice for this gene display a
variety of mutant phenotype Kyriakides et al 1998 suggest that THBS2
modulates the cell surface properties of mesenchymal cells affecting cell
functionssuchasadhesionandmigration
150
Three genes had a score of 4 all three are associated to human syndromes
Mutation in HERC2 (HECT and RLD domain containing E3 ubiquitin protein
ligase2)isinvolvedinautosomalrecessivementalretardation38(Puffenberger
etal2012)Jietal(2000)identifiedinmutantmouseneuromuscularsecretory
vesicledefectsandspermacrosomedefectsMoreovergeneticvariationsinthis
gene are associated with skinhaireye pigmentation variability Mutations in
DOLK gene (dolichol kinase) are associated with dolichol kinase deficiency
(Lefeberetal2011)COX10(cytochromecoxidaseassemblyhomolog10)gene
is associated to mitochondrial complex IV deficiency causing Leigh syndrome
(Antonickaetal2003)
Conclusions)
Exomesequencingthereforeprovidedvaluableinformationoncodingsequence
variants and on their potential effect on protein function As a group variants
classifiedasdeleteriouswererarerthanvariantsclassifiedasneutralSincethis
isanexpectedeffectofpurifyingselectionthissetofvariantwaslikelyenriched
forvariantshavingaphenotypiceffect
Theinclusionintheanalysesofplusandminusvariantanimalsforfertilitytraits
likelypermittedtosequencethemostimportantvariantsinfluencingthesetraits
butthelownumberofanimalssequencedcombinedwiththecomplexityofthe
traitpermittedtofindonlyasuggestiveassociationbetweenSNPsandtraitsItrsquos
likely that amuch larger number of individuals is to be sequenced if genome
wideassociationisthestrategyoneswanttopursue
We explored an alternative strategy to confirm suggestivedeleterious variants
based on indirect evidence of the variant effect by assessing i) evolutionary
constraintsassociatedtotheaminoacidresiduecodedbythevariantii)variant
behaviour in a large population In the absence of direct genotyping of the
candidatevariantsweusedhaplotypeinformationassurrogateiii)functionand
effectsinotherspecies
151
We attempted to quantify these criteria through a score subjective and only
indicativebutpermitted tohighlighted35genes so farneglected in cattlebut
deservingfurtherattentionandpotentiallyhavingdeleteriouseffectsinHolstein
andothercattlebreedsThesemaybepartoftherecessivedeleterioussetthat
constitutethegeneticloadofItalianHolstein
References)AntonickaHLearySCGuercinGHAgarJNHorvathRKennawayNG
HardingCOJakschMShoubridgeEA2003MutationsinCOX10
resultinadefectinmitochondrialhemeAbiosynthesisandaccountfor
multipleearlyonsetclinicalphenotypesassociatedwithisolatedCOX
deficiencyHumMolGenet122693ndash2702doi101093hmgddg284
AshwellMSHeyenDWSonstegardTSVanTassellCPDaYVanRaden
PMRonMWellerJILewinHA2004DetectionofQuantitativeTrait
LociAffectingMilkProductionHealthandReproductiveTraitsin
HolsteinCattleJournalofDairyScience87468ndash475
doi103168jdsS00220302(04)731860
AstonKICarrellDT2009GenomeWideStudyofSingleNucleotide
PolymorphismsAssociatedWithAzoospermiaandSevere
OligozoospermiaJournalofAndrology30711ndash725
doi102164jandrol109007971
AulchenkoYSRipkeSIsaacsADuijnCMvan2007GenABELanRlibrary
forgenomewideassociationanalysisBioinformatics231294ndash1296
doi101093bioinformaticsbtm108
BamshadMJNgSBBighamAWTaborHKEmondMJNickersonDA
ShendureJ2011ExomesequencingasatoolforMendeliandisease
genediscoveryNatRevGenet12745ndash755doi101038nrg3031
BieseckerLGShiannaKVMullikinJC2011Exomesequencingtheexpert
viewGenomeBiology12128doi101186gb2011129128
152
BiffaniSMarusiMBiscariniFCanavesiF2005Developingagenetic
evaluationforfertilityusingangularityandmilkyieldascorrelatedtraits
InterbullBulletin063
BiffaniSSamoreacuteABCanavesiF2002Inbreedingdepressionforproduction
reproductionandfunctionaltraitsinItalianHolsteincattleInstitut
NationaldelaRechercheAgronomique(INRA)pp0ndash4
BolgerAMLohseMUsadelB2014TrimmomaticAflexibletrimmerfor
IlluminaSequenceDataBioinformaticsbtu170
doi101093bioinformaticsbtu170
BrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingand
MissingDataInferenceforWholeGenomeAssociationStudiesByUseof
LocalizedHaplotypeClusteringAmJHumGenet811084ndash1097
CharlesworthDWillisJH2009ThegeneticsofinbreedingdepressionNat
RevGenet10783ndash796doi101038nrg2664
ChunSFayJC2009Identificationofdeleteriousmutationswithinthree
humangenomesGenomeRes191553ndash1561
doi101101gr092619109
ConsortiumTEP2012AnintegratedencyclopediaofDNAelementsinthe
humangenomeNature48957ndash74doi101038nature11247
DePristoMABanksEPoplinRGarimellaKVMaguireJRHartlC
PhilippakisAAdelAngelGRivasMAHannaMMcKennaA
FennellTJKernytskyAMSivachenkoAYCibulskisKGabrielSB
AltshulerDDalyMJ2011Aframeworkforvariationdiscoveryand
genotypingusingnextgenerationDNAsequencingdataNatGenet43
491ndash498doi101038ng806
GoymerP2007SynonymousmutationsbreaktheirsilenceNatRevGenet8
92ndash92doi101038nrg2056
GumiennyTLBrugneraEToselloTrampontACKinchenJMHaneyLB
NishiwakiKWalkSFNemergutMEMacaraIGFrancisRSchedl
TQinYVanAelstLHengartnerMORavichandranKS2001CED
12ELMOaNovelMemberoftheCrkIIDock180RacPathwayIs
RequiredforPhagocytosisandCellMigrationCell10727ndash41
doi101016S00928674(01)005207
153
HanataniSIzumiyaYTakashioSKimuraYArakiSRokutandaTTsujita
KYamamotoETanakaTYamamuroMKojimaSTayamaS
KaikitaKHokimotoSOgawaH2014Circulatingthrombospondin2
reflectsdiseaseseverityandpredictsoutcomeofheartfailurewith
reducedejectionfractionCircJ78903ndash910
HiroseYChibaKKarasugiTNakajimaMKawaguchiYMikamiY
FuruichiTMioFMiyakeAMiyamotoTOzakiKTakahashiA
MizutaHKuboTKimuraTTanakaTToyamaYIkegawaS2008
AfunctionalpolymorphisminTHBS2thataffectsalternativesplicingand
MMPbindingisassociatedwithlumbardischerniationAmJHum
Genet821122ndash1129doi101016jajhg200803013
httpsgenomeucsceduindexhtml[WWWDocument]nd
JamrozikJFatehiJKistemakerGJSchaefferLR2005EstimatesofGenetic
ParametersforCanadianHolsteinFemaleReproductionTraitsJournalof
DairyScience882199ndash2208doi103168jdsS00220302(05)728952
JanisEWiggintonDJC2005AnoteonexacttestsofHardyWeinberg
equilibriumAmericanjournalofhumangenetics76887ndash93
doi101086429864
JiYRebertNAJoslinJMHigginsMJSchultzRANichollsRD2000
StructureoftheHighlyConservedHERC2GeneandofMultiplePartially
DuplicatedParalogsinHumanGenomeRes10319ndash329
KentWJSugnetCWFureyTSRoskinKMPringleTHZahlerAM
HausslerandD2002TheHumanGenomeBrowseratUCSCGenome
Res12996ndash1006doi101101gr229102
KumarPHenikoffSNgPC2009Predictingtheeffectsofcodingnon
synonymousvariantsonproteinfunctionusingtheSIFTalgorithmNat
Protoc41073ndash1081doi101038nprot200986
KyriakidesTRZhuYHSmithLTBainSDYangZLinMTDanielson
KGIozzoRVLaMarcaMMcKinneyCEGinnsEIBornsteinP
1998Micethatlackthrombospondin2displayconnectivetissue
abnormalitiesthatareassociatedwithdisorderedcollagenfibrillogenesis
anincreasedvasculardensityandableedingdiathesisJCellBiol140
419ndash430
154
LaneEACroweMABeltmanMEMoreSJ2013Theinfluenceofcowand
managementfactorsonreproductiveperformanceofIrishseasonal
calvingdairycowsAnimReprodSci14134ndash41
doi101016janireprosci201306019
LefeberDJdeBrouwerAPMMoravaERiemersmaMSchuurs
HoeijmakersJHMAbsmannerBVerrijpKvandenAkkerWMR
HuijbenKSteenbergenGvanReeuwijkJJozwiakAZuckerN
LorberALammensMKnopfCvanBokhovenHGruumlnewaldSLehle
LKapustaLMandelHWeversRA2011AutosomalRecessive
DilatedCardiomyopathyduetoDOLKMutationsResultsfromAbnormal
DystroglycanOMannosylationPLoSGenet7e1002427
doi101371journalpgen1002427
LiHDurbinR2009FastandaccurateshortreadalignmentwithBurrows
WheelertransformBioinformatics251754ndash1760
doi101093bioinformaticsbtp324
LiHHandsakerBWysokerAFennellTRuanJHomerNMarthG
AbecasisGDurbinR1000GenomeProjectDataProcessingSubgroup
2009TheSequenceAlignmentMapformatandSAMtoolsBioinformatics
252078ndash2079doi101093bioinformaticsbtp352
LiMXKwanJSHBaoSYYangWHoSLSongYQShamPC2013
PredictingMendelianDiseaseCausingNonSynonymousSingle
NucleotideVariantsinExomeSequencingStudiesPLoSGenet9
e1003143doi101371journalpgen1003143
MacArthurDGBalasubramanianSFrankishAHuangNMorrisJWalter
KJostinsLHabeggerLPickrellJKMontgomerySBAlbersCA
ZhangZConradDFLunterGZhengHAyubQDePristoMA
BanksEHuMHandsakerRERosenfeldJFromerMJinMMuXJ
KhuranaEYeKKayMSaundersGISunerMMHuntTBarnes
IHAAmidCCarvalhoSilvaDRBignellAHSnowCYngvadottirB
BumpsteadSCooperDNXueYRomeroIGWangJLiYGibbs
RAMcCarrollSADermitzakisETPritchardJKBarrettJCHarrow
JHurlesMEGersteinMBTylerSmithC2012Asystematicsurvey
155
oflossoffunctionvariantsinhumanproteincodinggenesScience335
823ndash828doi101126science1215040
MajewskiJSchwartzentruberJLalondeEMontpetitAJabadoN2011
WhatcanexomesequencingdoforyouJMedGenet48580ndash589
doi101136jmedgenet2011100223
McClureMCBickhartDNullDVanRadenPXuLWiggansGLiuG
SchroederSGlasscockJArmstrongJColeJBVanTassellCP
SonstegardTS2014BovineExomeSequenceAnalysisandTargeted
SNPGenotypingofRecessiveFertilityDefectsBH1HH2andHH3Reveal
aPutativeCausativeMutationinSMC2forHH3PLoSONE9e92769
doi101371journalpone0092769
McClureMKimEBickhartDNullDCooperTColeJWiggansGAjmone
MarsanPColliLSantusELiuGESchroederSMatukumalliLVan
TassellCSonstegardT2013FineMappingforWeaverSyndromein
BrownSwissCattleandtheIdentificationof41ConcordantMutations
acrossNRCAMPNPLA8andCTTNBP2PLoSONE8e59251
doi101371journalpone0059251
McKennaAHannaMBanksESivachenkoACibulskisKKernytskyA
GarimellaKAltshulerDGabrielSDalyMDePristoMA2010The
GenomeAnalysisToolkitAMapReduceframeworkforanalyzingnext
generationDNAsequencingdataGenomeRes201297ndash1303
doi101101gr107524110
MiHDongQMuruganujanAGaudetPLewisSThomasPD2010
PANTHERversion7improvedphylogenetictreesorthologsand
collaborationwiththeGeneOntologyConsortiumNuclAcidsRes38
D204ndashD210doi101093nargkp1019
MrodeRKearneyJFBiffaniSCoffeyMCanavesiF2009Short
communicationGeneticrelationshipsbetweentheHolsteincow
populationsofthreeEuropeandairycountriesJournalofDairyScience
925760ndash5764doi103168jds20081931
PelosoGMAuerPLBisJCVoormanAMorrisonACStitzielNOBrody
JAKhetarpalSACrosbyJRFornageMIsaacsAJakobsdottirJ
FeitosaMFDaviesGHuffmanJEManichaikulADavisBLohman
156
KJoonAYSmithAVGroveMLZanoniPRedonVDemissieS
LawsonKPetersUCarlsonCJacksonRDRyckmanKKMackey
RHRobinsonJGSiscovickDSSchreinerPJMychaleckyjJC
PankowJSHofmanAUitterlindenAGHarrisTBTaylorKD
StaffordJMReynoldsLMMarioniREDehghanAFrancoOHPatel
APLuYHindyGGottesmanOBottingerEPMelanderOOrho
MelanderMLoosRJFDugaSMerliniPAFarrallMGoelA
AsseltaRGirelliDMartinelliNShahSHKrausWELiMRader
DJReillyMPMcPhersonRWatkinsHArdissinoDZhangQWang
JTsaiMYTaylorHACorreaAGriswoldMELangeLAStarrJM
RudanIEiriksdottirGLaunerLJOrdovasJMLevyDChenYDI
ReinerAPHaywardCPolasekODearyIJBoreckiIBLiuY
GudnasonVWilsonJGvanDuijnCMKooperbergCRichSSPsaty
BMRotterJIOrsquoDonnellCJRiceKBoerwinkleEKathiresanS
CupplesLA2014AssociationofLowFrequencyandRareCoding
SequenceVariantswithBloodLipidsandCoronaryHeartDiseasein
56000WhitesandBlacksTheAmericanJournalofHumanGenetics94
223ndash232doi101016jajhg201401009
PryceJEVeerkampRFThompsonRHillWGSimmG1997Genetic
aspectsofcommonhealthdisordersandmeasuresoffertilityinHolstein
FriesiandairycattleAnimalScience65353ndash360
doi101017S1357729800008559
PryceJEWoolastonRBerryDPWallEWintersMButlerRShafferM
2014WorldTrendsinDairyCowFertilityin10ThWorldCongressof
GeneticsAppliedtoLivestockProduction
PuffenbergerEGJinksRNWangHXinBFiorentiniCShermanEA
DegrazioDShawCSougnezCCibulskisKGabrielSKelleyRI
MortonDHStraussKA2012Ahomozygousmissensemutationin
HERC2associatedwithglobaldevelopmentaldelayandautismspectrum
disorderHumMutat331639ndash1646doi101002humu22237
PurcellSNealeBToddBrownKThomasLFerreiraMARBenderD
MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKA
157
ToolSetforWholeGenomeAssociationandPopulationBasedLinkage
AnalysesAmJHumGenet81559ndash575
SahanaGNielsenUSAamandGPLundMSGuldbrandtsenB2013Novel
HarmfulRecessiveHaplotypesIdentifiedforFertilityTraitsinNordic
HolsteinCattlePLoSONE8e82909doi101371journalpone0082909
SchuumltzEScharfensteinMBrenigB2008Implicationofcomplexvertebral
malformationandbovineleukocyteadhesiondeficiencyDNAbased
testingondiseasefrequencyintheHolsteinpopulationJDairySci91
4854ndash4859doi103168jds20081154
SimmonsMJCrowJF1977MutationsAffectingFitnessinDrosophila
PopulationsAnnualReviewofGenetics1149ndash78
doi101146annurevge11120177000405
SoggiuAPirasCHusseinHACanioMDGaviraghiAGalliAUrbaniA
BonizziLRoncadaP2013UnravellingthebullfertilityproteomeMol
BioSyst91188ndash1195doi101039C3MB25494A
StitzielNOKiezunASunyaevS2011Computationalandstatistical
approachestoanalyzingvariantsidentifiedbyexomesequencing
GenomeBiology12227doi101186gb2011129227
SundaramSKHuqAMSunZYuWBennettLWilsonBJBehenME
ChuganiHT2011ExomesequencingofapedigreewithTourette
syndromeorchronicticdisorderAnnNeurol69901ndash904
doi101002ana22398
VanRadenPMOlsonKMNullDJHutchisonJL2011Harmfulrecessive
effectsonfertilitydetectedbyabsenceofhomozygoushaplotypesJournal
ofDairyScience946153ndash6161doi103168jds20114624
VilarintildeoGuumlellCRajputAMilnerwoodAJShahBSzuTuCTrinhJYuI
EncarnacionMMunsieLNTapiaLGustavssonEKChouP
TatarnikovIEvansDMPishottaFTVoltaMBeccanoKellyD
ThompsonCLinMKShermanHEHanHJGuentherBL
WassermanWWBernardVRossCJAppelCresswellSStoesslAJ
RobinsonCADicksonDWRossOAWszolekZKAaslyJOWuR
MHentatiFGibsonRAMcPhersonPSGirardMRajputMRajput
158
AHFarrerMJ2014DNAJC13mutationsinParkinsondiseaseHumMolGenet231794ndash1801doi101093hmgddt570
WangKLiMHakonarsonH2010ANNOVARfunctionalannotationofgeneticvariantsfromhighthroughputsequencingdataNuclAcidsRes38e164ndashe164doi101093nargkq603
WathesDCPollottGEJohnsonKFRichardsonHCookeJS2014HeiferfertilityandcarryoverconsequencesforlifetimeproductionindairyandbeefcattleAnimal8Suppl191ndash104doi101017S1751731114000755
YangYMuznyDMReidJGBainbridgeMNWillisAWardPABraxtonABeutenJXiaFNiuZHardisonMPersonRBekheirniaMRLeducMSKirbyAPhamPScullJWangMDingYPlonSELupskiJRBeaudetALGibbsRAEngCM2013ClinicalWhole
ExomeSequencingfortheDiagnosisofMendelianDisordersNewEnglandJournalofMedicine3691502ndash1511doi101056NEJMoa1306555
ZiminAVDelcherALFloreaLKelleyDRSchatzMCPuiuDHanrahanFPerteaGTassellCPVSonstegardTSMarccedilaisGRobertsMSubramanianPYorkeJASalzbergSL2009Awholegenome
assemblyofthedomesticcowBostaurusGenomeBiology10R42doi101186gb2009104r42
159
Supplementary)files)
YoucanfindChapter6supplementaryfilesat
- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)
- GoogleDrive(httptinyurlcomPhDMMChapter06)orscantheQRcode
)
Supplementary)Figure)S1)Coverage)analyses)
A)Averagecoverageperanimals
B)Boxplotofchromosomescoverageperanimal
C)BoxplotofanimalcoverageperchromosomeBluelinerepresenttheexpected
50Xcoverage
Supplementary)Figure)S2)Distribution)of)variant)sequencing)depth)
)
Supplementary) Figure) S3) Correlation) between) sequencing) depth) and)
number) of) discordant) genotypes) In) red) values) for) two) outlier) animals)
having)concordance)lt846))
)
Supplementary)Figure)S4)Multidimensional)scaling)of)Italian)Holstein)bulls)
sequenced))
)
Supplementary)Figure)S5)Multidimensional)scaling)of)Italian)Holstein)bulls)
analysed)with)the)800K)SNPchip))
160
Supplementary) Figure) S6) MAF) distribution) on) neutral) (blue)) and)
deleterious)(red))variants))
)
Supplementary) Table) S1) EBV) distribution) on) Italian) Holstein) population)
and)SNPchip)dataset)
Sheet1EBVthresholdvaluetodefineplusandminustailforPROT(kgprotein)
SCC(somaticcellcount)FERT(fertility)andLONG(functional longevity) All
ItalianHolsteinFAbullswereincludeinthisanalysis
Sheet2Numberofdatasetanimalsineachtailforeachtrait
Supplementary)Table)S2)Exome)probes)statistics)number)and)bp)covered)
per)chromosomes)
Supplementary) Table) S3) Concordance) between) animal) genotypes) in)
sequencing)and)SNPchip)data)
Error1 HD data homozygotes ndash Exome data heterozygotes Error2 HD data
heterozygotesndashExomedatahomozygotes
Supplementary) Table) S4) List) of) suggestive) genes) deriving) from) the)
analysis)of)exomes)of)animals) in)the)tails)of)male)and)female)fertility)EBV)
distributions))
Header(code
bull Bfreq_totFrequencyofBalleleinentiredataset
bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset
bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset
bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset
bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset
bull CallRate_totSNPcallratendashentiredataset
bull CallRate_MHSNPcallratendashhighmalefertilitydataset
bull CallRate_MLSNPcallratendashlowmalefertilitydataset
bull CallRate_FHSNPcallratendashhighfemalefertilitydataset
bull CallRate_FLSNPcallratendashhighmalefertilitydataset
bull HWeqHardyWeinbergequilibriumpvalue
bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele
bull Geno_HetNumberofanimalswithheterozygotegenotype
161
bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele
bull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)dataset
bull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)dataset
bull LOW_CRSNPcallratefromlowfertility(maleandfemale)dataset
bull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallele
bull LOW_Het Number of animals from low fertility (male and female) dataset with
heterozygotegenotype
bull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallele
bull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and
female)dataset
bull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)dataset
bull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)dataset
bull HIGH_CRSNPcallratefromhighfertility(maleandfemale)dataset
bull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallele
bull HIGH_Het Number of animals from high fertility (male and female) dataset with
heterozygotegenotype
bull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallele
bull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and
female)dataset
bull Male_TREND inheritance models implemented in PLINK trend test on male fertility
dataset
bull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale
fertilitydataset
bull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male
fertilitydataset
bull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility
dataset
bull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on
femalefertilitydataset
bull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale
fertilitydataset
bull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male
andfemalefertilitydataset
bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test
onmaleandfemalefertilitydataset
162
bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston
maleandfemalefertilitydataset
bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset
bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset
bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset
Supplementary)Table) S5) Regions) candidate) to) carry) deleterious) variants)
identified)by)OHW)approach))
Supplementary)Table) S6) Regions) candidate) to) carry) deleterious) variants)
identify)by)LOH)approach))
Ingreyregionsremovedinfurtheranalysis
Supplementary)Table)S7)List)of)candidate)deleterious)variants)mapping)in)
OHW)and)LOH)regions))Header(code
bull Bfreq_totFrequencyofBalleleinentiredataset
bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset
bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset
bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset
bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset
bull CallRate_totSNPcallratendashentiredataset
bull CallRate_MHSNPcallratendashhighmalefertilitydataset
bull CallRate_MLSNPcallratendashlowmalefertilitydataset
bull CallRate_FHSNPcallratendashhighfemalefertilitydataset
bull CallRate_FLSNPcallratendashhighmalefertilitydataset
bull HWeqHardyWeinbergequilibriumpvalue
bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele
bull Geno_HetNumberofanimalswithheterozygotegenotype
bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele
bull HW_insideDeleteriousmutationsinsideldquoOutofHardyWeinbergrdquoregion
bull HW_outsideDeleteriousmutationsoutsideldquoOutofHardyWeinbergrdquoregion
bull DH_insideDeleteriousmutationsinsideldquoLackofHomozygosityrdquoregion
bull DH_outsideDeleteriousmutationsoutsideldquoLackofHomozygosityrdquoregion
163
bull GeneClass Classifications of gene 1 deleterious mutations inside OHW and LOHregion2deleteriousmutationsinsideOHWorLOHregion
bull 2deleteriousmutationsoutsideOHWandorLOHregionbull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)datasetbull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)datasetbull LOW_CRSNPcallratefromlowfertility(maleandfemale)datasetbull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallelebull LOW_Het Number of animals from low fertility (male and female) dataset with
heterozygotegenotypebull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallelebull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and
female)datasetbull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)datasetbull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)datasetbull HIGH_CRSNPcallratefromhighfertility(maleandfemale)datasetbull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeforminorallelebull HIGH_Het Number of animals from high fertility (male and female) dataset with
heterozygotegenotypebull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith
homozygotegenotypeformajorallelebull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and
female)datasetbull Male_TREND inheritance models implemented in PLINK trend test on male fertility
datasetbull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale
fertilitydatasetbull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male
fertilitydatasetbull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility
datasetbull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on
femalefertilitydatasetbull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale
fertilitydatasetbull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male
andfemalefertilitydataset
164
bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test
onmaleandfemalefertilitydataset
bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston
maleandfemalefertilitydataset
bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset
bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset
bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset
Supplementary)Table)S8)Results)of)haplotype)analysis)))Header(code
bull Chromchromosome
bull PosphysicalpositiononUMD31genome
bull Exome_homonumberofhomozygotefordeleteriousmutationsequencedanimals
bull Exome_hetnumberofheterozygotesequencedanimalsfordeleteriousmutation
bull Outcome NotFound It was impossible to associated a haplotype to the mutation
NotOpt Non completely invariant haplotype could be associated to themutationOK
Putativehaplotype associate tomutationwas found In few case alternativehaplotype
wasreported
bull LenlengthofhaplotypeinnumberofSNPs
bull Pop_numnumbersofselectedhaplotypedetectedinHolsteinpopulation
bull Pop_homo numbers of homozygote animals in Holstein population for selected
haplotype
bull Pop_het numbers of heterozygotes animals in Holstein population for selected
haplotype
bull ExpExpectednumberofhomozygoteanimal
bull Alternative if only one heterozygote sequenced animals for deleteriousmutationwas
foundalltwopossiblehaplotypeswheretested
Supplementary)Table)S9)Descriptive)results)for)the)Biological)Process)(BP)))
NumberofgenesbelongingtoBiologicalProcessGOterms)
Supplementary) Table) S10) Descriptive) results) for) the)Molecular) Function)
(MF))
NumberofgenesbelongingtoMolecularFunctionGOterms
165
CHAPTER7
Conclusion
Since the domestication event occurred in the Fertile Crescent around 10000yearsagothebovinespecieswasselectedbyhumankindtofulfilhisneedsThecreationofbreedsoccurredaround200yearsagointensifiedthedifferentiationbetween populations and promoted the development of intensive selectionschemesTraditionalAandnowadaysgenomicAselectionsareproducingpositivegenetictrendsinmanyproductivetraitsdespitethestillpoorknowledgeaboutphenotype biology A deeper understanding of genome organization and genebiology would increase the accuracy of genomic evaluation by incorporatingpriorknowledgeThisthesisexploresthebovinegenomebyhighAthroughputtechnologiesAsuchasNextGenerationSequencingandHighADensityGenotypingAwithestablishedand innovative procedures to investigate the biology of complex traits and toprovidenewdataandmoleculartoolstobovinebreedingThese goals have been achieved by complementary approaches and differentmethodsInChapter2selectionsignatureswereinvestigatedindependentlyintofive Italian breeds using Illumina BovineSNP50 BeadChip Then amultiAbreedapproach was applied clustering breeds into dairy (Italian Holstein ItalianBrownand Italian Simmental) andbeef (Marchigiana Piedmontese and ItalianSimmental) groups Regions under recent positive selection shared by breedswithasameproductiveaptitudewereinvestigatedinfurtherdetailbyretrievinggenes in the region and analysing their function by ontology and pathwayanalysis In the dairy group selection signals appear nearby genes related tomammarygland(metabolismandresistancetomastitis)andfeedingadaptationIn thebeefgroupgenes inselectedregionsare involved inanimalgrowthandmeatquality(textureandjuiciness)Hencecandidategenesidentifiedbelongtonetworks and pathways important in directing a breed towards either beef ordairyproductionIn Chapter 3 a GWAS approach was used in dairy cattle to correlate SNP tocomplex phenotypes having an economic value Three independent ldquoclassicalrdquosingleAmarkerregressionswereruninthreeItaliandairycattle(ItalianHolsteinItalianBrownandItalianSimmental)UsingthetraditionalsingleSNPapproach
168
only few SNPs were above the nominal genomeAwide significance thresholdwhile applying a geneAbased method identified significant candidate genesassociatedwithmilkphysiology andmammary glanddevelopment in all threepopulationsInterestinglygenesfoundindifferentbreedshavesimilarfunctionbut are not the same suggesting that breed history demography selectioncriteriaandstochasticeventsasgeneticdrifthaveanimpactonthedefinitionofthemainmoleculartargetofselectionschemesCurrent intensive selection strategies rapidly spread the genetic gain in thepopulationbutholdtheassociatedriskofdisseminatingdefectivegenescausinggenetic defects (eg mulefoot complex vertebral malformation and others) ordecreasing fertility In Chapter 6 different technologies and approaches werecombined to obtain filter and prioritize genes candidate to be deleteriousExome sequence datawere exploited to identify deleterious variants affectingItalianHolsteincattleinasetof20animalsextremeforfertilitytraitsMorethan5000 candidate deleterious SNP variants were identified To bypass the smallnumberlimitationandthelackofpowerofanystatisticanalysison20animalsastepwise approachwas used to first filter and then prioritize these candidatedeleterious variants Polymorphisms were retained if having asymmetricdistribution in bulls from opposite tails of fertility EBV distributions andormapping in genomic regions outlier for lack of homozygosity in a 1009 bullpopulation genotyped with the Illumina BovineHD BeadChip A total of 191variants in163genespassed theprocessesThesewereassessedby searchingadditional evidences in favour or against their deleterious effect throughdifferentapproachesi)checkingthevariantconservationacross100vertebratespecies by comparative genomics ii) finding haplotypes associated withcandidatemutationsandevaluatingtheirfrequencyinHolsteinsiii)integratinghumanandmouse functionalannotationsA scorewasproposed toweight theresult of these assessments and rank the variants 35 of them had positivescores These occurred in genes involved in basic biological mechanisms asfertilitydevelopmentand immunityandresultedenriched ingenesassociatedtogeneticdefectsreportedinhumanThesemutationsarestrongcandidatestobe recessive deleterious variants worth to be further investigated in a larger
169
populationtobetterunderstandtheirbiologyimpactandpossibleapplicationinbreedingTo increase the reliability of results based on genomic data and to reduce theimpact of subjective choices in data analysis new toolsapproaches weredeveloped(iethemultiAbreedapproachinSelectionSignaturendashChapter2thepostAGWAS geneAbased approach ndash Chapters 3 and 4) and the existent onescriticallyevaluated(imputationmethodsndashChapter5)In Chapter 4 the postAGWAS geneAbased method used in Chapter 3 wasimplemented in the MUGBAS (MUlti species GeneABased Association Suite)software In contrast to existing software MUGBAS is species and annotationfree AmultipleAtesting correctionwas included in the package in addition tocomputing parallelization for faster processing and plot representation forbetterunderstandingThesoftwareusessingleAmarkerGWASdataandagivenannotation to estimate geneAregionAwise association pAvalues As previouslymentioned in Chapter 3 the ldquoclassicalrdquo GWAS approach (single SNP) foundsignals only in ItalianHolsteinwhileMUGBAS geneAbased approach identifiedsignificantsignalsinallbreedsinvestigatedIn Chapter 5 the impact of different bovine reference sequences on genotypeimputation was investigated In this work Illumina BovineSNP50 BeadChipgenotypesof ItalianSimmentalbullswerereAmappedonthethreemostrecentreferencegenomeassembliesWecomparedfourdifferent imputationmethodsfromlow(IlluminaBovineLDv10Beadchip)tohighdensityThetestpopulationwasdividedintofoursubpopulationsonthebasisofrelationshipandgenotypedataavailabilityUpdatingSNPcoordinatesonthe three testedcattlereferencegenome assemblies determined only a slight variation on imputation resultswithinmethodsOntheotherhandlargedifferencesintermsofaccuracywereobservedamongthefourimputationmethodstestedGenetic structure of industrial breeds was evaluated assessing LinkageDisequilibrium (LD) Since LD decay is correlated to intensity of selection weexpected different behaviours in the breeds analysed Indeed as evident inChapter 2 ndash Figure 7 breeds subjected to lower selection intensity (ie
170
MarchigianaPiedmonteseandItalianSimmental)displayalowerpersistenceof
LDatgreaterdistancescomparedtothosesubjectedtohigherselectionpressure
(ieItalianHolsteinandItalianBrown)Chapter3investigatesLDintheDGAT1
region of dairy cattle breeds In Italian Simmental LD is lower compared to
Italian Holstein and Italian Brown In spite of this significant association
betweentheDGAT1regionandmilktraitswasfoundbothinItalianSimmental
andinItalianHolsteinInItalianBrownnoassociationwasfoundbecauseDGAT1
isfixed
Data and results presented in this thesis represent a clear example of the
progress of molecular technologies occurred in last few years spanning from
mediumdensitygenotypingtonextgenerationsequencing
Thebiologylandscapesofsomecomplextraits(iemilkproductionandquality
fertility)were improved finding candidate genes andpathwaysnotpreviously
associated to the phenotype in the bovine species Moreover insights on
deleterious mutations were inferred through a deep and integrated analysis
usingNGSdata
This new knowledge could be used to improve cattle breeding For example
deleteriousrecessivevariantscouldbeincludedinbullevaluationsbybreedersrsquo
associations to reduce the genetic load of deleterious genes in parallel to
improvingproductivity
While technological progress has been impressivewe are still only scratching
thesurfaceofanimalbiologyNewfrontiersarebeingopenedbywholegenome
sequencing of many animals (eg in the 1000 bull genome project) and by
stepping beyond sequencing (a number of epigenomic projects in animals are
following the footprints of the human ENCODE project) The trend is towards
unravelling complexity and data production ability is to be flanked by data
storingandprocessingandaboveallbynewmethodsandtoolsfordataanalysis
171
ACKNOWLEDGMENTS
ldquoNon egrave la destinazione ma il viaggio che contardquo Questi tre anni sono stati un gran bel viaggio di crescita personale e ricchi diesperienze Non egrave stato un viaggio solitarioma condiviso con vecchie e nuoveconoscenzechemisembradoverosomenzionareInnanzitutto vorrei ringraziare il prof Paolo AjmoneMarsan chemi ha dato lapossibilitagrave di fare questa esperienza Grazie per gli insegnamenti e per leopportunitagravechemihaoffertoSperodinonavertraditolesueaspettativeAgli attuali colleghi Lorenzo Stefano Licia Elisa Riccardo ed Elia e a quellipassati Nicola Fatima Raffaele Ezequiel e Barbara va il mio piugrave sincero edaffettuoso ringraziamento per aver condiviso anche solo in parte questomiopercorso Grazie per avermi sopportato (e non egrave poco) guidato (non solo nelcampolavorativo)efattovivereinunmodospecialelrsquoambientedilavoroUnsincerograzievaatuttiimieicolleghidelXXVIIcicloAgrisystemRicorderogravesempretuttiimomentichehopotutocondividereconvoi UngrazieancheachihafattopartedellamiaesperienzasvedeseGraziedicuoreaCarlFlavioChristianEricaatuttiiragazziitalianiatuttiicolleghildquosvedesirdquoea tutte le altre persone che ho conosciuto Mi avete fatto vivere appienolrsquoesperienzaesteraefattotornareacasaconqualcheconsapevolezzainpiugrave UnsentitoringraziamentoatuttiglialtricolleghiconcuihocollaboratoinquestianniRingrazio tutti gli amici e conoscenti che mi sono stati vicino per avermisupportatoesopportatoognunoamodoproprio Ultimomanonperimportanza ilringraziamentochevaatuttalamiafamigliaper avermi aiutato spronato sostenuto sempre e in particolare in questaesperienzaSemplicementegrazie PS Adirlatuttaancheladestinazionenonegravepoicosigravemale
173
- INDEX
- CHAPTER 1 - Introduction
- CHAPTER 2 - Relative extended haplotype homozygosity (rEHH) signals across breeds reveal dairy and beef specific signatures of selection
- CHAPTER 3 - Searching new signals for productive traits in three Italian cattle breeds
- CHAPTER 4 - MUGBAS a species free gene based ased association suite
- CHAPTER 5 - Imputation accuracy is robust to cattle reference genome updates
- CHAPTER 6 - Quest for deleterious recessive variants in Italian Holstein bulls combining exome sequencing and high density SNP analysis
- CHAPTER 7 - Conclusion
- ACKNOWLEDGMENTS
-